Evaluation of bone marrow aspirates in patients with acute myeloid leukemia at day 14 of induction therapy

Background Early assessment of response to chemotherapy in acute myeloid leukemia may be performed by examining bone marrow aspirate (BMA) or biopsy (BMB); a hypocellular bone marrow sample indicates adequate anti-leukemic activity. We sought to evaluate the quantitative and qualitative assessment of BMA performed on day 14 (D14) of chemotherapy, to verify the inter-observer agreement, to compare the results of BMA and BMB, and to evaluate the impact of D14 blast clearance on the overall survival (OS). Methods A total of 107 patients who received standard induction chemotherapy and had bone marrow samples were included. BMA evaluation was performed by two observers using two methods: quantitative assessment and a qualitative (Likert) scale. ROC curves were obtained correlating the BMA quantification of blasts and the qualitative scale, by both observers, with BMB result as gold-standard. Results There was a significant agreement between the two observers in both the qualitative and quantitative assessments (Kw = 0.737, p < 0.001, and rs = 0.798, p < 0.001; ICC = 0.836, p < 0.001, respectively). The areas under the curve (AUC) were 0.924 and 0.946 for observer 1 and 0.867 and 0.870 for observer 2 for assessments of the percentage of blasts and qualitative scale, respectively. The best cutoff for blast percentage in BMA was 6 % and 7 % for observers 1 and 2, respectively. A similar analysis for the qualitative scale showed the best cutoff as “probably infiltrated”. Patients who attained higher grades of cytoreduction on D14 had better OS. Conclusions Evaluation of D14 BMA using both methods had a significant agreement with BMB and between observers, identifying a population of patients with poor outcome.


Background
The outcome of patients with acute myeloid leukemia (AML) has improved substantially over the past decades, thanks to the development of more aggressive therapies and better supportive care. However, a substantial proportion of patients still do not obtain complete remission (CR), and others eventually relapse after achieving CR [1][2][3]. In an attempt to stratify subgroups with different survival rates, several prognostic factors have been identified, including age, gender, baseline white blood cell count, lactic dehydrogenase serum level, immunophenotype, karyotypic abnormalities and genetic profiles [4][5][6][7].
In addition to baseline variables, early assessment of response to chemotherapy may help to define prognosis. Previous studies have shown an association between the lack of early blast clearance and failure to obtain CR after a first cycle of induction [8,9]. This early assessment of treatment response is usually performed between the 14th (D14) and 17th day of the first cycle of induction chemotherapy, by analyzing the cellular content of the bone marrow aspirate (BMA) and/or biopsy (BMB). A hypocellular bone marrow sample suggests adequate anti-leukemic activity [8,10]. However, its interpretation may be inaccurate because of different levels of expertise among pathologists and hematologists, and great variability in BMA and BMB sample quality [11]. Furthermore, a BMA blast count above which poor response to chemotherapy is predicted has not been clearly defined, with values ranging from 5 % to 40 % [8][9][10][11][12][13][14][15][16][17][18][19]. By contrast, the BMB provides a better assessment of marrow cellularity [20], but the results are available only a few days after the BMA, delaying the decision to administer a second course of induction chemotherapy for non-responders.
Given these uncertainties, we sought to evaluate the quantitative and qualitative assessment of D14 BMA, to verify the inter-observer agreement, and to compare the results of BMA and BMB. We also assessed the impact of D14 blast clearance on the overall survival (OS).

Study population and treatment
All patients diagnosed with AML at University Hospital Clementino Fraga Filho, Universidade Federal do Rio de Janeiro (UFRJ), Brazil, from January 1979 to December 2008 were retrospectively evaluated. Entry criteria for this study included: a diagnosis of AML other than acute promyelocytic leukemia, no previous treatment at another institution, receipt of standard induction chemotherapy (cytarabine + anthracycline), and performance of BMA on D14 of induction chemotherapy. The study was approved by the local ethics committee (Hospital Clementino Fraga Filho/Universidade Federal do Rio de Janeiro, CAAE n°. 0094.0.197.000-09) and was conducted in accordance with the principles of the Declaration of Helsinki. Informed consent was not obtained because of the retrospective nature of the study, which did not affect the healthcare of the included individuals. Moreover, confidentiality was preserved.
The diagnosis of AML was based on available procedures at the time, including BMA and BMB, and cytogenetic and immunophenotype analyses. Cases were classified according to the French-American-British (FAB) criteria [21]. The treatment regimens changed over time (Table 1) [22].

Bone marrow aspirate and biopsy
Routine assessments of BMA and BMB were performed on D14 of remission induction. Aspirate smears were prepared at the bedside and stained with Wright-Giemsa stain, and biopsy samples were fixed in 10 % buffered formalin and stained with hematoxylin and eosin. Patients with persistent disease according to D14 assessment received a second cycle of induction as early as possible [2,13]. All glass slides were kept in storage units in the hospital archives. We reviewed all available slides from BMA performed at diagnosis and on D14. The analysis was performed by two independent observers (board-certified hematologists), blinded to patient identification and outcome. The evaluation included confirmation of the initial diagnosis of AML and identification of D14 residual leukemia in a quantitative (percentage) and qualitative (scale) manner. Quantitative evaluation was performed by counting the percentage of blasts in 200 nucleated marrow cells. The qualitative assessment was determined by stratification on a Likert scale [23] of five categories: definitely infiltrated, probably infiltrated, doubtful, probably free and definitely free.
The results of D14 BMB were obtained by reviewing patients' medical records and registries from the Pathology Service of the hospital. The reports were categorized as aplastic (leukemia free) or infiltrated.

Statistical analysis
The qualitative assessment of blasts was first treated as an ordinal categorical variable and later grouped into two categories and treated as a dichotomous categorical variable. Agreement between the two observers was assessed using the kappa coefficient (Cohen's kappa) and the quadratic weighted kappa coefficient (Kw). The kappa coefficient may range from −1 (complete disagreement) to +1 (complete agreement), and the agreement is usually classified as poor (below 0), mild (0 to 0.2), low (0.21 to 0.4), moderate (0.41 to 0.6), substantial (0.61 to 0.8) and almost perfect (0.81 to 1.00) [24]. Further evaluation of the marginal homogeneity of proportions was performed with the McNemar test for dichotomous categorical variables and the modified McNemar test for ordinal categorical variables. In both tests, a significant p value (<0.05) indicates excessive variation between observers [25].
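As an illustration of the agreement statistics described above, the following is a minimal sketch of Cohen's kappa with optional quadratic weights, written with disagreement weights (i − j)²/(k − 1)². The function name `cohen_kappa` and the ratings used in the usage note are hypothetical and are not taken from the study data; the software actually used by the authors may differ in detail.

```python
from collections import Counter

# Ordered Likert categories from the Methods (index = ordinal rank).
CATEGORIES = [
    "definitely free",
    "probably free",
    "doubtful",
    "probably infiltrated",
    "definitely infiltrated",
]


def cohen_kappa(ratings_a, ratings_b, categories=CATEGORIES, quadratic=False):
    """Cohen's kappa for two raters; quadratic=True gives the weighted form."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Joint and marginal counts of the category indices.
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    row = Counter(index[a] for a in ratings_a)
    col = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, growing with distance.
        if quadratic:
            return (i - j) ** 2 / (k - 1) ** 2
        return 0.0 if i == j else 1.0

    observed = sum(weight(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        weight(i, j) * row[i] * col[j] / n ** 2
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected
```

For example, `cohen_kappa(obs1, obs2, quadratic=True)` would return the quadratic weighted coefficient for two lists of category labels; with only two categories the weighted and unweighted forms coincide.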
The quantitative assessment of blasts was treated as a discrete variable with a non-normal distribution; comparisons between observers were performed with Spearman's correlation coefficient (rs). Measurements between observers were also compared using the intraclass correlation coefficient (ICC) and the Bland and Altman method [26].
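The Bland and Altman method mentioned above can be sketched as follows: the mean of the paired differences estimates the bias between observers, and the limits of agreement are placed around it. The multiplier 1.96 (conventional 95 % limits) is an assumption, since the text does not state the width used, and the function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(obs1, obs2, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    obs1 and obs2 are blast percentages read by two observers on the
    same slides. z=1.96 is the conventional choice, assumed here.
    """
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(obs1, obs2)]
    bias = mean(diffs)   # systematic difference between observers
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd
```

A bias close to zero with narrow limits would indicate that the two observers' blast counts can be used interchangeably; wide limits would indicate poor agreement even if the correlation is high.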
The D14 BMA evaluation was compared with the BMB (considered as "gold standard") using receiver operating characteristic (ROC) curves to assess the best cut-off point in terms of sensitivity, specificity and accuracy. The areas under the ROC curves (AUC) were compared using the method of DeLong [27]. OS was defined as the time from diagnosis to death from any cause or last follow-up. Survival curves were estimated with the Kaplan-Meier method and differences were compared with the log-rank test. Multivariate analysis for OS was conducted using a Cox model and hazard ratios (HR) were obtained for each observer. All tests were two-sided, and p values <0.05 were considered statistically significant. Statistical analyses were performed using SPSS 11.0 (SPSS Inc., 1989-2001), MedCalc 11.3 and MH Program 1.2142.
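The ROC analysis described above can be illustrated with a short sketch that computes the AUC and a candidate cut-off for the BMA blast percentage against the BMB result. Two points are assumptions: a sample is called positive when its blast percentage is at or above the threshold, and the "best" cut-off is taken as the one maximizing Youden's index (sensitivity + specificity − 1), since the text does not name the exact criterion. All function names are hypothetical.

```python
def roc_points(scores, labels):
    """List (threshold, sensitivity, specificity) for each observed score.

    labels: 1 = infiltrated BMB, 0 = leukemia-free BMB.
    A sample is called positive when score >= threshold (assumption).
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Threshold maximizing Youden's index (an assumed criterion)."""
    return max(roc_points(scores, labels),
               key=lambda p: p[1] + p[2] - 1)[0]
```

The same functions apply to the qualitative scale if the five categories are first mapped to their ordinal ranks. Comparing two AUCs with the DeLong method requires the covariance of the paired estimates and is not covered by this sketch.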

Patients
Of 295 patients with AML identified in the hospital records, 119 fulfilled entry criteria. Among these 119 patients who had a BMA on D14, we could recover 107 sets of BMA smears, containing samples from diagnosis and the D14 assessment. The median age was 38 years (range 12-77), 12 % were >60 years old and 58 % were male. In addition, we were able to compare D14 BMA and BMB in 82 patients.

Agreement analysis between observers
The comparison between observers of D14 BMA evaluation using the qualitative scale is shown in Table 2. The quadratic weighted kappa coefficient was 0.74 (95 % confidence interval [95 % CI] 0.64-0.83, p < 0.001), and no bias was observed (p = 0.8, modified McNemar test). Typical qualitative categories are shown in Fig. 1.

Comparison of bone marrow aspiration and bone marrow biopsy on D14
The evaluation of BMB on D14 showed 33 patients with bone marrow infiltration and 49 free of leukemia. Table 3 shows the distribution of the categories of the qualitative scale according to the BMB status. We observed an association between the categories of definitely free and probably free with leukemia free in the BMB, and the categories of definitely infiltrated and probably infiltrated with infiltrated BMB (85.4 % for observer 1 and 75.6 % for observer 2). Doubtful results of BMA represented mainly leukemia-free BMB for both observers. Figure 3 shows the ROC curves correlating the BMA quantification of blasts and qualitative scale, by both observers, according to BMB results. The AUCs for the quantitative and qualitative assessments were 0.924 and 0.946 for observer 1, and 0.867 and 0.870 for observer 2, respectively. We also compared the ROC curves of the quantitative and qualitative analysis of each observer. The difference in AUCs was 0.025 for observer 1 (p = 0.22) and 0.002 for observer 2 (p = 0.97).
Based on the best cut-off point of the qualitative assessment, we divided the five categories of the scale into two: "free" and "infiltrated". The first groups the categories definitely free, probably free and doubtful, while the second includes the categories probably infiltrated and definitely infiltrated. The kappa coefficient for the comparison between observers was 0.66 (95 % CI 0.51-0.80, p < 0.001), with no bias per McNemar test (p = 0.1) (Table 4).
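The grouping of the five categories into "free" and "infiltrated", and the check for systematic bias between observers, can be sketched as below. The exact binomial form of the McNemar test is an assumption (the text does not say whether an exact or chi-square version was used), and the function names are hypothetical.

```python
from math import comb

# Grouping stated in the text: "doubtful" is pooled with the free categories.
FREE = {"definitely free", "probably free", "doubtful"}
INFILTRATED = {"probably infiltrated", "definitely infiltrated"}


def dichotomize(category):
    """Map a five-level Likert category to 'free' or 'infiltrated'."""
    if category in INFILTRATED:
        return "infiltrated"
    if category in FREE:
        return "free"
    raise ValueError(f"unknown category: {category!r}")


def mcnemar_exact(calls_1, calls_2):
    """Two-sided exact McNemar p value on the discordant pairs."""
    b = sum(1 for x, y in zip(calls_1, calls_2)
            if x == "infiltrated" and y == "free")
    c = sum(1 for x, y in zip(calls_1, calls_2)
            if x == "free" and y == "infiltrated")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of bias
    # Binomial(n, 0.5) tail for the smaller discordant count, doubled.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A non-significant p value, as reported in the text, would mean that neither observer calls marrows "infiltrated" systematically more often than the other.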

Impact of D14 blasts on survival
Five-year OS was significantly longer in patients with <5 % blasts on D14 for both observers (Fig. 4). With the Likert scale, a better outcome in patients with lower grades of marrow involvement was also observed (Fig. 5). The same results were obtained among the 55 patients treated with two or more cycles of intensification (Fig. 6) (Table 6).
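The survival estimates in this section rely on the Kaplan-Meier method named in the statistical analysis. A minimal sketch of the product-limit estimator is given below for readers unfamiliar with it; the function name `kaplan_meier` and the follow-up times in the usage note are hypothetical and do not come from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with a death.

    times:  follow-up time from diagnosis for each patient
    events: 1 = death from any cause, 0 = censored at last follow-up
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored both leave the risk set
    return curve
```

For instance, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` steps down at times 1, 3 and 4, while the censored patient at time 2 only reduces the number at risk. Comparing curves between groups requires the log-rank test, which this sketch does not implement.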

Discussion
In this study we found substantial agreement between observers using two different methods: a quantitative assessment, with the determination of the percentage of bone marrow blasts, and a qualitative assessment, based on the perception of marrow infiltration. In addition, a cutoff value of 6-7 % of blasts in the quantitative assessment and "probably infiltrated" marrow in the qualitative assessment was established, with good discriminatory power to identify patients with infiltrated BMB. Moreover, we observed a higher OS in patients who obtained higher grades of cytoreduction by day 14 marrow evaluation.
While risk assessment in AML relies mainly on age and cytogenetic profile [5], the assessment of in vivo chemosensitivity by determining early response to induction therapy is an additional predictive marker. Indeed, this parameter has been used to guide clinicians in deciding on an early second cycle of chemotherapy [13,28,29]. However, the type of D14 bone marrow evaluation (BMA, BMB or both) has varied, with some studies relying on BMA [8,16], others using BMB [18], and some providing no clear information [9,10,17,19].
In our study we observed that the qualitative and the quantitative methods were equally predictive of BMB results, with a substantial inter-observer agreement. Bone marrow evaluation by more than one observer has been previously reported [16,17], but to the best of our knowledge, our study is the first to report the assessment of inter-observer agreement.
All analyses of response assessment by D14 BMA by both methods (qualitative and quantitative) and both observers resulted in higher specificity than sensitivity. Likewise, the concordance between observers was very good for "definitely/probably infiltrated", but not so good for "definitely/probably free". Therefore, there is no debate that a large number of leukemic blasts on day 14 constitutes unequivocal evidence of residual leukemia. However, the presence of a few blasts in a paucicellular or hemodiluted marrow sample cannot be considered as definite evidence of residual disease. Indeed, most guidelines recommend a second induction cycle for unequivocal residual disease, and most dilemmas occur in patients with low blast counts (5-15 %) [32].
Few previous studies have shown an association between D14 marrow findings and long-term outcome [8,9,10,17,30]. In the present study, multivariate analysis showed that the evaluation of the bone marrow infiltration by Likert scale (but not the percentage assessment) was significantly associated with poor outcome.
Our study shares the limitations of all retrospective studies. It was not possible to recover D14 BMA and BMB slides from all cases. In addition, survival analysis was performed without the inclusion of well-known prognostic factors such as chromosomal and molecular abnormalities. Finally, we did not analyze the potential effect of the different induction regimens given throughout the study period or of the number of patients entered over the study period. Despite these limitations, we were able to show that BMA may be considered the procedure of choice to assess treatment response on D14 because it provides results immediately, and exhibited good agreement between observers and good correlation with BMB and OS.
Fig. 6 Overall survival according to the qualitative evaluations of D14 BMA by two observers in patients (n = 55) treated with two or more cycles of intensification

Conclusions
We conclude that the assessment of BMA on day 14 of remission induction chemotherapy in patients with AML is a reproducible test with substantial agreement between observers, both quantitatively and qualitatively, and has good correlation with BMB and with OS. A blast percentage cut-off of 6-7 % or a rating of "probably infiltrated" may help in the early identification of a population of patients with unfavorable prognosis.