Trends in the full blood count blood test and colorectal cancer detection: a longitudinal, case-control study of UK primary care patient data

Background
The full blood count (FBC) is a common blood test performed in general practice. It consists of many individual parameters that may change over time due to colorectal cancer. Such changes are likely missed in practice. We identified trends in these FBC parameters to facilitate early detection of colorectal cancer.
Methods
We performed a retrospective, case-control, longitudinal analysis of UK primary care patient data. LOWESS smoothing and mixed effects models were derived to compare trends in each FBC parameter between patients diagnosed and not diagnosed over a prior 10-year period.
Results
There were 399,405 males (2.3%, n = 9,255 diagnosed) and 540,544 females (1.5%, n = 8,153 diagnosed) in the study. There was no difference between cases and controls in FBC trends between 10 and four years before diagnosis. Within four years of diagnosis, trends in many FBC levels statistically significantly differed between cases and controls, including red blood cell count, haemoglobin, white blood cell count, and platelets (interaction between time and colorectal cancer presence: p <0.05). FBC trends were similar between Duke’s Stage A and D colorectal tumours, but started around one year earlier in Stage D diagnoses.
Conclusions
Trends in FBC parameters are different between patients with and without colorectal cancer for up to four years prior to diagnosis. Such trends could help earlier identification.


Plain English summary
Colorectal cancer is a common type of cancer in the UK. It is the second most common cause of cancer-related death in the UK. Chances of surviving depend heavily on the tumour stage at diagnosis, which represents how much the tumour has developed. If diagnosed and treated at the earliest stage, where the tumour is confined to the colon, nine in 10 patients are expected to be alive five years later. If diagnosed at the latest stage, when the cancer has spread outside the colon, this drops to one in 10 surviving. The majority of UK patients with colorectal cancer are diagnosed with late-stage tumours, so are likely to die. Detecting and treating the cancer earlier can save lives.
There is a blood test called the Full Blood Count, which is commonly ordered by doctors for many reasons. This test includes many blood levels, such as haemoglobin, which carries oxygen around the body. Growing tumours cause subtle changes in the blood levels over time, but it is unclear what these changes are and if they could help find cancer in the early stages.
In our study, we looked at blood tests from almost one million patients in the UK, including around 17,000 with colorectal cancer. We checked how blood levels change over 10 years before diagnosis. We found that in the few years before patients are diagnosed, patients usually had blood levels that rapidly started increasing or declining (depending on the blood level) and this was often not seen in patients without colorectal cancer. Our study highlights that using trends over time in blood test results may be useful to identify colorectal cancer. Such trends could facilitate earlier detection because they were present for years before diagnosis. That would improve the chances of successful treatment and chances of survival.

Introduction
Population incidence rates for colorectal cancer have decreased only slightly each year since 2012. Colorectal cancer currently accounts for 11% of all new cancers diagnosed in the UK, making it the fourth most common type of cancer. It is the second most common cause of cancer-related death. Prognosis is heavily influenced by tumour stage at diagnosis, which can be assessed in various ways. Five-year survival is 93% if diagnosed at Stage I, where the cancer is confined to the bowel lining, and 10% if at Stage IV, where it has spread to other organs (Cancer Research UK - Bowel cancer statistics).
Symptoms for colorectal cancer, such as abdominal pain and change in bowel habit, often appear when the disease has developed to a relatively late-stage, where it is difficult to treat and the likelihood of survival reduced. Current evidence suggests that symptoms are on average first reported to clinical care less than six months before diagnosis 1 . Identifying colorectal cancer at earlier stages, where the likelihood of survival is greatest and before overt symptoms appear, would be of considerable benefit to reduce mortality 2 .
The full blood count (FBC), a blood test commonly performed in primary care practices, may play a role in earlier detection 3 .
For example, anaemia determined from the FBC test is a known risk factor for diagnosis and may warrant further investigation under the current screening programme if due to iron deficiency (WHO: Guide to early diagnosis, NICE: Suspected cancer recognition and referral). The FBC test consists of up to 20 individual parameters, including haemoglobin, platelet count, and white blood cell count.
Our recent systematic review identified 53 studies that assessed the FBC blood test for colorectal cancer diagnosis 4,5 . Our review indicated that diagnosed patients have a significantly lower red blood cell count, haemoglobin, and mean corpuscular volume and higher red blood cell distribution width, white blood cell count, and platelets within six months of diagnosis compared to patients not diagnosed. Differences were smaller earlier than six months before diagnosis, suggesting that changes in the FBC differ over time. There may be relevant trends that could help identify patients who have a diagnosis.
The aim of this study was to identify trends in FBC parameter levels prior to colorectal cancer diagnosis and compare trends to those in patients without a diagnosis. To identify opportunities for earlier detection, we also assessed trends between tumour stages in diagnosed patients.

Methods
Study reporting follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines 6 .

Study design
We performed a retrospective case-control study to explore changes in FBC results before a diagnosis compared to patients without a diagnosis. FBC data were obtained from a UK primary care database, the Clinical Practice Research Datalink (CPRD) GOLD, and diagnosis data from the UK National Cancer Registration and Analysis Service (NCRAS). CPRD has ethical approval from the Health Research Authority to hold anonymised patient data and support research using that data. The CPRD Independent Scientific Advisory Committee's approval of data access for individual research projects includes ethics approval and consent for those projects. Ethical approval was therefore covered for this study by the CPRD (protocol 14_195RMn2A2R). Clinical codes to extract data from CPRD and NCRAS are in Table 1 and Table 2.
Study entry was defined as the latest of: date of registration with the practice, the patient's 40th birthday, or 1st January 2000. Study exit was defined as the earliest of: date of leaving the practice, date of death, or 14th January 2014 (the NCRAS data-cut date).
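The entry and exit rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' extraction code: the function name and arguments are hypothetical, and it assumes a full date is available for the 40th birthday, whereas the text states only that year of birth was available.

```python
from datetime import date

STUDY_START = date(2000, 1, 1)   # earliest permitted study entry (from the text)
DATA_CUT = date(2014, 1, 14)     # NCRAS data-cut date (from the text)


def study_period(registration, fortieth_birthday, left_practice=None, died=None):
    """Return (entry, exit) for one patient; argument names are hypothetical.

    Entry is the latest of registration, 40th birthday, and the study start.
    Exit is the earliest of leaving the practice, death, and the data cut.
    """
    entry = max(registration, fortieth_birthday, STUDY_START)
    # Patients who neither left nor died are followed up to the data cut.
    candidates = [d for d in (left_practice, died, DATA_CUT) if d is not None]
    return entry, min(candidates)
```

A patient whose entry date falls after their exit date would contribute no study time and would need to be dropped in a real pipeline.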

Participants
Patients with at least one FBC blood test within 10 years before index date (defined below) were included. Patients were excluded if they were registered with their primary care practice for less than one year, had a history of colorectal cancer before study entry, or were diagnosed after study exit. Patients diagnosed with another cancer type before or simultaneously with colorectal cancer diagnosis were excluded. Patients with an available date of diagnosis but no indication of the cancer type were excluded.

Clinical outcome
The outcome was the first diagnosis of colorectal cancer in the NCRAS database. For cases (patients diagnosed), the index date was the date of colorectal cancer diagnosis in the patient's study period. For controls (patients without a diagnosis), the index date was a randomly selected date in the patient's study period. A random date was chosen to mimic the sporadic nature of diagnoses in the overall study period for cases.
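As an illustration of how a control's index date might be drawn, the sketch below picks a date uniformly at random between study entry and study exit. The uniform distribution, the inclusive end points, and the function name are assumptions; the text says only that the date was randomly selected within the patient's study period.

```python
import random
from datetime import date, timedelta


def random_index_date(entry, exit_, rng):
    """Draw one date uniformly from [entry, exit_] (both inclusive).

    `rng` is a random.Random instance so that draws are reproducible.
    """
    span_days = (exit_ - entry).days
    return entry + timedelta(days=rng.randint(0, span_days))


# Example: a seeded generator gives the same index date on every run.
rng = random.Random(2014)
index_date = random_index_date(date(2005, 1, 1), date(2010, 1, 1), rng)
```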

Demographic and FBC variables
Year of birth and sex were available for all patients in the CPRD dataset. We extracted the date of each FBC test and included 14 of the 20 parameters (exposure variables of interest) that make up the FBC in this study. We excluded five: percentage basophils, eosinophils, lymphocytes, monocytes, and neutrophils because we used the corresponding counts (also FBC parameters). Additionally, red blood cell distribution width was excluded because this parameter is not recorded in general practice so was missing for almost all FBCs. We excluded FBC results outside biologically plausible ranges, such as negative values (see Table 3 for further details), FBCs performed earlier than 10 years before index date, and FBCs performed after index date.  A detailed account of our data preparation and validation processes has previously been reported 7 . We also previously provided summary statistics for each FBC parameter.

Statistical analysis
We used LOWESS smoothing to describe trends in FBC parameters graphically for 10-yearly age groups. Controls with many FBCs are likely to have some other condition/disease, which could affect blood levels. Therefore, for both cases and controls, we randomly selected three FBCs per patient (if there were more than three) to reduce the influence of these many affected FBCs on LOWESS trends.
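The subsampling step described above could look like the following sketch, which keeps at most three randomly chosen FBCs per patient before any smoothing is applied. The data layout (pairs of patient identifier and test record) and the function name are hypothetical.

```python
import random
from collections import defaultdict


def sample_fbcs(tests, max_per_patient=3, seed=0):
    """Keep at most `max_per_patient` randomly chosen tests per patient.

    `tests` is an iterable of (patient_id, test) pairs; the layout is assumed.
    Patients with `max_per_patient` tests or fewer keep all of them.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for patient_id, test in tests:
        by_patient[patient_id].append(test)

    sampled = {}
    for patient_id, patient_tests in by_patient.items():
        if len(patient_tests) > max_per_patient:
            # Sample without replacement so no test is used twice.
            sampled[patient_id] = rng.sample(patient_tests, max_per_patient)
        else:
            sampled[patient_id] = list(patient_tests)
    return sampled
```

The sampled tests would then be passed to a LOWESS routine separately for each sex and age group.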
Mixed effects models were developed for each FBC parameter separately (using all available FBCs per patient), using restricted maximum likelihood estimation. To model differences in FBC levels between cases and controls over time, colorectal cancer status (yes/no) and time to index date (years) were included as fixed effects together with an interaction between them. Each model was adjusted for age at index date (years) as a fixed effect and interactions between age and time and age and colorectal cancer status were included if trends over time or by colorectal cancer status differed by age group upon graphical inspection of LOWESS plots. Each patient was modelled using a random intercept and time using a random slope with an unstructured covariance matrix to account for correlation in repeated measures.
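For orientation, the model described above can be written out for its simplest case: linear time, no spline terms, and no age interactions. The notation is ours rather than the authors': y_ij is the FBC parameter value for patient i at test j, C_i is colorectal cancer status, t_ij is time to index date, and a_i is age at index date.

```latex
y_{ij} = \beta_0 + \beta_1 C_i + \beta_2 t_{ij} + \beta_3 \,(C_i \times t_{ij}) + \beta_4 a_i
         + u_{0i} + u_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim N\!\left(\mathbf{0}, \Sigma\right),
\quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here u_0i and u_1i are the patient-level random intercept and random slope for time, and Σ is an unstructured 2 × 2 covariance matrix. The coefficient β_3 is the interaction between time and colorectal cancer status, which captures whether trends differ between cases and controls.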
Non-linearity of continuous variables was assessed using visual inspection, Akaike information criteria, and Bayesian information criteria, which were used to compare linear splines, restricted cubic splines, and fractional polynomials, as well as the number of knots and knot locations [8][9][10] . Where non-linear, time to index date was modelled using piecewise linear splines with three knots: at one, two, and four years before index date. Age at index date was modelled using piecewise linear splines with knots at ages 60, 70, and 80 years for red blood cell-related parameters and platelet count; knot locations varied for white blood cell count-related parameters.
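A piecewise linear spline with knots at one, two, and four years can be built from a simple basis, sketched below. This uses the truncated-line parameterisation, in which each extra term adds a change in slope beyond its knot; the authors' exact coding of the spline terms is not stated, so this is one common choice rather than their implementation.

```python
def linear_spline_basis(t, knots=(1.0, 2.0, 4.0)):
    """Return [t, (t - k1)+, (t - k2)+, (t - k3)+] for time t in years.

    (x)+ means max(0, x), so each term is zero until t passes its knot.
    """
    return [t] + [max(0.0, t - k) for k in knots]


def spline_value(t, coefs, knots=(1.0, 2.0, 4.0)):
    """Piecewise linear function of t; `coefs` has one slope term per basis column."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(t, knots)))


# Three years before index date: the first two knots are active, the third is not.
basis = linear_spline_basis(3.0)  # [3.0, 2.0, 1.0, 0.0]
```

With this basis, the slope between two knots is the sum of the coefficients of all terms whose knots have been passed, so the fitted line can bend at one, two, and four years.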
To identify opportunities for early detection, we assessed differences in FBC levels over time between cases diagnosed at Duke's tumour Stage A (earliest stage) and D (latest stage). Mixed effects models were developed using the same methods described above but were limited to cases alone and included Duke's tumour stage (A versus D) accordingly as fixed effects instead of colorectal cancer status. To explore the association between microcytic anaemia and diagnosis, we calculated the proportion of patients with microcytic anaemia per six-monthly time band, up to five years before index date.
Microcytic anaemia presence was based on any FBC in the time band, if a patient had multiple, and proportions were calculated out of the number of patients in the time band. Additionally, we derived age-adjusted odds ratios (95% confidence interval (CI)) for microcytic anaemia presence using logistic regression for each time band separately. We visually compared trends to microcytic anaemia thresholds and FBC reference ranges to identify whether trends can pre-date single-value, iron-deficiency referral thresholds and blood-abnormality.
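The time-band calculation described above can be sketched as follows. The haemoglobin thresholds are the anaemia thresholds quoted later in the article (<13 g/dL for males, <12 g/dL for females); the mean corpuscular volume cut-off of 80 fL is an assumption, since the threshold used by the authors is not given in the text, and the record layout is hypothetical.

```python
from collections import defaultdict

HB_THRESHOLD = {"M": 13.0, "F": 12.0}  # g/dL, anaemia thresholds quoted in the article
MCV_THRESHOLD = 80.0                    # fL; ASSUMED cut-off, not stated in the text


def time_band(years_before_index, width=0.5):
    """Band 0 covers 0 to <0.5 years before index, band 1 covers 0.5 to <1, etc."""
    return int(years_before_index // width)


def microcytic_anaemia(sex, hb, mcv):
    """True if haemoglobin and mean corpuscular volume are both below threshold."""
    return hb < HB_THRESHOLD[sex] and mcv < MCV_THRESHOLD


def proportion_by_band(tests, max_years=5.0):
    """Proportion of patients with microcytic anaemia in each six-monthly band.

    `tests` holds (patient_id, sex, years_before_index, hb, mcv) tuples.
    A patient counts as positive in a band if any of their FBCs in it is positive.
    """
    flagged = defaultdict(dict)  # band -> {patient_id: positive on any FBC?}
    for patient_id, sex, years, hb, mcv in tests:
        if not 0 <= years < max_years:
            continue
        band = time_band(years)
        positive = microcytic_anaemia(sex, hb, mcv)
        flagged[band][patient_id] = flagged[band].get(patient_id, False) or positive
    # Denominator is the number of patients with at least one FBC in the band.
    return {band: sum(p.values()) / len(p) for band, p in sorted(flagged.items())}
```

The per-band odds ratios reported in the article came from age-adjusted logistic regression, which this sketch does not attempt to reproduce.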
All analyses were stratified by sex. A two-sided significance level of 5% was used for all statistical analyses. Analyses were conducted using Stata/SE 15.1 (RRID: SCR_012763). Alternative, open-access software, such as R (RRID: SCR_001905), can perform the equivalent analyses.

Sensitivity analysis
We recreated the trends and mixed effects models for each FBC parameter using a matched design. Cases and controls were matched 1:5 on age at index and follow-up time. Follow-up was time (years) from first FBC to index, converted into six-monthly bands for matching. A random index date within the control's study period was used instead of the index date of their matched case because the latter heavily reduced the sample size. For example, many controls had an index date after study exit or had no FBCs before index.

The FBC test
On average, 14 of the 15 parameters were available per FBC for both males and females. Red blood cell distribution width was missing for almost 100% of FBC tests for both males and females. This is likely because this parameter was historically not reported to the general practice by haematology laboratories, despite being automatically derived by haematology analysers (i.e. machines). Consequently, red blood cell distribution width was excluded from further analyses. Haemoglobin had the least amount of missing data, missing for 1.9% (n = 22,637) and 1.8% (n = 34,678) of FBC tests for males and females, respectively. Missing data for each parameter are provided in Table 3.
[...] controls, 0.2 (0, 10.0) years for female cases and 0.9 (0, 10.0) years for female controls (Figure 3).
Mixed models for red blood cell-related parameters are in Table 6 (males) and Table 7 (females), platelet-related in Table 8, and white blood cell count-related in Table 9 (males) and Table 10 (females). The presence of colorectal cancer was statistically significantly associated with all parameter levels (p <0.05 for each model) except white blood cell count and eosinophil count for both males and females.
Figure 2. Histogram of follow-up time from first FBC to index date 1,2 . 1 Index date was the date of diagnosis for cases and a randomly selected date in the patient's study period for controls. 2 The spike at time=0 in cases is likely due to patients undergoing cancer investigation. This was not expected to influence trends, as the trends rely on sufficient data at each time-point, not comparability of follow-up.
Abbreviations: RBC = red blood cells; Hb = haemoglobin; Hc = haematocrit; MCV = mean corpuscular volume; MCH = mean corpuscular haemoglobin; MCHC = mean corpuscular haemoglobin concentration; WBC = white blood cell count.
In the raw data (LOWESS curves, Figure 4 to Figure 17), there was no apparent difference in trends measured from 10 to four years before index date between cases and controls for both males and females. Within four years before index date, levels changed steadily over time in patients without a diagnosis, such as the reduction in haemoglobin over time that may be due to increasing age. However, cases had trends in FBC levels that diverged from controls for all parameters, with the interaction term between presence of colorectal cancer and time to index date (including spline terms) reaching statistical significance in the mixed effects models (p <0.05 for each FBC parameter model), except basophil count in males (coefficient = -0.0002 (95% CI = -0.0005, 0.0001)). Additionally for cases, our Figures indicate that the rate of change in many FBC levels increased as diagnosis approached. Individualised trends (raw and predicted) are given for four cases (Figure 18) and four controls (Figure 19).

Colorectal cancer and microcytic anaemia
Anaemia was present in 48.8% of male cases and 54.3% of female cases on any FBC within one year of diagnosis. At each six-monthly interval up to five years before index date, the proportion with microcytic anaemia was higher in cases than controls (Figure 20). In cases, the proportion increased as diagnosis approached and was highest at 0-3 months prior to diagnosis: 23.3% (n = 1,188) of 5,107 males and 28.4% (n = 1,286) of 4,521 females with FBCs in that period. The odds of diagnosis (corresponding to microcytic anaemia presence) increased as the index date approached (Table 11). Presence of microcytic anaemia statistically significantly increased odds of diagnosis at each time band within three years before index date for both males (three-year OR = 2.2 (95% CI = 1.5, 3.1)) and females (three-year OR = 1.7 (95% CI = 1.3, 2.3)). No odds ratio achieved statistical significance at earlier time points.
We compared our graphical trends to microcytic anaemia thresholds. For haemoglobin, trends suggest that the threshold is on average only reached in cases very close to the time of diagnosis, except in the oldest age groups, where the threshold is reached slightly earlier but even controls in this age group reach the threshold. For mean corpuscular volume, trends suggest the threshold is on average not reached, regardless of age group. This suggests only a minority of patients have iron-deficiency determined from the FBC test (maximum 23.3% males and 28.4% females, Figure 20).

Colorectal cancer and FBC reference ranges
For all FBC parameters, the graphical trends showed that levels remained in the reference range for both cases and controls, except red blood cell count, haemoglobin, haematocrit, mean corpuscular volume, and mean platelet volume. In these five parameters, the trends suggest blood levels often only reach abnormal thresholds within approximately six months of diagnosis in younger cases. However, in older cases, levels are abnormal for approximately three years before diagnosis, which was also observed for older controls.

Tumour staging and the FBC
The number of cases diagnosed per Duke's tumour stage is in Table 12. Mixed models including Duke's stage at diagnosis, developed using cases alone, are provided for red blood cell-related parameters in Table 13 (males) and Table 14 (females), platelet-related in Table 15, and white blood cell-related in Table 16 (males) and Table 17 (females). In the raw data (LOWESS curves), there appeared to be no difference in trends over time between Stage A and Stage D tumours among older patients. However, among younger patients, changes started up to one year earlier in those with Stage D tumours (see Figure 21 for haemoglobin and Figure 22 for platelets; for the remaining parameters, please see 'Data availability'). This was observed in all FBC parameters except mean platelet volume, basophil count, eosinophil count, and lymphocyte count for both males and females, which showed no apparent difference between tumour stages.
There was no difference in graphical trends when using a matched design (see Figure 23 for haemoglobin and Figure 24 for platelets; for the remaining parameters, please see 'Data availability'). Additionally, coefficients from mixed effects models changed only slightly, with 86.4% and 86.3% of coefficients changing by <0.1 for males and females, respectively (for the models, please see 'Data availability').

Summary
We identified age- and sex-adjusted trends in many FBC parameters that differed between patients with and without a diagnosis within approximately four years before diagnosis. Differences in cases grew larger in the run up to diagnosis, with levels in patients without a diagnosis changing less rapidly over time. Trends may be more useful to identify cases than relying on FBC abnormalities or referral thresholds, as these thresholds were only reached close to diagnosis, with relevant trends present beforehand.
Summary statistics indicate an imbalance in age at index and follow-up time between cases and controls. Our sensitivity analysis showed no apparent differences between matched and unmatched designs. This was expected a priori, because in our unmatched design, graphical trends are already reported by age and sex separately. Additionally, (sex-stratified) mixed effects models included age, accounting for the imbalance. Furthermore, length of follow-up, despite being imbalanced (Figure 2), did not influence the trends, as there were many cases and controls with tests available at each time-point, increasing the quality/precision of the trend.

Comparison with existing literature
Two prior studies assessed changes over time in haemoglobin between patients with and without colorectal cancer: one in an Israeli population 11 and one in a combined Swedish and Danish population 12 . We report similar findings in a UK population, with haemoglobin levels that diverge around four years before diagnosis and a greater decline in the run up to diagnosis. We also report changes over time for many other FBC parameters. Another study used machine-learning methods to develop an algorithm called the ColonFlag, which assesses change over time in various parameters (at 18 and 36 months before index FBC) from a single patient to derive a monotone score for diagnosis from 0-100 in an Israeli population (EarlySign) 13 . It is unclear what trends are considered related to colorectal cancer.
A fourth study used logistic regression to test whether the difference between the two most recent FBCs was associated with diagnosis for five parameters in UK primary care data 14 .
With the two tests performed at any time in a mean follow-up period of 6.3 years, they report no association in change in red blood cell count (p = 0.13), white blood cell count (p = 0.06), or haematocrit (p = 0.23) but do report an association for change in mean corpuscular volume (p = 0.04) and mean corpuscular haemoglobin (p = 0.02). However, our study suggests that red blood cell count, white blood cell count, and haematocrit do change over time due to colorectal cancer.
A recent study of haemoglobin levels in newly diagnosed colorectal cancer patients (mean age approximately 70 years) in Finland [...] 15 . These levels are similar to those identified in this study (Figure 21), where the trends in most FBC parameters were similar between Stage A and D colorectal cancers, but the divergence from controls started up to one year earlier in Stage D diagnoses compared to Stage A. This divergence in patients diagnosed with Stage A colorectal cancer often occurred within one year prior to diagnosis, suggesting a relatively short time window for detection between the earliest and latest stage.
When compared to NICE and WHO guidelines for anaemia, which could be due to any reason including iron-deficiency, our trends indicated haemoglobin levels often reached the threshold for males (<13 g/dL) and females (<12 g/dL) (NICE: Suspected cancer recognition and referral, NICE: Anaemia - iron deficiency, WHO: anaemia) at approximately 6-12 months before diagnosis. Up to one year before diagnosis, anaemia was present in 48.8% of male cases and 54.3% of female cases. These results are similar to those reported in a previous UK primary care study of anaemia within one year prior to colorectal cancer diagnosis 16 . Microcytic anaemia, commonly caused by iron-deficiency, which may warrant further investigation for colorectal cancer, was present in 23% of male cases and 28% of female cases within a year prior to diagnosis. Our study suggests there are relevant changes that occur up to three years before the presence of anaemia, including iron-deficiency anaemia, and these changes could be more helpful to facilitate early detection than relying on low haemoglobin levels.
We also compared the FBC results to normal reference ranges (Oxford University Hospitals NHS UK). Abnormal FBC parameter levels are considered to represent health-related conditions or disease. Although differences in most FBC parameters between cases and controls grew larger as the time to diagnosis approached, they remained small overall and often remained in the normal reference range, except for red blood cell count, haemoglobin, haematocrit, mean corpuscular volume, and mean platelet volume. These parameters often only became abnormal close to diagnosis in younger cases but for many years in older cases, which was also observed in older controls. Therefore, such differences in trends between cases and controls may not be obvious to a clinician in general practice, as these differences would be considered to represent little-to-no concern, and the opportunity to utilise these changes over time to identify colorectal cancer would be missed. Our study supports the conclusions of another recent report that highlighted how the normal range does not necessarily reflect a healthy individual 17 .
Figure 24. Platelets trends for females by age at index 1,2 : unmatched (left) and matched (right). 1 Index date was the date of diagnosis for cases and a randomly selected date in the patient's study period for controls. 2 LOWESS trends are age (+/-3 years) and modelled (fixed effects) trends are taken at that specific age. Legend: colorectal cancer (blue line) and no cancer (red line). LOWESS trend (solid line) and modelled trend (dashed line).

Limitations
FBC blood tests are ordered for many reasons in primary care, not colorectal cancer specifically, but these reasons are not recorded in CPRD. Patients with FBCs are suspected to be generally less well than patients who are not tested (except in antenatal screening, which includes a FBC), so controls included in this study may not be entirely healthy. However, many existing studies identified in our recent systematic review have shown that in patients with FBCs, the test has potential to distinguish between patients with and without colorectal cancer 4 .
Red blood cell distribution width is a parameter used to diagnose medical conditions, especially colorectal cancer (Medline: Red cell distribution width). Historically, this parameter was not reported to primary care practices until relatively recently, which is why almost all tests in our data have this value missing. Consequently, we excluded this relevant parameter from all analyses.
Due to the nature of the case-control study design, we modelled 10 years of longitudinal data before index for each age group separately. Therefore, the trends in older age groups for many FBC parameters were subject to the 'survivor effect'. For example, in Figure 5 (trends in haemoglobin), controls who survived to 90 years at index are likely healthier 10 years earlier than controls who survived to 80 years at index, who may or may not have survived to age 90. Thus, older patients would often have FBC levels that reflected healthier individuals 10 years earlier on average than patients diagnosed/censored at that younger age. This 'survivor effect' could not be adjusted for in our analyses.
Other known factors influencing colorectal cancer risk, such as ethnicity and family history, and FBC confounders, such as comorbidity status, were not available so were not included in our models. Many relevant FBC confounders, such as diet, vitamin use, and sleeping patterns, are not recorded in electronic health records. Nonetheless, age and sex are key characteristics to adjust trends. Adjustments for additional factors may be considered in future work.
Tumour stage at diagnosis was missing for approximately one-third of colorectal cancer diagnoses. Therefore, many cases were excluded from analyses of tumour staging, reducing sample size and precision of estimates.

Implications for practice
The differences between cases and controls in the trends over time identified in this study would often go unnoticed in routine practice. It is difficult for busy clinicians filing results to notice minor changes in parameter values over time. Therefore, we are developing a dynamic statistical prediction model 18,19 that makes use of trends in the FBC to derive an individual's risk of diagnosis in the future. To develop the model, we will first determine the predictive value of patient-level trends -in this study, we only report that relevant trends exist and may be of help. The prediction model will aim to support the current colorectal cancer UK screening programme by identifying possible cases for investigation (NHS: Bowel cancer screening). It will utilise trends in multiple FBC parameters over time to detect colorectal cancer at earlier time points than is possible using single parameter thresholds.

Conclusions
Many FBC results change due to the presence of colorectal cancer. We identified trends over a five-year period before diagnosis that differed from trends in patients without colorectal cancer. Such trends may pre-date single-value thresholds for referral for cancer investigation and blood-abnormality. They may therefore facilitate earlier detection, improving the likelihood of successful treatment and survival rates.

Underlying data
The datasets used in this study are available from the CPRD. The CPRD maintain access rights to the data to ensure it is only used for research purposes by trustworthy organisations, so sharing of data is prohibited (https://cprd.com/data-access). Checks are conducted on organisations carrying out and funding research to assess whether they are suitable to receive CPRD data. This is to ensure, as examples, that the data stays confidential, and it is only used for its approved purpose. An application to access the data can be made at https://cprd.com/data-access.

Extended data
Additional full blood count parameters were analysed (tumour staging and sensitivity analysis) but are not related to the main claims in the article. These data are available from the authors on request.

Consent
CPRD has ethical approval from the Health Research Authority to hold anonymised patient data and to support research using that data. CPRD's approval of data access for individual research projects includes ethics approval and consent for those projects. Ethical approval was therefore covered for this study by the CPRD (protocol 14_195RMn2A2R).