Early detection of colorectal cancer using symptoms and the ColonFlag: case-control and cohort studies

Tim A. Holt; Pradeep S. Virdee; Clare Bankhead; Julietta Patnick; Brian D. Nicholson; Alice Fuller; Jacqueline Birks

doi:10.3310/nihropenres.13360.1

Research Article

[version 1; peer review: 2 approved with reservations]

Tim A. Holt¹, Pradeep S. Virdee¹, Clare Bankhead¹, Julietta Patnick², Brian D. Nicholson¹, Alice Fuller¹, Jacqueline Birks³

PUBLISHED 24 Jan 2023

¹ Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, OX2 6GG, UK
² Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
³ Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK

Tim A. Holt
Roles: Conceptualization, Funding Acquisition, Investigation, Methodology, Project Administration, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

Pradeep S. Virdee
Roles: Data Curation, Formal Analysis, Investigation, Visualization, Writing – Review & Editing

Clare Bankhead
Roles: Conceptualization, Funding Acquisition, Methodology, Writing – Review & Editing

Julietta Patnick
Roles: Conceptualization, Funding Acquisition, Writing – Review & Editing

Brian D. Nicholson
Roles: Data Curation, Investigation, Writing – Review & Editing

Alice Fuller
Roles: Data Curation, Funding Acquisition, Writing – Review & Editing

Jacqueline Birks
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Abstract

Background

Early detection of colorectal cancer confers substantial prognostic benefit. Most symptoms are non-specific and easily missed. The ColonFlag algorithm identifies risk of undiagnosed colorectal cancer using age, sex and changes in full blood count (FBC) indices. The aim of this study was to investigate whether the ColonFlag detects undiagnosed colorectal cancer prior to the recording of symptoms in general practice.

Methods

We conducted case-control and cohort studies by linking primary care data from the Clinical Practice Research Datalink with colorectal cancer diagnoses from the National Cancer Registry. A ColonFlag score was derived for each FBC. We assessed the prevalence of symptoms at six-monthly intervals prior to index date (diagnosis date for cases, randomly selected date for controls). We then derived odds ratios (ORs) and area under the receiver operating characteristic (AUROC) curve for the ColonFlag, and for symptoms using logistic regression at each interval (primary outcome 18–24 months).

Results

We included 1,893,641 patients, 10,875,556 FBCs and 8,918,037 ColonFlag scores. ColonFlag scores began to increase in cases compared with controls around 3–4 years before diagnosis. The AUROC for a diagnosis 18–24 months following the ColonFlag score was 0.736 (95% CI 0.715–0.759), falling to 0.536 (95% CI 0.523–0.548) with adjustment for age. ORs for individual symptoms were non-significant earlier than 12 months before the index date, except for abdominal pain (females OR=1.29, p<0.0001 at 12–18 months) and rectal bleeding (females OR=2.09, males OR=1.92, p<0.0001 at 18–24 months).

Conclusions

Symptoms appear relatively late in the colorectal cancer process and are limited for supporting early stage detection. The ColonFlag can discriminate usefully at 18–24 months before diagnosis, suggesting a role for this algorithm in primary care, although some of its discriminatory ability comes from the age variable.

Plain Language Summary

Colorectal cancer remains one of the commonest causes of cancer death in the UK. Early detection is very important. The sooner treatment is started, the better the chances of survival. An Israeli company developed a tool (the ‘ColonFlag’) for detecting subtle changes in the full blood count, a common blood test in general practice, using up to 20 different markers from it, as well as the person’s age and sex. As the diagnosis approaches, changes become more abnormal and the ColonFlag can spot an affected person more easily. In this project, we investigated whether the ColonFlag could identify people with undiagnosed colorectal cancer in the early stage prior to the onset of symptoms.

First of all, we needed to find out exactly what the timescale is for this ‘pre-symptomatic’ phase. We looked at the (anonymous) health records of 1.9 million people to identify when the reporting of possible colorectal cancer symptoms becomes more frequent in those later diagnosed compared with those who were not. We found that earlier than 18 months before diagnosis there was very little difference, but within 18 months symptoms were increasingly recorded in colorectal cancer cases.

We then assessed whether the ColonFlag could usefully identify colorectal cancer risk at this early stage before symptoms. We found that the ColonFlag scores of people destined to be diagnosed started to increase around 3–4 years before the diagnosis. However, the differences are initially small. In the phase 18–24 months before diagnosis, the ColonFlag could effectively discriminate patients with cancer from patients without, although at this interval a lot of its ability to do so relied on the person's age. Our results are encouraging: detecting changes in full blood count markers could add usefully to current approaches to promote earlier diagnosis and treatment of this common cancer.

Keywords

Colorectal cancer, primary care, early detection, symptoms, ColonFlag, blood test trends

Corresponding author: Pradeep S. Virdee

Competing interests: No competing interests were disclosed.

Grant information: This project is funded by the National Institute for Health and Care Research (NIHR) under its Research for Patient Benefit (RfPB) Programme (Grant Reference Number PB-PG-0817-20025). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2023 Holt TA et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Holt TA, Virdee PS, Bankhead C et al. Early detection of colorectal cancer using symptoms and the ColonFlag: case-control and cohort studies [version 1; peer review: 2 approved with reservations]. NIHR Open Res 2023, 3:6 (https://doi.org/10.3310/nihropenres.13360.1) First published: 24 Jan 2023, 3:6 (https://doi.org/10.3310/nihropenres.13360.1) Latest published: 24 Jan 2023, 3:6 (https://doi.org/10.3310/nihropenres.13360.1)

Abbreviations

AUROC Area Under the Receiver Operating Characteristic

CI Confidence interval

CPRD Clinical Practice Research Datalink

CRC Colorectal Cancer

FBC Full Blood Count

FIT Faecal Immunochemical Test

NCR National Cancer Registry

OR Odds Ratio

Introduction

Colorectal cancer (CRC) (bowel cancer) is the 4th commonest cancer in the UK¹. Five-year survival is around 93% if diagnosed at stage 1, where it is confined to the bowel, but only 10% at stage 4, where it has metastasised². This type of cancer develops slowly from pre-cancerous polyps, which may grow for years before becoming malignant, making it amenable to early detection. Endoscopic excision of these lesions may either remove early cancers or prevent cancer developing in the first place³.

Until recently, the bowel cancer screening programme in England offered the faecal occult blood test to adults aged 60–74 years every two years, and this has been shown to reduce mortality from CRC⁴. However, its impact was limited, with a 59% response rate⁵ and a negative result in around half of cases (sensitivity 49%)⁶. This test has now been replaced with Faecal Immunochemical Testing (FIT), using a threshold of ≥120µg Hb/g stool⁷. In addition to its role in screening, the high sensitivity and negative predictive value of FIT at the lower threshold of ≥10µg Hb/g stool has led to national recommendations for FIT to be used as a triage test in primary care for people with symptoms⁸. A high proportion of symptomatic patients present at a stage beyond surgical cure. Some patients present with specific CRC symptoms, such as rectal bleeding or change in bowel habit, but symptoms are often non-specific with insidious onset, such as abdominal pain and weight loss. Consequently, symptomatic patients report delays in referral for investigation despite presenting to their GP⁹. Risk algorithms have been developed based on symptoms to improve patient selection for investigation by identifying high-risk groups¹⁰. These algorithms may assist in case finding, but rely on symptoms that appear relatively late in the disease process, and have limited uptake in primary care¹¹.

In 2016, a risk algorithm for CRC was developed using machine learning techniques (Gradient Boosting Model and random forests) applied to full blood count (FBC) indices by an Israeli company, Medial EarlySign¹². For each FBC in a patient’s record, up to 20 indices are available and a 'ColonFlag' score ranging from 0 to 100 can be derived, with higher scores indicating a higher risk of undiagnosed CRC. Previous FBC results may contribute to the current score, by indicating trends in the indices, but a score can be produced even if only one FBC is available. The ColonFlag combines these indices with age and sex, but not with symptoms, to stratify CRC risk but does not estimate an absolute risk of CRC within a fixed time. In 2017, we published a report of an independent validation study of the ColonFlag using UK primary care data obtained from the Clinical Practice Research Datalink (CPRD)¹³. Similar performance was found in the UK as in Israel. The ColonFlag could discriminate high-risk from lower-risk patients 18–24 months prior to CRC diagnosis. However, an important assumption remained to be investigated, that the ColonFlag detects CRC cases prior to the reporting of symptoms in primary care. If confirmed, the ColonFlag could add significantly to current approaches to early cancer detection and may have a role in support of the national bowel cancer screening programme.

The aim of this study was to investigate whether the ColonFlag detects undiagnosed CRC prior to the recording of CRC symptoms in general practice.

Methods

Setting

We conducted our analyses using the CPRD GOLD dataset used for our previous initial external validation project, including patients with at least one FBC between 01/01/2000 and 28/04/2015¹³. The CPRD is linked to over 600 UK general practices and contains information on patients' demographic data, registration status, medical history (including cancer diagnoses), and laboratory test values (including FBC indices)¹⁴.

Study design

Case-control and cohort analyses were conducted to compare when the ColonFlag algorithm identified increased CRC risk with the timing of symptom recording in the primary care record. Patient data were restricted to the period between study entry (the latest of: registration with a contributing practice, the patient’s 40th birthday, or 01/01/2000) and study exit (the earliest of: death, leaving the practice, or the most recent update of the CPRD dataset, which may differ between contributing practices).
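
The entry and exit rules above can be sketched in a few lines of Python. This is an illustration only and not the study's own Stata code: the function and argument names are hypothetical, and the treatment of a 29 February birth date is an assumption.

```python
from datetime import date
from typing import Optional

STUDY_START = date(2000, 1, 1)  # start of the study window stated in the text


def fortieth_birthday(dob: date) -> date:
    """Date of the 40th birthday (assumption: 29 Feb falls back to 28 Feb)."""
    try:
        return dob.replace(year=dob.year + 40)
    except ValueError:  # 29 Feb birth date and a non-leap target year
        return dob.replace(year=dob.year + 40, day=28)


def study_entry(registration: date, dob: date) -> date:
    """Latest of: practice registration, 40th birthday, 01/01/2000."""
    return max(registration, fortieth_birthday(dob), STUDY_START)


def study_exit(last_update: date,
               death: Optional[date] = None,
               left_practice: Optional[date] = None) -> date:
    """Earliest of: death, leaving the practice, last CPRD update."""
    candidates = [d for d in (death, left_practice, last_update) if d is not None]
    return min(candidates)
```

For a hypothetical patient registered in 1998 and born in March 1965, `study_entry` returns the 40th birthday, because that is the latest of the three dates.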

Study population

We included people over the age of 40 years with at least one FBC in their primary care CPRD record between 01/01/2000 and 28/04/2015 and an associated ColonFlag score. The ColonFlag score was derived by Medial EarlySign, who applied the ColonFlag algorithm to each FBC considered valid according to their criteria during the previous project. We excluded people with less than two years of follow up after study entry, less than 12 months registered at the practice, or a haemoglobin gene defect, as this may produce abnormalities in red cell indices that mimic iron deficiency, such as reduction in haemoglobin and mean cell volume.

Outcome

We identified cases of CRC up until 14/01/2014 using codelists developed for the first project¹³, for practices linked to the National Cancer Registry (NCR) (clinical codes have previously been reported¹⁵). An initial inspection of the capture of CRC diagnoses in CPRD indicated that linkage to the NCR would be important for high quality outcome assessment (analysis available on request). Our sample was therefore confined to those practices linked to the NCR.

Predictors

Age and sex were available for all individuals. FBC indices were extracted using CPRD Medcodes (clinical codes have previously been reported¹⁵). We extracted symptom codes after searching the literature to identify symptoms associated with CRC¹⁶–¹⁹. A final list was produced through discussion and consensus between two clinicians (TAH and BDN) and used to derive a longitudinal dataset for each patient reporting at least one symptom. The eight symptom groups were: abdominal pain, appetite loss, diarrhoea, weight loss, constipation, rectal bleeding, change in bowel habit and 'other'. More than one symptom could be recorded on a single day.

Case-control analysis

Symptoms: the index date for cases was the date of CRC diagnosis. The index date for controls was a randomly selected date between study entry and exit. There were 15 controls matched to each case based on age at index date. For the primary analysis, we ascertained whether a patient had any symptom between 18 and 24 months before the index date. Logistic regression was used to derive odds ratios (ORs) and the AUROC including a binary variable for each symptom group as the independent variables in a multivariable model with age at index date. Females and males were analysed separately throughout.
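
As a rough sketch of how the binary symptom variables for one interval might be built, the Python below flags each symptom group recorded 18–24 months before the index date. The month length (365.25/12 days), the half-open interval boundaries, and all names are assumptions made for illustration; the study's own coding is not described at this level of detail.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length in days

# The eight symptom groups named in the Predictors section
SYMPTOM_GROUPS = [
    "abdominal pain", "appetite loss", "diarrhoea", "weight loss",
    "constipation", "rectal bleeding", "change in bowel habit", "other",
]


def in_window(event, index, start_months, end_months):
    """True if the event lies start..end months before the index date.

    Assumption: the interval is closed at the start and open at the end.
    """
    days_before = (index - event).days
    return start_months * DAYS_PER_MONTH <= days_before < end_months * DAYS_PER_MONTH


def symptom_indicators(records, index, start_months=18, end_months=24):
    """Map each symptom group to 1 if recorded in the window, else 0.

    records is a list of (symptom_group, date_recorded) pairs.
    """
    flags = {group: 0 for group in SYMPTOM_GROUPS}
    for group, event in records:
        if in_window(event, index, start_months, end_months):
            flags[group] = 1
    return flags


# Hypothetical patient with an index date of 1 January 2012
records = [
    ("rectal bleeding", date(2010, 6, 1)),   # about 19 months before index
    ("abdominal pain", date(2011, 9, 1)),    # about 4 months before index
]
flags = symptom_indicators(records, index=date(2012, 1, 1))
```

Changing `start_months` and `end_months` gives the indicators for the other intervals (0–6, 6–12, 12–18 and 24–30 months).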

ColonFlag: a ColonFlag score was identified 18 to 24 months before the index date. If there was no score available, the patient was excluded. If a patient had more than one score in this interval, one score was randomly selected. There were 100 controls matched to each case based on age at ColonFlag score date, sex, and year of score. Logistic regression was used to derive ORs and the AUROC for the independent variable, the ColonFlag score.
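
The study derived the AUROC from logistic regression in Stata. With a single predictor and a positive coefficient, the fitted probabilities rank patients in the same order as the raw score, so the same quantity can be illustrated directly from the scores. The sketch below uses the pairwise (Mann-Whitney) definition; the function names, the seeded random selection, and the toy scores are hypothetical.

```python
import random


def pick_one_score(scores_in_interval, rng):
    """Randomly select one score when a patient has several in the interval."""
    return rng.choice(scores_in_interval)


def auroc(case_scores, control_scores):
    """Probability that a random case outscores a random control.

    Ties count as half. Pairwise comparison is slow for large samples
    but keeps the definition explicit.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


rng = random.Random(2011)  # fixed seed so the selection is reproducible
chosen = pick_one_score([72, 80, 91], rng)

# Toy scores, not study data
value = auroc([80, 90, 60], [50, 60, 70, 85])
```

An AUROC of 0.5 means the score does no better than chance, which is the reference point for the age-matched value of 0.536 reported in this study.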

We then compared the discrimination performance measures of the ColonFlag score with that of symptoms. The primary outcome was the difference in the AUROC scores in the two analyses for the 18–24 month interval. The analysis was repeated for additional time intervals, 0–6, 6–12, 12–18, and 24–30 months before the index date.

Cohort analysis

Next, to investigate the result of applying the ColonFlag to a defined cohort of patients, all patients with a ColonFlag score in 2011 not previously diagnosed with CRC were followed for up to 24 months from the ColonFlag date (baseline). The year 2011 was selected as this was the most recent year for which 24 months of follow up were available in the NCR. If a patient had more than one ColonFlag score in 2011, then the earliest was selected. All diagnoses made 18–24 months after baseline were included, and patients diagnosed before 18 months were excluded. Patients without a diagnosis within 24 months were censored at 24 months following baseline, and those with less than 24 months of follow up were excluded. Logistic regression was used with CRC diagnosis within 18–24 months of the ColonFlag score as the outcome and the ColonFlag score, age at baseline, and sex, as predictors to derive ORs and the AUROC. In addition, diagnostic performance characteristics for the ColonFlag score were calculated at the score thresholds corresponding to 99.5% specificity and 50% sensitivity.
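
To illustrate what a threshold "corresponding to" a target specificity means, the sketch below searches the observed scores for the lowest cut-off that reaches the target and then reports sensitivity and specificity at that cut-off. The rule that a score at or above the threshold counts as positive is an assumption, as are the function names and the toy scores.

```python
def sensitivity_specificity(case_scores, non_case_scores, threshold):
    """Sensitivity and specificity when score >= threshold is 'positive'."""
    true_pos = sum(score >= threshold for score in case_scores)
    true_neg = sum(score < threshold for score in non_case_scores)
    return true_pos / len(case_scores), true_neg / len(non_case_scores)


def threshold_for_specificity(non_case_scores, target):
    """Lowest observed score value whose specificity reaches the target.

    Returns None if no observed value achieves the target.
    """
    total = len(non_case_scores)
    for candidate in sorted(set(non_case_scores)):
        specificity = sum(score < candidate for score in non_case_scores) / total
        if specificity >= target:
            return candidate
    return None


# Toy scores, not study data
non_cases = [10, 20, 30, 40]
cases = [35, 45, 50]
cutoff = threshold_for_specificity(non_cases, target=0.75)
sens, spec = sensitivity_specificity(cases, non_cases, cutoff)
```

The same search run against the cases, with sensitivity as the target, would give the second threshold used in the study (50% sensitivity).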

Subsequently, for the same set of patients, we identified all symptoms reported between 3 months before and 3 months after the date of the ColonFlag score (baseline). We used logistic regression with a diagnosis of CRC within 18–24 months as the outcome and presence of any of the types of symptom, age at baseline, and sex as predictors.

The analyses were repeated for a diagnosis of CRC within 24 months following the ColonFlag date, which, unlike the 18–24 month outcome, included all patients diagnosed within the 24 month follow up period.
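
The inclusion and exclusion rules for the 18–24 month cohort outcome can be written as a small classification function. This is a hedged sketch: the month length, the boundary handling at exactly 18 and 24 months, and the labels are assumptions, and a diagnosis after 24 months is treated as a non-case on the reading that such patients are censored at 24 months.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length in days


def classify_cohort_patient(baseline, diagnosis, followup_end):
    """Assign a patient to case, non-case or an exclusion group.

    baseline: date of the 2011 ColonFlag score
    diagnosis: date of CRC diagnosis, or None
    followup_end: last date the patient was under observation
    """
    if diagnosis is not None:
        months_to_diagnosis = (diagnosis - baseline).days / DAYS_PER_MONTH
        if months_to_diagnosis < 18:
            return "excluded: diagnosed before 18 months"
        if months_to_diagnosis <= 24:
            return "case"
    # No diagnosis within 24 months: require complete follow up
    months_followed = (followup_end - baseline).days / DAYS_PER_MONTH
    if months_followed < 24:
        return "excluded: less than 24 months of follow up"
    return "non-case"


# Hypothetical patients sharing a baseline of 1 March 2011
base = date(2011, 3, 1)
labels = [
    classify_cohort_patient(base, date(2012, 12, 1), date(2012, 12, 1)),
    classify_cohort_patient(base, date(2011, 9, 1), date(2011, 9, 1)),
    classify_cohort_patient(base, None, date(2012, 6, 1)),
    classify_cohort_patient(base, None, date(2014, 1, 1)),
]
```

For the secondary outcome (any diagnosis within 24 months), the first exclusion would be dropped and those patients counted as cases.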

Trajectories of ColonFlag scores

Following our case-control analysis design for the ColonFlag score analysis, we modelled the trajectories of the ColonFlag score (which increases spontaneously over time simply due to increasing age) from index date backwards using linear mixed models and compared cases and controls. All ColonFlag scores were included that preceded the patient’s index date. We investigated whether there was an accelerated increase (a 'change-point') in ColonFlag scores for cases compared with controls at an identifiable time interval before diagnosis. This method has been used to identify people at risk of ovarian cancer based on serial CA125 levels²⁰, and for that condition improves detection compared with a single value cut-off²¹.

Ability of ColonFlag to detect increased CRC risk when the FBC is ‘normal’

We then investigated whether the ColonFlag might identify undiagnosed CRC when all FBC indices are within their normal reference ranges by detecting adverse trends in previous indices even if the current FBC is ‘normal’. This is important because ‘normal’ FBCs are not examined in any detail by busy clinicians filing results and in some situations might be ‘batch filed’ without even being opened. Batch filing reduces to zero the already small chance that a human clinician will notice ‘within range’ changes of concern, so the help of an algorithm to do so would be very useful. We examined the proportion of FBCs in our dataset where all the indices were within range; of these, the proportion with a high ColonFlag score above our two thresholds (corresponding to 99.5% specificity and 50% sensitivity); and of these, the proportion diagnosed with CRC within 3 years. Three years was chosen to allow enough time to determine whether CRC was present at the time of the normal FBC. Longer follow up intervals (e.g. 4 years) might risk identifying people with a later cancer diagnosis that was not initially present, but developed subsequently to the normal FBC.
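
The three nested proportions described above can be sketched as follows. The reference ranges are placeholders for illustration only (real ranges vary by laboratory, sex and age, and none are taken from the study). The record layout, field names and the "score at or above threshold" rule are also assumptions.

```python
# Placeholder reference ranges, for illustration only; NOT from the study
REFERENCE_RANGES = {
    "haemoglobin_g_per_l": (120.0, 170.0),
    "mcv_fl": (80.0, 100.0),
}


def all_within_range(fbc, ranges=REFERENCE_RANGES):
    """True if every reported index that has a reference range lies inside it."""
    return all(
        low <= fbc[name] <= high
        for name, (low, high) in ranges.items()
        if name in fbc
    )


def nested_proportions(records, threshold, ranges=REFERENCE_RANGES):
    """Return (share of FBCs that are normal,
               share of normal FBCs with a high score,
               share of those diagnosed with CRC within 3 years)."""
    normal = [r for r in records if all_within_range(r["fbc"], ranges)]
    flagged = [r for r in normal if r["score"] >= threshold]
    diagnosed = [r for r in flagged if r["crc_within_3y"]]

    def share(part, whole):
        return len(part) / len(whole) if whole else None

    return share(normal, records), share(flagged, normal), share(diagnosed, flagged)


# Toy records, not study data
records = [
    {"fbc": {"haemoglobin_g_per_l": 140.0, "mcv_fl": 90.0}, "score": 95, "crc_within_3y": True},
    {"fbc": {"haemoglobin_g_per_l": 135.0, "mcv_fl": 88.0}, "score": 96, "crc_within_3y": False},
    {"fbc": {"haemoglobin_g_per_l": 150.0, "mcv_fl": 92.0}, "score": 40, "crc_within_3y": False},
    {"fbc": {"haemoglobin_g_per_l": 100.0, "mcv_fl": 75.0}, "score": 99, "crc_within_3y": True},
]
result = nested_proportions(records, threshold=90)
```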

All analyses were performed in Stata SE version 17 (RRID: SCR_012763). Alternative, open-access software, such as R (RRID: SCR_001905), can perform the equivalent analyses.

Results

The initial dataset contained 16,537,017 FBC observations belonging to 2,856,020 patients. Each patient had between 1 and 383 FBCs in their record (median 4). Of these, 13,381,427 (80.6%) had an associated ColonFlag score, as one fifth did not pass the quality/eligibility checks conducted by Medial EarlySign.

We were confident that our study population had good quality recording of the main outcome, a CRC diagnosis, based upon a high proportion (95%) of the diagnoses occurring within 6 months of each other in the two data sources (Figure 1). There were 386 linked practices, providing patient numbers ranging between 431 and 21,102 per practice. Of the 1,893,641 patients, 24,557 (1.3%) had a diagnosis of CRC. We compared the age and sex distribution of our sample (aged at least 40 years) with that of England in two years (2007 and 2012). There was a satisfactory match, apart from in the younger age bands, 40–60, where there was a shortfall of males, presumably due to less frequent FBC testing in this age group (Figure 2). We then compared the incidence of CRC with that of England, and found a close match, based on cancer registrations in the NCR, which were part of our dataset (Figure 3).

Figure 1. Date of colorectal cancer diagnosis in the CPRD compared to the NCR.

Legend: Clinical Practice Research Datalink (CPRD). National Cancer Registry (NCR).

Figure 2. The age and sex distribution of our CPRD sample.

Legend: 2007 (a) and 2012 (b).

Figure 3. Incidence of colorectal cancer in our sample and in England.

Legend: 2007 (a) and 2012 (b).

Restriction to practices linked to the NCR resulted in 386 practices, as described above. This resulted in 1,893,641 patients, 10,875,556 FBCs, and 8,918,037 associated ColonFlag scores.

Case-control analysis

Symptoms: the proportion of cases and controls reporting symptoms is displayed for each 6-monthly interval before index date for any symptom (Figure 4) and each individual symptom group (Figure 5), with proportions reported in Table 1 for both. In cases, the prevalence of symptoms was high at 0–6 months before index date and fell to a level comparable with controls by 12–18 months. Prior to this (at 18–24 months), the only symptom significantly more common in cases was rectal bleeding (Table 2). These results confirmed our chosen interval of 18–24 months for the pre-symptomatic phase, which we had also used as the primary outcome interval in our original validation study¹³.

Table 1. Number and proportion of patients reporting symptoms per 6-month time interval before index date.

Legend: Case-control analysis matched 1:15 on age at index date. Time is increasing backwards from index date (-6 is 0–6 months before index, -12 is 6–12 months before index, etc.).

Females
Time interval (months)	0–6		6–12		12–18		18–24		24–30
	Cases n=8344	Controls n=125160	Cases n=8344	Controls n=125160	Cases n=8344	Controls n=125160	Cases n=8344	Controls n=125160	Cases n=8344	Controls n=125160
Symptom
Constipation	613 (7.4%)	2279 (1.8%)	224 (2.7%)	2115 (1.7%)	17 (2.0%)	1981 (1.6%)	115 (1.4%)	1979 (1.6%)	104 (1.2%)	1804 (1.4%)
Abdominal pain	1514 (18.1%)	3447 (2.8%)	381 (4.6%)	3462 (2.8%)	286 (3.4%)	3329 (2.7%)	225 (2.7%)	3316 (2.6%)	210 (2.5%)	3286 (2.6%)
Loss of appetite	59 (0.7%)	157 (0.1%)	19 (0.2%)	148 (0.1%)	10 (0.1%)	166 (0.1%)	9 (0.1%)	134 (0.1%)	9 (0.1%)	141 (0.1%)
Diarrhoea	656 (7.9%)	2265 (1.8%)	184 (2.2%)	2165 (1.7%)	150 (1.8%)	2095 (1.7%)	141 (1.7%)	2098 (1.7%)	144 (1.7%)	2026 (1.6%)
Weight loss	181 (2.2%)	553 (0.4%)	51 (0.6%)	466 (0.4%)	27 (0.3%)	440 (0.4%)	31 (0.4%)	406 (0.3%)	26 (0.3%)	388 (0.3%)
Rectal bleeding	1222 (14.6%)	690 (0.6%)	176 (2.1%)	628 (0.5%)	109 (1.3%)	636 (0.5%)	84 (1.0%)	615 (0.5%)	55 (0.7%)	644 (0.5%)
Change of bowel habit	489 (5.9%)	359 (0.3%)	63 (0.8%)	368 (0.3%)	26 (0.3%)	351 (0.3%)	28 (0.3%)	350 (0.3%)	16 (0.2%)	342 (0.3%)
Other symptom	114 (1.4%)	155 (0.1%)	12 (0.1%)	126 (0.1%)	8 (0.1%)	156 (0.1%)	6 (0.07%)	131 (0.1%)	10 (0.1%)	127 (0.1%)
Any symptom	3886 (46.6%)	8939 (7.1%)	970 (11.6%)	8574 (6.8%)	718 (8.6%)	8329 (6.6%)	585 (7.0%)	8185 (6.5%)	530 (6.4%)	7959 (6.4%)
Males
Time interval (months)	0–6		6–12		12–18		18–24		24–30
	Cases n=9786	Controls n=145590	Cases n=9786	Controls n=145590	Cases n=9786	Controls n=145590	Cases n=9786	Controls n=145590	Cases n=9786	Controls n=145590
Symptom
Constipation	658 (6.7%)	2636 (1.8%)	203 (2.1%)	2336 (1.6%)	131 (1.3%)	2122 (1.5%)	122 (1.2%)	2021 (1.4%)	138 (1.4%)	1903 (1.3%)
Abdominal pain	1419 (14.5%)	3044 (2.1%)	331 (3.4%)	2856 (2.0%)	246 (2.5%)	2844 (2.0%)	185 (1.9%)	2794 (1.9%)	194 (2.0%)	2871 (2.0%)
Loss of appetite	56 (0.6%)	151 (0.1%)	14 (0.1%)	123 (0.1%)	12 (0.1%)	130 (0.1%)	8 (0.1%)	96 (0.1%)	4 (0.05%)	96 (0.1%)
Diarrhoea	842 (8.6%)	1975 (1.4%)	202 (2.1%)	1927 (1.3%)	134 (1.4%)	1732 (1.2%)	111 (1.1%)	1802 (1.2%)	103 (1.1%)	1698 (1.2%)
Weight loss	255 (2.6%)	526 (0.4%)	45 (0.5%)	487 (0.3%)	28 (0.3%)	481 (0.3%)	34 (0.4%)	419 (0.3%)	27 (0.3%)	415 (0.3%)
Rectal bleeding	1522 (15.6%)	819 (0.6%)	186 (1.9%)	759 (0.5%)	105 (1.1%)	767 (0.5%)	101 (1.0%)	776 (0.5%)	92 (0.9%)	798 (0.6%)
Change of bowel habit	941 (9.6%)	429 (0.3%)	66 (0.7%)	342 (0.2%)	39 (0.4%)	364 (0.2%)	19 (0.2%)	351 (0.2%)	19 (0.2%)	330 (0.2%)
Other symptom	84 (0.9%)	109 (0.1%)	7 (0.1%)	80 (0.1%)	4 (0.04%)	100 (0.1%)	5 (0.05%)	102 (0.07%)	6 (0.06%)	78 (0.05%)
Any symptom	4737 (48.4%)	8802 (6.1%)	927 (9.5%)	8143 (5.6%)	641 (6.6%)	7819 (5.4%)	522 (5.3%)	7682 (5.3%)	531 (5.4%)	7524 (5.2%)

Table 2. Multivariable logistic regression including types of symptoms per 6-month time interval.

Legend: Case-control analysis matched 1:15 on age at index date. Females (n=133,504) and males (n=155,376). Each symptom is modelled as a binary variable, =1 if the patient experienced that symptom in that time interval.

Females – outcome diagnosis of colorectal cancer
Time before index date	0–6 months		6–12 months		12–18 months		18–24 months		24–30 months
Symptom	OR	p-value	OR	p-value	OR	p-value	OR	p-value	OR	p-value
Constipation	2.62	<0.0001	1.43	<0.0001	1.22	0.02	0.84	0.09	0.85	0.14
Abdominal pain	6.77	<0.0001	1.59	<0.0001	1.29	<0.0001	1.01	0.92	0.96	0.63
Loss of appetite	2.20	<0.0001	1.92	0.008	0.99	0.98	1.01	0.97	1.11	0.76
Diarrhoea	3.12	<0.0001	1.11	0.21	1.00	0.98	0.98	0.84	1.07	0.45
Weight loss	4.01	<0.0001	1.43	0.02	0.89	0.59	1.03	0.87	1.01	0.97
Rectal bleeding	28.47	<0.0001	4.04	<0.0001	2.50	<0.0001	2.09	<0.0001	1.29	0.09
Change of bowel habit	18.72	<0.0001	2.25	<0.0001	0.99	0.98	1.10	0.66	0.79	0.37
Other symptom	8.24	<0.0001	1.14	0.69	0.71	0.38	0.77	0.53	1.21	0.58
Age at index date (years)	1.00	0.11	1.00	0.008	1.00	0.02	1.00	0.03	1.00	0.03
Males – outcome diagnosis of colorectal cancer
Time before index date	0–6 months		6–12 months		12–18 months		18–24 months		24–30 months
Symptom	OR	p-value	OR	p-value	OR	p-value	OR	p-value	OR	p-value
Constipation	2.44	<0.0001	1.17	0.052	0.84	0.09	0.87	0.19	0.99	0.93
Abdominal pain	7.31	<0.0001	1.69	<0.0001	1.29	<0.0001	0.95	0.56	0.98	0.81
Loss of appetite	1.76	0.01	1.31	0.40	0.90	0.79	1.27	0.54	0.75	0.57
Diarrhoea	4.75	<0.0001	1.47	<0.0001	1.10	0.34	0.85	0.14	0.85	0.15
Weight loss	5.31	<0.0001	1.18	0.35	0.83	0.38	1.22	0.30	0.94	0.76
Rectal bleeding	30.12	<0.0001	3.18	<0.0001	2.04	<0.0001	1.92	<0.0001	1.64	<0.0001
Change of bowel habit	31.06	<0.0001	2.46	<0.0001	1.74	0.001	0.74	0.26	0.91	0.72
Other symptom	6.41	<0.0001	1.10	0.82	0.51	0.25	0.76	0.76	0.87	0.79
Age at index date (years)	0.99	<0.0001	0.99	<0.0001	0.99	<0.0001	0.99	<0.0001	0.99	<0.0001
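
As a rough cross-check of how the counts in Table 1 relate to the ORs in Table 2, a crude (unadjusted) odds ratio can be computed from a 2×2 table. The sketch below is not the multivariable model used in the study, so it will not reproduce the adjusted ORs exactly; the function name and the Wald-type 95% interval are choices made for illustration.

```python
import math


def crude_odds_ratio(cases_with, cases_total, controls_with, controls_total):
    """Unadjusted odds ratio with a Wald 95% confidence interval."""
    a = cases_with                      # cases reporting the symptom
    b = cases_total - cases_with        # cases not reporting it
    c = controls_with                   # controls reporting the symptom
    d = controls_total - controls_with  # controls not reporting it
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Rectal bleeding in females at 18-24 months, counts taken from Table 1
or_value, ci_low, ci_high = crude_odds_ratio(84, 8344, 615, 125160)
```

The crude value comes out at about 2.06, close to the adjusted OR of 2.09 in Table 2; the small difference is expected because the model also includes the other symptom groups and age.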

ColonFlag score: We then assessed the predictive ability of the ColonFlag score. Descriptive statistics for cases and controls in the primary (18–24 month) analysis and a summary of the ColonFlag scores are given in Table 3. We derived AUROC curve scores for the ColonFlag score at each time interval, showing that predictive performance decreases the earlier the ColonFlag score is from diagnosis (Table 4). Each interval had its own sub-population (descriptive statistics available on request). The AUROC for ColonFlag without including age in the calculation, due to matching cases to controls using age, was 0.536 (95% CI 0.523, 0.548), which although small was significantly different from chance. The ColonFlag AUROC scores reported in this analysis are substantially lower than those reported in the cohort study below, due to the effective removal of the age component of the ColonFlag through matching for age.

Table 3. Descriptive statistics for age and sex by colorectal cancer (CRC) diagnosis.

Legend: Case-control analysis matched 1:100 on age at ColonFlag score, sex, and year of score. ColonFlag scores are between 18–24 months before diagnosis for cases and with at least 18 months of follow up after the ColonFlag score for controls. Age is at time of ColonFlag score. Cases are according to the National Cancer Registry.

	No diagnosis of CRC			Diagnosis of CRC
	Number	Age (SD) Range (years)	ColonFlag score (SD) Range	Number	Age (SD) Range (years)	ColonFlag score (SD) Range
Female	222300	74.5 (11.3) 40–98	74.4 (20.3) 0–100	2223	74.5 (11.3) 40–98	76.2 (21.0) 0–100
Male	235600	72.9 (10.0) 40–99	81.9 (17.0) 0–100	2356	72.8 (10.0) 40–99	83.3 (16.9) 0–100
Total	457900	73.6 (10.7) 40–99	78.3 (19.1) 0–100	4579	73.6 (10.7) 40–99	80.0 (19.2) 0–100

Table 4. AUROC scores for a diagnosis of CRC at different time intervals before index date.

Legend: Case-control analysis matched 1:100 on age at ColonFlag score, sex, and year of score. The ColonFlag score as the only predictor.

	AUROC score (95% CI)
	Males				Females
Time interval (months)	0–6	6–12	12–18	18–24	0–6	6–12	12–18	18–24
ColonFlag	0.624 (0.618, 0.636)	0.605 (0.593, 0.616)	0.557 (0.545, 0.568)	0.536 (0.523, 0.548)	0.623 (0.616, 0.630)	0.624 (0.612, 0.636)	0.567 (0.554, 0.579)	0.536 (0.523, 0.549)

Figure 4. Percentage of cases and controls reporting any symptom in 6-month time bands.

Legend: Case-control analysis matched 1:15 on age at index date. Time is increasing backwards from index date (-6 is 0–6 months before index, -12 is 6–12 months before index, etc.).

Figure 5. Prevalence of individual symptoms in 6-month time bands.

Legend: Case-control analysis matched 1:15 on age at index date. Time is increasing backwards from index date (-6 is 0–6 months before index, -12 is 6–12 months before index, etc.). Males (a) and females (b).

Cohort analysis

In this analysis, patients diagnosed within 18 months following baseline were omitted (N=1,722). Additionally, patients without a diagnosis who were lost to follow up (N=90,599) or died (N=24,443) within two years of baseline were omitted.

There were 434 patients diagnosed with CRC within 18–24 months following baseline (Table 5). Figure 6 shows the distribution of ColonFlag scores in people at baseline in 2011, comparing those who would and would not be diagnosed with CRC within 18–24 months, showing an excess of high ColonFlag scores in the future cases before adjustment for age. Logistic regression for CRC diagnosed within 18–24 months including the ColonFlag score, age, and sex gave an AUROC of 0.736 (95% CI 0.715, 0.759) (Table 6). Adding any symptom instead of the ColonFlag score gave an AUROC of 0.725 (0.704, 0.747) (Table 7). When breaking down to each symptom group individually in a multivariable model, the only symptom with a significant OR at 18–24 months was rectal bleeding, as expected from the case-control analysis (Table 8). It is clear that much of the discriminatory ability of either approach is due to the age and sex variables, which themselves give an AUROC of 0.725 (0.703, 0.747) (Table 9). This analysis produces the primary outcome for this project, a comparison between the AUROC for the ColonFlag (0.736) and the AUROC for any symptom plus age (0.725) for a diagnosis of CRC 18–24 months into the future (Table 10). It was not possible to remove the age and sex component of the ColonFlag score here. This was conducted in the case-control analysis reported earlier, producing an AUROC at 18–24 months of 0.536, which, although small, is significantly different from chance. ORs for the ColonFlag score are provided in the subgroup of patients with symptoms (Table 11) and without symptoms (Table 12).

Table 5. Characteristics of patients with a ColonFlag score in 2011 with and without a diagnosis in 18–24 months.

Legend: Cohort analysis. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

	Number	Mean age (range)	Mean ColonFlag score (range)	% Females
With diagnosis	434	73.1 (41–98)	83.8 (6–100)	46.5
No diagnosis	400100	62.8 (40–106)	56.3 (0–100)	57.1

Figure 6. Distribution of ColonFlag scores in 2011 for colorectal cancer in 18–24 months.

Legend: Cohort analysis. Patients who would not (0) and those who would (1) be diagnosed with colorectal cancer.

Table 6. Logistic regression of ColonFlag score for diagnosis 18–24 months later.

Legend: Cohort analysis. Age at ColonFlag score. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

Diagnosis at 18–24 months	OR	95% CI	p-value
ColonFlag/unit increase	1.025	1.017, 1.033	<0.0001
Age/year increase	1.025	1.011, 1.039	<0.0001
Female	0.792	0.646, 0.971	0.025
AUROC = 0.736 (95% CI 0.715, 0.759) N=400,534
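
The per-unit odds ratios in Table 6 describe very small steps (one ColonFlag point, one year of age). As a rough illustration, and assuming only what the logistic model itself assumes (log-odds linear in each predictor), the OR for a k-unit increase is the per-unit OR raised to the power k. The helper name `scale_or` is hypothetical and is not part of the study's analysis code.

```python
def scale_or(or_per_unit: float, k: float) -> float:
    """Odds ratio for a k-unit increase, given the OR for a 1-unit increase.

    Assumes the log-odds are linear in the predictor, as in a standard
    logistic regression with an untransformed covariate.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** k


# Point estimates from Table 6; confidence limits can be scaled the same way.
colonflag_or_10 = scale_or(1.025, 10)  # 10-point increase in ColonFlag score
age_or_10 = scale_or(1.025, 10)        # 10-year increase in age
print(f"ColonFlag, per 10 points: {colonflag_or_10:.2f}")
print(f"Age, per 10 years: {age_or_10:.2f}")
```

On this reading, a 10-point rise in the score corresponds to odds roughly 1.28 times higher, holding age and sex fixed.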

Table 7. Logistic regression for any symptom for diagnosis 18–24 months later.

Legend: Cohort analysis. Any symptom is a binary variable, =1 if the patient reported any symptom at time of ColonFlag score (baseline) ± 3 months. Age at ColonFlag score. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

Diagnosis at 18–24 m	OR	95% CI	p-value
Any symptom	1.115	0.834, 1.491	0.46
Female	0.622	0.514, 0.751	<0.0001
Age/year increase	1.062	1.054, 1.070	<0.0001
AUROC = 0.725 (95% CI 0.704, 0.747) (n=400,534)

Table 8. Multivariable logistic regression including individual symptoms for diagnosis within 18–24 months of the ColonFlag score.

Legend: Cohort analysis. Each symptom is a binary variable, =1 if the patient reports that symptom at ColonFlag score (baseline) ± 3 months. Age at ColonFlag score. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

Diagnosis at 18–24 m	OR	95% CI	p-value
Constipation	0.265	0.085, 0.827	0.022
Abdominal pain	0.785	0.468, 1.317	0.36
Appetite loss	1.223	0.170, 8.771	0.84
Diarrhoea	1.259	0.722, 2.196	0.42
Weight loss	1.151	0.429, 3.092	0.78
Rectal bleeding	2.970	1.707, 5.170	<0.0001
Change in bowel habit	1.904	0.846, 4.285	0.12
Other	1.508	0.211, 10.774	0.682
Female	0.622	0.514, 0.752	<0.0001
Age/year increase	1.062	1.054, 1.070	<0.0001
AUROC = 0.732 (95% CI 0.710, 0.753) n=400,534

Table 9. Logistic regression with only age and sex for diagnosis within 18–24 months of the ColonFlag score.

Legend: Cohort analysis. Age at ColonFlag score. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

Diagnosis at 18–24 m	OR	95% CI	p-value
Age/year increase	1.061	1.053, 1.069	<0.0001
Female	0.621	0.514, 0.751	<0.0001
AUROC = 0.725 (95% CI 0.703, 0.747) (N=400,534)

Table 10. Comparison of AUROCs for a diagnosis of colorectal cancer 18–24 months from ColonFlag score.

Legend: Cohort analysis. Age at ColonFlag score. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis.

Diagnosis at 18–24 m	AUROC	95% CI
Age and sex	0.725	0.703, 0.747
Any symptom, age and sex	0.725	0.704, 0.747
ColonFlag, age and sex	0.736	0.715, 0.759
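
For readers less familiar with the AUROC values compared in Table 10, the sketch below shows one standard way to compute the statistic: as the proportion of case/non-case pairs in which the case has the higher predicted risk, with ties counted as one half (the Mann-Whitney formulation). The six risks and outcomes are hypothetical, not study data, and the pairwise loop is meant for illustration only; a cohort of about 400,000 patients would call for a rank-based implementation.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) has a
    higher score than a randomly chosen non-case (label 0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for six patients (not study data).
risks = [0.9, 0.4, 0.35, 0.8, 0.2, 0.1]
diagnosed = [1, 1, 0, 0, 0, 0]
print(auroc(risks, diagnosed))  # 7 of 8 case/non-case pairs are ordered correctly
```

An AUROC of 0.5 corresponds to chance ordering, which is why the values of 0.725 to 0.736 in Table 10 are read against that baseline.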

Table 11. Diagnostic accuracy of ColonFlag to predict diagnosis within 18–24 months in patients with symptoms.

Legend: Cohort analysis. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis, and all reported any symptom at the time of the ColonFlag score ± 3 months.

Diagnosis in 18–24 months	OR	95% CI	p-value
ColonFlag	1.038	1.017, 1.059	<0.0001
Age	1.000	0.965, 1.035	0.98
Female	1.089	0.613, 1.935	0.77
AUROC = 0.744 (95% CI 0.673, 0.815) n=44626
	CRC diagnosis
	Positive	Negative	Total
Risk≥0.381%	6	223	229
Risk<0.381%	46	44351	44397
	52	44574	44626
At 99.5% specificity: sensitivity=11.5%, PPV=2.6%, NPV=99.9%
	CRC diagnosis
	Positive	Negative	Total
Risk≥0.228%	26	7212	7238
Risk<0.228%	26	37362	37388
	52	44574	44626
At 50.0% sensitivity: specificity=83.8%, PPV=0.4%, NPV=99.9%

Table 12. Diagnostic accuracy of ColonFlag to predict diagnosis within 18–24 months in patients without symptoms.

Legend: Cohort analysis. Those diagnosed have a diagnosis of colorectal cancer between 18–24 months after the score date in 2011 (baseline) and those with no diagnosis have at least 24 months of follow up after the score date (baseline) with no diagnosis, and all reported no symptom at the time of the ColonFlag score ± 3 months.

Diagnosis at 18–24 m	OR	95% CI	p-value
ColonFlag	1.022	1.014, 1.031	<0.0001
Age	1.029	1.014, 1.044	<0.0001
Female	0.751	0.604, 0.935	0.01
AUROC = 0.736 (95% CI 0.712, 0.759) n=355908
	CRC diagnosis
	Positive	Negative	Total
Risk ≥0.419%	3	1777	1780
Risk<0.419%	379	353749	354128
	382	355526	355908
At 99.5% specificity: sensitivity=0.8%, PPV=0.2%, NPV=99.9%
	CRC diagnosis
	Positive	Negative	Total
Risk≥0.194%	191	66182	66373
Risk<0.194%	191	289344	289535
	382	355526	355908
At 50.0% sensitivity: specificity=81.4%, PPV=0.3%, NPV=99.9%
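
The accuracy measures quoted beneath each 2×2 table follow directly from the four cell counts. The sketch below recomputes them for the two tables in Table 12 (patients without symptoms); the class and function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2x2 table: risk threshold result against CRC diagnosis."""
    tp: int  # above threshold, diagnosed
    fp: int  # above threshold, not diagnosed
    fn: int  # below threshold, diagnosed
    tn: int  # below threshold, not diagnosed

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def as_percentages(table: TwoByTwo) -> dict:
    """Return the four measures as percentages rounded to one decimal place."""
    names = ("sensitivity", "specificity", "ppv", "npv")
    return {name: round(100 * getattr(table, name), 1) for name in names}


# Counts copied from Table 12.
high_specificity = TwoByTwo(tp=3, fp=1777, fn=379, tn=353749)
half_sensitivity = TwoByTwo(tp=191, fp=66182, fn=191, tn=289344)
print(as_percentages(high_specificity))
print(as_percentages(half_sensitivity))
```

Rounded to one decimal place, the output agrees with the values printed under each table.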

There were 2,150 patients diagnosed with CRC within 24 months following baseline (Table 13). An increase in the ColonFlag score was associated with a higher likelihood of CRC diagnosis within 24 months (OR=1.050 per unit increase, 95% CI 1.047, 1.053), with an AUROC of 0.783 (95% CI 0.773, 0.792) (Table 14). At 99.5% specificity, the ColonFlag score had a sensitivity of 10%. Each symptom group was associated with an increased likelihood of diagnosis within 24 months, except appetite loss (OR=1.667, 95% CI 0.989, 3.093) (Table 15). Table 16 reports the numbers of patients at baseline with and without any symptoms reported within 3 months of the ColonFlag score, and Table 17 shows how the predictive values of the ColonFlag change in the presence of symptoms.

Table 13. Characteristics of patients with a ColonFlag score in 2011 with and without a diagnosis within 24 months.

Legend: Cohort study. Those diagnosed have a diagnosis of colorectal cancer between 0–24 months after the ColonFlag score date (baseline) and those with no diagnosis have at least 24 months of follow up.

	Number	Mean Age (range)	Mean ColonFlag score (range)	% Females
With diagnosis	2150	73.0 (41–98)	82.1 (2–100)	46.5
No diagnosis	400092	62.8 (40–106)	56.3 (0–100)	57.1

Table 14. Diagnostic accuracy of ColonFlag to predict diagnosis within 24 months.

Legend: Cohort study. Those diagnosed have a diagnosis of colorectal cancer between 0–24 months after the ColonFlag score date (baseline) and those with no diagnosis have at least 24 months of follow up. Predictive values for ColonFlag score cut-offs >99.8 (associated with 99.5% specificity) and >88.1 (associated with 50% sensitivity).

Diagnosis within 24 months after FBC	OR	95% CI	p-value
ColonFlag/unit increase	1.050	1.047, 1.053	<0.0001
AUROC = 0.783 (95% CI 0.773, 0.792) (N=402242)
	CRC diagnosis
ColonFlag	Positive	Negative	Total
Score ≥99.8	237	2142	2379
Score<99.8	1913	397950	399863
	2150	400092	402242
At 99.5% specificity: sensitivity=10.0%, PPV=10.0%, NPV=99.5%
	CRC diagnosis
ColonFlag	Positive	Negative	Total
Score ≥88.1	1075	54895	55970
Score<88.1	1075	345197	346272
	2150	400092	402242
At 50.0% sensitivity: specificity=86.3%, PPV=1.9%, NPV=99.7%

Table 15. Logistic regression predicting diagnosis within 24 months following the ColonFlag score (baseline).

Legend: Cohort study. Each symptom is included as a binary variable, =1 if the patient reports that symptom at baseline ± 3 months. Those diagnosed have a diagnosis of colorectal cancer between 0–24 months after the ColonFlag score date (baseline) and those with no diagnosis have at least 24 months of follow up.

Diagnosis within 24 months after FBC	OR	95% CI	p-value
Constipation	1.498	1.219, 1.840	<0.0001
Abdominal pain	2.612	2.282, 2.990	<0.0001
Appetite loss	1.667	0.989, 3.093	0.11
Diarrhoea	2.083	1.737, 2.498	<0.0001
Weight loss	2.431	1.809, 3.267	<0.0001
Rectal bleeding	6.905	5.838, 8.168	<0.0001
Change in bowel habit	4.997	4.028, 6.198	<0.0001
Other symptom	4.035	2.453, 6.639	<0.0001
Female	0.602	0.552, 0.656	<0.0001
Age/year increase	1.062	1.058, 1.066	<0.0001
AUROC = 0.767 (95% CI 0.758, 0.776) n=402242

Table 16. Patient characteristics by presence of symptoms at time of ColonFlag score (baseline).

Legend: Cohort study. Those diagnosed have a diagnosis of colorectal cancer within 18–24 months after the ColonFlag score date (baseline) and those with no diagnosis have at least 24 months of follow up.

		Number	Mean age (range)	Mean ColonFlag score (range)	% Females
With symptoms	With diagnosis	52	72.3 (42–89)	78.6 (6–100)	55.6
With symptoms	No diagnosis	44574	62.4 (40–104)	55.5 (0–100)	61.9
No symptoms	With diagnosis	382	73.1 (41–98)	77.5 (6–100)	45.3
No symptoms	No diagnosis	355526	62.8 (40–106)	56.4 (0–100)	56.5

Table 17. Predictive values for the ColonFlag in patients with and without any symptom.

Legend: Cohort study. Symptoms are ± 3 months of ColonFlag score (baseline). Predictive values are at a threshold of specificity = 99.5% for the two outcomes.

	Outcome CRC between 18 and 24 months		Outcome CRC up to 24 months
	PPV (%)	NPV (%)	PPV (%)	NPV (%)
No symptoms	0.2	99.9	5.0	99.6
Symptoms	2.6	99.9	12.5	98.6
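
The predictive values in Table 17 depend on three quantities: the sensitivity and specificity at the chosen threshold, and the proportion of the subgroup who go on to be diagnosed. As a minimal sketch, the function below combines them using Bayes' theorem and is applied to the 18–24 month outcome using the counts in Tables 11 and 12. The function name is hypothetical, and the calculation is an illustration of the relationship rather than the authors' method.

```python
def ppv_from_bayes(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Patients with symptoms (Table 11, threshold at 99.5% specificity).
ppv_with_symptoms = ppv_from_bayes(
    sensitivity=6 / 52,
    specificity=44351 / 44574,
    prevalence=52 / 44626,
)

# Patients without symptoms (Table 12, threshold at 99.5% specificity).
ppv_without_symptoms = ppv_from_bayes(
    sensitivity=3 / 382,
    specificity=353749 / 355526,
    prevalence=382 / 355908,
)

print(f"With symptoms: {100 * ppv_with_symptoms:.1f}%")
print(f"Without symptoms: {100 * ppv_without_symptoms:.1f}%")
```

These reproduce the 2.6% and 0.2% shown in the 18–24 month columns of Table 17.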

Trajectories of the ColonFlag scores

Figure 7 shows the trajectories of the ColonFlag score moving back in time from the diagnosis, in cases compared with controls, using age 70 years at index as an example. Similar patterns were seen for other ages. Changes in ColonFlag scores (and therefore changes in FBC indices) appear to begin up to 3–4 years before the diagnosis. The ColonFlag represents a function of risk whose form is unknown, and very high values are rounded down to 100. This effect is evident in Figure 8 and Figure 9, which display the LOWESS lines derived from our data and may therefore underestimate the gradient as the diagnosis approaches. An abrupt ‘change point’ in the score trajectory was not observed.
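
The effect of capping scores at 100 on the apparent gradient can be illustrated with a toy example. The sketch below assumes, purely for illustration, four hypothetical patients whose underlying score rises linearly by 10 points per year; this is not the ColonFlag's risk function, whose form is unknown. Averaging the capped values flattens the mean trajectory in the final years even though every individual trajectory keeps rising.

```python
CAP = 100.0  # maximum reported score


def mean_trajectory(starts, slope, times, cap=None):
    """Mean score at each time point across patients.

    Each patient's underlying score rises linearly from their start value.
    If cap is given, each individual value is truncated at cap before
    averaging, mimicking a score that cannot exceed its maximum.
    """
    means = []
    for t in times:
        values = [s + slope * (t - times[0]) for s in starts]
        if cap is not None:
            values = [min(v, cap) for v in values]
        means.append(sum(values) / len(values))
    return means


times = [-4, -3, -2, -1, 0]        # years relative to diagnosis
starts = [50.0, 60.0, 70.0, 80.0]  # hypothetical scores 4 years before diagnosis

latent = mean_trajectory(starts, 10.0, times)
observed = mean_trajectory(starts, 10.0, times, cap=CAP)

final_latent_gradient = latent[-1] - latent[-2]
final_observed_gradient = observed[-1] - observed[-2]
print(latent)
print(observed)
print(final_latent_gradient, final_observed_gradient)
```

In this toy setting the final-year gradient of the capped mean is half that of the uncapped mean, which is the direction of bias described above.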

Figure 7. Predicted trends for the ColonFlag score for cases and controls aged 70 years at index.

Legend: Case-control analysis matched 1:100 on age at ColonFlag score, sex, and year of score. “Predicted” refers to trends from mixed models. Time=0 represents index date.

Figure 8. Trend in the ColonFlag for male cases aged 65–75 years at diagnosis.

Figure 9. Trend in ColonFlag for female cases aged 65–75 years at diagnosis.

Ability of ColonFlag to detect risk when the FBC is ‘normal’

Among all FBCs, 18.6% were within the ‘normal range’ for all indices (Table 18). Some of these FBCs nevertheless had high ColonFlag scores. At the threshold associated with 99.5% specificity, a small proportion (0.10%) had a positive score, of whom 3.17% were diagnosed with CRC within 3 years. At the threshold associated with 50% sensitivity, 10.25% had a positive score, of whom 1.46% were diagnosed.

Table 18. Proportions of FBCs that are ‘normal’, with ColonFlag scores above two thresholds, and CRC outcomes.

FBCs available with an associated ColonFlag score	Number (%) with all indices within their reference ranges	Number (%) of these with a ColonFlag score above 99.8 (99.5% specificity)	Number (%) of these with a ColonFlag score above 88.1 (50% sensitivity)
8,832,859	1,642,474 (18.60%)	1,671/1,642,474 (0.10%)	168,298/1,642,474 (10.25%)
CRC within 3 years of FBC:		53/1,671 (3.17%)	2,456/168,298 (1.46%)

Discussion

Summary of overall findings

This project was designed to determine whether the ColonFlag, an algorithm derived in Israel through machine learning techniques that draws on FBC data in electronic healthcare records, can identify patients with undiagnosed CRC prior to the reporting of symptoms. In our case-control analysis, trajectories of the ColonFlag score began to diverge in cases compared with controls at around 3–4 years before diagnosis, a time when symptoms differ only minimally between cases and controls. This was further evidenced by non-significant ORs for all symptoms except rectal bleeding for diagnosis in 18–24 months. In our cohort analysis, adding either a ColonFlag score or symptoms to age and sex did not significantly improve predictive ability for diagnosis in 18–24 months compared with age and sex alone. This is consistent with our case-control analysis, which showed that much of the discriminatory performance of the ColonFlag at this timescale draws on the age variable rather than the changes in FBC indices, with a reduction in AUROC from 0.736 to 0.536 when age is eliminated through the age-matched case-control design.

Strengths and limitations

We were confident that our study population had good quality recording of the main outcome, a CRC diagnosis, based upon a high proportion (95%) of the diagnoses occurring within 6 months of each other in the two data sources. Additionally, when we compared the incidence of CRC with that of England, we found a close match, based on cancer registrations in the NCR. Our analysis also confirmed our chosen interval of 18–24 months for the pre-symptomatic phase, which we had also used as the primary outcome interval in our original validation study¹³.

The company Medial EarlySign had, as part of a previous project, derived ColonFlag scores for almost 9 million FBC reports in 1.9 million patients. The study is limited by the need to exclude a proportion of patients to ensure high quality outcome capture from the NCR, and by the exclusion of a further one fifth of FBCs during quality checks by Medial EarlySign. The ColonFlag also represents a function of risk whose form is unknown, with very high values rounded down to 100, so the score may underestimate both risk overall and the increasing ColonFlag gradient as the diagnosis approaches. There is potential bias in UK data (unlike in Israel), in that an FBC is, in the majority of instances, requested for a reason, to answer a clinical question, rather than as a purely routine check. A further limitation in the cohort study was the need to exclude people censored prior to 24 months from baseline.

Comparison to the existing literature

A recently published systematic review has identified that the FBC can play a significant role in detection of CRC²². The FBC is a commonly used test both in general and hospital practice. It measures a number of cellular indices in the blood, and determines whether the person has anaemia (low haemoglobin). Hamilton et al. reported not only that anaemia recorded in general practice was an independent risk factor for CRC, but also that iron deficiency, even in the absence of anaemia, was predictive²³. Iron deficiency can develop through slow loss of blood from a colonic polyp or cancer, and may become evident subtly through changes in the mean cell volume (MCV), mean cell haemoglobin and mean cell haemoglobin concentration. In addition, a raised platelet count (thrombocytosis) has been identified as a risk marker, particularly for colorectal and lung cancer²⁴. In one study, a third of cases with thrombocytosis and cancer had no recorded cancer symptoms²⁵.

A recent longitudinal study highlighted the potential for early detection of CRC through measuring changes in a number of FBC indices, potentially predating the onset of symptoms¹⁵. That study found that FBC trends in cases diverge from those in controls around 4 years before a CRC diagnosis. This is consistent with the present study, in which trajectories in ColonFlag scores diverge between cases and controls 3–4 years before diagnosis.

Implications for research and practice

Our case-control analysis suggested a significant AUROC for the ColonFlag at 18–24 months prior to diagnosis, a time interval at which symptoms are no more evident in those with undiagnosed cancer than in controls, with the exception of rectal bleeding. This incidentally reinforces the need to investigate rectal bleeding, which we had expected to be a late development leading to rapid referral and diagnosis. We have also demonstrated an upward trend in the ColonFlag score that appears to diverge in cases compared with controls at a time interval of around 3–4 years prior to diagnosis, which clearly sits in the pre-symptomatic phase based on the symptom data we have presented.

It is evident through this study that much of the discriminatory performance of the ColonFlag draws on the age variable rather than the changes in FBC indices, with a reduction in AUROC from 0.736 to 0.536 when age is eliminated through the age-matched case-control design. Nevertheless, it is clearly appropriate to include age in such an algorithm, and our cohort analysis demonstrated, at a ColonFlag score threshold of >99.8, a sensitivity of 10% and a positive predictive value (PPV) of 10% for CRC. In the presence of symptoms, this PPV increases slightly to 12.5%. These values are easily sufficient to justify further investigation in patients with scores above this threshold, based on current guidelines that recommend referral for cancer investigation at ≥3% cancer risk²⁶. Such patients could be offered the non-invasive faecal immunochemical test (FIT) prior to consideration of a colonoscopy.

The ability of the ColonFlag to detect risk in some cases in the presence of a ‘normal’ FBC emphasises the potential for machine-assisted pattern recognition, supporting clinicians either at the point of care or through off-line population searches. This is important, because ‘normal’ FBCs are not examined in any detail by busy clinicians filing results, and in some situations might be ‘batch filed’ without even being opened. Batch filing reduces to zero the already small chance that a human clinician will notice ‘within range’ changes of concern, so the help of an algorithm to do so would be very useful.

Conclusion

Given the widespread availability of coded FBC data in the NHS, and the relative paucity of symptom recording (much of which is free text and therefore invisible to a symptom based algorithm), this project suggests a role for the ColonFlag applied to large volume health data to identify individuals likely to benefit from further investigation to exclude CRC.

Consent

CPRD has ethical approval from the Health Research Authority to hold anonymised patient data and to support research using that data. CPRD’s approval of data access for individual research projects includes ethics approval and consent for those projects. Ethical approval was therefore covered for this study by the CPRD (protocol 14_195RMn2A2R).

Data availability

Underlying data

The datasets used in this study are available from the CPRD. The CPRD maintain access rights to the data to ensure it is only used for research purposes by trustworthy organisations, so sharing of data is prohibited. Checks are conducted on organisations carrying out and funding research to assess whether they are suitable to receive CPRD data. This is to ensure, as examples, that the data stays confidential, and it is only used for its approved purpose. An application to access the data can be made at https://cprd.com/data-access.

References

1. Cancer Research UK: Bowel Cancer Incidence. (Accessed 23.11.21).
2. Cancer Research UK: Bowel Cancer Survival Statistics. (Accessed 23.11.21).
3. Winawer SJ, Zauber AG, Ho MN, et al.: Prevention of Colorectal Cancer by Colonoscopic Polypectomy. N Engl J Med. 1993; 329(27): 1977–1981.
4. Hewitson P, Glasziou P, Irwig L, et al.: Screening for colorectal cancer using the faecal occult blood test, Hemoccult. Cochrane Database Syst Rev. 2007; 2007(1): CD001216.
5. Logan RF, Patnick J, Nickerson C, et al.: Outcomes of the Bowel Cancer Screening Programme (BCSP) in England after the first 1 million tests. Gut. 2012; 61(10): 1439–1446.
6. Hardcastle JD, Chamberlain JO, Robinson MH, et al.: Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet. 1996; 348(9040): 1472–7.
7. Hol L, Wilschut JA, van Ballegooijan M, et al.: Screening for colorectal cancer: random comparison of guaiac and immunochemical faecal occult blood testing at different cut-off levels. Br J Cancer. 2009; 100(7): 1103–1110.
8. Bailey SER, Abel GA, Atkins A, et al.: Diagnostic performance of a faecal immunochemical test for patients with low-risk symptoms of colorectal cancer in primary care: an evaluation in the South West of England. Br J Cancer. 2021; 124(7): 1231–1236.
9. Allgar VA, Neal RD: Delays in the diagnosis of six cancers: analysis of data from the National Survey of NHS Patients: Cancer. Br J Cancer. 2005; 92(11): 1959–70.
10. Hippisley-Cox J, Coupland C: Identifying patients with suspected colorectal cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract. 2012; 62(594): e29–e37.
11. Medina-Lara A, Grigore B, Lewis R, et al.: Cancer diagnostic tools to aid decision-making in primary care: mixed-methods systematic reviews and cost-effectiveness analysis. Health Technol Assess. 2020; 24(66): 1–332.
12. Kinar Y, Kalkstein N, Akiva P, et al.: Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study. J Am Med Inform Assoc. 2016; 23(5): 879–890.
13. Birks J, Bankhead C, Holt TA, et al.: Evaluation of a prediction model for colorectal cancer: retrospective analysis of 2.5 million patient records. Cancer Med. 2017; 6(10): 2453–2460.
14. https://www.cprd.com/ (Accessed 23.11.21)
15. Virdee PS, Patnick J, Watkinson P, et al.: Trends in the full blood count blood test and colorectal cancer detection: a longitudinal, case-control study of UK primary care patient data [version 1; peer review: 1 approved with reservations, 1 not approved]. NIHR Open Res. 2022; 2: 32.
16. Hamilton W, Lancashire R, Sharp D, et al.: The risk of colorectal cancer with symptoms at different ages and between the sexes: a case-control study. BMC Med. 2009; 7: 17.
17. Stapley SA, Rubin GP, Alsina D, et al.: Clinical features of bowel disease in patients aged <50 years in primary care: a large case-control study. Br J Gen Pract. 2017; 67(658): e336–e344.
18. Koo MM, von Wagner C, Abel GA, et al.: The nature and frequency of abdominal symptoms in cancer patients and their associations with time to help-seeking: evidence from a national audit of cancer diagnosis. J Public Health (Oxf). 2018; 40(3): e388–e395.
19. Hamilton W: The CAPER studies: five case-control studies aimed at identifying and quantifying the risk of cancer in symptomatic primary care patients. Br J Cancer. 2009; 101 Suppl 2(Suppl 2): S80–S86.
20. Skates SJ, Menon U, MacDonald N, et al.: Calculation of the risk of ovarian cancer from serial CA-125 values for preclinical detection in postmenopausal women. J Clin Oncol. 2003; 21(10 Suppl): 206s–210s.
21. Skates SJ: Ovarian cancer screening: development of the risk of ovarian cancer algorithm (ROCA) and ROCA screening trials. Int J Gynecol Cancer. 2012; 22 Suppl 1(Suppl 1): S24–S26.
22. Virdee PS, Marian IR, Mansouri A, et al.: The Full Blood Count Blood Test for Colorectal Cancer Detection: A Systematic Review, Meta-Analysis, and Critical Appraisal. Cancers (Basel). 2020; 12(9): 2348.
23. Hamilton W, Lancashire R, Sharp D, et al.: The importance of anaemia in diagnosing colorectal cancer: a case-control study using electronic primary care records. Br J Cancer. 2008; 98(2): 323–327.
24. Bailey SE, Ukoumunne OC, Shephard E, et al.: How useful is thrombocytosis in predicting an underlying cancer in primary care? A systematic review. Fam Pract. 2017; 34(1): 4–10.
25. Bailey SER, Ukoumunne OC, Shephard EA, et al.: Clinical relevance of thrombocytosis in primary care: a prospective cohort study of cancer incidence using English electronic medical records and cancer registry data. Br J Gen Pract. 2017; 67(659): e405–e413.
26. National Institute for Health and Care Excellence: NICE Guideline NG12. Suspected cancer: recognition and referral. 2015.
