Keywords
Gestational age, Intervention, Special educational needs, Trial emulation
One third of children in English primary schools have additional learning support, called special educational needs (SEN) provision, and children born preterm are more likely to have SEN than those born at term. We aim to assess the impact of SEN provision on health and education outcomes in children grouped by gestational age at birth.
We will analyse linked administrative data for England using the Education and Child Health Insights from Linked Data (ECHILD) database. A target trial emulation approach will be used to specify data extraction from ECHILD, the comparisons of interest and our analysis plan. Our target population is all children enrolled in year one of state-funded primary school in England who were born in an NHS hospital in England between 2003 and 2008, grouped by gestational age at birth: extremely preterm (24-<28 weeks), very preterm (28-<32 weeks), moderately preterm (32-<34 weeks), late preterm (34-<37 weeks) and full term (37-<42 weeks). The intervention of interest comprises categories of SEN provision (including none) during year one (age five/six). The outcomes of interest are rates of unplanned hospital utilisation, educational attainment, and absences by the end of primary school (year six, age 11). We will triangulate results from complementary estimation methods, including the naïve estimator, multivariable regression, g-formula, inverse probability weighting, inverse probability weighting with regression adjustment and instrumental variables, across a variety of causal contrasts (the average treatment effect overall, in the treated and in the not treated).
We have existing research ethics approval for analyses of the ECHILD database described in this protocol. We will disseminate our findings to diverse audiences (academics, relevant government departments, service users and providers) through seminars, peer-reviewed publications, short briefing reports and infographics for non-academics (published on the study website).
One third of all children need extra help with learning in school, such as support from a teaching assistant. Children born preterm are more likely to need extra help than those born at term. In England, this help is called special educational needs (SEN) provision. The aim of this study is to find out whether SEN provision affects education and health outcomes. We will use information collected by hospitals and schools for all children who were born in England between 2003 and 2008. We will compare children with a similar gestational age at birth who did and did not receive extra help in school.
In the state-funded education system in England, the system of reasonable adjustments to support children who experience difficulties in learning is known as special educational needs (SEN) provision. SEN provision currently falls into two categories: SEN support and Education, Health and Care Plans (EHCPs) (Long & Danechi, 2023). SEN support provides classroom-based support, such as extra help from a teacher (or teaching assistant) or access to special learning programmes. EHCPs provide support for pupils who require more support than is available through SEN support. Because of the way SEN provision is funded and organised, its allocation has changed over time, influenced by changes in legislation and school governance structure, and it varies between local authorities (Liu et al., 2020). SEN provision is allocated more frequently to children with health problems associated with low academic attainment, such as children born preterm (Alterman et al., 2021), children with congenital anomalies such as cleft lip and palate (Fitzsimons et al., 2018), and children with congenital heart defects (Glinianaia et al., 2021). However, the potential impact of SEN provision on educational and health outcomes during primary school has not been evaluated.
Children who are born preterm (i.e., <37 weeks' gestation) disproportionately experience long-term difficulties compared with their full-term peers, including lower educational attainment (Libuy et al., 2023), a higher burden of comorbidities (particularly after very preterm birth) (Mowitz et al., 2022) and more contact with health services, including emergency health services (Coathup et al., 2020). Rates of SEN provision in primary schools in England have previously been shown to increase with earlier gestational age at birth (Libuy et al., 2023). Descriptive studies have also reported differences in hospital utilisation (Coathup et al., 2020) and educational performance (Libuy et al., 2023) by gestational age. However, there is limited evidence on the impact of SEN provision on academic performance, school absences and hospital utilisation in pupils who need SEN provision.
We will emulate a pragmatic target trial using linked administrative school and hospital records in the ECHILD database. We will analyse children separately according to gestational age at birth, because children within each group, particularly the most preterm groups, have a similar need for SEN provision (Libuy et al., 2023). For each gestational age group, we will estimate the causal effect of SEN provision in year one of primary school on school attainment, school absences and rates of unplanned hospital admissions by the end of primary school (year six, age 10/11).
The emulated target trial aims to reduce the risk of confounding and selection bias. First, using known and presumed confounders of the relationship between SEN provision and our outcomes, we will evaluate the assumptions required to estimate causal effects. In particular, we will assess the positivity assumption for receiving each category of SEN provision (no SEN provision, SEN support in mainstream school, EHCP in mainstream school, special school attendance) within each gestational age group; that is, that for all combinations of covariates there is a non-zero probability of recording each category of SEN provision. Second, for each gestational age group where the positivity assumption holds (Zhu et al., 2021), and assuming there is no unmeasured confounding, we will estimate and compare potential educational and health outcomes under the different treatment regimens (no SEN provision, SEN support in mainstream school, EHCP in mainstream school, special school attendance).
Prior to developing this protocol, independent meetings were held with stakeholders (parents, pupils, teachers) from existing patient advocacy groups, including the Young Person’s Advisory Group (YPAG), the Council for Disabled Children’s group (FLARE) and the Great Ormond Street National Children’s Bureau Families Research Advisory Group (FRAG). On 14 November 2020, FLARE were introduced to the ECHILD dataset, its use of linked administrative data and the observational study design, which were warmly received. Further meetings were held with FLARE on 18 September 2021 and with YPAG for research at Great Ormond Street Hospital on 27 November 2021. This engagement identified school entry as a key milestone at which SEN provision is required. Therefore, in the proposed study, we have used school start as our entry point, and we will generate further target trials based on further stakeholder engagement. On 20 March 2021, the Great Ormond Street Hospital for Children NHS Foundation Trust Young People’s Forum highlighted school absences as an important topic for research. Using these interactions, we have created our research question, which was presented to the HOPE study steering committee; the committee includes parents of children with disabilities, who will review and advise on the presentation and dissemination of the study findings. Records and learnings from public engagements can be found here.
We will apply a trial emulation framework to observational educational data linked to healthcare data. Analyses will be conducted in the Office for National Statistics Secure Research Service using Stata 17 and R version 4.0.2 (open-source, free software). Once written, the code for the study, including algorithms to identify the population, exposure, outcomes and confounders, will be made publicly available on publication of the full manuscript.
We will use the ECHILD database, a pseudo-anonymised dataset that links Hospital Episode Statistics (HES) with the National Pupil Database (NPD). A linkage rate of 95% has been reported between NPD and HES in ECHILD, with high linkage rates attributed to a two-stage linkage process (Libuy et al., 2021).
In brief, ECHILD's extract of the NPD contains pupil-level data from state schools in England for academic terms between 2006 and 2020 (Mc Grath-Lone et al., 2022). This includes school, local authority, age, gender, ethnicity, first language, socioeconomic status, free school meal status, recorded absences, social care/children in need data and SEN status. In addition to the NPD, school-level characteristics such as school type (including special or mainstream), school rating and governance are available through the Department for Education’s publicly available ‘Get Information about Schools’ (GIAS) register, which is linkable to ECHILD using the school’s unique reference number (GOV.UK, 2022).
ECHILD’s extract of HES contains details on admitted patient care, outpatient appointments, accident and emergency attendances and critical care between 1997 and 2021. It contains admission and discharge dates, patient characteristics (e.g., sex, ethnicity, area of residence) and clinical information recorded during hospital admissions (such as details of diagnoses and operations). HES covers 99% of public hospital activity in England (Herbert et al., 2017). HES also contains birth records, which record characteristics such as gestational age, birthweight and maternal age; missing information in an individual’s birth record can be complemented using the corresponding mother’s delivery record. Furthermore, since 1998, HES records have been linked to ONS mortality data, covering the causes and timing of deaths.
Further details of the ECHILD dataset are documented by Mc Grath-Lone et al., 2022.
Our population is singleton children who were born in NHS-funded hospitals in England between 1 September 2003 and 31 August 2008 and were enrolled in year one of a state-funded primary school in England at age five/six years (see Figure 1). Children will be excluded if they do not have complete information on gestational age. Children will also be excluded from analyses of educational outcomes if they have missing data on the Early Years Foundation Stage Profile (EYFSP; completed in reception, age four/five). We will also exclude children with a gestational age of <24 or >44 weeks, or with gestational ages that are implausible given their birthweight, because of a high risk of misclassification. Comparisons of included and excluded children will help to inform whether there are issues of selection bias. This population was chosen because these children can be followed up to the end of primary school (year six, age 10/11) in ECHILD, with the latest academic year of follow-up before the COVID-19 pandemic.
The study population will be followed up from the January census in year one (age five/six) until the earliest of: end of primary school (year six, age 11 at exit), loss to follow-up or end of study (30 July 2019). Children will be considered lost to follow-up if they no longer appear in any NPD school census; this may be due to transfer to a non-government-funded school or alternative provision, off-rolling (where pupils are illegally excluded from school) (Jay et al., 2022), emigration or death. We begin follow-up in year one rather than reception (the first year of primary school in England) because it is the first full school year in which education is compulsory for all children. We use the January census (rather than the October census) to allow time for pupils to be assigned SEN provision.
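The exit rule above (the earliest of end of primary school, loss to follow-up and end of study) can be sketched as follows; the person-time it returns is the quantity that would feed a log(follow-up time) offset in count models. The function name, the example census date and the use of 365.25-day years are illustrative assumptions, and the date of loss to follow-up (for example, the last census in which a child appears) is left to the caller.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_STUDY = date(2019, 7, 30)  # end of study date stated in the protocol


def follow_up(start: date,
              end_of_primary: date,
              lost_to_follow_up: Optional[date] = None,
              end_of_study: date = END_OF_STUDY) -> Tuple[date, float]:
    """Return (exit date, person-years) for one child (hypothetical helper).

    Exit is the earliest of: end of primary school, loss to follow-up
    (None if the child remains in the census) and the end of study.
    """
    candidates = [end_of_primary, end_of_study]
    if lost_to_follow_up is not None:
        candidates.append(lost_to_follow_up)
    exit_date = min(candidates)
    if exit_date < start:
        raise ValueError("exit date precedes the start of follow-up")
    # Person-time in years (assumed 365.25-day years), e.g. for an offset
    person_years = (exit_date - start).days / 365.25
    return exit_date, person_years


# Hypothetical child: January census date in year one, lost in March 2012
exit_date, years = follow_up(date(2010, 1, 21), date(2015, 7, 31), date(2012, 3, 1))
print(exit_date, round(years, 2))
```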
We expect the impact of SEN provision to vary according to the child’s need for SEN provision, which increases with decreasing gestational age at birth (Libuy et al., 2023). We will therefore conduct all analyses separately for five subgroups, defined by completed weeks of gestation at birth: extremely preterm (24-<28 weeks); very preterm (28-<32 weeks); moderately preterm (32-<34 weeks); late preterm (34-<37 weeks); full term (37-<42 weeks) (ONS, 2015).
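The subgroup definitions above amount to a simple mapping from completed weeks of gestation to group. A minimal Python sketch follows; the function name is hypothetical. Note that the exclusion rule (<24 or >44 weeks) leaves 42 to 44 weeks outside the five named groups, so the sketch raises an error there rather than guessing how such births are handled.

```python
def gestational_age_group(completed_weeks: int) -> str:
    """Map completed weeks of gestation to the five analysis subgroups."""
    if 24 <= completed_weeks < 28:
        return "extremely preterm"
    if 28 <= completed_weeks < 32:
        return "very preterm"
    if 32 <= completed_weeks < 34:
        return "moderately preterm"
    if 34 <= completed_weeks < 37:
        return "late preterm"
    if 37 <= completed_weeks < 42:
        return "full term"
    # Outside the five subgroups (including 42 to 44 weeks): not assigned here
    raise ValueError(f"{completed_weeks} weeks is outside the 24 to <42 week range")


for weeks in (25, 30, 33, 36, 40):
    print(weeks, gestational_age_group(weeks))
```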
Our intervention consists of four categories of recorded SEN provision in the January census of year one: none; SEN support (previously known as School Action/School Action Plus) at a mainstream school; EHCP (previously known as a statement of SEN) at a mainstream school; and special school attendance (where the vast majority of children have an EHCP). Whilst SEN provision can change throughout a child’s educational journey, our trial emulation focusses on an observational analogue of an intention-to-treat (ITT) analysis of SEN provision at the start of compulsory education. This approach analyses the assignment of treatment, not whether treatment was provided or adhered to. We chose the start of compulsory education because, based on prior evidence of educational (Libuy et al., 2023) and healthcare needs (Coathup et al., 2020), we believe this population needs SEN provision from the start of their educational journey.
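The four intervention categories could be derived from each pupil's January census record roughly as below. This is a sketch only: the single-letter SEN provision codes (N, A, P, K, S, E) are assumed NPD-style codes and should be checked against the School Census specification for each academic year, and the function and argument names are hypothetical.

```python
# Assumed NPD-style SEN provision codes; verify against the census specification.
NO_SEN_CODES = {"N"}
SEN_SUPPORT_CODES = {"A", "P", "K"}  # School Action, School Action Plus, SEN support
EHCP_CODES = {"S", "E"}              # statement of SEN, EHC plan


def sen_category(sen_code: str, attends_special_school: bool) -> str:
    """Assign one of the four intervention categories (hypothetical helper)."""
    if attends_special_school:
        # Special school attendance is a category in its own right
        return "special school"
    code = sen_code.strip().upper()
    if code in EHCP_CODES:
        return "EHCP in mainstream"
    if code in SEN_SUPPORT_CODES:
        return "SEN support in mainstream"
    if code in NO_SEN_CODES:
        return "none"
    raise ValueError(f"unrecognised SEN provision code: {sen_code!r}")


print(sen_category("P", attends_special_school=False))
```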
We will evaluate both health and educational outcomes.
For health outcomes, we will evaluate unplanned hospital utilisation, comprising the number of unplanned admissions to hospital (defined by the admission method in the first episode of care) and the number of contacts with accident and emergency (A&E) departments between January of year one (age five/six) and the end of year six (age 11) (Harron et al., 2018).
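A minimal sketch of counting unplanned hospital contacts within the follow-up window follows. It assumes one record per admission (first episode of care) or A&E attendance, and it assumes that emergency admission method codes in HES begin with "2"; both the record layout and the code rule are assumptions to verify against the HES data dictionary. Admissions and A&E attendances are returned separately, because whether and how to combine them (for example, an A&E attendance that leads to admission) is a protocol decision this sketch does not make.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional, Tuple


class HospitalContact(NamedTuple):
    contact_date: date
    setting: str                            # "admission" or "ae" (hypothetical labels)
    admission_method: Optional[str] = None  # first-episode method, admissions only


def is_emergency_method(code: Optional[str]) -> bool:
    # Assumption: emergency (unplanned) admission method codes start with "2"
    return code is not None and code.startswith("2")


def count_unplanned(contacts: Iterable[HospitalContact],
                    start: date, end: date) -> Tuple[int, int]:
    """Return (unplanned admissions, A&E attendances) within [start, end]."""
    admissions = ae = 0
    for c in contacts:
        if not (start <= c.contact_date <= end):
            continue  # outside the follow-up window
        if c.setting == "ae":
            ae += 1
        elif c.setting == "admission" and is_emergency_method(c.admission_method):
            admissions += 1
    return admissions, ae


records = [HospitalContact(date(2011, 3, 1), "admission", "21"),
           HospitalContact(date(2011, 4, 1), "admission", "11"),
           HospitalContact(date(2012, 5, 5), "ae")]
print(count_unplanned(records, date(2010, 1, 21), date(2015, 7, 31)))
```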
For educational outcomes, we will evaluate key stage two English and mathematics assessments (taken in year six, at age 10/11), including whether assessments are taken (yes or no) and, if taken, attainment in the assessments. To account for changes over time in the recording of educational outcomes, we will use scores standardised within academic year.
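Standardising scores within academic year can be sketched as below: each raw score is centred on its academic year's mean and divided by that year's standard deviation. The record layout and function name are hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def standardise_within_year(records: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """records: (pupil_id, academic_year, raw_score); returns pupil_id -> z-score."""
    by_year = defaultdict(list)
    for _, year, score in records:
        by_year[year].append(score)
    # Mean and (population) standard deviation for each academic year
    stats = {year: (mean(s), pstdev(s)) for year, s in by_year.items()}
    z = {}
    for pupil_id, year, score in records:
        mu, sd = stats[year]
        z[pupil_id] = (score - mu) / sd if sd > 0 else 0.0
    return z


print(standardise_within_year([("a", "2014/15", 100.0), ("b", "2014/15", 110.0)]))
```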
We will also evaluate the number of absences during primary school (January year one to the end of year six) including unauthorised absences and absences related to illness and dental or medical appointments.
To account for determinants of SEN provision assignment in children with similar gestational ages, we will use information on covariates known or suspected to influence (or be associated with) SEN provision, based on prior literature (Coathup et al., 2020; Hutchinson, 2021; Libuy et al., 2023). Table 1 shows our preliminary list of sociodemographic, educational and health-related covariates that are related to SEN provision and to both educational and health outcomes. We will use DAGitty version 3.0, an open-source piece of software for creating directed acyclic graphs (DAGs), to guide our selection of the adjustment set and so reduce the risk of residual confounding and of overadjustment, which could mediate away true effects.
Covariate group | Covariate | Categories of measurement | Source |
---|---|---|---|
Clinical | Biological sex | Female; male; unknown (depending on numbers) | HES |
Clinical | Major congenital anomaly | Presence of congenital anomaly (yes or no), based on the Hardelid UK chronic condition ICD-10 code list, identified in infant hospital admissions up to age 2 (Hardelid et al., 2014) | HES |
Clinical | Prior unplanned hospital utilisation before year one | Number of days on which a child is recorded as attending an A&E department or being admitted to hospital as an emergency, adjusted for person-time | HES |
Education | Early Years Foundation Stage Profile (EYFSP; English and mathematics scores) | Standardised z-score for English and mathematics within academic year | NPD |
Education | School governance type | Local authority managed; academy; other | GIAS |
Education | School type | Mainstream; special; alternative provision; pupil referral unit | GIAS |
Education | Pupil-teacher ratio | Number of pupils per teacher in the school | GIAS |
Sociodemographic | Child's ethnic group | Asian; Black; Mixed or multiple ethnic groups; White; other | NPD |
Sociodemographic | Maternal age at birth | Continuous, 10–60 years; ages below 10 or above 60 will be censored because of a high risk of misclassification | HES |
Sociodemographic | Free school meal eligibility | Eligible for free school meals; not eligible for free school meals | NPD |
Sociodemographic | Month of birth | January to December | HES and NPD (values must match) |
Sociodemographic | Deprivation at birth | IMD deciles | HES |
Sociodemographic | Deprivation at start of school | IDACI quintiles | NPD |
Sociodemographic | English as a first language | Recorded as English; not recorded as English; unknown | NPD |
To reduce confounding and other sources of bias affecting data collected outside a randomised controlled trial, we will adopt the Target Trial Emulation (TTE) framework (Hernán et al., 2022). TTE maps observational data onto a hypothetical target trial by specifying an ideal (pragmatic) trial and using it as the basis for the observational study design. TTE has three components: first, specifying a hypothetical, ideal experimental trial for the causal question of interest (including the corresponding causal contrast); second, emulating the specifications of this target trial using observational data; and third, estimating the effects of interest using the emulated trial data. The first component includes defining inclusion and exclusion criteria at entry, a treatment strategy (including time of assignment and entry), the frequency and modality of follow-up, outcome measures, causal contrasts of interest and the analytical estimation methods for an ideal trial. In the second component, observational data are wrangled to emulate the distribution the data would have had if they had been gathered prospectively in the ideal trial. Finally, the third component requires methods that adjust for known and suspected confounding. In Table 2, we describe the ideal target trial that would be designed to investigate the causal effect of SEN provision (as recorded in the January census of the first year of compulsory education) on the relevant outcomes, and the equivalent trial to be emulated using ECHILD.
Protocol component | Target trial specification | Emulation study | Potential challenges and possible solutions |
---|---|---|---|
Eligibility criteria | Born in England between 1 September 2003 and 31 August 2008. Started year one in a state-funded mainstream or special school in England between 2009/10 and 2013/14. Took part in the EYFSP assessments before year one. | Born in England between 1 September 2003 and 31 August 2008, with gestational age recorded in the birth/delivery record. Linked HES-NPD records. Recorded start of year one between 2009/10 and 2013/14 in a mainstream or special school in any termly School Census of the National Pupil Database. EYFSP assessment recorded. | Based on prior experience of using these administrative data, we expect some children to appear in year one twice; these will be removed because of uncertainty about the reliability of their records. Not all pupils have an EYFSP record, and teacher strikes are expected to reduce key stage 2 assessment availability; missingness patterns will be examined and, when a MAR assumption is defensible, multiple imputation will be performed, with the potential selection bias of an incorrect MAR assumption evaluated in sensitivity analyses. |
Study design | Randomised controlled trial | Trial emulation framework applied to linked observational hospital-school data | Potential residual or uncontrolled confounding by indication |
Data structure | Prospective data collection as part of the randomised controlled trial | Retrospective wrangling of administrative data to emulate prospectively collected data | Missingness patterns will be examined and, when a MAR assumption is defensible, multiple imputation will be performed, with the potential selection bias of an incorrect MAR assumption evaluated in sensitivity analyses |
Outcome | • Key stage 2 assessments • School absences (unauthorised and health related) • Unplanned hospital utilisation | • Key stage 2 assessments • School absences (unauthorised and health related) • Unplanned hospital utilisation | Teacher strikes are expected to reduce key stage 2 assessment availability – missingness patterns will be examined as outlined above |
Treatments to be compared | Categories of SEN provision: none, SEN support in mainstream school, EHCP in mainstream school, special school attendance | Categories of SEN provision for which there is pairwise common support: none, SEN support in mainstream school, EHCP in mainstream school and special school attendance | As there may be a delay in applying for SEN provision, we will consider a sensitivity analysis in which treatment assignment is by year two instead of year one |
Causal contrasts | Intention to treat for SEN provision assignment in the first full year of compulsory education (year one, age five on entry), with none as the reference category | Observational analogue of the intention to treat for SEN provision as recorded in the first full year of compulsory education (year one, age five on entry). Additionally, the average treatment effect in the treated and the average treatment effect in the non-treated (see definitions in Table 4) | |
Analysis plan | Logistic and linear regression for key stage 2 results; Poisson (or negative binomial) regression models, as appropriate, for school absences and unplanned hospital utilisation | Educational outcomes: logistic and linear regression modelling with appropriate confounding adjustment and standardisation (e.g., regression adjustment with standardisation, propensity score-based methods). Clustering by school and/or local authority will be handled using either mixed effects models or robust inference (e.g., generalised estimating equations). Health outcomes: Poisson (or negative binomial) regression models with appropriate control for confounding, followed by standardisation. | |
Data wrangling. Depending on the proportion and mechanisms of missingness in the data, we will first use later recordings to complete missing baseline covariates such as gender; second, we will use non-missing values in one data source (HES or NPD) to complete missing values in the other prior to imputation, for example, using the HES sex variable to complete missing values of the NPD gender variable (Azur et al., 2011).
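The two-step completion of a baseline covariate described above might look like this for sex/gender. It assumes the HES and NPD values have already been harmonised to a common coding, that later NPD recordings are supplied in chronological order, and that anything still missing is left for multiple imputation; the function name is hypothetical.

```python
from typing import Optional, Sequence


def _present(value: Optional[str]) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and value.strip() != ""


def complete_baseline_sex(baseline_npd: Optional[str],
                          later_npd: Sequence[Optional[str]],
                          hes_sex: Optional[str]) -> Optional[str]:
    """Fill a missing baseline value in the order described in the protocol.

    1. keep the baseline NPD value if recorded;
    2. otherwise use the earliest later NPD recording;
    3. otherwise fall back on the (harmonised) HES value.
    Returns None if still missing, leaving it for multiple imputation.
    """
    if _present(baseline_npd):
        return baseline_npd
    for value in later_npd:  # assumed chronological order
        if _present(value):
            return value
    if _present(hes_sex):
        return hes_sex
    return None


print(complete_baseline_sex(None, ["", "M"], "F"))
```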
Exploratory analysis. We will first examine counts in the ECHILD data to assess feasibility, including the size of the gestational age subgroups and the distribution of variables, including our exposure (SEN provision) and confounders (Table 1). This will include assessing the feasibility of including children attending alternative provision (including pupil referral units) in our eligibility criteria and follow-up; these groups are expected to be small and, hence, their inclusion may violate the positivity assumption.
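As a first feasibility check on positivity, cross-tabulating gestational age group by SEN provision category and flagging sparse cells can be sketched as below. The threshold of 10 pupils is a hypothetical default, not a value taken from the protocol or from the Secure Research Service's disclosure rules, and the function name is illustrative.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def sparse_cells(rows: Iterable[Tuple[str, str]],
                 groups: Sequence[str],
                 categories: Sequence[str],
                 min_count: int = 10) -> List[Tuple[str, str, int]]:
    """List (group, category, count) cells with fewer than min_count pupils.

    rows holds one (gestational age group, SEN provision category) pair per pupil;
    cells with no pupils at all are reported with a count of 0.
    """
    counts = Counter(rows)  # missing keys count as 0
    return [(g, c, counts[(g, c)])
            for g in groups for c in categories
            if counts[(g, c)] < min_count]


pupils = [("full term", "none")] * 12 + [("full term", "special school")] * 3
print(sparse_cells(pupils, ["full term"], ["none", "special school", "EHCP in mainstream"]))
```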
To understand whether there are violations of the positivity assumption (i.e., whether pupils recorded as requiring different categories of SEN provision are comparable), we will calculate and compare the propensity score distributions for each SEN provision category within each gestational age group. We will compare the density distributions for each pair of groups (Rassen et al., 2013), for example, none versus SEN support in mainstream school, SEN support versus EHCP in mainstream school, none versus EHCP in mainstream school, and so on. Propensity scores for each SEN provision category will be estimated using logistic regression; to assess their robustness, binary machine learning predictors of each SEN provision category, such as tree-based algorithms, will also be fitted and the resulting propensity scores compared with those obtained from logistic regression (Lee et al., 2009).
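One simple way to summarise overlap for a pairwise comparison is the min-max rule sketched below: the common support is the range of propensity scores shared by both groups, and the share of each group outside it signals possible positivity problems. This is one of several possible diagnostics, not necessarily the one the authors will use; it assumes the scores for both groups come from the same model for the pair (for example, the probability of SEN support rather than none), and the function names and scores are hypothetical.

```python
from typing import Sequence, Tuple


def common_support(ps_a: Sequence[float], ps_b: Sequence[float]) -> Tuple[float, float]:
    """Overlap region of two propensity score distributions (min-max rule)."""
    lower = max(min(ps_a), min(ps_b))
    upper = min(max(ps_a), max(ps_b))
    if lower > upper:
        raise ValueError("no overlap between the two groups")
    return lower, upper


def share_outside(ps: Sequence[float], bounds: Tuple[float, float]) -> float:
    """Proportion of a group's scores falling outside the common support."""
    lower, upper = bounds
    return sum(1 for p in ps if p < lower or p > upper) / len(ps)


ps_none = [0.10, 0.20, 0.50]     # hypothetical scores, pupils with no SEN provision
ps_support = [0.30, 0.60, 0.90]  # hypothetical scores, pupils with SEN support
bounds = common_support(ps_none, ps_support)
print(bounds, share_outside(ps_none, bounds), share_outside(ps_support, bounds))
```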
Causal inference. Our causal analyses will be conducted for pairs of interventions where the causal assumptions of non-interference, consistency, positivity, and conditional exchangeability are assumed to hold (Hernán, 2012) (see Table 3). For health outcomes and school absences (which are count data) and educational outcomes (which are continuous variables), we aim to triangulate results from three groups of methods: methods traditionally used in epidemiology, methods that rely on the no-unmeasured confounders assumption and, if possible, methods that exploit instrumental variables or difference in difference methods.
Our first group of methods will implement the naïve and adjusted estimators traditionally used in epidemiology, using generalised linear models: Poisson models with a log link (with the logarithm of follow-up time as an offset) for individual counts of health outcomes and absences, and linear models (identity link) for individual educational scores (Arnold et al., 2021). The second group of methods relies on the no-unmeasured-confounding assumption and extends traditional epidemiological methods by marginalising results over the population, using approaches such as the parametric g-formula, inverse probability weighting and inverse probability weighting with regression adjustment (Smith et al., 2022). For these methods, inference will be based on bootstrapping. For both health and educational outcomes, we will calculate and compare the following causal contrasts: the observational analogue of the ITT, the overall average treatment effect, the average treatment effect in the treated and the average treatment effect in the not treated (see Table 4 for definitions).
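To illustrate how standardisation (g-formula) and inverse probability weighting marginalise over the population, and how the ATE, ATT and ATU differ only in the population they standardise to, here is a minimal Python sketch for one binary pairwise comparison (for example, SEN support versus none). It replaces the parametric models with stratum-specific (saturated) estimates for a single discrete confounder, so both methods give identical answers; the protocol's regression-based models, regression adjustment and bootstrap inference are not shown, and the function names and data are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # (confounder stratum, treatment in {0, 1}, outcome)


def _strata(rows: List[Row]):
    counts = defaultdict(lambda: [0, 0])    # stratum -> [n untreated, n treated]
    sums = defaultdict(lambda: [0.0, 0.0])  # stratum -> [sum Y untreated, sum Y treated]
    for s, a, y in rows:
        counts[s][a] += 1
        sums[s][a] += y
    for s, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"positivity violated in stratum {s!r}")
    return counts, sums


def g_formula(rows: List[Row]) -> Dict[str, float]:
    """Standardise stratum-specific mean differences to three target populations."""
    counts, sums = _strata(rows)
    effect = {s: sums[s][1] / counts[s][1] - sums[s][0] / counts[s][0] for s in counts}
    n_all = sum(n0 + n1 for n0, n1 in counts.values())
    n_treated = sum(n1 for _, n1 in counts.values())
    n_untreated = n_all - n_treated
    return {
        "ATE": sum((counts[s][0] + counts[s][1]) * effect[s] for s in counts) / n_all,
        "ATT": sum(counts[s][1] * effect[s] for s in counts) / n_treated,
        "ATU": sum(counts[s][0] * effect[s] for s in counts) / n_untreated,
    }


def ipw(rows: List[Row]) -> Dict[str, float]:
    """Normalised (Hajek) IPW with a saturated propensity score model."""
    counts, _ = _strata(rows)
    ps = {s: n1 / (n0 + n1) for s, (n0, n1) in counts.items()}

    def contrast(w_treated, w_untreated) -> float:
        num = [0.0, 0.0]
        den = [0.0, 0.0]
        for s, a, y in rows:
            w = w_treated(ps[s]) if a == 1 else w_untreated(ps[s])
            num[a] += w * y
            den[a] += w
        return num[1] / den[1] - num[0] / den[0]

    return {
        "ATE": contrast(lambda e: 1 / e, lambda e: 1 / (1 - e)),
        "ATT": contrast(lambda e: 1.0, lambda e: e / (1 - e)),
        "ATU": contrast(lambda e: (1 - e) / e, lambda e: 1.0),
    }


# Toy data: stratum "high" is more likely to be treated and has higher outcomes
data = [("low", 0, 1.0), ("low", 0, 3.0), ("low", 1, 4.0),
        ("high", 0, 10.0), ("high", 1, 11.0), ("high", 1, 13.0), ("high", 1, 15.0)]
print(g_formula(data))
print(ipw(data))
```

On these toy data the naïve treated-minus-untreated difference (about 6.08) is much larger than the standardised contrasts (ATE 18/7 ≈ 2.57, ATT 2.75, ATU 7/3 ≈ 2.33), because the "high" stratum is both more likely to be treated and has higher outcomes.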
The third group comprises instrumental variable and difference-in-difference methods, which are only suitable if instruments for SEN provision can be identified, for example, policy changes in provision implemented at different times across local authorities (Greenland, 2018). Under the assumption of homogeneous individual effects, instrumental variable methods would estimate the observational analogue of the ITT. Difference-in-difference methods, in turn, estimate group differences against predicted trajectories between groups with different recorded SEN provision, and hence estimate the ATT (Richardson et al., 2023). See Table 4 for the research questions these causal contrasts address.
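For orientation, the textbook forms of the two estimators can be written as follows, assuming a binary instrument $Z$ (for example, a hypothetical indicator that a child's local authority had implemented a policy change), binary recorded SEN provision $A$ and outcome $Y$. In the difference-in-difference expression, subscript 1 denotes the group with the recorded SEN provision of interest, 0 the comparison group, and pre/post the periods before and after the change. These are generic formulas, not the study's final specification; the first additionally requires the usual instrument conditions (relevance, exclusion restriction) and the second requires parallel trends.

```latex
\hat{\beta}_{\mathrm{IV}}
  = \frac{\hat{E}[Y \mid Z = 1] - \hat{E}[Y \mid Z = 0]}
         {\hat{E}[A \mid Z = 1] - \hat{E}[A \mid Z = 0]}
\qquad
\widehat{\mathrm{ATT}}_{\mathrm{DiD}}
  = \left(\bar{Y}_{1,\mathrm{post}} - \bar{Y}_{1,\mathrm{pre}}\right)
  - \left(\bar{Y}_{0,\mathrm{post}} - \bar{Y}_{0,\mathrm{pre}}\right)
```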
Missing data. To deal with missing covariate values (there are no missing exposure data by design), we will use imputation using chained equations (ICE) within the bootstrap-based estimation of confidence intervals for point estimates, imputing missing values within each bootstrap replicate (bootstrap imputation) (Schomaker & Heumann, 2018). All variables, including the exposure and the outcome, and any other variables assumed to be informative of the missing values, will be used to predict missing data (Azur et al., 2011).
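The bootstrap-then-impute ordering described above can be sketched as follows: each replicate resamples individuals with replacement, imputes within the replicate and then estimates, and the interval comes from percentiles of the replicate estimates. The hot-deck draw is only a placeholder standing in for imputation by chained equations, and the function names, number of replicates and seed are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Value = Optional[float]


def hot_deck_impute(sample: Sequence[Value], rng: random.Random) -> List[float]:
    """Placeholder for ICE: replace each missing value with a random observed value."""
    observed = [x for x in sample if x is not None]
    return [x if x is not None else rng.choice(observed) for x in sample]


def bootstrap_then_impute(data: Sequence[Value],
                          impute: Callable[[Sequence[Value], random.Random], List[float]],
                          estimate: Callable[[List[float]], float],
                          n_boot: int = 200,
                          alpha: float = 0.05,
                          seed: int = 1) -> Tuple[float, float]:
    """Simple percentile interval with imputation inside each bootstrap replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # 1. resample individuals with replacement (missing values included)
        replicate = [rng.choice(data) for _ in range(len(data))]
        # 2. impute within the replicate; 3. estimate on the completed data
        estimates.append(estimate(impute(replicate, rng)))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


scores = [1.0, 2.0, None, 4.0, 5.0, None, 3.0] * 10  # hypothetical covariate values
print(bootstrap_then_impute(scores, hot_deck_impute, mean))
```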
Sensitivity analyses. We aim to conduct a series of sensitivity analyses to assess the robustness of our results. First, we will move our assignment of recorded SEN provision from year one to year two to account for the administrative time it takes for parents/carers to apply for SEN provision. Second, one of our criteria is that pupils must have data on their EYFSP school readiness assessment, as this is a major confounder; this may restrict our population to those able to take the assessment. To account for this non-participation, we will use a missingness indicator to capture the information carried by a missing assessment and avoid excluding those without a record (Groenwold et al., 2012). Third, we suspect there may be missing outcome data, particularly for key stage two scores, based on prior knowledge of systematic teacher strikes; in such cases, we will impute these key stage two outcomes, including year of testing in the imputation model. Finally, we propose analysing the correlation between recorded child sex (reported by a physician in HES) and gender (submitted by parents/carers during school registration in the NPD). To assess the validity of our models, we will tabulate how using either variable affects the point estimates for the intervention variable only.
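The missingness indicator approach for the EYFSP covariate amounts to the transformation sketched below: missing values are replaced by a constant and flagged, so that pupils without a record stay in the analysis and both columns can enter the adjustment models. The function name and the fill value of 0 are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def with_missing_indicator(values: List[Optional[float]],
                           fill: float = 0.0) -> List[Tuple[float, int]]:
    """Return (value, missing indicator) pairs for one covariate.

    Missing values are set to `fill` and flagged with indicator 1; observed
    values are kept and flagged with indicator 0.
    """
    return [(fill, 1) if v is None else (v, 0) for v in values]


eyfsp_z = [0.4, None, -1.2]  # hypothetical EYFSP z-scores, one missing
print(with_missing_indicator(eyfsp_z))
```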
Permission to use de-identified, linked data from Hospital Episode Statistics and the National Pupil Database was granted by the DfE (DR200604.02B) and NHS Digital (DARS-NIC-381972). Consent from patients is not required for HES, as the data provided by NHS Digital are pseudo-anonymised, which reduces identifiability to researchers; further information on opting out of the secondary use of Hospital Episode Statistics can be found here. Ethical approval for the ECHILD project was granted by the National Research Ethics Service (17/LO/1494), NHS Health Research Authority Research Ethics Committee (20/EE/0180) and UCL Great Ormond Street Institute of Child Health’s Joint Research and Development Office (20PE06).
We gratefully acknowledge all children and families whose de-identified data are used in this research. We would like to acknowledge the contribution of the wider HOPE study team to this work: Sarah Barnes, Kate Boddy, Kristine Black-Hawkins, Lorraine Dearden, Tamsin Ford, Katie Harron, Lucy Karwatowska, Matthew Lilliman, Stuart Logan, Jacob Matthews, Jugnoo Rahi, Jennifer Saxton, Isaac Winterburn and Ania Zylbersztejn. We thank Ruth Blackburn, Matthew Jay, Farzan Ramzan, and Antony Stone for ECHILD database support.
Is the rationale for, and objectives of, the study clearly described?
Yes
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Not applicable
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biostatistics with a focus on neonatology, cardiology, and educational outcomes analyses.
Is the rationale for, and objectives of, the study clearly described?
Yes
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Not applicable
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Maternal and child health, epidemiology, adolescent health, nutrition
Is the rationale for, and objectives of, the study clearly described?
Partly
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Perinatal epidemiologist
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | 1 | 2 | 3 |
---|---|---|---|
Version 1 (21 Nov 23) | read | read | read |