Early special educational needs provision and its impact on unplanned hospital utilisation and school absences in children with isolated cleft lip and/or palate: a demonstration target trial emulation study protocol using ECHILD

Background Special educational needs (SEN) provision is designed to help pupils with additional educational, behavioural or health needs; for example, pupils with cleft lip and/or palate may be offered SEN provision to improve their speech and language skills. Our aim is to contribute to the literature and assess the impact of SEN provision on health and educational outcomes for a well-defined population. Methods We will use the ECHILD database, which links educational and health records across England. Our target population consists of children identified within ECHILD to have a specific congenital anomaly: isolated cleft lip and/or palate. We will apply a trial emulation framework to reduce biases in design and analysis of observational data to investigate the causal impact of SEN provision (including none) by the start of compulsory education (Year One – age five years on entry) on unplanned hospital utilisation and school absences by the end of primary education (Year Six – age ten/eleven). We will use propensity score-based estimators (inverse probability weighting (IPW) and IPW regression adjustment) to compare categories of SEN provision in terms of these outcomes and to triangulate results obtained using complementary estimation methods (naïve estimator, multivariable regression, parametric g-formula, and if possible, instrumental variables), targeting a variety of causal contrasts (average treatment effect/in the treated/in the not treated) of SEN provision. Conclusions This study will evaluate the impact of reasonable adjustments at the start of compulsory education on health and educational outcomes in the isolated cleft lip and palate population by triangulating complementary methods under a target-trial framework.


Introduction
Special educational needs (SEN) provision offers reasonable adjustments for children and young people (CYP) in an educational environment who need additional health, academic, or behavioural support. This includes children with complex health requirements or learning difficulties. SEN provision offers support to those with needs using a variety of facilities. SEN provision in the English educational setting is divided into two main categories: SEN Support (previously known as Action, Action Plus or non-Statemented SEN) and Education and Health Care Plan (EHCP, previously known as a statement of SEN) (Timpson & Department for Education, 2014). SEN Support is organised by the educational environment (e.g., school or college) and provides children and young people in need of SEN provision with support that may include teaching assistants who aid in communications, specialised adapted learning programmes and support for physical needs. An EHCP is funded by local authorities for children and young people who require further adjustments and often require additional health-specific resources (compared to SEN Support) to aid in education, health, and social care needs. Due to the funding and organisational streams of SEN provision, allocation of SEN provision varies over time and location, and can be affected by changes in legislation, school governance structure and local authority (Liu et al., 2020).
Currently, there is limited research on the impact of SEN provision on academic and healthcare outcomes in populations who have a need for SEN provision. To estimate the causal effect of SEN provision on outcomes, randomised controlled trials (RCTs) would have to be conducted; however, such study designs are not always feasible due to the associated human, time, financial and ethical costs. As SEN provision is universally available in primary education environments (schools teaching pupils aged five to eleven years) in England, conducting an RCT would be unfeasible and possibly unethical for certain groups of children. In lieu of RCTs, observational studies provide a pragmatic, data-driven alternative when trials are not possible. One major challenge with using observational data when compared to data collected from RCTs is the risk of confounding, particularly confounding by indication, whereby assignment to treatment is not random and is often related to the severity of a medical condition (Salas et al., 1999). However, attentive study design can mitigate such biases in observational data by emulating the protocol of an equivalent RCT (Hernán & Robins, 2016).
An observational dataset that can be used to estimate the causal effect of recorded SEN provision on healthcare outcomes is the ECHILD dataset (Mc Grath-Lone et al., 2022). The ECHILD dataset is the first dataset in England of linked academic data (National Pupil Database, NPD) with secondary care hospital data (Hospital Episode Statistics, HES) for all pupils educated in state-funded schools, and has been used to investigate the associations between health, education and social care (Mc Grath-Lone et al., 2022). Therefore, the ECHILD dataset provides the opportunity to conduct observational studies with long-term follow-up, with current follow-up being up to age 25 years (from birth in 1995 until hospitalisations in 2020). With data on clinical diagnoses, social care contacts, hospital visits, academic attainment, school absences and SEN provision in school, the ECHILD dataset enables adjustment for different confounders and a focus on specific populations at risk, identifiable from the linked hospital data, for example using phenotypes such as children with major congenital anomalies (e.g., Down Syndrome), cerebral palsy, developmental disorder of scholastic skills, epilepsy, diabetes, and premature birth. Education data in ECHILD include provision of support for SEN, free school meal status and measures of socioeconomic deprivation, as well as national examination results (at multiple key stages) and absence and exclusions from school, while health-related outcomes such as (specific types of) hospitalisations are also available via linkage to hospital records.
In this study protocol, we describe how we aim to use the ECHILD dataset to design a study that appropriately emulates an RCT to address the causal question of the impact of receiving alternative categories of SEN provision (including none) on unplanned hospital utilisation and school absences in children who were born with cleft lip and/or palate (CLP). CLP is identified in 900 newborns in England each year and affects communication (hearing and speech), dental health (Gallagher & Collett, 2019) and psychosocial health. CLP is associated with reduced academic attainment (Fitzsimons et al., 2018) and has been linked to a three-fold increase in hospitalisations when compared to those without CLP for all ages (Bell et al., 2016). Previous observational evidence has suggested that extra support at the beginning of compulsory education may benefit academic outcomes of children with CLP (Fitzsimons et al., 2018). Whilst previous literature hypothesised the impact of SEN provision on academic outcomes (Fitzsimons et al., 2018), there is limited to no literature assessing the impact of SEN provision on unplanned hospital utilisation and school absences.

Ethics and dissemination
Research ethics committees have approved the use of the ECHILD database; access to the ECHILD database is approved by the ECHILD team, who are contactable at ich.echild@ucl.ac.uk for proposals for projects using ECHILD. Stakeholder groups consisting of young people, parents and service providers will help us frame the research question, interpret our findings, and communicate them to policy makers, health and education services and families to promote translation of our findings into practice.

Stakeholder involvement
Prior to developing this protocol, two independent meetings were conducted with stakeholders (parents, pupils, teachers) to understand which medical conditions are of interest and which entry timepoints are important for child development. The first meeting was with the Department for Education's national young SEN advisory group (FLARE) on 18 September 2021 and the second with the Young Persons Advisory Group for research at Great Ormond Street Hospital on 27 November 2021. This engagement identified school entry as a key milestone at which SEN provision is required. Therefore, in the proposed study, we have used school start as our entry point and will generate further target trials based upon further patient engagement.

Study design and setting
The study will be an observational study based on data from the ECHILD dataset previously described by Mc Grath-Lone et al. (2022), which includes individuals born between 1 September 1995 and 31 August 2020 in England. To reduce confounding by indication and other forms of bias when using observational data, we will use a target trial framework to guide eligibility, entry, and an appropriate follow-up period (Hernán et al., 2022). Analyses will be conducted in the Office for National Statistics Secure Research Service using Stata ver. 17 (proprietary, StataCorp) and R ver. 4.0.2 (open source, R Foundation) and the code for the study will be made available via GitHub.

Dataset and linkage
The data source we propose is the ECHILD database, a pseudo-anonymised dataset that links the National Pupil Database (NPD) with Hospital Episode Statistics (HES). In brief, ECHILD's extract of the NPD contains data from academic terms (Summer, Autumn, and Spring) between 2006 and 2020 and contains information on (but not limited to) school, local authority, year/month of birth, gender, ethnicity, first language, socioeconomic status, free school meal status, absence-related data, social care/children in need related data and SEN provision. ECHILD's extract of HES contains details on accident and emergency attendance, admitted patient care, critical care, and outpatient appointments between 1997 and 2021. It contains details on birth admissions, sex recorded by physician, ethnicity, and clinical information recorded during hospital admissions, including details of diagnoses and operations. HES covers 99% of public hospital activity in England (Herbert et al., 2017). HES records since 1998 are also linked to ONS Mortality data covering information on causes and timing of deaths. The linkage coverage periods are described in Mc Grath-Lone et al. (2022). ECHILD has been shown to have a linkage rate between NPD and HES of 95%; the high linkage rate is attributable to a two-stage linkage process (Libuy et al., 2021).
The full methodology for creating the ECHILD dataset is described by Mc Grath-Lone et al. (2022). Enquiries about access to the ECHILD dataset can be made by contacting ich.echild@ucl.ac.uk; all researchers accessing the ECHILD dataset will need to be Office for National Statistics (ONS) accredited researchers. Access to the ECHILD dataset will be made through the ONS Secure Research Service, a trusted research environment which requires the researcher's institution to have Assured Organizational Connectivity.

Population
Our population consists of children with isolated CLP recorded in hospital records and followed from Year One of school (the first full year of compulsory education, where pupils are five years old on the first day) between academic years 2008/2009 and 2018/2019 (i.e., born between 1 September 2003 and 31 August 2013). This period was chosen as it contains information on school readiness tests (known as the Early Years Foundation Stage Profile, a known good predictor of SEN provision) and 2018/19 was the last entry-into-school academic year prior to the COVID-19 pandemic, during which access to hospitals and provision of education changed substantially. To identify pupils who started Year One between 2008/2009 and 2018/2019, the earliest recording of "1" from the "NCActualYear" (National Curriculum Actual Year) variable in the NPD dataset will be used; for children whose "NCActualYear" variable is marked as empty or X, we will use "AgePartAtStartOfAcademicYear" equal to 5.
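The Year One entry rule described above can be sketched as follows. This is a minimal illustration only: the tuple layout, the function names and the use of the academic year's starting calendar year are assumptions made for the example, and we assume that a recorded "1" takes precedence over the age-based fallback for the same pupil.

```python
def first_year_one(records):
    """Return {pupil_id: earliest academic year (start year) spent in Year One}.

    records: iterable of hypothetical census rows
        (pupil_id, academic_year_start, NCActualYear, AgePartAtStartOfAcademicYear)
    """
    by_nc_year = {}   # earliest year with NCActualYear == "1"
    by_age = {}       # earliest year with NCActualYear empty/"X" and age 5
    for pupil_id, year, nc_year, age in records:
        nc = (nc_year or "").strip().upper()
        if nc == "1":
            by_nc_year[pupil_id] = min(year, by_nc_year.get(pupil_id, year))
        elif nc in ("", "X") and age == 5:
            by_age[pupil_id] = min(year, by_age.get(pupil_id, year))
    entry = dict(by_age)
    entry.update(by_nc_year)  # assumption: a recorded "1" overrides the fallback
    return entry


def eligible(entry, first=2008, last=2018):
    """Keep pupils entering Year One in academic years 2008/2009 to 2018/2019."""
    return {p: y for p, y in entry.items() if first <= y <= last}


records = [
    ("a", 2009, "1", 5), ("a", 2010, "2", 6),
    ("b", 2012, "X", 5), ("b", 2013, "2", 6),
    ("c", 2007, "1", 5),                       # entered before the study window
    ("d", 2011, "X", 5), ("d", 2012, "1", 6),  # recorded "1" takes precedence
]
cohort = eligible(first_year_one(records))
```

Whether the fallback should be applied per census record or per pupil is a design choice the protocol does not spell out; the sketch applies it per pupil.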
To identify pupils with isolated CLP, International Classification of Diseases version 10 (ICD-10) codes will be applied to primary and secondary HES diagnoses in any hospital admission prior to the start of compulsory education using the following codes: Q35*, Q36* and Q37*. For each pupil, the earliest recorded date in HES will be considered the "diagnosis" date. Pupils whose first recording of CLP in HES is after their first year in school will not be included, as diagnosis needs to precede SEN allocation to avoid reverse causality. Children born with further major congenital anomalies will be identified using the EUROCAT code list ('EUROCAT Guide 1.5 Chapter 3.3', 2023) and excluded using the ICD-10 codes listed in Table 1 to reduce competing needs for SEN provision. The EUROCAT code list was used as it captures major congenital anomalies and not minor anomalies.
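As an illustration of the case definition above, the sketch below flags CLP codes by prefix and takes the earliest matching admission date as the "diagnosis" date. The episode layout and function names are hypothetical, the comparison against a single school start date is a simplifying assumption, and the exclusion of other major congenital anomalies (Table 1) is not shown.

```python
from datetime import date

# ICD-10 prefixes for cleft palate, cleft lip, and cleft palate with cleft lip
CLP_PREFIXES = ("Q35", "Q36", "Q37")


def normalise(code):
    """Strip dots and whitespace so 'q37.1' and 'Q371' compare alike."""
    return code.replace(".", "").strip().upper()


def is_clp(code):
    return normalise(code).startswith(CLP_PREFIXES)


def earliest_clp_date(episodes):
    """episodes: iterable of (admission_date, [diagnosis codes]).

    Returns the earliest admission date carrying a CLP code, or None.
    """
    dates = [d for d, codes in episodes if any(is_clp(c) for c in codes)]
    return min(dates) if dates else None


def diagnosed_before_school(episodes, school_start):
    """True if a CLP code was recorded before the (assumed) school start date."""
    first = earliest_clp_date(episodes)
    return first is not None and first < school_start


episodes = [
    (date(2005, 3, 1), ["J21.9"]),
    (date(2004, 6, 10), ["q37.1", "H65"]),
    (date(2006, 1, 5), ["Q35.9"]),
]
```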
We will also restrict our analyses to those born in an NHS funded English hospital due to the importance of birth characteristics such as maternal age, birth weight and gestational age.Additionally, as congenital anomalies are disproportionately recorded in those born in hospital, we have further reasons to restrict our population to those with a birth record in HES to avoid misclassification of congenital anomalies.

Intervention
The intervention will be defined by the categories of recorded SEN provision (including none) in Year One of school (ages five or six). Whilst SEN provision can change throughout a CYP's educational journey, our implementation of trial emulation focusses on an intention-to-treat (ITT) analysis with the intervention defined at the start of compulsory education. Therefore, we are analysing the initial assignment of treatment and not whether treatment was adhered to or provided. We choose the start of compulsory education as we believe this is a population in need of SEN provision at the start of their educational journey based upon prior evidence of educational (Fitzsimons et al., 2018) and healthcare needs.
To capture differences in type of SEN provision due to severity of CLP, along with other confounders (see Covariates section), we aim to classify our exposure variable as "categories of SEN provision" (None, SEN, EHCP) as opposed to a binary exposure (i.e., SEN vs no SEN). To establish SEN status at Year One, we will use the January (Spring) census in Year One of school, as funding is calculated using these censuses. See Table 2 for a list of variables describing SEN in the NPD. See the Statistical Analysis section for more detail on analysing comparison groups.
Figure 1 shows our planned Consolidated Standards of Reporting Trials (CONSORT) flow diagram for identifying the population and classifying it according to the exposure variables (categories of recorded SEN provision).

Follow-up
The study population will be followed up from the initial January census in Year One (to account for time to apply for SEN) until the end of primary school (end of July of Year Six), loss to follow-up, or end of study, whichever occurs first. Children will be considered lost to follow-up if they no longer appear in any NPD school census during primary education; this could be due to a variety of reasons, including transfer to a non-state-funded school, emigration, death, or off-rolling (Jay et al., 2022).
Although ECHILD can be used to follow up children beyond the end of primary school for some academic cohorts, we will limit our follow-up period in this protocol to Year 6 for two reasons: firstly, entering secondary (or middle) education will re-evaluate the need for SEN, and many pupils may no longer be offered SEN provision, and secondly, the time between the assignment of SEN considered here and the outcome may be too long if beyond Year Six, with the outcome affected by many intermediate factors (pupils starting Year One in 2008, would have 13 years of follow-up in HES).

Outcome variables
The outcomes of interest include both unplanned hospital usage and school absences due to the health needs for SEN provision and the intervention being delivered in an educational setting.
Unplanned hospital usage will be measured in days between January of Year One (recorded allocation of SEN provision) and end of follow-up. To identify unplanned hospital usage in HES Admitted Patient Care, we will use the "admission method" variable of the first episode of each admission in HES (admimeth) (see Table 3 for the case definition). For hospital utilisation that did not require an overnight admission, we will use the HES Accident and Emergency dataset to account for non-admitted unplanned hospital utilisation (Harron et al., 2018). We aim to combine the "Admitted Patient Care" and "Accident and Emergency" datasets to create a timeline of unplanned hospitalisation between the January census in Year One and the end of Year Six. When an unplanned admission and an accident and emergency attendance are recorded on the same day, we will count this as one day only, for example when the pupil initially presents at accident and emergency and is then admitted on the same day.
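The combination of the two datasets into a single count of unplanned hospital days can be sketched as below. The input layout and function names are hypothetical, and counting every calendar day of an unplanned admission (rather than, say, only the admission date) is an assumption made for the example. Using a set of dates means a day that appears in both sources is counted once.

```python
from datetime import date, timedelta


def days_in_spell(start, end):
    """All calendar days from admission to discharge, inclusive."""
    return {start + timedelta(days=i) for i in range((end - start).days + 1)}


def unplanned_days(apc_spells, ae_dates, window_start, window_end):
    """Count distinct days with unplanned hospital contact inside the window.

    apc_spells: iterable of (admission_date, discharge_date) for unplanned admissions
    ae_dates:   iterable of accident and emergency attendance dates
    """
    days = set(ae_dates)
    for start, end in apc_spells:
        days |= days_in_spell(start, end)
    return sum(1 for d in days if window_start <= d <= window_end)


apc = [(date(2012, 3, 1), date(2012, 3, 3))]
ae = [date(2012, 3, 1), date(2012, 5, 9), date(2010, 1, 1)]
n_days = unplanned_days(apc, ae, date(2011, 1, 20), date(2016, 7, 31))
```

In this example the attendance on 1 March 2012 coincides with the admission and is not double counted, and the 2010 attendance falls outside the follow-up window.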
Absences in the NPD are recorded termly as sessions, corresponding to half-days in school; the total number of potential sessions in a term is also provided. School absences will be measured as sessions between January of Year One and the end of follow-up (maximum: end of Year Six). As the population of interest is based upon health needs, our school absences will include those related to medical need. See Table 4 for the case definition of medically related absences during school. In addition to medically related absences, we will separately evaluate the impact of SEN on unauthorised absences; such evaluations are needed as absenteeism is related to poor academic performance (Allen et al., 2018).
We are not planning to study academic performance by the end of primary education as an additional outcome. This is motivated by the limited number of children in our population who would have recorded Key Stage Two academic outcomes (only those born between 2003 and 2008). With an estimated 900 CLP births per year (CRANE, 2021), of whom an estimated 62% are expected to have isolated CLP (Fitzsimons et al., 2023), we would have very small groups of children receiving (some of) the interventions. We will re-evaluate the suitability of evaluating the impact of SEN provision on academic outcomes and include this in our analyses according to the numbers with Key Stage Two data.

Covariates
To account for non-random SEN provision assignment, we will use information on several covariates that are known or suspected to be associated with SEN provision and hospital contacts based upon prior literature. These include CLP-specific influences, and further clinical/birth, socio-demographic, geographical, and educational influences. See Table 5 for a list of potential confounders from relevant literature. The distribution of these potential confounders by exposure status will be examined (see Table 5 for an outline) and directed acyclic graphs representing the assumed relationships among these variables, SEN exposure, and the outcome of interest will be drawn to identify the variables that will be controlled for to estimate the causal effect of the interventions, using the open-source software DAGitty ver. 3.0. Specifically, these covariates, measured at birth or before or at the start of Year One, include: cleft severity (based upon prior literature (Fitzsimons et al., 2018); see Table 1 for ICD-10 codes to differentiate cleft severity), comorbidities (categories based upon prior literature (Hardelid et al., 2014)), gestational age, maternal age, prior hospital contact (unplanned and outpatient contacts), gender, ethnicity (latest recorded in NPD to reduce missingness), English as a first language, Income Deprivation Affecting Children Index (IDACI) quintile, free school meal eligibility, month of birth, academic cohort (to account for changes in policy over time) and standardised school readiness assessments (Early Years Foundation Stage Profile). Additional school-level variables we aim to include are the proportion of children in the school the child attends in Year One who were recorded as receiving SEN Support/EHCP in the previous academic year, and the current pupil-teacher ratio.

Biases
To reduce confounding and other sources of bias affecting observational data, we will adopt a Target Trial Emulation (TTE) framework (Hernán et al., 2022). TTE enables observational data to be mapped to a hypothetical target experimental trial counterpart by creating the specification of an ideal (pragmatic) trial and using this as a basis to shape the observational study design. TTE consists of three components: first, defining the specifications of a hypothetical target experimental trial of the causal question of interest (including the corresponding effect); second, emulating the specifications of the ideal target trial using observational data; and third, estimating the effects of interest using the emulated trial data. The first component of TTE involves defining inclusion/exclusion criteria for entry, a treatment strategy (including time of treatment assignment), follow-up frequency and modality, outcome measures, causal contrasts of interest (estimands) and estimation methods. The second component of TTE involves handling the observational data to emulate the structure of the data that would be gathered in the specified target trial. Finally, the third component of TTE concerns dealing with the inevitable confounding that affects observational data and explicitly outlining the analytical methodology ahead of the data wrangling. In Table 6 we describe the (ideal) target trial one would design to investigate the causal effect of SEN provision in the first year of compulsory education on the selected health and educational outcomes in children with CLP, and the equivalent emulated trial to be generated from ECHILD.
Table 6. Target trial and emulated trial specifications.

Outcome(s)
Target trial: Unplanned hospital utilisation as defined by days in AE or APC. Medical related absences as defined using half-day sessions. Unauthorised absences as defined using half-day sessions.
Emulated trial: Unplanned hospital utilisation as defined by days in AE or APC. Medical related absences as defined using half-day sessions. Unauthorised absences as defined using half-day sessions.

Interventions to be compared
Target trial: One of three categories of SEN (none, SEN, EHCP) to be delivered following randomization (between start of reception and end of Year One).
Emulated trial: One of three categories of SEN (none, SEN, EHCP) as recorded by the January census in Year One.

Causal contrasts
Target trial: The average treatment effect of initiating SEN versus not initiating SEN at all by Year One on the number of unplanned hospital days expressed as a rate ratio. The average treatment effect of initiating EHCP versus initiating SEN by Year One on the number of unplanned hospital days expressed as a rate ratio.
Emulated trial: The average treatment effect of recording SEN versus not initiating SEN at all by Year One on the number of unplanned hospital days expressed as a rate ratio. The average treatment effect of recording EHCP versus recording SEN by Year One on the number of unplanned hospital days expressed as a rate ratio. These estimands will be defined for the whole population and also for the sub-populations of "treated" and "untreated" children, that is, the children who were (or were not) recorded to receive the relevant intervention.

Analysis plan
Target trial: Poisson or Negative Binomial Regression (depending on the degree of overdispersion) of the number of events accounting for duration of follow-up. Clustering by school and/or local authority to be dealt with using either mixed effects models or robust inference (e.g., GEE).
Emulated trial: Appropriate methods for confounding adjustment (such as regression adjustment and standardisation, or propensity score-based methods) involving Poisson or Negative Binomial Regression (depending on the degree of overdispersion) of the number of events accounting for duration of follow-up. Clustering by school and/or local authority to be dealt with using either mixed effects models or robust inference (e.g., GEE).

The estimands we will target are firstly, the average treatment effect (or average causal effect): this is a causal contrast of average potential outcomes for the whole isolated CLP population; secondly, the average treatment effect in the treated: this is a causal contrast restricted to the "treated", i.e., those who received SEN; and finally, the average treatment effect in the not treated: the causal contrast in those who did not receive SEN in Year One (Wang et al., 2017).
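For concreteness, the three estimands can be written for one pairwise comparison of SEN categories. The notation here is ours and is an assumption for illustration: $A$ is the recorded category, $a$ the "treated" level and $a'$ the comparison level (for example, SEN Support versus none), and $Y^{a}$ is the potential outcome that would be observed under category $a$. Since the protocol expresses effects as rate ratios, each contrast is shown as a ratio of expected potential outcomes.

```latex
\begin{align*}
\text{ATE}  &= \frac{E[Y^{a}]}{E[Y^{a'}]}, \\
\text{ATT}  &= \frac{E[Y^{a} \mid A = a]}{E[Y^{a'} \mid A = a]}, \\
\text{ATNT} &= \frac{E[Y^{a} \mid A = a']}{E[Y^{a'} \mid A = a']}.
\end{align*}
```

The three quantities coincide only if the effect of provision does not vary between children who did and did not receive it, which is why they are reported separately.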

Analysis
Explorative analyses. To estimate the representativeness and external validity of the derived cohort, we will compare the following distributions against existing literature: firstly, the rates of CLP children who start Year One between 2008/09 and 2018/19 (CRANE, 2021) and secondly, previously published rates such as school academic attainment (Fitzsimons et al., 2018).
To understand whether pupils who are recorded to have received different categories of SEN provision had the chance to be recorded with another category, and therefore whether the intervention levels are comparable using the available data (i.e., to assess whether the positivity assumption could be invoked when performing causal inference (Zhu et al., 2021)), we will examine the distribution of the propensity scores for the recording of SEN Support/EHCP across the subgroups of children defined by their observed characteristics. The propensity scores will be predicted using logistic regression, with the covariates mentioned above included as predictors. As there are three categories of SEN (None, SEN Support and EHCP), pairwise propensity score comparisons (Rassen et al., 2013) will be evaluated for common support between: None vs SEN Support, None vs EHCP and SEN Support vs EHCP. The robustness of the selected propensity score model will be assessed by triangulating the predicted scores with those derived using machine learning methods (such as Classification and Regression Trees) (Lee Brian et al., 2010).
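One simple way to summarise common support for a pair of categories is sketched below. It assumes the propensity scores have already been estimated and uses the minimum-maximum overlap rule, which is only one of several conventions and is not stated in the protocol; the function names are hypothetical.

```python
def common_support(scores_a, scores_b):
    """Return the (low, high) score interval covered by both groups, or None."""
    low = max(min(scores_a), min(scores_b))
    high = min(max(scores_a), max(scores_b))
    return (low, high) if low <= high else None


def share_within(scores, bounds):
    """Proportion of a group's scores falling inside the overlap interval."""
    if bounds is None:
        return 0.0
    low, high = bounds
    return sum(low <= s <= high for s in scores) / len(scores)


# Hypothetical estimated probabilities of SEN Support (vs none) in each group
none_group = [0.05, 0.2, 0.4, 0.6]
sen_group = [0.3, 0.5, 0.7, 0.9]
bounds = common_support(none_group, sen_group)
```

The same check would be repeated for each of the three pairwise comparisons; in practice, plotting the full score distributions is more informative than the range alone.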
Causal inference. Once exploratory analyses have been completed, causal inference will be conducted for pairs of interventions which have common support. To account for the distribution of our outcomes, that is, the number of unplanned hospital contacts and the number of absence sessions, we will use Poisson (or negative binomial) regression models. To account for differential follow-up time, we will use the logarithm of one of the following as offsets: days between January of Year One and end of follow-up for hospital usage, and total number of possible sessions between January of Year One and end of follow-up for school absences. The likely clustering of pupils within local authority will be addressed either by fitting mixed effects models or by using robust inference (or both).
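The count model with an offset can be written out as follows. The symbols are our own notation for illustration: $Y_i$ is the count for child $i$, $T_i$ the follow-up time (days, or possible sessions), $A_i$ the recorded SEN category with "none" as reference, and $L_i$ the vector of covariates.

```latex
\begin{align*}
\log E[Y_i \mid A_i, L_i] &= \log(T_i) + \beta_0
  + \beta_1 \mathbb{1}\{A_i = \text{SEN Support}\}
  + \beta_2 \mathbb{1}\{A_i = \text{EHCP}\}
  + \gamma^\top L_i, \\
\operatorname{Var}(Y_i \mid A_i, L_i) &= \mu_i + \alpha \mu_i^2,
  \qquad \mu_i = E[Y_i \mid A_i, L_i].
\end{align*}
```

Because $\log(T_i)$ enters with a coefficient fixed at one, $\exp(\beta_1)$ and $\exp(\beta_2)$ are conditional rate ratios per unit of follow-up. Setting $\alpha = 0$ gives the Poisson model, while $\alpha > 0$ gives a negative binomial model that allows for overdispersion.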
We will triangulate complementary estimation methods to address confounding bias due to non-random assignment of SEN provision.We will compare results obtained assuming no-unmeasured confounding (that is, we have data on all the relevant confounders) and assuming instead that we have an instrumental variable (if there is for example variation in SEN provision by local authorities).Our analyses will involve three general approaches.
The first approach will include "traditional" epidemiological methods such as reporting of crude associations ("naïve estimator") between intervention and outcome, and of conditional associations obtained by fitting appropriate regression models (Bender, 2009); due to the conditional nature of this method, these effects do not correspond to our causal contrasts of interest, such as the average effect on the whole population.
The second approach will involve dealing with measured confounding using outcome-based models (Smith et al., 2022), such as the parametric g-formula, inverse probability weighting of marginal structural models, and inverse probability weighting regression adjustment. Confidence intervals for these models will be estimated using bootstrapping (1000 replicates). These estimation methods target our estimands of interest, including the average treatment effect, the average treatment effect in the treated and the average treatment effect in the not treated (see above).
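To illustrate the weighting idea, the sketch below computes an inverse-probability-weighted rate ratio for one pairwise comparison, with a percentile bootstrap interval. It is a simplified illustration under stated assumptions: the row layout and function names are hypothetical, the propensity scores are taken as given, and each bootstrap replicate reuses them, whereas a full analysis would re-estimate the propensity model (and any imputation) within every replicate.

```python
import random


def ipw_rate_ratio(rows):
    """Weighted event rate in the treated divided by that in the untreated.

    rows: iterable of (treated: bool, propensity, events, follow_up)
    Weights are 1/p for treated and 1/(1-p) for untreated (ATE weights).
    """
    events = {True: 0.0, False: 0.0}
    time = {True: 0.0, False: 0.0}
    for treated, ps, n_events, follow_up in rows:
        w = 1.0 / ps if treated else 1.0 / (1.0 - ps)
        events[treated] += w * n_events
        time[treated] += w * follow_up
    return (events[True] / time[True]) / (events[False] / time[False])


def bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling children with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        try:
            estimates.append(ipw_rate_ratio(sample))
        except ZeroDivisionError:
            continue  # replicate had an empty arm or no comparison events
    estimates.sort()
    low = estimates[int((alpha / 2) * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return low, high


# Hypothetical data: (treated, propensity score, events, days of follow-up)
rows = [
    (True, 0.5, 2, 10), (True, 0.5, 4, 10),
    (False, 0.5, 3, 10), (False, 0.5, 9, 10),
]
rate_ratio = ipw_rate_ratio(rows)
interval = bootstrap_ci(rows, n_boot=200)
```

Clustering by school or local authority is ignored here; a cluster bootstrap would resample whole clusters rather than individual children.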
The third set of methods, instrumental variable analysis, will only be possible if suitable instruments for SEN provision are identified (Greenland, 2018), for example if there are policy changes in provision that are implemented at different times across local authorities or changes in school policy for example brought about by governance change (Liu et al., 2020).There is a well-documented change in SEN provision from 2014, which may also allow a difference-in-difference approach to be implemented.

Missing data.
Depending upon the proportion of missingness affecting the data and the mechanisms of missingness, we will first use information across data sources to fill in missing information prior to data imputation; for example, using the variable "Sex" held in HES to complement missing values in the NPD variable "Gender". We will use Imputation using Chained Equations under a missing at random (MAR) assumption to singly impute missing values, as opposed to multiply impute them, because the imputation will be embedded within the bootstrapping conducted to estimate confidence intervals of point estimates (Schomaker & Heumann, 2018). All relevant variables (including interactions and non-linearities) will be used to predict missing data, including the exposure and the outcome (Azur et al., 2011).

Sensitivity analyses.
To account for uncertainty in the recording of observational data that may lead to measurement errors, we aim to conduct sensitivity analyses. First, we will conduct a sensitivity analysis to mitigate against delayed recording of SEN provision, by expanding the exposure window to the first term in Year Two as part of the January census; this analysis will include information collated during Year Six as part of the adjustment set of baseline covariates. Secondly, to understand the driver of unplanned hospitalisation, we will decompose our outcome of unplanned hospitalisation into three categories: firstly, the number of days recorded in Accident and Emergency or in Unplanned Admitted Patient Care; secondly, the number of days recorded in Accident and Emergency; and thirdly, the number of days recorded in Unplanned Admitted Patient Care. Similarly, we will examine absences in subgroups defined by whether they were medically related or unauthorised. Finally, we propose to analyse the differences between using recorded child sex (reported by physician in HES at birth) and gender (submitted by parent/carer during school registration in NPD) and produce tables of point estimates for the intervention variable when using either measure.

Ethics and dissemination
Permissions

Data sharing and access
Aggregate results from the ECHILD dataset will be preprinted, revised as a protocol, and published. De-identified individual record-level data are currently hosted on the Office for National Statistics Secure Research Service's data-sharing service. We are grateful to the Office for National Statistics (ONS) for providing the trusted research environment for the ECHILD Database. This does not imply ONS' acceptance of the validity of the methods used to obtain these figures, or of any analysis of the results.
The ECHILD Database uses data from the Department for Education (DfE). The DfE does not accept responsibility for any inferences or conclusions derived by the authors. This work uses data provided by patients and collected by the National Health Service as part of their care and support. Source data can also be accessed by researchers by applying to NHS Digital.

Conclusions/discussion
This study will contribute towards the understanding of the health and educational impact of Special Educational Needs Provision in a heterogeneous population based upon health needs, specifically those born in England with isolated cleft lip and/or palate. This study will focus on estimating the causal impact of an intervention that can be introduced during a child and young person's educational journey and that may affect their experience of health and education during childhood.
This research protocol plans to use two databases, the National Pupil Database and Hospital Episode Statistics, to assess the potential relationship between special educational needs provision and the academic performance of children with cleft lip and palate. In general, this protocol is written very clearly. However, there are some areas that need further improvement. Firstly, the term "isolated cleft lip and palate" may be a little misleading; I noticed this point was also raised by the second reviewer. At first glance, I thought it referred to the population with unilateral cleft lip and palate, and I think the term "non-syndromic cleft lip and palate" would be better. Secondly, speech disorders have a high incidence in patients with cleft palate. However, the incidence may be low in patients with cleft lip only. So, I think it would be better if the research could further investigate the types of special educational needs provision that patients might have received. In addition, the lower academic performance of patients with cleft lip and palate is caused by various issues, including psychological stress, intelligence, and handedness. These should be considered in the research.
Is the rationale for, and objectives of, the study clearly described? Yes

Is the study design appropriate for the research question? Yes
Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
The population of interest is children born with isolated cleft lip and/or cleft palate, covering all major cleft types but excluding individuals with known syndromes. This is an important population, as children born with cleft lip and/or cleft palate in the UK have lower academic outcomes than their unaffected peers. Reasons for this have been explored in the literature but our understanding is still limited. The paper describes the planned dataset and analyses clearly. However, the justification for the planned analysis is not fully explained. The title states that the purpose is to consider the impact of early special educational needs (SEN) provision on a) unplanned hospital utilisation and b) absences from school, with each of these acting as proxies for health and educational outcomes. The Plain English Summary explains that the intention of this analysis is to see whether children benefit from SEN provision through a reduction in hospital usage and school absences. What is not clear, however, is why this relationship might exist. As SEN provision is designed to assist children who are at risk of suboptimal educational attainment, it is difficult to understand why this might be associated with unplanned hospital use. Similarly, while school absence could be considered more directly linked to SEN provision in that both are linked to education, the precise link and justification for considering school absences and SEN provision is not provided.
In the background section of the abstract, the purpose of SEN provision is given as 'to improve their speech and language skills'. I think this is misleading. SEN provision can be provided for many reasons. While children born with cleft palate (with or without cleft lip) are at high risk of having problems with their speech, and may indeed receive SEN provision for this, we cannot assume that that is why they are receiving it. An additional analysis could look at the purpose of the SEN provision, or the categorisation of the child's SEN, in relation to the outcomes, but the analysis as planned must be recognised as being broad SEN provision which could be intended to provide for one or more needs.
I noticed that the first reviewer queried why the exposure of SEN provision (intervention) was limited to year one and I would also question this. There is a need to have consistency in the dataset of course, but it can take a long time for some children to be identified as needing SEN and even longer for provision to be put in place. Moreover, EHCP provision is only for those children with significant need which cannot be provided for within the school, and it can take some time for children to be recognised as needing this. It would help to have some additional analysis to inform this decision, using data from the National Pupil Database regarding the proportion of children receiving SEN provision and having EHCPs in January of Year one compared to later years, to get a sense of how many children might be classified as 'no SEN' based on the year one figures but who would be classified as 'SEN' in later years.
It would help readers if the meaning of 'isolated' in 'isolated CLP' was explained. Generally this is used for those children with CLP who have not been identified with a syndrome. However, as co-occurring conditions are included as confounders, it should be explained how syndromic status is being identified, as there is an expectation that some children will have co-occurring conditions in the absence of an identified syndrome. Moreover, some children may have CLP as part of a syndrome, but they have not yet been identified as such, and that should be acknowledged.
I was unclear why some of the numbered variables in table 3 were included as presumably the ones of interest are only the first two rows, those relating to planned and unplanned admission?
Similarly, I was unclear in table 4 why row G (number of unauthorised sessions missed during the academic year as pupil is on a holiday…) is included.
Aside from the concern about justification of the main purpose of the work, the plans are well-described, replicable and scientifically sound. If a strong rationale for the work can be provided, the analysis and resulting report will be robust.

Minor points
On page 4, in the ethics and dissemination paragraph, there should be a space between '…ECHILD.' and 'Stakeholder…'.
On page 5, column 2, line 2, ECHILD is written as ECHID. Also the final sentence in this paragraph is unended.
The covariate 'free school meal eligibility' is an important variable but I wondered whether 'pupil premium status' would be the relevant term to explore?
In table 6, year one is sometimes spelt as a word and other times as a number (1). Also there is a closing bracket missing in the second column for the row 'Interventions to be compared'.
Page 13, column 2, paragraph 1, there may be a word missing in the fifth line.
Page 14, column 2, beginning of the Conclusion/discussion section -a letter 'T' is missing.
Is the rationale for, and objectives of, the study clearly described? No

Is the study design appropriate for the research question? Yes
Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Cleft lip and/or palate, speech and communication, educational outcomes, psychosocial outcomes, cohort studies, Cleft Collective study
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, I have significant reservations, as outlined above.
definitions of variables included, and decision points. However, specifics related to the hypothesized causal mechanism between SEN and hospitalizations/absences, anticipated sample size, power to run analyses, and criteria for including additional measures (e.g., academic outcomes) were missing. See below for details.
Introduction - This section is written clearly and concisely, with current literature cited. My only recommendation would be to provide more background information/rationale for how it is hypothesized that provision of SEN will impact unplanned hospital utilization/school absences. (As hospitalizations and absences are typically determined by severity of illness, how is it felt that provision of SEN would impact these specific outcomes?)
Methods - Are there any publications or citations related to the two meetings (FLARE and Young Persons Advisory Group) that led to the decision to use school start as the entry point? Interested readers may want additional information on how the meetings were led, how responses were recorded, and how those responses were further narrowed down to the choice of school start as the entry point.
Intervention - While the rationale for limiting group assignment to provision of SEN within the first year of school is provided, this decision may create bias where findings are limited to those with more severe issues (that present early in education) and does not address those with issues that may become more noticeable/impactful over time (e.g., learning disorders). This limitation should be discussed.
- Are there specific aspects of SEN that are anticipated to have more impact on the outcome measures? For example, could having a teaching assistant have less of an impact than adapted learning programs? Are there plans for a more refined analysis (i.e., more specific coding of elements of SEN, rather than just none vs SEN vs EHCP) that could be tracked?
- Gender is listed as a covariate (pg 11), but is not included in Table 5. This variable should be added to the table with the categories that will be used, for clarification on whether this is "sex determined by doctor at birth" or gender the child reports. (This is discussed a bit on page 14 but should also be included in Table 5.)
- It is appreciated that the authors discussed why academic performance was not considered as an outcome variable. However, the rationale for omission is not supported with specific estimates of sample size and effect size. For example, the prevalence of CLP is estimated at 900/year, with 62% expected to have isolated CLP. For those participants that would have academic outcomes (those born between 2003-2008), a sample size of 2,232-2,790 participants with isolated CLP would be expected (i.e., 62% of 900 x 4 to 5 years). Even if the prevalence of SEN was 10%, that would still provide over 200 participants with academic outcomes. The authors do indicate that they would revisit the possibility of including academic outcomes based on numbers with Key Stage Two data, but no information is given on what baseline sample size would be required to power evaluation of academic outcomes.
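The sample-size arithmetic in the last point can be reproduced with a few lines of integer arithmetic. This is only a check of the figures quoted above (900 births per year, 62% isolated, 4 to 5 birth years, 10% SEN prevalence); the function name `expected_sample` is hypothetical, and the 10% figure is the reviewer's illustrative value, not an estimate.

```python
def expected_sample(births_per_year, isolated_pct, n_years, sen_pct):
    """Expected isolated-CLP cohort size and the subset with SEN (whole children)."""
    isolated = births_per_year * isolated_pct * n_years // 100
    with_sen = isolated * sen_pct // 100
    return isolated, with_sen


for years in (4, 5):
    print(years, expected_sample(900, 62, years, 10))
# prints:
# 4 (2232, 223)
# 5 (2790, 279)
```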
Overall, this study is well structured and outlined. Concerns are minor and are limited to a desire for more specifics related to hypothesized causal mechanisms and estimated effect sizes.
Is the rationale for, and objectives of, the study clearly described? Partly
Is the study design appropriate for the research question? Yes
Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Oral Cleft, Academics, Learning Disabilities, Neuropsychological Assessment, Neural Imaging
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Figure 1. Flowchart of how the proposed population would be derived. Our population would consist of those who are identified in HES with cleft lip and/or palate before Year 1, who have a birth record in HES and who are linkable to NPD. SEN levels will be based upon SEN recorded by Year 1. No SEN represents either not having recorded SEN provision or receiving SEN after Year 1. Children with cleft lip and/or palate (including bilateral and unilateral): Q35x, Q36x, Q371, Q373, Q375, Q379, Q370, Q372, Q374, Q378. The exclusion criterion is MCA.
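The selection logic in the Figure 1 caption can be sketched as a simple filter. This is an illustrative sketch only: it assumes diagnosis codes arrive as strings such as "Q351" or "Q35.1", it treats "Q35x" and "Q36x" as prefix matches, and the `has_other_anomaly` flag stands in for the MCA exclusion, whose code list is not given here. The names `is_cleft_code` and `in_cohort` are hypothetical.

```python
CLEFT_PREFIXES = ("Q35", "Q36")  # Q35x, Q36x
CLEFT_Q37_CODES = {"Q370", "Q371", "Q372", "Q373", "Q374", "Q375", "Q378", "Q379"}


def is_cleft_code(code):
    """True if an ICD-10 code is one of the cleft lip and/or palate codes listed."""
    norm = code.replace(".", "").strip().upper()  # assumed normalisation
    return norm.startswith(CLEFT_PREFIXES) or norm[:4] in CLEFT_Q37_CODES


def in_cohort(diagnoses_before_year1, has_hes_birth_record, linkable_to_npd,
              has_other_anomaly):
    """Apply the caption's criteria to one child's records."""
    has_cleft = any(is_cleft_code(c) for c in diagnoses_before_year1)
    return (has_cleft and has_hes_birth_record and linkable_to_npd
            and not has_other_anomaly)
```

For example, `in_cohort(["J45", "Q371"], True, True, False)` is `True`, while the same child with `has_other_anomaly=True` is excluded.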
missed during the academic year for an unauthorised absence not covered by any other code/description. Include R Number of authorised sessions missed during the academic year due to religious observance. Exclude S Number of authorised sessions missed during the academic year due to study leave. Exclude T Number of authorised sessions missed during the academic year due to traveller absence. Exclude U Number of unauthorised sessions missed during the academic year as pupil arrived after registers closed.

Table 1. Inclusion and exclusion criteria used to define isolated cleft lip and/or palate.
Column headings: Condition, by severity; ICD-10 codes*
*Identified as a primary or secondary diagnosis in any hospital admission record prior to the start of Year 1 of school (age 5 at entry); **all cleft lip and/or palate groups also have congenital anomalies (excluding those relating to cleft lip and/or palate) excluded; ICD-10 = International Classification of Diseases version 10

Table 2. List of variables recording special educational needs in the National Pupil Database.
Variable name in NPD: Variable description as per the NPD data dictionary
PrimarySENtype: Nature of pupil's primary special educational need. For pupils with a SEN status of E or K their main or primary need and, if appropriate, their secondary need, should be recorded.
• SENPS: Does pupil have SEN - Action Plus or Statemented?
• SEN_ALL: Does pupil have SEN with or without statement or EHC plan?
• SENAPK: Does pupil have SEN without statement or EHC plan?
• SENSE: Does pupil have SEN with statement or EHC plan?
LatestSEN: Provision types under the SEN Code of Practice.
SENProvisionMajor: Pupil's major SEN provision group based on SEN provision code.
SENstatus: Provision types under the SEN Code of Practice.
SENUnitIndicator: Indicates if a pupil with SEN in a mainstream school is a member of a SEN Unit (sometimes called special class).
SpecialProvisionIndicator: Indicates if a pupil with SEN in a mainstream school is a member of an SEN Unit, special class or resourced provision.

Table 4. Determining medical related absences in the National Pupil Database.
Exclude G Number of unauthorised sessions missed during the academic year as pupil is on a family holiday, not agreed, or is taking days in excess of an agreed family holiday.

Table 5. Socio-demographic, educational and health characteristics by recorded SEN categories.
The table will include means with standard deviations or numbers and row percentages as appropriate, Cleft Lip and/or Palate derived population.

Table 6. Trial emulation to estimate the causal effect of SEN by Year 1 on unplanned hospitalisations by Year 6 in children with cleft lip and/or palate (without other congenital anomalies).
From: Randomization to the intervention. To: the end of primary school OR loss of follow-up (e.g., emigration) OR death OR end of study.
From: January Census in Year One. To: the end of primary school OR loss of follow-up in NPD OR death OR end of study/end of data (for HES: 31 August 2019).

Permissions to use linked, de-identified data from Hospital Episode Statistics and the National Pupil Database were granted by the Department for Education (DR200604.02B) and NHS Digital (DARS-NIC-381972); consent from patients is not required for HES as the data provided by NHS Digital is pseudo-anonymised, which reduces identifiability to researchers; further information on opting out of Hospital Episode Statistics for secondary usage can be found here. Ethical approval for the ECHILD project was granted by the National Research Ethics Service (17/LO/1494), NHS Health Research Authority Research Ethics Committee (20/EE/0180) and UCL Great Ormond Street Institute of Child Health's Joint Research and Development Office (20PE06). Stakeholders (academics, clinicians, educators, and child/young people advocacy groups) will consistently be consulted to refine populations, interventions and outcomes of studies that use the ECHILD dataset to conduct target trial emulation. Scientific, lay and policy briefings will be produced to inform public health policy through partners in the Department for Education and the Department of Health and Social Care.