Artificial Intelligence for Multiple Long-term conditions (AIM): A consensus statement from the NIHR AIM consortia

Recent advances in causal machine learning and wider artificial intelligence (AI) methods could provide new insights into the natural histories and potential prevention of clusters of multiple long-term conditions or multimorbidity (MLTC-M). When combined with expertise in clinical practice, applied health research and social science, there is potential to systematically identify and map new clusters of disease, understand the trajectories of patients with these conditions throughout their life course, predict serious adverse outcomes, optimise therapies and consider the influence of wider determinants such as environmental, behavioural and psychosocial factors. The National Institute for Health Research (NIHR) recently funded multidisciplinary consortia to bring together AI specialists and experts in big data and MLTC-M in the first and second waves of this new programme. The so-called AIM consortia of researchers will spearhead the use of artificial intelligence methods and develop insights for the identification and subsequent prevention of MLTC-M. This consensus agreement is aimed at facilitating a community of learning within the AIM consortia, promoting cooperation, transparency and rigour in our approaches while maintaining high methodological standards and consistency in defining and reporting within our research. In bringing together these research collaborations, there is also an opportunity to foster shared learning and synergies and to rapidly compare and validate new AI approaches across our respective studies. This step is critical to implementation on the pathway to patient and public benefit.

Scope and aim
This statement was developed by the first wave of the NIHR AIM consortia and received input from the second wave. It includes representatives across thirteen universities: Edinburgh, Birmingham, Oxford, Southampton, Nottingham, Kent, Manchester, St. Andrews, Liverpool, Newcastle, QMUL, Loughborough and UCL. The multidisciplinary collaborations include front-line primary, secondary and social care staff; researchers in primary, secondary and social care; health informatics and data science (including AI) experts; epidemiologists; qualitative researchers; statisticians; clinical and health services researchers; geographers; health economists; sociologists; human factors design researchers; and public contributors. A summary of each study represented within the AIM consortia and their respective aims is included in table 1. The agreements reached are entirely those of the consortia; sponsors and funders have had no role in the development or reporting of this statement.
After initial discussions on the need for such a statement in our individual projects, we met to refine ideas and reach an agreement on item inclusion. We acknowledged variations in the aims and purpose of our research but found shared interest and overlapping aims with regard to the development of MLTC-M clusters that could then be externally validated across our respective studies. We collectively agreed on the need for a priori consensus on definitions of MLTC-M, clustering variables, outcomes and the management of data requests, as well as transparency in reporting and methodological approaches. This consensus permits meaningful comparison between findings and validation, which in turn could contribute to more rapid translation to patient benefit.

Follow-up period
Up to 10 years
Up to 10 years
Objective 1: Up to 10 years. Objective 4: Prediction has a short time horizon (likely 3 years or less), but the data will cover 5+ years (including COVID periods, which are a hard interruption for process mining, for example).
Up to 12 years
Up to 10 years
Up to 20 years
Birth to age 65 for the primary analyses

AI methods
Semi-automated, based on shallow machine learning clustering using expert-crafted features calculated from raw data;

Defining MLTC-M
Researchers acknowledged the substantial variations within the literature in defining MLTC-M and in the conditions that might be included under this terminology [1,2]. Moreover, each consortium may have a slightly different clinical or public health focus meriting inclusion of a diverse range of conditions [ref]. Acknowledging these challenges, the consortia agreed to conform to the definition of MLTC-M set out by Guthrie et al. (paper in submission) and to include a minimum set of conditions within our data requests based on the 59 core conditions summarised in figure 1 below. Not all of these conditions will be relevant to every study, and researchers will in some cases include additional conditions. However, the agreement that every consortium will make a minimum set of conditions available will facilitate future external validation across studies.
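As an illustration only, a study team could check a draft data request against the agreed minimum set before submission. The sketch below is a hedged example, not a consortium tool: the condition names are hypothetical placeholders (the real 59 core conditions are those in figure 1), and the function name `missing_core_conditions` is our own invention.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical placeholder names standing in for the 59 core conditions
# of figure 1; the real list is NOT reproduced here.
CORE_CONDITIONS: FrozenSet[str] = frozenset(
    {"asthma", "type_2_diabetes", "depression", "hypertension"}
)


def missing_core_conditions(
    requested: Iterable[str],
    core: FrozenSet[str] = CORE_CONDITIONS,
) -> List[str]:
    """Return the core conditions absent from a data request, sorted by name."""
    # Normalise case and whitespace so "Asthma " matches "asthma".
    normalised = {name.strip().lower() for name in requested}
    return sorted(core - normalised)


# A study may request extra conditions; only gaps in the core set are reported.
study_request = ["Asthma", "hypertension", "chronic_kidney_disease"]
print(missing_core_conditions(study_request))
# ['depression', 'type_2_diabetes']
```

Additional, study-specific conditions pass through untouched, which mirrors the agreement that the core set is a minimum rather than a limit.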

Protocols, data requests and coding
Study protocols and analysis plans will be made widely available, and wherever possible the consortia will include within data requests and approvals opportunities for unspecified replication for other studies within the NIHR AIM programme of research [3]. This is an essential step towards the overall objective of cross-collaboration replication and validation. Code sets and analytical code should also be made available. We constructed clinical code lists using a rigorous process: reviewing existing code lists (e.g., CALIBER and Cambridge CPRD codes), applying comprehensive search terms to identify further codes, and finally review by clinicians with, where necessary, a consensus process to produce the final list of codes. The process is documented for transparency and was developed using the DExtER code builder. An initial code list for the 59 agreed conditions was generated by the University of Birmingham and shared freely across the consortia. Individual groups will include additional codes where appropriate to their projects. Analytical code will be available through the HDR-UK phenotype library [4,5]. Researchers will also consider the FAIR Guiding Principles for scientific data management and stewardship [6] and utilise appropriate reporting standards such as STROBE, RECORD or TRIPOD [4,7]. This will facilitate study consistency and rigour, improve transparency and reduce ambiguity in both methods and reporting.
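The three-stage code list process described above (reuse of existing lists, term search, clinician review) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DExtER code builder and not the consortia's actual pipeline: the function names, the source labels and every code and description below are hypothetical, and the consensus step for disputed codes is reduced to a single accept/reject decision per code.

```python
from typing import Dict, Iterable, List, Set


def build_candidates(
    existing_lists: Dict[str, Iterable[str]],
    code_dictionary: Dict[str, str],
    search_terms: Iterable[str],
) -> Dict[str, Set[str]]:
    """Map each candidate code to the set of sources that suggested it."""
    candidates: Dict[str, Set[str]] = {}
    # Stage 1: reuse codes from existing published code lists.
    for source, codes in existing_lists.items():
        for code in codes:
            candidates.setdefault(code, set()).add(source)
    # Stage 2: search the full code dictionary by description.
    terms = [term.lower() for term in search_terms]
    for code, description in code_dictionary.items():
        if any(term in description.lower() for term in terms):
            candidates.setdefault(code, set()).add("term_search")
    return candidates


def finalise(
    candidates: Dict[str, Set[str]],
    clinician_decisions: Dict[str, bool],
) -> List[str]:
    """Stage 3: keep only codes accepted at clinician review."""
    # Codes with no recorded decision are excluded rather than assumed valid.
    return sorted(c for c in candidates if clinician_decisions.get(c, False))


# Hypothetical codes and descriptions, for illustration only.
existing = {"list_a": ["X001", "X002"], "list_b": ["X002"]}
dictionary = {
    "X001": "Asthma",
    "X002": "Asthma, exercise induced",
    "X003": "Asthma monitoring invite",
    "X900": "Fracture of wrist",
}
candidates = build_candidates(existing, dictionary, ["asthma"])
decisions = {"X001": True, "X002": True, "X003": False}
print(finalise(candidates, decisions))
# ['X001', 'X002']
```

Recording which sources suggested each code gives reviewers the provenance of every candidate, which supports the documented, transparent process the statement calls for.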

Data variables
The consortia agreed to include a core minimum list of variables in their data requests. Individual study aims, objectives and methodologies will vary and thus not all variables will be included in all analytical models. Additional variables could be required accordingly. Depending on study aims, variables might be used as exposures, covariates or outcomes and thus we have purposely not specified these here. The inclusion of a minimum set of clustering variables means that different research teams will be able to test their algorithms and validate research findings across datasets. We have agreed on the following core groups of variables but acknowledge that they may be measured, reported and defined differently across datasets and by each consortium. Clear reporting and explanations of variables will be included to permit meaningful interpretation and comparison between measures across consortia: