
Scoping the use of AI to organise, synthesise and analyse a large-scale community engagement and consultation exercise.

[version 1; peer review: 1 approved]
PUBLISHED 09 Oct 2025

This article is included in the Policy Research Programme gateway.

Abstract

Artificial Intelligence (AI) is increasingly recognised as a transformative tool in qualitative research, promising to enhance both the scalability and efficiency of processes which are traditionally resource intensive. While qualitative research is valued for its capacity to capture rich, nuanced insights into complex phenomena, accelerating these methods to enable large-scale impact across public sector contexts is an emerging priority. Automation with AI has demonstrated clear potential to save time and reduce costs, yet human-led qualitative approaches remain essential for interpreting emotional nuance and intricate social interactions.

Governmental strategies increasingly advocate for the adoption of AI in local government and public sector organisations to analyse qualitative data from various sources, with the aim of facilitating rapid extraction of meaningful insights and enabling prompt, evidence-based decision making. Despite the advantages of automating certain tasks, can AI match qualitative methodologies which are indispensable for revealing the contextual and granular nature of social and behavioural phenomena? A literature review was conducted to inform recommendations regarding the suitability of employing AI software, specifically Microsoft 365 Co-pilot, for the initial synthesis and analysis of qualitative data gathered through a large-scale local council community engagement consultation exercise. The review considers issues of cost-effectiveness, practicality, and the ability to manage high volumes of responses within constrained resources, assessing the potential for AI to streamline and strengthen qualitative analysis in this context.

Plain Language Summary

Background to the research

The research addresses the challenge of efficiently analysing large-scale qualitative data from community engagement consultations. Traditional methods are time-consuming and resource-intensive, which can delay decision-making and reduce the impact of public feedback on policy and service improvements. This problem affects public services and society by limiting the ability to quickly and accurately incorporate community insights into strategic planning.

Aim of the research

The research aims to evaluate the effectiveness of using Microsoft 365 Co-pilot, an AI tool, for analysing qualitative data from community consultations. The key question is whether AI can enhance the speed and accuracy of qualitative analysis compared to traditional human-led methods and also capture community voices effectively. The main objectives are to assess the accuracy, consistency, and practicality of integrating AI into routine local authority consultations.

Research plan

To answer the research question, the study will review studies in which AI-facilitated analysis has been compared with traditional human-led methods. The research design includes a scoping literature review and a comparative analysis of qualitative data from a large-scale community consultation. The methods involve using AI for initial data coding and organisation, followed by human review to ensure depth and accuracy. This hybrid approach is chosen to leverage the strengths of both AI and human analysis.

Working with diverse people and communities

The proposal has been shaped by feedback from diverse community groups involved in the consultation process. Their insights have informed the design and implementation of the research, ensuring that the AI tool is used in a way that respects and accurately represents their experiences. The research will continue to be influenced by these groups through ongoing engagement and feedback mechanisms.

Knowledge mobilisation

The plan for knowledge mobilisation includes sharing the findings with local authorities, policymakers, and community organizations. This will be done through reports, presentations, and workshops to ensure that the insights gained from the research are effectively communicated and used to improve future community consultations and policy development.

Patient and public involvement

There was no patient or public involvement in the design or implementation of this review, although it was undertaken to inform the analysis of data from a public engagement consultation.

Keywords

Artificial Intelligence, public sector, local government, public engagement, qualitative analysis

Introduction

Artificial Intelligence (AI) is emerging as a potentially transformative tool in qualitative research, offering opportunities to enhance the depth and scalability of a labour-intensive and time-consuming process1. Qualitative research emphasises the richness of data over breadth or quantity; however, there is a growing need to accelerate qualitative research across various sectors to enable large-scale impact2. Globally, AI is currently being used across a variety of public sector contexts. For example, in healthcare settings it is used widely to analyse patient narratives and feedback, significantly reducing the time invested by clinicians to improve treatment protocols and prioritise patient care experiences3,4. A six-month Australian Government trial found that Microsoft 365 Co-pilot helped departments efficiently summarise documents, draft meeting minutes, and reword reports. Most managers (64%) saw improved team efficiency and quality, while 40% of staff were able to spend more time on strategic tasks thanks to AI support5.

In Scotland, AI is currently being used by the NHS, Police Scotland and the Scottish Prison Service in a variety of ways, for example, assisting in effective communication with residents whose first language is not English and supporting those with visual or hearing impairments; enhancing resource allocation by analysing vast quantitative data to forecast demand and optimise logistics, such as reducing fuel costs and environmental impact6. Researchers from Edinburgh and Durham universities, in collaboration with Public Health Scotland and the Alan Turing Institute, have improved a tool for Scottish emergency departments using AI. This updated version better identifies individuals at high risk of urgent hospital care and helps healthcare providers manage resources efficiently. Emergency admissions, which make up about half of all hospital stays in Scotland, are now easier to predict and address6.

Automation in these sectors can clearly save time and reduce costs. However, qualitative data, such as that from qualitative interviews, focus groups or surveys, is characterised by rich and deep insights into complex, real-world phenomena. Researchers in social and behavioural sciences adopt qualitative research methodologies, such as inductive coding, grounded theory, and data triangulation, to interpret text and capture the granularity and complexity of phenomena7. Human-led qualitative research excels at capturing nuanced emotions and complex social interactions, offering detailed insights. It provides contextual understanding, and allows researchers to adapt their techniques based on real-time feedback8. In their publication ‘Scotland’s Artificial Intelligence Strategy’, the Scottish Government encourages public sector organisations to utilise artificial intelligence for the analysis of qualitative data collected through surveys, interviews, and community feedback, suggesting that the implementation of AI could facilitate the rapid extraction of meaningful insights, thereby enabling more prompt evidence-based decision-making6.

The literature review reported in this paper was undertaken to inform recommendations around the suitability of employing AI software, specifically Microsoft 365 Co-pilot, to undertake the first-stage synthesis and analysis of qualitative data from a large-scale local council community engagement consultation exercise.

Research context

In March 2025, Community Planning Aberdeen and Aberdeen City Council launched the Your Place, Your Plans, Your Future consultation9. This community engagement consultation aimed to inform a range of strategic plans and policies, including Housing, Health and Social Care, Community Learning and Development, the Visitor Levy Proposal, and the Local Development Plan Evidence Report. Multiple consultations were combined into a single process to streamline engagement, minimise duplication, and reduce the burden on residents and the resultant ‘consultation fatigue’10.

The consultation utilised the 14 themes of the Place Standard Tool, a nationally recognised Scottish framework for assessing the quality of places where people live, work, and interact11. The tool encourages structured conversations by inviting residents to rate their local area across themes such as Housing, Public Transport, Safety, Social Interaction, Identity and Belonging, and Influence and Sense of Control. In a previous 2023 consultation using this tool, 367 residents participated. This generated over 10,000 responses. Due to the volume of responses, tight deadlines and limited staff capacity, a subset of this data was analysed using a framework analysis approach and a report of findings produced12.

The current consultation anticipates over 2,000 responses, all of which will be organised and analysed. To address time and staffing constraints, Microsoft 365 Co-pilot was considered as an analysis tool since it was already included in Aberdeen City Council’s existing software package. This presented a potentially practical and cost-effective way to enhance qualitative analysis without requiring investment in additional software.

Aim

The aim is to compare the accuracy and consistency of AI-facilitated qualitative analysis, specifically using Microsoft 365 Co-pilot, with traditional human-led methods. Ultimately, the goal is to enable robust, evidence-based outcomes that respect citizens’ voices, and to assess whether integrating Co-pilot into routine local authority consultations offers a sustainable and cost-effective enhancement to qualitative research and analysis.

Methods

Search strategy

A scoping search using the same databases listed below found no papers reporting on the analysis of qualitative data from public engagement consultations using AI (search string: ((Artificial Intelligence) AND ((Qualitative Analysis) OR (Qualitative Research) AND (community engagement) OR (public engagement) OR (engagement consultation) OR (place based consultation)))). A second, much broader search strategy was therefore designed. We searched several social science and public health databases: PubMed, Springer Link, JSTOR, Embase and Scopus. A typical search string was: ((Artificial Intelligence) AND ((Qualitative Analysis) OR (Qualitative Research) OR (interviews) OR (focus groups) OR (survey))). We also made use of Google Scholar, where we used the words “artificial intelligence qualitative analysis”. The search was limited to English-language records, and included peer-reviewed journal articles, dissertations and grey literature, such as third-sector reports and government documents. To identify further relevant studies, we used backward and forward snowballing.
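
The Boolean structure of these search strings can be assembled programmatically, which helps keep queries consistent across databases. A minimal sketch — the `build_search_string` helper is our own illustration, not part of any database API:

```python
def build_search_string(groups):
    """Combine term groups into a Boolean search string:
    terms within a group are OR'd together, groups are AND'd."""
    clauses = []
    for terms in groups:
        ored = " OR ".join(f"({t})" for t in terms)
        # Only wrap multi-term groups in an extra layer of parentheses.
        clauses.append(f"({ored})" if len(terms) > 1 else ored)
    return "(" + " AND ".join(clauses) + ")"

# Reproduces the shape of the second search string used in the review.
query = build_search_string([
    ["Artificial Intelligence"],
    ["Qualitative Analysis", "Qualitative Research", "interviews",
     "focus groups", "survey"],
])
print(query)
```

Running this yields the same string quoted above, so the same helper could regenerate database-specific variants from one list of term groups.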

Inclusion and exclusion criteria

Studies were included if they reported a comparison between, or a dual approach to, human-led qualitative analysis and AI-led analysis. Only studies that reported on data from interviews, focus groups or free-text survey responses were included, and only papers published in English. No limitations were put on publication date, although papers were excluded if they only reported findings from a COVID-related study. Papers were included if they reported data from at least one of the following: patient satisfaction engagement, public health, local government, disadvantaged communities, community engagement, third sector or education research. Articles reporting on AI in market research, AI as a tool for students, the use of chatbots in customer service or on health apps were excluded. Results of the search are shown below in Figure 1.


Figure 1. PRISMA diagram of search results13.

Results

Search results

Ten studies are included in the review. Searches of PubMed, JSTOR and Scopus yielded no results. Springer Link and Embase yielded one result each. Four papers were found on Google Scholar and four through snowballing.

Overview of studies

Three articles focused on analysing the data from patient satisfaction evaluations: a survey following a pain relief intervention14; interviews following an obesity intervention15; and interviews regarding patients’ experiences at an eye clinic16. One analysed data from interviews with men in Belize which focused on their healthcare needs17. One used data from interviews investigating the psychosocial impact of disfiguring scars18. Other studies used interview data from an inpatient substance use programme19, a guaranteed income pilot study20, interviews about experiences of social justice21 and the transcript of discussions recorded during problem-solving exercises in an academic setting22. Focus group data from students in their first year at university and from dual-earner couples with caregiving responsibilities were used in the final study23. An overview of the included studies is shown in Table 1.

Table 1. Overview of included studies.

Reference: Bennis and Mouwafaq, 2025
Topic: Comparison of AI-driven and human-led thematic analysis of lived experiences of scarring.
Methodologies: Analysis of 448 direct quotations, taken from interview responses, on the psychological effects of scarring. Traditional qualitative analysis compared with nine generative AI models.
Findings/Conclusions: Advanced AI models showed “impressive congruence” with reference standards. Qualitative research should focus on analytical rigour by combining AI capabilities with human expertise and standardised reporting.

Reference: Flaherty and Oliver, 2024
Topic: Comparison between AI-assisted and manual thematic review of a post-intervention satisfaction survey.
Methodologies: Analysis of a post-programme satisfaction survey of 82 individuals in the Empowered Relief programme. Responses were uploaded to ChatGPT for coding, then manually organised into themes; the results were reviewed by two researchers.
Findings/Conclusions: The LLM-derived analysis provided valuable insights into the programme and the field of pain psychology by using patients' own words to guide programme evaluation. ChatGPT offers an ergonomic solution for evaluating open-ended questions, providing rich and unexpected information.

Reference: Hamilton et al., 2023
Topic: Comparison of ChatGPT outputs with human-created coding of qualitative interviews with guaranteed income pilot recipients.
Methodologies: From 71 interviews, 1,125 'significant statements' or quotes were put into ChatGPT, which was then prompted: “Please identify common themes from the statements.”
Findings/Conclusions: ChatGPT identified ten themes in less than 30 seconds. AI-generated themes offered efficiency and scalability but lacked nuanced understanding of the broader social, economic, and cultural contexts that shaped participants’ lives. Combining the strengths of human and AI analysis can lead to a more comprehensive understanding of qualitative data.

Reference: Jalali and Akhavan, 2024
Topic: A comparison between human-led and ChatGPT analysis of interview data from an obesity prevention intervention.
Methodologies: Human-led analysis of 40 semi-structured interviews exploring the dynamics of obesity prevention, compared with ChatGPT given the prompt: “Go through these interview data and extract the key variables of interest?”
Findings/Conclusions: ChatGPT identified themes which added to the richness of the data; however, it did not identify nuances in the transcripts which significantly changed the meanings of participants' experiences. A dual approach could offer a more thorough and balanced understanding.

Reference: Jiang et al., 2025
Topic: The feasibility of utilising artificial intelligence (AI) for qualitative data analysis in equity-focused research.
Methodologies: Human-led thematic analysis: two researchers conducted thematic coding of the data, identifying quotes to develop themes. GPT-3 was given a series of accumulative prompts to generate key quotes and themes, and a Turing-style test was issued. Ten participant responses were analysed.
Findings/Conclusions: Results suggest that the AI model, when provided with suitable prompts, can proficiently perform thematic analysis, demonstrating considerable comparability with human coders and an ability to interpret data through social justice perspectives. In the Turing-style test, the reviewers, consisting of professors and doctoral students in the education field, were unable to discern the analysis results generated by human coders from those produced by GPT-3.

Reference: Kon et al., 2024
Topic: A comparison of manual and ChatGPT analysis of interview data from patients describing their experiences of an eye clinic.
Methodologies: Three transcripts were analysed by an independent researcher. Next, specific aims, instructions and de-identified transcripts were uploaded to ChatGPT. Concordance in themes, the number of subthemes and the time taken by ChatGPT were calculated.
Findings/Conclusions: The average time taken per transcript was 11.5 min, 11.9 min and 240 min for ChatGPT 3.5, ChatGPT 4.0 and the researcher respectively. ChatGPT significantly reduced analysis time with moderate to good concordance with human-led analysis, highlighting its potential to facilitate rapid preliminary analysis. However, in-depth analysis needs to be conducted by researchers.

Reference: Leeson et al., 2019
Topic: A proof-of-concept study to evaluate the potential of AI to analyse qualitative data from a men’s health needs assessment.
Methodologies: Comparison of the qualitative method of open coding with two forms of AI, Topic Modelling and Word2Vec, used to analyse transcripts from 56 semi-structured interviews conducted in rural Belize asking men about their health needs. Transcripts were coded using NVivo software by one researcher and verified by four members of the research team; two NLP models were used to analyse the same data, and codes and themes were compared.
Findings/Conclusions: Similar concepts were labelled with similar codes in both human-led coding and AI. The AI models were not able to code data if responses were vague, and the form of prompts affects the accuracy of the AI models. AI models are a useful adjunct to traditional qualitative research methods but are not accurate enough to replace them.

Reference: Morgan, 2023
Topic: A direct comparison between manual analyses of two qualitative datasets and the results from querying the same datasets with the free program ChatGPT.
Methodologies: The first dataset came from 24 participants in six focus groups asking about issues students face in their first year at university; the second from 73 participants in 19 focus groups describing the experiences of dual-earner couples with caregiving responsibilities. Manual analysis of the first dataset followed a reflexive thematic analysis approach, including manual coding; manual analysis of the second followed an iterative thematic inquiry.
Findings/Conclusions: There were discrepancies in the priority given to themes and sub-themes; the ChatGPT analysis emphasised specific aspects of the data without pointing to the bigger picture that united these specifics, and was less successful at locating subtle, interpretive themes and more successful at reproducing concrete, descriptive themes.

Reference: Siiman et al., 2023
Topic: A comparison of AI in deductive and inductive qualitative analysis with human-led analysis.
Methodologies: Researchers used data from a previous study in which pairs of adults collaborated via a free-form, text-based chat to solve a computer simulation problem about balancing a seesaw. The data were re-analysed with AI assistance, applying both deductive and inductive approaches, and the results compared to human coding and human interpretation of the data (number of participants not stated).
Findings/Conclusions: It is important to structure and phrase prompts so that AI responses best align with human interpretation. Deductive analysis performed better than inductive analysis, presumably because its prompts contained richer contextual information. AI-assisted qualitative analysis has the potential to improve transparency in the coding of qualitative data by encouraging human analysts to report AI prompts so they in turn can be reused by other researchers.

Reference: Yang Ma, 2025
Topic: A comparison of using ChatGPT for analysing interviews with individuals in a substance use programme.
Methodologies: ChatGPT's coding was compared with that of PhD-level researchers with expertise in qualitative analysis and clinical experience in substance use disorders, using interview transcripts from 60 individuals completing a 28-day substance use programme. The study also tested the validity of ChatGPT's results by examining the degree to which the findings aligned with psychological theories, reflecting construct validity.
Findings/Conclusions: The comparison of inductive coding performed by ChatGPT and qualitative experts revealed a high percentage of agreement, supporting the utility of AI’s inductive coding capabilities. The study underscores the importance of qualitative expertise in evaluating the reliability and construct validity of GPT's outputs.

Findings

It has been reported that Large Language Models (LLMs) like ChatGPT can quickly identify themes and provide valuable insights using patients' own words to guide programme evaluation. Flaherty and Oliver14 report on a study which involved enrolling 82 individuals in the Empowered Relief programme, a pain management intervention, and conducting a post-class satisfaction survey. A dual-method analytical approach was used, combining LLM-assisted and human-led thematic analysis: responses were uploaded to ChatGPT for coding and then manually organised into themes. The findings from the LLM-derived analysis provided valuable insights into the programme and the field of pain psychology by using patients' own words to guide programme evaluation. The authors conclude that ChatGPT offers programme directors the opportunity to broaden their evaluation of treatments to capture rich and unexpected details about patients’ experiences. Previously, data collection had focused solely on improvements in patients' disability because of the limited capacity to manually analyse open-ended data. Including open-ended data in programme evaluation, which ChatGPT can analyse efficiently and effectively, can better capture patients’ lived experience of pain and lead to improved services.

Similarly, Kon et al.16 report on a study which compared human-led and ChatGPT analysis of interview data from patients describing their experiences at an eye clinic. The average time taken per transcript was 11.9 minutes for ChatGPT 4.0, and 240 minutes for the researcher. Preliminary results showed that ChatGPT significantly reduced analysis time with moderate to good concordance compared with current practice. The authors suggest that their study highlights the potential of ChatGPT to facilitate rapid preliminary analysis, freeing up resources to focus on patients' lived experiences and thus better tailor their care. Meanwhile, Leeson et al.17, in a study capturing men’s healthcare needs, report on the practicalities of coding with two forms of NLP, Topic Modelling and Word2Vec, which they compared to human-led coding. They conclude that incorporating these AI tools during the coding process can augment traditional human-led qualitative analysis by improving accuracy and consistency. AI was utilised as a preliminary step before human-led coding to guide the creation of a codebook, and as a post-check on the accuracy of human-generated codes.
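
Kon et al.16 calculated concordance between AI- and human-generated themes, but the metric itself is not specified in the studies reviewed. One simple stand-in is set overlap (the Jaccard index) over normalised theme labels; the sketch below uses that assumption, and the theme lists are hypothetical, not taken from the eye-clinic study:

```python
def theme_concordance(human, ai):
    """Jaccard overlap between two theme lists after
    whitespace-stripping and case-folding."""
    h = {t.strip().lower() for t in human}
    a = {t.strip().lower() for t in ai}
    if not (h | a):
        return 0.0
    return len(h & a) / len(h | a)

# Hypothetical themes for illustration only.
human_themes = ["waiting times", "staff communication",
                "clinic environment", "follow-up care"]
ai_themes = ["Waiting times", "Staff communication",
             "cost of treatment", "clinic environment"]
print(round(theme_concordance(human_themes, ai_themes), 2))  # → 0.6
```

A metric like this only captures label overlap; whether two differently worded themes mean the same thing still requires the human judgement the studies emphasise.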

Several studies highlight the efficiency of AI models. Hamilton et al.20 assessed the speed of manual thematic analysis versus AI theme generation. In the first phase of the study, researchers coded qualitative interviews with 71 guaranteed income pilot recipients. Quotes that helped understand how participants experienced the scheme were identified. From these, meanings were formulated to construct themes, providing a comprehensive understanding of individuals’ experiences. In the second phase, researchers fed all 1125 identified “significant statements” into ChatGPT and instructed: “Act as a phenomenological qualitative researcher. All 1125 significant statements above come from interviews with 71 guaranteed income pilot participants. Please identify common themes from the statements”. ChatGPT identified ten themes in less than 30 seconds. They observe that although AI-led analysis provides efficiency and scalability in data processing, it lacks the nuanced understanding and interpretive flexibility inherent in human research. The authors report that the human-led thematic analysis resulted in a comprehensive and holistic understanding of participants' experiences, taking into account the broader social, economic, and cultural contexts that shape their lives. They suggest that using both human-led and AI-led qualitative analysis can provide a more thorough understanding of qualitative data.
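
The workflow Hamilton et al.20 describe — collecting significant statements and submitting them in a single role-framed prompt — can be sketched as follows. The helper name and the two example statements are our own illustration; only the role and instruction wording are taken from the paper:

```python
def build_theme_prompt(statements, n_interviews):
    """Assemble a single theme-extraction prompt in the style
    reported by Hamilton et al.: role, numbered data, instruction."""
    header = "Act as a phenomenological qualitative researcher."
    body = "\n".join(f"{i}. {s}" for i, s in enumerate(statements, 1))
    instruction = (
        f"All {len(statements)} significant statements above come from "
        f"interviews with {n_interviews} guaranteed income pilot participants. "
        "Please identify common themes from the statements."
    )
    return f"{header}\n\n{body}\n\n{instruction}"

# Hypothetical statements for illustration.
prompt = build_theme_prompt(
    ["The payments let me cut back to one job.",
     "I finally slept without worrying about rent."],
    n_interviews=71,
)
print(prompt)
```

Assembling the prompt deterministically like this also supports the transparency argument made later in the review: the exact prompt sent to the model can be logged and reported verbatim.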

In a comparison between human-led thematic analysis and the use of ChatGPT to analyse data from interviews exploring the dynamics of obesity prevention, Jalali and Akhavan15 also note a lack of nuance in the AI-generated content. The human-led qualitative analysis included coding interview transcripts to identify potential variables, links, and feedback loops. Researchers then took the interview transcripts from the original study and gave ChatGPT the following prompt: “Could you please go through these interview data and extract the key variables of interest?” ChatGPT identified themes that the original analysis had not included; this added to the richness of the data, however, it did not identify some nuances in the transcripts which significantly changed the meanings of participants' views and experiences. The authors suggest this is because, although ChatGPT is able to access a broad spectrum of literature, using this knowledge effectively for nuanced analysis requires extensive training, experimentation, and navigation of ambiguities. Thus, it is challenging for AI tools to incorporate social contexts and human complexities into analysis.

Similarly, Morgan23 reports that ChatGPT was less successful at locating subtle, interpretive themes, and more successful at reproducing concrete, descriptive themes. The author reports on a direct comparison between human-led thematic analyses and queries of the same datasets run through the free program ChatGPT. The data came from two sets of focus groups, one asking about issues students face in their first year at university, and the other exploring the experiences of dual-earner couples with caregiving responsibilities. ChatGPT showed a clear tendency to emphasise specific aspects of the data without pointing to the bigger picture that united these specifics. The author concludes that data generated by AI models should not be taken as complete in itself; instead, AI produces raw material that should be followed up with a more interpretive, human-led qualitative data analysis.

Another study19 aimed to address a research gap concerning the application of artificial intelligence in social and behavioural sciences. The authors used de-identified transcripts from 60 individual interviews with clients undergoing a 28-day inpatient substance use programme. They undertook a comparative analysis between inductive coding performed by ChatGPT and human-led qualitative coding. The findings demonstrated a high percentage of agreement, indicating the effectiveness of ChatGPT's inductive coding capabilities. The authors also note that the thematic analysis conducted by ChatGPT yielded themes consistent with emotion regulation theories, providing insights into the emotional aspects of substance use and the importance of emotion regulation strategies in treatment. They highlight that the application of ChatGPT and other LLMs in social and behavioural science is evolving in promising ways, namely understanding domain-specific theories and applying these reliably.
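
The “percentage of agreement” reported in comparisons like this one is straightforward to compute, and a chance-corrected version (Cohen's kappa) is a common companion statistic. The study does not specify its exact calculation, so the sketch below is generic; the `expert`/`gpt` code assignments are invented for illustration:

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Proportion of items assigned the same code by both coders."""
    assert len(codes_a) == len(codes_b)
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders."""
    n = len(codes_a)
    po = percent_agreement(codes_a, codes_b)          # observed agreement
    ca, cb = Counter(codes_a), Counter(codes_b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)     # expected by chance
    return (po - pe) / (1 - pe)

# Hypothetical code assignments for ten interview excerpts.
expert = ["craving", "regret", "support", "craving", "support",
          "regret", "craving", "support", "craving", "regret"]
gpt    = ["craving", "regret", "support", "craving", "support",
          "craving", "craving", "support", "craving", "regret"]
print(percent_agreement(expert, gpt))  # → 0.9
```

Raw percent agreement overstates reliability when a few codes dominate, which is one reason the study's authors stress that qualitative expertise is still needed to judge the validity of high agreement figures.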

Also testing the potential of AI models for qualitative research in a specific domain, Jiang et al.21 evaluated the feasibility of using AI for analysis of equity-focused research. They report on a human-led thematic analysis and a comparison with a ChatGPT analysis which used a series of accumulative prompts. The findings of the comparison indicate that the AI model, when provided with suitable prompts, can effectively conduct thematic analysis, showing considerable comparability with human coders. The authors note that there must exist biases in the training of any AI model but were impressed that ChatGPT was able to interpret the data through social justice perspectives. The researchers also employed a Turing-style test to determine whether the AI generated results were distinguishable from the human-led analysis results. The reviewers, consisting of professors and doctoral students, were unable to discern the analysis results generated by human analysts from those produced by ChatGPT. The authors conclude that they are cautiously optimistic about the use of AI in qualitative research, emphasising that human interpretation and oversight is needed.

Bennis and Mouwafaq18 highlight an “impressive congruence” between human-led thematic analysis and that conducted by AI models. Like Flaherty and Oliver14, who suggest ChatGPT is capable of capturing nuanced details about patients’ experiences, the authors suggest that AI models could be incorporated into complex psychosocial analysis. Bennis and Mouwafaq18 report on a dataset which consisted of 448 direct quotations related to the psychosocial effects of cutaneous leishmaniasis scars (cutaneous leishmaniasis is a parasitic disease caused by the bite of an infected female phlebotomine sandfly). One researcher conducted a manual thematic qualitative analysis on the data, while the second used generative AI models to analyse the direct quotations thematically. The analyses were concurrent and neither researcher was involved in the other's analysis. The authors report that there was considerable agreement between the two analyses and conclude that AI should be incorporated into qualitative research methodology, particularly in complex psychosocial analysis, where the deep learning models proved to be highly efficient and accurate. They suggest that the future of qualitative research methodology should focus on combining AI capabilities and human expertise, whilst reporting the process transparently.

The need for transparency in how AI models are employed in qualitative analysis is emphasised across the studies. Some authors suggest a standardised checklist for reporting the process, with a particular focus on transparency around the prompts used16. The significance of using the right prompt is also emphasised by those who argue that ‘crafting’ effective prompts for use with AI models, and sharing these in the reporting of studies, can enhance the transparency of qualitative analysis20. They report on a comparison between a human-led analysis and an AI-prompted analysis of the extent to which students shared information and demonstrated shared understanding in a problem-solving task. The findings demonstrate significant agreement between the analyses. The authors note that in traditional qualitative research, achieving consistency often involves resolving disagreements between analysts through discussion, though these discussions are rarely described. They argue that sharing the AI prompts used precisely articulates the approach taken, allowing other researchers to reproduce and verify the process.

Discussion

The studies in this review do not report on data at the scale of that expected to be generated by the Your Place, Your Plans, Your Future community engagement consultation exercise. In addition, no study could be found that has used Microsoft 365 Co-pilot, specifically, as a tool in qualitative analysis. Most of the studies reviewed used a version of ChatGPT, a Large Language Model similar to Microsoft 365 Co-pilot; they therefore offer insight into the advantages, and some of the challenges, of using Microsoft 365 Co-pilot to assist in the analysis and synthesis of the qualitative data generated by a community engagement exercise.

All the studies included highlight the transformative potential of AI in qualitative research while raising questions about its capabilities. Across the studies, the themes of efficiency, scalability, and AI’s ability to understand nuance and context are discussed. Some studies14,16 highlight ChatGPT’s ability to quickly identify themes and provide valuable insights, significantly reducing analysis time with moderate to good concordance compared with traditional methods. Similarly, others20 emphasise the rapid identification of themes by AI models, noting that ChatGPT identified ten themes in less than 30 seconds. These findings align with other academic research24,25, which underscores AI’s role in enhancing the initial phases of qualitative data analysis while stressing the importance of human oversight to ensure relevance and specificity.

While AI models excel in efficiency, they often lack nuance and interpretive flexibility, i.e., the understanding that a text or narrative can be understood and interpreted in multiple ways depending on perspective and context. Where such nuance is inherent in human-led research, AI-generated content often misses subtleties that can alter the meanings of participants’ views and experiences15. AI models such as ChatGPT are more successful at reproducing concrete, descriptive themes than at identifying subtle, interpretive ones23. Unsurprisingly, the need for human oversight to ensure accurate interpretation and comprehensive analysis is emphasised across the literature. For example, Viard et al.26 discuss the interpretive flexibility that human researchers provide, contrasting this with AI. They suggest that AI technology needs to ‘learn’ the complex relationships between social groups in order to fully interpret the complexities of qualitative data.

Also emphasised across the studies is the need for transparent and replicable prompt engineering, and the recognition that this requires training, practice and much trial and error21,22,27. Khalid and Witmer26 outline four key elements for prompt design: a clear task directive, relevant examples, contextual background, and specific response instructions. Microsoft suggests Co-pilot prompts should include four parts: the goal, context, expectations, and source28. Even with careful prompt design, using the same prompt multiple times can produce different responses. Large Language Models are built on neural networks that introduce some randomness, so even a carefully designed input prompt will produce slightly different results each time it is used26. Collectively, these studies underscore the importance of a hybrid approach, in which AI’s efficiency is balanced with the nuanced understanding provided by human researchers, ensuring a thorough and accurate analysis of qualitative data.
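One practical way to make prompts transparent and replicable, as the studies above recommend, is to assemble them from the four named parts (goal, context, expectations, source) in code, so that the exact wording used in an analysis can be archived and shared. The following Python sketch is purely illustrative: the `build_prompt` function and its field names are hypothetical conveniences, not part of any Microsoft 365 Co-pilot or ChatGPT API.

```python
def build_prompt(goal: str, context: str, expectations: str, source: str) -> str:
    """Assemble a structured prompt following the four-part pattern
    (goal, context, expectations, source) described in the text.
    Keeping the parts as separate arguments makes each prompt easy to
    log, version and report alongside the analysis for transparency."""
    return (
        f"Goal: {goal}\n"
        f"Context: {context}\n"
        f"Expectations: {expectations}\n"
        f"Source: {source}"
    )

# Hypothetical example for a consultation-style thematic analysis.
prompt = build_prompt(
    goal="Identify recurring themes in the consultation responses.",
    context="Responses come from a local council community engagement exercise.",
    expectations=(
        "Return up to ten themes, each with a one-sentence description "
        "and two verbatim supporting quotations."
    ),
    source="The free-text survey responses pasted below.",
)
print(prompt)
```

Because model outputs vary between runs even for identical prompts, storing the assembled prompt string alongside each batch of AI-generated themes gives later researchers the fixed half of the input-output pair needed to attempt reproduction.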

The studies reviewed suggest that AI should be used as an adjunct to, rather than a replacement for, human-led qualitative analysis, emphasising that AI acts as an assistant in data exploration and evaluation, enhancing the effectiveness of established analytical methods without replacing them25. These studies highlight the importance of a hybrid approach, where AI's capabilities are leveraged to enhance human-led qualitative analysis, ensuring a comprehensive and accurate understanding of complex data1,18,20,21.

Ethical considerations

Our examination of the integration of AI in qualitative analysis highlights several ethical concerns that should be addressed. Confidentiality and privacy are paramount, requiring stringent protocols to protect sensitive participant data and prevent breaches of trust29. Bias and fairness are also critical, as AI algorithms can introduce biases that skew research findings and lead to unfair representations. Researchers must scrutinise AI outputs for potential biases and implement mechanisms to ensure fair treatment of participants. Transparency and accountability are equally necessary to maintain credibility and trust in research; as suggested above, researchers should carefully document how AI-derived conclusions are reached30.

It has been noted that there are unique challenges to the adoption of AI in public sector and government settings, stemming from their capacity, structure, and operational methods31. A potential shortage of technical staff to implement and evaluate new technologies may result in inconsistency in, and potential misuse of, AI, especially around security and privacy issues. The lack of resources and training around the use of AI may also hinder transparency and consistency in its usage32. Addressing these ethical concerns is vital for the responsible integration of AI in qualitative analysis, especially in the public sector, to ensure that the benefits of AI are realised without compromising the integrity and ethical standards of qualitative research.

Future directions

The use of AI in qualitative research, and particularly in public sector contexts, is not well documented. Much of the literature on the use of AI-generated content or processes comes from the private sector, where early adoption has taken place33. A great deal can therefore be learned from alternative, non-academic sources about how to navigate the complexities of using AI moving forward. According to a 2024 McKinsey report34, AI's role in policy development and public sector management is gaining significant interest, largely driven by advancements in natural language processing and large language models. These technologies offer significant potential for enhancing qualitative data analysis through automated coding, thematic analysis, and even theory formulation. Researchers are encouraged to proceed with caution, balancing the benefits of AI augmentation with the need for rigorous ethical standards and human oversight35.

Limitations

While this review provides valuable insights into the integration of AI, particularly large language models, in qualitative research, several limitations should be acknowledged. First, there is a notable scarcity of studies directly examining the use of Microsoft 365 Co-pilot in qualitative analysis. Most existing research focuses on related models such as ChatGPT, meaning our conclusions rely on the assumption that findings from these models can be extended to Co-pilot.

Second, the review highlights the lack of studies addressing qualitative data at the scale anticipated in large public sector initiatives like the Your Place, Your Plans, Your Future consultation. Much of the existing literature is based on smaller-scale datasets, which may limit the generalisability of findings to large-scale, community-driven consultations in the public sector.

The rapid evolution of AI technologies presents an inherent challenge. Studies included in this review may quickly become outdated as language models are updated and new features are introduced. As a result, some observations or limitations noted here may shift as the technology matures.

Finally, while ethical considerations are discussed, there is limited evidence in the reviewed literature about how ethical challenges are practically navigated or resolved in real-world applications, especially concerning issues of transparency, bias, and participant privacy within large-scale governmental projects. As AI continues to develop and its adoption widens, ongoing research will be essential to address these evolving ethical and methodological challenges.

Conclusion

The integration of AI in qualitative research presents a transformative opportunity to enhance the efficiency, scalability, and depth of analysis. The studies reviewed in this article highlight the significant advantages of AI-led qualitative analysis, particularly in terms of speed and consistency. Large language models, of which ChatGPT and Microsoft 365 Co-pilot are only two, have demonstrated their ability to quickly identify themes and provide valuable insights, significantly reducing the time required for manual analysis. However, the studies also underscore the limitations of AI in capturing the nuanced understanding and interpretive flexibility inherent in human-led qualitative research. AI-generated content often misses subtle nuances, potentially altering the meanings of participants' views and experiences. This suggests that while AI can provide a useful first stage of analysis, additional human input is needed to ensure a comprehensive and accurate interpretation of qualitative data.

Moreover, the integration of AI in qualitative analysis brings several ethical concerns that must be addressed to ensure responsible use, particularly as there is a lack of AI strategies currently in place across the public sector. Confidentiality, privacy, bias, fairness, transparency, and accountability are paramount considerations. Researchers must implement stringent protocols to protect sensitive participant data, scrutinise AI outputs for potential biases, and document how AI-derived conclusions are reached in order to maintain credibility and trust in scientific research.

While AI offers significant advantages in terms of efficiency, scalability, and consistency, it should be integrated with human expertise to ensure the richness and complexity of qualitative data are fully captured. The future of qualitative research lies in combining human-led qualitative analysis with AI tools, leveraging the strengths of both to achieve a comprehensive and accurate understanding of complex data.

How to cite this article
Woods-Brown C, Swanson A, Cannings H et al. Scoping the use of AI to organise, synthesis and analyse a large scale community engagement and consultation exercise. [version 1; peer review: 1 approved]. NIHR Open Res 2025, 5:97 (https://doi.org/10.3310/nihropenres.14085.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 23 Oct 2025
MOHAMED BARODI, University Ibn Tofail Kenitra, Kenitra, Morocco 
Approved
This paper shows how artificial intelligence is, especially models like ChatGPT. I find it fascinating that AI can analyse interviews and surveys in minutes while maintaining a good level of accuracy compared to human analysis. Still, the authors remind us ...
How to cite this report
BARODI M. Reviewer Report For: Scoping the use of AI to organise, synthesis and analyse a large scale community engagement and consultation exercise. [version 1; peer review: 1 approved]. NIHR Open Res 2025, 5:97 (https://doi.org/10.3310/nihropenres.15317.r37957)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
