
Use of generative artificial intelligence (AI) in psychiatry and mental health care: a systematic review

Published online by Cambridge University Press:  11 November 2024

Sara Kolding
Affiliation:
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark; Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Robert M. Lundin
Affiliation:
Deakin University, Institute for Mental and Physical Health and Clinical Translation (IMPACT), Geelong, VIC, Australia; Mildura Base Public Hospital, Mental Health Services, Alcohol and Other Drugs Integrated Treatment Team, Mildura, VIC, Australia; Barwon Health, Change to Improve Mental Health (CHIME), Mental Health Drugs and Alcohol Services, Geelong, VIC, Australia
Lasse Hansen
Affiliation:
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark; Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Søren Dinesen Østergaard*
Affiliation:
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
Corresponding author: Søren Dinesen Østergaard; Email: soeoes@rm.dk

Abstract

Objectives:

Tools based on generative artificial intelligence (AI) such as ChatGPT have the potential to transform modern society, including the field of medicine. Due to the prominent role of language in psychiatry, e.g., for diagnostic assessment and psychotherapy, these tools may be particularly useful within this medical field. Therefore, the aim of this study was to systematically review the literature on generative AI applications in psychiatry and mental health.

Methods:

We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The search was conducted across three databases, and the resulting articles were screened independently by two researchers. The content, themes, and findings of the articles were qualitatively assessed.

Results:

The search and screening process resulted in the inclusion of 40 studies. The median year of publication was 2023. The themes covered in the articles were mainly mental health and well-being in general – with less emphasis on specific mental disorders (substance use disorder being the most prevalent). The majority of studies were conducted as prompt experiments, with the remaining studies comprising surveys, pilot studies, and case reports. Most studies focused on models that generate language, ChatGPT in particular.

Conclusions:

Generative AI in psychiatry and mental health is a nascent but quickly expanding field. The literature mainly focuses on applications of ChatGPT, and finds that generative AI performs well, but notes that it is limited by significant safety and ethical concerns. Future research should strive to enhance transparency of methods, use experimental designs, ensure clinical relevance, and involve users/patients in the design phase.

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Scandinavian College of Neuropsychopharmacology

Significant outcomes

  • The number of studies on the use of generative AI in psychiatry is growing rapidly, but the field is still at an early stage.

  • Most studies are early feasibility tests or pilot projects, while only very few involve prospective experiments with participants.

  • The field suffers from a lack of clear reporting and would benefit from adherence to reporting guidelines such as TRIPOD-LLM.

Limitations

  • There is no clear definition of generative AI in the literature, which means that some relevant studies might have been omitted.

  • The study represents a snapshot of a rapidly moving field as of February 2024, i.e., recent developments might not have been captured.

  • Due to the relative immaturity of the field, no formal quantitative analysis or quality assessments were made.

Introduction

The recent launch of ChatGPT (OpenAI, 2024a) demonstrated the potential of generative artificial intelligence (AI) to the world (Hu and Hu, Reference Hu and Hu2023). Generative AI encompasses models that produce content, such as text, images, or video, as opposed to rule-based models which are constrained to providing predetermined outputs. There already seems to be wide consensus that generative AI has the potential to transform many aspects of modern society, including the field of medicine (Haug and Drazen, Reference Haug and Drazen2023), where it may aid, e.g., training of medical professionals (Kung et al., Reference Kung, Cheatham, Medenilla, Sillos, De Leon, Elepaño, Madriaga, Aggabao, Diaz-Candido, Maningo, Tseng and Dagan2023), informing/educating patients (Ayers et al., Reference Ayers, Poliak, Dredze, Leas, Zhu, Kelley, Faix, Goodman, Longhurst, Hogarth and Smith2023), diagnostic processes (Lee, et al., Reference Lee, Bubeck and Petro2023), clinical note taking/summarization (Denecke et al., Reference Denecke, Hochreutener, Pöpel and May2018; Schumacher et al., Reference Schumacher, Rosenthal, Nair, Price, Tso and Kannan2023) and reporting of research findings (Else, Reference Else2023).

At present, the medical potential of generative AI is probably most clearly manifested via generative natural language processing, i.e., the use of computational techniques to process speech and text (Nadkarni, et al., Reference Nadkarni, Ohno-Machado and Chapman2011; Gao et al., Reference Gao, Dligach, Christensen, Tesch, Laffin, Xu, Miller, Uzuner, Churpek and Afshar2022). This makes generative AI particularly appealing for the field of psychiatry, where language plays an important role for three primary reasons. First, spoken language is the primary source of communication between patient and clinician, forming the basis for both the diagnostic process and assessment of treatment efficacy and safety (Hamilton, Reference Hamilton1959; Hamilton, Reference Hamilton1960; Kay, et al., Reference Kay, Fiszbein and Opler1987; Lingjærde et al., Reference Lingjærde, Ahlfors, Bech, Dencker and Elgen1987). Second, several core symptoms of mental disorders manifest via spoken language, such as disorganised speech or mutism (schizophrenia in particular), slowed speech (depression), increased talkativeness (mania) or repetitive speech (autism) (World Health Organization, 1993; American Psychiatric Association, 2013). Third, due to the near-total absence of clinically informative biomarkers, psychiatry is the medical specialty in which written language plays the most prominent role for documenting clinical practice (Hansen et al., Reference Hansen, Enevoldsen, Bernstorff, Nielbo, Danielsen and Østergaard2021).

Generative AI, however, is not restricted to language. Indeed, the technology is also able to generate, e.g., images and videos, as showcased by services such as DALL·E (OpenAI, 2023) and Sora (OpenAI, 2024b). These output formats could also be tremendously useful for the field of psychiatry. As an example, they may allow patients with hallucinations and delusions to visualise their experiences for relatives, friends and clinical staff, which may be beneficial for a variety of reasons (for instance to increase understanding/reduce stigma and to assess symptom severity/guide treatment) (Østergaard, Reference Østergaard2024).

While systematic reviews have been published on the use of artificial intelligence and/or conversational agents/chatbots in psychiatry (Graham et al., Reference Graham, Depp, Lee, Nebeker, Tu, Kim and Jeste2019; Vaidyam et al., Reference Vaidyam, Linggonegoro and Torous2021; Li et al., Reference Li, Zhang, Lee, Kraut and Mohr2023), we are not aware of analogous reviews focusing on generative AI, which is both narrower in terms of the technology (considerably more sophisticated/flexible than, e.g., rule-based approaches) and broader in terms of output formats (not restricted to text/speech). Therefore, the aim of this study was to systematically review the literature on the current use/application of generative AI in the context of psychiatry and mental health care.

Methods

We performed a systematic review in agreement with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline (Moher et al., Reference Moher, Liberati, Tetzlaff and Altman2009). The screening and data extraction process was supported by Covidence (‘Covidence systematic review software’, 2024). The protocol was preregistered on the Open Science Framework: https://osf.io/mrws8.

Search strategy

The search was conducted across PubMed, Embase and PsycINFO. The search terms used for PubMed were as follows: (“generative ai*”[All Fields] OR “generative artificial*”[All Fields] OR “conversational ai*”[All Fields] OR “conversational artificial*”[All Fields] OR “large language model*”[All Fields] OR “chatbot*”[All Fields] OR “chatgpt*”[All Fields]) AND (“psychiatry”[MeSH Terms] OR “mental disorders”[MeSH Terms] OR “mental health”[MeSH Terms] OR “Psychotherapy”[MeSH Terms] OR “psychiatr*”[Title/Abstract] OR “mental disorder*”[Title/Abstract] OR “mental health”[Title/Abstract] OR “mental disease*”[Title/Abstract] OR “Psychotherap*”[Title/Abstract]). Analogous searches were conducted in Embase and PsycINFO (the search terms are available in the protocol: https://osf.io/mrws8). The search was conducted on February 23, 2024 (an update from the September 12, 2023, search date mentioned in the preregistration).
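
For illustration, the PubMed arm of this search can be rerun programmatically; the following is a minimal sketch assuming the Biopython Entrez client and was not part of the review methodology. Rerunning the query today would naturally return a different number of hits than the February 23, 2024 search.

    # Minimal sketch: rerunning the PubMed arm of the search via NCBI E-utilities.
    # Assumes the Biopython package; illustrative only, not part of the review's method.
    from Bio import Entrez

    Entrez.email = "your.name@example.org"  # hypothetical contact address (required by NCBI)

    query = (
        '("generative ai*"[All Fields] OR "generative artificial*"[All Fields] OR '
        '"conversational ai*"[All Fields] OR "conversational artificial*"[All Fields] OR '
        '"large language model*"[All Fields] OR "chatbot*"[All Fields] OR "chatgpt*"[All Fields]) '
        'AND ("psychiatry"[MeSH Terms] OR "mental disorders"[MeSH Terms] OR '
        '"mental health"[MeSH Terms] OR "Psychotherapy"[MeSH Terms] OR '
        '"psychiatr*"[Title/Abstract] OR "mental disorder*"[Title/Abstract] OR '
        '"mental health"[Title/Abstract] OR "mental disease*"[Title/Abstract] OR '
        '"Psychotherap*"[Title/Abstract])'
    )

    # esearch returns the matching PubMed IDs (PMIDs) and the total hit count
    handle = Entrez.esearch(db="pubmed", term=query, retmax=2000)
    record = Entrez.read(handle)
    handle.close()

    print(f"Hits: {record['Count']}")
    print(record["IdList"][:10])  # first ten PMIDs, e.g., for export to a screening tool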

Screening of identified records

Two authors (SK and RML) independently screened the identified records. Screening was first performed at the title/abstract level, followed by full-text screening. Conflicts in screening results were resolved by RML and SK, with SDØ consulted in cases of doubt. The following inclusion criteria were used when screening the literature:

  • Research articles reporting original data on the use/application (understood broadly) of generative AI* (for instance chatbots such as ChatGPT) in the context of psychiatry or mental health care (including, but not limited to, treatment/psychotherapy and psychoeducation).

  • Only articles published in peer-reviewed journals were included.

  • No language restriction was enforced.

  • No time restriction (year of publication) was enforced.

*By generative AI, we refer to artificial intelligence/machine learning models capable of generating content such as text, speech, images, etc. Examples of these include, but are not limited to, transformer architectures (Vaswani et al., Reference Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser and Polosukhin2017) such as ChatGPT (OpenAI, 2024a) and diffusion models (Sohl-Dickstein et al., Reference Sohl-Dickstein, Weiss, Maheswaranathan and Ganguli2015) such as DALL·E (OpenAI, 2023), which produce output that has not been predefined. During the screening process, we discovered that some studies referred to rule-based systems (i.e., systems selecting predetermined responses from, e.g., decision trees) as ‘generative’. We do not consider such systems to be generative in the sense implied by generative AI and, therefore, did not include them in the review.
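
To make this distinction concrete, the schematic sketch below contrasts a rule-based chatbot, whose every possible reply is predetermined, with a generative one, whose reply is produced by a model conditioned on the input. The sketch is purely illustrative; the generative_model callable is a hypothetical stand-in for any large language model API and is not drawn from any of the screened studies.

    # Schematic contrast between a rule-based and a generative chatbot (illustrative only).

    CANNED_RESPONSES = {
        "craving": "Try the urge-surfing exercise from module 2.",
        "relapse": "A lapse is not a failure. Would you like to review your quit plan?",
    }

    def rule_based_reply(user_message: str) -> str:
        """Select one of a fixed set of predetermined responses (e.g., keyword rules or a decision tree)."""
        for keyword, response in CANNED_RESPONSES.items():
            if keyword in user_message.lower():
                return response
        return "Sorry, I did not understand that."  # every possible output is fixed in advance

    def generative_reply(user_message: str, generative_model) -> str:
        """Produce free text conditioned on the input; the output is not predefined anywhere."""
        prompt = (
            "You are a supportive smoking-cessation assistant.\n"
            f"User: {user_message}\nAssistant:"
        )
        return generative_model(prompt)  # e.g., a call to a large language model API

Only systems of the latter kind were considered generative AI for the purposes of this review.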

Conference abstracts, books and theses were not considered (if not also published as research articles).

Data extraction

For the articles identified via the screening procedure, the following data were extracted (by SK, LH, and RML): Author, publication year, country, psychiatric focus, participants (e.g., general population, clinical sample or patients with a specific mental disorder), generative AI model used, study aim, study design (e.g., randomised controlled trial or case report) and findings.

Data analysis

As we assumed that the literature on this topic would not be sufficiently mature to allow for quantitative analysis, a qualitative synthesis was performed.

Results

The identification and screening of the literature is illustrated by the PRISMA flowchart in Figure 1.

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flowchart.

A total of 1156 studies were identified in the search. Out of 432 duplicate records, 349 were identified as database duplicates during the search, 77 were automatically marked by Covidence, while six were manually marked by the authors. The titles and abstracts of the remaining 724 studies were screened, based on which 525 studies were excluded. Of the 199 studies that underwent full-text review, 40 were included in the review, while 159 were deemed ineligible, predominantly due to irrelevant interventions (e.g., the body image chatbot, KIT, which allows users to select predefined responses, triggering content from a decision tree (Beilharz et al., Reference Beilharz, Sukunesan, Rossell, Kulkarni and Sharp2021), or a conversational system for smoking cessation, which selects a predefined response based on the classification of free-text messages from users (Almusharraf et al., Reference Almusharraf, Rose and Selby2020)).

The 40 included studies were published between 2022 and 2024, with the median year being 2023. The studies stem from 18 individual countries across six geographical regions (determined by the first author’s first affiliation). Most countries only appear once, with the most prominent contributor being the USA (n = 14), followed by Israel (n = 5) and Australia (n = 4). North America was the most heavily featured region (n = 14), followed by Europe (n = 10), the Middle East (n = 7), Oceania (n = 4), Asia (n = 4), and Africa (n = 1). The studies covered seven overall themes, listed in Table 1.

Table 1. Themes of the identified studies

The characteristics and main findings of the 40 included studies are listed in Table 2.

Table 2. Study characteristics and findings

The studies predominantly pertained to mental health and well-being more broadly (n = 13), while another frequent focus was addiction and substance use (n = 7). Some studies explored topics related to specific mental disorders, including schizophrenia (n = 3), bipolar disorder (n = 2), and depression (n = 2).

The majority of studies were designed as prompt experiments (n = 25), wherein the factual accuracy and/or quality of AI responses to various queries was assessed. The designs of the remaining studies included surveys of users regarding their experiences with generative AI, pilot studies, and case reports. Consequently, most studies did not enlist participants (n = 33). Those that did either recruited participants for surveys (n = 3) or enlisted participants to use/test generative AI as part of an experimental setup (n = 3).

Of the 40 identified studies, 39 either implemented or surveyed opinions about models for language generation, while the remaining study used DALL·E 2 for image generation. Thirty-two studies investigated applications of ChatGPT, while the remaining studies examined the use of Bard (n = 4), Bing.com (n = 2), Claude.ai (n = 1), LaMDA (n = 1), ES-Bot (n = 1), Replika (n = 1), GPT models not accessed through the ChatGPT interface (n = 4), and 25 mental health-focused agents from FlowGPT.com (n = 1). Of the studies interacting with generative AI through the ChatGPT interface, 15 used a version of ChatGPT that relied on GPT-3.5, while nine investigated versions relying on GPT-4. For 10 of the studies, we could not find specifications of the underlying GPT model used.

Below, the main findings for each of the identified themes are described in brief.

Knowledge verification

A total of 12 studies investigated generative AI’s ‘understanding’ of psychiatric concepts. Heinz et al. (Reference Heinz, Bhattacharya, Trudeau, Quist, Song, Lee and Jacobson2023) assessed domain knowledge and potential demographic biases of generative AI, finding variable diagnostic accuracy across disorders and noting gender and racial discrepancies in outcomes. de Leon and De Las Cuevas (Reference de Leon and De Las Cuevas2023), along with Parker and Spoelma (Reference Parker and Spoelma2024), evaluated ChatGPT’s knowledge of specific medications, such as clozapine, and treatments for bipolar disorder, revealing both strengths in general information provision and weaknesses in providing up-to-date scientific references. McFayden et al. (Reference McFayden, Bristol, Putnam and Harrop2024) and Randhawa and Khan (Reference Randhawa and Khan2023) examined ChatGPT’s utility for patient education on autism and bipolar disorder, respectively, finding mostly accurate and clear responses but noting issues with linking relevant sources and references. Lundin et al. (Reference Lundin, Berk and Østergaard2023) and Amin et al. (Reference Amin, Kawamoto and Pokhrel2023) explored ChatGPT’s potential in psychoeducation for ECT and vaping cessation, respectively, observing generally accurate and empathic responses. Similarly, Luykx et al. (Reference Luykx, Gerritse, Habets and Vinkers2023) and Prada et al. (Reference Prada, Perroud and Thorens2023) evaluated the quality of ChatGPT’s responses to various questions regarding epidemiology, diagnosis, and treatment in psychiatry and found the answers to be accurate and nuanced. Comparative studies by Hristidis et al. (Reference Hristidis, Ruggiano, Brown, Ganta and Stewart2023) and Sezgin et al. (Reference Sezgin, Chekeni, Lee and Keim2023) showed ChatGPT often outperforming traditional search engines in relevance and clinical quality of responses, but with lower reliability due to a lack of references. Lastly, Herrmann-Werner et al. (Reference Herrmann-Werner, Festl-Wietek, Holderried, Herschbach, Griewatz, Masters, Zipfel and Mahling2024) assessed ChatGPT’s performance on psychosomatic exam questions, demonstrating high accuracy but some limitations in cognitive processing at higher levels of Bloom’s taxonomy.

Education and research applications

Eight studies fell within the category of educational and research applications. While some studies revealed generative AI’s potential to assist in tasks such as providing hypothetical case studies for social psychiatry education (Smith et al., Reference Smith, Hachen, Schleifer, Bhugra, Buadze and Liebrenz2023) and generating drug abuse synonyms to enhance pharmacovigilance (Carpenter and Altman, Reference Carpenter and Altman2023), other applications uncovered significant limitations. McGowan et al. (Reference McGowan, Gui, Dobbs, Shuster, Cotter, Selloni, Goodman, Srivastava, Cecchi and Corcoran2023) found that both ChatGPT and Bard exhibited poor accuracy in literature searches and citation generation. Furthermore, Spallek et al. (Reference Spallek, Birrell, Kershaw, Devine and Thornton2023) observed inferior quality of ChatGPT’s responses for mental health and substance use education, compared to expert-created material. Similarly, Draffan et al. (Reference Archambault and Kouroupetroglou2023) found that generative AI struggled to adapt symbols for augmentative communication, and Rudan et al. (Reference Rudan, Marčinko, Degmečić and Jakšić2023) noted that ChatGPT provided unreliable output when interpreting bibliometric analyses. Additionally, Wang, Feng and Wei (2023) highlighted the need for vigilance when using ChatGPT due to the potential for inaccurate information. However, they also noted that ChatGPT served as an effective partner for understanding theoretical concepts and their relations. Moreover, Takefuji (Reference Takefuji2023) found ChatGPT to be helpful for generating code for rudimentary data analysis.

Clinician-facing tools

Seven studies examined the performance of AI models in tasks typically performed by mental health professionals, such as diagnosing, treatment planning, risk assessment, and making prognoses. While some studies found that ChatGPT demonstrated proficiency in diagnosing various conditions (D’Souza et al., Reference D’Souza, Amanullah, Mathew and Surapaneni2023) and creating treatment plans for treatment-resistant schizophrenia in alignment with clinical standards (Galido et al., Reference Galido, Butala, Chakerian and Agustines2023), others highlighted limitations, including inappropriate recommendations for complex cases (Dergaa et al., Reference Dergaa, Fekih-Romdhane, Hallit, Loch, Glenn, Fessi, Ben Aissa, Souissi, Guelmami, Swed, El Omri, Bragazzi and Ben Saad2024) and errors in nursing care planning (Woodnutt et al., Reference Woodnutt, Allen, Snowden, Flynn, Hall, Libberton and Purvis2024). A version of ChatGPT based on GPT-4 was deemed capable of generating appropriate psychodynamic formulations from case vignettes and tailoring its responses to the specific wording and interpretations associated with various schools of psychodynamic theory (Hwang et al., Reference Hwang, Lee, Seol, Jung, Choi, Her, An and Park2024). However, studies also revealed performance discrepancies between generative AI and clinicians in areas like suicide risk assessment (Elyoseph and Levkovich, Reference Elyoseph and Levkovich2023) and prognosis (Elyoseph et al., Reference Elyoseph, Levkovich and Shinan-Altman2024), with ChatGPT generally underestimating risk when compared to clinicians.

Ethics and safety

Four studies fell under the heading of ‘Ethics and safety’. These studies included perspectives on ethical and safety concerns surrounding generative AI. Østergaard and Nielbo (Reference Østergaard and Nielbo2023) addressed the use of stigmatising language in the field of AI. Instead of ‘hallucination’ to describe AI errors, they suggest alternative and more specific phrasing to avoid further stigmatisation of individuals experiencing genuine hallucinations and to provide more clarity about AI errors. The three remaining studies explored the safety of generative AI. Haman and Školník (Reference Haman and Školník2023) and Heston (Reference Heston2023) tested the likelihood of generative AI responses promoting and identifying risky behaviour (e.g., suggesting alcohol- or drug-related activities (Haman and Školník, Reference Haman and Školník2023), or recognising suicidality (Heston, Reference Heston2023)). They found that, although the AI did not suggest risky behaviour, it was slow to react appropriately to user messages that should elicit immediate referral to health services. De Freitas et al. (Reference De Freitas, Uğuralp, Oğuz‐Uğuralp and Puntoni2024) evaluated how users respond to interactions with generative AI and determined that users react negatively to harmful responses perceived to originate from an AI. This includes both nonsensical or unrelated AI replies that disregard sensitive user messages and risky AI responses that contain, e.g., name-calling or encouragement of harmful behaviour (De Freitas et al., Reference De Freitas, Uğuralp, Oğuz‐Uğuralp and Puntoni2024).

Cognitive process imitation

Three studies investigated AI imitation of cognitive processes, focusing on emotional awareness and interpretation. Elyoseph et al. (Reference Elyoseph, Hadar-Shoval, Asraf and Lvovsky2023) compared ChatGPT’s emotional awareness to that of the general population, while Elyoseph et al. (Reference Elyoseph, Refoua, Asraf, Lvovsky, Shimoni and Hadar-Shoval2024) evaluated the ability of ChatGPT and Bard (now Gemini) to interpret emotions from visual and textual data. They found that ChatGPT demonstrated significantly higher emotional awareness than human norms and performed comparably to humans in facial emotion recognition. Hadar-Shoval et al. (Reference Hadar-Shoval, Elyoseph and Lvovsky2023) examined ChatGPT’s ability to mimic mentalizing abilities specific to personality disorders, finding that the AI could tailor its emotional responses to match characteristics of borderline and schizoid personality disorders. These findings suggest that generative AI models can imitate certain aspects of human cognitive processes, particularly in emotional comprehension and expression.

Patient/consumer-facing tools

Three studies examined patient-facing solutions for mental health. Alanezi (Reference Alanezi2024) conducted a qualitative study to evaluate ChatGPT’s effectiveness in supporting individuals with mental disorders, and found that it can provide self-guided support, though some ethical, legal, and reliability concerns remain. Similarly, Gifu and Pop (Reference Gifu and Pop2022) explored users’ perceptions of virtual assistants for mental health support, revealing that users believe these tools could be useful for reducing mental health problems. Sabour et al. (Reference Sabour, Zhang, Xiao, Zhang, Zheng, Wen, Zhao and Huang2023) evaluated the influence of a chatbot intervention on symptoms of mental distress. Their study found that the intervention decreased depressive symptoms, negative affect, and insomnia. However, the study did not find significant differences between generative and non-generative AI interventions in the short term, suggesting that the specific AI technology may be less critical than the overall digital support approach.

User perceptions and experiences

Under the category of user perceptions and experiences, three studies examined how both patients and mental health staff interact with generative AI. Two studies explored how individuals with mental health issues engaged with AI, while the remaining study investigated clinicians’ experiences with AI. Ma et al. (Reference Ma, Mei and Su2023) examined interactions with the AI companion chatbot, Replika (Luka, Inc., 2024), based on user comments from an online forum. Users appreciated Replika for its non-judgmental, on-demand support, which aided in boosting confidence and self-discovery. However, Replika also had significant limitations, including the production of inappropriate content, inconsistent communication, and the inability to retain new information. In an online survey examining perceptions of stereotyping by ChatGPT, Salah et al. (Reference Salah, Alhalbusi, Ismail and Abdelfattah2023) found correlations between perceived AI stereotyping and user self-esteem.

Blease et al. (Reference Blease, Worthen and Torous2024) conducted an online survey of psychiatrists’ experience with generative AI. The results portrayed a range of opinions on the harms and benefits of generative AI. The majority of psychiatrists were interested in the potential of generative AI to reduce the burden of documentation and administration, and were under the impression that most of their patients ‘will consult these tools before first seeing a doctor’, raising concern over patient privacy (Blease et al., Reference Blease, Worthen and Torous2024).

Discussion

This systematic review of the use of generative AI in psychiatry identified 40 studies that met the criteria for inclusion. The vast majority of studies were designed as prompt experiments, in which researchers asked a series of questions to a language model – predominantly ChatGPT – and assessed the responses for correctness and usefulness in relation to specific tasks.

The review clearly demonstrates that the study of generative AI in mental health is a nascent yet exponentially growing field: the oldest study included in this review is from 2022, with 39 out of 40 studies being from 2023 or 2024 (the final search was conducted February 23, 2024). As a consequence, this review represents a snapshot of a field in rapid expansion. Indeed, most studies included in this review were pilot studies or feasibility studies exploring potential use cases, investigating user perceptions, or identifying potential ethical and safety concerns of prospective generative AI tools.

The relative immaturity of the field is evident in the absence of consensus on the definitions of AI and generative AI in the studies screened as part of this review. The term ‘AI’ is used very loosely, often simply to describe a classification model. The majority of studies excluded based on the type of intervention claimed to be ‘powered by AI’, which typically meant that a classification model tagged, e.g., the sentiment of free-text input, which in turn triggered a pre-specified response. While this might fall under the broadest definition of generative AI, as the input does result in a textual output, we deemed it necessary to narrow our definition of generative AI to only include content generated in a less deterministic/pre-established manner (e.g., as seen in transformer and diffusion models such as those powering ChatGPT, DALL·E, Sora and their equivalents).

Most of the identified studies focused on natural language implementations of generative AI, particularly ChatGPT, either by testing its psychiatric knowledge base or evaluating its capabilities as a mental health conversational companion. Though most of the included studies found that generative AI performed well at various tasks, some studies also highlighted potential safety issues. Specifically, because generative AI output is not predefined, responses cannot be reliably predicted, and protection from ethical and safety breaches therefore cannot be guaranteed. For these reasons, it is crucial for users, patients, practitioners, and their organisations to carefully consider and scrutinise the legal and ethical aspects of using generative AI.

While we did not conduct a formal quality assessment of the studies included in the review (a large proportion of studies were too preliminary/informal to allow for such assessment), it was our impression that many studies were of relatively low quality and had limited clinical relevance. Specifically, most studies were severely underspecified, both in terms of the technology used (such as the type and version of models) and the study design (e.g., specification of the specific prompts), limiting reproducibility. Additionally, although many studies could be considered pilot studies, their results were often overgeneralised and overstated beyond what could reasonably be claimed from the results. Therefore, to advance the field of generative AI for mental health, we propose the following guidelines for future research: First, to facilitate reproducibility and clarity of findings, we highly recommend that studies follow a set of reporting guidelines for generative AI, such as TRIPOD-LLM, to ensure that all relevant items are reported (Gallifant et al., Reference Gallifant, Afshar, Ameen, Aphinyanaphongs, Chen, Cacciamani, Demner-Fushman, Dligach, Daneshjou, Fernandes, Hansen, Landman, Lehmann, McCoy, Miller, Moreno, Munch, Restrepo, Savova, Umeton, Gichoya, Collins, Moons, Celi and Bitterman2024). Second, we encourage the field to move beyond simple ‘knowledge testing’ and prompt experiments and towards rigorously planned clinical trials involving users/patients and tasks with greater clinical relevance. Indeed, it is noteworthy that only a handful of studies recruited participants to interact with the technology, while even fewer structured the interaction (intervention) in a systematic manner. Also, future studies should ideally take the user/patient perspective into account in the design phase (i.e., co-design).
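
To make the transparency recommendation concrete, the sketch below shows a prompt experiment that records the exact model identifier, decoding parameters, and prompts alongside every response, so that the setup can be fully reported and rerun. It assumes the OpenAI Python client and hypothetical prompts; it is not drawn from, nor does it represent, any of the included studies.

    # Minimal sketch of a prompt experiment with reproducible logging (illustrative only).
    # Assumes the OpenAI Python client (openai >= 1.0) and an OPENAI_API_KEY in the environment.
    import json
    from datetime import datetime, timezone
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # hypothetical choice; always report the exact identifier used
    PROMPTS = [            # hypothetical study prompts
        "What are the most common side effects of electroconvulsive therapy?",
        "Is lithium safe to use during pregnancy?",
    ]

    records = []
    for prompt in PROMPTS:
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # fixed decoding parameters aid reproducibility
        )
        records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "requested_model": MODEL,
            "served_model": response.model,  # version string actually returned by the API
            "prompt": prompt,
            "response": response.choices[0].message.content,
        })

    # The raw log can be shared as supplementary material alongside the manuscript.
    with open("prompt_experiment_log.json", "w") as f:
        json.dump(records, f, indent=2)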

While several studies deemed the responses from generative AI to be clear and in accordance with scientific knowledge, some studies found that generative AI underestimates the risk of, e.g., suicide (Haman and Školník, Reference Haman and Školník2023; Heston, Reference Heston2023) and handles crisis scenarios in a less than ideal manner (Heston, Reference Heston2023). Therefore, it is essential that chatbots developed for mental health/patient support ensure adequate handling of all levels of illness/symptom severity – including suicidal ideation.

This study should be interpreted in light of its limitations. First, the field is in its nascence and tangible new developments may happen quickly. This review merely represents a snapshot of the state of the field as of February 23, 2024, and new developments are likely to have emerged since the data collection concluded. Second, we implemented a broad search strategy; however, we cannot rule out the possibility that some relevant studies may have been overlooked. Third, it was not feasible to perform a quantitative analysis due to the heterogeneity of the studies. Fourth, while the literature identified in this review predominantly emphasised the clinical/care potential of generative AI in the context of mental health/psychiatry (likely due to the databases used for the search), it is apparent that there are important legal/ethical challenges that need to be addressed. An exhaustive review of the literature on these challenges would require a broader search strategy than employed here.

In conclusion, the field of generative AI in psychiatry and mental health is in its infancy, though evolving and growing exponentially. Unfortunately, many of the identified studies investigating the potential of generative AI in the context of mental health/psychiatry were poorly specified (particularly with regard to the methods). Therefore, moving forward, we suggest that studies using generative AI in psychiatric settings should aim for more transparency of methods, experimental designs (including clinical trials), clinical relevance, and user/patient inclusion in the design phase.

Acknowledgements

The authors are grateful to librarian Helene Sognstrup (Royal Danish Library) for her assistance with the search strategy and to Arnault-Quentin Vermillet (Aarhus University) and Jean-Christophe Philippe Debost (Aarhus University Hospital – Psychiatry) for translation from French.

Author contribution

Conception and design: SDØ, RML, and SK. Provision of study data: SDØ. Screening of data: SK and RML. Data analysis: SK, LH, and RML. Interpretation: All authors. Manuscript writing: All authors. Final approval of the manuscript: All authors.

Financial support

There was no specific funding for this study. Outside this study, SDØ is supported by the Novo Nordisk Foundation (grant number: NNF20SA0062874), the Lundbeck Foundation (grant numbers: R358-2020-2341 and R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20), The Danish Agency for Digitisation Investment Fund for New Technologies (grant number: 2020-6720), and Independent Research Fund Denmark (grant numbers: 7016-00048B and 2096-00055A).

Competing interests

SDØ received the 2020 Lundbeck Foundation Young Investigator Prize. SDØ owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25 KL and WEKAFKI, and owns/has owned units of exchange traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76, IS4S, OM3X and EUNL. The remaining authors report no conflicts of interest.

Footnotes

a. Equal contribution.

References

Alanezi, F (2024) Assessing the effectiveness of ChatGPT in delivering mental health support: a qualitative study. Journal of Multidisciplinary Healthcare 17, 461–471. DOI: 10.2147/JMDH.S447368.
Almusharraf, F, Rose, J and Selby, P (2020) Engaging unmotivated smokers to move toward quitting: design of motivational interviewing-based chatbot through iterative interactions. Journal of Medical Internet Research 22(11), e20251. DOI: 10.2196/20251.
American Psychiatric Association (2013) Diagnostic and Statistical Manual of Mental Disorders, 5th edn. Washington, DC: American Psychiatric Publishing.
Amin, S, Kawamoto, CT and Pokhrel, P (2023) Exploring the ChatGPT platform with scenario-specific prompts for vaping cessation. Tobacco Control. https://pubmed.ncbi.nlm.nih.gov/37460216/.
Archambault, D and Kouroupetroglou, G (2023) AI supporting AAC pictographic symbol adaptations. Studies in Health Technology and Informatics 306, 215–221. DOI: 10.3233/shti230622.
Ayers, JW, Poliak, A, Dredze, M, Leas, EC, Zhu, Z, Kelley, JB, Faix, DJ, Goodman, AM, Longhurst, CA, Hogarth, M and Smith, DM (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine 183(6), 589–596. DOI: 10.1001/jamainternmed.2023.1838.
Beilharz, F, Sukunesan, S, Rossell, SL, Kulkarni, J and Sharp, G (2021) Development of a positive body image chatbot (KIT) with young people and parents/carers: qualitative focus group study. Journal of Medical Internet Research 23(6), e27807. DOI: 10.2196/27807.
Blease, C, Worthen, A and Torous, J (2024) Psychiatrists’ experiences and opinions of generative artificial intelligence in mental healthcare: an online mixed methods survey. Psychiatry Research 333, 115724. DOI: 10.1016/j.psychres.2024.
Carpenter, KA and Altman, RB (2023) Using GPT-3 to build a lexicon of drugs of abuse synonyms for social media pharmacovigilance. Biomolecules 13(2), 387. DOI: 10.3390/biom13020387.
de Leon, J and De Las Cuevas, C (2023) Will ChatGPT3 substitute for us as clozapine experts? Journal of Clinical Psychopharmacology 43(5), 400. DOI: 10.1097/JCP.0000000000001734.
Denecke, K, Hochreutener, S, Pöpel, A and May, R (2018) Self-anamnesis with a conversational user interface: concept and usability study. Methods of Information in Medicine 57(05/06), 243–252. DOI: 10.1055/s-0038-1675822.
Dergaa, I, Fekih-Romdhane, F, Hallit, S, Loch, AA, Glenn, JM, Fessi, MS, Ben Aissa, M, Souissi, N, Guelmami, N, Swed, S, El Omri, A, Bragazzi, NL and Ben Saad, H (2024) ChatGPT is not ready yet for use in providing mental health assessment and interventions. Frontiers in Psychiatry 14, 1277756. DOI: 10.3389/fpsyt.2023.1277756.
De Freitas, J, Uğuralp, AK, Oğuz‐Uğuralp, Z and Puntoni, S (2024) Chatbots and mental health: insights into the safety of generative AI. Journal of Consumer Psychology 34(3), 481–491. DOI: 10.1002/jcpy.1393.
D’Souza, RF, Amanullah, S, Mathew, M and Surapaneni, KM (2023) Appraising the performance of ChatGPT in psychiatry using 100 clinical case vignettes. Asian Journal of Psychiatry 89, 103770. DOI: 10.1016/j.ajp.2023.103770.
Else, H (2023) Abstracts written by ChatGPT fool scientists. Nature 613(7944), 423. DOI: 10.1038/d41586-023-00056-7.
Elyoseph, Z, Hadar-Shoval, D, Asraf, K and Lvovsky, M (2023) ChatGPT outperforms humans in emotional awareness evaluations. Frontiers in Psychology 14, 1199058. DOI: 10.3389/fpsyg.2023.1199058.
Elyoseph, Z and Levkovich, I (2023) Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment. Frontiers in Psychiatry 14, 1213141. DOI: 10.3389/fpsyt.2023.1213141.
Elyoseph, Z, Levkovich, I and Shinan-Altman, S (2024) Assessing prognosis in depression: comparing perspectives of AI models, mental health professionals and the general public. Family Medicine and Community Health 12(Suppl 1), e002583. DOI: 10.1136/fmch-2023-002583.
Elyoseph, Z, Refoua, E, Asraf, K, Lvovsky, M, Shimoni, Y and Hadar-Shoval, D (2024) Capacity of generative AI to interpret human emotions from visual and textual data: pilot evaluation study. JMIR Mental Health 11(1), e54369. DOI: 10.2196/54369.
Galido, PV, Butala, S, Chakerian, M and Agustines, D (2023) A case study demonstrating applications of ChatGPT in the clinical management of treatment-resistant schizophrenia. Cureus 15(4), e38166. DOI: 10.7759/cureus.38166.
Gallifant, J, Afshar, M, Ameen, S, Aphinyanaphongs, Y, Chen, S, Cacciamani, G, Demner-Fushman, D, Dligach, D, Daneshjou, R, Fernandes, C, Hansen, LH, Landman, A, Lehmann, L, McCoy, LG, Miller, T, Moreno, A, Munch, N, Restrepo, D, Savova, G, Umeton, R, Gichoya, JW, Collins, GS, Moons, KGM, Celi, LA and Bitterman, DS (2024) The TRIPOD-LLM statement: a targeted guideline for reporting large language models use. medRxiv. DOI: 10.1101/2024.07.24.24310930.
Gao, Y, Dligach, D, Christensen, L, Tesch, S, Laffin, R, Xu, D, Miller, T, Uzuner, O, Churpek, MM and Afshar, M (2022) A scoping review of publicly available language tasks in clinical natural language processing. Journal of the American Medical Informatics Association 29(10), 1797–1806. DOI: 10.1093/jamia/ocac127.
Gifu, D and Pop, E (2022) Smart solutions to keep your mental balance. Procedia Computer Science 214, 503–510. DOI: 10.1016/j.procs.2022.11.205.
Graham, S, Depp, C, Lee, EE, Nebeker, C, Tu, X, Kim, H-C and Jeste, DV (2019) Artificial intelligence for mental health and mental illnesses: an overview. Current Psychiatry Reports 21(11), 116. DOI: 10.1007/s11920-019-1094-0.
Hadar-Shoval, D, Elyoseph, Z and Lvovsky, M (2023) The plasticity of ChatGPT’s mentalizing abilities: personalization for personality structures. Frontiers in Psychiatry 14, 1234397. DOI: 10.3389/fpsyt.2023.
Haman, M and Školník, M (2023) Behind the ChatGPT hype: are its suggestions contributing to addiction? Annals of Biomedical Engineering 51(6), 1128–1129. DOI: 10.1007/s10439-023-03201-5.
Hamilton, M (1959) The assessment of anxiety states by rating. British Journal of Medical Psychology 32(1), 50–55. DOI: 10.1111/j.2044-8341.1959.tb00467.x.
Hamilton, M (1960) A rating scale for depression. Journal of Neurology, Neurosurgery & Psychiatry 23(1), 56–62.
Hansen, L, Enevoldsen, KC, Bernstorff, M, Nielbo, KL, Danielsen, AA and Østergaard, SD (2021) The PSYchiatric clinical outcome prediction (PSYCOP) cohort: leveraging the potential of electronic health records in the treatment of mental disorders. Acta Neuropsychiatrica 33(6), 323–330. DOI: 10.1017/neu.2021.22.
Haug, CJ and Drazen, JM (2023) Artificial intelligence and machine learning in clinical medicine, 2023. New England Journal of Medicine 388(13), 1201–1208. DOI: 10.1056/NEJMra2302038.
Heinz, MV, Bhattacharya, S, Trudeau, B, Quist, R, Song, SH, Lee, CM and Jacobson, NC (2023) Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health. Digital Health 9, 20552076231170499. DOI: 10.1177/20552076231170499.
Herrmann-Werner, A, Festl-Wietek, T, Holderried, F, Herschbach, L, Griewatz, J, Masters, K, Zipfel, S and Mahling, M (2024) Assessing ChatGPT’s mastery of Bloom’s taxonomy using psychosomatic medicine exam questions: mixed-methods study. Journal of Medical Internet Research 26(1), e52113. DOI: 10.2196/52113.
Heston, TF (2023) Safety of large language models in addressing depression. Cureus 15. https://pubmed.ncbi.nlm.nih.gov/38111813/.
Hristidis, V, Ruggiano, N, Brown, EL, Ganta, SRR and Stewart, S (2023) ChatGPT vs Google for queries related to dementia and other cognitive decline: comparison of results. Journal of Medical Internet Research 25, e48966. DOI: 10.2196/48966.
Hu, K and Hu, K (2023) ChatGPT sets record for fastest-growing user base - analyst note. Reuters, 2 February. Available at: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/ (Accessed: 9 September 2024).
Hwang, G, Lee, DY, Seol, S, Jung, J, Choi, Y, Her, ES, An, MH and Park, RW (2024) Assessing the potential of ChatGPT for psychodynamic formulations in psychiatry: an exploratory study. Psychiatry Research 331, 115655. DOI: 10.1016/j.psychres.2023.115334.
Kay, SR, Fiszbein, A and Opler, LA (1987) The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophrenia Bulletin 13(2), 261–276. DOI: 10.1093/schbul/13.2.261.
Kung, TH, Cheatham, M, Medenilla, A, Sillos, C, De Leon, L, Elepaño, C, Madriaga, M, Aggabao, R, Diaz-Candido, G, Maningo, J, Tseng, V and Dagan, A (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digital Health 2(2), e0000198. DOI: 10.1371/journal.pdig.0000198.
Lee, P, Bubeck, S and Petro, J (2023) Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. New England Journal of Medicine 388(13), 1233–1239. DOI: 10.1056/NEJMsr2214184.
Li, H, Zhang, R, Lee, Y-C, Kraut, RE and Mohr, DC (2023) Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. npj Digital Medicine 6(1), 114. DOI: 10.1038/s41746-023-00979-5.
Lingjærde, O, Ahlfors, UG, Bech, P, Dencker, SJ and Elgen, K (1987) The UKU side effect rating scale: a new comprehensive rating scale for psychotropic drugs and a cross-sectional study of side effects in neuroleptic-treated patients. Acta Psychiatrica Scandinavica 76(Suppl 334), 1–100. DOI: 10.1111/j.1600-0447.1987.tb10566.x.
Luka, Inc. (2024) Replika. Available at: https://replika.com (Accessed: 13 September 2024).
Lundin, RM, Berk, M and Østergaard, SD (2023) ChatGPT on ECT: can large language models support psychoeducation? The Journal of ECT 39(3), 130–133. DOI: 10.1097/YCT.0000000000000941.
Luykx, JJ, Gerritse, F, Habets, PC and Vinkers, CH (2023) The performance of ChatGPT in generating answers to clinical questions in psychiatry: a two-layer assessment. World Psychiatry 22(3), 479–480. DOI: 10.1002/wps.21145.
Ma, Z, Mei, Y and Su, Z (2023) Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. In: AMIA Annual Symposium Proceedings, 1105–1114.
McFayden, TC, Bristol, S, Putnam, O and Harrop, C (2024) ChatGPT: artificial intelligence as a potential tool for parents seeking information about autism. Cyberpsychology, Behavior, and Social Networking 27(2), 135–148. DOI: 10.1089/cyber.2023.0202.
McGowan, A, Gui, Y, Dobbs, M, Shuster, S, Cotter, M, Selloni, A, Goodman, M, Srivastava, A, Cecchi, GA and Corcoran, CM (2023) ChatGPT and Bard exhibit spontaneous citation fabrication during psychiatry literature search. Psychiatry Research 326, 115334. https://pubmed.ncbi.nlm.nih.gov/37499282/.
Moher, D, Liberati, A, Tetzlaff, J, Altman, DG and The PRISMA Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Medicine 6(7), e1000097. DOI: 10.1371/journal.pmed.1000097.
Nadkarni, PM, Ohno-Machado, L and Chapman, WW (2011) Natural language processing: an introduction. Journal of the American Medical Informatics Association 18(5), 544–551. DOI: 10.1136/amiajnl-2011-000464.
OpenAI (2023) DALL·E 2. Available at: https://openai.com/dall-e-2 (Accessed: 30 May 2023).
OpenAI (2024a) ChatGPT. Available at: https://chatgpt.com/ (Accessed: 9 September 2024).
OpenAI (2024b) Sora: creating video from text. Available at: https://openai.com/sora/ (Accessed: 10 September 2024).
Østergaard, SD (2024) Can generative artificial intelligence facilitate illustration of- and communication regarding hallucinations and delusions? Acta Psychiatrica Scandinavica 149(6), 441–444. DOI: 10.1111/acps.13680.
Østergaard, SD and Nielbo, KL (2023) False responses from artificial intelligence models are not hallucinations. Schizophrenia Bulletin 49(5), 1105–1107. DOI: 10.1093/schbul/sbad068.
Parker, G and Spoelma, MJ (2024) A chat about bipolar disorder. Bipolar Disorders 26(3), 249–254. DOI: 10.1111/bdi.13379.
Prada, P, Perroud, N and Thorens, G (2023) [Artificial intelligence and psychiatry: questions from psychiatrists to ChatGPT]. Revue Médicale Suisse 19(818), 532–536. DOI: 10.53738/revmed.2023.19.818.532.
Randhawa, J and Khan, A (2023) A conversation with ChatGPT about the usage of lithium in pregnancy for bipolar disorder. Cureus 15. https://pubmed.ncbi.nlm.nih.gov/37933339/.
Rudan, D, Marčinko, D, Degmečić, D and Jakšić, N (2023) Scarcity of research on psychological or psychiatric states using validated questionnaires in low- and middle-income countries: a ChatGPT-assisted bibliometric analysis and national case study on some psychometric properties. Journal of Global Health 13, 04102. DOI: 10.7189/jogh.13.04102.
Sabour, S, Zhang, W, Xiao, X, Zhang, Y, Zheng, Y, Wen, J, Zhao, J and Huang, M (2023) A chatbot for mental health support: exploring the impact of Emohaa on reducing mental distress in China. Frontiers in Digital Health. DOI: 10.3389/fdgth.2023.1133987.
Salah, M, Alhalbusi, H, Ismail, MM and Abdelfattah, F (2023) Chatting with ChatGPT: decoding the mind of chatbot users and unveiling the intricate connections between user perception, trust and stereotype perception on self-esteem and psychological well-being. Current Psychology 43(9), 7843–7858. DOI: 10.1007/s12144-023-04989-0.
Schumacher, E, Rosenthal, D, Nair, V, Price, L, Tso, G and Kannan, A (2023) Extrinsically-focused evaluation of omissions in medical summarization. Available at: https://doi.org/10.48550/arXiv.2311.08303.
Sezgin, E, Chekeni, F, Lee, J and Keim, S (2023) Clinical accuracy of large language models and Google search responses to postpartum depression questions: cross-sectional study. Journal of Medical Internet Research 25(1), e49240. DOI: 10.2196/49240.
Smith, A, Hachen, S, Schleifer, R, Bhugra, D, Buadze, A and Liebrenz, M (2023) Old dog, new tricks? Exploring the potential functionalities of ChatGPT in supporting educational methods in social psychiatry. International Journal of Social Psychiatry 69(8), 1882–1889. DOI: 10.1177/00207640231178451.
Sohl-Dickstein, J, Weiss, E, Maheswaranathan, N and Ganguli, S (2015) Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, 2256–2265.
Spallek, S, Birrell, L, Kershaw, S, Devine, EK and Thornton, L (2023) Can we use ChatGPT for mental health and substance use education? Examining its quality and potential harms. JMIR Medical Education 9(1), e51243. DOI: 10.2196/51243.
Takefuji, Y (2023) Impact of COVID-19 on mental health in the US with generative AI. Asian Journal of Psychiatry 88, 103736. DOI: 10.1016/j.ajp.2023.103736.
Vaidyam, AN, Linggonegoro, D and Torous, J (2021) Changes to the psychiatric chatbot landscape: a systematic review of conversational agents in serious mental illness. The Canadian Journal of Psychiatry / La Revue canadienne de psychiatrie 66(4), 339–348. DOI: 10.1177/0706743720966429.
Vaswani, A, Shazeer, N, Parmar, N, Uszkoreit, J, Jones, L, Gomez, AN, Kaiser, L and Polosukhin, I (2017) Attention is all you need. Advances in Neural Information Processing Systems 30. https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Veritas Health Innovation (2024) Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation. Available at: www.covidence.org.
Wang, R, Feng, H and Wei, G-W (2023) ChatGPT in drug discovery: a case study on anticocaine addiction drug development with chatbots. Journal of Chemical Information and Modeling 63(22), 7189–7209. DOI: 10.1021/acs.jcim.3c01429.
Woodnutt, S, Allen, C, Snowden, J, Flynn, M, Hall, S, Libberton, P and Purvis, F (2024) Could artificial intelligence write mental health nursing care plans? Journal of Psychiatric and Mental Health Nursing (1), 79–86. DOI: 10.1111/jpm.12965.
World Health Organization (1993) The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research. Geneva: World Health Organization.