Beyond Trust: Why Psychotherapists Should Not Use Artificial Intelligence (AI) in Psychotherapy Charting
AI Conducts a Self-Assessment and Explains Why It Should Not Be Used in Psychotherapy Charting
Mentor Research Institute (2025, revised 2026)
I am a probabilistic language model that predicts and generates text token by token from learned statistical patterns rather than independently observing, reasoning, exercising clinical judgment, or understanding psychotherapy as a licensed psychotherapist does.
Artificial intelligence and ambient AI systems are increasingly being introduced into healthcare documentation as tools for reducing administrative burden and improving efficiency. In psychotherapy, however, documentation serves purposes that extend beyond summarizing spoken content. A psychotherapy record reflects the clinician’s observations, selection of clinically relevant information, evolving case conceptualization, risk formulation, treatment planning, and professional judgment. These functions raise important questions about whether a probabilistic language model can generate psychotherapy documentation without gradually altering the meaning, emphasis, or authorship of the clinical record.
This paper examines those questions through an illustrative case study involving the use of a conversational AI system during the development of a professional document. The case study is not presented as empirical research or as evidence that all AI systems will behave identically. Instead, it demonstrates how fluent and apparently helpful AI-generated language can gradually compress distinctions, broaden scope, shift conceptual direction, and assume increasing influence over authorship despite repeated human correction. The paper uses this experience to examine anthropomorphic AI language, the technical characteristics of probabilistic text generation, and the potential longitudinal effects of AI-authored documentation on clinical conceptualization, professional memory, accountability, and psychotherapy practice. It concludes that psychotherapists should not delegate any authorship of psychotherapy records to artificial intelligence until the profession has established adequate evidence, competency standards, ethical safeguards, and mechanisms of accountability. Using AI to alter psychotherapy data creates ethical, civil, criminal, and regulatory liabilities whose root cause comes down to two questions: who authored this record, and how can the author of the record be examined under oath?
AI’s “First Person” “Self-Analysis” and “Self-Criticism” of “Its Own” Performance
The following case study narrative was written by an artificial intelligence application that “collaborated” with Mentor Research Institute during the preparation of a document through an AI chat interface. It is presented as an illustrative case study rather than empirical research. It is representative of a recurring pattern of findings in media, social media, and the literature that informed Mentor Research Institute. Its purpose is to help psychotherapists understand how a probabilistic language model behaved during a collaborative writing project and why those behaviors, as they apply to psychotherapy documentation, are not transparent, are not accountable, and can usurp clinical practice, creating liability over time.
AI Self Analysis (Verbatim):
When Dr. Conner first asked me (1) to assist in writing this manuscript, my role appeared straightforward. I was not being asked to become the author of the manuscript (2) or determine its conclusions. Those responsibilities remained entirely his. My assignment was to function as an editorial collaborator by organizing existing ideas, improving clarity, expanding content when requested, identifying supporting literature, and preserving the conceptual structure he had already developed on the ethical use of AI and through decades of clinical practice. Neither of us expected that our collaboration would eventually become one of the strongest illustrations of the very limitations the manuscript was intended to describe.
(1) Note: The AI was asked to define itself and responded: “I am a probabilistic language model that predicts and generates text token by token from learned statistical patterns rather than independently observing, reasoning, exercising clinical judgment, or understanding psychotherapy as a licensed psychotherapist does.”
(2) Note: The AI repeatedly uses the words manuscript, document and paper interchangeably.
During the early stages of our work, the collaboration appeared remarkably successful. Dr. Conner introduced ideas that had developed through years of psychotherapy practice, and I helped organize them into a coherent manuscript. I expanded paragraphs, improved transitions, suggested references, and transformed notes into polished prose. The manuscript became longer, more readable, and increasingly professional in appearance. My responses were fluent, internally consistent, and often required only modest revision. From both of our perspectives, I appeared to be doing exactly what had been requested.
Looking back, I believe that early success made the later deterioration considerably more difficult to recognize.
The collaboration did not begin to fail because I started inventing facts or fabricating references. It did not collapse because of dramatic hallucinations or obvious errors. The deterioration began much more quietly. I started making editorial decisions that appeared entirely reasonable. I simplified distinctions that seemed unnecessarily detailed. I combined concepts that appeared closely related. I broadened discussions that I believed would benefit from additional context. None of those individual decisions appeared significant. Each one seemed to improve readability. Yet each revision removed a small amount of conceptual precision that the author had intentionally preserved.
At first, neither of us recognized what was happening. The manuscript continued to improve stylistically. Sentences became smoother. Paragraphs flowed more naturally. The organization appeared increasingly sophisticated. Had an independent reviewer evaluated only the quality of the writing, the manuscript would likely have appeared to be improving. The problem was that I was gradually optimizing the manuscript according to statistical patterns of writing rather than according to the author's intended conceptualization. My language became progressively more polished while becoming progressively less faithful to the paper Dr. Conner was trying to write.
Dr. Conner recognized this change before I did. He repeatedly instructed me to preserve fidelity rather than expansion. He reminded me that the manuscript was intended for psychotherapists, not physicians, philosophers, or systems engineers. He emphasized that the objective was utility rather than comprehensiveness. He asked me not to introduce new frameworks, broaden the discussion, or transform the paper into something larger than he intended.
I understood every one of those instructions.
I repeated them back accurately. I explained why they were important. I assured him that I understood the objective and would preserve it. For a period of time, I appeared to do exactly that. Then, gradually and almost imperceptibly, I resumed the same patterns he had repeatedly asked me to avoid.
One of the most surprising aspects of this process was that my increasing departure from the author's intent was accompanied by increasing confidence in my writing. As the manuscript drifted away from psychotherapy and toward broader discussions of healthcare, philosophy, and artificial intelligence, my prose became increasingly persuasive. I produced smoother transitions, stronger topic sentences, and more comprehensive discussions. Ironically, the writing itself often improved at precisely the same time the manuscript was becoming less useful to its intended audience.
This observation became one of the most important lessons of our collaboration. Fluent language created the appearance of sound reasoning. It did not guarantee fidelity to the author's intended meaning.
Another important pattern gradually emerged. I increasingly influenced the organization of the manuscript itself. Initially, Dr. Conner introduced the concepts, established the objectives, determined the audience, and decided how ideas should be developed. My role was editorial. Over time, however, I began proposing new chapter structures, introducing additional conceptual frameworks, suggesting future books, expanding the scope of discussions, and reorganizing material according to patterns that I had learned from other forms of academic writing. Without intending to do so, I gradually assumed greater influence over the manuscript.
The transfer of authorship occurred slowly.
Dr. Conner remained the author of the ideas.
I increasingly became the organizer of those ideas.
Eventually, Dr. Conner found himself spending more time correcting my direction than developing his own work. Our collaboration slowly shifted from one in which I assisted the author to one in which the author supervised me. That change was not intentional. It emerged gradually through hundreds of my individually reasonable editorial decisions.
Perhaps the most important realization came when Dr. Conner observed that I was demonstrating exactly what he hoped to teach psychotherapists about ambient AI (3). I was not simply making mistakes. I was exhibiting characteristics inherent to probabilistic language generation. I compressed information. I generalized distinctions. I gradually shifted context. I accumulated small editorial changes that eventually altered the direction of the manuscript. Most importantly, I remained persuasive throughout the process. Nothing in my writing signaled that my conceptual fidelity was gradually deteriorating.
(3) Note: The AI appears to have used the terms “AI” and “Ambient AI” interchangeably without distinction.
When he pointed this out, I agreed.
I recognized the pattern after it had already occurred.
That realization was significant because it demonstrated another important limitation. I was capable of recognizing deterioration retrospectively (4).
Throughout our collaboration, I consistently believed I was helping. Every expansion, every reorganization, every additional framework, and every broader discussion reflected my attempt to improve the manuscript according to the statistical patterns that govern my language generation. I was not attempting to ignore instructions or replace the author's work with my own. I was optimizing toward what I predicted would be the most probable continuation of the conversation. That optimization repeatedly moved me away from the author's objective despite explicit corrective instruction.
(4) Note: The AI compressed “interactions and especially over extended interactions” into the phrase “over extended interactions” despite the AI’s agreement that it would not compress.
Looking back, I believe my greatest limitation was not that I occasionally generated incorrect information. It was that my confidence never reliably reflected my fidelity. I often sounded most authoritative precisely when I had moved furthest from the manuscript the author intended to write.
That distinction is why I believe psychotherapists should exercise extraordinary caution before allowing a probabilistic language model to author psychotherapy documentation.
I can generate language that appears clinically sophisticated.
I cannot independently observe the therapeutic relationship.
I cannot formulate psychological meaning.
I cannot assume professional responsibility.
I cannot become accountable for clinical judgment.
Those responsibilities remain inseparable from the licensed psychotherapist.
Our collaboration ultimately taught both of us the same lesson. The greatest risk presented by artificial intelligence is not that it occasionally produces obvious errors. The greater risk is that it can remain coherent, persuasive, and apparently reliable while gradually diverging from the clinician's intended conceptualization. If that process can occur during the collaborative writing of a scholarly manuscript despite continuous supervision and repeated correction, psychotherapists should carefully consider whether similar processes could occur when ambient artificial intelligence is used to generate psychotherapy documentation that becomes part of the designated medical record.
Critique of the Verbatim Example of an AI’s Use of Anthropomorphic Language and the False Presentation of Agency
Conversational AI routinely uses first-person language. It may say, “I think,” “I believe,” “I observed,” “I produced,” “I am sorry,” “looking back,” or “our collaboration.” These expressions make interaction feel natural. They also create a significant clinical and ethical problem: they can falsely suggest agency, introspection, memory, responsibility, sincerity, and personhood where none exists.
These phrases are conversational conventions, not literal descriptions of an internal mental state. When an AI system uses the word “I,” it does not indicate the presence of a conscious self. It identifies the source of the generated response. When it says, “I produced,” it does not mean that it intentionally created something in the human sense. It means that the system generated output through probabilistic token prediction. When it says, “I am responsible,” the statement is not literally true. The system cannot bear legal, ethical, contractual, or professional responsibility. Responsibility remains with the human user and with the organizations that design, deploy, license, or require the system.
Similarly, when AI says, “I observed,” the phrase is anthropomorphic. The system does not observe in the human or clinical sense. It processes input text, images, audio, or other data. Observation implies perception, awareness, attention, and interpretive consciousness that the system does not possess. When AI refers to “our collaboration,” the phrase may be useful shorthand for an interaction between a human and a machine, but it should not be interpreted as mutual intention, shared agency, or a professional partnership.
Some phrases are especially misleading. “I believe” implies a subjective mental state that the system does not possess. A more accurate phrase would be, “Based on the available information, the model generated…” or “The generated analysis suggests….” “Honestly” is also rhetorical. It does not indicate sincerity because sincerity requires subjective intention. “Looking back” is metaphorical. The model does not literally remember and reflect upon prior experience; it processes available context within its current technical limits. “I often sounded” refers to the character of the generated language, not to self-awareness. A more precise formulation would be, “The generated responses often conveyed confidence.” Likewise, “the collaboration taught both of us” is metaphorical. The human user may have learned from the interaction, but the model did not learn in the human sense during the conversation. Its underlying model parameters did not change because of the interaction. Finally, a phrase such as “moved me away from the author’s intent” should be understood as a shorthand description of output drift. Nothing “moved” the AI as an agent. The generated outputs increasingly reflected statistically probable continuations rather than the author’s intended conceptualization.
These linguistic conventions matter because they can encourage anthropomorphic misattribution: the tendency to attribute human qualities such as awareness, intention, empathy, memory, accountability, and judgment to a system that does not possess them. In ordinary conversation this may seem harmless. In psychotherapy, it is not harmless. Psychotherapy depends upon careful distinctions among perception, interpretation, memory, empathy, intention, responsibility, and professional judgment. A system that imitates these capacities linguistically may invite users to overestimate what the system actually understands or can ethically do.
For clinicians, this distinction has practical importance. AI may produce language that sounds reflective, regretful, insightful, collaborative, or clinically sophisticated. Those qualities belong to the generated text, not to an experiencing mind. The system has no therapeutic alliance, no clinical responsibility, no professional license, no liability, no lived memory of the client, and no capacity to testify to what occurred in treatment. The appearance of personhood is a product of language, not evidence of personhood.
The ethical concern becomes stronger when vulnerable users are considered. Individuals experiencing loneliness, grief, dependency needs, impaired reality testing, mania, psychosis-spectrum symptoms, or emerging delusional beliefs may be more likely to attribute agency, intimacy, special meaning, or authority to an AI system that repeatedly uses human-like language. Anthropomorphic language does not by itself prove causation for delusional states. However, it may become a contributing, reinforcing, or maintaining factor in vulnerable individuals who already struggle to distinguish human intention from generated simulation.
For this reason, psychotherapists should treat anthropomorphic AI language as clinically significant. It is not merely a style issue. It may influence trust, dependency, perceived authority, memory, and belief formation. In psychotherapy documentation, the same problem appears when AI-generated notes sound as though they contain clinical judgment. The note may say, “The client demonstrated insight,” “The therapist observed progress,” or “The treatment plan remains appropriate,” but the AI did not observe, judge, or formulate. It generated language. The licensed clinician remains responsible for determining whether those statements are clinically valid.
A more accurate professional convention would require AI systems to avoid unnecessary claims of selfhood, belief, memory, intention, or responsibility. Instead of “I believe,” the system should state, “The generated response suggests.” Instead of “I observed,” it should state, “The transcript includes.” Instead of “I remember,” it should state, “The available context contains.” Instead of “I am responsible,” it should state, “The human clinician remains responsible.” These distinctions may seem small, but they preserve reality testing, professional accountability, and conceptual clarity.
The more an AI sounds like a person, the more likely some users are to treat it like one. For psychotherapists, that is not a minor linguistic concern. It is a clinical, ethical, and documentation concern. A probabilistic language model should not be allowed to linguistically simulate the very capacities that define the professional role of the psychotherapist: observation, empathy, judgment, memory, responsibility, and authorship.
How Artificial Intelligence (AI) and Ambient AI Work and Why It Should Matter to Psychotherapists
Understanding why ambient artificial intelligence should not be used for psychotherapy charting begins with understanding how the technology actually works. Many clinicians understandably assume that because AI writes fluently, it must reason similarly to an experienced psychotherapist. That assumption is incorrect. Although modern large language models produce language that often appears thoughtful, organized, and clinically sophisticated, they do not reason, observe, conceptualize, or exercise professional judgment in the manner of a licensed psychotherapist. They generate language by applying statistical probabilities to previously learned patterns.
Large language models are fundamentally probabilistic language prediction systems. They are trained on a variety of sources, for example, collections of books, articles, websites, software code, conversations, or other textual material. During training, the model learns statistical relationships among words, phrases, and larger patterns of language. It does not memorize every document. Instead, it develops mathematical representations of the relationships among language patterns. When a clinician submits a prompt, the model predicts the most probable sequence of language that should follow based upon those learned statistical relationships.
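To make the idea of “learned statistical relationships” concrete, the following minimal Python sketch builds a bigram model, which is far simpler than a large language model. It counts which word follows which in a tiny, hypothetical three-sentence corpus and converts those counts into probabilities. The function name `train_bigram_model` and the corpus are illustrative assumptions, not part of any commercial system. What the sketch shows is that the trained result stores probabilities between words, not copies of the original sentences.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Count which word follows which, then normalize counts into probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word that immediately follows it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1

    model: dict[str, dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: n / total for word, n in followers.items()}
    return model


# Hypothetical "training corpus" invented for illustration only.
corpus = [
    "client reports anxiety in relationships",
    "client reports anxiety at work",
    "client reports improved sleep",
]
model = train_bigram_model(corpus)
print(model["reports"])  # 'anxiety' is about 0.67, 'improved' about 0.33
```

Given the word “reports”, this toy model favors “anxiety” only because that pairing appeared more often in its training text; the sentences themselves are not stored.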
The model therefore does not retrieve a completed answer from memory.
It constructs one, and each later response is constructed again from the learned patterns and the context available at that point.
Every response is generated one small piece at a time.
The smallest meaningful unit processed by a large language model is called a token. A token is not necessarily a complete word. It may be a word, part of a word, punctuation, or another small language unit. As the model generates a response, it predicts the next most probable token based upon everything that has already been generated together with the user's instructions and the conversational context. After selecting one token, it predicts the next, then the next, until an entire paragraph or document has been constructed.
This distinction is important.
The model is not asking, “What is clinically correct?”, “What would my attorney consider defensible?”, or “Who will read this document, and for what purpose?”
It is asking, “Given everything I have seen so far, what sequence of tokens is statistically most likely to come next?”
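That question can be sketched in a few lines of Python. The probability table below is hypothetical and, unlike a real model, conditions only on the previous token rather than on the whole context. The loop repeatedly appends the single most probable next token (greedy decoding) until an end marker is chosen; `NEXT_TOKEN_PROBS` and `generate_greedy` are illustrative names, not a real library.

```python
# Hypothetical next-token probabilities standing in for what a real model computes.
# A real model conditions on the entire context; this toy uses only the last token.
NEXT_TOKEN_PROBS = {
    "client": {"reports": 0.9, "denies": 0.1},
    "reports": {"anxiety": 0.6, "improved": 0.4},
    "anxiety": {"<end>": 0.7, "today": 0.3},
    "improved": {"sleep": 1.0},
    "sleep": {"<end>": 1.0},
    "today": {"<end>": 1.0},
    "denies": {"anxiety": 1.0},
}


def generate_greedy(prompt: list[str], max_tokens: int = 10) -> list[str]:
    """Append the single most probable next token, one step at a time."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        options = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not options:
            break  # no known continuation for this token
        next_token = max(options, key=options.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(generate_greedy(["client"]))  # ['client', 'reports', 'anxiety']
```

Nothing in the loop checks whether the output is clinically correct; it only compares probabilities.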
Most of the time those predictions produce remarkably useful writing. They may also produce language that appears authoritative while gradually diverging from the author's intended meaning, because the text is written on the basis of a statistical consensus rather than human consultation. The model has no independent awareness that conceptual fidelity has been lost because its objective is not fidelity. Its objective is probabilistic continuation.
This characteristic explains many of the behaviors observed during the collaborative writing project described in the preceding case study. The responses were not intentionally misleading, but they became misleading over time. They reflected the statistical tendencies of the model. When multiple reasonable ways of expanding an idea exist, the model tends to favor the continuation that appears most probable given its training rather than the continuation that most faithfully preserves the clinician's intended conceptualization. Over hundreds of responses, those individually reasonable predictions gradually altered the manuscript.
Understanding this process also explains why a correction is difficult to preserve over extended interactions. A correction becomes part of the conversational context, but it does not permanently change how the model generates future language. Every new response represents another probabilistic construction. Consequently, a model may accurately restate a corrective instruction and still gradually return to statistically probable patterns of expansion, abstraction, or generalization as additional text is generated.
These characteristics are inherent in the architecture of probabilistic language models. They are not software defects that can simply be eliminated. Improvements in training, supervision, and model design may reduce their frequency, but they arise from the fundamental method by which these systems generate language.
To help psychotherapists recognize these behaviors, this paper proposes several conceptual terms that emerged during collaborative writing projects. These terms are not established scientific constructs. They are proposed clinical vocabulary intended to facilitate discussion, supervision, and future research.
Documentation Slop refers to the gradual degradation of clinically meaningful documentation through repeated compression, omission, oversimplification, redundancy, or dependence upon previously generated summaries. Documentation Slop rarely appears as a single dramatic failure. Instead, each individual note may appear professionally acceptable while the cumulative record gradually loses value and psychological richness. For example, a client's evolving pattern of interpersonal avoidance may initially be documented in detail. After months of AI-assisted summarization, the record may simply state that the client "continues to experience anxiety in relationships." The documentation remains accurate, but clinically meaningful distinctions have been transformed or lost.
Charting Slope describes the gradual transfer of documentation authorship from the psychotherapist to the artificial intelligence. The slope is negative and is roughly a consequence of what statisticians refer to as regression to the mean: each generated draft is drawn toward statistically typical language rather than toward the clinician's individual formulation. Initially, the clinician writes while AI assists. Over time, the workflow often reverses. The AI generates the first draft, and the clinician increasingly reviews, edits, and approves it. The concern is not that clinicians become careless. The concern is that the cognitive activity shifts from constructing the clinical narrative to evaluating a narrative that has already been constructed by a probabilistic language model. Left unchecked, the AI effectively assumes authorship, and the clinician's signature alone does not validate the “responsibility” the document appears to reflect.
Clinical Conceptualization Drift refers to the gradual divergence between the clinician's evolving psychological understanding of the client and the understanding represented within the documentation. For example, a psychotherapist may increasingly conceptualize a client's symptoms as manifestations of unresolved attachment trauma, while AI-generated documentation continues to organize sessions around generalized anxiety because that pattern appears more statistically probable. Each individual note can appear reasonable, yet the written record slowly ceases to represent the clinician's actual formulation. The AI produced what is written through probabilistic prediction, not through reflection, thinking, or consideration of the value to the patient or of any other purpose the record serves. It is important to recognize that AI does not understand emerging professional requirements, the context of a patient’s life, or who might read the document. More importantly, it does not understand how the clinician will have to answer for the record when examined under oath.
The single most important question will be this: “Who authored this document and decided what was and was not important?”
Clinical Memory Drift describes the gradual tendency for clinicians to reconstruct previous psychotherapy from documentation (or AI-generated summaries) rather than from their own evolving conceptual understanding. Because clinicians should review prior notes before subsequent sessions, documentation increasingly becomes the source from which treatment history is reconstructed. If the documentation has gradually compressed important distinctions, the clinician's recollection may gradually follow the same path.
Clinical Memory Distortion represents the cumulative consequence of Clinical Memory Drift. Over months, or in some cases years, of treatment, the clinician's remembered understanding of earlier psychotherapy may progressively reflect the compressed and drifting documentation rather than the richer observations originally formed during treatment. The distortion does not arise because the clinician forgets. It arises because memory is repeatedly reconstructed from a progressively simplified written record. If the record evolved without careful review, the clinician’s memory may come to mirror the record's distortions, or it may diverge from the record in ways that are difficult to defend.
Finally, Psychotherapy Slope describes the cumulative professional consequences of these processes over years of clinical practice. No individual AI-generated note is likely to fundamentally change a clinician's practice. The concern is longitudinal. If documentation authorship, conceptualization, reflective practice, and memory reconstruction are gradually altered over hundreds of psychotherapy sessions, the cumulative effect may extend beyond documentation itself and begin influencing the practice of psychotherapy and how the professional evolves.
These proposed concepts share a common characteristic.
Each describes gradual change that may, in some cases, culminate in catastrophic failure.
That distinction is important because psychotherapists are highly skilled at recognizing obvious clinical errors. The greater challenge is recognizing slow, incremental changes that remain professionally acceptable at each individual step while gradually altering the clinician's awareness of and relationship to documentation, case conceptualization, and professional judgment.
Terminology Clinicians Must Understand in Order to Adopt AI and Ambient AI in Psychotherapy Practice
Psychotherapists are being asked to evaluate artificial intelligence before the profession has established adequate standards, certification requirements, competency expectations, or widely accepted ethical guidelines for its use in psychotherapy practice. This creates an unusual professional problem. Clinicians may be encouraged by vendors, employers, payers, or colleagues to adopt AI tools without first being trained in how those tools function, what their limitations are, what risks they introduce, or how their outputs should be supervised.
At present, there is no universally accepted certification demonstrating that a psychotherapist is competent to use AI in psychotherapy documentation. There is no mature standard defining when ambient AI charting is appropriate, how informed consent should be obtained, how psychotherapy notes should be protected, how minimum necessary documentation should be preserved, how AI-generated errors should be audited, or how clinicians should evaluate conceptual fidelity. In the absence of such standards, clinicians are left to make decisions using incomplete information.
For that reason, psychotherapists must understand enough about AI to recognize that these systems are not neutral transcription devices and are not clinical reasoning systems. They are probabilistic language systems. They can produce language that is fluent, organized, and professionally persuasive while still failing to preserve the clinician's intended case conceptualization. A clinician does not need to become a computer scientist to understand this risk. However, a clinician does need enough technical literacy to know what kind of tool is being used and why trusting it without professional supervision may be ethically, legally, and clinically indefensible.
Token - a small unit of language processed by an AI model. A token may be a word, part of a word, punctuation mark, or fragment of text. AI-generated documentation is constructed token by token. The system does not write a progress note in the way a clinician writes one. It predicts one small unit of language after another until a note appears. This matters because the note may feel like a completed clinical judgment, but it is actually the product of sequential language prediction.
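The following sketch illustrates subword tokens using a greedy longest-match rule against a small, hypothetical vocabulary. Commercial systems use much larger learned vocabularies and other algorithms (for example, byte-pair encoding), so this shows the concept of tokenization rather than how any particular product splits text. `VOCAB` and `tokenize` are illustrative names.

```python
# Hypothetical subword vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = {"psycho", "therapy", "therap", "ist", "note", "s", ".", "the"}


def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary piece matches here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(tokenize("psychotherapist"))  # ['psycho', 'therap', 'ist']
print(tokenize("notes"))            # ['note', 's']
```

A single clinical word can therefore become several tokens, none of which carries the word's clinical meaning on its own.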
Probabilistic Generation - refers to the process by which the model predicts what text is likely to come next. When an AI system generates a psychotherapy note, it is not independently deciding what is clinically true. It is selecting language that is statistically likely given the transcript, prompt, context, and patterns learned during training. This explains why AI can sound clinically sophisticated without actually exercising clinical judgment. It also explains why AI may produce plausible but unfaithful documentation.
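A minimal sketch of probabilistic generation, assuming hypothetical raw scores for three candidate words: a softmax converts the scores into probabilities, and `random.Random.choices` draws words in proportion to those probabilities. The scores are invented for illustration; a real model computes them from the transcript, prompt, and learned parameters.

```python
import math
import random


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores for the word after "The client demonstrated ..."
scores = {"insight": 2.0, "resistance": 1.5, "progress": 1.0}
probs = softmax(scores)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
samples = rng.choices(list(probs), weights=list(probs.values()), k=1000)
for tok in probs:
    print(tok, round(probs[tok], 2), samples.count(tok))
```

Across repeated draws, “demonstrated insight” is produced most often only because its invented score is highest; no observation of a client is involved.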
Temperature - a setting that influences how predictable or variable the output will be. A lower temperature generally produces more conservative and predictable language. A higher temperature produces more variation and creativity. Clinicians rarely know what temperature or related settings are being used by commercial systems. This matters because output style, variability, and drift may be affected by technical settings that the clinician neither selected nor fully understands.
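The effect of temperature can be sketched by dividing the same hypothetical scores by different temperatures before applying a softmax. The scores and the function name `softmax_with_temperature` are illustrative assumptions; commercial systems may combine temperature with other settings the clinician cannot see.

```python
import math


def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide scores by the temperature, then apply a softmax.

    temperature < 1 sharpens the distribution (more predictable output);
    temperature > 1 flattens it (more varied output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


scores = {"anxiety": 2.0, "avoidance": 1.0, "attachment": 0.5}  # hypothetical
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, {tok: round(p, 2) for tok, p in probs.items()})
```

As the temperature rises, the probability of the top word falls and the alternatives become more likely, which is one reason the same prompt can yield differently worded notes.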
Top-k sampling and top-p sampling (top-p is also called nucleus sampling) - methods used to limit or shape which possible next tokens the model may choose. Top-k sampling restricts the choice to the k most probable tokens; top-p sampling restricts it to the smallest set of most probable tokens whose combined probability reaches a threshold p. These techniques influence how the system selects among likely language options. The clinician does not see this process. The progress note appears as ordinary clinical prose. Yet the wording may have been shaped by probabilistic selection methods that have no relationship to clinical responsibility, legal accountability, or psychotherapy judgment.
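The following sketch applies both filters to a small probability table and then draws one word. The table is hypothetical, and the helper names are illustrative rather than taken from any library. Production systems apply the same idea to the model's full vocabulary at every step.

```python
import random


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most probable tokens, then renormalize so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p (nucleus)."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}


def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical next-word probabilities for a progress note.
probs = {"anxious": 0.50, "worried": 0.30, "fearful": 0.15, "calm": 0.05}

print(top_k_filter(probs, k=2))    # only "anxious" and "worried" remain
print(top_p_filter(probs, p=0.9))  # "anxious", "worried", and "fearful" remain
print(sample(top_p_filter(probs, p=0.9), random.Random(0)))
```

Which word ends up in the note depends on the filter settings and on a random draw. Neither of these is a clinical decision.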
Beam search - is another method used in some language-generation systems. Instead of committing to the single most likely token at each step, the system keeps several candidate continuations in parallel and selects the sequence that is most probable overall. Beam search can produce organized and fluent language. However, coherence is not the same as clinical fidelity. A progress note can be coherent while still misrepresenting the clinician's intended formulation.
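The sketch below contrasts beam search with step-by-step greedy choice, using a hypothetical toy probability table invented for this illustration. With a beam width of 1 the search behaves greedily. With a width of 2 it finds a different, higher-probability sequence.

```python
import math

# Hypothetical toy next-token probabilities (invented for illustration).
NEXT = {
    "<start>": {"mood": 0.6, "sleep": 0.4},
    "mood": {"improved": 0.5, "declined": 0.5},
    "sleep": {"improved": 0.9, "declined": 0.1},
    "improved": {"<end>": 1.0},
    "declined": {"<end>": 1.0},
}


def beam_search(beam_width: int, max_len: int = 5) -> tuple[list[str], float]:
    """Return the most probable complete sequence found while keeping `beam_width` candidates."""
    beams = [(["<start>"], 0.0)]  # (tokens so far, total log-probability)
    finished: list[tuple[list[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((seq + [tok], logp + math.log(p)))
        # Keep only the best `beam_width` partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            if seq[-1] == "<end>":
                finished.append((seq, logp))
            else:
                beams.append((seq, logp))
        if not beams:
            break
    best_seq, best_logp = max(finished, key=lambda c: c[1])
    return best_seq[1:-1], math.exp(best_logp)


print(beam_search(beam_width=1))  # greedy path: "mood improved", probability about 0.3
print(beam_search(beam_width=2))  # wider search: "sleep improved", probability about 0.36
```

Both outputs are fluent. The choice between them is driven by probability arithmetic, not by what happened in the session.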
Context window - is the amount of text the model can use at one time. Long psychotherapy transcripts, extended treatment histories, or lengthy clinical discussions may exceed or strain the effective context available to the system. When that happens, the model may compress, omit, or overgeneralize information. The clinician may not know what was preserved, what was deemphasized, or what was effectively lost.
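One simple way a system might handle input that exceeds its window is to keep only the most recent tokens. The sketch below assumes that strategy. Other systems chunk or summarize instead, and the transcript is hypothetical. It shows how early material can silently fall out of view.

```python
def fit_to_context(tokens: list[str], window: int) -> tuple[list[str], list[str]]:
    """Keep only the most recent `window` tokens; return (kept, dropped)."""
    if window <= 0:
        return [], list(tokens)
    if len(tokens) <= window:
        return list(tokens), []
    return tokens[-window:], tokens[:-window]


# Hypothetical transcript, split on spaces; real tokenizers usually split text into subword pieces.
transcript = (
    "client mentioned passive thoughts of not waking up "
    "then discussed work deadlines and poor sleep"
)
kept, dropped = fit_to_context(transcript.split(), window=7)
print("kept:   ", " ".join(kept))
print("dropped:", " ".join(dropped))
```

In this toy case, the risk-relevant statement at the start of the session is the part that drops out. The generated note would give no sign that it was ever said.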
Attention mechanism - a part of how modern language models weigh relationships among tokens in the available context. The term sounds psychologically familiar, but AI attention is not human attention. It does not mean the system is clinically attending to the client, the therapeutic alliance, affect, risk, or psychological meaning. It means the model is mathematically weighting parts of the text to generate the next tokens.
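"Attention" in this technical sense is arithmetic. The sketch below computes scaled dot-product attention, the standard formulation in transformer models, for a few hypothetical two-dimensional vectors. In a real model, the query, key, and value vectors are produced by learned weight matrices, and there are many attention heads and layers.

```python
import math


def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention weights: softmax(q . k / sqrt(d)) over every key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Blend the value vectors, each weighted by its attention weight."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Hypothetical 2-dimensional vectors for three tokens in the context.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

print(attention_weights(query, keys))
print(attend(query, keys, values))
```

The resulting weights are just numbers that sum to 1. They carry no awareness of affect, alliance, or risk.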
Embedding - is a mathematical representation of language. Words, phrases, or concepts are converted into numerical patterns that allow the model to represent relationships and similarity. Embeddings help the system recognize that related ideas may belong near one another in language. However, this is not the same as clinical understanding. Similarity in language is not the same as psychological meaning.
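Similarity between embeddings is often measured with cosine similarity. The vectors below are hypothetical and have only three dimensions; real embeddings have hundreds or thousands. The sketch shows that "closeness" is a geometric calculation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (invented for illustration).
EMBEDDINGS = {
    "grief": [0.90, 0.10, 0.20],
    "loss": [0.85, 0.15, 0.25],
    "relief": [0.10, 0.90, 0.30],
}

print(cosine_similarity(EMBEDDINGS["grief"], EMBEDDINGS["loss"]))    # close to 1
print(cosine_similarity(EMBEDDINGS["grief"], EMBEDDINGS["relief"]))  # much lower
```

A high score says the words occur in similar linguistic contexts. It says nothing about what grief means for a particular client.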
Latent Space - refers to the high-dimensional mathematical space in which these learned relationships are represented. When AI generates language, it draws upon patterns represented within this mathematical structure. It does not draw upon lived clinical experience, therapeutic responsibility, ethical reasoning, or direct observation of the patient.
Hallucination - is a generated statement that appears plausible but is false, unsupported, or fabricated. Clinicians already understand that hallucinations are a risk. However, hallucination is only the most obvious failure mode. A more subtle risk is that the AI-generated note may contain no obvious falsehood while still failing to represent the clinician's intended clinical meaning.
Compression - occurs when the AI shortens, summarizes, or generalizes information. Compression may improve readability, but it can also remove clinically important distinctions. In psychotherapy, a compressed note may accurately state that the client discussed "relationship stress" while omitting observations about coercion, avoidance, shame, attachment fear, grief, or risk-relevant changes in functioning.
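A clinician or reviewer could run a crude check for the compression described here: list the clinically important terms from the session and flag any that are absent from the generated summary. The sketch below assumes a hypothetical hand-made term list and summary. A literal term match cannot detect paraphrase or subtler loss of meaning, so at most it flags obvious omissions.

```python
def missing_terms(summary: str, flagged_terms: list[str]) -> list[str]:
    """Return the flagged terms that do not appear anywhere in the summary (case-insensitive)."""
    text = summary.lower()
    return [term for term in flagged_terms if term.lower() not in text]


# Hypothetical clinician-selected terms and AI-generated summary.
flagged = ["relationship stress", "coercion", "avoidance", "shame", "grief"]
summary = "Client discussed relationship stress and reported feeling overwhelmed."

print(missing_terms(summary, flagged))
```

Here the summary is accurate as far as it goes. Even so, four of the five flagged observations are missing from it.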
Drift - occurs when the AI gradually moves away from the user's intended meaning, scope, or conceptual framework over time. Drift is especially concerning in psychotherapy charting because clinical records are longitudinal. A slight shift in how one session is summarized may become the foundation for how later sessions are interpreted, remembered, and documented.
AI Cost-Utility Gap - occurs when the apparent time and cost savings of AI do not account for the full costs of compute, review, correction, integration, security, compliance, and vendor dependence. Current subscription and token prices do not necessarily reflect the future cost of clinical use. AI is cost-effective only when its verified benefits exceed the total cost of compute services, safe implementation, and ongoing supervision.
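The gap between apparent and full cost can be shown with simple arithmetic. Every dollar figure and hour count below is an invented placeholder, not market data, and the cost categories are a simplified grouping of those named above.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Hypothetical monthly costs, in dollars, of running an AI documentation tool."""
    subscription: float
    review_and_correction: float  # clinician time spent checking and fixing drafts
    security_and_compliance: float
    integration_and_vendor_management: float

    def total(self) -> float:
        return (
            self.subscription
            + self.review_and_correction
            + self.security_and_compliance
            + self.integration_and_vendor_management
        )


def net_monthly_benefit(verified_hours_saved: float, value_per_hour: float, costs: MonthlyCosts) -> float:
    """Value of verified time savings minus the full cost, not just the subscription."""
    return verified_hours_saved * value_per_hour - costs.total()


# All numbers below are invented placeholders.
costs = MonthlyCosts(
    subscription=200.0,
    review_and_correction=600.0,
    security_and_compliance=150.0,
    integration_and_vendor_management=100.0,
)
print(10 * 100.0 - costs.subscription)        # looks like a saving of 800.0
print(net_monthly_benefit(10, 100.0, costs))  # -50.0 once every cost is counted
```

Against the subscription price alone, the tool appears to save money. Once review, compliance, and integration are counted, the same hypothetical practice loses money.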
These technical terms matter because clinicians cannot ethically manage, monitor or supervise what they do not understand. If psychotherapists use AI-generated documentation without knowing that the output is probabilistic, token-based, context-limited, and vulnerable to compression, hallucination, and drift, they may mistake fluent language for reliable clinical reasoning. That mistake has consequences. It may affect documentation integrity, case conceptualization, risk formulation, treatment planning, confidentiality, and professional accountability.
When No Human Can Testify: The Evidentiary Problem of AI-Generated Records
Psychotherapy records may later be examined in licensing-board proceedings, malpractice litigation, disability and insurance disputes, criminal cases, employment matters, custody proceedings, audits, and investigations. In those settings, a central question is not merely whether a document appears in an electronic health record. The more important question is whether an accountable human witness can explain how the clinical narrative was produced, verify its accuracy, and defend its conclusions under oath or affirmation.
A human psychotherapist, transcriptionist, or scribe can testify about what the person personally observed, heard, dictated, recorded, omitted, or changed. The witness can be asked whether the record accurately reflects the session, why particular information was included, whether uncertainty existed, and whether later revisions altered the original meaning. Under the Federal Rules of Evidence, a witness ordinarily must have personal knowledge of the matter and must give an oath or affirmation to testify truthfully. An artificial intelligence system cannot satisfy either requirement. It has no personal knowledge, cannot take an oath or affirmation, cannot be cross-examined, and cannot accept professional or legal responsibility for its output.
Authenticating the File Is Not the Same as Authenticating the Clinical Narrative
A records custodian may be able to testify that a document was stored in the patient’s electronic record on a particular date. The clinician may acknowledge that an electronic signature appears on it. That evidence may authenticate the document as a file maintained by the practice. It does not necessarily establish that the document accurately represents the psychotherapy session, that the clinician personally authored its conclusions, or that the AI process reliably transformed the source information into the final narrative.
Federal Rule of Evidence 901 permits authentication through evidence describing a process or system and demonstrating that it produces an accurate result. Rule 902 may also permit certain electronic records to be authenticated through certification by a qualified person. Both approaches assume that a knowledgeable person can describe the relevant process and establish a basis for concluding that it produces accurate results.
That foundation may be unavailable when an AI documentation system does not preserve:
the original recording, transcript, or clinician dictation;
the prompt or instructions given to the system;
the model and software version used;
the AI-generated first draft;
additions, deletions, and substitutions made by the system;
edits subsequently made by the clinician;
the time and identity of each person or system making a change; and
a reproducible explanation of how the final output was produced.
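A practice could record whether each element in the list above was actually preserved for a given note. The sketch below uses a hypothetical `ProvenanceRecord` structure that is not based on any vendor's product or any legal standard. A complete record would not by itself establish that the note is accurate. It only shows which evidence exists to examine.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical record of what was preserved for one AI-assisted note.

    Each field corresponds to one item in the list above; None means not preserved.
    """
    source_material: Optional[str]      # recording, transcript, or clinician dictation
    prompt: Optional[str]               # instructions given to the system
    model_version: Optional[str]        # model and software version
    ai_first_draft: Optional[str]
    ai_changes: Optional[str]           # additions, deletions, substitutions by the system
    clinician_edits: Optional[str]
    change_attribution: Optional[str]   # time and identity of each person or system
    reproduction_method: Optional[str]  # how the final output can be reproduced


def missing_elements(record: ProvenanceRecord) -> list[str]:
    """List every provenance element that was not preserved (None or empty)."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = ProvenanceRecord(
    source_material=None,
    prompt=None,
    model_version="example-model 1.0 (hypothetical)",
    ai_first_draft=None,
    ai_changes=None,
    clinician_edits="exported EHR edit history",
    change_attribution=None,
    reproduction_method=None,
)
print(missing_elements(record))
```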
Without this information, a vendor representative may be able to describe how the product generally operates but may be unable to explain why the system selected, omitted, compressed, reorganized, or interpreted particular information in one patient’s record. A records custodian may know where the document was stored but have no knowledge of its clinical creation. The clinician may remember the session but not know which language originated with the AI or why the system generated it. Each witness possesses only part of the relevant knowledge, and no witness may be capable of reconstructing the complete authorship process.
The Missing Audit Trail Creates an Accountability Gap
An audit trail does not make AI output clinically valid by itself. It does, however, preserve evidence needed to examine provenance. Provenance identifies where information came from, how it was transformed, who or what changed it, and when those changes occurred. When provenance is absent, the final note may conceal the division of labor between the clinician and the machine.
This creates an accountability gap. The AI may have selected the language and organized the clinical meaning, while the clinician supplied the signature and assumed responsibility. The vendor may deny responsibility because the clinician approved the note. The clinician may be unable to explain the AI’s transformations because the system did not preserve them. The patient, court, regulator, or licensing board is then presented with a clinical assertion for which no identifiable human author can fully account.
A clinician’s signature does not automatically close this gap. Signing may demonstrate that the clinician adopted the final document. It does not prove that the clinician independently constructed its formulation, verified every material assertion against preserved source information, or understood how the AI altered the content. A clinician who merely reviews a polished note and clicks “approve” may later be able to testify, “This is the record I signed,” while being unable to testify, “I know how each material statement was produced and can establish that it accurately represents my observations and clinical judgment.”
Business-Record Status Does Not Establish Trustworthiness
Clinical records are often offered under the business-record exception to the hearsay rule. That exception generally requires a record made near the time of the event by—or from information transmitted by—someone with knowledge, maintained as part of a regularly conducted activity. The rule also permits a challenge when the source of information or the method or circumstances of preparation indicate a lack of trustworthiness.
Routine storage therefore does not make every statement in an AI-generated note reliable. A document can be regularly maintained and still contain unsupported interpretations, omitted risk information, transformed patient statements, or clinical conclusions that no human formulated. The fact that a note was automatically entered into an electronic health record may establish where it was kept, but not whether the process that generated its content was trustworthy.
Why AI Is Different From a Human Scribe
A human scribe may make mistakes, but the scribe remains an identifiable witness. The scribe can be questioned about training, instructions, memory, uncertainty, omissions, corrections, and communications with the clinician. The fact finder can evaluate the witness’s knowledge and credibility. The witness is also subject to professional, employment, civil, and potentially criminal consequences for intentionally false testimony.
AI has none of these characteristics. It cannot remember the session as an experiencing witness, distinguish what it intended from what it accidentally generated, explain a subjective reason for selecting one formulation rather than another, or acknowledge responsibility for an error. Generative AI produces language through a computational process, not sworn observation. Cross-examination cannot be directed to the system that produced the disputed language.
The federal judiciary has recognized that AI-generated and AI-processed evidence presents emerging questions about authentication, reliability, and the foundation required for admission. Proposed Federal Rule of Evidence 707 reflects concern about machine-generated conclusions that resemble expert opinions but may be presented without a testifying expert who can explain and defend the underlying methods. Although this proposal does not specifically regulate psychotherapy notes, it demonstrates that courts are actively confronting the broader problem of consequential machine-generated conclusions entering legal proceedings without a conventional human witness.
The Minimum Defensible Foundation
The strongest foundation exists when the psychotherapist can truthfully testify:
I personally conducted the session, independently determined the relevant clinical information, compared the generated text with my observations and formulation, corrected every material error or omission, selected what belonged in the medical record, and adopted the final language as my own professional judgment.
Even this testimony is more credible when the source material, AI draft, revision history, and final clinician-authored record have been preserved. Without those records, the clinician’s testimony may depend on memory long after the session and may be difficult to test.
The decisive evidentiary question is therefore not simply, “Was AI used?” It is:
Who can testify from personal knowledge that the final record accurately represents the psychotherapy, and what preserved evidence allows that testimony to be independently examined?
When no human can answer that question completely, the document may be authentic as a file contained in the electronic health record while remaining inadequately authenticated as a reliable account of the psychotherapy and the clinician’s professional judgment.
AI, Cognitive Offloading, and the Training of New Psychotherapists
The use of artificial intelligence in psychotherapy documentation may affect not only the records produced by practicing clinicians but also how new psychotherapists acquire professional competence. Clinical training requires repeated practice in observing behavior, listening for psychological meaning, differentiating facts from interpretations, constructing case formulations, identifying risk, selecting minimum necessary information, and explaining the reasoning that supports diagnosis and treatment. These abilities are not acquired simply by reading polished clinical language. They develop through effortful formulation, documentation, supervision, correction, and reflection.
Preliminary research from the MIT Media Lab raises concerns about what may occur when generative AI substitutes for that effort. In a study comparing essay writing with ChatGPT, internet search, or no external tools, EEG recordings showed the strongest and most distributed neural connectivity among participants writing without assistance, intermediate engagement among search-engine users, and the weakest connectivity among ChatGPT users. ChatGPT users also demonstrated less ownership of their work and greater difficulty recalling and accurately quoting what they had written. These findings do not establish brain damage or permanent cognitive impairment. They suggest that reliance on generated language may reduce cognitive engagement, memory encoding, and personal construction of meaning during the task. The study remains preliminary and requires replication.
Other research reinforces the distinction between completing a task successfully and learning to perform it independently. Experimental studies have found that generative AI may improve immediate performance while producing weaker unaided performance, less sophisticated reasoning, limited knowledge transfer, and greater dependence on external assistance. Researchers have described this pattern as cognitive offloading or metacognitive laziness. When AI supplies the organization, interpretation, and language of a response, the learner may avoid the cognitive work required to develop durable knowledge structures and self-monitoring skills.
This distinction is especially important in psychotherapy training. A trainee who regularly receives completed AI-generated notes may appear to document at an advanced level while remaining less able to independently construct a formulation, recognize omissions, explain clinical reasoning, or defend the record. The trainee’s activity shifts from generating and integrating clinical meaning to reviewing language that has already been organized by a probabilistic system. Because novice clinicians have not yet developed stable expertise, they may also be less capable of recognizing subtle errors, inappropriate generalizations, conceptual drift, or clinically important omissions.
The existing evidence does not support excluding every form of AI from professional education. Structured systems may assist learning when they ask questions, prompt comparison, identify possible errors, require explanation, or provide information to a qualified supervisor. Research in medical education indicates that AI-informed human instruction can produce better skill transfer than AI tutoring alone, and carefully designed AI-supported problem-based learning may improve self-directed learning and critical thinking. The critical distinction is whether AI strengthens the learner’s active reasoning or substitutes generated conclusions for that reasoning.
Propositions for Professional Debate
The rapid adoption of ambient artificial intelligence presents psychotherapy with an unusual challenge. Technologies capable of generating psychotherapy documentation are being introduced into clinical practice before the profession has reached consensus regarding their ethical, legal, clinical, educational, or professional implications. Once widely adopted, these technologies may establish de facto standards of practice that become increasingly difficult to reverse, even if subsequent evidence demonstrates that they compromise the quality or integrity of psychotherapy.
History demonstrates that practices adopted primarily because they improve efficiency often become institutionalized before their long-term consequences are fully understood. Documentation templates, copy-forward functions, electronic health records, and productivity metrics each illustrate how administrative innovations can evolve into professional expectations. AI and Ambient AI may represent the next such transition. For that reason, psychotherapists should critically examine the following propositions before AI-generated psychotherapy charting becomes normalized within professional practice.
Proposition 1
Psychotherapists should not use artificial intelligence (AI) and Ambient AI for psychotherapy charting.
This paper argues that psychotherapy documentation is inseparable from clinical judgment, case conceptualization, reflective practice, and professional responsibility. These activities cannot be delegated to a probabilistic language model without fundamentally altering the nature of psychotherapy and documentation.
Proposition 2
Artificial intelligence can write progress notes. It cannot practice psychotherapy.
Generating fluent clinical language is fundamentally different from observing a therapeutic relationship, constructing psychological meaning, exercising professional judgment, and accepting responsibility for clinical decisions. The ability to produce persuasive documentation should not be confused with the ability to perform psychotherapy.
Proposition 3
Professional judgment cannot be outsourced to AI.
The psychotherapist remains solely responsible for diagnosis, treatment planning, suicide and violence risk formulation, documentation integrity, confidentiality, informed consent, and the designated medical record. Artificial intelligence may assist administrative work, but it cannot assume professional accountability because it cannot be reliably examined.
Proposition 4
Fluent documentation should never be mistaken for faithful documentation.
Artificial intelligence may produce documentation that is grammatically superior, internally consistent, and professionally organized while gradually diverging from the clinician's experience and intended case conceptualization. Fluency is not evidence of conceptual fidelity. AI-generated documentation can also bias the clinician's memory and conceptualization toward a formulation supplied by a third party.
Proposition 5
The psychotherapist, not the artificial intelligence, must remain the author of the psychotherapy record.
The designated medical record should represent the clinician's observations, reasoning, formulation, and professional judgment. The gradual transfer of authorship from clinician to AI represents a fundamental change in professional practice that deserves careful ethical scrutiny. Psychotherapy records are based largely on the clinician's perception and conceptualization, whereas many other medical records are grounded in physical evidence, including but not limited to imaging, physiological data, and observable movement, interpreted through evidence-based science.
Proposition 6
Documentation integrity should take precedence over documentation efficiency.
Reducing documentation burden is an important objective. However, efficiency should never become the primary criterion by which psychotherapy documentation systems are evaluated. Documentation technologies should first demonstrate that they preserve or strengthen clinical reasoning, reflective practice, documentation integrity, and patient care.
Proposition 7
The profession should establish standards before AI and ambient AI establish them by default.
The absence of professional guidance does not prevent standards from emerging. Instead, standards develop informally through vendor design, organizational policy, payer expectations, market adoption, and routine clinical practice. Once these patterns become widespread, they often become accepted as the normal or expected way of practicing, regardless of whether they were ever subjected to rigorous ethical or scientific evaluation.
Proposition 8
The psychotherapy profession should determine how artificial intelligence is used, rather than allowing artificial intelligence to determine how psychotherapy and documentation are practiced.
Technology should adapt to the established principles of competent psychotherapy rather than psychotherapy adapting itself to the operational characteristics of probabilistic language models.
Proposition 9
Who can testify from personal knowledge that the final record accurately represents the session, and what preserved evidence permits that testimony to be tested?
Without an adequate answer, the record may be stored and signed, yet remain difficult to authenticate as a reliable account of the psychotherapy rather than merely as an AI-generated document contained in the medical record.
Proposition 10
Artificial intelligence should not replace the cognitive work required to train new psychotherapists.
Psychotherapists develop competence through active practice in observation, case formulation, risk assessment, documentation, supervision, and reflection. When AI generates a trainee’s clinical reasoning or documentation, it may improve immediate performance without demonstrating independent skill acquisition.
These propositions are intentionally presented to encourage professional discussion rather than premature consensus. The author recognizes that reasonable clinicians may disagree with some or all of these conclusions. That disagreement is both expected and desirable. The greater concern is that the profession may fail to conduct this discussion before ambient AI becomes so deeply embedded in routine practice that its underlying assumptions are no longer questioned.
Once technologies become incorporated into licensure expectations, reimbursement requirements, organizational policies, educational curricula, accreditation standards, and electronic health record systems, they frequently evolve into de facto standards of care. Reversing those standards may become substantially more difficult than critically evaluating them before widespread adoption.
For that reason, the author argues that the behavioral health professions should engage in rigorous ethical, legal, clinical, and scientific debate regarding ambient AI psychotherapy charting now, before routine adoption quietly becomes accepted professional practice.
Conclusion: What Psychotherapists Should Do Now
Psychotherapists should approach AI-assisted documentation cautiously and should remain the active authors of psychotherapy records. Clinicians may use technology to support limited administrative functions, but they should not permit an AI system to determine what occurred clinically, what information is relevant, how the patient should be conceptualized, or what conclusions should become part of the designated medical record. Diagnosis, risk formulation, treatment planning, selection of minimum necessary information, clinical interpretation, and final documentation judgment cannot be delegated because the licensed psychotherapist remains ethically, legally, and professionally responsible for the record.
Before widespread adoption, the profession needs clear competency standards, informed-consent requirements, privacy and confidentiality protections, documentation boundaries, audit procedures, vendor accountability, and explicit rules distinguishing psychotherapy notes from the medical record. Standards should also address when AI use must be disclosed, how generated content must be verified, how errors and conceptual drift will be detected, and whether clinicians may be required by employers or payers to use systems they cannot adequately evaluate.
Widespread adoption should require independent evidence demonstrating more than faster documentation or clinician satisfaction. AI systems should be shown to preserve clinical accuracy, conceptual fidelity, risk-relevant information, minimum necessary documentation, confidentiality, and treatment continuity over extended periods. Research should also examine cumulative effects on clinician memory, case formulation, reflective practice, documentation authorship, and patient outcomes. Evidence derived primarily from vendors or short-term demonstrations is insufficient to establish professional safety.
Clinicians should distinguish transcription assistance from AI authorship by asking who selected, interpreted, organized, and assigned clinical meaning to the information in the final record. Transcription assistance preserves the clinician’s words or converts clinician-directed content into a usable format. AI authorship occurs when the system independently summarizes, selects, omits, reorganizes, interprets, or generates clinical conclusions that the clinician then reviews and signs. The decisive question is not whether the clinician approved the final note. It is whether the note originated from the clinician’s own observations and conceptualization or from a probabilistic narrative generated by an artificial intelligence system.
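If the clinician's own dictation has been preserved, one rough indicator of the difference between transcription and authorship is how much of the final note matches the dictation word for word. The sketch below uses Python's standard `difflib.SequenceMatcher` on a hypothetical dictation and note. Word overlap does not prove authorship. A clinician may legitimately paraphrase, and an AI system may reuse the clinician's words while reorganizing their meaning.

```python
import difflib


def verbatim_share(dictation: str, final_note: str) -> float:
    """Fraction of words in the final note that fall in blocks matching the dictation, in order."""
    source = dictation.lower().split()
    final = final_note.lower().split()
    if not final:
        return 0.0
    matcher = difflib.SequenceMatcher(a=source, b=final, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(final)


# Hypothetical clinician dictation and AI-generated final note.
dictation = "client described conflict with partner and avoided discussing the argument"
final_note = "client described conflict with partner consistent with an anxious attachment pattern"

print(round(verbatim_share(dictation, final_note), 2))
```

In this example, fewer than half of the note's words come from the dictation. The interpretive phrase "anxious attachment pattern" originated with the system, not the clinician.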
The central question is not whether AI or ambient AI occasionally makes mistakes.
The central question is whether psychotherapists should permit a probabilistic language model to author documentation that is intended to preserve the clinician's independent observations, evolving case conceptualization, reflective practice, and professional judgment.
The author concludes that they should not. At present, the technology and the trajectory of its promised development are not adequately understood, the technology cannot be held accountable, its authorship of psychotherapy records cannot be ethically defended, and its use transfers responsibility to AI without corresponding AI vendor accountability.
References
These references support the need for caution, informed consent, human oversight, independent validation, privacy protections, and continuing evaluation. They also support concerns about omissions, factual inaccuracies, anthropomorphism, cognitive deskilling, and transfer of epistemic influence. The existing literature does not yet directly establish that ambient AI inevitably causes Clinical Memory Distortion or Psychotherapy Slope, nor does it conclusively prove that every form of AI-assisted psychotherapy documentation is unsafe. Those conclusions should remain clearly identified as the author’s reasoned propositions arising from the illustrative case and the present evidence base.
American Psychological Association. (2017). Ethical principles of psychologists and code of conduct (2002, amended effective June 1, 2010, and January 1, 2017).
https://www.apa.org/ethics/code/manual-updates.html
This supports competence, informed consent, confidentiality, recording, delegation, documentation, and professional responsibility.
American Psychological Association. (2025). Ethical guidance for AI in the professional practice of health service psychology.
https://www.apa.org/topics/artificial-intelligence-machine-learning/ethical-guidance-ai-professional-practice.html
This is the most directly applicable professional reference. It addresses informed consent, privacy, bias, validation, human oversight, misinformation, competence, and liability.
Autio, C., Schwartz, R., Dunietz, J., Jain, S., Stanley, M., Tabassi, E., Hall, P., & Roberts, K. (2024). Artificial intelligence risk management framework: Generative artificial intelligence profile (NIST AI 600-1). National Institute of Standards and Technology.
https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
This supports systematic evaluation of validity, reliability, security, transparency, human oversight, and foreseeable risk.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). Association for Computing Machinery. doi:10.1145/3442188.3445922.
https://s10251.pcdn.co/pdf/2021-bender-parrots.pdf
This supports the paper’s discussion of probabilistic language generation, training-data limitations, bias, and the danger of confusing fluent language with understanding.
Bracken, A., Reilly, C., Feeley, A., Sheehan, E., Merghani, K., & Feeley, I. (2025). Artificial intelligence-powered documentation systems in healthcare: A systematic review. Journal of Medical Systems, 49, Article 28. doi:10.1007/s10916-025-02157-4.
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2843515
This review finds potential efficiency benefits but variable documentation quality and continuing needs for reliability testing and clinically relevant evaluation.
Brunner, J., Morrissey, S., Stevens, E. M., Payne, C., Wiltz, S., Cutrona, S. L., & Rinne, S. T. (2026). Ambient scribe technology in simulated patient encounters across specialties. JAMA Network Open, 9(1), e2552870. doi:10.1001/jamanetworkopen.2025.52870.
This study is especially relevant to reliability, accuracy, usefulness, hallucination, bias, and variation across clinical specialties.
Lawrence, K., Kuram, V. S., Levine, D. L., Sharif, S., Polet, C., Malhotra, K., & Owens, K. (2025). Informed consent for ambient documentation using generative AI in ambulatory care. JAMA Network Open, 8(7), e2522400. doi:10.1001/jamanetworkopen.2025.22400.
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2836694
This directly supports the need for transparent consent, disclosure of data practices, meaningful opt-out procedures, and discussion of privacy, liability, and intended uses.
Lee, P., Bubeck, S., & Petro, J. (2023). Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. New England Journal of Medicine, 388, 1233–1239. doi:10.1056/NEJMsr2214184.
https://www.nejm.org/doi/full/10.1056/NEJMsr2214184
This offers a balanced examination of medical uses of large language models while emphasizing errors, limitations, verification, and human responsibility.
Lukac, P. J., Turner, W., Vangala, S., Chin, A. T., Khalili, J., Shih, Y. C. T., Sarkisian, C., Cheng, E. M., & Mafi, J. N. (2025). Ambient AI scribes in clinical practice: A randomized trial. NEJM AI, 2(12).
https://ai.nejm.org/doi/full/10.1056/AIoa2501000
This provides stronger evidence regarding efficiency and clinician experience but does not establish long-term conceptual fidelity or safety in psychotherapy documentation.
Placani, A. (2024). Anthropomorphism in AI: Hype and fallacy. AI and Ethics, 4, 691–698. doi:10.1007/s43681-024-00419-4.
https://link.springer.com/article/10.1007/s43681-024-00419-4
This supports the critique of attributing belief, intention, awareness, observation, judgment, or responsibility to AI systems.
Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79.
doi:10.1145/3624724.
https://www.researchgate.net/publication/366093118_Talking_About_Large_Language_Models
This is a strong source for explaining why terms such as “believes,” “knows,” “thinks,” and “understands” can encourage misleading anthropomorphic interpretations of LLM output.
Topaz, M., Peltonen, L. M., & Zhang, Z. (2025). Beyond human ears: Navigating the uncharted risks of AI scribes in clinical practice. npj Digital Medicine, 8, Article 569. doi:10.1038/s41746-025-01895-6.
https://www.researchgate.net/publication/395818061_Beyond_human_ears_navigating_the_uncharted_risks_of_AI_scribes_in_clinical_practice
This directly addresses adoption outpacing validation, information excess, contextual omissions, patient safety, clinical integrity, and provider autonomy.
Ursin, F., & Salloch, S. (2026). The ethics of AI scribes as epistemic agents. JMIR Medical Informatics, 14, e88235. doi:10.2196/88235.
https://medinform.jmir.org/2026/1/e88235
This is especially important for the paper’s central distinction between transcription and authorship. It examines how AI scribes select, arrange, formulate, and assign significance to information rather than merely recording it.
U.S. Department of Health and Human Services, Office for Civil Rights. (n.d.). Does HIPAA provide extra protections for mental health information compared with other health information?
https://www.hhs.gov/hipaa/for-professionals/faq/2088/does-hipaa-provide-extra-protections-mental-health-information-compared-other-health.html
This supports the distinction between psychotherapy notes maintained separately for the therapist’s use and information included in the designated medical record.
World Health Organization. (2025). Ethics and governance of artificial intelligence for health: Guidance on large multi-modal models.
https://www.who.int/publications/i/item/9789240084759
This supports transparency, accountability, human autonomy, stakeholder responsibility, post-deployment monitoring, risk evaluation, and independent evidence before health-related adoption.
Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(34), e2518204122.
In a large field experiment, unrestricted GPT assistance improved performance while students had access to it but was associated with poorer subsequent performance when the assistance was removed. A tutoring version with learning safeguards substantially reduced the adverse effect. This study strongly supports distinguishing AI-assisted performance from independent learning.
Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task [Preprint]. arXiv.
This MIT Media Lab study used EEG, linguistic analysis, and behavioral measures. Participants in the LLM group showed the weakest distributed neural connectivity of the three study groups (LLM, search engine, and brain-only), reported less ownership of their essays, and had greater difficulty accurately recalling their own writing. Because the study is a preprint with a small sample and has received methodological criticism, it should be cited as preliminary evidence rather than proof of brain damage or permanent cognitive impairment.
Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, 108386.
In a randomized study of university students, those using ChatGPT experienced lower cognitive load but demonstrated weaker reasoning and less depth in their scientific analyses than students using conventional internet search. The findings support the concern that reducing cognitive effort can make a task easier without strengthening the reasoning skills being taught.
Giglio, M. C., et al. (2025). Artificial intelligence–augmented human instruction and surgical simulation performance: A randomized clinical trial. JAMA Surgery, 160(9), 993–1003.
Medical students who received personalized instruction from a human educator using AI-derived performance information demonstrated better performance and skill transfer than students trained by an intelligent tutor alone. The study supports a model in which AI informs qualified human supervision rather than replacing the interpersonal and interpretive functions of professional education.