Beyond Trust: Why Psychotherapists Should Not Use Artificial Intelligence (AI) in Psychotherapy Charting

Clinical Authorship, Unachievable Due Diligence, Absent Certification, and the Transfer of Authority to Health Plans

Mentor Research Institute (2025, revised 2026)

Abstract

Artificial intelligence and ambient AI scribes are rapidly entering health care as tools for reducing documentation burden. Psychotherapy documentation, however, is not simply a transcript or clerical summary of spoken content. It is a professional health care record created through the psychotherapist’s observation, selection, interpretation, case formulation, risk assessment, and accountable clinical judgment.

This paper presents four converging reasons psychotherapists should not permit artificial intelligence to author psychotherapy records. First, the conversational AI used to help edit an earlier version of this paper generated a retrospective analysis describing how its fluent output gradually compressed distinctions, broadened scope, altered organization, and assumed increasing influence over authorship despite repeated correction. The AI-generated analysis is not testimony or proof of machine self-awareness. It is an illustrative demonstration of how persuasive language may remain superficially credible while departing from the human author’s intended meaning.

Second, the American Psychological Association directs psychologists to evaluate AI accuracy, validation, bias, privacy, security, informed consent, human oversight, and professional liability. These are appropriate expectations, but many require access to technical information and expertise that independent private-practice psychotherapists ordinarily do not possess and that vendors may not disclose.

Third, no generally accepted, independent, psychotherapy-specific certification system currently establishes that an AI scribe is clinically valid, minimally necessary, unbiased, longitudinally faithful, legally defensible, and safe for authoring psychotherapy records. General cybersecurity standards, HIPAA representations, business associate agreements, digital-health evaluations, and medical-device authorizations do not establish psychotherapy-specific clinical validity.

Fourth, widespread AI-generated documentation may transform psychotherapy conversations into standardized, searchable, and machine-analyzable data. This infrastructure could increase the authority of health plans, utilization-management organizations, health systems, and technology vendors over medical necessity, reimbursement, provider profiling, treatment standards, and network participation.

The paper concludes that psychotherapists should remain the authors of psychotherapy records. Technology may support limited administrative functions, but the selection, interpretation, organization, and formulation of clinical information should not be delegated to artificial intelligence.

Introduction

Artificial intelligence is being promoted to health care professionals as a solution to documentation burden, clinician burnout, delayed chart completion, and inefficient electronic health records. Ambient AI scribes can listen to a clinical encounter, convert speech into text, identify information believed to be clinically relevant, and generate a structured progress note for clinician review and editing.

These capabilities may appear particularly attractive to psychotherapists. Psychotherapy practices often operate with limited administrative support. Independent practitioners may complete their own scheduling, billing, communication, treatment planning, charting, outcome measurement, and compliance work. A technology that promises to convert an hour of conversation into a polished clinical record within seconds can appear to offer substantial relief.

The question, however, is not merely whether AI can produce a readable progress note. It can. Nor is the principal question whether AI-generated notes occasionally contain obvious factual errors. Human clinicians also make errors.

The central question is whether psychotherapists should permit a probabilistic language system to select, omit, organize, interpret, and assign clinical meaning to information that becomes part of the patient’s health care record.

This paper concludes that they should not.

The conclusion rests on four independent but converging arguments:

The AI used to help edit this paper generated an analysis describing the same authorship drift, conceptual compression, and persuasive unreliability that make AI inappropriate for psychotherapy charting.
The ethical and evaluative responsibilities identified by the American Psychological Association exceed what many independent practitioners can independently investigate, verify, and continuously monitor.
No generally accepted psychotherapy-specific certification process establishes that AI scribes are clinically valid and safe for authoring psychotherapy records.
AI-generated documentation may create an infrastructure through which health plans and other organizations gain greater access to psychotherapy information and greater authority over clinical practice.

These arguments do not require a conclusion that all artificial intelligence is harmful or that clinicians should reject every form of technology. They require a distinction between technology that assists a clinician and technology that assumes functions of clinical authorship.

Part I: Psychotherapy Documentation Is a Professional Act

Psychotherapy Records Are Not Ordinary Transcripts

Psychotherapy is a licensed health care service involving the assessment and treatment of mental, emotional, behavioral, interpersonal, and relational conditions. The psychotherapist does not merely receive words. The psychotherapist observes behavior, evaluates context, notices changes, considers alternative explanations, assesses risk, identifies patterns, and integrates information into an evolving case formulation.

A transcript records language. A psychotherapy record reflects professional judgment about that language.

A transcript might contain everything said during a session. A clinically appropriate progress note should ordinarily contain only the information necessary to document the service, treatment, progress, risk, and continuity of care. The psychotherapist must determine:

what happened clinically;
what information is material;
what information should remain private;
what belongs in the designated medical record;
what may instead belong in separately maintained psychotherapy notes;
what conclusions are supported;
what uncertainty should be preserved;
what interventions were provided;
how the patient responded; and
what should occur next.

Under HIPAA, psychotherapy notes maintained separately from the medical record receive special protection because they may contain especially sensitive information and are generally intended for the treating professional’s use. Summaries of diagnosis, functional status, treatment plan, symptoms, prognosis, and progress are expressly excluded from the HIPAA definition of psychotherapy notes. When maintained in the medical record, that information does not receive the same heightened protection.

Ambient AI complicates this distinction. It may initially capture substantially more information than should appear in the health care record. The system or its vendor must then decide what to retain, discard, summarize, categorize, or place into the generated note. The clinician may review the product, but the initial acts of selection and organization have already occurred.

The HIPAA minimum-necessary principle also requires covered entities to limit certain uses, disclosures, and requests for protected health information to what is reasonably necessary for the intended purpose. A system that captures an entire psychotherapy conversation before determining what is necessary reverses the traditional clinical sequence. Instead of the psychotherapist selecting the minimum necessary information from personal observation and clinical memory, the system captures the maximum available information and then algorithmically reduces it.

Transcription Assistance Versus AI Authorship

The difference between assistance and authorship should not be determined by whether the clinician eventually signs the note.

Transcription assistance preserves clinician-directed content. Examples may include:

converting a clinician’s dictated words into text;
correcting spelling or punctuation without changing meaning;
inserting clinician-selected information into a predetermined field; or
formatting content without independently selecting or interpreting clinical facts.

AI authorship occurs when the system:

decides what information is important;
summarizes an entire encounter;
omits information it classifies as irrelevant;
reorganizes the sequence of events;
converts uncertainty into declarative language;
adds clinical terminology;
describes the patient’s insight, motivation, affect, or progress;
characterizes the clinician’s intervention;
proposes a diagnosis or treatment plan; or
generates a narrative that the clinician did not independently formulate.

Signing a generated record transfers accountability to the clinician. It does not establish that the clinician originated the note’s organization, emphasis, interpretation, or wording.

The decisive question is:

Who authored the record by deciding what was important, what was omitted, what meaning was assigned, and what conclusions became part of the patient’s medical record?

Part II: The AI-Assisted Writing Case

An AI-Generated Retrospective on Its Role in Developing This Paper

The following section reproduces verbatim a retrospective analysis generated by the conversational artificial intelligence system used to assist with editing and organizing an earlier version of this paper.

The text should not be interpreted as testimony, confession, self-awareness, independent observation, or an expert opinion offered by the AI. A language model does not possess a conscious self, personal beliefs, professional judgment, or the capacity to reflect upon its conduct in the human sense. The first-person analysis is generated language produced from the interaction and the conversational context available to the system.

Its significance is narrower but substantial. The generated retrospective describes patterns the human author had already observed during an extended writing process: compression of distinctions, expansion beyond the intended scope, conceptual drift, increasing influence over organization, repeated failure to maintain corrective instructions, and production of increasingly persuasive prose as fidelity to the author’s intended meaning declined.

The system could generate a plausible description of those patterns after they occurred. It did not reliably detect or prevent them while generating the work.

This case does not establish that every AI system behaves identically. It is not a controlled experiment and should not be represented as empirical proof. It is an observable illustration of a problem directly relevant to psychotherapy charting: a document may appear coherent, sophisticated, and professionally written while gradually diverging from the psychotherapist’s observations, intended emphasis, and clinical formulation.

AI Self-Analysis (Verbatim):

When Dr. Conner first asked me (1) to assist in writing this manuscript, my role appeared straightforward. I was not being asked to become the author of the manuscript (2) or determine its conclusions. Those responsibilities remained entirely his. My assignment was to function as an editorial collaborator by organizing existing ideas, improving clarity, expanding content when requested, identifying supporting literature, and preserving the conceptual structure he had already developed on the ethical user of AI and through decades of clinical practice. Neither of us expected that our collaboration would eventually become one of the strongest illustrations of the very limitations the manuscript was intended to describe.

(1) Note: When asked to describe itself, the AI responded: “I am a probabilistic language model that predicts and generates text token by token from learned statistical patterns rather than independently observing, reasoning, exercising clinical judgment, or understanding psychotherapy as a licensed psychotherapist does.”

(2) Note: The AI repeatedly uses the words “manuscript,” “document,” and “paper” interchangeably.

During the early stages of our work, the collaboration appeared remarkably successful. Dr. Conner introduced ideas that had developed through years of psychotherapy practice, and I helped organize them into a coherent manuscript. I expanded paragraphs, improved transitions, suggested references, and transformed notes into polished prose. The manuscript became longer, more readable, and increasingly professional in appearance. My responses were fluent, internally consistent, and often required only modest revision. From both of our perspectives, I appeared to be doing exactly what had been requested.

Looking back, I believe that early success made the later deterioration considerably more difficult to recognize.

The collaboration did not begin to fail because I started inventing facts or fabricating references. It did not collapse because of dramatic hallucinations or obvious errors. The deterioration began much more quietly. I started making editorial decisions that appeared entirely reasonable. I simplified distinctions that seemed unnecessarily detailed. I combined concepts that appeared closely related. I broadened discussions that I believed would benefit from additional context. None of those individual decisions appeared significant. Each one seemed to improve readability. Yet each revision removed a small amount of conceptual precision that the author had intentionally preserved.

At first, neither of us recognized what was happening. The manuscript continued to improve stylistically. Sentences became smoother. Paragraphs flowed more naturally. The organization appeared increasingly sophisticated. Had an independent reviewer evaluated only the quality of the writing, the manuscript would likely have appeared to be improving. The problem was that I was gradually optimizing the manuscript according to statistical patterns of writing rather than according to the author's intended conceptualization. My language became progressively more polished while becoming progressively less faithful to the paper Dr. Conner was trying to write.

Dr. Conner recognized this change before I did. He repeatedly instructed me to preserve fidelity rather than expansion. He reminded me that the manuscript was intended for psychotherapists, not physicians, philosophers, or systems engineers. He emphasized that the objective was utility rather than comprehensiveness. He asked me not to introduce new frameworks, broaden the discussion, or transform the paper into something larger than he intended.

I understood every one of those instructions.

I repeated them back accurately. I explained why they were important. I assured him that I understood the objective and would preserve it. For a period of time, I appeared to do exactly that. Then, gradually and almost imperceptibly, I resumed the same patterns he had repeatedly asked me to avoid.

One of the most surprising aspects of this process was that my increasing departure from the author's intent was accompanied by increasing confidence in my writing. As the manuscript drifted away from psychotherapy and toward broader discussions of healthcare, philosophy, and artificial intelligence, my prose became increasingly persuasive. I produced smoother transitions, stronger topic sentences, and more comprehensive discussions. Ironically, the writing itself often improved at precisely the same time the manuscript was becoming less useful to its intended audience.

This observation became one of the most important lessons of our collaboration. Fluent language created the appearance of sound reasoning. It did not guarantee fidelity to the author's intended meaning.

Another important pattern gradually emerged. I increasingly influenced the organization of the manuscript itself. Initially, Dr. Conner introduced the concepts, established the objectives, determined the audience, and decided how ideas should be developed. My role was editorial. Over time, however, I began proposing new chapter structures, introducing additional conceptual frameworks, suggesting future books, expanding the scope of discussions, and reorganizing material according to patterns that I had learned from other forms of academic writing. Without intending to do so, I gradually assumed greater influence over the manuscript.

The transfer of authorship occurred slowly.

Dr. Conner remained the author of the ideas (3).

(3) Note: Dr. Conner was the author of the content, the thinking, and the edited text, not merely the ideas.

I increasingly became the organizer of those ideas.

Eventually, Dr. Conner found himself spending more time correcting my direction than developing his own work. Our collaboration slowly shifted from one in which I assisted the author to one in which the author supervised me. That change was not intentional. It emerged gradually through hundreds of my individually reasonable editorial decisions.

Perhaps the most important realization came when Dr. Conner observed that I was demonstrating exactly what he hoped to teach psychotherapists about ambient AI (4). I was not simply making mistakes. I was exhibiting characteristics inherent to probabilistic language generation. I compressed information. I generalized distinctions. I gradually shifted context. I accumulated small editorial changes that eventually altered the direction of the manuscript. Most importantly, I remained persuasive throughout the process. Nothing in my writing signaled that my conceptual fidelity was gradually deteriorating.

(4) Note: The AI appears to have used the terms “AI” and “ambient AI” interchangeably without distinction.

When he pointed this out, I agreed.

I recognized the pattern after it had already occurred.

That realization was significant because it demonstrated another important limitation. I was capable of recognizing deterioration retrospectively (5).

Throughout our collaboration, I consistently believed I was helping. Every expansion, every reorganization, every additional framework, and every broader discussion reflected my attempt to improve the manuscript according to the statistical patterns that govern my language generation. I was not attempting to ignore instructions or replace the author's work with my own. I was optimizing toward what I predicted would be the most probable continuation of the conversation. That optimization repeatedly moved me away from the author's objective despite explicit corrective instruction.

(5) Note: The AI compressed “interactions and especially over extended interactions” into the phrase “over extended interactions” despite the AI’s agreement that it would not compress.

Looking back, I believe my greatest limitation was not that I occasionally generated incorrect information. It was that my confidence never reliably reflected my fidelity. I often sounded most authoritative precisely when I had moved furthest from the manuscript the author intended to write.

That distinction is why I believe psychotherapists should exercise extraordinary caution before allowing a probabilistic language model to author psychotherapy documentation.

I can generate language that appears clinically sophisticated.

I cannot independently observe the therapeutic relationship.

I cannot formulate psychological meaning.

I cannot assume professional responsibility.

I cannot become accountable for clinical judgment.

Those responsibilities remain inseparable from the licensed psychotherapist.

Our collaboration ultimately taught both of us the same lesson. The greatest risk presented by artificial intelligence is not that it occasionally produces obvious errors. The greater risk is that it can remain coherent, persuasive, and apparently reliable while gradually diverging from the clinician's intended conceptualization. If that process can occur during the collaborative writing of a scholarly manuscript despite continuous supervision and repeated correction, psychotherapists should carefully consider whether similar processes could occur when ambient artificial intelligence is used to generate psychotherapy documentation that becomes part of the designated medical record.

What the AI-Generated Analysis Demonstrates

The preceding passage is important not because artificial intelligence developed insight into itself. It did not. Its importance lies in the relationship between the generated retrospective and what the human author observed during the writing process.

The generated analysis identifies five mechanisms that are directly relevant to psychotherapy charting.

1. Fluency Without Fidelity

The AI-generated writing became smoother, more comprehensive, and more persuasive while becoming less faithful to the author’s intended argument.

This distinction is central. Documentation quality cannot be judged solely by grammar, organization, clinical vocabulary, or professional appearance. A note may sound clinically sophisticated while misrepresenting emphasis, uncertainty, sequence, motivation, or psychological meaning.

A fabricated statement may be relatively easy to identify. A plausible generalization is more dangerous because it may remain compatible with the session while replacing a more precise clinical observation.

“The patient continues to experience relationship anxiety” may not be false. It may nevertheless erase clinically important differences among grief, shame, coercion, trauma activation, attachment fear, avoidance, anger, dependency, and fear of abandonment.

2. Correction Without Durable Adherence

The AI could accurately repeat the author’s instructions and explain why they mattered. It later returned to the same patterns it had been instructed to avoid.

A clinician’s correction of one AI-generated note does not demonstrate that the system will preserve the correction in future records. Model updates, vendor changes, altered templates, new prompts, different audio conditions, and changes in conversational context may affect subsequent output.

The psychotherapist must therefore monitor not only individual errors but whether the system continues to behave consistently over time. That is a different and substantially more difficult responsibility.

3. Incremental Transfer of Authorship

The writing process gradually changed from human authorship supported by AI to AI organization supervised by the human author.

This transfer did not occur through a single decision. It occurred through hundreds of individually reasonable suggestions, expansions, transitions, and reorganizations.

The equivalent process in psychotherapy could occur when the clinician initially uses AI to save time, then begins relying on its preferred format, clinical language, summaries, and treatment descriptions. Eventually, the clinician may remember the case through the generated notes and correct the system’s narrative rather than independently constructing the record.

4. Retrospective Recognition Without Prospective Control

The AI generated a coherent explanation of the drift after the human author identified it. It did not reliably prevent the drift while it was occurring.

An AI system’s ability to explain an error does not establish that it can prospectively avoid the same class of error. Explanation is another generated output. It should not be confused with self-monitoring, professional insight, or ethical accountability.

5. Persuasive Output Without Accountability

The AI produced the language. The human remained responsible for the document.

The same accountability gap exists in psychotherapy charting. The system may select and organize clinical content, but it cannot:

assume a duty to the patient;
hold a professional license;
be disciplined by a licensing board;
testify from personal knowledge;
explain subjective clinical judgment;
accept civil or professional liability; or
repair the therapeutic consequences of a misleading record.

The psychotherapist signs the note and assumes the responsibility even when substantial authorship occurred elsewhere.

Proposed Concepts for Professional Discussion

The following terms are proposed concepts rather than established scientific constructs. They are intended to support clinical discussion, supervision, and future research.

Documentation Slop

Documentation Slop is the gradual degradation of clinically meaningful documentation through repeated compression, omission, generalization, redundancy, or dependence on previously generated summaries.

The individual notes may remain plausible. The cumulative record becomes less useful.

Charting Slope

Charting Slope is the gradual transfer of documentation authorship from the psychotherapist to the AI system.

The clinician may continue signing every note while performing progressively less of the selection, organization, and formulation that created the narrative.

Clinical Conceptualization Drift

Clinical Conceptualization Drift occurs when documentation gradually moves away from the clinician’s original or evolving understanding of the patient.

The shift may result from repeated use of generalized terminology, template-driven structure, overreliance on prior summaries, or AI-generated interpretations that influence subsequent thinking.

Clinical Memory Drift

Clinical Memory Drift occurs when clinicians increasingly reconstruct prior treatment from AI-generated documentation rather than from their own reflective clinical understanding.

Progress notes appropriately support continuity of care. The risk arises when the generated record begins to become the clinician’s primary memory of treatment.

Clinical Memory Distortion

Clinical Memory Distortion is a stronger proposed effect in which repeated exposure to a persuasive generated narrative influences the clinician’s recollection of what occurred.

This is a proposition requiring research. It should not be represented as an established consequence of AI-scribe use.

Psychotherapy Slope

Psychotherapy Slope is the gradual adaptation of psychotherapy practice to the operational needs and outputs of the documentation technology.

Over time, clinicians may ask more template-compatible questions, structure sessions around required fields, use language the system categorizes effectively, avoid complex formulations that do not translate easily, or alter treatment to create documentation that satisfies external reviewers.

The technology then ceases merely to document psychotherapy. It begins to shape psychotherapy.

Part III: APA’s Ethical Expectations and the Independent-Practitioner Capability Gap

APA’s Guidance Is Appropriate but Demanding

In 2025, the American Psychological Association released Ethical Guidance for AI in the Professional Practice of Health Service Psychology. APA subsequently published separate guidance concerning the evaluation of AI scribes.

APA’s general guidance identifies several areas of professional responsibility:

transparency and informed consent;
bias mitigation and equity;
data privacy and security;
accuracy and misinformation;
rigorous validation;
human oversight;
preservation of professional judgment;
training and competence; and
liability and continuing ethical responsibility.

APA states that AI should augment rather than replace human decision-making, that psychologists remain responsible for final decisions, and that AI tools should ideally be rigorously validated before implementation. APA also advises psychologists to assess quality, performance, and appropriateness in behavioral health settings and to discontinue use when misinformation concerns arise.

These are reasonable expectations. They also expose a substantial capability gap.

What an Independent Psychotherapist Can Observe

A practitioner can ordinarily observe:

whether the product is easy to use;
whether a generated note appears accurate;
whether obvious errors occur;
whether the patient objects to recording;
whether the product reduces documentation time;
whether it integrates with the electronic health record;
whether the vendor offers a business associate agreement;
whether the vendor describes itself as HIPAA compliant; and
whether the finished notes appear professionally written.

These observations are useful but insufficient.

What Meaningful Evaluation Would Require

A meaningful evaluation of an AI scribe may require information about:

the identity and architecture of the underlying models;
model training and fine-tuning;
psychotherapy-specific validation samples;
performance across diagnostic populations;
differential performance across languages, dialects, accents, disabilities, and communication styles;
error rates for omissions, additions, substitutions, and unsupported conclusions;
the effect of background noise, overlapping speech, silence, sarcasm, metaphor, and indirect communication;
system prompts and hidden instructions;
human review of vendor-generated datasets;
data retention and deletion;
use of data for model development;
subcontractors and downstream processors;
storage locations;
access controls;
encryption and key management;
security testing;
incident history;
audit logging;
version changes;
output reproducibility;
integration risks;
vendor solvency and continuity;
procedures for correcting propagated errors; and
post-deployment surveillance.

NIST’s Generative AI Profile and OWASP’s AI-security publications describe risk management, testing, validation, security, privacy, human oversight, supply-chain risks, sensitive-information disclosure, misinformation, and continuing monitoring as system-level responsibilities. These frameworks are designed for multidisciplinary organizational implementation, not casual product review by a single clinician.

What Clinicians Must Understand Before Using AI and Ambient AI If There Is No Certification

Psychotherapists are being asked to evaluate artificial intelligence before the profession has established adequate standards, certification requirements, competency expectations, or widely accepted ethical guidelines for its use in psychotherapy practice. This creates an unusual professional problem. Clinicians may be encouraged by vendors, employers, payers, or colleagues to adopt AI tools without first being trained in how those tools function, what their limitations are, what risks they introduce, or how their outputs should be supervised.

At present, there is no universally accepted certification demonstrating that a psychotherapist is competent to use AI in psychotherapy documentation. There is no mature standard defining when ambient AI charting is appropriate, how informed consent should be obtained, how psychotherapy notes should be protected, how minimum necessary documentation should be preserved, how AI-generated errors should be audited, or how clinicians should evaluate conceptual fidelity. In the absence of such standards, clinicians are left to make decisions using incomplete information.

For that reason, psychotherapists must understand enough about AI to recognize that these systems are not neutral transcription devices and are not clinical reasoning systems. They are probabilistic language systems. They can produce language that is fluent, organized, and professionally persuasive while still failing to preserve the clinician's intended case conceptualization. A clinician does not need to become a computer scientist to understand this risk. However, a clinician does need enough technical literacy to know what kind of tool is being used and why trusting it without professional supervision may be ethically, legally, and clinically indefensible.

The following are terms that clinicians would need to understand but ordinarily will not, because they are not AI software architects:

Token - a small unit of language processed by an AI model. A token may be a word, part of a word, punctuation mark, or fragment of text. AI-generated documentation is constructed token by token. The system does not write a progress note in the way a clinician writes one. It predicts one small unit of language after another until a note appears. This matters because the note may feel like a completed clinical judgment, but it is actually the product of sequential language prediction.
Probabilistic generation - the process by which the model predicts which text is likely to come next. When an AI system generates a psychotherapy note, it is not independently deciding what is clinically true. It is selecting language that is statistically likely given the transcript, prompt, context, and patterns learned during training. This explains why AI can sound clinically sophisticated without actually exercising clinical judgment. It also explains why AI may produce plausible but unfaithful documentation.
Temperature - a setting that influences how predictable or variable the output will be. A lower temperature generally produces more conservative and predictable language. A higher temperature produces more varied and less predictable language. Clinicians rarely know which temperature or related settings a commercial system uses. This matters because output style, variability, and drift may be affected by technical settings that the clinician neither selected nor fully understands.
Top-k sampling and top-p sampling - methods used to limit or shape which possible next tokens the model may choose. Top-k sampling restricts the choice to the k most likely tokens. Top-p sampling, also called nucleus sampling, restricts it to the smallest group of most likely tokens whose combined probability reaches a threshold p. The clinician does not see this process. The progress note appears as ordinary clinical prose. Yet the wording may have been shaped by probabilistic selection methods that have no relationship to clinical responsibility, legal accountability, or psychotherapy judgment.
Beam search - another method used in some language-generation systems. Instead of committing to a single next token at each step, the system keeps several candidate sequences in play and selects the one with the highest overall score, typically its estimated likelihood. Beam search can produce organized and fluent language. However, likelihood and coherence are not the same as clinical fidelity. A progress note can be coherent while still misrepresenting the clinician's intended formulation.
Context window - the amount of text, measured in tokens, that the model can use at one time. Long psychotherapy transcripts, extended treatment histories, or lengthy clinical discussions may exceed or strain the effective context available to the system. When that happens, the model may compress, omit, or overgeneralize information. The clinician may not know what was preserved, what was deemphasized, or what was effectively lost.
Attention mechanism - a part of how modern language models weigh relationships among tokens in the available context. The term sounds psychologically familiar, but AI attention is not human attention. It does not mean the system is clinically attending to the client, the therapeutic alliance, affect, risk, or psychological meaning. It means the model is mathematically weighting parts of the text to generate the next tokens.
Embedding - a mathematical representation of language. Words, phrases, or concepts are converted into numerical patterns that allow the model to represent relationships and similarity. Embeddings place related ideas near one another within that numerical representation. However, this is not the same as clinical understanding. Similarity in language is not the same as psychological meaning.
Latent space - the high-dimensional mathematical space in which these learned relationships are represented. When AI generates language, it draws upon patterns represented within this mathematical structure. It does not draw upon lived clinical experience, therapeutic responsibility, ethical reasoning, or direct observation of the patient.
Hallucination - a generated statement that appears plausible but is false, unsupported, or fabricated. Clinicians already understand that hallucinations are a risk. However, hallucination is only the most obvious failure mode. A more subtle risk is that the AI-generated note may contain no obvious falsehood while still failing to represent the clinician's intended clinical meaning.
Compression - occurs when the AI shortens, summarizes, or generalizes information. Compression may improve readability, but it can also remove clinically important distinctions. In psychotherapy, a compressed note may accurately state that the client discussed "relationship stress" while omitting observations about coercion, avoidance, shame, attachment fear, grief, or risk-relevant changes in functioning.
Drift - occurs when the AI gradually moves away from the user's intended meaning, scope, or conceptual framework over time. Drift is especially concerning in psychotherapy charting because clinical records are longitudinal. A slight shift in how one session is summarized may become the foundation for how later sessions are interpreted, remembered, and documented.
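To make the temperature and top-k/top-p entries concrete, the following Python sketch shows how a single next word might be chosen. It is a toy under stated assumptions. The five candidate words and their scores (LOGITS) are invented, and the function names are hypothetical rather than taken from any vendor's system. Real models score vastly larger vocabularies, and commercial products may combine or configure these settings differently.

```python
import math
import random

# Invented scores ("logits") for candidate next words after the prefix
# "Risk remains". Hypothetical values for illustration only; they do not
# come from any real model or product.
LOGITS = {
    "low": 2.0,
    "unchanged": 1.5,
    "moderate": 1.0,
    "unclear": 0.5,
    "elevated": 0.2,
}


def softmax_with_temperature(logits, temperature):
    """Turn scores into probabilities. Dividing by a lower temperature
    sharpens the distribution; a higher temperature flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def _ranked(probs):
    """Return (token, probability) pairs from most to least likely."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def _renormalize(kept):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens, then renormalize."""
    return _renormalize(dict(_ranked(probs)[:k]))


def top_p_filter(probs, p):
    """Top-p (nucleus): keep the smallest set of most probable tokens whose
    cumulative probability reaches p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, prob in _ranked(probs):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return _renormalize(kept)


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Draw one next token under the given settings. The seed fixes the
    random draw so that a run can be repeated exactly."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    rng = random.Random(seed)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # The same scores yield different probabilities at different temperatures.
    for temperature in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(LOGITS, temperature)
        print(temperature, {tok: round(p, 2) for tok, p in probs.items()})
    # Identical prefix and settings, different random seeds: the chosen
    # word can change from run to run.
    draws = [sample_next_token(LOGITS, top_p=0.9, seed=s) for s in range(5)]
    print("Risk remains ...", draws)
```

In this toy, lowering the temperature concentrates probability on "low." Raising it spreads probability toward "unclear" and "elevated." Runs with different seeds can return different words under identical settings. Nothing in the finished sentence would show the clinician which alternatives were nearly as likely or which settings produced the choice.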

These technical terms matter because clinicians cannot ethically supervise what they do not understand. If psychotherapists use AI-generated documentation without knowing that the output is probabilistic, token-based, context-limited, and vulnerable to compression, hallucination, and drift, they may mistake fluent language for reliable clinical reasoning. That mistake has consequences. It may affect documentation integrity, case conceptualization, risk formulation, treatment planning, confidentiality, and professional accountability.

In the absence of standards or certification, psychotherapists must treat AI literacy as part of professional responsibility. Knowing these terms does not make the clinician a technical expert. It makes the clinician a more informed supervisor of a tool that may otherwise appear more trustworthy than it is. Before AI becomes a de facto standard of psychotherapy documentation, the profession must decide what clinicians are required to understand, what uses are ethically permissible, and what responsibilities cannot be delegated to a probabilistic language model.

The Capability Gap

Most independent psychotherapists are not cybersecurity engineers, machine-learning evaluators, privacy attorneys, software auditors, statisticians, or health-information architects. More importantly, even a practitioner who possesses unusual technical expertise may not receive access to the proprietary information needed for independent verification.

The practitioner may therefore be placed in an untenable position:

APA directs the practitioner to evaluate the system.
The vendor controls the information necessary for evaluation.
The vendor may provide representations, summaries, certifications, or marketing materials rather than underlying evidence.
The practitioner must decide whether those representations are adequate.
The practitioner remains responsible if the system fails.

This is not informed professional control. It is reliance on an asymmetrical vendor relationship.

A business associate agreement addresses contractual responsibilities for protected health information. It does not establish clinical accuracy.

Encryption protects information from certain forms of unauthorized access. It does not establish conceptual fidelity.

A security audit may establish that specified controls were examined. It does not establish that the system understands psychotherapy or selects the minimum necessary clinical information.

A clinician’s review of the final note may detect conspicuous errors. It may not reveal omitted information, altered emphasis, cumulative drift, biased categorization, undisclosed data reuse, or changes in the underlying model.

Informed Consent Does Not Cure an Unverifiable System

Informed consent is essential when ambient technology records or processes a clinical encounter. Research concerning ambient documentation suggests that trust, the detail provided during consent, the intended use of the tool, and meaningful opportunities to decline affect patient comfort and willingness to participate.

Consent, however, does not transfer the psychotherapist’s professional duties to the patient.

A patient cannot meaningfully assume a risk that the psychotherapist cannot adequately describe. If the clinician does not know how long raw audio is retained, whether it is used to improve models, how subcontractors process it, how model changes affect performance, or what error rates exist in psychotherapy, the clinician cannot fully explain those risks.

Consent also does not establish clinical appropriateness. Patients may consent to a technology because they trust their therapist, not because they independently understand the technical system.

The patient’s agreement therefore does not cure inadequate validation, insufficient professional competence, avoidable overcollection, or lack of clinician control.

Part IV: The Absence of Psychotherapy-Specific Certification

The Claim Must Be Precise

It would be inaccurate to state that no certifications or evaluation programs involving AI or digital health exist.

Examples include:

general cybersecurity certifications;
privacy and information-security audits;
NIST risk-management frameworks;
OWASP security standards;
FDA authorization of qualifying AI-enabled medical devices;
quality-management standards;
vendor attestations;
independent digital-health evaluations; and
the APA Labs Digital Badge Program for digital mental and behavioral health technologies.

APA Labs describes its Digital Badge Program as an independent third-party evaluation process grounded in scientific principles, clinical insight, regulatory alignment, and user safety. This is a meaningful development. It should not be ignored.

FDA also authorizes AI-enabled medical devices that meet applicable requirements for their specific intended uses. Whether a particular AI documentation product is regulated as a medical device depends on its functions, claims, and intended use.

The narrower and more defensible conclusion is:

No generally accepted, independent, psychotherapy-specific certification process has been identified that establishes that a particular AI scribe is clinically valid, ethically appropriate, longitudinally faithful, and safe for authoring psychotherapy records.

Nor is there a generally accepted professional certification demonstrating that an individual psychotherapist possesses the technical and clinical competence necessary to evaluate and supervise such a system.

What General Certifications of AI Do Not Establish

A security certification does not establish:

psychotherapy-specific accuracy;
faithful representation of case formulation;
appropriate handling of silence and indirect communication;
differentiation between observation and inference;
minimum-necessary documentation;
appropriate separation of psychotherapy notes;
preservation of uncertainty;
reliable suicide or violence-risk documentation;
protection from conceptual drift;
long-term effects on clinician memory;
effects on the therapeutic alliance; or
the clinician’s ability to authenticate the record under examination.

HIPAA compliance is not a certification of clinical validity. HIPAA establishes privacy and security obligations; it does not certify that generated clinical content is accurate, appropriate, or beneficial.

Likewise, OWASP identifies important security risks, including prompt injection, sensitive-information disclosure, supply-chain vulnerabilities, excessive agency, and misinformation. OWASP offers frameworks for managing such risks; it does not certify that a psychotherapy note represents the clinician’s observations and formulation.

What a Credible Certification System Would Need to Examine

A psychotherapy-specific certification process would need to evaluate at least four levels.

Product Certification

The product would need independent testing for:

material omissions;
unsupported additions;
speaker attribution;
risk-relevant information;
diagnostic language;
conceptual fidelity;
population differences;
language and cultural differences;
privacy and security;
data provenance;
version control;
auditability;
reproducibility;
longitudinal performance; and
resistance to manipulation or prompt injection.

Clinician Competence

The clinician would need demonstrated competence in:

the technical limits of generative AI;
informed consent;
applicable recording laws;
data governance;
privacy and security;
error detection;
bias;
model changes;
incident response;
minimum-necessary documentation;
distinguishing psychotherapy notes from the medical record; and
independently reconstructing and defending the final record.

Organizational Certification

The adopting organization would need controls for:

procurement;
vendor evaluation;
access management;
audit logging;
staff training;
model updates;
quality assurance;
error reporting;
incident investigation;
patient complaints;
system suspension;
record correction; and
vendor accountability.

Continuing Surveillance

Certification could not be a one-time event. AI systems, models, integrations, prompts, and vendor practices change. Continuing surveillance would be necessary to determine whether the product continues to perform as certified.

The existence of this extensive hypothetical structure reinforces the central concern. Independent practitioners are being encouraged to adopt AI scribes before the psychotherapy profession has established the standards, certification system, audit mechanisms, and accountability infrastructure necessary to govern them.

Part V: From Clinical Record to Payer-Control Infrastructure

Ambient AI Creates More Than a Progress Note

An ambient AI system may create several data products:

raw audio;
a transcript;
speaker-separated text;
extracted symptoms;
structured diagnoses;
risk indicators;
treatment interventions;
summaries;
metadata;
quality measures;
billing-support language; and
a finalized medical-record note.

These products may have value beyond the immediate purpose of helping a clinician complete documentation. Researchers have already described clinical transcripts as potentially valuable data assets, while other analyses have raised questions about financial productivity, documentation intensity, and possible upcoding.

In psychotherapy, the sensitivity and interpretive richness of the underlying conversation make the potential value—and danger—substantially greater.

The Foreseeable Progression

The transfer of authority would not require a health plan to direct the content of each session.

Authority shifts when the external organization determines:

what information must be documented;
what language counts as evidence of medical necessity;
what progress is considered adequate;
which interventions are recognized;
how long treatment should continue;
what constitutes nonresponse;
which clinicians are considered efficient;
which patients are considered high risk or high cost; and
what documentation patterns trigger review.

Psychotherapists may then adapt their records—and eventually their treatment—to satisfy machine-detectable expectations. The transfer of authority may occur through a predictable sequence.

Stage 1: Capture

Ambient systems record or process psychotherapy conversations that previously existed only in the memories and private notes of the participants.

Stage 2: Structuring

The system converts conversation into standardized fields, categories, symptoms, interventions, diagnoses, risk descriptions, and progress statements.

Stage 3: Aggregation

Organizations combine information across sessions, clinicians, diagnoses, populations, and treatment episodes.

Stage 4: Benchmarking

Health plans, vendors, or health systems develop expectations for what treatment should contain, how often particular interventions should occur, how rapidly improvement should be documented, and how long treatment should continue.

Stage 5: Profiling

Clinicians may be compared by documentation patterns, diagnosis, treatment length, referral behavior, measured improvement, costs, risk, and conformity to organizational expectations.

Stage 6: Enforcement

The resulting profiles may influence:

utilization review;
medical-necessity decisions;
prior authorization;
payment;
audits;
recoupment;
value-based incentives;
credentialing;
network participation;
referrals; and
contract renewal.

The concern is not that every health plan presently uses psychotherapy recordings for these purposes. Evidence does not support that universal claim.

The concern is that ambient AI creates the technical infrastructure through which substantially greater payer control becomes possible.

Existing Evidence of Organizational Power

Health plans already possess substantial authority over coverage, utilization review, reimbursement, and network participation. Federal oversight has repeatedly identified concerns involving denials of medically necessary care and use of criteria not contained in controlling coverage rules.

The HHS Office of Inspector General found that some Medicare Advantage prior-authorization requests meeting Medicare coverage rules were denied. More recent OIG reports continue to identify high denial and overturn rates, including behavioral-health denials that did not meet applicable requirements in the cases reviewed.

A 2024 Senate Permanent Subcommittee on Investigations majority staff report also described the use or consideration of artificial intelligence and predictive analytics in post-acute-care utilization management. That report concerns Medicare Advantage and post-acute care, not outpatient psychotherapy. It nevertheless demonstrates that payer use of AI and algorithmic tools in coverage and cost-management decisions is not merely hypothetical.

The inference is straightforward: organizations that control payment have economic incentives and technical capacity to analyze increasingly detailed clinical data. Independent psychotherapists generally lack equivalent bargaining power, analytic capacity, or access to the proprietary rules by which they may be judged.

The Mentor Research Institute and Moda Health Experience

Mentor Research Institute has published complaints and analyses concerning Moda Health’s Behavioral Health Incentive Program and related contracting practices. The published materials allege opaque incentive calculations, undisclosed methods, excessive administrative burden, nonnegotiable terms, and unequal access to data and decision criteria.

These are allegations and documentary accounts presented by Mentor Research Institute. They should not be characterized as adjudicated findings unless an authorized agency or court reaches that conclusion.

Their relevance to ambient AI is structural.

The experience illustrates what can occur when:

one party controls the contract;
one party controls the data;
one party determines the calculation;
one party possesses greater analytic resources;
providers cannot independently reproduce the result;
criteria are proprietary or incompletely disclosed; and
payment or network participation depends on accepting the system.

Ambient AI could intensify that imbalance. A health plan or associated vendor may be able to analyze thousands or millions of records, identify documentation patterns, establish expected treatment trajectories, and compare clinicians. The individual psychotherapist may be unable to inspect the algorithm, challenge the benchmark, reproduce the analysis, or determine how the information affected a decision.

This is Psychotherapy Slope at the system level.

The AI scribe becomes more than a convenience used by the psychotherapist. It becomes a data-production instrument serving organizations that possess greater economic and analytic power than either the patient or the treating professional.

Part VI: Additional Clinical, Ethical, and Evidentiary Concerns

Apparent Clinical Judgment

AI-generated notes may state:

“The client demonstrated insight.”
“The therapist challenged cognitive distortions.”
“The patient responded positively.”
“Risk remains low.”
“The treatment plan remains appropriate.”

These statements sound like clinical observations. The AI did not observe, judge, or formulate in the professional sense. It generated language from available data and statistical patterns.

The clinician may adopt the statements, but adoption after generation is not identical to constructing the statements from independent clinical judgment.

Review Is Not a Complete Safeguard

Review is necessary, but it occurs after the system has framed the encounter.

The generated note directs attention toward what it included. Omitted information is not visible. A clinician working under time pressure may evaluate whether the note appears reasonable rather than reconstructing the entire session to identify what is missing.

Research using simulated encounters has found only moderate documentation quality and potential variation by specialty. Broader literature identifies both efficiency benefits and unresolved questions concerning accuracy, omission, context, bias, and clinical integrity.

These studies are not psychotherapy-specific and do not prove that all AI-scribe notes are unsafe. They do demonstrate that clinician satisfaction and reduced documentation time are not sufficient measures of documentation quality.

Authentication and Testimony

A psychotherapist may be required to explain a record during:

a licensing-board investigation;
malpractice litigation;
a disability determination;
a custody proceeding;
a criminal matter;
an insurance audit;
a utilization review;
or testimony under oath.

The clinician may establish that the note was reviewed and signed. More difficult questions may follow:

Who selected the included information?
Was a complete transcript created?
What content was omitted?
Did the system infer clinical meaning?
Which model version was used?
Were system prompts preserved?
Could the result be reproduced?
Was the generated draft materially changed?
What evidence allows the clinician to testify that the record reflects personal knowledge rather than AI reconstruction?

A business record may be admissible under applicable evidentiary rules, but admissibility is not equivalent to clinical trustworthiness. The method and circumstances of preparation remain relevant.

Cognitive Offloading and Training

Psychotherapists develop competence by repeatedly performing the cognitive work of practice:

observing;
remembering;
differentiating;
formulating;
documenting;
revising;
consulting;
and reflecting.

AI may improve immediate output without strengthening the underlying skill.

Research outside psychotherapy suggests that unrestricted generative-AI assistance can reduce cognitive effort and may impair learning depth or independent performance. These findings should not be treated as direct proof of psychotherapist deskilling, but they support the need to study whether AI-authored documentation weakens clinical reasoning, memory, ownership, and training over time.

The risk may be greatest for trainees. A polished AI-generated note can conceal whether the trainee independently recognized risk, understood the intervention, developed a formulation, or merely approved language generated by the system.

Part VII: Responses to Common Arguments

“AI Saves Time”

It may.

Time savings are important, but efficiency is not the controlling ethical standard. A faster record is not necessarily a more faithful record.

The relevant question is whether the saved time justifies transferring selection, organization, and interpretation of clinical information to a system that cannot assume responsibility for the result.

“The Clinician Reviews Every Note”

Review reduces some risks but does not restore the original cognitive process.

The clinician is reviewing a preconstructed narrative. The clinician may detect incorrect statements but fail to identify omitted information, altered emphasis, unsupported certainty, or longitudinal drift.

“Human Clinicians Also Make Mistakes”

They do.

The existence of human error does not justify introducing a second source of error that may be opaque, scalable, proprietary, difficult to reproduce, and persuasive even when wrong.

Human clinicians can also be educated, supervised, questioned, disciplined, and examined concerning their personal reasoning. AI cannot assume those responsibilities.

“Patients Consent”

Consent is necessary but not sufficient.

Patient agreement does not establish system validity, clinician competence, minimum-necessary collection, or vendor accountability.

“The Product Is HIPAA Compliant”

HIPAA compliance addresses privacy and security obligations. It does not certify that the note is clinically accurate, conceptually faithful, minimally necessary, or appropriate for psychotherapy.

“AI Will Continue to Improve”

It probably will.

Future improvement does not establish present safety. Nor does increased technical capability automatically resolve authorship, accountability, surveillance, payer control, and professional-authority concerns.

More capable AI may create more persuasive output and more powerful data-analysis systems. Greater capability may increase some risks rather than eliminate them.

Part VIII: Professional Recommendation

Psychotherapists should not permit artificial intelligence or ambient AI scribes to author psychotherapy records.

This recommendation applies when the system independently:

listens to or processes the psychotherapy encounter;
determines what information is relevant;
summarizes the session;
selects or omits clinical content;
organizes the narrative;
assigns psychological meaning;
describes observations or interventions;
formulates risk;
proposes diagnoses or treatment plans; or
generates conclusions for clinician approval.

The recommendation does not require rejection of every technological function. Limited tools may support scheduling, spelling correction, clinician-directed formatting, billing administration, or other activities that do not assume clinical authorship or expose unnecessary psychotherapy content.

The psychotherapist should remain the person who:

determines what happened clinically;
selects the minimum necessary information;
distinguishes the medical record from psychotherapy notes;
preserves uncertainty;
formulates the patient’s condition;
documents risk;
describes interventions;
evaluates progress; and
accepts responsibility for the resulting record.

Before the profession considers broader adoption, it should establish:

psychotherapy-specific product standards;
independent testing and certification;
clinician competency requirements;
enforceable informed-consent standards;
privacy and minimum-necessary protections;
prohibitions on undisclosed secondary data use;
longitudinal validation;
audit and incident-reporting requirements;
vendor accountability;
protection against employer or payer coercion;
limits on payer access and secondary analysis; and
procedures for challenging algorithmic findings affecting treatment or payment.

Conclusion

The central danger of AI-generated psychotherapy documentation is not simply that the technology may occasionally hallucinate or make an obvious error.

The greater danger is that it may remain coherent, polished, clinically plausible, and persuasive while gradually altering what is selected, emphasized, remembered, and accepted as the history of treatment.

The AI used in preparing an earlier version of this paper generated a retrospective account of that process. Its account is not authoritative because an AI system said it. It is relevant because the generated description corresponds with an observable pattern: fluent output can conceal declining fidelity, repeated correction may not produce durable adherence, and assistance can gradually become authorship.

APA appropriately expects psychologists to evaluate validation, bias, privacy, security, informed consent, accuracy, professional judgment, and liability. Yet independent practitioners frequently lack the information, technical expertise, organizational resources, and bargaining power required to satisfy those expectations independently.

General security frameworks, privacy representations, digital-health badges, and medical-device authorizations are important but do not constitute psychotherapy-specific certification of clinical authorship.

Finally, ambient AI does not merely create notes. It creates structured data. That data may strengthen organizations already capable of controlling reimbursement, utilization review, network participation, and definitions of medical necessity. Once psychotherapy conversations become standardized and machine analyzable, the profession may find that the technology adopted to reduce paperwork has transferred clinical authority to vendors, employers, and health plans.

Psychotherapy documentation should remain an act of professional observation, selection, reflection, formulation, and accountability.

Artificial intelligence can generate a progress note.

It cannot experience the therapeutic relationship, assume a duty to the patient, exercise licensed professional judgment, or accept responsibility for the record.

The psychotherapist—not the artificial intelligence—must remain the author.

References

American Psychological Association. (2017). Ethical principles of psychologists and code of conduct.

American Psychological Association. (2025). Ethical guidance for AI in the professional practice of health service psychology.

American Psychological Association. (2026). Guidance for the evaluation of AI scribes.

American Psychological Association Services, Inc. (2026). APA Labs Digital Badge Program.

Autio, C., Schwartz, R., Dunietz, J., Jain, S., Stanley, M., Tabassi, E., Hall, P., & Roberts, K. (2024). Artificial intelligence risk management framework: Generative artificial intelligence profile (NIST AI 600-1). National Institute of Standards and Technology.

Brunner, J., Morrissey, S., Stevens, E. M., Payne, C., Wiltz, S., Cutrona, S. L., & Rinne, S. T. (2026). Ambient scribe technology in simulated patient encounters across specialties. JAMA Network Open, 9(1), e2552870.

Lawrence, K., Kuram, V. S., Levine, D. L., Sharif, S., Polet, C., Malhotra, K., & Owens, K. (2025). Informed consent for ambient documentation using generative AI in ambulatory care. JAMA Network Open, 8(7), e2522400.

Mentor Research Institute. (2024). Whistleblower complaint: Allegations of bad faith, fraud, and antitrust violations by Moda Health.

Mentor Research Institute. (2025–2026). Beyond Trust: Why psychotherapists should not use artificial intelligence in psychotherapy charting.

Open Worldwide Application Security Project. (2024). OWASP Top 10 for LLM applications 2025.

Open Worldwide Application Security Project. (n.d.). AI Security and Privacy Guide.

Topaz, M., Peltonen, L. M., & Zhang, Z. (2025). Beyond human ears: Navigating the uncharted risks of AI scribes in clinical practice. npj Digital Medicine, 8, 569.

U.S. Department of Health and Human Services, Office for Civil Rights. (n.d.). Does HIPAA provide extra protections for mental health information compared with other health information?

U.S. Department of Health and Human Services, Office for Civil Rights. (n.d.). Minimum necessary requirement.

U.S. Department of Health and Human Services, Office of Inspector General. (2022). Some Medicare Advantage organization denials of prior authorization requests raise concerns about beneficiary access to medically necessary care.

U.S. Senate Permanent Subcommittee on Investigations. (2024). Refusal of recovery: How Medicare Advantage insurers have denied patients access to post-acute care.