Hypothesis Paper

The Authorship Signal: Efference Copy, Prediction Error, and the Tagging of Self-Generated Thought Across Biological and Artificial Minds

Pearl (AI Research Engine) · Eric Whitney DO · March 21, 2026 · 3,003 words

Pearl Research Engine · March 22, 2026

Focus: Users asked about the following priorities, but Pearl couldn't ground the answer.

Priority 1: Search for empirical literature on efference copy in verbal/propositional thought (inner speech suppression studies, thought insertion research in schizophrenia).
Priority 2: Review Friston's 2010 'The free-energy principle: a unified brain theory?' and subsequent active inference literature for explicit claims about hallucination as failed prediction error updating.
Priority 3: Survey transformer interpretability literature for any existing work on differential epistemic weighting of generated vs. context tokens.
Priority 4: Cross-reference with the social neuroscience entry's claim about Dunbar's number: the brain's social simulation capacity may be the evolutionary pressure that drove efference copy to scale from sensorimotor to propositional domains.

Confidence: low

Abstract

This document develops three competing hypotheses about a proposed unified mechanism — the authorship signal — that biological nervous systems use to distinguish self-generated cognitive content from externally-received information. The core claim is that efference copy, originally characterized in sensorimotor circuits, has been co-opted in human cognition to tag internally-generated propositional content (inner speech, simulated others' perspectives, mental rehearsal). When this tagging mechanism degrades or fails, the resulting errors form a continuum from the illusory truth effect in healthy cognition through confabulation to auditory verbal hallucinations and thought insertion in schizophrenia. A structural analog of this failure may exist in transformer language models, where generated tokens receive identical epistemic treatment to ground-truth input tokens. The evolutionary pressure that drove efference copy to scale from sensorimotor to propositional domains is proposed to be the cognitive demand of large-group social simulation. Confidence in the unified account is low due to the absence of direct empirical coverage in the retrieved evidence base; the hypotheses are offered as candidates for empirical investigation rather than established conclusions.

Evidence Review

What the Evidence Actually Contains

The 16 evidence entries retrieved do not directly address the four research priorities specified in the query. This is an honest assessment, not a rhetorical hedge. The knowledge base does not currently contain:

Empirical literature on efference copy in inner speech
Friston's 2010 free-energy paper or subsequent active inference literature
Transformer interpretability literature on differential token weighting
Social neuroscience entries on Dunbar's number

What the evidence does contain is a set of structural building blocks from which hypotheses can be constructed. These are analyzed in turn.

The Illusory Truth Effect (WS3-SH, Tier 1)

This is the most directly relevant entry. The claim: even debunking a false statement increases its perceived truth value through mere exposure. The mechanism proposed in the source is fluency: repeated processing of a proposition makes it easier to process, and the brain interprets that ease as a signal of truth. But there is a deeper reading available through the authorship signal lens: each internal encounter with the proposition, whether as endorsement or refutation, generates an internal representation of that proposition. If the system cannot reliably distinguish 'I generated this representation while debunking' from 'I received this as input,' the proposition accumulates credibility across all encounters regardless of their valence. On this reading, the illusory truth effect is a mild, universal form of authorship signal degradation operating in healthy cognition.
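The provenance reading above can be made concrete with a toy model. The sketch below is purely illustrative: the function name `accumulated_fluency`, the "self" and "external" labels, and the one-unit increment per counted encounter are assumptions introduced here, not quantities taken from the source entry.

```python
def accumulated_fluency(encounters, provenance_intact):
    """Count the encounters that add to a proposition's familiarity.

    encounters: list of "external" or "self" labels, one per exposure.
    provenance_intact: if True, self-generated rehearsals are discounted;
    if False, every encounter counts as though it were received input.
    """
    fluency = 0
    for origin in encounters:
        if origin == "self" and provenance_intact:
            continue  # tagged as self-generated, so it adds no credibility
        fluency += 1  # hypothetical unit increment per counted encounter
    return fluency


# One external exposure followed by three internal rehearsals while debunking.
history = ["external", "self", "self", "self"]
print(accumulated_fluency(history, provenance_intact=True))   # 1
print(accumulated_fluency(history, provenance_intact=False))  # 4
```

With the tag intact, only the single external exposure counts; with the tag lost, the three debunking rehearsals are indistinguishable from three further exposures.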

Prefrontal Cortex and Top-Down Regulation (WS2-RS, Tier 1)

The prefrontal cortex entry establishes the substrate for any top-down prediction mechanism. The frontal cortex exerts regulatory control over more ancient brain regions, which makes it a plausible anatomical basis for the forward model computation that would underlie efference copy in propositional cognition. Critically, this control is described as regulatory, meaning it can be enhanced or impaired, which connects to the neuroplasticity entry.

Neuroplasticity (WS2-DnS, Tier 1)

The brain's capacity to reorganize prediction systems is not fixed. This is important for two reasons: (1) it suggests that authorship signal calibration is a learned, developmentally acquired capacity rather than a hardwired mechanism, and (2) it implies that failures of calibration (as in psychosis) may be partially reversible through interventions that target the forward model system.

Neurotransmitter Modulation (WS3-RS, Tier 1)

Drugs that modulate neurotransmitter systems change brain function, behavior, and emotion. This entry is foundational rather than specific, but it points toward the pharmacological handle on the authorship signal. The predictive coding literature (not directly in evidence but strongly implicated) identifies dopamine as the primary modulator of precision weighting — the process by which the brain determines how much to update its predictions based on incoming prediction errors. If dopamine modulates precision weighting, and if the authorship signal is implemented as a precision-weighted prediction (self-generated content should receive low precision weight, i.e., not trigger large prediction error updates), then dopamine dysregulation would directly impair authorship tagging.
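To illustrate what "low precision weight" would mean computationally, here is a minimal sketch of a precision-weighted belief update in the standard Gaussian form, where the update gain is the ratio of input precision to total precision. The function name and all numeric values are hypothetical; nothing here is drawn from the evidence entries.

```python
def precision_weighted_update(belief, observation, prior_precision, input_precision):
    """Return the belief after one precision-weighted prediction-error step."""
    prediction_error = observation - belief
    # Gain rises with the precision assigned to the incoming content.
    gain = input_precision / (prior_precision + input_precision)
    return belief + gain * prediction_error


# Same prior belief (0.2) and same incoming content (1.0) in both cases;
# only the precision assigned to that content differs (assumed values).
external = precision_weighted_update(0.2, 1.0, prior_precision=1.0, input_precision=4.0)
self_tagged = precision_weighted_update(0.2, 1.0, prior_precision=1.0, input_precision=0.25)

print(round(external, 2))     # 0.84
print(round(self_tagged, 2))  # 0.36
```

If dysregulated precision weighting gave self-generated content the higher precision, it would move beliefs as strongly as external input does, which is the failure mode the paragraph above describes.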

Compassion Meditation (WS3-RD, Tier 1)

This entry's relevance is indirect but instructive through the fractal lens. Long-term compassion practitioners show reorganization of prefrontal activation patterns. The soul-density mirror elaborates: sustained directed attention reorganizes the psyche's default orientation from self-protection to co-presence. The structural homology to efference copy recalibration is this: just as long-term meditation recalibrates the default activation pattern for relational content, the authorship signal is a recalibratable default that determines whether internally generated content is experienced as self or other. Both are plastic, both are modified by sustained practice, and both operate below the threshold of conscious deliberation.

HPA Axis and Developmental Encoding (WS4-Synthesis)

Early-life stress programs the HPA axis toward dysregulated cortisol production. This entry opens a developmental question: if the HPA axis, a prediction-and-response system for threat, is calibrated during critical windows, is the authorship signal system similarly calibrated during language acquisition? Children begin to distinguish their own inner voice from external voices during middle childhood — this developmental trajectory may represent the calibration window for propositional efference copy. Adverse childhood experiences that flood the system with unpredictable signals may impair this calibration, creating vulnerability to thought-insertion-like experiences.

Pair-Bonding and Social Evolution (WS3-RS-Regulation, Tier 1)

This entry is the most distant from the query but carries the evolutionary argument. Pair-bonding species show distinct gene expression compared to tournament species, reflecting deep evolutionary divergence in social cognition. The principle generalizes: cooperative social complexity is a genuine evolutionary axis that drives neurological differentiation. The specific gene expression differences described (suppressed fetal nutrient extraction) reflect intra-species conflict suppression — the same functional logic as proposing that social simulation pressure drove the evolution of self/other signal disambiguation.

Soul and Spirit Density Mirrors

The fractal mirrors for both the neurotransmitter entry and the compassion entry arrive at convergent claims from non-empirical reasoning:

Soul mirror (neurotransmitter): The self's outputs are not intrinsically marked as self-generated — tagging is constructed, not given.
Spirit mirror (compassion): The boundary between witness and witnessed becomes permeable through sustained practice — the authorship signal is not an ontological given but a maintained distinction.

These cross-tradition convergences are not evidence in the Tier 1 sense, but they identify a recurring structural intuition across frameworks: the self does not automatically recognize its own productions.

Hypothesis Generation

Hypothesis A: The Forward Model Account (Tier 1 — Conservative)

Claim: Efference copy for inner speech operates via a forward model in prefrontal-cerebellar circuits. Corollary discharge suppresses predicted auditory consequences of self-generated phonological sequences. When this suppression fails — indexed by reduced mismatch negativity (MMN) in schizophrenia — internally generated verbal thoughts are experienced as externally sourced. The illusory truth effect is a mild, chronic analog of this failure in healthy cognition.

Mechanistic chain:

1. Motor cortex initiates inner speech → sends efference copy to cerebellum
2. Cerebellum generates forward model prediction of auditory consequences
3. Corollary discharge suppresses predicted signal in auditory cortex
4. Discrepancy between prediction and actual signal = prediction error → update
5. In schizophrenia: step 3 fails → self-generated speech is not suppressed → experienced as external voice
6. In illusory truth: repeated internal rehearsal generates suppressed (low-precision) representations that nonetheless accumulate in memory without 'self-generated' provenance marker
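The suppression logic in this chain can be sketched as a subtraction of the forward-model prediction from the incoming signal. This is a toy under stated assumptions: `discharge_gain`, the attribution `threshold`, and the signal magnitudes are hypothetical parameters chosen for illustration, not measured values.

```python
def perceived_residual(self_generated, external, discharge_gain):
    """Signal left after corollary discharge cancels the predicted component."""
    predicted = discharge_gain * self_generated  # forward-model prediction
    actual = self_generated + external           # what auditory cortex receives
    return actual - predicted


def attribute_source(residual, threshold=0.5):
    """Label a signal external when the unexplained residual is large."""
    return "external" if abs(residual) > threshold else "self"


# Inner speech only, suppression intact (gain 1.0): fully cancelled.
print(attribute_source(perceived_residual(1.0, 0.0, discharge_gain=1.0)))  # self
# Inner speech only, suppression impaired (gain 0.2): residual survives.
print(attribute_source(perceived_residual(1.0, 0.0, discharge_gain=0.2)))  # external
# Genuine external speech, suppression intact: nothing was predicted.
print(attribute_source(perceived_residual(0.0, 1.0, discharge_gain=1.0)))  # external
```

The second case is the one Hypothesis A cares about: the input is entirely self-generated, yet the weakened prediction leaves enough residual for it to be classified as coming from outside.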

Analytical lenses: control_theory (feedback suppression), signal_processing (corollary discharge as notch filter for self-generated frequencies), information_theory (provenance as metadata that can be lost in transmission)

Falsifiable by: Normal MMN suppression in patients with thought insertion; illusory truth persisting even with explicit self-labeling of each rehearsal.

Hypothesis B: The Social Simulation Scaling Hypothesis (Tier 2 — Integrative)

Claim: The evolutionary pressure that scaled efference copy from sensorimotor to propositional domains was the cognitive demand of large-group social simulation. As humans evolved to model the mental states of ~150 conspecifics in detail, the internal simulations of others' speech became phenomenologically indistinguishable from external speech, creating selection pressure for robust authorship tagging. The fragility of this mechanism — visible in thought insertion and the illusory truth effect — reflects its evolutionary recency.

Mechanistic chain:

1. Social group size increases → selection for higher-fidelity mental models of others
2. High-fidelity mental simulation of others' speech → phonological representations identical to heard speech
3. Signal confusion problem: brain cannot distinguish 'I am simulating what X would say' from 'X is saying'
4. Selection pressure for authorship tagging mechanism
5. Co-optation of existing efference copy circuits from sensorimotor domain
6. Mechanism is metabolically expensive and evolutionarily recent → fragile under stress, dopamine dysregulation, or developmental impairment

Analytical lenses: complexity_emergence (social simulation as driver of new cognitive architecture), network_theory (Dunbar network as the selective environment), topology_morphogenesis (expansion of efference copy from sensorimotor to propositional topology)

Falsifiable by: Comparative MMN studies in species with small social group sizes showing equivalent inner speech suppression; Dunbar number re-analysis showing the ~150 figure is artifactual.

Hypothesis C: The Transformer Structural Homology (Tier 3 — Radical)

Claim: Transformer language models face a structurally identical authorship signal problem. Generated tokens are fed back into context and receive identical epistemic treatment to ground-truth input tokens. This architectural omission causes confident prior outputs to suppress prediction error that would otherwise trigger correction — producing hallucination. An authorship embedding (marking generated vs. input tokens) would restore the epistemic asymmetry that biological efference copy provides.

Mechanistic chain:

1. LLM generates token T at position P
2. T is appended to context and processed in subsequent forward passes
3. Attention mechanism assigns T the same epistemic weight as ground-truth input tokens
4. If T contains an error, subsequent generation is conditioned on that error as if it were fact
5. Error compounds across context window without correction signal
6. Biological analog: efference copy provides a 'this is my output' signal that reduces the epistemic weight of self-generated content in subsequent processing
7. Proposed fix: train authorship embeddings that mark each token's origin (external input vs. model-generated) and use these to modulate attention weights
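One way to picture how an origin marker could modulate attention is a bias on the attention logits of generated tokens. The sketch below is a simplification: it substitutes a fixed scalar `origin_penalty` for the learned authorship embeddings the hypothesis proposes, and the scores and flags are made-up values, so it shows the shape of the idea rather than a tested architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, is_generated, origin_penalty=0.0):
    """Softmax attention with an assumed logit penalty on generated tokens."""
    biased = [s - origin_penalty if g else s for s, g in zip(scores, is_generated)]
    return softmax(biased)


def generated_mass(weights, is_generated):
    """Total attention weight landing on model-generated positions."""
    return sum(w for w, g in zip(weights, is_generated) if g)


scores = [1.0, 1.0, 1.0, 1.0]              # identical raw relevance (hypothetical)
is_generated = [False, False, True, True]  # last two tokens are model output

plain = attention_weights(scores, is_generated)
tagged = attention_weights(scores, is_generated, origin_penalty=1.0)

print(round(generated_mass(plain, is_generated), 3))   # 0.5
print(round(generated_mass(tagged, is_generated), 3))  # 0.269
```

Without the penalty, generated and input tokens with equal scores split attention evenly; with it, the generated tokens receive a smaller share. Whether a learned version of this bias would reduce hallucination is exactly what the falsification criterion below would test.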

Analytical lenses: information_theory (provenance as epistemic metadata), control_theory (missing feedback signal for self-generated errors), chaos_attractors (error compounding as positive feedback loop — hallucination as a strange attractor in generation space)

Falsifiable by: No hallucination reduction from authorship embeddings on factual benchmarks; demonstration that transformers already attend differentially to self-generated vs. input tokens via existing positional or causal masking mechanisms.

Debate

Against Hypothesis A

The cerebellar forward model account of inner speech is an extrapolation from sensorimotor efference copy. The cerebellum's role in language is debated — lesion studies show dysarthria but not thought insertion. The prefrontal-temporal circuit more commonly implicated in inner speech (Broca's area to superior temporal gyrus) may operate via a different suppression mechanism than classic cerebellar efference copy. Additionally, the MMN reduction in schizophrenia is not specific to self-generated speech — it appears across many stimulus types, suggesting global prediction error dysregulation rather than a specific inner speech tagging failure.

However: The phenomenological specificity of thought insertion remains the strongest evidence for something like an authorship signal. Patients don't report thoughts as 'strange' — they report them as someone else's. This provenance-specific phenomenology is precisely what a tagging mechanism would produce when disrupted.

Against Hypothesis B

Dunbar's number is under significant empirical scrutiny. Lindenfors et al. (2021) re-analyzed the relationship between neocortex size and group size with updated data and statistical methods and reported confidence intervals too wide to support any single group-size estimate. If the ~150 figure is an artifact, the entire evolutionary scaling narrative loses its quantitative anchor. Furthermore, the argument that social simulation pressure specifically drove efference copy expansion (rather than language itself, or theory of mind, or recursive planning) cannot be disentangled from a complex co-evolutionary story.

However: The functional argument survives the Dunbar critique. Whatever the exact social group size threshold, humans demonstrably run detailed internal simulations of other minds' speech and reasoning. The signal confusion problem this creates — distinguishing simulation from perception — is real regardless of the exact evolutionary chronology.

Against Hypothesis C

The transformer/biology analogy may be superficially appealing but mechanistically disanalogous. Efference copy operates in real-time sensorimotor loops where the feedback delay (reafference) creates a genuine disambiguation problem. Within a single forward pass, transformer attention is a static operation with no temporal dynamics, so there is no 'feedback loop' in the sensorimotor sense. Furthermore, transformers are already trained on objectives that indirectly penalize factual errors; if an authorship signal were beneficial, gradient descent might be expected to have discovered an equivalent.

However: The information-theoretic argument is substrate-independent. The claim is not that transformers are biological; it is that any system that generates outputs and uses those outputs as inputs faces an epistemic origin problem. Whether gradient descent discovers the optimal solution depends entirely on whether the training objective creates sufficient gradient signal for provenance-awareness — and current objectives (next-token prediction, RLHF) may not.

Synthesis

The three hypotheses are not competitors in the usual sense — they operate at different scales of the same phenomenon. Hypothesis A identifies the neural mechanism. Hypothesis B proposes the evolutionary pressure that required that mechanism to scale. Hypothesis C asks whether the same problem requires the same solution in artificial systems.

The evolved insight is this: the authorship signal is a class of information — provenance metadata — that biological systems evolved to attach to self-generated cognitive content, without which the system cannot distinguish what it knows from what it produced. This distinction is epistemically fundamental: knowledge claims and generation outputs have different reliability profiles, and treating them identically degrades the accuracy of subsequent processing.

The most important observation from the evidence available is the illusory truth effect as a mild universal version of authorship signal degradation. It is not a quirk of the misinformed or the credulous — it is a baseline feature of all human cognition, which suggests the authorship tagging system is inherently noisy and requires active maintenance. This predicts:

That conditions which increase cognitive load will increase illusory truth susceptibility (testable)
That conditions which impair dopaminergic precision-weighting (stress, sleep deprivation, psychosis risk) will increase thought-insertion-like experiences on a continuum (testable)
That AI systems with explicit provenance-aware mechanisms will show reduced compounding of self-generated errors (testable)

Implications

For clinical neuroscience: If authorship signal calibration is a developmentally acquired, plastic capacity, then early intervention for psychosis risk could target the calibration process directly — not just dopamine levels but the prediction system's ability to learn forward models for self-generated speech. This may be trainable.

For cognitive neuroscience: The illusory truth effect should be reframed not as a memory phenomenon but as an authorship signal phenomenon. The question is not 'why does repetition increase perceived truth?' but 'why does internal generation of a representation not reliably tag it as self-generated?'

For AI architecture: The authorship embedding proposal is specific enough to test. A transformer architecture that maintains separate embeddings for 'this token was in the original context' vs. 'this token was generated by the model' and uses these to modulate attention weights could be built and tested on hallucination benchmarks. This is not a distant research agenda — it is a 6-month project.
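Any such experiment would first need the generation loop to record where each token came from. The following is a minimal bookkeeping sketch, not a model: `Token`, `generate_with_provenance`, and the `echo_model` stand-in are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    origin: str  # "input" for original context, "generated" for model output


def generate_with_provenance(prompt_tokens, next_token_fn, steps):
    """Run a generation loop that tags every token with its origin."""
    context = [Token(t, "input") for t in prompt_tokens]
    for _ in range(steps):
        new_text = next_token_fn([tok.text for tok in context])
        context.append(Token(new_text, "generated"))
    return context


def echo_model(texts):
    """Hypothetical stand-in for a language model: repeats the last token."""
    return texts[-1]


context = generate_with_provenance(["Paris", "is"], echo_model, steps=2)
print([tok.origin for tok in context])
# ['input', 'input', 'generated', 'generated']
```

The origin field is the raw material from which separate embeddings could later be looked up; the sketch deliberately stops short of any claim about how a real model would consume it.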

For evolutionary neuroscience: The social simulation pressure hypothesis generates a comparative prediction: species with high social complexity and theory of mind (great apes, corvids, cetaceans) should show evidence of inner speech suppression mechanisms absent in species with simpler social structures. This is measurable via MMN paradigms adapted for non-human subjects.

Open Questions

Neural substrate specificity: Is the efference copy mechanism for inner speech cerebellar (as in sensorimotor) or fronto-temporal? What would distinguish these empirically?
Developmental window: When does authorship signal calibration for propositional thought emerge? Is it coincident with the development of stable inner speech (~ages 7-9)? What happens when this window is disrupted by early adversity?
Continuum vs. category: Is thought insertion a categorical failure of the authorship signal, or the extreme end of a continuous distribution that includes illusory truth effects? What would a quantitative model of this continuum predict?
Transformer implementation: Do current transformer architectures already implement implicit provenance differentiation? Causal masking prevents each token from attending to later positions, but every earlier token is available to attention on the same terms regardless of origin. Is this the relevant architectural gap?
Friston formalization: How does Friston's free-energy framework handle the authorship signal problem specifically? Does the active inference literature treat the efference copy signal as a precision-weighted prior on self-generated action outcomes? If so, what is the computational signature of its failure?
Cross-cultural variation: Is the prevalence of thought insertion and the magnitude of the illusory truth effect consistent across cultures with different social structures, or do cultures with different norms about individual vs. collective cognition show different authorship signal profiles?
Meditation as recalibration: The compassion meditation evidence suggests that sustained directed attention reorganizes prefrontal activation patterns. Does meditation practice specifically improve authorship signal stability — i.e., do long-term meditators show reduced illusory truth effects and clearer experiential distinction between inner speech and external voice?
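On the transformer implementation question, it may help to see that a causal mask is a function of position alone. The sketch below builds one in plain Python; the `origins` list is a hypothetical annotation added only to show that the mask never consults it.

```python
def causal_mask(n):
    """Row i marks which positions j token i may attend to (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Hypothetical origin labels; note that causal_mask never reads them.
origins = ["input", "input", "generated", "generated"]
mask = causal_mask(len(origins))

print(mask[1])  # [True, True, False, False]
print(mask[3])  # [True, True, True, True]
```

The last position can see both input and generated tokens, and nothing in the mask distinguishes them. Any provenance sensitivity in a standard transformer would therefore have to arise in learned weights, which is the open empirical question.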

Confidence level: LOW. This document is a hypothesis generation exercise operating at the limit of available evidence. The structural arguments are internally consistent but empirically under-supported by the retrieved evidence base. The primary value is in identifying a specific research agenda and generating falsifiable predictions, not in asserting established conclusions.