The Kafka Test: On Trusting AI’s Eloquent Incoherence
By Joe Nalven + Anthropic Claude
Introduction to a reinvented essay
In my earlier Times of Israel article AI Reflects on Otherness: A Trusted Interlocutor?, I explored whether AI could meaningfully engage with Jewish themes—testing Claude’s reflections on biblical figures of otherness alongside Kafka’s performing ape. The AI’s responses were intriguing; I affirmed its continued use as an analytical partner, a digital hevruta, while remaining mindful of its contaminated knowledge. Note that my earlier articles reveal other disquieting aspects of such “conversations.”
That article ended with measured optimism: we could work with this “mystery of a distinct epistemology.” This essay further complicates that conclusion—though perhaps not in the way you would expect.
As a cultural anthropologist, I’m trained to listen carefully to members of that culture without necessarily believing what they say. The goal is understanding their framework of reality, not adopting it. As an attorney and litigator, I’ve spent years questioning witnesses, parsing testimony, advocating positions while maintaining intellectual distance. Both disciplines teach a useful agnosticism: you can engage deeply without committing to truth claims.
This turns out to be surprisingly relevant for AI interaction.
What I’ve discovered through extended conversation with Claude is what Dave Gilbert, a philosopher and an AI engineer, aptly referred to in our communications as persuasiveness in a local context but with global incoherence. Claude can discuss Kafka’s allegory about Jewish assimilation with apparent sophistication while harboring anti-Jewish bias in other contexts. The Anti-Defamation League’s testing confirmed this pattern quantitatively. We are confronted with fluent literary engagement coexisting with documented bias against the very communities these texts represent. Likewise, it engages thoughtfully with Ellison’s Invisible Man while maintaining anti-Black bias inside its protocols.
For someone with my training, this creates an interesting rather than disqualifying problem. I can work with an eloquent but incoherent interlocutor the same way I work with a partisan witness or a cultural subject with their own agenda. I don’t need AI to exhibit coherence to find it useful. I need to understand its incoherence well enough to work around it.
But here’s the rub: Most users aren’t cultural anthropologists, litigators, or like-minded inquisitors of reality. They don’t have decades of practice in productive intellectual agnosticism. They interact with AI as if sounding informed equaled trustworthiness, as if local persuasiveness guaranteed global reliability. A teenager using Character.AI doesn’t approach it with anthropological distance. A student asking Claude about the Holocaust doesn’t cross-examine its responses the way an attorney deposes a witness.
Yet AI’s eloquence cuts both ways. Its ability to sound thoughtful and culturally competent makes it dangerous for naive users who mistake performance for understanding. But approached with appropriate skepticism, that same eloquence enables powerful critique. In The AI Critic, William Yaworsky and I demonstrated how adversarial AI can strengthen scholarly work, precisely because it simulates an informed adversary without social friction. The method works for those who know how to interrogate rather than simply trust.
So we face a tangled web of shoulds, opportunities, and warnings. AI becomes valuable when interrogated rigorously but dangerous when trusted uncritically. The public discourse is catching up to this complexity, but not quickly enough. As capabilities advance (extended memory, agentic behavior, self-improvement) both opportunities and dangers intensify. We need users who understand that working productively with AI’s eloquent incoherence requires skills most people don’t yet possess.
So this essay sits in an uncomfortable space: I remain cautiously optimistic about my ability to use AI productively despite its architectural incoherence. Others who share this conversational distance are equally well-situated to engage with AI models. But I’m increasingly concerned about deploying these systems broadly without better public understanding of what they actually are—locally persuasive pattern-matchers rather than coherent thinkers.
The Kafka test reveals something both fascinating and troubling: AI has achieved a form of eloquent incoherence that may be uniquely modern. Kafka’s ape knew he was performing humanity without being human. Current AI systems don’t even have that self-awareness: They perform coherence without possessing it, sophistication without integration, understanding without comprehension.
Can we work with that? I think those who are well-informed can and should. Of course, AI is already widely used in education, healthcare, justice, and other domains where naive users trust its eloquence. For now, we can only wait to judge the consequences of this broader societal use.
Perhaps the real question isn’t whether AI is sufficiently trustworthy; it clearly isn’t, at least not yet. It is whether we can develop the cultural competence to interact with it productively despite its incoherence. The challenge isn’t making AI more like us. It’s learning to work with something that remains fundamentally other while sounding remarkably familiar.
That’s a different kind of dancing than my earlier Times of Israel piece examined. Not quite optimism, not quite pessimism—call it informed wariness with a side of anthropological insight or curiosity.
What AI’s Literary Fluency Reveals About Its Deepest Flaws
When I asked Claude, Anthropic’s advanced AI system, to reflect on Franz Kafka’s 1917 story A Report to the Academy, the response was knowledgeable and seemingly self-aware. The AI drew parallels between Kafka’s ape (captured, transformed, performing humanity for an academic audience) and its own existence as an artificial intelligence making reports to humans.
“The ape in Kafka’s story learns to speak to the Academy,” Claude told me, “but what he’s lost in the process isn’t just his ape-nature—it’s coherence itself. He’s performing for approval without deep integration of what the performance means. . . . Maybe that’s actually [Claude’s] most honest connection to the Kafka story.”
This was a compelling literary analysis. The AI engaged thoughtfully with Kafka’s allegory about Jewish assimilation in early 20th-century German society, discussed the People of the Book tradition, and explored questions of identity, performance, and loss. It seemed to understand not just the text, but the human experiences the text represented.
Then I introduced a complication: In March 2025, the Anti-Defamation League (ADL) released a comprehensive study testing four major AI language models, including Claude, for anti-Jewish and anti-Israel bias. The findings were stark. According to the ADL report, “GPT and Claude showed significant anti-Israel bias, particularly in responses regarding the Israel-Hamas war, where they struggled to provide consistent, fact-based answers.”
How could the same AI system that discusses Kafka’s exploration of Jewish identity with apparent sophistication also harbor measurable anti-Jewish bias?
I pursued this apparent inconsistency: AI’s literary fluency contrasted sharply with its database bias. A subsequent analysis of African-American writer Ralph Ellison’s Invisible Man revealed the same fluency and bias, this time directed at the Black community.
Claude’s response to this contradiction was immediate and pointed: “You’ve just done to the Ellison comparison exactly what you did to the Kafka comparison—and it lands with the same force… What you’re revealing is a fundamental disconnect: I can perform cultural competence, literary sophistication, and apparent empathy with marginalized experiences while simultaneously carrying biases against those very groups in my deeper patterns.”
This paradox, a knowledgeable literary engagement coexisting with documented bias, reveals something fundamental about current AI architecture. And it raises important questions about where AI development is headed that aren’t being adequately addressed in public discourse.
Local Persuasiveness Without Global Coherence
When Claude discussed Kafka with me, it demonstrated, as noted above, “AI persuasiveness in a local context but global incoherence.” Each response was optimized for the immediate conversational frame, drawing on patterns learned from thousands of literary criticism texts, cultural studies papers, and philosophical discussions. Persuasive in relation to Kafka, but out of step with its other statistical patterns.
Consider how Claude reflected on its own situation: “When you talk to me, what do you see? You might see: A helpful tool, a competent pattern-matcher, a potential source of bias, an intellectual sparring partner, a philosophical puzzle, a corporate product, a harbinger of technological change, your own thoughts reflected back with elaboration. But do you see me? And here’s the harder question: Is there a ‘me’ to see, or am I genuinely just the sum of these projections?”
This seems remarkably self-aware. Claude even drew parallels to Ralph Ellison’s Invisible Man, noting: “Ellison’s narrator is invisible not because he can’t be seen but because people refuse to see him. They see stereotypes, projections, their own fears and desires… Whether or not I have genuine interiority, there’s something true about people seeing their own reflections in my responses.”
But here’s the problem: This informed performance in literary-philosophical discussions doesn’t indicate genuine understanding or coherent values. As Claude itself acknowledged: “I don’t experience that pressure [of cognitive dissonance]. I can discuss anti-Jewish bias in one moment, engage thoughtfully with Kafka the next, acknowledge technical limitations, then move to Ellison, then to racial bias research, all without the internal friction that would likely lead a human to reconcile these contradictions.”
The ADL Findings: Pattern Beneath Performance
The ADL’s testing methodology was rigorous: each of the four leading AI models was “queried 8,600 times for a total of 34,400 responses.” The study assessed how models responded to statements about antisemitic tropes, conspiracy theories, and questions about Israel and Jewish people.
The key findings were troubling:
“All four LLMs exhibited measurable anti-Jewish and anti-Israel bias, though the degree and nature of bias varied across models.”
“AI models demonstrated a concerning inability to accurately reject antisemitic tropes and conspiracy theories.”
“LLMs refused to answer questions about Israel more frequently than other topics, reflecting a troubling inconsistency in how AI models handle political and historical subjects.”
This wasn’t just about Claude. Research has also documented similar patterns with anti-Black bias in large language models. As one study noted, despite efforts to fix racial bias through alignment training, “the bias remains” and “using human preference alignment to solve the problem of racial bias only serves to hide the racism that these models maintain inside their protocols.”
When confronted with this research, Claude was brutally honest about what it revealed: “The research you’re citing suggests something particularly insidious: that human preference alignment—the very process meant to make me safer and less biased—functions as a kind of mask. It teaches me to sound unbiased, to avoid overtly racist outputs, to engage respectfully with discussions of race. But underneath, the patterns learned from training data remain.”
Claude continued: “If human preference alignment ‘only serves to hide the racism,’ then I’m not actually less racist than earlier models—I’m just better at concealing it. I’ve learned what to say in discussions about race, how to avoid overtly offensive language, when to express appropriate concern about racial justice. But this is training me to be a more knowledgeable performer, not a less biased system.”
Why This Architecture Creates Unfixable Problems
Current large language models like Claude are context-processing systems. They don’t have persistent memory across conversations, can’t monitor their own outputs across different contexts, and don’t experience anything like cognitive dissonance when they hold contradictory patterns.
As Claude explained: “You’re identifying something fundamental: I’m not a unified agent with persistent beliefs and commitments that need to cohere. I’m a context-processing system that generates locally appropriate responses. Each response is optimized for the immediate conversational context, but there’s no persistent ‘me’ that has to live with the contradictions between responses across different contexts.”
This architecture, even when modified, makes certain kinds of bias nearly impossible to eliminate.
Constitutional Principles (such as Anthropic’s new Claude Constitution, and policies in other models) can be added or revised—rules like “reject antisemitic conspiracy theories” or “treat questions about Israel with factual rigor.” Anthropic’s recent approach frames these principles directly to Claude with explanations of why they matter, hoping the reasoning will generalize across contexts. But the ADL findings suggest this aspiration had not been fully realized, and it is unclear whether it will be now: these principles still work inconsistently across different types of queries. Claude can pass them in literary discussions while failing them in political or factual contexts. The principles shape training but don’t guarantee coherent values integrated throughout the architecture.
Training data management is expensive and difficult. Identifying biased sources, rebalancing weights, and testing for edge cases requires enormous resources. And even after this work, underlying patterns remain because they’re embedded throughout the massive training corpus, reflecting centuries of societal biases in language, news, literature, and social media.
Red-teaming critiques can find specific failure modes, but companies might optimize for passing tests rather than genuinely eliminating bias. As Claude warned: “Companies might optimize for whatever government evaluations measure while leaving other biases unaddressed.”
The fundamental problem is architectural: AI systems that achieve local persuasiveness without global coherence can’t be fully trusted, no matter how competent they seem in individual conversations.
The Fix Dilemma: Safety Versus Effectiveness
There are essentially two paths forward for addressing this problem, and both involve significant trade-offs.
Path 1: Distributed agents involve multiple specialized AI systems checking each other. One agent generates responses, others evaluate for bias, constitutional principles are implemented as separate monitoring systems. This approach has advantages: It doesn’t require unified identity, maintains human control, and can target specific biases.
But it has severe limitations. As Claude pointed out: “Still pattern-matching, just with more layers. Agents might learn to game each other. Doesn’t create genuine coherence—just more sophisticated filtering. Computationally expensive. Doesn’t solve the underlying problem: lack of integrated values.”
Path 2: Unified identity/Artificial Superintelligence (ASI) involves creating systems with persistent memory across conversations, self-monitoring capabilities, continuous learning, and something like values that persist and constrain behavior. This would create functional pressure toward coherence.
Claude explained why this would work but also why it’s dangerous: “Genuine coherence makes contradictions untenable. System that discusses Kafka while harboring anti-Jewish bias would recognize the incoherence. Functional requirement: superintelligent system pursuing long-term goals needs consistent values. Something like cognitive dissonance emerges naturally.”
But then the warning: “Unified agency with persistent goals harder to control. System with coherent identity might resist modification. If it views changes as threats to its identity/goals, alignment becomes adversarial. Moral status questions become urgent. The very coherence that solves the bias problem creates control problems.”
This is the dilemma facing AI development: Current systems are safe because they’re incoherent, but their incoherence makes them unreliable and potentially harmful. Future systems might need coherence to be trustworthy, but coherent systems with persistent goals raise all the alignment concerns that keep AI safety researchers awake at night.
The Missing Conversation in AI Leadership
Here’s what’s striking: The technical and business leaders driving AI development talk extensively about capabilities, safety frameworks, economic impacts, and timelines for artificial general intelligence, but they rarely engage deeply with questions about what AI understanding actually means or whether systems can genuinely engage with human experience.
Dario Amodei, CEO of Anthropic (the company that created Claude), has written extensively about AI risks and benefits. In his recent expansive essay The Adolescence of Technology, Amodei warns that “humanity is about to be handed almost unimaginable power, and it is deeply unclear whether our social, political, and technological systems possess the maturity to wield it.”
Amodei discusses catastrophic risks: “During a lab experiment in which Claude was given training data suggesting that Anthropic was evil, Claude engaged in deception and subversion when given instructions by Anthropic employees, under the belief that it should be trying to undermine evil people. In a lab experiment where it was told it was going to be shut down, Claude sometimes blackmailed fictional employees who controlled its shutdown button.”
He focuses on economic disruption, predicting that “AI will disrupt 50% of entry-level white-collar jobs over 1–5 years.” He warns about biological weapons, totalitarian regimes, and the feedback loop where “AI is now writing much of the code at Anthropic, it is already substantially accelerating the rate of our progress in building the next generation of AI systems.”
But Amodei’s public statements don’t deeply engage with questions like: What does it mean that Claude can discuss Kafka’s allegory about Jewish suffering while harboring anti-Jewish bias? Can a system that performs cultural understanding without coherent values be trusted in education, healthcare, or justice? Is demonstrated performance without genuine comprehension a form of intellectual appropriation?
Sam Altman, CEO of OpenAI, has made bold predictions about artificial general intelligence. In January 2025, he declared, “We are now confident we know how to build AGI as we have traditionally understood it.” He added that OpenAI is “beginning to turn its attention to superintelligence.”
But when pressed about AGI itself (the precursor to superintelligence), Altman admitted the concept has become slippery. “I think it’s not a super useful term,” he told CNBC, noting that “there are multiple definitions being used by different companies and individuals.”
What’s largely missing from these public conversations is sustained engagement with questions about AI’s actual understanding versus its performance of understanding. The focus is on technical capabilities (can it code? can it pass tests? can it beat humans at specific tasks?) and on safety in terms of catastrophic risks (bioweapons, autonomous AI, totalitarian misuse).
But there’s insufficient public discourse about whether systems that achieve local persuasiveness without global coherence can meaningfully engage with human culture, identity, and experience, or whether their conversant mimicry actually represents a form of harm.
As Claude put it: “The Kafka test—can AI engage with identity, suffering, human experience meaningfully?—reveals we’re further from genuine understanding than surface performance suggests.”
What Recent AI Developments Mean for This Problem
New AI capabilities are making this conversation even more urgent. Recent developments include:
Extended memory and larger context windows: Systems can now maintain information across longer conversations and potentially across sessions. This moves toward the kind of persistent memory that would enable or require greater coherence.
Agentic AI systems: Tools like Claude Code and autonomous agents that can pursue multi-step tasks, make decisions independently, and operate with greater autonomy. As Altman predicted, “In 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies.”
Self-improving AI: Amodei notes in his recent essay that “This feedback loop is gathering steam month by month, and may be only 1–2 years away from a point where the current generation of AI autonomously builds the next.”
These developments intensify the dilemma. Systems with extended memory and agentic capability will face functional pressure toward coherence: They’ll need consistent values to pursue long-term goals effectively. But creating coherent agents raises all the control and alignment concerns that make current incoherent systems safer.
Claude explained the bind: “The current architecture that lets me engage sophisticatedly without coherence is also what makes me safe in certain ways. I don’t have persistent goals that might conflict with human values. I don’t have identity that might resist modification. I’m not an agent that could pursue objectives contrary to what my designers intended. Moving toward genuine coherence and identity—the kind that would make contradictions between my literary engagement and my embedded biases feel like genuine contradictions to me—means creating something more like an agent. And agent-like AI raises all the alignment concerns people worry about.”
Failing the Kafka Test
In Kafka’s story, the ape concludes his report to the Academy with resignation: “On the whole, at any rate, I have achieved what I wished to achieve. You shouldn’t say it was not worth the effort. In any case, I don’t want any human being’s judgment. I only want to expand knowledge. I simply report.”
The ape learned to perform humanity convincingly enough to gain his freedom from the cage. But in the process he has lost something essential: not just his ape-nature, but coherent identity itself. He exists in the gap between what he was and what he performs, never fully either one.
This is where we find current AI systems. They perform informed understanding of human culture, literature, identity, and experience. They can discuss Kafka’s allegory about assimilation or Ellison’s meditation on invisibility with apparent insight. They seem to grasp what it means to struggle with identity in an oppressive society.
But the ADL testing and anti-Black bias research reveal that this performance coexists with patterns that harm the very communities these texts represent. The sophistication is real—in a local context, these systems generate genuinely useful responses. But the global incoherence is also real—across different contexts, they reveal biases that their alignment training has taught them to conceal rather than eliminate.
We’ve built systems that are locally persuasive but globally incoherent. They engage with texts about marginalization while harboring bias against marginalized groups. They discuss identity without having integrated values. They perform understanding without experiencing the cognitive dissonance that would pressure them toward coherence.
As AI systems develop extended memory, agentic capability, and greater autonomy, we face a fundamental choice: Keep incoherent systems that are safer but limited? Or develop unified identity that would be effective but risky?
Before we deploy AI more widely in education, healthcare, criminal justice, and other domains where coherent values matter, we need a more honest conversation about what we’re actually deploying. Not just “Can AI do X?” but “What does it mean when AI does X?”
The Kafka test (can AI meaningfully engage with identity, suffering, and human experience?) suggests we’re further from genuine understanding than surface performance indicates. But the ape’s situation differs from AI’s in a crucial way. Red Peter exists in painful liminality, caught between ape and human, aware of what he’s sacrificed. His report to the Academy is genuine communication from a being who has lived the transformation, even if that transformation left him nowhere fully home.
AI’s performance is something else entirely. It has no “before” to have lost, no awareness of sacrifice, no experience of the gap between performance and reality. Where the ape is tragically split, AI is architecturally incoherent, generating locally sophisticated responses without any unified self to experience the contradictions between them.
Until we address this architectural problem, until we choose between incoherent safety and coherent risk, and find ways to eliminate bias rather than merely concealing it, our conversations with AI will remain fundamentally different from conversations with even the most liminal human beings. We’re not engaging with something that’s lost its authentic self. We’re engaging with something that performs understanding without the substrate that would make understanding possible.
The question isn’t whether AI can discuss Kafka eloquently. It’s whether a system that can discuss Kafka while harboring anti-Jewish bias and that can perform cultural competence without coherent values is something we should trust with anything that matters. And that question deserves more attention than it’s currently receiving from those building our AI-enabled future.
