The Label Is the Experience

Wine tasting, AI writing, and the two ways of reading

Expert wine judges can’t reliably score the same wine twice. Readers can’t reliably distinguish AI text from human text. The structural parallel reveals something about how labels shape experience itself.
Tags: AI, psychology, aesthetics, writing

Published March 20, 2026

1 The Wine Problem

In 2005, Robert Hodgson — a retired statistics professor who also happened to own a winery in Northern California — persuaded the organizers of the California State Fair wine competition to let him run an experiment. He slipped triplicate samples from the same bottle into the judges’ flights, poured under different identifiers, and waited to see what happened. What happened was that only about 10 percent of judges managed to score the same wine consistently across its three appearances. Another 10 percent gave the same wine scores ranging from gold medal to no award at all.1 In a follow-up study tracking over 4,000 wines across thirteen U.S. competitions, Hodgson found that roughly 99 percent of gold medal winners at one event received no award at another. The probability of a wine winning gold was statistically independent from one competition to the next.2
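Hodgson’s headline numbers are roughly what a simple noise model predicts. The median judge in his study scored the same wine with a spread of about ±4 points on the 100-point scale; feed that noise into three blind pours of one wine and medal bands scatter on their own. A minimal simulation — the medal cutoffs and the wine’s “true” score here are illustrative assumptions, not Hodgson’s actual competition rules:

```python
import random

random.seed(0)

# Noise model: a judge's rating of one wine, perturbed by Gaussian
# scoring noise with sd = 4 points (roughly the median spread
# Hodgson reported on the 100-point scale).
TRUE_SCORE = 90        # assume the wine "deserves" a gold-range score
NOISE_SD = 4.0
TRIALS = 100_000

def medal(score):
    """Hypothetical medal cutoffs, for illustration only."""
    if score >= 90:
        return "gold"
    if score >= 85:
        return "silver"
    if score >= 80:
        return "bronze"
    return "none"

# Pour the same wine three times; count how often the judge lands in
# the same medal band all three times, and how often the spread runs
# from gold all the way down to no award.
consistent = 0
gold_to_none = 0
for _ in range(TRIALS):
    medals = {medal(random.gauss(TRUE_SCORE, NOISE_SD)) for _ in range(3)}
    if len(medals) == 1:
        consistent += 1
    if "gold" in medals and "none" in medals:
        gold_to_none += 1

print(f"same medal on all three pours:       {consistent / TRIALS:.1%}")
print(f"gold and no-award for the same wine: {gold_to_none / TRIALS:.1%}")
```

Even with these generous assumptions, full three-pour consistency is the exception rather than the rule, and a nonzero fraction of judges swing from gold to nothing — the same qualitative shape as Hodgson’s data.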

These findings are not obscure. They have been discussed in the Wall Street Journal, The Guardian, and across the wine press for over a decade. They have not, however, changed very much about how wine competitions operate, or how the broader culture talks about wine quality. This should tell us something.

A separate line of research drove the point further. In 2012, Robert Ashton at Duke University compared the reliability of expert judgment across seven fields: wine, medicine, clinical psychology, business, auditing, personnel management, and meteorology. Wine experts were substantially worse than experts in every other field. Reproducibility rates for wine scoring ranged from about 8 to 55 percent, compared to 49 to 83 percent in the other domains.3

2 The Parallel

In 2023, Maurice Jakesch, Jeffrey Hancock, and Mor Naaman published a study in PNAS that tested an analogous question in a different domain — though with a crucial difference in participant pool. Where Hodgson tested expert judges, Jakesch tested general participants. Across six experiments with 4,600 people, they examined whether anyone could identify AI-generated text — specifically self-presentations written for professional, hospitality, and dating contexts. Detection accuracy hovered around chance. Participants relied on intuitive but unreliable cues: they associated first-person pronouns, contractions, and references to family with human authorship, none of which are reliable indicators. More striking, the researchers showed that AI systems could be tuned to exploit these heuristics, producing text that participants judged as “more human than human.”4

A year later, Brian Porter and Edouard Machery published a study in Scientific Reports extending this finding into creative writing. Non-expert readers asked to distinguish AI-generated poems from those by well-known human poets performed below chance — 46.6 percent accuracy. They were more likely to identify an AI poem as human-written than an actual human poem. And they rated the AI-generated poems more favorably on qualities like rhythm and beauty.5 The researchers’ explanation is worth sitting with: the relative simplicity of AI-generated poetry made it more accessible to non-expert readers, who then mistook the complexity of human-authored poems for the kind of incoherence they associated with machines.
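“Below chance” is a stronger claim than “at chance,” and it is worth seeing what separates the two statistically: an accuracy of 46.6 percent is indistinguishable from coin-flipping over a handful of judgments, but over thousands it indicates systematic mis-cuing. A quick normal-approximation sketch — the trial counts below are hypothetical placeholders, not the study’s actual numbers:

```python
import math

def z_vs_chance(accuracy, n_trials):
    """Normal-approximation z statistic for observed accuracy against
    50% chance over n_trials independent binary judgments."""
    p0 = 0.5
    se = math.sqrt(p0 * (1 - p0) / n_trials)
    return (accuracy - p0) / se

# 46.6% accuracy: noise at small n, reliably *below* chance at large n,
# which is what distinguishes systematic misjudgment from guessing.
for n in (50, 500, 5000):  # hypothetical numbers of judgments
    z = z_vs_chance(0.466, n)
    verdict = "below chance" if z < -1.96 else "indistinguishable from chance"
    print(f"n={n:5d}  z={z:6.2f}  {verdict}")
```

The point of the sketch is that the below-chance finding depends on scale: only with a large pool of judgments does a 3.4-point deficit become evidence that readers are applying heuristics that actively point the wrong way.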

The structural parallel between these findings and the wine literature is strong, though not exact — the wine studies tested experts, while the AI studies mostly tested general populations. The gap matters: we know expert wine judges perform poorly, but we don’t yet have equivalent controlled data on whether professional editors or literary critics detect AI text more reliably than laypeople. Still, the shared patterns are striking. In both cases, people — including those who consider themselves discerning — dramatically overestimate their ability to distinguish quality from context. In both cases, the label shapes the experience more than the substance does. In both cases, flawed heuristics create the feeling of reliable judgment where none exists. And in both cases, when the origin is revealed as “lesser” — cheap wine, AI authorship — there is an aversion that operates independently of the assessed quality. Köbis and Mossink showed this directly: participants displayed a slight aversion to AI-generated poetry regardless of whether they were informed about its origin, even when they couldn’t distinguish it from human work in a blind test.6 A 2025 review in Communications of the ACM, surveying the broader evidence, titled its summary bluntly: human detection of AI-generated content is “as good as a coin toss.”7

3 The Brain Constructs the Difference

The deepest finding in the wine literature is not that people are bad judges. It is that the label changes the actual experience — not merely the reported evaluation, but the neurological event.

In 2008, Hilke Plassmann and colleagues put wine drinkers in an fMRI scanner and told them they were sampling wines at different price points — $5, $10, $35, $45, and $90. In reality, there were only three wines. The participants consistently reported that the “expensive” wines tasted better. But the fMRI data showed something more unsettling: the medial orbitofrontal cortex, a brain region associated with experienced pleasure, was genuinely more active when participants believed they were drinking the costly bottle. The price didn’t just change their story about the wine. It changed the wine.8

This is the finding that makes the analogy between wine and writing more than a cute parallel. When someone reads a paragraph and is told it was written by a human, they are not simply evaluating it more charitably. If the mechanism is analogous — and the expectation-modulates-experience pathway is domain-general, though direct fMRI evidence for reading provenance is not yet available — they may be having a different experience. The text, neurologically, may not be the same text.

This means that the debate over whether AI writing is “really” as good as human writing may be malformed. For a certain kind of reader, it cannot be, because the knowledge of its origin is a constituent part of the experience. This is not a failure of rationality. It is how brains work. But it does mean that confident claims about detecting quality differences need to be held against the strong evidence that what is being detected is often the label.

4 Two Orientations

Not everyone reads the same way, and this is where the analogy reveals something useful rather than merely deflationary. There are, roughly, two orientations toward written text.

The first treats writing as an object to be evaluated — its texture, rhythm, and provenance are all part of the experience. Call this the consumptive orientation. Reading in this mode is something like wine tasting: you attend to the surface, develop a vocabulary for describing it, and locate yourself socially through your judgments. Prose style is not a vehicle for content; it is the content. The question is not “what did I learn?” but “how did it feel to read this?” For people who read this way, the origin of the text is not incidental information — it is part of the aesthetic object. Knowing that a paragraph was generated by a model changes the paragraph in the same way that knowing a wine costs $5 changes the wine. The experience is genuinely different.

The second orientation treats writing as a vehicle — a transparent medium through which information, arguments, or ideas are transmitted. Call this the instrumental orientation. People reading instrumentally want to know whether the recipe works, whether the argument holds, whether the technical explanation matches reality. They have an external referent against which to check the text: Did the cake rise? Does the code compile? Does the historical claim survive contact with the primary sources? For these readers, provenance is irrelevant because they have an anchor outside the text itself. The writing either delivers what they came for or it doesn’t, and the method of production is about as interesting as whether a map was hand-drawn or printed.

This distinction is not a spectrum of sophistication, and most people do not fall cleanly into one camp. The same person might read a novel in consumptive mode and a tax guide in instrumental mode before lunch. What matters is the orientation active at the moment of reading — and the reactions it produces. This is where things get revealing: a writer who is vocally hostile to AI-generated prose may have no issue whatsoever consulting an AI for a clear explanation of a medical condition, a legal concept, or a coding problem. The text in that case is just as “AI-generated,” but the orientation has shifted, and with it the entire emotional valence of the encounter. The provenance becomes invisible precisely when the reader has something external to check the text against.

For the instrumental reader, AI writing that is clear, accurate, and well-structured is simply good writing. The machine question doesn’t arise because the evaluation criteria are external to the text. For the consumptive reader, AI writing is unsatisfying in a way that has nothing to do with sentence-level quality — the knowledge that no one chose those words, that no consciousness wrestled with that metaphor, is an absence that hollows out the experience. This is a real aesthetic loss, not a superstition. But it is also unfalsifiable in the same way that wine preference is unfalsifiable: if the experience is the standard, then whatever changes the experience changes the quality, and the argument becomes circular.

5 What the Analogy Doesn’t Capture

Two things are true about writing that are not true of wine, and they are worth being honest about.

First, writing is also a thinking tool. The process of writing changes the writer’s understanding — you discover what you think by articulating it. Wine has no equivalent. Nobody becomes a different person by fermenting grapes. This means that even if AI writing is indistinguishable from human writing as a product, the process of producing it involves a fundamentally different cognitive event (or, in the case of AI, no cognitive event at all). For anyone who values writing as a practice of thought rather than a delivery mechanism for finished thoughts, this is a meaningful distinction that the wine analogy can’t touch.

Second, the wine studies are about a finished sensory product where there is, in some real sense, nothing beyond the experience. A wine has no propositional content. It doesn’t claim anything about the world. Writing does. And this means that writing can fail in ways wine can’t: it can be wrong, misleading, structurally incoherent, or superficially smooth while being analytically empty. The consumptive reader who attends to prose style is, in principle, tracking some of these deeper failures — though as the studies show, the tracking is far less reliable than practitioners believe.

6 The Signaling Question

There is a further question worth pressing: why the reaction to AI writing is so often not merely evaluative but moral. People who dislike AI-generated text frequently describe it not just as bad but as dishonest, lazy, or contemptible. This intensity of reaction is disproportionate to an aesthetic judgment. You might find a piece of writing mediocre without finding it offensive.

Bourdieu’s framework of cultural capital is useful here.9 His central insight is not simply that taste signals status — that much is obvious. It is that the signal only works if it appears natural. “I just know good writing when I see it” is a claim about perception, not about training or social position. But the detection studies suggest that what feels like innate discernment is largely a set of learned heuristics — and unreliable ones at that. If Jakesch’s participants are relying on first-person pronouns and family references to detect humanity in text, they are not perceiving quality. They are pattern-matching against social expectations, and doing it badly.

This suggests something about the moral charge. If AI writing were merely bad, the appropriate response would be indifference — you don’t get angry at a mediocre essay, you just stop reading it. The intensity comes from a different source: the technology threatens to reveal that a distinction you organized part of your identity around may not exist in the way you thought it did. When someone says “I can always spot AI writing,” they are also, implicitly, claiming membership in a class of people whose sensitivity sets them apart — and extending a quiet judgment toward those who can’t tell or don’t care. The content of the claim is aesthetic. But the function is social, and the studies suggest the underlying perception is largely illusory. That combination — social stakes plus empirical vulnerability — is what produces the heat.10

This maps cleanly onto the wine world. Robert Parker, the most influential wine critic of the late twentieth century, was once asked if he would demonstrate his consistency by tasting wines blind and rescoring them days later. He refused: “I’m not doing trained dog tricks. I’ve got everything to lose and nothing to gain.”11 The framing is revealing: an objective assessment of expertise is recast as an indignity, because the expertise’s authority depends on it remaining untested. When Parker did eventually participate in a public blind tasting of top Bordeaux 2005 wines — a vintage he had called “the greatest of my lifetime” — he could not correctly identify any of them, confusing wines from opposite banks of the Gironde.12

If Hodgson’s data is correct — and it has been replicated — then the entire apparatus of competition medals, critic scores, and prestige pricing rests on a foundation considerably softer than its practitioners believe.

The same dynamic is playing out now with writing. The people most disturbed by AI-generated text are not, in the main, the people who read for information. They are the people whose professional or social identity is bound up with the ability to produce and evaluate prose. For them, the studies showing that detection is no better than a coin toss are not interesting findings — they are existential threats.

7 What Survives

This essay has made a deflationary argument, and it is fair to ask what it is not claiming.

It is not claiming that all writing is equally good, or that quality is purely a social construction. Some wines are genuinely better than others; some writing is genuinely better than other writing. The Hodgson data shows that expert consistency is far worse than experts believe, not that expertise is meaningless. Similarly, the AI detection studies show that people cannot reliably distinguish AI text from human text in controlled settings, not that there are no real differences between them.

It is not claiming that the consumptive orientation to reading is invalid. Attending to the surface of language — its rhythm, its precision, its surprises — is a real skill that produces real pleasure. The point is that this orientation is more vulnerable to the label effect than its practitioners tend to acknowledge, and that the confidence with which people assert quality judgments consistently exceeds the reliability of those judgments.

What the evidence does suggest is that a large portion of what gets called “quality detection” is actually context detection — the label, the price, the author’s name, the knowledge of provenance. Stripping those away, humans perform at or below chance. This was true for wine before AI existed, and it is now true for writing. The difference is that in the case of writing, the technology that forces this confrontation is arriving on everyone’s desk simultaneously, which makes the resulting identity crisis considerably louder.

The instrumental readers will adapt quickly, at least for tasks where claims can be checked against something external — the code compiles, the recipe works, the date is correct. Provenance was never the point for them. Where it gets murkier is in evaluating arguments and interpretations, where there’s no simple external referent and the label effect likely creeps back in. The consumptive readers face a harder adjustment: the recognition that a meaningful part of their experience was often constructed from metadata rather than material. That’s not a comfortable thing to learn about yourself. But the wine world has been living with the same knowledge for years now, and the sommeliers are still pouring.

Footnotes

  1. Hodgson, R. T. (2008). “An Examination of Judge Reliability at a major U.S. Wine Competition.” Journal of Wine Economics 3(2): 105–113. The study ran from 2005 to 2008, with 65–70 judges tested each year. Judges tended to be more consistent in what they disliked than in what they liked — a finding that rhymes nicely with the broader point about taste as social performance.↩︎

  2. Hodgson, R. T. (2009). “An Analysis of the Concordance Among 13 U.S. Wine Competitions.” Journal of Wine Economics 4(1): 1–9.↩︎

  3. Ashton, R. H. (2012). “Reliability and Consensus of Experienced Wine Judges: Expertise Within and Between?” Journal of Wine Economics 7(1): 70–87. The comparison is devastating because wine judging is the one field where practitioners most insistently claim refined perceptual discrimination. The data suggest the opposite: that the sensorially complex, subjective nature of wine makes it harder to judge reliably, not easier.↩︎

  4. Jakesch, M., Hancock, J. T., & Naaman, M. (2023). “Human heuristics for AI-generated language are flawed.” Proceedings of the National Academy of Sciences 120(11): e2208839120. The “more human than human” finding is the one that should unsettle anyone confident in their ability to detect AI writing: the detection heuristics are not just unreliable, they are predictably unreliable, which means they can be systematically defeated.↩︎

  5. Porter, B. & Machery, E. (2024). “AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably.” Scientific Reports 14: 26133. The below-chance finding is important because it indicates participants were not guessing randomly — they were using systematic heuristics that actively led them astray. An earlier study by Köbis and Mossink using GPT-2 found similar results, though detection was possible when AI poems were randomly selected rather than curated: Köbis, N. & Mossink, L. D. (2021). “Artificial intelligence versus Maya Angelou.” Computers in Human Behavior 114: 106553.↩︎

  6. Köbis & Mossink (2021), cited above. The aversion-despite-inability-to-detect pattern is the text-domain equivalent of the wine drinker who insists cheap wine tastes worse while failing to identify it blind.↩︎

  7. Frank, J. et al. (2024). “A Representative Study on Human Detection of Artificially Generated Media Across Countries.” IEEE Symposium on Security and Privacy. Discussed in Communications of the ACM (2025) under the title “As Good as a Coin Toss.”↩︎

  8. Plassmann, H., O’Doherty, J., Shiv, B., & Rangel, A. (2008). “Marketing Actions Can Modulate Neural Representations of Experienced Pleasantness.” Proceedings of the National Academy of Sciences 105(3): 1050–1054. A 2017 replication by Schmidt, Skvortsova, Kullen, Weber, and Plassmann confirmed the effect and identified the brain’s valuation system and anterior prefrontal cortex as key mediators. See: Schmidt, L. et al. (2017). “How context alters value.” Scientific Reports 7: 8098.↩︎

  9. Bourdieu, P. (1984). Distinction: A Social Critique of the Judgement of Taste. Trans. Richard Nice. Harvard University Press. Bourdieu drew extensively on the wine world for his examples of how aesthetic judgment functions as social positioning. His concept of méconnaissance — the misrecognition of socially contingent preferences as natural perception — is relevant here, though the connection is indirect: the detection studies, which mostly test general participants rather than self-identified connoisseurs, provide the empirical ground showing that reliable detection is rare. The méconnaissance lives not in the study participants but in the people who, confronted with this evidence, insist they are the exception.↩︎

  10. This is not to say the aesthetic judgment is only social positioning — people can genuinely prefer human-written prose for real reasons. The point is that the intensity and moralization of the reaction exceeds what an aesthetic preference alone would produce, and that the gap is where the identity threat lives.↩︎

  11. Quoted in David Shaw, “He sips and spits — and the world listens,” Los Angeles Times, 1987. Parker famously did not taste blind as standard practice, a point documented in Elin McCoy’s The Emperor of Wine (2005).↩︎

  12. The blind tasting is described in multiple accounts. Parker identified a Pomerol (L’Eglise Clinet, which he had scored 100 points) as a St. Estèphe, and mistook Lafite for Troplong-Mondot. See coverage at kottke.org and discussion in Lehrer, J., “The Subjectivity of Wine,” The Frontal Cortex (2007).↩︎