
How Does an AI Detector Work? A Technical Breakdown

8 min read · NotGPT Team

How does an AI detector work? The short answer is that it doesn't read text the way a teacher or editor does — it studies the statistical fingerprint left behind when a language model generates words versus when a person writes them. Two signals sit at the center of most text-based detectors: perplexity, which captures how predictable the word choices are, and burstiness, which measures how much sentence structure varies across a passage. Together, these signals feed into a trained machine learning classifier that produces a probability estimate of AI authorship rather than a simple yes-or-no verdict.

How Does an AI Detector Work at the Signal Level?

AI detectors don't check grammar, evaluate argument quality, or look for plagiarism in the traditional sense. They analyze the statistical properties of text: the probability patterns that emerge when a language model strings words together versus when a person writes naturally. The core mechanism is an asymmetry. Language models generate text by favoring high-probability next tokens given the context, which produces fluent output that also tends to be statistically predictable to another model evaluating it afterward. Human writers don't optimize for token probability. We choose words for rhythm, emphasis, personality, and register, and those choices often look surprising from a purely probabilistic standpoint even when they're perfectly clear and readable. Beyond the two foundational metrics of perplexity and burstiness, many detectors also feed additional features (vocabulary range, passive voice frequency, transitional phrase density) into a trained machine learning classifier. Combining these signals lets the detector return a probability score rather than a binary label, which is a more honest representation of what statistical detection can actually tell you.

What Is Perplexity and How Does It Reveal AI Writing?

Perplexity is a measure borrowed from information theory that captures how surprised a language model is by a given sequence of words. Formally, it is the exponential of the average negative log-probability the model assigns to each token, so lower values mean the text was easier to predict. When an AI generates text, it tends to select high-probability tokens, so another model evaluating the output afterward sees close to what it would have predicted, resulting in low perplexity scores. Human writers don't follow the most-likely-next-token path. A person might use an unusual word for effect, break a sentence structure unexpectedly, or choose a phrasing that reflects their voice rather than what a model would rank as the most probable choice. These stylistic decisions produce higher perplexity: the text is more surprising from a probabilistic standpoint, even though it reads clearly to a human audience. AI detectors use this asymmetry directly. Passages where every word transition is statistically expected tend to score as AI-likely, while passages with unexpected phrasing, structural breaks, or idiosyncratic word use tend to score closer to human. The complication is that not all human writing is high-perplexity. Formal genres such as legal documents, academic papers, and clinical reports use predictable constructions because those registers demand it. A standard boilerplate clause and a GPT-generated version of that same clause may look nearly identical under perplexity analysis, which is why perplexity alone isn't a reliable verdict in specialized domains.

Perplexity measures how predictable each word choice is relative to what a language model would expect. AI-generated text tends to be statistically unsurprising; human writing introduces choices that don't follow the most-likely-next-token path.
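
To make the perplexity idea concrete, here is a minimal sketch of the standard formula: the exponential of the average negative log-probability per token. The per-token probabilities below are invented for illustration; in a real detector they would come from whatever scoring model the tool uses, which this article does not specify.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token (cross-entropy in nats).
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities, not output from any real model.
predictable = [0.9, 0.8, 0.85, 0.9]   # every token was close to expected
surprising = [0.9, 0.05, 0.6, 0.02]   # a few unexpected word choices

print(perplexity(predictable))  # lower value: statistically unsurprising
print(perplexity(surprising))   # higher value: more surprising to the model
```

A single very unlikely token raises the average noticeably, which is why one unusual word choice can shift a short passage's score.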

What Is Burstiness and Why Does It Matter for Detection?

Burstiness captures something different from perplexity: the variation in sentence structure and length across a passage. Human writing is typically bursty. A writer might follow a long, complex sentence loaded with subordinate clauses with a short, direct one. Emphasis shifts. Rhythm accelerates and slows depending on what the passage is doing. This irregularity isn't accidental — it reflects how people think through ideas on the page, alternating between elaboration and summary, between complexity and clarity. AI-generated text tends to have low burstiness. Language models optimize for coherence, which produces prose where sentences cluster around a similar length and structural complexity. The result reads smoothly but looks unusually uniform when you examine sentence length distribution across a full passage. A histogram of sentence lengths in a typical GPT output often shows a tight cluster around a mean; the same analysis on human-written text tends to show a wider spread. Detectors calculate burstiness by analyzing sentence length variance, syntactic complexity distributions, and related structural measures across the full text. Like perplexity, burstiness is a probabilistic signal rather than a definitive marker. Some trained academic writers produce deliberately low-burstiness prose in formal registers. And a well-prompted AI model can generate text with higher burstiness if specifically instructed to vary sentence length. The signal is most meaningful across long passages where there are enough sentences to establish a distribution — not in short excerpts of a few hundred words.
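
One simple way to put a number on burstiness is the coefficient of variation of sentence length (standard deviation divided by mean). This is an assumed definition chosen for illustration, not a formula any particular detector is known to use, and the punctuation-based sentence splitter is deliberately naive.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length: stdev / mean."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        raise ValueError("need at least two sentences")
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat on the sill."
varied = ("Stop. The storm that had been building over the hills "
          "all afternoon finally broke. We ran.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

With only three sentences per sample these numbers are toy values; as noted above, the measure only stabilizes once a passage has enough sentences to form a distribution.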

How Do Machine Learning Classifiers Power AI Detectors?

Perplexity and burstiness are statistical metrics that can be calculated from first principles. What turns those metrics into a practical detector is a machine learning classifier trained on large datasets of labeled text: passages confirmed as human-written versus AI-generated. The classifier learns which combinations of signals are most predictive of AI authorship, and it can weigh dozens of features simultaneously rather than relying on just two numbers. Common features beyond perplexity and burstiness include vocabulary richness ratios (how diverse the word choices are across a passage), passive voice frequency, the density of specific transitional phrases, paragraph-level structural patterns, and semantic coherence scores between adjacent sentences. The quality of the training data determines nearly everything about how a classifier performs in practice. A model trained primarily on GPT-3.5 output has learned the statistical fingerprints of that specific model. It may perform well on unedited GPT-3.5 text but underperform on Claude 3 Sonnet, Gemini, or GPT-4o, which have different stylistic signatures. This creates a training-data lag: whenever a major new language model is released and widely adopted, detectors trained before it was available need time and new labeled examples to calibrate against it. Some detector providers release regular updates to track this drift; others don't actively maintain their classifiers after launch. The age and breadth of a detector's training data matter as much as the sophistication of its architecture, because both determine how well it generalizes beyond its original benchmark conditions.
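
As a rough sketch of how several features might be combined into one probability, the example below uses a logistic function over a weighted sum. The feature names, weights, and bias are all hypothetical values chosen for illustration; a real detector would learn its parameters from labeled data and may use a very different model.

```python
import math

# Hypothetical weights for illustration only. A trained classifier would
# learn these from labeled human-written and AI-generated text.
WEIGHTS = {
    "perplexity": -0.08,      # lower perplexity pushes the score toward AI
    "burstiness": -2.0,       # lower burstiness pushes the score toward AI
    "vocab_richness": -1.5,   # e.g. a type/token ratio
    "passive_ratio": 0.5,     # direction assumed, not taken from the article
}
BIAS = 3.0

def ai_probability(features):
    """Logistic score in (0, 1) from a dict of feature values."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

uniform_text = {"perplexity": 12, "burstiness": 0.2,
                "vocab_richness": 0.4, "passive_ratio": 0.3}
varied_text = {"perplexity": 60, "burstiness": 0.9,
               "vocab_richness": 0.7, "passive_ratio": 0.1}

print(ai_probability(uniform_text))  # above 0.5 with these made-up weights
print(ai_probability(varied_text))   # well below 0.5 with these made-up weights
```

The output is a probability rather than a label, which matches how detectors report results; where to draw a flagging threshold is a separate decision.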

What Does Sentence-Level Highlighting Actually Show?

Most modern AI detectors don't return just a single aggregate score. They also highlight the individual sentences or paragraphs that contributed most to the overall result. Each highlighted section carries a local probability score: the classifier's estimate that this specific passage looks AI-generated based on its statistical properties. These local scores are then aggregated, usually with some weighting, into the document-level number shown at the top. Sentence-level output is useful precisely because it tells you where the signal is concentrated, not just how strong it is overall. A document-level score of 70% AI-likely means something very different depending on whether the flagged content clusters in a few consecutive paragraphs or is scattered throughout the document. Concentrated flagging in one section may suggest that the content was drafted separately, or that a particular passage uses a register the classifier scores as AI-like. Flagging distributed across the whole document suggests the signal comes from the author's overall style rather than from any one section. Sentence-level highlighting also helps diagnose false positives. When a passage is flagged but you know it's your own writing, looking at which specific sentences are highlighted, and why they might look AI-like, gives you far more to work with than an aggregate number alone. A formal introductory sentence, a passage with little stylistic variation, or a section using technical terminology may all trigger higher local scores without any AI involvement.
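
The difference between clustered and scattered flagging can be sketched as follows. Weighting each sentence by its word count is an assumption made here for illustration (the actual weighting varies by tool), and the local scores are invented.

```python
def aggregate_score(sentences):
    """Length-weighted mean of local scores.

    `sentences` is a list of (word_count, local_score) pairs.
    """
    total_words = sum(words for words, _ in sentences)
    if total_words == 0:
        raise ValueError("no words to score")
    return sum(words * score for words, score in sentences) / total_words

def longest_flagged_run(sentences, threshold=0.5):
    """Length of the longest run of consecutive sentences scoring above threshold."""
    longest = current = 0
    for _, score in sentences:
        current = current + 1 if score > threshold else 0
        longest = max(longest, current)
    return longest

# Two hypothetical documents with the same sentences in a different order.
clustered = [(20, 0.9), (20, 0.9), (20, 0.9), (20, 0.1), (20, 0.1), (20, 0.1)]
scattered = [(20, 0.9), (20, 0.1), (20, 0.9), (20, 0.1), (20, 0.9), (20, 0.1)]

print(aggregate_score(clustered), longest_flagged_run(clustered))
print(aggregate_score(scattered), longest_flagged_run(scattered))
```

Both documents produce essentially the same aggregate score, but the run length separates them: three consecutive flagged sentences in one case, isolated single sentences in the other.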

Why Do AI Detectors Generate False Positives?

False positives, where a detector flags human-written text as AI-generated, aren't rare edge cases. They're a predictable consequence of statistical detection applied to writing that shares surface properties with AI output, and they occur with enough regularity to matter in any context where real consequences follow the score. The most common trigger is stylistic overlap: a human author writing in a formally correct, structurally uniform, vocabulary-constrained style. Non-native English speakers working carefully in a formal register are consistently at higher risk. When someone structures sentences deliberately to minimize grammatical errors, precisely because English isn't their first language, the resulting text can look low-perplexity and low-burstiness to a detector, closely matching the profile it associates with AI-generated output. Technical, legal, and clinical writing presents a similar problem. These genres enforce predictable transitions, constrained vocabulary ranges, and standardized structures by professional convention, regardless of who wrote them. Domain-specific boilerplate (standard warranty language, recurring contract clauses, diagnostic report templates) routinely scores high on AI detectors even though the author is human. Short texts below roughly 250 words are another consistent source of false positives: a short sample simply doesn't give most detectors enough statistical data to produce a reliable classification, and random variation can tip an otherwise human-looking score above a flagging threshold. The practical implication is that a high detection score is not a confirmed identification of AI authorship. Distinguishing between the two requires looking at context, writing history, and the specific passages that drove the result.

False positives are a predictable consequence of statistical AI detection applied to writing that shares surface properties with AI output — not rare edge cases, but a known failure mode in specific, well-defined categories of text.
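
One practical consequence of the short-text problem is that a score on a small sample is better treated as inconclusive than as a verdict. The sketch below assumes the rough 250-word floor mentioned above and a hypothetical flagging threshold of 0.8; neither is a standard value.

```python
MIN_WORDS = 250  # rough floor from the discussion above, not a universal constant

def interpret(score, text, flag_threshold=0.8):
    """Turn a detector score into a cautious label.

    `flag_threshold` is a hypothetical cutoff chosen for this example.
    """
    word_count = len(text.split())
    if word_count < MIN_WORDS:
        # Too little text for the statistics to be stable.
        return "inconclusive"
    return "flagged for review" if score >= flag_threshold else "not flagged"

print(interpret(0.95, "A short paragraph of only a few words."))
print(interpret(0.95, "word " * 300))
```

Even the "flagged for review" label is only a prompt to look at context and the sentence-level breakdown, not a finding of AI authorship.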

What Are the Hardest Cases for Current AI Detection?

Some types of text sit in a zone where AI detectors struggle consistently, regardless of which platform you use. Knowing what those cases look like in advance helps calibrate how much weight to place on detection results. Heavily edited AI drafts are the clearest example. If someone uses GPT for a first draft and then rewrites it substantially — changing vocabulary, restructuring sentences, inserting their own examples and analysis — the original statistical fingerprint gets diluted to the point where most detectors return unreliable scores. Even moderate post-editing can push a score from 85% AI to under 50% without any fundamental change in authorship. Mixed documents, where some sections are human-written and others are AI-generated, create aggregation problems. A document that is 60% human and 40% AI may produce an aggregate score that looks unremarkable, while the sentence-level breakdown reveals a clearer pattern of where each section originated. Highly technical or specialized content also creates difficulties. When a domain enforces constrained vocabulary and predictable structure by professional convention, a detector can't reliably distinguish between AI generation and expert human writing in that style — the perplexity signal is especially weak here because precision-driven prose is low-perplexity by design. Finally, prompt-engineered AI output — text generated with explicit instructions to vary sentence length, introduce informal phrasing, and avoid common AI patterns — can score deceptively low on most detectors. This is an arms-race dynamic that no detection approach can fully escape: as people learn what detectors measure, they can instruct AI tools to avoid those specific patterns.

  1. Heavily edited AI drafts: post-editing dilutes the statistical fingerprint detectors rely on
  2. Mixed human-AI documents: aggregate scores can be misleading — sentence-level output is essential
  3. Non-native English writers: formal, careful writing produces AI-like statistical patterns without AI involvement
  4. Short texts under 250 words: insufficient data for reliable classification
  5. Domain-specific technical or legal prose: professional conventions create AI-like surface patterns in human writing
  6. Prompt-engineered AI output: text generated with instructions to avoid detection patterns requires more sophisticated signals to catch

How Does an AI Detector Work When You Use It on Your Own Text?

Knowing the technical mechanics behind AI detection is most useful when you're looking at results for something you actually wrote — or evaluating something submitted to you. When you paste text into a detector and receive a score, the tool is running all of these signals simultaneously: calculating perplexity across the full passage, measuring burstiness in sentence length and structure, feeding those values along with additional features into a trained classifier, and returning both an aggregate score and a sentence-level breakdown. The aggregate score tells you the overall probability estimate; the sentence-level breakdown tells you which specific passages drove it. For writers checking their own work, the actionable part is usually the sentence-level view. If a few specific passages are highlighted while the rest of the text is not, that's a meaningful signal worth investigating — either those passages were drafted differently, or they happen to use a style that the classifier scores as AI-like (formal transitions, constrained vocabulary, low sentence-length variation). NotGPT's text detection returns both the document-level probability score and highlighted individual sentences, so you can trace exactly which sections contributed to the result rather than working backward from a single percentage. For anyone who receives an unexpectedly high score on their own writing, the sentence-level view is the most useful starting point for understanding what the detector is responding to and whether the result reflects your actual authorship or a false positive.
