
What Are Burstiness and Perplexity in Writing? The Signals Behind AI Detection

Published on 2026-06-14 · 9 min read · NotGPT Team

What are burstiness and perplexity in writing, and why do these two statistical terms keep appearing whenever AI detection comes up? Both concepts come from statistics and information theory, but they entered mainstream conversation once AI detectors began using them as key evidence for whether a piece of text was written by a person or generated by a machine. For students, writers, and editors whose work passes through automated screening, it is worth understanding what these signals actually measure and what they do not, because the same logic applies to every AI detection tool built on them, not just one specific platform.

Table of Contents

01 What Is Perplexity in Writing?
02 What Is Burstiness in Writing?
03 How Do AI Detectors Use These Two Signals?
04 Why Does AI Writing Score So Differently from Human Writing?
05 Which Writing Patterns Produce Low Burstiness and Perplexity Scores?
06 Can You Shift Your Perplexity and Burstiness Scores?
07 What a Burstiness and Perplexity Score Actually Tells You

What Is Perplexity in Writing?

Perplexity is a measure borrowed from information theory, originally used to evaluate how well a probability model predicts a sample of text. In the context of language models and AI detection, it captures something more intuitive: how surprised a trained language model would be at the sequence of words you chose. When a word choice is highly predictable given the words around it (the obvious next word, the expected synonym, the conventional phrase that completes a familiar construction), the model assigns it a high probability and the perplexity of the passage stays low. When a writer reaches for an unusual synonym, a structurally unexpected sentence, or an idiosyncratic turn of phrase, perplexity rises.

Large language models like ChatGPT, Claude, and Gemini are trained to assign high probability to the token that actually comes next in their training text, and when they generate, they pick from the tokens they rate as most probable. Low-perplexity output follows directly from that objective: it is a consequence of how these systems are built, not a side effect. A language model writing an explanation of climate change will mostly stay on the statistical path that a similar model would also rate as likely. Human writers, by contrast, make choices that the training data does not predict as strongly: specific metaphors, unusual but accurate vocabulary, sentence structures that break the anticipated rhythm. Those deviations push perplexity up, and higher-perplexity text is statistically more likely to have come from a person.

Perplexity does not measure creativity or quality — it measures how far a piece of writing strays from the most statistically probable path. Human writers stray further than language models do, and that gap is what AI detectors are trained to find.
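To make the definition concrete, here is a minimal sketch of the standard perplexity calculation: the exponential of the average negative log-probability per token. The per-token probabilities below are invented for illustration; a real detector would obtain them from its own language model, which this sketch does not include.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token (cross-entropy in nats).
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical per-token probabilities, made up for this example.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model expected every word
surprising = [0.9, 0.05, 0.6, 0.02]   # two unexpected word choices

print("predictable:", perplexity(predictable))
print("surprising: ", perplexity(surprising))
```

A useful sanity check: if the model gives every token a probability of 0.25, the perplexity is 4, meaning the model is as uncertain as if it were choosing among four equally likely words at each step.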

What Is Burstiness in Writing?

Burstiness originally described a property of time-series data and network events: the tendency for some processes to produce events in clusters and gaps rather than at a steady, predictable rate. Applied to writing, it describes the variation in sentence length, structural complexity, and stylistic register across a piece of text. Human writing is naturally bursty. An essay, a blog post, or a reported article typically mixes short declarative sentences — direct and emphatic — with longer sentences that carry subordinate clauses, embedded qualifications, and elaborated examples. This alternation is not consciously planned; it reflects the rhythm of spoken thought translated to prose, the way emphasis shifts naturally between a quick point and an extended explanation. AI-generated writing tends toward lower burstiness. When a language model generates a paragraph, it does not experience the shift in register that comes from moving between an emotional appeal and a technical explanation, or from summarizing a key point in one sentence and expanding on its implications for three more. The result is prose where most sentences occupy a similar structural weight: not identical, but distributed far more narrowly than a human writer typically produces over the same word count. Burstiness is measured statistically across the full document, not sentence by sentence. A single long sentence does not make a document bursty; what matters is whether the distribution of sentence lengths across the entire text is wide or narrow.

Narrow sentence-length distribution: when most sentences in a passage fall within a 10–15 word range, burstiness drops — even if individual sentences are moderately long
Uniform paragraph structure: paragraphs that consistently open with a topic sentence, add two to three supporting sentences, and close with a transition follow a template that suppresses burstiness
Consistent connective tissue: transitional phrases (however, therefore, additionally) appearing at predictable structural positions create a rhythm detection models associate with AI output
Missing register shifts: human prose usually changes in tone and sentence weight between narrative moments, analytical moments, and direct address — AI output tends to hold a consistent register throughout
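The sentence-length side of burstiness can be approximated in a few lines. The sketch below is an assumption-laden illustration, not any detector's actual formula: it splits sentences naively on end punctuation and uses the coefficient of variation (standard deviation divided by mean) as a stand-in for how wide the length distribution is. The function names are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count of each sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def length_burstiness(text):
    """Coefficient of variation of sentence lengths; 0.0 means perfectly uniform."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = ("The model writes a sentence here. The model writes another one here. "
           "The model writes one more here.")
varied = ("It failed. After three weeks of tuning the thresholds and rerunning every "
          "benchmark we had, the detector still flagged the same essay. Why?")

print("uniform:", length_burstiness(uniform))
print("varied: ", length_burstiness(varied))
```

The naive splitter will miscount around abbreviations and decimal numbers, so treat the result as a rough reading of a passage rather than a score comparable to anything a detection platform reports.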

How Do AI Detectors Use These Two Signals?

Many AI detection tools, including Turnitin's AI Writing Indicator, GPTZero, and similar platforms, are generally described as using perplexity and burstiness together rather than treating either signal in isolation. The combination creates a more reliable classification because the two signals can confirm or contradict each other in ways that distinguish genuine edge cases from clear ones. The detection pipeline typically works at the sentence level first. Each sentence is evaluated for how predictable its word choices are given a language model's probability distribution, producing a local perplexity score for that sentence. Those sentence-level scores are then aggregated, and the variance of those scores across the document (how consistently or inconsistently high or low they are) produces the burstiness signal. Note that in this pipeline burstiness is computed from sentence-level perplexity scores rather than from sentence lengths directly. A document where sentence-level perplexity scores cluster tightly together scores low on burstiness. A document where perplexity varies significantly between sentences scores higher. When both signals point toward AI-generated text, meaning low average perplexity and low variance across sentences, the detector assigns a high AI-probability score. When the signals conflict, as in a document with low average perplexity but high burstiness, the classifier must make a more uncertain decision, which often produces a score in the middle range where neither outcome is confidently predicted.

Sentence-level perplexity scoring: each sentence receives a probability score based on how likely its word sequence is under the detector's language model
Document-level burstiness calculation: the variance of sentence-level scores across the full document produces the burstiness measure
Combined classification: low average perplexity combined with low variance (burstiness) produces the highest AI-probability scores
Threshold application: the proportion of sentences crossing the classification threshold becomes the overall percentage score
Score interpretation: neither signal alone constitutes a definitive finding — both contribute probability, not certainty
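The steps above can be sketched as a toy function. This is only an illustration of the logic described here, not the method of any real platform: the function name, both thresholds, and the three labels are hypothetical, and real detectors use trained classifiers rather than fixed cut-offs.

```python
import statistics

def summarize(sentence_perplexities, ppl_threshold=20.0, var_threshold=25.0):
    """Toy version of the pipeline above; both thresholds are made-up values."""
    if not sentence_perplexities:
        raise ValueError("need at least one sentence score")
    mean_ppl = statistics.mean(sentence_perplexities)
    # Burstiness here is the variance of the sentence-level perplexity scores.
    burstiness = statistics.pvariance(sentence_perplexities)
    # Share of sentences that individually fall below the perplexity threshold.
    flagged = sum(p < ppl_threshold for p in sentence_perplexities)
    flagged_share = flagged / len(sentence_perplexities)
    if mean_ppl < ppl_threshold and burstiness < var_threshold:
        label = "likely AI"        # both signals agree
    elif mean_ppl >= ppl_threshold and burstiness >= var_threshold:
        label = "likely human"     # both signals agree
    else:
        label = "uncertain"        # the signals conflict
    return {"mean_perplexity": mean_ppl, "burstiness": burstiness,
            "flagged_share": flagged_share, "label": label}

# Hypothetical sentence-level perplexity scores.
print(summarize([12.0, 14.0, 13.0, 15.0]))    # low mean, low variance
print(summarize([18.0, 65.0, 30.0, 110.0]))   # high mean, high variance
print(summarize([10.0, 35.0, 12.0, 8.0]))     # low mean, high variance
```

The third call shows the conflicting case from the text: the average perplexity is low, but one unusual sentence widens the variance enough that the two signals no longer agree.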

AI detectors don't compare your text against a database of AI outputs. They measure two statistical properties of your specific text and compare those properties to the distributions learned during training.

Why Does AI Writing Score So Differently from Human Writing?

What burstiness and perplexity measure becomes more concrete when you examine why AI-generated text reliably scores lower on both than most human writing does. The difference traces back to the training objective that large language models share: predict the next token given the surrounding context. This objective is what makes language models useful, because it lets them produce coherent, fluent, contextually appropriate text consistently. But it also makes their output systematically different from human writing in measurable ways. A language model generating a paragraph about photosynthesis does not experience fatigue, distraction, or the impulse to introduce an unexpected analogy from an unrelated domain. It does not have a half-formed thought that produces an awkward run-on sentence before the writer circles back to tighten it. It does not shift from formal explanation to conversational aside because the register felt right in the moment. Instead, it follows the statistical landscape of its training data, making consistently probable choices at every step. The result is prose with a recognizable texture: smooth, varied enough to avoid obvious repetition, but without the sharp irregularities that come from real-time thinking translated into text. Human writing, viewed statistically, is messier. That is not because human writers are less skilled, but because writing is a thinking process as much as a communication one, and thinking in the moment is irregular. A paragraph written by a person typically shows variation in word predictability as the writer reaches for precision, makes a side observation, and returns to the main point. That variation pushes both perplexity and burstiness upward.

AI text is smooth because language models optimize for smoothness. Human writing is irregular because it is produced by irregular thinking. The statistical difference between those two processes is what AI detection is trained to measure.

Which Writing Patterns Produce Low Burstiness and Perplexity Scores?

The most practically important point about burstiness and perplexity is that human writers can produce text scoring low on both signals without any AI involvement. Several categories of writing reliably generate statistical profiles that overlap with AI-generated output, making them common sources of false positives across detection platforms. Knowing which contexts carry this risk helps writers, editors, and reviewers interpret detection scores with appropriate skepticism rather than treating a single number as a conclusion.

Formal academic register: the conventions of academic writing — clear topic sentences, structured arguments, formal vocabulary, logical transitions — produce predictable, low-perplexity prose, even when written entirely by a student who has mastered those conventions
Technical and scientific writing: lab reports, methods sections, and technical documentation use narrow vocabulary domains and rigid structural templates that constrain sentence variation and suppress burstiness
Non-native English writing: writing carefully in a second language naturally produces more conservative, predictable vocabulary choices and more uniform sentence structures — registering as low perplexity and low burstiness even when entirely original
Heavily edited final drafts: the revision process smooths rough edges and removes idiosyncratic phrasing, moving polished prose toward the statistical profile detection models associate with AI output
Summarization and close paraphrase: text that follows a source document's structure often adopts the source's statistical patterns; summaries tend toward smooth, predictable prose even when every word is the writer's own
Short documents under 200 words: statistical models need sufficient data to produce reliable classifications; short texts produce unstable scores that can swing dramatically with just a few word choices

A false positive is not evidence of AI use — it is evidence that the text's statistical profile falls in the overlapping region where both human and AI writing can live. Those regions are larger than most detection vendors publicly acknowledge.
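The point about short documents can be shown with simple arithmetic. The sketch below uses invented probabilities (every token at 0.2, then one token swapped for a rare choice at 0.002) to compare how much a single word moves the perplexity of a 10-token text versus a 200-token text. The helper name is hypothetical.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def shift_from_one_rare_word(n_tokens, base_p=0.2, rare_p=0.002):
    """Perplexity before and after one token becomes a rare, unexpected choice."""
    before = perplexity([base_p] * n_tokens)
    after = perplexity([base_p] * (n_tokens - 1) + [rare_p])
    return before, after

for n in (10, 200):
    before, after = shift_from_one_rare_word(n)
    print(f"{n} tokens: {before:.2f} -> {after:.2f}")
```

With these made-up numbers, one rare word lifts the 10-token text from a perplexity of 5 to roughly 7.9, while the same word moves the 200-token text only to about 5.1. The averaging that stabilizes long documents has very little to work with in short ones.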

Can You Shift Your Perplexity and Burstiness Scores?

If you know how your writing scores on both signals, you can adjust specific surface-level features to change those scores — and the adjustments are real improvements to your prose, not tricks to deceive an algorithm. The changes that increase burstiness and perplexity tend to make writing more specific and readable, because they replace generic patterns with particular choices. The most reliable lever for burstiness is sentence-length variation. If you scan a passage and find that most sentences are between 15 and 22 words, you have low burstiness in that section. Deliberately adding some very short sentences — five to nine words, making one point directly — and some longer sentences with embedded qualifications shifts the distribution. One short sentence inserted after two medium-length ones measurably changes the burstiness calculation for that block. For perplexity, the most reliable lever is specificity. Generic academic vocabulary — significant, important, various, multiple factors — is highly predictable given almost any context and drives perplexity down. Replacing a generic adjective with a precise one specific to your argument increases local perplexity because the choice is less expected. Adding a concrete example with a specific name, number, or observation produces the same effect. The goal is not arbitrary variation — a document where sentence lengths are randomly shuffled reads badly and may not improve perplexity at all, because the perplexity signal responds to word choices, not sentence order. The aim is to make your writing more concrete and more distinctively yours, which also happens to produce the statistical profile that detectors associate with human authorship.

Scan each paragraph for sentence-length uniformity: mark any block where all sentences fall within a 10-word range
In those blocks, insert one short direct sentence under 10 words after a longer one, or split a 30-word sentence into a 12-word and a 15-word sentence
Replace generic descriptors (significant, various, multiple) with specific terms that actually describe your argument: threefold increase, disputed, format-specific
Add at least one concrete example or specific observation per major section — these raise local perplexity by introducing terms specific to your context rather than predicted from the paragraph's topic alone
Vary the position of transitional phrases: not every paragraph needs to open with However or Additionally — sometimes contrast emerges from the sentence structure itself
Review quoted passages and citation blocks separately: they often score low on both signals and can pull down a document's overall score; offset them with your own analytical commentary before and after
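The first step in the checklist above, scanning for blocks of uniform sentence length, is easy to automate. The following is a rough sketch under simple assumptions: paragraphs are separated by blank lines, sentences are split naively on end punctuation, and a paragraph counts as uniform when its longest and shortest sentences differ by no more than 10 words. The function names and the three-sentence minimum are hypothetical choices.

```python
import re

def sentence_lengths(paragraph):
    """Word count of each sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return [len(s.split()) for s in sentences]

def uniform_paragraphs(text, max_range=10, min_sentences=3):
    """Return (index, lengths) for paragraphs whose sentence lengths span at most max_range words."""
    flagged = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        lengths = sentence_lengths(paragraph)
        if len(lengths) >= min_sentences and max(lengths) - min(lengths) <= max_range:
            flagged.append((index, lengths))
    return flagged

draft = (
    "The study covers three regions in detail. The data spans twelve years of records. "
    "The results show a steady upward trend.\n\n"
    "It failed. After three weeks of tuning thresholds and rerunning every benchmark "
    "we had, the detector still flagged the same essay. Why?"
)

for index, lengths in uniform_paragraphs(draft):
    print(f"paragraph {index}: sentence lengths {lengths}")
```

Only the first paragraph of the sample draft is reported, since its three sentences are all seven words long. A flagged paragraph is a prompt to reread the passage, not a verdict on it; some uniform paragraphs read perfectly well as they are.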

What a Burstiness and Perplexity Score Actually Tells You

A detection score based on perplexity and burstiness is a statistical probability estimate, not a determination of authorship. No current AI detection system (not Turnitin's AI Writing Indicator, not GPTZero, not any platform built on the same underlying signals) can determine with certainty whether a specific person wrote a specific piece of text, or whether a specific AI tool generated it. What the score represents is where the text's statistical properties fall relative to the distribution the detection model learned during training. A high score means the text's perplexity and burstiness profile resembles text from the AI-generated side of that training distribution more than the human-written side. It does not mean the text is AI-generated; it means it is statistically similar to text that was. The most concrete evidence of this limitation is cross-platform disagreement. The same document will often score 75–85% AI on one platform and 25–35% AI on another. If both platforms were measuring real, stable properties of the document, those numbers should not disagree by 40 to 60 percentage points. The disagreement reflects differences in training data, classification thresholds, and model architecture, not differences in what the text actually is. For practical purposes, whether you are a student receiving a flagged result, an editor reviewing a submission, or an instructor deciding how to interpret an AI score, a number derived from perplexity and burstiness analysis is one data point among many, not a verdict. Platforms like NotGPT show which specific sentences drove the score, letting you examine the flagged passages directly rather than responding to a number in the abstract.

Cross-platform variability is the clearest indicator that AI detection scores are not measuring something definitive about a document. When two tools built on the same underlying signals disagree by 40 percentage points or more, neither score is strong evidence on its own.
