
AI Watermark Detector: What It Can Find, What It Can Prove, and How to Use It Responsibly

10 min read · NotGPT Team

An AI watermark detector is a tool that looks for hidden or embedded signals indicating that a piece of text or an image was created by an AI system. The concept sounds straightforward — run a check, get an answer — but in practice, watermarking and watermark detection are far more nuanced than a simple pass/fail result. Some watermarks are invisible signals encoded into pixel values; others are statistical patterns woven into word-choice distributions; still others are cryptographic certificates attached to a file container. Each type works differently, survives different transformations, and supports different conclusions. This guide covers how AI watermark detectors operate for both text and images, what a positive detection result actually tells you, where current watermarking technology falls short, and how to approach content verification in a way that accounts for both the strengths and the real gaps in these tools.

What Is an AI Watermark Detector?

An AI watermark detector is any tool or method designed to identify signals that were deliberately or incidentally embedded in AI-generated content at the time of creation. The word "watermark" covers three distinct technical categories that are often conflated. File-level provenance marks — most prominently C2PA Content Credentials — are cryptographically signed certificates stored in the metadata container of an image or video file. They assert authorship and record which AI tool produced the content, but they live in the file wrapper and can be stripped by any standard metadata editor. Pixel-level watermarks, of which Google DeepMind's SynthID is the best-known example, encode a detectable signal directly into the pixel values of an image during generation. Unlike file metadata, these survive format conversion, JPEG compression, and screenshot capture because they are woven into the actual image content rather than the file container. Text watermarks operate differently again: since text has no pixel values to modulate, text watermarking works by influencing the probability distribution of word choices during generation. When a large language model generates a token, it can be biased to slightly favor tokens from a designated "green" vocabulary list. Across hundreds of tokens, this bias creates a statistically detectable pattern: the text contains a higher-than-expected proportion of green tokens. An AI watermark detector for text checks whether a passage shows this kind of distributional skew. All three approaches have the same goal — allowing a third party to verify AI origin after the fact — but they differ dramatically in what survives editing, translation, or deliberate removal attempts. The list below summarizes these signal types; a deliberately simplified sketch of the pixel-level idea follows it.

  1. File-level provenance (C2PA): cryptographic certificate in the image or video file metadata; identifies the AI tool that generated the content; trivially removable with any EXIF editor
  2. Pixel-level watermarks (SynthID): signal encoded into actual pixel values during generation; survives format conversion, compression, and screenshots; cannot be removed without significantly degrading the image
  3. Text watermarks (statistical): bias in token-selection probabilities during generation creates a measurable distributional signature; survives minor edits but degrades with heavy paraphrasing or translation
  4. Model-intrinsic signatures: unintentional artifacts from the generation architecture itself — AI detectors that do not rely on watermarks analyze these instead; present in all AI output regardless of whether watermarking was enabled
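
To make the pixel-level idea concrete, the sketch below implements a deliberately naive least-significant-bit scheme in Python. It is a toy under stated assumptions: a shared secret seed stands in for the detector key, and the signal is written into the raw LSBs of a key-selected subset of pixels. Unlike SynthID's redundant, perceptually tuned signal, this version would not survive JPEG re-encoding; it exists only to show the embed-and-score mechanic.

```python
import numpy as np

KEY = 42  # illustrative shared secret seed, known to embedder and detector

def keyed_pattern(n_pixels: int):
    """Derive the key-selected pixel positions and expected bit pattern."""
    rng = np.random.default_rng(KEY)
    idx = rng.choice(n_pixels, size=n_pixels // 100, replace=False)
    bits = rng.integers(0, 2, size=idx.size, dtype=np.uint8)
    return idx, bits

def embed(pixels: np.ndarray) -> np.ndarray:
    """Write the keyed bit pattern into the least significant bits."""
    flat = pixels.ravel().copy()
    idx, bits = keyed_pattern(flat.size)
    flat[idx] = (flat[idx] & 0xFE) | bits
    return flat.reshape(pixels.shape)

def detect(pixels: np.ndarray) -> float:
    """Fraction of keyed positions whose LSB matches the expected bit:
    around 0.5 for an unwatermarked image, near 1.0 for a watermarked one."""
    flat = pixels.ravel()
    idx, bits = keyed_pattern(flat.size)
    return float(np.mean((flat[idx] & 1) == bits))

plain = np.random.default_rng(7).integers(0, 256, size=(64, 64), dtype=np.uint8)
print(detect(plain))         # ~0.5: no watermark present
print(detect(embed(plain)))  # 1.0: keyed pattern detected
```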

Text Watermarks vs. Image Watermarks: How Are They Different?

The mechanics of text and image watermarking diverge so significantly that understanding one does not automatically prepare you to reason about the other. For images, the problem of embedding an invisible signal is a well-studied branch of digital steganography. Researchers can modify the least significant bits of pixel values, alter frequency components using the discrete cosine transform, or — as SynthID does — adjust the relative intensities of pixels within local patches in ways that are imperceptible to human vision but statistically detectable by the trained watermark detector. Because the signal is spread redundantly across millions of pixels, it persists through the kinds of manipulation a typical image might undergo: resizing, color correction, JPEG re-encoding at reasonable quality levels, and even printing and re-scanning. SynthID's robustness to screenshots specifically is notable: when you screenshot a watermarked image, you capture its pixel values essentially unchanged, so the watermark survives. For text, the challenge is harder. Text is discrete: there are no continuous values to subtly shift, and any alteration large enough to change the statistical pattern risks changing the meaning. The most technically credible approach to text watermarking — pioneered in academic work from the University of Maryland and later echoed in Google's public statements about its text generation products — inserts a hidden dependency into the token sampling process. Every time the model selects a word, a private hash function determines whether that word is in the "green" set or the "red" set for that position in the sequence. The model is biased to select green tokens. A detector with access to the same hash function can then score any passage for its green-token proportion and compare it against the expected distribution for unwatermarked text. A high green-token score indicates the text may be watermarked; a score near the expected baseline indicates it is probably not. A minimal sketch of this scoring logic appears after this paragraph. The practical problem is that this detection only works for text generated by a model that had watermarking enabled — and most publicly accessible LLMs, including the API versions of GPT-4 and Claude, do not currently apply text watermarks to user outputs by default.
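
Here is a minimal sketch of that scoring logic, assuming a toy whitespace tokenizer and SHA-256 in place of a production keyed hash. The constants GAMMA and KEY are illustrative, not taken from any deployed system.

```python
import hashlib
import math

GAMMA = 0.25                   # assumed fraction of vocabulary marked "green" per position
KEY = b"private-detector-key"  # illustrative shared secret

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically assign `token` to the green or red set for this
    position, seeded by the previous token and the private key."""
    digest = hashlib.sha256(KEY + prev_token.encode() + b"|" + token.encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32 < GAMMA

def green_token_zscore(text: str) -> float:
    """Score a passage: how far does its green-token count deviate from the
    binomial(n, GAMMA) count expected for unwatermarked text?"""
    tokens = text.split()  # toy tokenizer; real schemes use the model's tokenizer
    n = len(tokens) - 1
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# Unwatermarked text should score near 0; text generated with a green-list
# bias under the same key should score well above a threshold such as 4.
```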

"Watermarking language model outputs is technically feasible but requires every major provider to implement it consistently — a coordination problem that has not yet been solved at scale." — Soheil Feizi, University of Maryland, 2023

What Can an AI Watermark Actually Prove?

This is the question that gets glossed over most often in coverage of AI watermarking. A watermark, when detected, provides evidence that a specific AI system generated the content at the time of creation. It does not prove that the content is harmful, plagiarized, or inappropriate. It does not prove that the person who submitted the content used AI in a way that violates any particular rule. And critically, the absence of a detectable watermark does not prove the content was written or created by a human. There are several reasons why absence is not exculpatory. First, the vast majority of AI-generated content currently in circulation was produced by systems that either never implemented watermarking or did not have it active. A student who used GPT-4 through the standard ChatGPT interface, or an image generator that has not adopted C2PA, produced content with no watermark — because those tools do not watermark their outputs. Second, watermarks can be removed. File-level metadata is stripped by standard tools. Text watermarks degrade under paraphrasing. Even pixel-level watermarks are not guaranteed to survive adversarial processing specifically designed to defeat them. Third, some tools add fake watermarks to human-created content, either intentionally to confuse detectors or as an artifact of processing pipelines. A detected watermark is therefore meaningful: it is positive evidence that a specific AI system was involved in producing the content. The absence of a watermark is uninformative: it means either no watermarking system was used, the watermark was removed, or the content is genuinely human-created. These are three different situations with very different implications, and an AI watermark detector result alone cannot distinguish between them.

Can AI Watermarks Be Removed or Defeated?

The robustness of a watermark depends heavily on which type it is and how sophisticated the removal attempt is. File-level C2PA credentials can be stripped in seconds by anyone with a basic understanding of image metadata. Stripping EXIF data with a free tool, converting between formats without the "preserve metadata" option, or simply taking a screenshot — any of these produces a file with no C2PA credentials; the short demonstration after the list below shows how little it takes. This is not a flaw in C2PA's design; the standard was built as a provenance chain for authentic media, not as a tamper-proof AI usage certificate. When C2PA credentials are present, their presence is meaningful. When they are absent, that absence proves nothing about origin. Text watermarks are more robust than file metadata but more fragile than pixel-level embedding. Academic studies on token-distribution-based watermarks have found that heavy paraphrasing, translation into another language and back, or mixing watermarked text with unwatermarked passages can all reduce detection confidence significantly. A 2023 analysis from the University of Maryland found that paraphrasing attacks reduced detection accuracy from near-certain to only slightly better than chance for some watermarking schemes. Crucially, effective paraphrasing already requires enough editing that the output differs substantially from what the model generated — so the attack has a cost. Pixel-level watermarks like SynthID are the most robust of the three categories. They are specifically engineered to survive the kinds of manipulation that commonly occur during image distribution: resizing, compression, color grading, and format conversion. Removing SynthID from an image without degrading its visual quality to a degree that defeats the purpose of the image is, according to Google DeepMind's published research, computationally difficult. That said, no watermark is unconditionally robust. Sufficiently aggressive resampling, added noise, or adversarial perturbation tools designed specifically to defeat pixel watermarks can all reduce detection confidence, though usually at the cost of image quality.

  1. C2PA file metadata: removable in seconds with any EXIF editor, format conversion, or screenshot; absence of credentials proves nothing about AI origin
  2. Text token-distribution watermarks: degrade significantly under heavy paraphrasing (~50% reduction in detection confidence reported in academic studies); survive light editing and minor rewording
  3. Pixel-level watermarks (SynthID): robust to JPEG compression, resizing, color grading, and screenshots; defeat requires adversarial processing that typically degrades visual quality
  4. Translation attacks on text: converting watermarked text to another language and back reduces watermark signal substantially because the vocabulary distribution resets
  5. Adversarial pixel perturbation: specialized tools can weaken even SynthID-style watermarks, but the processing is computationally expensive and often introduces visible artifacts
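
To illustrate the first item, the sketch below uses the Pillow library to re-encode only the pixel data of a JPEG. Because C2PA manifests live in the file container (JUMBF and XMP segments), the copy comes out with no credentials, while the pixels, and any pixel-level watermark woven into them, are untouched. Filenames are placeholders.

```python
from PIL import Image

# Open a JPEG that may carry C2PA Content Credentials in its container.
src = Image.open("generated.jpg")  # placeholder filename

# Copy only the pixel data into a fresh image. Pillow does not carry the
# original file's metadata segments across, so the saved copy has no C2PA
# manifest, no EXIF, and no XMP, while the pixel values themselves
# (including any SynthID-style watermark) are unchanged.
clean = Image.new(src.mode, src.size)
clean.putdata(list(src.getdata()))
clean.save("stripped.jpg", quality=95)
```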

What Does an AI Watermark Detector Miss?

Any AI watermark detector has a hard coverage problem: it can only find signals that were embedded by systems it knows about and that have not been subsequently destroyed. This creates three systematic gaps that users relying on watermark detection alone will encounter. The first gap is generator coverage. Most AI text is generated by models — the public versions of ChatGPT, Claude, Gemini, and others — that do not currently embed text watermarks in their standard outputs. An AI watermark detector designed around token-distribution analysis will report no watermark on most AI-generated text in the wild, not because the text is human-written, but because it comes from systems that never implemented watermarking. The second gap is post-generation editing. Even for systems that do watermark their outputs, any substantial editing by a human after the fact will degrade the watermark signal. A student who prompts an AI for a draft and then rewrites two-thirds of it by hand may end up with text that slips past watermark detection — because the watermarked tokens are now a small minority of a larger passage. An AI watermark detector measuring distributional skew in the full text will see a diluted signal (the toy calculation after this paragraph quantifies the effect). This is not a flaw in the detection approach; it is an accurate reading of the content, which genuinely is more human-edited than AI-generated at that point. The third gap is content from models that never watermark their outputs at all. Open-source models downloaded and run locally — LLaMA, Mistral, Qwen, and others — produce text and images with no watermarks, because the user controls the inference and no platform can enforce watermark insertion. Any content produced by these tools will carry no watermark, regardless of how much AI was involved. These gaps are why AI watermark detection is most useful as one layer of a multi-signal verification process, not as a standalone verification method.
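
As a back-of-envelope model of that dilution, assume the green-list scheme sketched earlier, where watermarked tokens land in the green set at an elevated rate DELTA and human-written tokens at the baseline rate GAMMA. Both rates are illustrative, chosen only to show the shape of the falloff.

```python
import math

GAMMA = 0.25   # baseline green-token rate in unwatermarked text (assumed)
DELTA = 0.5    # green-token rate achieved by the watermarked generator (assumed)

def expected_zscore(n_tokens: int, watermarked_fraction: float) -> float:
    """Expected detection z-score when only part of an n-token passage
    still comes verbatim from the watermarked generator."""
    green_rate = watermarked_fraction * DELTA + (1 - watermarked_fraction) * GAMMA
    std = math.sqrt(n_tokens * GAMMA * (1 - GAMMA))
    return (green_rate - GAMMA) * n_tokens / std

for frac in (1.0, 0.66, 0.33, 0.10):
    print(f"{frac:.0%} watermarked: z = {expected_zscore(600, frac):.1f}")
# With these assumed rates, a 600-token passage drops from z = 14.1 when
# fully watermarked to z = 1.4 at 10% watermarked, below any sensible
# detection threshold.
```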

How to Verify AI Content Responsibly Using Watermark Detection

Responsible use of an AI watermark detector starts with understanding what the tool is actually answering. A watermark check and an AI origin check are not the same question, and conflating them produces both false confidence and unfair conclusions. For image verification, a practical workflow looks like this: check first for C2PA Content Credentials using a C2PA-compatible reader. Most standard photo applications do not display C2PA data, so you need a tool specifically designed to read them. Adobe's Content Authenticity web tool, or any C2PA-aware viewer, can surface these credentials when they exist. If credentials are present and declare AI generation, that is a strong positive finding. If no credentials are found, continue to pixel-level AI image detection — the step that measures what the image looks like rather than what its file container says. For text verification, watermark-based checks are currently limited by the adoption gap described above. Until major providers implement consistent text watermarking, the more reliable approach is to use a detector that measures the statistical properties of the text itself — perplexity, burstiness, and distributional patterns that differ between human and AI writing — rather than looking for a deliberately embedded watermark. These intrinsic-signal detectors operate regardless of whether the generating system implemented watermarking. When verification results will be used to make consequential decisions — whether academic, legal, professional, or editorial — document your methodology explicitly. Which tool did you use? What version? What result did it return? Single-tool reliance on either a watermark check or a statistical detector is not best practice for high-stakes determinations. Cross-referencing multiple tools reduces the impact of any individual tool's false-positive or false-negative rate. The checklist below summarizes this workflow; a sketch of what a documented, multi-signal record might look like follows it.

  1. For images, start with a C2PA-compatible reader to check for signed Content Credentials — present credentials declaring AI generation are a fast, high-confidence finding
  2. Treat absent credentials as neutral — move to pixel-level AI image detection regardless of metadata status
  3. For text, use statistical AI text detection (perplexity/burstiness analysis) as the primary check — more reliable than watermark detection given current adoption gaps
  4. Cross-reference at least two independent tools before drawing a conclusion in high-stakes contexts
  5. Document your verification methodology: tool names, versions, results, and date — this supports defensible decision-making
  6. Apply proportionate confidence: a strong positive across multiple detection approaches warrants higher confidence than a borderline result from a single tool
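
A minimal sketch of what that documentation might look like in code. Field names, scores, and thresholds are hypothetical placeholders, not a real tool's API or calibrated cutoffs.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class VerificationRecord:
    """One documented, multi-signal check, as recommended above."""
    c2pa_credentials_found: bool       # result from a C2PA-compatible reader
    c2pa_declares_ai: bool             # credentials name an AI generator
    detector_scores: dict[str, float]  # intrinsic detectors, name -> 0..1 score
    tool_versions: dict[str, str] = field(default_factory=dict)
    checked_on: str = field(default_factory=lambda: date.today().isoformat())

def interpret(rec: VerificationRecord) -> str:
    """Proportionate-confidence reading of the combined signals."""
    if rec.c2pa_credentials_found and rec.c2pa_declares_ai:
        return "strong positive: signed provenance declares AI generation"
    scores = list(rec.detector_scores.values())
    if len(scores) >= 2 and all(s > 0.9 for s in scores):
        return "high confidence AI: independent intrinsic detectors agree"
    if len(scores) >= 2 and all(s < 0.2 for s in scores):
        return "likely human: no watermark and weak intrinsic signal"
    return "inconclusive: treat a missing watermark as neutral and gather more context"
```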

Watermark Standards, Adoption, and What Is Actually Deployed Today

The gap between what AI watermarking can theoretically accomplish and what is currently deployed in practice is significant enough to affect how you interpret detection results. On the image side, C2PA has real traction. Adobe Firefly, DALL-E 3, and Microsoft's AI image tools all embed C2PA Content Credentials by default. The Content Authenticity Initiative has commitments from major news organizations, platform companies, and hardware manufacturers. Camera manufacturers including Leica and Sony have shipped hardware-level C2PA signing so that photos are signed at capture, not after the fact. SynthID is deployed across Google's image generation tools, including Imagen and Gemini, and has expanded to video and audio. On the text side, progress has been slower. OpenAI explored text watermarking internally and reportedly decided against deploying it in consumer products, in part due to the fragility of text watermarks under paraphrasing and the concern that writers who legitimately rely on AI assistance — non-native speakers, writers with dyslexia, those who need assistive editing tools — might be disproportionately flagged. Google has discussed SynthID's extension to text in research contexts but has not made consumer-facing text watermark detection widely available. The net result is that an AI watermark detector checking for C2PA or SynthID signals will catch content from major commercial platforms that have adopted the standard, and will miss content from open-source models, platforms that have not adopted watermarking, and any content where watermarks have been stripped or degraded. This is a coverage reality, not a failure of the watermarking concept — adoption is an ongoing process, and the tools deployed today reflect where the industry is right now, not where these standards are heading.

"C2PA provides the foundation for a web where media can carry verified provenance — but the value scales with how many creators and platforms participate." — Content Authenticity Initiative, 2024

How NotGPT Helps With AI Watermark and Origin Verification

NotGPT offers two detection tools relevant to AI origin verification that complement watermark-based approaches by analyzing the intrinsic properties of content rather than relying solely on embedded signals. The AI Image Detection tool analyzes uploaded images at the pixel level, checking for the visual characteristics that distinguish AI-generated images from photographs — texture regularity, frequency-domain signatures, and semantic consistency patterns. This analysis runs regardless of whether any watermark is present or has been removed, making it effective for images from platforms that never embedded watermarks and for images where metadata has been stripped. The AI Text Detection tool measures perplexity, burstiness, and distributional patterns in submitted text to estimate the probability that the passage was AI-generated. This is the approach that covers the adoption gap in text watermarking: rather than looking for a signal that only some generators embed, it reads the statistical fingerprints that all current LLMs leave in their outputs to varying degrees. Using NotGPT alongside a dedicated watermark check — particularly a C2PA reader for images — gives you both the provenance signal (when it exists) and the intrinsic signal (which exists regardless of whether watermarking was used). Neither approach alone covers the full verification problem; together, they address substantially more of the detection surface.
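
As a rough illustration of the intrinsic-signal idea, and emphatically not NotGPT's actual implementation, the sketch below scores text with GPT-2 via the Hugging Face transformers library: perplexity measures how predictable the passage is to a language model, and a simple burstiness proxy measures how much that predictability swings between sentences.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Average next-token surprise under GPT-2; AI text tends to score lower."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence-level perplexity; human writing
    tends to swing more between sentences than LLM output does."""
    sentences = [s.strip() for s in text.split(".") if len(s.split()) > 3]
    ppls = [perplexity(s) for s in sentences]
    mean = sum(ppls) / len(ppls)
    std = (sum((p - mean) ** 2 for p in ppls) / len(ppls)) ** 0.5
    return std / mean
```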
