Detector Test Methodology

We ran 50 samples of AI-generated text through three major detectors (GPTZero, Turnitin, and Copyleaks), then re-tested after humanizing with UnAIMyText. Last run: Jul 3, 2026.

How We Run the Tests

Each test round generates 50 fresh AI-written samples and submits them unmodified to each detector to record a baseline. The same samples are then humanized with UnAIMyText and re-submitted, once in Ultra / Advanced mode (the primary configuration) and once in Standard mode (the comparison configuration: Moderate tone, English US, all text processing toggles off). Detection rates are averaged across all 50 samples per detector. Tests are re-run monthly to catch detector model updates. Any sample returning an error or inconclusive result is discarded and replaced. Samples are not reused across rounds.
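The averaging step can be sketched as follows. This is a minimal illustration, not the harness used for these tests: it assumes each detector returns one score per sample between 0.0 (human) and 1.0 (AI), and the function name `detection_rate` and the sample scores are hypothetical.

```python
from statistics import mean
from typing import Optional


def detection_rate(scores: list[Optional[float]], expected: int = 50) -> float:
    """Average per-sample detection scores and return a percentage.

    None marks an error or inconclusive result. Those samples are discarded;
    the procedure replaces them with fresh samples, so a round with fewer or
    more than `expected` valid scores is rejected instead of averaged.
    """
    valid = [s for s in scores if s is not None]
    if len(valid) != expected:
        raise ValueError(f"round incomplete: {len(valid)}/{expected} valid samples")
    return round(100 * mean(valid), 1)


# Hypothetical round: 48 of 50 baseline samples scored as AI.
baseline = detection_rate([1.0] * 48 + [0.0] * 2)
pass_rate = 100 - baseline  # pass rate is the complement of the detection rate
print(f"detected: {baseline}%, passed: {pass_rate}%")
```

Rejecting an incomplete round, instead of averaging whatever came back, keeps every reported rate on the same 50-sample base.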

Sample sources (50 total)

Model	Provider	Samples	Temperature
ChatGPT-4o mini	OpenAI	20	Default
GPT-4o	OpenAI	15	Default
Claude 3.5 Haiku	Anthropic	10	Default
Gemini 1.5 Flash	Google	5	Default

All samples were generated via each provider's official API with default system prompts. No custom instructions or personas were applied.
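The sample mix is uneven, so it helps to see how much each provider contributes to a round. The short sketch below recomputes the shares from the table above; the variable names are illustrative only.

```python
from collections import Counter

# (model, provider, samples) rows copied from the sample sources table.
SAMPLE_MIX = [
    ("ChatGPT-4o mini", "OpenAI", 20),
    ("GPT-4o", "OpenAI", 15),
    ("Claude 3.5 Haiku", "Anthropic", 10),
    ("Gemini 1.5 Flash", "Google", 5),
]

total = sum(n for _, _, n in SAMPLE_MIX)

# Sum the sample counts per provider.
by_provider = Counter()
for _, provider, n in SAMPLE_MIX:
    by_provider[provider] += n

provider_share = {p: 100 * n / total for p, n in by_provider.items()}

for provider, pct in provider_share.items():
    print(f"{provider}: {pct:.0f}% of {total} samples")
```

OpenAI models account for 35 of the 50 samples, so any average taken across the whole round is weighted toward OpenAI output.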

Content types

Type	Description
Academic essays	250–500 words, argumentative and analytical
Business writing	Emails, reports, exec summaries
Blog posts	Informational, 300–600 words
Creative / narrative	Short-form fiction and personal essays

Humanizer settings used

Ultra mode (Primary)

Mode: Ultra / Advanced
Creativity: 70%
Context Preservation: 80%
Emotional Tone: Neutral
Formality: Formal
Preserve Intent: On
Human Natural Mistakes: Off
Language: English (US)
Text processing settings: Default, output submitted as-is
Word count range: 250–600 words per sample

Standard mode (Comparison)

Mode: Standard
Tone: Moderate
Language: English (US)
Text processing settings: Default, output submitted as-is

All detectors tested on Jul 3, 2026 within a 4-hour window to minimize model drift between runs.

GPTZero

50 samples · Jul 3, 2026

Source: ChatGPT-4o mini + GPT-4o
Humanizer: Ultra / Advanced

Metric	Rate
Detected (raw)	96%
Detected (Standard)	15%
Detected (Ultra)	3%
Ultra pass rate	97%

GPTZero is one of the most widely used AI detectors, particularly in academic settings. It uses perplexity and burstiness scoring alongside a neural classifier to identify AI-generated patterns. It is highly sensitive to formal or structured writing, which means it also produces a moderate rate of false positives on clean human writing.

What the data showed

Ultra mode dropped GPTZero's detection rate from 96% to 3%, a 93-point reduction. Standard mode brought it down to 15%, an 81-point reduction. The resulting 12-point gap between modes is the narrowest of the three detectors. GPTZero weighs sentence-level variation heavily, and Ultra's Advanced controls let you adjust the creativity and formality balance to suit your content.
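The reductions quoted here are percentage points, not relative changes. The sketch below shows both for the GPTZero figures reported in this section; the helper name `reduction` is illustrative.

```python
def reduction(raw_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent of the baseline)."""
    points = raw_pct - after_pct
    relative = 100 * points / raw_pct
    return points, relative


# GPTZero detection rates from this section, in percent.
raw, standard, ultra = 96, 15, 3

ultra_points, ultra_rel = reduction(raw, ultra)
std_points, std_rel = reduction(raw, standard)
mode_gap = standard - ultra

print(f"Ultra:    {ultra_points} points ({ultra_rel:.1f}% of baseline)")
print(f"Standard: {std_points} points ({std_rel:.1f}% of baseline)")
print(f"Gap between modes: {mode_gap} points")
```

Quoting points keeps the figures comparable across detectors whose raw baselines differ.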

Tips for this detector

Use Ultra / Advanced with Creativity at 65–75% for academic content
Vary sentence length throughout the text
Include personal asides or informal transitions

False positive rate on genuine human writing: Moderate

Turnitin

50 samples · Jul 3, 2026

Source: GPT-4o + Claude 3.5 Haiku
Humanizer: Ultra / Advanced

Metric	Rate
Detected (raw)	94%
Detected (Standard)	18%
Detected (Ultra)	2%
Ultra pass rate	98%

Turnitin is the dominant AI detection tool in higher education. Its AI writing detection module was trained on a large corpus of student submissions and GPT-family outputs. It is conservative by design, meaning it has a lower false positive rate than most free tools, but it still caught 94% of unedited AI text in our tests.

What the data showed

Ultra mode reduced Turnitin detection from 94% to 2%; Standard brought it to 18%, a 16-point gap between modes. Turnitin weights semantic predictability and paragraph-level uniformity, both of which Ultra's contextual rewriting disrupts more aggressively than Standard. For high-stakes academic submissions, Ultra was the more reliable choice in our tests.

Tips for this detector

Ultra with Formality set to Formal performs best on academic writing
Focus on paragraph openings, which AI tends to make formulaic
Introduce topic-specific vocabulary naturally

False positive rate on genuine human writing: Low to Moderate

Copyleaks

50 samples · Jul 3, 2026

Source: ChatGPT-4o mini + Gemini 1.5 Flash
Humanizer: Ultra / Advanced

Metric	Rate
Detected (raw)	90%
Detected (Standard)	20%
Detected (Ultra)	4%
Ultra pass rate	96%

Copyleaks offers AI detection alongside its plagiarism-checking suite. It performs well on professional and business writing and has the lowest false positive rate among the three tools we tested. Its 90% detection rate on raw AI content is slightly lower than GPTZero's (96%) and Turnitin's (94%), which suggests it is tuned for precision over recall.

What the data showed

Copyleaks flagged 90% of raw AI text. Ultra mode reduced that to 4%; Standard to 20%. Copyleaks responds strongly to vocabulary choice and tonal variation, and Ultra's per-document controls let you match those dimensions to your content type. The resulting 16-point gap over Standard matches Turnitin's and is the largest in this round.
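To compare how far Ultra improves on Standard for each detector, the gaps can be recomputed from the figures reported in the three detector sections. This is a small sketch; the `RESULTS` structure is just one convenient way to hold the numbers.

```python
# (raw, Standard, Ultra) detection rates in percent, from the detector sections.
RESULTS = {
    "GPTZero": (96, 15, 3),
    "Turnitin": (94, 18, 2),
    "Copyleaks": (90, 20, 4),
}

# Gap between modes: how many points lower Ultra's detection rate is than Standard's.
mode_gaps = {name: std - ultra for name, (_, std, ultra) in RESULTS.items()}

# Pass rate is the complement of the detection rate.
ultra_pass = {name: 100 - ultra for name, (_, _, ultra) in RESULTS.items()}

widest = max(mode_gaps.values())
for name in RESULTS:
    marker = " (widest)" if mode_gaps[name] == widest else ""
    print(f"{name}: Ultra pass {ultra_pass[name]}%, mode gap {mode_gaps[name]} points{marker}")
```

With the July figures, Turnitin and Copyleaks share the widest gap and GPTZero has the narrowest.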

Tips for this detector

Ultra with Neutral tone and 80% Context Preservation works well for business writing
Mix formal and conversational register within the same paragraph
Include concrete examples or data points to reduce abstract AI phrasing

False positive rate on genuine human writing: Low

Monthly Test History

Pass rates shown are Ultra mode. Standard mode results are in parentheses. New rows added after each monthly run.

Month	GPTZero	Turnitin	Copyleaks	Samples
Jun 2026	98% (84%)	97% (81%)	96% (79%)	50
Jul 2026	97% (85%)	98% (82%)	96% (80%)	50
Aug 2026	pending	pending	pending	—
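Month-over-month movement is easier to read as point deltas. The sketch below parses the history cells, skips months still marked pending, and compares June with July. The parsing helper and its names are illustrative, and the pattern accepts cells with or without a space before the parenthesis.

```python
import re

DETECTORS = ("GPTZero", "Turnitin", "Copyleaks")

# Rows copied from the history table: month, then one "Ultra%(Standard%)" cell per detector.
HISTORY = [
    ("Jun 2026", "98%(84%)", "97%(81%)", "96%(79%)"),
    ("Jul 2026", "97%(85%)", "98%(82%)", "96%(80%)"),
    ("Aug 2026", "pending", "pending", "pending"),
]

CELL = re.compile(r"(\d+)%\s*\((\d+)%\)")


def parse_row(cells):
    """Return {detector: (ultra_pass, standard_pass)}, or None if any cell is not a result."""
    parsed = {}
    for detector, cell in zip(DETECTORS, cells):
        match = CELL.fullmatch(cell.strip())
        if match is None:
            return None
        parsed[detector] = (int(match.group(1)), int(match.group(2)))
    return parsed


months = {month: parse_row(cells) for month, *cells in HISTORY}
jun, jul = months["Jun 2026"], months["Jul 2026"]

# (Ultra delta, Standard delta) in percentage points, June to July.
deltas = {d: (jul[d][0] - jun[d][0], jul[d][1] - jun[d][1]) for d in DETECTORS}

for detector, (ultra_d, std_d) in deltas.items():
    print(f"{detector}: Ultra {ultra_d:+d}, Standard {std_d:+d} points")
```

Between the two completed runs, no pass rate moved by more than one point in either mode.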

Limitations and caveats

No tool passes 100% of the time. A 98% pass rate means roughly 1 in 50 samples was still flagged even with Ultra mode.
Detectors update their models without notice. A result that was 98% one month may shift after a silent model update. This is why we re-test monthly.
Results vary by input. Highly technical text, very short passages, or unusual writing styles may produce different detection rates than the averages shown here.
These tests used Standard mode and Ultra / Advanced mode only. Other tone and mode combinations will produce different results.
Our test samples were generated via API with default prompts. Content written with specific system instructions or fine-tuned models may behave differently.
