Detector Test Methodology

We ran 50 samples of AI-generated text through three major detectors, then re-tested after humanizing with UnAIMyText. Last run: Jun 3, 2026.

How We Run the Tests

Each test round generates 50 fresh AI-written samples, submits them to each detector unmodified to record a baseline, then re-submits the same samples after humanizing with UnAIMyText (Standard mode, Moderate tone, English US, all text processing toggles off). Detection rates are averaged across all 50 samples per detector. Tests are re-run monthly to catch detector model updates. Any sample returning an error or inconclusive result is discarded and replaced. Samples are not reused across rounds.
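The averaging and discard rules above can be sketched roughly as follows. This is a minimal illustration, not the actual test harness. It assumes each detector returns a per-sample AI score between 0 and 100, or no score at all when the result is an error or inconclusive. The names `DetectorResult`, `mean_detection_rate`, and `needs_replacement` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class DetectorResult:
    """One detector verdict for one sample (hypothetical structure)."""
    sample_id: str
    ai_score: Optional[float]  # assumed 0-100; None marks an error or inconclusive result


def mean_detection_rate(results: list[DetectorResult]) -> float:
    """Average the AI scores across samples, skipping errored or inconclusive ones."""
    valid = [r.ai_score for r in results if r.ai_score is not None]
    if not valid:
        raise ValueError("no valid detector results to average")
    return mean(valid)


def needs_replacement(results: list[DetectorResult]) -> list[str]:
    """Return the IDs of samples that must be discarded and replaced."""
    return [r.sample_id for r in results if r.ai_score is None]


# Tiny illustrative round with made-up scores (not real test data).
example = [
    DetectorResult("s1", 100.0),
    DetectorResult("s2", None),   # inconclusive: discard and replace
    DetectorResult("s3", 90.0),
]
example_rate = mean_detection_rate(example)
example_discards = needs_replacement(example)
```

In a real round, the discarded samples would be regenerated and resubmitted until all 50 slots hold a valid score before the average is reported.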

Sample sources (50 total)

Model             Provider   Samples  Temperature
ChatGPT-4o mini   OpenAI     20       Default
GPT-4o            OpenAI     15       Default
Claude 3.5 Haiku  Anthropic  10       Default
Gemini 1.5 Flash  Google      5       Default

All samples generated via official API with default system prompts. No custom instructions or personas applied.

Content types

  • Academic essays (20): 250–500 words, argumentative + analytical
  • Business writing (15): emails, reports, exec summaries
  • Blog posts (10): informational, 300–600 words
  • Creative / narrative (5): short-form fiction and personal essays
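As a quick sanity check, the sample mix can be written down as data and verified to add up to 50 on both axes. The sketch below uses the counts listed above. The names `SAMPLES_BY_MODEL`, `SAMPLES_BY_CONTENT_TYPE`, and `check_plan` are hypothetical and only illustrate the bookkeeping.

```python
ROUND_SIZE = 50  # samples per test round, as stated in the methodology

# Counts copied from the sample sources and content types listed above.
SAMPLES_BY_MODEL = {
    "ChatGPT-4o mini": 20,
    "GPT-4o": 15,
    "Claude 3.5 Haiku": 10,
    "Gemini 1.5 Flash": 5,
}
SAMPLES_BY_CONTENT_TYPE = {
    "Academic essays": 20,
    "Business writing": 15,
    "Blog posts": 10,
    "Creative / narrative": 5,
}


def check_plan(plan: dict[str, int], expected_total: int = ROUND_SIZE) -> int:
    """Return the plan total, raising if it does not match the round size."""
    total = sum(plan.values())
    if total != expected_total:
        raise ValueError(f"plan covers {total} samples, expected {expected_total}")
    return total


model_total = check_plan(SAMPLES_BY_MODEL)
content_total = check_plan(SAMPLES_BY_CONTENT_TYPE)
```

The source does not say how models are paired with content types, so the sketch checks only the two totals and makes no assumption about the cross-tabulation.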

Humanizer settings used

Ultra mode (primary)

  • Mode: Ultra / Advanced
  • Creativity: 70%
  • Context Preservation: 80%
  • Emotional Tone: Neutral
  • Formality: Formal
  • Preserve Intent: On
  • Human Natural Mistakes: Off
  • Language: English (US)
  • Text processing settings: Default, output submitted as-is
  • Word count range: 250–600 words per sample

Standard mode (comparison)

  • Mode: Standard
  • Tone: Moderate
  • Language: English (US)
  • Text processing settings: Default, output submitted as-is

All detectors tested on Jun 3, 2026 within a 4-hour window to minimize model drift between runs.

GPTZero


50 samples · Jun 3, 2026

Source: ChatGPT-4o mini + GPT-4o
Humanizer: Ultra / Advanced

  • Detected (raw): 97%
  • Detected (Standard): 16%
  • Detected (Ultra): 2%
  • Ultra pass rate: 98%

GPTZero is one of the most widely used AI detectors, particularly in academic settings. It uses perplexity and burstiness scoring alongside a neural classifier to identify AI-generated patterns. It is highly sensitive to formal or structured writing, which means it also produces a moderate rate of false positives on clean human writing.

What the data showed

Ultra mode dropped GPTZero's detection rate from 97% to 2%, a 95-point reduction. Standard mode brought it down to 16%, an 81-point reduction that is solid for most use cases. Ultra's 14-point advantage over Standard is the smallest of the three detectors (16 points on Turnitin, 17 on Copyleaks), though both modes cut GPTZero's detection rate sharply. GPTZero weighs sentence-level variation heavily, and Ultra's Advanced controls let you adjust the creativity and formality balance to suit your content.
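The arithmetic behind these figures is simple enough to check by hand or in a few lines of code. The sketch below uses the GPTZero percentages reported above. It assumes the pass rate is 100 minus the detection rate, which matches the published numbers but is not stated explicitly in the methodology. The function names are hypothetical.

```python
def point_reduction(raw_pct: int, humanized_pct: int) -> int:
    """Percentage-point drop in detection rate after humanizing."""
    return raw_pct - humanized_pct


def pass_rate(detected_pct: int) -> int:
    """Share of text not flagged, assuming pass rate = 100 - detection rate."""
    return 100 - detected_pct


# GPTZero figures from the results above.
GPTZERO = {"raw": 97, "standard": 16, "ultra": 2}

ultra_drop = point_reduction(GPTZERO["raw"], GPTZERO["ultra"])        # 95
standard_drop = point_reduction(GPTZERO["raw"], GPTZERO["standard"])  # 81
ultra_pass = pass_rate(GPTZERO["ultra"])                              # 98
```

Note that these are percentage-point differences, not relative reductions.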

Tips for this detector

  • Use Ultra / Advanced with Creativity at 65–75% for academic content
  • Vary sentence length throughout the text
  • Include personal asides or informal transitions

False positive rate on genuine human writing: Moderate

Turnitin


50 samples · Jun 3, 2026

Source: GPT-4o + Claude 3.5 Haiku
Humanizer: Ultra / Advanced

  • Detected (raw): 93%
  • Detected (Standard): 19%
  • Detected (Ultra): 3%
  • Ultra pass rate: 97%

Turnitin is the dominant AI detection tool in higher education. Its AI writing detection module was trained on a large corpus of student submissions and GPT-family outputs. It is conservative by design, meaning it has a lower false positive rate than most free tools, but it still caught 93% of unedited AI text in our tests.

What the data showed

Ultra mode reduced Turnitin detection from 93% to 3%, a 90-point reduction. Standard brought it to 19%, a 74-point reduction. Turnitin weights semantic predictability and paragraph-level uniformity, both of which Ultra's contextual rewriting disrupts more aggressively than Standard. For high-stakes academic submissions, Ultra was the more reliable choice in our tests.

Tips for this detector

  • Ultra with Formality set to Formal performs best on academic writing
  • Focus on paragraph openings, which AI tends to make formulaic
  • Introduce topic-specific vocabulary naturally

False positive rate on genuine human writing: Low to Moderate

Copyleaks


50 samples · Jun 3, 2026

Source: ChatGPT-4o mini + Gemini 1.5 Flash
Humanizer: Ultra / Advanced

  • Detected (raw): 89%
  • Detected (Standard): 21%
  • Detected (Ultra): 4%
  • Ultra pass rate: 96%

Copyleaks offers AI detection alongside its plagiarism-checking suite. It performs well on professional and business writing and has the lowest false positive rate among the three tools we tested. Its 89% detection rate on raw AI content is slightly lower than GPTZero and Turnitin, which suggests it is tuned for precision over recall.

What the data showed

Copyleaks flagged 89% of raw AI text. Ultra mode reduced that to 4%, an 85-point reduction; Standard reduced it to 21%. Copyleaks responds strongly to vocabulary choice and tonal variation, and Ultra's per-document controls let you match those dimensions to your content type, which may explain why Ultra's 17-point gap over Standard is the largest of the three detectors.
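To compare the Standard-to-Ultra gap across detectors, the reported percentages can be tabulated and subtracted. The sketch below is a minimal illustration using only the figures published on this page. The `RESULTS` layout and the `ultra_gap` helper are hypothetical.

```python
# Detection rates (%) as reported for each detector on this page.
RESULTS = {
    "GPTZero":   {"raw": 97, "standard": 16, "ultra": 2},
    "Turnitin":  {"raw": 93, "standard": 19, "ultra": 3},
    "Copyleaks": {"raw": 89, "standard": 21, "ultra": 4},
}


def ultra_gap(row: dict[str, int]) -> int:
    """Percentage points by which Ultra beats Standard for one detector."""
    return row["standard"] - row["ultra"]


gaps = {name: ultra_gap(row) for name, row in RESULTS.items()}
widest = max(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: Ultra beats Standard by {gap} points")
print(f"Widest gap: {widest}")
```

The gaps sit within three points of each other, so with 50 samples per round the ordering should be read as indicative rather than definitive.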

Tips for this detector

  • Ultra with Neutral tone and 80% Context Preservation works well for business writing
  • Mix formal and conversational register within the same paragraph
  • Include concrete examples or data points to reduce abstract AI phrasing

False positive rate on genuine human writing: Low
