Detector Test Methodology
We ran 50 samples of AI-generated text through three major detectors, then re-tested after humanizing with UnAIMyText. Last run: Jun 3, 2026.
How We Run the Tests
Each test round generates 50 fresh AI-written samples, submits them to each detector unmodified to record a baseline, then re-submits the same samples after humanizing with UnAIMyText (Standard mode, Moderate tone, English US, all text processing toggles off). Detection rates are averaged across all 50 samples per detector. Tests are re-run monthly to catch detector model updates. Any sample returning an error or inconclusive result is discarded and replaced. Samples are not reused across rounds.
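The round described above can be sketched roughly as follows. This is a minimal illustration, not the actual test harness: `generate`, `humanize`, and the detector callables are hypothetical stand-ins, and it assumes each detector returns a numeric AI-likelihood score (0 to 100) or `None` for an error or inconclusive result.

```python
from statistics import mean
from typing import Callable, Dict, Optional, Tuple

# Hypothetical detector interface: a score in [0, 100], or None on error/inconclusive.
Detector = Callable[[str], Optional[float]]


def run_round(
    generate: Callable[[], str],
    humanize: Callable[[str], str],
    detectors: Dict[str, Detector],
    n_samples: int = 50,
    max_attempts: int = 500,
) -> Dict[str, Tuple[float, float]]:
    """Return {detector name: (baseline average, humanized average)}."""
    baseline = {name: [] for name in detectors}
    humanized = {name: [] for name in detectors}
    kept = 0
    attempts = 0
    while kept < n_samples:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("too many discarded samples")
        text = generate()            # fresh sample, never reused
        rewritten = humanize(text)   # same sample after humanizing
        before = {name: d(text) for name, d in detectors.items()}
        after = {name: d(rewritten) for name, d in detectors.items()}
        # Assumption: a failure on any detector discards the whole sample,
        # and a replacement is generated on the next loop iteration.
        if any(v is None for v in (*before.values(), *after.values())):
            continue
        for name in detectors:
            baseline[name].append(before[name])
            humanized[name].append(after[name])
        kept += 1
    # Average per detector across all kept samples.
    return {
        name: (mean(baseline[name]), mean(humanized[name]))
        for name in detectors
    }
```

Whether a failed humanized re-submission also discards the baseline score is an assumption here; the methodology only states that failed samples are discarded and replaced.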
Sample sources (50 total)
| Model | Provider | Samples | Temperature |
|---|---|---|---|
| ChatGPT-4o mini | OpenAI | 20 | Default |
| GPT-4o | OpenAI | 15 | Default |
| Claude 3.5 Haiku | Anthropic | 10 | Default |
| Gemini 1.5 Flash | Google | 5 | Default |
All samples generated via official API with default system prompts. No custom instructions or personas applied.
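As a quick sanity check on the mix, the table can be written out as data and tallied by provider. This is only an illustrative sketch; the `SAMPLE_MIX` structure and `provider_shares` helper are hypothetical names, not part of the test tooling.

```python
from collections import defaultdict

# (model label, provider, sample count), copied from the sample sources table.
SAMPLE_MIX = [
    ("ChatGPT-4o mini", "OpenAI", 20),
    ("GPT-4o", "OpenAI", 15),
    ("Claude 3.5 Haiku", "Anthropic", 10),
    ("Gemini 1.5 Flash", "Google", 5),
]


def provider_shares(mix):
    """Return each provider's fraction of the total sample count."""
    totals = defaultdict(int)
    for _model, provider, count in mix:
        totals[provider] += count
    grand_total = sum(totals.values())
    return {provider: count / grand_total for provider, count in totals.items()}


total_samples = sum(count for _, _, count in SAMPLE_MIX)
shares = provider_shares(SAMPLE_MIX)
```

With these counts the total is 50 and OpenAI models make up 70% of each round, so the averaged detection rates mostly reflect GPT-family output.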
Content types
Humanizer settings used
All detectors tested on Jun 3, 2026 within a 4-hour window to minimize model drift between runs.
GPTZero
50 samples · Jun 3, 2026
GPTZero is one of the most widely used AI detectors, particularly in academic settings. It uses perplexity and burstiness scoring alongside a neural classifier to identify AI-generated patterns. It is highly sensitive to formal or structured writing, which means it also produces a moderate rate of false positives on clean human writing.
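To give a feel for what "burstiness" refers to, here is a rough proxy: how much sentence length varies within a text. This is an illustration only and is not GPTZero's scoring; the function names and the choice of the coefficient of variation are assumptions made for this sketch.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list:
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation of sentence length (0.0 means perfectly uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence is quite a bit longer than the first one. Fine."
```

Here `length_variation(uniform)` is 0.0, while the mixed short and long sentences in `varied` score higher. Text with very even sentence lengths is the kind of pattern a burstiness signal would treat as more machine-like.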
What the data showed
Ultra mode dropped GPTZero's detection rate from 97% to 2%, a 95-point reduction. Standard mode brought it down to 16%, which is solid for most use cases. Ultra leads Standard by 14 points here: GPTZero weighs sentence-level variation heavily, and Ultra's Advanced controls let you adjust the creativity and formality balance for your content.
Tips for this detector
- Use Ultra / Advanced with Creativity at 65–75% for academic content
- Vary sentence length throughout the text
- Include personal asides or informal transitions
False positive rate on genuine human writing: Moderate
Turnitin
50 samples · Jun 3, 2026
Turnitin is the dominant AI detection tool in higher education. Its AI writing detection module was trained on a large corpus of student submissions and GPT-family outputs. It is conservative by design, meaning it has a lower false positive rate than most free tools, but it still caught 93% of unedited AI text in our tests.
What the data showed
Ultra mode reduced Turnitin detection from 93% to 3%. Standard brought it to 19%. Turnitin weights semantic predictability and paragraph-level uniformity, both of which Ultra's contextual rewriting disrupts more aggressively than Standard. For high-stakes academic submissions, Ultra is the more reliable of the two modes.
Tips for this detector
- Ultra with Formality set to Formal performs best on academic writing
- Focus on paragraph openings, which AI tends to make formulaic
- Introduce topic-specific vocabulary naturally
False positive rate on genuine human writing: Low to Moderate
Copyleaks
50 samples · Jun 3, 2026
Copyleaks offers AI detection alongside its plagiarism-checking suite. It performs well on professional and business writing and has the lowest false positive rate among the three tools we tested. Its 89% detection rate on raw AI content is slightly lower than GPTZero and Turnitin, which suggests it is tuned for precision over recall.
What the data showed
Copyleaks flagged 89% of raw AI text. Ultra mode reduced that to 4%; Standard to 21%. Copyleaks responds strongly to vocabulary choice and tonal variation, and Ultra's per-document controls let you match those dimensions to your content type, which is why Ultra's 17-point lead over Standard is the largest of the three detectors.
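The point reductions and the Ultra-versus-Standard gap for all three detectors can be recomputed from the reported figures. The sketch below uses only the percentages stated in this report; the `RESULTS` layout and `summarize` helper are hypothetical names chosen for illustration.

```python
# Reported average detection rates (%), per detector, from this report.
RESULTS = {
    "GPTZero": {"raw": 97, "standard": 16, "ultra": 2},
    "Turnitin": {"raw": 93, "standard": 19, "ultra": 3},
    "Copyleaks": {"raw": 89, "standard": 21, "ultra": 4},
}


def summarize(results):
    """Compute percentage-point reductions and the Standard-minus-Ultra gap."""
    summary = {}
    for name, r in results.items():
        summary[name] = {
            "ultra_reduction": r["raw"] - r["ultra"],
            "standard_reduction": r["raw"] - r["standard"],
            "mode_gap": r["standard"] - r["ultra"],
        }
    return summary


summary = summarize(RESULTS)
widest_gap = max(summary, key=lambda name: summary[name]["mode_gap"])

for name, row in summary.items():
    print(f"{name}: Ultra -{row['ultra_reduction']} pts, "
          f"Standard -{row['standard_reduction']} pts, "
          f"gap {row['mode_gap']} pts")
```

On these numbers the mode gaps are 14 points for GPTZero, 16 for Turnitin, and 17 for Copyleaks.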
Tips for this detector
- Ultra with Neutral tone and 80% Context Preservation works well for business writing
- Mix formal and conversational register within the same paragraph
- Include concrete examples or data points to reduce abstract AI phrasing
False positive rate on genuine human writing: Low