Master AI Detection: How to Tell if ChatGPT Wrote Something
You're staring at a document that doesn't quite sit right.
Maybe it's a student essay that is polished but oddly hollow. Maybe it's a cover letter that says all the right things without sounding like a real person. Maybe it's a freelance draft that is clean, organized, and technically competent, yet somehow interchangeable with a hundred others.
That feeling matters. In editing, fact-checking, hiring, and academic review, the first sign is often not a detector score. It's friction. The text seems fluent, but it doesn't seem authored.
The problem is that telling whether ChatGPT wrote something isn't a single-trick exercise. There's no magic phrase, punctuation mark, or browser tool that settles it. The useful approach is layered. Start with the text itself. Examine the language patterns. Use detectors carefully. Then look for evidence of provenance, revision history, research notes, and the human trail behind the words.
That's the workflow experienced reviewers now need. AI tools are becoming normal across professional work, including areas where accuracy and authorship matter a great deal, such as streamlining legal research with AI. The practical question isn't whether AI exists in the workflow. It's whether a given piece of writing is being presented truthfully, responsibly, and with enough human judgment behind it. If you want a broader grounding in what counts as synthetic media and machine-made output, this overview of AI-generated content is a useful starting point.
The Modern Dilemma of Digital Text
A few years ago, suspicious writing usually looked sloppy. It might have been plagiarized, padded, or assembled from low-quality sources. Now the harder cases are the opposite. The text is often grammatical, readable, and structurally tidy. That surface competence makes detection harder, not easier.
Editors run into this with contributed articles. Teachers see it in reflection essays that sound mature but strangely generic. Hiring teams notice it in application materials that mirror job descriptions too neatly. The common pattern is not obvious error. It's generic fluency without convincing ownership.
Strong suspicion usually starts when a text sounds finished before it sounds lived-in.
That shift changes the standard of review. A reviewer can't rely on one tell, because one tell won't hold up. Plenty of human writers use formal transitions. Plenty of humans write blandly. Plenty of real drafts are over-polished after revision. If you accuse someone based on a single habit, you'll eventually accuse the wrong person.
Why origin matters in practice
The stakes vary by context, but the reasoning is similar:
- In education, the issue is authorship, learning, and whether the student can actually explain the work.
- In publishing, the issue is reliability, originality, and whether the draft carries judgment instead of recycled synthesis.
- In hiring, the issue is representation. A cover letter may be assisted, but it still has to reflect the applicant's own thinking.
- In compliance or legal review, the issue is defensibility. You need to know who stands behind the claims and where they came from.
A practical workflow solves a narrower problem than people often assume. It doesn't prove metaphysical authorship. It helps you build a defensible editorial judgment from multiple forms of evidence.
The standard is judgment, not certainty
That matters because many texts today are hybrid. A person may have drafted the outline, used ChatGPT to expand it, and then revised heavily. Another person may have written everything themselves, but in a style so formal and even that it triggers suspicion. Most real decisions happen in that messy middle.
So the task isn't to catch a robot. It's to evaluate a document the way a seasoned editor evaluates any dubious source. Look for patterns. Test the weak spots. Ask for support. Then decide whether the text is likely human, likely AI-assisted, likely AI-generated, or inconclusive.
Start with Quick Heuristic Checks
Your first pass should be fast and human. Don't open a detector immediately. Read the piece as an editor would. Mark the places where the writing feels too smooth, too balanced, or too detached from real experience.
OpenAI community heuristics point to a cluster of signals, not a single giveaway: frequent em dashes, repeated ideas, a neutral or overly balanced tone, lack of real voice or emotion, formulaic filler phrases, shallow coverage, and low variation in sentence structure. The key takeaway is to score several signals together and avoid making a binary judgment from one trait alone, as noted in this discussion of pattern-based AI writing tells.

Read for voice, not just correctness
AI-generated writing often passes a grammar check and still fails a voice check.
Look for signs like these:
- Overly even tone. The piece stays calm, balanced, and polished from start to finish, even where a human writer would usually show stronger preference, uncertainty, irritation, or conviction.
- Generalized claims. The text keeps speaking in broad truths instead of naming concrete situations, examples, edge cases, or trade-offs.
- Personal distance. A first-person piece may use “I” without adding any real memory, stake, or perspective. It performs voice instead of revealing one.
- Clean symmetry. Points arrive in tidy, well-spaced structures that feel engineered for readability more than thought.
A strong human draft usually has some texture. It leans too hard somewhere. It doubles back. It includes an oddly specific example. It reveals what the writer noticed, not just what a system can summarize.
Watch for repetition disguised as development
This is one of the most reliable early checks.
A suspicious draft often appears to advance the argument while really just restating the same idea in new packaging. The paragraphs change wording, but not substance. You see “efficiency,” then “optimized workflow,” then “improved productivity,” all carrying the same thin point.
Practical rule: If you can cut a third of the draft without losing any meaning, the text may have been generated or padded with AI assistance.
That isn't proof. Some human writers are repetitive too. But repetition combined with a flat tone and generic examples is worth escalating.
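If you want a rough number to go with that impression, a small script can flag paragraph pairs that share most of their vocabulary. This is only a sketch under loud assumptions: the stopword list, the 0.5 threshold, and the function names are all invented for illustration. It only catches literal word reuse. Repetition disguised with synonyms, which is the harder case described above, still needs a human read.

```python
import re
from itertools import combinations

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "with", "on", "as", "are", "be",
}

def content_words(paragraph: str) -> set[str]:
    """Lowercase word set with common function words removed."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words across both sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def overlapping_paragraphs(paragraphs: list[str], threshold: float = 0.5):
    """Return (i, j, score) for paragraph pairs whose overlap meets the threshold."""
    bags = [content_words(p) for p in paragraphs]
    hits = []
    for i, j in combinations(range(len(bags)), 2):
        score = jaccard(bags[i], bags[j])
        if score >= threshold:
            hits.append((i, j, round(score, 2)))
    return hits
```

A flagged pair is a prompt to reread those two paragraphs side by side, nothing more.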
Small tells that matter only in combination
A single marker proves almost nothing. Several together become useful.
| Signal | On its own | In a cluster |
|---|---|---|
| Formal transition phrases | Common in human writing | More suspicious if every paragraph uses them |
| Balanced “on the one hand” framing | Sometimes good writing | More suspicious if the writer never commits to a point |
| Uniform sentence length | Can be stylistic | More suspicious if paired with shallow content |
| Polished but vague wording | Common in corporate prose | More suspicious if examples are also generic |
Use this stage to decide whether the text deserves deeper review. Don't use it to issue a verdict.
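One way to keep yourself honest about "cluster, not single tell" is to write the first-pass observations down as yes/no notes and only escalate when several are true at once. The sketch below assumes a hypothetical `FirstPassNotes` checklist and an arbitrary cutoff of three signals. Neither comes from any standard. Adjust both to your own review policy.

```python
from dataclasses import dataclass

@dataclass
class FirstPassNotes:
    """Reviewer's yes/no observations from a quick read (field names are illustrative)."""
    formal_transitions_everywhere: bool = False
    never_commits_to_a_point: bool = False
    uniform_sentence_length: bool = False
    polished_but_vague: bool = False
    generic_examples: bool = False
    repeated_ideas: bool = False

def signals_present(notes: FirstPassNotes) -> list[str]:
    """Names of the signals the reviewer marked as present."""
    return [name for name, value in vars(notes).items() if value]

def needs_deeper_review(notes: FirstPassNotes, min_signals: int = 3) -> bool:
    """True when enough separate signals co-occur. This is triage, not a verdict."""
    return len(signals_present(notes)) >= min_signals
```

The output is deliberately a boolean about what to do next, not a probability that the text is AI-written.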
Perform a Deeper Linguistic Analysis
If the quick read raises concerns, switch modes. Stop asking whether the prose feels odd and start asking how it behaves across the page.
Modern detectors gained traction by using perplexity and burstiness. GPTZero describes perplexity as the predictability of the word sequence, while burstiness measures how that predictability changes through a passage. It argues that human writing tends to score higher on perplexity and to vary more from one stretch of text to the next, because people use more randomness, unusual wording, and sentence-structure variation, as explained in its overview of how ChatGPT detection works.

Read like a pattern analyst
You don't need software to apply these ideas manually.
Perplexity, in plain terms, asks whether the next word feels too expected. AI text often chooses the safest available continuation. That makes the prose smooth and legible, but also statistically unsurprising. Human writers are less uniform. They choose odd verbs, interrupt their own rhythm, or take a sentence in a less optimized direction.
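For readers who want the formal version, the standard textbook definition of perplexity for a language model that assigns probability p to each token, given the tokens before it, is shown below. This is the general definition. It is not necessarily the exact computation any particular detector runs.

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\,(w_i \mid w_1, \dots, w_{i-1}) \right)
```

Lower values mean the model found each next word easy to predict, which is the "too expected" quality described above.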
Burstiness is easier to spot by eye. Human writing tends to vary more in sentence length, pace, and density. A writer may use a short sentence after a dense one. They may shift from explanation to anecdote. AI text often settles into a narrow band of sentence shapes and stays there.
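A crude stand-in for the by-eye check is to measure how much sentence length varies. The sketch below computes the coefficient of variation of words per sentence. That is an assumption of convenience: it is a proxy for rhythm, not the perplexity-based burstiness a detector would calculate, and the sentence splitter is deliberately naive.

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def length_variation(text: str) -> float:
    """Coefficient of variation (std dev / mean) of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to talk about variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Compare the value against other writing by the same person where you can. A low number on its own says little, because some careful human writers are simply even.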
Manual checks that actually work
Try this on a suspicious draft:
- Scan sentence openings. If too many sentences begin with similar constructions, the prose may be machine-shaped.
- Underline unusual specifics. Not broad examples. Actual specifics. Product names, dated memories, strange but relevant details, firsthand caveats. If there are few or none, the evidence of firsthand authorship is thin.
- Mark paragraph function. Ask what each paragraph uniquely contributes. If multiple paragraphs merely paraphrase the same takeaway, the draft may have been generated for volume.
- Read every third sentence aloud. AI writing often sounds more repetitive when sampled than when read straight through.
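Two of the checks above, scanning sentence openings and sampling every third sentence, are easy to mechanize for a long draft. The helper names here are hypothetical and the splitter is naive (it will stumble on abbreviations such as "e.g."), so treat the output as a reading aid rather than a measurement.

```python
import re
from collections import Counter

def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def opening_counts(text: str, n_words: int = 2) -> Counter:
    """Count how often each lowercase n-word sentence opening appears."""
    openings = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        if words:
            openings.append(" ".join(words[:n_words]))
    return Counter(openings)

def every_third(text: str, start: int = 0) -> list[str]:
    """Sample every third sentence, for reading aloud out of context."""
    return split_sentences(text)[start::3]
```

Calling `opening_counts(draft).most_common(5)` gives a quick view of whether a handful of constructions dominate the piece.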
One practical reference for this stage is Raven SEO's guide to practical checks for content quality and SEO, especially if you're reviewing marketing copy that can sound polished while still being thin. For a broader editorial framework, this piece on AI content analysis is helpful when you need to judge pattern quality rather than just surface fluency.
What human variation usually looks like
Human writing often includes productive imperfections.
A real writer often leaves fingerprints in the rhythm. A sentence runs long because the idea mattered. Another lands abruptly because the writer already knows the point.
That doesn't mean good human prose must be messy. It means it usually contains non-uniformity with purpose. The point of this stage is not to punish polish. It's to distinguish genuine control from statistical smoothness.
Use AI Detector Tools with Skepticism
Detectors are useful. They are not judges.
Many users make one of two mistakes with AI detectors. They either trust them blindly or dismiss them completely. Both reactions are lazy. A detector score is best treated the way a fact-checker treats a shaky witness statement. It may be useful evidence. It is never the whole case.

Mozilla's summary of OpenAI's first public classifier is still one of the clearest reminders of the limits. OpenAI said that early tool needed at least 1,000 characters before it could reliably inspect text, and even then it was not always fully accurate. Mozilla also notes that human editing can fool detectors, which is why AI detection should be treated as probabilistic rather than definitive, as described in this piece on how to tell ChatGPT-generated text.
What detector scores can and can't tell you
A detector can help in three practical ways:
- Triage volume. If you manage many submissions, it can flag drafts worth closer review.
- Corroborate suspicion. A strong AI signal becomes more meaningful when your manual review already found the same weaknesses.
- Standardize first-pass screening. Teams can use the same tool and compare notes instead of relying only on instinct.
What it can't do is establish intent, authorship history, or editorial honesty. A heavily revised AI draft may look human. A concise, formal, human-written memo may look machine-like. A detector doesn't know who interviewed sources, who checked the facts, or who can defend the argument in conversation.
Use more than one reading of the evidence
A practical review process looks like this:
- Run the detector only after manual review. If you start with the score, it can bias your reading.
- Check text length. Short passages are especially risky to classify with confidence.
- Test edited sections separately. A hybrid document may contain both human and AI-heavy passages.
- Compare outputs from more than one tool, not for mathematical certainty, but for directional agreement.
- Record why you think the score matters. “High AI score” is weak documentation. “High AI score plus repetitive paraphrase, generic examples, and no revision trail” is stronger.
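The steps above can be captured in a small review record so that the score never travels without its context. Everything in this sketch is an assumption made for illustration: the `ReviewRecord` and `DetectorReading` names, the 150-word minimum, the 0.5 cutoff, and the idea that you have already normalized each tool's output to a 0 to 1 scale yourself. No real detector API is being called.

```python
from dataclasses import dataclass, field

MIN_WORDS = 150  # Assumption: illustrative cutoff; pick one that suits your tools.

@dataclass
class DetectorReading:
    tool: str        # whatever label you use for the detector
    ai_score: float  # 0.0 (human-like) to 1.0 (AI-like), normalized by you

@dataclass
class ReviewRecord:
    word_count: int
    manual_findings: list[str]
    readings: list[DetectorReading] = field(default_factory=list)

    def direction(self, cutoff: float = 0.5) -> str:
        """Report whether the tools lean the same way, not how sure they are."""
        if self.word_count < MIN_WORDS:
            return "too short to classify"
        if not self.readings:
            return "no readings"
        leans_ai = [r.ai_score >= cutoff for r in self.readings]
        if all(leans_ai):
            return "tools agree: leans AI"
        if not any(leans_ai):
            return "tools agree: leans human"
        return "tools disagree"

    def note(self) -> str:
        """One documentation line that pairs the scores with manual findings."""
        findings = ", ".join(self.manual_findings) or "no manual findings recorded"
        return f"{self.direction()}; manual review: {findings}"
```

The design choice worth copying is that `note()` refuses to report a score on its own. It always attaches what the manual read found.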
If you want to understand how people try to evade these systems, looking at services that humanize ChatGPT text is instructive. Not because they make content trustworthy, but because they show how shallow many detector-bypass tactics are. They often swap phrasing, vary sentence lengths, and inject superficial irregularity. That may change a score. It does not automatically produce thoughtful writing.
The best use case for detectors
The best detectors support judgment. They don't replace it.
A detector is strongest when the text is long enough, the prose is relatively unedited, and your independent reading already noticed machine-like patterning. It is weakest when the sample is brief, highly revised, or context-dependent, such as a personal statement, email, discussion-board post, or short reflection.
Treat a detector as a corroborating instrument, not a verdict machine.
If you need a practical shortlist before choosing one, this guide to AI content detection tools is a good reference point. Just don't outsource the decision to the dashboard.
Try Active Verification and Provenance Checks
When the text is still in doubt after close reading and detector checks, shift from passive analysis to active verification. This is the point where many weak reviews stop too early. They stare at the prose instead of testing the authorship claim around it.
Ask the writer for process evidence
A real author usually leaves a trail.
In education, that trail might be version history in Google Docs, rough notes, source annotations, or earlier fragments saved in a learning platform. In editorial work, it might be interview notes, article outlines, source lists, or a draft with tracked revisions. In hiring, it could be a request to discuss the cover letter and elaborate on a passage in real time.
Ask for artifacts that emerge naturally from honest work:
- Earlier drafts with visible changes
- Research notes that show how claims were assembled
- Source links or screenshots used during writing
- An outline that predates the polished version
- A live explanation of why specific choices were made
People who wrote the piece usually remember the hard parts. They can explain why paragraph three changed direction, why a source was excluded, or why one phrase was chosen over another. Someone who pasted generated text and lightly edited it often can't.
Use targeted follow-up questions
General questions are easy to bluff through. Specific ones aren't.
Instead of asking, “Did you write this yourself?”, ask things like:
- What source led you to this claim?
- Why did you frame the second section as a comparison instead of an example?
- Can you rewrite this paragraph now in a more casual tone without changing the meaning?
- What did you cut from the original draft, and why?
- What part took the longest to get right?
These questions don't accuse. They verify authorship through process memory.
If the writer can defend the choices, trace the sources, and revise on command, that counts in their favor even if the text initially looked suspicious.
Probe the suspicious lines directly
Sometimes one phrase does most of the damage. It sounds generic, inflated, or detached from the surrounding piece.
Pull that line out and test it. Ask the writer to unpack it in plain language. If they can't explain what they meant, or if they paraphrase it back into the same vague terms, the issue may not be AI alone. It may be weak understanding.
That distinction matters. Your job is not only to catch generated text. It's to identify writing that lacks accountable thought.
Interpret Your Findings and Make a Decision
By this point, you should have several kinds of evidence, not just one. The final step is synthesis.
A useful decision is based on convergence. One clue means little. Several independent clues pointing the same way create a defensible conclusion. If the draft sounds generic, shows low variation, triggers detector concern, and comes with no revision trail, the case is stronger than any one signal alone.

A practical verdict framework
Use categories, not absolutes.
| Assessment | What it usually looks like | Sensible next step |
|---|---|---|
| Likely human | Distinct voice, specific detail, credible process evidence, explainable revisions | Approve or continue normal review |
| Likely AI-generated or AI-heavy | Cluster of heuristic flags, machine-like patterning, suspicious tool results, weak provenance | Request clarification, revision, or policy review |
| Inconclusive or hybrid | Mixed signals, partial revision history, edited machine-like passages | Ask for disclosure, further revision, or a live writing sample |
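The table can be sketched as a small decision function. The three categories and next steps are taken from the table. The inputs and thresholds are assumptions added for illustration: four yes/no evidence types, "three or more concerns" for the AI-heavy category, and "zero concerns" for likely human. A real policy would weigh provenance and context more carefully than a count does.

```python
from enum import Enum

class Assessment(Enum):
    LIKELY_HUMAN = "Likely human"
    LIKELY_AI = "Likely AI-generated or AI-heavy"
    INCONCLUSIVE = "Inconclusive or hybrid"

# Next steps copied from the table above.
NEXT_STEP = {
    Assessment.LIKELY_HUMAN: "Approve or continue normal review",
    Assessment.LIKELY_AI: "Request clarification, revision, or policy review",
    Assessment.INCONCLUSIVE: "Ask for disclosure, further revision, or a live writing sample",
}

def assess(heuristic_flags: bool, machine_like_patterning: bool,
           detector_concern: bool, weak_provenance: bool) -> Assessment:
    """Map four independent evidence types to one of the three categories."""
    concerns = sum([heuristic_flags, machine_like_patterning,
                    detector_concern, weak_provenance])
    if concerns >= 3:  # assumption: convergence means at least three of four
        return Assessment.LIKELY_AI
    if concerns == 0:
        return Assessment.LIKELY_HUMAN
    return Assessment.INCONCLUSIVE
```

Anything between zero and three concerns lands in the inconclusive bucket on purpose, since that is where most hybrid drafts belong.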
Match the action to the context
An educator may ask the student to explain the paper orally or produce supporting notes. An editor may return the piece and ask for sharper sourcing, firsthand examples, and a rewrite in the writer's own voice. A hiring manager may move the candidate to a live exercise rather than discarding them immediately.
That restraint matters. Plenty of people use AI as an assistant. The sharper question is whether they can stand behind the content. If they can verify facts, explain reasoning, and revise with competence, the response may differ from a case where the text appears copied, unsupported, and misrepresented.
The standard to keep
You rarely need certainty. You need a judgment you can explain.
The most defensible conclusion is usually not “This was definitely written by ChatGPT.” It's “Taken together, the language patterns, tool results, and lack of provenance make this text unlikely to be fully human-authored.”
That is how experienced reviewers work. They don't chase a magic bullet. They build a record.