Artificial intelligence (AI) can’t do everything (or at least it can’t do everything well), but one thing generative AI tools built on large language models are very good at is producing text. If you bombed the verbal section of the SAT and writing anything longer than a text message feels terrifying, the whole experience can seem pretty magical; generating an email, essay, or cover letter without staring at a blank page for hours and fretting over every word choice is a powerful thing. That’s why it’s estimated that nearly 20% of adults in the U.S. have used AI to write emails or essays.
Once that email or essay is polished up (and fact-checked, right?), however, there’s a looming hurdle: AI detectors, which range from humans who know the “tells” of AI-generated writing to online tools that purport to scan text and identify whether it was written by a person or a machine. The accuracy of those detectors is questionable, but people use them, so they’re something to worry about if you’re planning to pass off an AI-generated cover letter or other piece of writing as your own work.
Enter the AI “humanizer,” a tool designed to take your AI copy and turn it into something, well, more human by removing and rewording common AI tics and phrasing. It’s an appealing idea: You get AI to generate your essay, you run it through the humanizer, and the end result seems like it was written from scratch by a human (presumably, you). But do they work?
The test
To find out, I conducted a little experiment. While it wasn’t exactly an exhaustive investigation, it gave me a solid sense of whether any of these tools are worth using if you insist on having AI secretly write all of your correspondence, school assignments, or heartfelt emails to old friends.
First, I had ChatGPT generate an essay on … how to make AI writing more humanized. It spun up an essay in a few seconds, and the result was perfectly coherent. I didn’t fact-check it or massage the text in any way; its sole purpose was to be run through humanizing tools.
Next, I ran the essay through a few AI detectors to make sure it was a fine example of mediocre AI writing. The results were as expected: QuillBot scored it as 94% AI, ZeroGPT scored it at 97%, and Copyleaks scored it a robust 100% AI-generated. The world of AI detectors agreed: This essay from ChatGPT reads like it was written by ChatGPT.
The results
Now, could AI humanizer tools fix that? There are a lot of humanizers out there—the explosion of AI chatbots has inspired a war between the detectors and the tools designed to fool them. So I chose a few popular ones to test out.
First, though, I wanted a bit more calibration, so I did something obvious: I fed the essay back into ChatGPT and asked it to humanize its own writing. All of these tools are AI-based, after all, so maybe the easiest thing in the world is to just ask ChatGPT to be less like itself.
Then I took the original ChatGPT-generated text and fed it through four other humanizer tools: Paraphraser.io, StealthWriter, Grammarly, and GPTHuman.
Now I had five “humanized” versions of an essay that three AI detectors had scored as pretty obviously AI. Would their scores improve? The answer is pretty much no, though one tool showed what you might generously call “promise”:
Paraphraser.io: Got murdered. QuillBot scored its version at 83% AI-generated, Copyleaks at a pretty firm 100%, and ZeroGPT at a suspiciously specific 99.94%.
ChatGPT: Bombed, although to be fair, it’s not specifically a humanizer, and perhaps a more thorough prompt would have yielded better results. Both QuillBot and Copyleaks scored it at 100% AI-gen, while ZeroGPT gave it 87.77%.
Grammarly: Also bombed pretty thoroughly, with QuillBot, Copyleaks, and ZeroGPT scoring its version 99%, 97.1%, and 99.97% respectively.
GPTHuman: This one had mixed results. QuillBot was totally fooled, scoring it 0% AI-gen, and ZeroGPT wasn’t sure of itself, scoring it just 60.96%. But Copyleaks had no doubt, slapping it with a 100% score.
StealthWriter: The most effective one tested here. While ZeroGPT was suspicious, scoring it as (again, curiously specific) 64.89% AI-gen, Copyleaks scored it at just 3%, and QuillBot was totally fooled with a 0% score.
One aspect of StealthWriter that may have helped its effectiveness was the ability to run the humanizer over the same text repeatedly. On the first pass, StealthWriter claimed its output would score as 65% human, so I ran it a second time, and the score jumped into the 80s; I ran it again, and it hit 95%. After that, the score didn’t budge no matter how many more passes I made.
All of these tools state pretty plainly that you should review the results and make your own adjustments, and I didn’t review the humanized text for quality of writing or accuracy. I just wanted to see if they would fool AI detectors, and the answer is: Probably not, but StealthWriter might help.
Finally, consider that there are a lot of AI detector tools out there, which means the variability of scores (even with StealthWriter) is a concern: You can’t always know which detector tool someone is using. If they’re using a detector I didn’t use here and it’s better at detecting what StealthWriter is doing, for example, you’ll still get nailed. If you’re worried about your AI-generated text being detected as such, your best bet remains doing the writing yourself, or at least revising AI-generated text very, very thoroughly.