To evade AI detectors, we gave Wraith Scribe's AI a "personality." With this personality, though, it can say things that are grammatically incorrect, misspell words, or use profanity excessively.
To fix this, we have introduced a custom AI model that takes the raw AI text and edits it to be grammatically correct (with far fewer profanities).
The secondary problem is that Grammarly is pulling the plug on allowing developers to integrate its widget. As a result, Wraith Scribe will no longer support Grammarly in our text editor in 2024. So it's even more important that Wraith Scribe's text comes out nearly grammatically perfect, both to reduce how much editing the text needs and to make the text much easier to read.
Before the new AI model was implemented, the baseline AI model produced (across 10 articles with 20317 words and 182 errors):
- An average of 111 words per 1 grammatical error.
- Standard deviation (per article) of 51 words per error.
Grammatical errors are counted as a subset of Grammarly's error count. A lot of Grammarly's suggestions are false positives, so an error is only counted here when it is an actual grammatical error. For example, Grammarly may suggest changing a colloquial phrase like "He can't tell north from south" to "He can't tell north from the south." The latter is wrong because it changes the meaning of the sentence. The former insults the subject; the latter suggests the subject is not able to differentiate the directional north from the southern belt of the United States.
To further illustrate how I counted errors: Grammarly may say a 2000-word article has 30 errors, 10 of which are false positives. In this case, I would manually go through each of those errors to see whether it is a real grammatical error, and end up waiving the 10 false positives. I'd end up with a 2000-word article with 20 errors, or 1 error per 100 words.
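The counting rule above boils down to a small piece of arithmetic. Here is a minimal sketch of it, assuming the false positives have already been identified by hand; the function name `words_per_error` and its parameters are made up for illustration and are not part of Wraith Scribe or Grammarly.

```python
import math


def words_per_error(word_count: int, flagged: int, false_positives: int) -> float:
    """Words per real grammatical error, after waiving false positives.

    Hypothetical helper: `flagged` is the number of issues Grammarly reports,
    `false_positives` is how many of those were rejected on manual review.
    """
    real_errors = flagged - false_positives
    if real_errors < 0:
        raise ValueError("false_positives cannot exceed flagged")
    if real_errors == 0:
        # No real errors at all: the ratio is unbounded.
        return math.inf
    return word_count / real_errors


# The example from the text: 2000 words, 30 flagged, 10 waived.
print(words_per_error(2000, 30, 10))  # 100.0
```

A higher number is better here, since it means more words go by between real mistakes.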
After about a week of painstaking work building samples and iterating through models, I am happy to report the final AI model produced (across 10 new articles with 16503 words and 94 errors):
- An average of 175 words per 1 grammatical error, an improvement of 57%.
- Standard deviation of 58 words per error (a bit higher than the baseline).
- About 1 standard deviation better than baseline.
Of the 10 articles generated, there were 2 underperforming outliers; the other 8 articles all had a staggering 200-250 words per 1 grammatical error. Excluding those outliers, we get (across 8 articles with 13729 words and 63 errors):
- An average of 217 words per 1 grammatical error, an improvement of about 95%.
- Standard deviation of 21 words per error.
- About 2 standard deviations better than baseline.
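For anyone who wants to check the numbers, here is a rough sketch that recomputes the comparisons from the totals quoted above. It assumes the averages are pooled ratios (total words divided by total errors), which lands on the same figures once the decimals are dropped; the function names are hypothetical, and the baseline standard deviation of 51 is taken from the baseline results.

```python
BASELINE_STD = 51  # words per error, per article, from the baseline results


def words_per_error(words: int, errors: int) -> float:
    """Pooled rate: total words divided by total real errors."""
    return words / errors


def compare(candidate: float, baseline: float, baseline_std: float):
    """Return (improvement in percent, gap in baseline standard deviations)."""
    gap = candidate - baseline
    return gap / baseline * 100, gap / baseline_std


baseline = words_per_error(20317, 182)
final_all = words_per_error(16503, 94)    # all 10 new articles
final_best8 = words_per_error(13729, 63)  # 2 outliers excluded

for label, rate in [("all 10", final_all), ("best 8", final_best8)]:
    pct, stds = compare(rate, baseline, BASELINE_STD)
    print(f"{label}: {rate:.1f} words/error, +{pct:.1f}%, {stds:.2f} std")
# all 10: 175.6 words/error, +57.3%, 1.25 std
# best 8: 217.9 words/error, +95.2%, 2.08 std
```

The "standard deviations better" figures measure the gap against the baseline's spread of 51, not the final model's own spread.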
Making a 1-shot model that is grammatically flawless every single time is not feasible. If you take a look at autonomous AI driving or even Grammarly itself, both are far from ideal. AI driving works most of the time, but not all of the time. Grammarly requires the user to supervise its suggestions because it isn't confident that all of its grammar and spelling suggestions are good. I'm one person, and they have a whole team of AI engineers.
That said, I do plan to make this better in the future. A 2001-word article with 9 small grammar mistakes isn't the end of the world, and it is much better than I could possibly do in a 1st (or even 3rd) draft. But making a 2001-word article have only 2 small mistakes is even better. The tradeoff here is much more compute time, and that cost is inevitably passed on to the customer. So as a first draft, I want to conservatively reduce grammatical errors as much as possible with the least amount of compute, so I can keep costs low for me and the users.
Q & A
Question: Why couldn't I also get rid of outliers in the baseline results?
Answer: The baseline results across the 10 articles vary wildly, from 56 to 201 words per 1 grammatical error, and are distributed fairly evenly (4 between 56-69, 4 between 108-141, and the last 2 at 182 and 201). If I got rid of the outliers, the baseline model would actually look worse, which means the final model would look even better. In contrast, the final model has a tight grouping across 8 articles, and only 2 articles are under the 200 words per mistake mark.
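To make that concrete, here is a small sketch of what dropping the two highest baseline articles does to the average. The per-article list is a placeholder: only 56, 69, 108, 141, 182, and 201 come from the ranges above, and the four in-between values are invented to fill the buckets. It also assumes the "outliers" in question are the two best-scoring articles.

```python
from statistics import mean

# PLACEHOLDER per-article baseline rates (words per error).
# Bucket endpoints are from the text; 60, 65, 120, 130 are invented fillers.
baseline_rates = [56, 60, 65, 69, 108, 120, 130, 141, 182, 201]


def drop_top(values, k):
    """Return the values with the k largest entries removed."""
    return sorted(values)[: len(values) - k]


full_mean = mean(baseline_rates)
trimmed_mean = mean(drop_top(baseline_rates, 2))

# With any values in these buckets, removing the top two lowers the mean,
# so a trimmed baseline would only make the final model look better.
print(trimmed_mean < full_mean)  # True
```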
Question: Why isn't this perfect?
Answer: Perfect grammar from a 1-shot model is not a realistic expectation. Grammarly has a large team of AI engineers, and they still rely on human beings to judge whether their suggestions are good (and a lot of them aren't). As a rule of thumb, AI can help improve things and reduce the incidence of errors, but eliminating them 100% is not feasible due to the probabilistic nature of AI.
Question: Will you make this even better in the future?
Answer: Yes, but there's a tradeoff between making this better and charging users more, as well as increasing the loading time to publish an article. There's probably some optimal point between low-frequency grammar mistakes and low extra costs, and I don't think I've hit it yet. In the short term, though, there are many other things for me to work on.