This thread reminded me of this:
https://bitcointalksearch.org/topic/m.52047825

If anybody wants to have a bit of fun, there is a tool created by the MIT-IBM Watson AI Lab and HarvardNLP called GLTR (Giant Language model Test Room) that aims to detect whether a text is AI-generated or human-written. See:
http://gltr.io/dist/index.html

It provides some sample texts, but you can also paste in your own. In brief, it checks each word against a language model's prediction based on the words that precede it: the more predictable the words, the more likely the text is AI-generated (shown as green and yellow in the tool's output). Further details here:
http://gltr.io/

AI to trap AI…
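For the curious, here is roughly what that per-word ranking looks like in code. This is just a minimal sketch of the idea, assuming the Hugging Face transformers and torch packages and the small GPT-2 model (GLTR's actual backend is GPT-2 117M); the colour thresholds mirror the bands the tool displays, but everything else is my own illustration, not GLTR's code.

```python
# Minimal sketch of GLTR's per-word ranking (not GLTR's own code),
# assuming the Hugging Face "transformers" and "torch" packages.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text):
    """For each token, rank it among the model's predictions given
    everything before it (rank 0 = the single most expected token)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    ranks = []
    for pos in range(1, ids.shape[1]):
        # predictions for position `pos` come from the previous position
        order = torch.argsort(logits[0, pos - 1], descending=True)
        rank = (order == ids[0, pos]).nonzero().item()
        ranks.append((tokenizer.decode(ids[0, pos].item()), rank))
    return ranks

def bucket(rank):
    # GLTR's colour bands: top 10 green, top 100 yellow,
    # top 1000 red, everything else purple
    if rank < 10: return "green"
    if rank < 100: return "yellow"
    if rank < 1000: return "red"
    return "purple"

for tok, rank in token_ranks("The cat was flabbergasted by the noise."):
    print(f"{tok!r:>18} rank={rank:>5} {bucket(rank)}")
```

Mostly green output means the text stayed inside the model's comfort zone, which is GLTR's tell for machine-generated prose.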
So I wondered what GLTR would make of the article created with OpenAI GPT-3 (from 'OpenAI' to 'human operator'). I haven't seen any updates on the GLTR website, so I don't know whether the underlying software has been upgraded since 2019, but anyway. Alas, it doesn't seem to be working properly, and I only managed to run a small fragment through it: feeding it the whole text made the program stall (or I lost patience after waiting over 30 minutes, despite multiple attempts).
I also tried out another site that detects fake news:
https://grover.allenai.org/detect

When I fed it the complete AI-written text, it returned "We are quite sure this was written by a machine", whereas when I fed it the author's full explanation, it came back with "We are quite sure this was written by a human". That looked promising, until...
When I did the same one paragraph at a time, the first paragraph of the AI-written text was judged quite surely written by a human, while all the rest were deemed machine-written (one only "likely", not certain).
Of course, I lost faith in the site when I ran the OP through it and got "We are quite sure this was written by a machine" (I'm pretty sure @fillippone's hairy wrist is quite human). The same goes for multiple other posts in the thread.
As AI text generation progresses, so does the software to detect it. Of course, integrating this into a forum and having to cope with truckloads of false positives would defeat the purpose. I recall the GLTR approach being based on predicting the next word in a phrase and checking whether the actual next word is among the most probable predictions: the further from the common predictions, the more "human" the text (e.g. "the cat was" + "angry" is far more expected than "the cat was" + "flabbergasted"). That kind of algorithm may work to a degree on text written by native speakers, but people with limited language skills and vocabulary would probably show up as "AI" posters rather than as non-English, limited-vocabulary posters (a crude sketch of that failure mode follows below). Anyhow, it will be interesting to see how this evolves (in general terms).
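To make the false-positive worry concrete, here is a toy detector built on the token_ranks() helper sketched above. The "top-10 share" score and the 0.6 threshold are arbitrary choices of mine for illustration, not anything GLTR or Grover actually uses.

```python
# Toy detector on top of the token_ranks() helper sketched earlier.
# The top-10 share and the 0.6 threshold are arbitrary illustrations,
# not values used by GLTR or Grover.
def looks_generated(text, threshold=0.6):
    ranks = [rank for _, rank in token_ranks(text)]
    top10_share = sum(rank < 10 for rank in ranks) / len(ranks)
    return top10_share, top10_share > threshold

# The failure mode from above: a writer sticking to very common words
# (e.g. a non-native speaker) also produces mostly top-10 tokens, so a
# predictability score alone can mislabel them as a machine.
share, flagged = looks_generated("The cat was angry. The dog was angry too.")
print(f"top-10 share {share:.2f} -> {'machine?' if flagged else 'human?'}")
```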