OpenAI is seemingly working on a solution to digitally watermark its outputs, through a kind of "unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT" (see this blog):
<…>

Conceptually, any post using GPT to create content on Bitcointalk is plagiarizing, as the poster is not creating the content himself, and is in fact trying to pass off someone else's content as his own (albeit that someone else is an AI).
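To make the watermarking idea quoted above a bit more concrete, here is a toy sketch of how a keyed signal hidden in word choices could work in principle. To be clear, this is my own illustrative guess and not OpenAI's actual scheme: the key, the scoring function, and the stand-in "model" are all assumptions.

```python
# Toy sketch of a keyed watermark over word choices (illustrative only,
# not OpenAI's method): a secret key drives a pseudorandom score per
# (context, candidate word); generation nudges choices toward high-scoring
# words; the key holder can later test whether a text scores suspiciously
# high on average.
import hashlib
import random

SECRET_KEY = b"server-side secret"  # hypothetical key held by the provider

def prf_score(context: str, word: str) -> float:
    """Pseudorandom score in [0, 1), reproducible only with the key."""
    digest = hashlib.sha256(SECRET_KEY + context.encode() + word.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def generate(vocab: list[str], length: int) -> list[str]:
    """Toy 'model': among a few random candidates, prefer the word the PRF favors."""
    out: list[str] = []
    for _ in range(length):
        candidates = random.sample(vocab, 3)  # stand-in for the model's top tokens
        context = " ".join(out[-4:])
        out.append(max(candidates, key=lambda w: prf_score(context, w)))
    return out

def detect(text: list[str]) -> float:
    """Average PRF score: unwatermarked text hovers near 0.5, watermarked sits higher."""
    scores = [prf_score(" ".join(text[max(0, i - 4):i]), w) for i, w in enumerate(text)]
    return sum(scores) / len(scores)

vocab = ["the", "coin", "block", "chain", "market", "node", "wallet", "fee"]
watermarked = generate(vocab, 200)
plain = [random.choice(vocab) for _ in range(200)]
print(f"watermarked avg score: {detect(watermarked):.3f}")  # well above 0.5
print(f"plain avg score:       {detect(plain):.3f}")        # near 0.5
```

The point is that only whoever holds the secret key can compute the scores, so only they can later prove that a given text was nudged by the watermark.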
Sharing and Publication Policy, using the API (and I have to assume that extends to the results of their chatbot) requires one to explicitly indicate that the content was AI-generated. Though this latter point is not technically our concern, it seems like a reasonable requirement, in a similar fashion to requiring links on posts that are largely or verbatim based on other sources.
Detection of GPT usage is not going to be easy for most people, though over time people will likely pick up on patterns such as near-perfect English, consistency in its usage throughout a posting history (or abrupt alternations between human and AI style), lack of real interaction beyond an academic tone, certain types of formal constructions, and so forth. This is obviously not exclusive to GPT, nor sufficient to deem someone a GPT plagiarizer with certainty; likelihood may be the closest one can get short of a confession.
Now all this, if it becomes a widespread practice, is going to be a drag: people will be able to create boatloads of posts with zero effort and thought, and although these may match, quality-wise, a large share of the posts we already encounter, it may easily become a new spam-fest of neutral content.
I’ve read, though I couldn’t find the original source (i.e. team declarations), that ChatGPT can’t plagiarize per se (the language model generates text by assigning a probability to the next word based on the prior words it has already produced), although there is a fortuitous chance of it happening. Many of the posts we read seem to me, from a reader’s point of view, a compendium of text-spinning ideas. Though that is not what it’s really doing, the probability of the next word is derived from a model built from the training data, and that is inevitably an underlying reference to all the text it provides.
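For anyone curious what "a probability for the next word based on the prior words" means in practice, here is a deliberately tiny sketch of the idea, using a one-word context instead of GPT's long one. This is my own toy illustration, not how GPT is implemented; the training text and names are made up.

```python
# Toy next-word sampler: count, in some training text, which words follow
# each word, then generate by sampling each next word in proportion to how
# often it followed the previous one in training.
import random
from collections import Counter, defaultdict

training_text = (
    "the model predicts the next word the next word depends on the prior "
    "words the prior words come from the training data"
).split()

# Bigram table: for each word, which words follow it and how often.
followers: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    followers[prev][nxt] += 1

def generate(start: str, length: int) -> str:
    word, out = start, [start]
    for _ in range(length):
        counts = followers[word]
        if not counts:  # dead end: this word never had a successor in training
            break
        # Sample the next word weighted by its training-data frequency.
        word = random.choices(list(counts), weights=list(counts.values()))[0]
        out.append(word)
    return " ".join(out)

print(generate("the", 12))
```

GPT does the same thing at a vastly larger scale, conditioning on far more context with a neural network instead of a frequency table, which is why its output recombines the training data rather than copying any one source.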
On the other hand, even if the text is comprehensive and aligned somewhat with what the average Joe might come up with, the poster conceptually (in my book) plagiarized the output from ChatGPT, without giving credit to the source of the text’s generation and trying to pass the text off as his own.