Chatbots vs. Watermarkers: an ethical and legal battle

“Chatbots vs. Watermarkers: an ethical and legal battle” by Niccolò Lasorsa Borgomaneri, Studio Legale Marsaglia, Milan

The current situation

Chatbots have become increasingly popular in recent years thanks to their ability to automate conversations and improve the efficiency of processes. Indeed, one only has to type ‘use chatbots’ into Google to realise how many companies promote and offer them as a way of improving company productivity.

While until a few months ago hardly anyone knew the real meaning of the word ‘chatbot’, today the vast majority of people have not only learnt its meaning but have also tried the technology for themselves.

The most widely used chatbot is the one created by the start-up OpenAI, ChatGPT (1), which already has more than a million users. Yet there is now a very long list of chatbots, or projects to create them, from the giants of the technology industry. Google published a document in January describing a similar model capable of creating new music from a textual description of a song, and is working on a ChatGPT rival called ‘Apprentice Bard’; Baidu, the Chinese search giant, intends to incorporate a chatbot into its search engine in March; Replika, a chatbot that presents itself with the slogan ‘the companion that cares about you’, has rightly been criticised by the Italian Data Protection Authority (2). The examples are endless.

The situation has been referred to as the ‘chatbot war’ in Silicon Valley as the increasing proliferation of artificial intelligence (AI) tools specifically designed to generate human-like texts has left many perplexed.

Indeed, the use of chatbots has also raised some ethical and legal issues, particularly in relation to their use by students and to intellectual property.

Here we focus on the problems raised by the use of chatbots in academic circles and on the simultaneous emergence of tools for detecting whether a text (or other ‘material’) has been created by a chatbot.

Since ChatGPT was launched in November, not only have students started to cheat by using it to write their essays but, to give just one of several examples, the news site CNET (https://www.cnet.com/) has used ChatGPT to write articles, only to have to issue corrections after accusations of plagiarism.

Many students have used chatbots as a study aid, particularly for learning difficult concepts or solving complex tasks. However, this raises the question of the ethics of their use. If students use chatbots to pass their assignments, can the results be considered their own work? Or do the students who use chatbots simply gain an unfair advantage over their classmates?

Teachers, in particular, are trying to adapt to the availability of software that can produce a moderately acceptable essay on any subject in no time. Should we go back to pen-and-paper assessments? Increase the supervision of exams? Ban the use of artificial intelligence altogether?

This question is particularly relevant in academia, where students are assessed on the basis of their knowledge and skills. The use of chatbots may pose a threat to academic integrity, as students may fail to demonstrate their true level of knowledge or may gain an unfair advantage over other students. Furthermore, the use of chatbots could be considered a form of plagiarism, as students use automatically generated responses to answer tasks that call for their personal knowledge.

Another problem related to the use of chatbots is intellectual property. Chatbots can be used to generate answers or solutions to problems, but who owns the intellectual property in those answers? Students using chatbots could be accused of violating the intellectual property rights of the chatbot owners. At the same time, the chatbot owners themselves are accused of infringing intellectual property rights by those whose texts, documents, photos and other material are used precisely to ‘feed’ the chatbot algorithms.

The well-known image agency Getty Images announced the commencement of its lawsuit against Stability AI in mid-January, writing: ‘This week Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI, alleging that Stability AI has infringed intellectual property rights, including copyright in content owned or represented by Getty Images. Getty Images believes Stability AI has illegally copied and processed millions of copyrighted images and related metadata owned or represented by Getty Images, in the absence of a licence, for the benefit of Stability AI’s commercial interests and to the detriment of content creators.

Getty Images believes that artificial intelligence has the potential to stimulate creative endeavours. Accordingly, Getty Images has provided licences to leading technology innovators for purposes related to the training of artificial intelligence systems in a manner that respects personal and intellectual property rights. Stability AI has not sought any such licence from Getty Images and has instead chosen, in our view, to ignore viable licensing options and long-standing legal protections in order to pursue its own independent commercial interests.’

Thus, once again, the enormous potential of artificial intelligence is emphasised, but so is the need to respect legal barriers.

And the problem is precisely this: what are these barriers?

Returning to the topic of the use of chatbots in the academic field, there are already examples where, in order to avoid these problems, some academic institutions have banned the use of chatbots by students. However, this is not always feasible or effective, as chatbots have become increasingly sophisticated and difficult to detect.

This is for the simple reason that, with the advent of artificial intelligence, both text detectors and text generators are becoming increasingly sophisticated, and this could have a significant impact on the effectiveness of the various methods and tools proposed for recognising AI-generated text.

Moreover, if we are to be completely honest, chatbots can also be used by teachers and researchers to automate their research and teaching processes.

The paradox is that the world’s leading artificial intelligence companies cannot reliably distinguish the products of their own machines from the work of humans. The reason is very simple: the main goal of AI companies is to train their ‘natural language processing’ (NLP) models to produce results as similar as possible to human writing. The public demand for an easy means of detecting such AI text in fact runs counter to their own efforts in the opposite direction.

The different technical remedies

Watermarks

In this context, the use of watermarks can be an effective solution for managing the use of chatbots. To simplify, watermarks are digital markers that are embedded in an image or document to identify the owner or author of the work. In this case, watermarks can be used to identify answers or solutions generated by chatbots.

In the case of text, these ‘watermarks’ are imperceptible to the human reader, but allow computers to detect that the text probably comes from an artificial intelligence system. If incorporated into large language models, they could help prevent some of the problems these models have already caused.

Watermarking is a security technique used to protect intellectual property, particularly digital documents, from unauthorised use and counterfeiting. This technique involves inserting an image, text or other type of watermark within the document, making it unique and easily traceable (3).

In some studies, these watermarks have already been used to identify artificial intelligence-generated text with near certainty. Researchers at the University of Maryland, for instance, were able to identify text created by Meta’s open-source language model, OPT-6.7B, using a detection algorithm they built.
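To give a concrete idea of how such a scheme can work, the simplified sketch below follows the general principle described in that research: each word pseudo-randomly splits the vocabulary into a ‘green’ and a ‘red’ list, the generator is nudged to favour green words, and a detector that knows the rule counts how many green words actually appear and measures how far that count departs from chance. The hashing, the 50/50 split and the word-level treatment are illustrative simplifications, not the researchers’ actual code.

```python
import hashlib
import math

def green_list(prev_word: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Pseudo-randomly split the vocabulary, seeded by the previous word.
    A generator that knows this rule can softly favour 'green' words;
    a detector that knows it can count them afterwards."""
    greens = set()
    for word in vocab:
        digest = hashlib.sha256(f"{prev_word}|{word}".encode()).hexdigest()
        if int(digest, 16) % 1000 < fraction * 1000:
            greens.add(word)
    return greens

def watermark_z_score(words: list[str], vocab: list[str], fraction: float = 0.5) -> float:
    """Count how many words fall in the green list implied by their predecessor
    and compare the count with chance alone (a one-proportion z-test)."""
    hits = sum(
        1 for prev, cur in zip(words, words[1:])
        if cur in green_list(prev, vocab, fraction)
    )
    n = len(words) - 1
    return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))

# A strongly positive z-score suggests a generator biased towards the green lists;
# ordinary human text should hover around zero.
```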

The remark by John Kirchenbauer, one of the University of Maryland researchers who worked on watermarking, that ‘It’s the Wild West right now’, perfectly sums up the current situation.

Classifiers

Under this heading we include the tools whereby programmers ‘teach’ a computer with data already labelled by humans, i.e. train it to classify (in our case) whether the use of certain words instead of others, or certain combinations of words, points to text produced by a chatbot.
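Purely by way of illustration, and assuming a small set of texts already labelled by humans, a minimal version of such a classifier might look like the sketch below; the two example texts are invented placeholders, and real detectors are trained on far larger corpora and far richer features.

```python
# Minimal supervised 'AI-text' classifier: learn from human-labelled examples,
# then predict the label of unseen text. Placeholder data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In conclusion, it is important to note that chatbots offer many benefits.",  # hypothetical AI-written
    "honestly i just wrote the essay the night before and hoped for the best",    # hypothetical human-written
]
labels = ["ai", "human"]  # labels assigned by human reviewers

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(texts, labels)

print(classifier.predict(["It is worth noting that chatbots raise ethical issues."]))
```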

OpenAI itself, the creator of ChatGPT, presented a ‘classifier for indicating AI-written text’ (4) in January, admitting, however, that it correctly identifies only about 26% of the AI-written text it analyses.

Another classifier that appears to be more effective is GPTZero, created by Princeton student Edward Tian, who released its first version in January.

This application attributes authorship to artificial intelligence on the basis of two factors: the degree of complexity of a text (its ‘perplexity’) and the variability of the sentences used (its ‘burstiness’).

To show how the programme works, Tian posted two videos on Twitter (5) comparing the analysis of a New Yorker article and a letter written by ChatGPT. In both cases, the app was able to correctly identify their human and artificial origin.
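The two signals can be approximated quite simply. The sketch below uses GPT-2 merely as a convenient stand-in language model to estimate a text’s ‘perplexity’ (how predictable it is to the model) and its ‘burstiness’ (how much that predictability varies from sentence to sentence); it illustrates the idea only and is not GPTZero’s actual implementation.

```python
# Illustrative perplexity / burstiness scores using GPT-2 via Hugging Face transformers.
import math
import statistics

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: low values mean 'unsurprising' text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return math.exp(loss.item())

def burstiness(sentences: list[str]) -> float:
    """Spread of per-sentence perplexities; human writing tends to vary more."""
    return statistics.pstdev(perplexity(s) for s in sentences)

sentences = ["Chatbots have become increasingly popular.",
             "My cat, however, remains entirely unimpressed by them."]
print(perplexity(" ".join(sentences)), burstiness(sentences))
```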

The current ‘trick’ to defeat classifiers is to replace certain words with synonyms. Websites offering tools that paraphrase AI-generated text for this purpose are already popping up all over the world.

When text is run through such sites (e.g. https://www.gptminus1.com/), even Tian’s classifier fares no better than the modest percentages seen above.
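Paraphrasing tools differ in sophistication, but even a crude synonym-swapping routine shows why the trick can work: by altering the exact word choices it disturbs the statistical fingerprints (word frequencies, perplexity) on which many detectors rely. The WordNet-based sketch below is only an illustration and says nothing about how any particular site works internally.

```python
# Crude illustration of the synonym-swapping trick: randomly replace some words
# with a WordNet synonym. Real paraphrasers are far more sophisticated.
import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def swap_synonyms(text: str, probability: float = 0.3) -> str:
    out = []
    for word in text.split():
        synonyms = {
            lemma.name().replace("_", " ")
            for synset in wordnet.synsets(word)
            for lemma in synset.lemmas()
            if lemma.name().lower() != word.lower()
        }
        if synonyms and random.random() < probability:
            out.append(random.choice(sorted(synonyms)))
        else:
            out.append(word)
    return " ".join(out)

print(swap_synonyms("Chatbots have raised significant ethical and legal questions"))
```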

AI-generated text detectors will become more and more sophisticated. The anti-plagiarism service TurnItIn (https://www.turnitin.com/it) recently announced the arrival of an AI writing detector with a claimed accuracy of 97%.

However, text generators are also becoming increasingly sophisticated as we pointed out above.

It is the classic arms race: two contenders constantly overtaking each other, with no finish line and, for the moment, no winner in sight.

Conclusion

We believe that, as has often happened in the digital sector, we will reach a point where custom and practice lead the legislator to regulate this sector, too, in a harmonious manner.

Of course, when, in addition to the law, ethical issues are involved, it is difficult to get everyone to agree, so we are sure that, at least for the academic field, it will not be possible to please everyone.

With regard to the intellectual property issues, on the other hand, we are confident that practice will lead to constantly evolving case law that draws on both the law and the relevant technology.

The above could make it even more difficult for companies to protect their intellectual property, as counterfeiters could use advanced text generators to create documents that look authentic but are in fact counterfeit. This could require a change of approach to IP protection, with the adoption of more advanced security techniques, such as cryptography or blockchain.

In summary, the advancement of artificial intelligence could have a significant impact on the security of digital documents and the protection of intellectual property. It is therefore important for companies to take advanced security measures to protect their digital documents and intellectual property.

 

  1. So much so that, for about a month now, https://chat.openai.com has frequently been too crowded with users to be accessible; when an attempt to connect fails, you are asked to leave your email address so as to be notified when you will be able to use the chatbot.
  2. Guido Scorza at https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/9851525
  3. https://www.agendadigitale.eu/cultura-digitale/watermarking-chatbot-diritto-autore/
  4. See https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text
  5. https://twitter.com/edward_the6/status/1610067688449007618?s=20&t=jg6P95D41soM9gNS1J43gg

 

Milan, 5 April 2023

©Niccolò Lasorsa Borgomaneri