LANGUAGE MODELS:
How AI learns to talk

Although the automated learning behind language models such as ChatGPT has many advantages, it also poses potential dangers and challenges. ChatGPT and similar systems learn from enormous amounts of text, and their output can only be as good, correct, and complete as those sources. The texts may contain biases or distortions, or suggest misleading connections. This sometimes leads to unexpected and undesirable results, for example when the model picks up racist or sexist speech patterns present in the training data. Such models are also not immune to manipulation, for example through deliberately crafted inputs or through distortion of the training data to deceive the model.

Nikolai Zotov

Science Editor

Neural network as a basis

As a linguistic AI model, ChatGPT uses a neural network to understand and produce natural language. Strictly speaking, this only allows the language model to reproduce linguistic form; it is not capable of a genuine understanding of the deeper meaning of a subject. With this technology, the Large Language Model (LLM) learns and thereby improves its responses and results. For this purpose, ChatGPT is trained on a huge amount of digitized text in multiple languages, including books, magazines, articles, websites and social media. These texts are broken down into smaller units called tokens, typically words or fragments of words (subwords).
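To make tokenization concrete, here is a minimal sketch using the open-source tiktoken library, which implements the byte-pair-encoding tokenizers published by OpenAI. The encoding name and the example sentence are illustrative choices, not details from this article.

```python
# A minimal tokenization sketch using the tiktoken library (pip install tiktoken).
# "cl100k_base" is one of OpenAI's published BPE tokenizers; the example
# sentence is chosen purely for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Language models break text into subword tokens."
token_ids = enc.encode(text)                    # text -> list of integer token IDs
tokens = [enc.decode([t]) for t in token_ids]   # decode each ID back to its text piece

print(token_ids)  # integer IDs, one per token
print(tokens)     # the corresponding word and subword pieces
```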

Once this is done, the system uses an architecture called a transformer to detect and analyze relationships between tokens. Transformers are neural networks specifically designed to process sequences such as text. The Transformer architecture consists of stacked layers of (computational) neurons whose central component is the multi-head attention block. Each multi-head attention block contains several attention heads, which can attend to different aspects of the text in parallel and analyze them.
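As a rough sketch of what a single attention head computes, the following NumPy code implements scaled dot-product attention, the core operation inside each head. The dimensions and random inputs are arbitrary placeholders; real models additionally use learned projections per head, masking, and many stacked layers.

```python
# Simplified scaled dot-product attention, the core operation inside each
# attention head of a transformer. Inputs are random placeholders.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to every other
    weights = softmax(scores)        # attention weights, each row sums to 1
    return weights @ V               # weighted mix of the value vectors

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one updated vector per token
```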

Model optimization, partly with human assistance

During the learning process, ChatGPT goes through fine-tuning. This is an iterative process in which the model is gradually improved by training it to perform certain tasks, partly through human intervention, i.e. by specialized teams or through interaction with users. These tasks can span everything from generating text to answering questions. An example is text generation, where the model is given a specific input (for example, the beginning of a sentence) and then attempts to complete the sentence based on the preceding context.
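The completion task can be tried out with openly available models. Below is a minimal sketch using the Hugging Face transformers library, with the small GPT-2 model standing in for ChatGPT (which cannot be run locally); the prompt is an illustrative placeholder.

```python
# A minimal text-completion sketch using Hugging Face transformers
# (pip install transformers). GPT-2 stands in as a small, publicly
# available language model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Neural networks learn language by"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])  # the prompt plus the model's continuation
```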

To train the model on a particular task, such as text classification, a comparatively small, task-specific dataset, the training data, is used. With the fine-tuning technique, this data adapts the pre-trained model to the specific task. Once adapted, the model can be used to generate new text or answer questions related to that task.
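A hedged sketch of what such fine-tuning looks like in code, again using Hugging Face transformers: the model name, the two labels, and the two example sentences are made-up placeholders, and a real run would use far more data.

```python
# Illustrative fine-tuning of a small pre-trained model for text classification.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["This movie was wonderful.", "This movie was terrible."]  # placeholders
labels = [1, 0]  # 1 = positive, 0 = negative

class TinyDataset(Dataset):
    def __len__(self):
        return len(texts)
    def __getitem__(self, i):
        enc = tokenizer(texts[i], truncation=True, padding="max_length",
                        max_length=32, return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=TinyDataset(),
)
trainer.train()  # nudges the pre-trained weights toward the classification task
```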

Data augmentation, a machine learning method that artificially generates new training data by modifying or reshaping existing data, is also used to improve performance. Here, the training dataset is systematically varied to prepare the model for different scenarios. For example, words can be replaced by synonyms or antonyms to improve the model's ability to capture different meanings and contexts.
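An illustrative sketch of the synonym-replacement variant of this idea is shown below. The tiny synonym table and the sentence are made-up placeholders; practical pipelines typically draw on lexical databases such as WordNet or on paraphrasing models.

```python
# Illustrative text data augmentation via synonym replacement.
import random

SYNONYMS = {  # placeholder synonym table
    "good": ["great", "fine", "excellent"],
    "bad": ["poor", "awful", "terrible"],
    "movie": ["film", "picture"],
}

def augment(sentence, p=0.5, seed=None):
    """Return a variant of the sentence with some words swapped for synonyms."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(word)
    return " ".join(out)

print(augment("the movie was good", seed=42))
# e.g. "the film was excellent": a new training example with the same meaning
```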

Another important step in training ChatGPT is backpropagation. In this process, the model's error is calculated by comparing its output with the expected results. The error is then propagated backward through the network to determine which layers and weights contributed to it. In this way, the model's parameters can be adjusted to minimize future errors.
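The following PyTorch snippet sketches a single backpropagation step. The tiny network, random input, and target labels are placeholders; real language models apply the same loop to billions of parameters over enormous text corpora.

```python
# A minimal sketch of one backpropagation step in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 8)            # a batch of 2 example inputs
target = torch.tensor([1, 3])    # the "expected results" (class labels)

output = model(x)                # forward pass
loss = loss_fn(output, target)   # compare output with expected results

optimizer.zero_grad()
loss.backward()                  # propagate the error backward through the network
optimizer.step()                 # adjust the weights to reduce future error

print(loss.item())               # the current error as a single number
```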
