Common large language models (LLMs) can be adapted to new requirements through fine-tuning. However, according to Aleph Alpha, this often delivers “unsatisfactory results when adapted to new languages or highly specialized industry knowledge”. The Heidelberg-based start-up has developed a new AI architecture intended to change this, and is cooperating with AMD, SiloAI and Schwarz Digits.
During training, LLMs learn statistical patterns from a tokenized version of their training texts: the texts are broken down into tokens, their structure is analyzed, and probabilities over token sequences are ultimately derived. Once training is complete, the resulting LLMs can only be adapted further through fine-tuning, which builds on top of the existing model. Problems arise when the fine-tuning text differs greatly from the text the LLM was originally trained on: then, as Aleph Alpha writes, “it cannot be tokenized efficiently”.
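The tokenization problem can be illustrated with a toy greedy longest-match segmenter (a simplified stand-in for BPE-style tokenizers; the vocabulary below is hypothetical). A word covered by the learned vocabulary splits into a few subword tokens, while a word from an unseen language falls back to single characters, inflating sequence length:

```python
# Toy greedy longest-match tokenizer. The tiny vocabulary below is a
# hypothetical stand-in for one learned mostly from English text.
VOCAB = {"token", "izer", "learn", "ing", "model"}

def tokenize(word: str, vocab: set) -> list:
    """Segment a word greedily into the longest vocabulary pieces;
    unknown material degrades to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single chars always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

print(tokenize("tokenizer", VOCAB))  # ['token', 'izer'] -> 2 tokens
print(tokenize("tekoäly", VOCAB))    # 7 single-character tokens
```

An in-vocabulary word costs two tokens here, while the Finnish word “tekoäly” (artificial intelligence) costs one token per character. Real BPE tokenizers behave analogously on out-of-domain text, which is the inefficiency Aleph Alpha describes.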
A new tokenizer-free architecture is intended to change this. It is organized hierarchically and combines processing at the character and word levels. The published paper states: “It uses a lightweight character-level encoder to convert character sequences into word embeddings, which are then processed by a word-level backbone model and decoded back into characters via a compact character-level decoder.”
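The data flow the paper describes can be sketched at shape level. This is only an illustration of the encoder → backbone → decoder pipeline: the dimensions are hypothetical, random matrices stand in for learned parameters, and the backbone is an identity placeholder rather than actual transformer layers:

```python
import numpy as np

# Hypothetical dimensions, not taken from the paper
CHAR_DIM, WORD_DIM, N_CHARS = 32, 128, 256
rng = np.random.default_rng(0)

def char_encoder(char_ids):
    """Lightweight character-level encoder: char sequence -> one word embedding.
    Mean pooling stands in for the real learned encoder."""
    char_embs = rng.standard_normal((len(char_ids), CHAR_DIM))
    w_up = rng.standard_normal((CHAR_DIM, WORD_DIM))
    return char_embs.mean(axis=0) @ w_up          # shape (WORD_DIM,)

def word_backbone(word_embs):
    """Word-level backbone: identity placeholder for transformer layers
    that would mix information across word positions."""
    return word_embs                               # shape (n_words, WORD_DIM)

def char_decoder(word_state, n_out):
    """Compact character-level decoder: word state -> per-character logits."""
    w_down = rng.standard_normal((WORD_DIM, N_CHARS))
    return np.tile(word_state @ w_down, (n_out, 1))  # shape (n_out, N_CHARS)

words = [[104, 105], [119, 111, 114, 108, 100]]    # char ids for two words
embs = np.stack([char_encoder(w) for w in words])  # (2, WORD_DIM)
states = word_backbone(embs)                       # (2, WORD_DIM)
logits = char_decoder(states[0], n_out=3)          # (3, N_CHARS)
```

The point of the hierarchy is that the expensive backbone sees one embedding per word instead of one per character or subword token, while the cheap encoder and decoder handle the character level at the edges.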
According to Aleph Alpha, this makes it possible to create “sovereign models for different alphabets, less common languages and highly specific industry knowledge”. Aleph Alpha calls this a breakthrough: successful fine-tuning previously required a great deal of data, while the new architecture is significantly more efficient, saving computing power and thus resources. For many languages, there is simply not enough data available to achieve good results with the previous approach.
AMD, SiloAI and the Schwarz Group join in
Aleph Alpha is also cooperating with AMD and SiloAI. The Finnish start-up was acquired by AMD in the summer. According to the press release, “this new, innovative AI model architecture enables a 70 percent reduction in training costs and carbon footprint compared to alternative options for Finnish, for example.” AMD also believes that the collaboration will strengthen the European AI ecosystem.
[Figure: Comparative values for training effectiveness. (Image: Aleph Alpha)]
The offering is initially aimed at European public authorities, which Aleph Alpha has been targeting as customers for some time. Its AI operating system for authorities is called Pharia. The initiative is also supported by the data centers of Stackit, the cloud offering from Schwarz Digits, the IT and digital division of the Schwarz Group (Lidl, Kaufland).
(emw)
This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.