Success stories
Kimu, a new Basque chatbot that can be installed on in-house servers
Overview
We have developed Kimu, a Basque chatbot designed to help companies and organizations in their daily work. The model is lightweight, so it can be installed on an organization's own servers and computers, which preserves data privacy and confidentiality. It understands and carries out a variety of tasks that the user requests in natural-language Basque. Depending on an organization's needs, the model can also be adapted to specific use cases to further improve the quality of the results. Moreover, although it was created for Basque, Kimu also performs well in several other languages, such as Spanish, English and Italian.
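To make "lightweight" more concrete, the following is a rough back-of-the-envelope sketch of how much memory a model's weights occupy at different numeric precisions. The 9 billion parameter count is the figure this page gives for Kimu; the precisions listed and the function name `weight_memory_gb` are illustrative assumptions, not details of how Kimu is actually deployed. The estimate covers the weights only and ignores activations, the attention cache and runtime overhead.

```python
# Rough weight-memory estimate for a language model.
# Assumption: memory = parameter count x bytes per parameter (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n_params = 9e9  # parameter count stated for Kimu on this page
    for precision, size in BYTES_PER_PARAM.items():
        print(f"{precision:>9}: {weight_memory_gb(n_params, size):.1f} GB")
```

At 16-bit precision this comes to about 18 GB for the weights, which is within reach of a single server-class GPU; actual requirements depend on the serving setup.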
Challenge
One major advantage of the Kimu model is its small size: with 9 billion parameters, it falls within the category of Small Language Models (SLMs). Open-source SLMs perform competitively in high-resource languages (Spanish, English, etc.), but not in low-resource languages such as Basque, and low-resource languages lack sufficient resources to create models like this from scratch. To overcome this, we used cross-lingual transfer learning techniques to integrate Basque language skills into a small language model: we combined a foundational model that we adapted to Basque with a post-trained model that was not adapted to Basque.
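This page does not say exactly how the two models were combined. One family of techniques sometimes used for this kind of combination is weight arithmetic: the difference between the language-adapted foundational model and the original foundational model is added to the post-trained model's weights. The sketch below illustrates that idea on toy data only. It assumes all three models share the same architecture and parameter names, and the function name `add_language_delta`, the `scale` parameter and the toy values are hypothetical, not taken from the Kimu project.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def add_language_delta(
    base: Weights,
    base_adapted: Weights,
    post_trained: Weights,
    scale: float = 1.0,
) -> Weights:
    """Add the (adapted - base) weight difference to the post-trained model.

    Assumption: all three checkpoints have identical parameter names and shapes.
    """
    merged: Weights = {}
    for name, w_post in post_trained.items():
        w_base = base[name]
        w_adapted = base_adapted[name]
        merged[name] = [
            p + scale * (a - b) for p, a, b in zip(w_post, w_adapted, w_base)
        ]
    return merged


# Hypothetical toy weights, for illustration only.
base = {"layer.weight": [0.0, 1.0, 2.0]}          # original foundational model
base_adapted = {"layer.weight": [0.5, 1.0, 1.5]}  # foundational model adapted to Basque
post_trained = {"layer.weight": [0.0, 2.0, 2.0]}  # post-trained model, not adapted to Basque

merged = add_language_delta(base, base_adapted, post_trained)
print(merged)
```

In this picture, the post-trained model contributes the instruction-following behaviour and the weight difference carries what the foundational model learned about Basque; a `scale` below 1.0 would apply that difference only partially.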
Collaboration
To teach Basque to the foundational model, which does not know Basque well, we used the Zelai Haundi corpus, created by Orai. This corpus contains 500 million words and consists solely of freely licensed content.
Result
The model is capable of understanding and executing various tasks requested in Basque in natural language by the user, such as translating and summarizing, answering queries about documents, extracting information, correcting and adapting texts…