New Project to Increase Technological Access to Speakers of Smaller Languages

21.12.2021

In the project ‘Multilingual Modelling for Resource-Poor Languages’, funded by a three-year DKK 5 million Carlsberg Foundation Young Researcher Fellowship, Associate Professor Johannes Bjerva of the Department of Computer Science and his team will address the challenge of increasing technological access for billions of speakers of resource-poor languages.

Language is the key to accessing modern technology such as online search, spelling correction, and automatic translation. However, of the more than 7,000 languages in the world, only a handful of so-called resource-rich languages have access to such technology.

In the project ‘Multilingual Modelling for Resource-Poor Languages’, funded by a three-year DKK 5 million Carlsberg Foundation Young Researcher Fellowship, Associate Professor Johannes Bjerva of the Department of Computer Science and his team will address this issue in order to increase technological access for billions of speakers of resource-poor languages.

HUGE INEQUALITIES

There are currently huge inequalities between the languages of the world when it comes to technological access. According to Johannes Bjerva, this is primarily because research and development efforts have typically focussed on resource-rich languages, and the resulting approaches have then been applied directly to smaller languages.

- Current algorithms rely on huge amounts of data, which only resource-rich languages, such as English, Russian, or Chinese, are able to provide. In theory, you could also choose to collect the necessary data for the smaller languages, but that would be a gigantic and extremely expensive task, and simply not feasible, says Johannes Bjerva.

To make up for the lack of data, a fundamentally different approach is therefore required when developing solutions for resource-poor languages. 

- We know from linguistic typology that some languages are similar in structure, and consistent in their differences. We can benefit from this knowledge by making use of data from a resource-rich language to create models that we can apply to resource-poor languages in an automated process, Johannes Bjerva explains.

SOCIETAL IMPACT

With its focus on providing technological access to speakers of resource-poor and therefore marginalized languages, the societal impact of the project is expected to be substantial.

- The uniqueness of this project lies in the use of knowledge from linguistic typology to design methods that will be applicable to smaller languages. We will dig deeper into the nature of linguistic structure and explore how we can make use of this in societally important technology, says Johannes Bjerva.

Access to information and communications technology is one of the UN Sustainable Development Goals (9.c), but Johannes Bjerva points out that the aim of the project goes even beyond that:

- Simply providing access to information and communications technology will not have the intended impact for speakers of resource-poor languages, as physical access to technology does not equate to access to modern technologies at a broader level. We need to include the language aspect in order to provide this access.

SCIENTIFIC IMPACT

According to Johannes Bjerva, the findings from the project will not only lead to fundamental changes in how multilingual NLP (Natural Language Processing) is approached, but may also have a fundamental impact on other scientific fields.

- There is obviously a potential impact on linguistic typology, but I also expect that the impact on machine learning will be substantial. My guess is that between one third and one half of the research output will be either developments of machine learning algorithms or more insight into the algorithms that we already have, he says, and continues:

- Machine learning methods will actually be at the very centre of the project. The methodological framework that we will lean on consists of state-of-the-art methods within machine learning.

INTERDISCIPLINARY TEAM

An important first task in the project is to put together the research team.

- The key to achieving the goals of this ambitious project lies in the composition of the team, and its focus on interdisciplinary collaboration. There will be a postdoc in the team with a profile similar to my own, i.e., a mix of linguistics and machine learning, and three PhD students including at least one with a background in computational linguistics, says Johannes Bjerva.

The scope of the Carlsberg Foundation Young Researcher Fellowship that Johannes Bjerva has been awarded is described in fairly broad terms under the headline of multilingual NLP. This means that Johannes Bjerva and his team will have extensive freedom as to the research directions that they will pursue.

- Being awarded this fellowship is obviously an important step in my scientific career, not least because the scope of the fellowship offers the freedom to pursue the research directions that I am most enthusiastic about. This freedom will also be very attractive to the researchers who will be part of the team, he says.

Johannes Bjerva and his colleague from the Department of Computer Science, Associate Professor Niels van Berkel, are the first researchers from Aalborg University ever to be awarded a Carlsberg Foundation Young Researcher Fellowship.

CONTACT

Johannes Bjerva
Associate Professor
Department of Computer Science,
Aalborg University
Mail: jbjerva@cs.aau.dk
Web: https://bjerva.github.io/