
Updates on projects to improve Welsh language technology and AI.

First published:
7 January 2025
Last updated:

Overview

Over the past few years, we have worked to improve technology so that more people can use Welsh in everyday life.

Our current priorities for Welsh language technology are:

  • improving technology to increase the daily use of Cymraeg
  • making sure everyone can access Welsh language technology
  • improving Welsh language artificial intelligence (AI) and speech and language technologies (by sharing data and other means)

The Written Statement on Welsh language technology explains what we are doing to achieve this.

We’ll publish regular updates on this page.

Updates on Welsh language technology

Some of these links will be useful to a wide range of people; others point to infrastructure and data for digital developers.

Bangor University

Welsh audio training data

Cymen was awarded an Arfor grant to produce and publish new Welsh language voice data under a permissive licence. It has transcribed and verified about 40 hours of speech.

Bangor University transcribed more audio data over the past year. There are now 52 hours of data in this sub-collection (August 2025).

As of August 2025, there’s a total of 249 hours of Welsh audio training data, including: 

Cardiff University

  • Published SENTimental, a tool for collecting annotations to create training data and to test Welsh sentiment analysis. Built with Lancaster University.

Welsh Government

Updates on Welsh placenames and mapping

The National Library of Wales

Mapio Cymru

Published guidance on how to add Welsh language street names to the OpenStreetMap Welsh map.