Socially responsible representation of accents in voice data: Considerations for practitioners and policymakers

Kathy Reid and Elizabeth T. Williams
School of Cybernetics, Australian National University


Voice technologies such as automatic speech recognition (ASR) and natural language processing (NLP) are increasingly deployed in systems such as voice assistants, chatbots and avatars, across a range of real-world contexts, such as healthcare [6]. These technologies are driven by machine learning (ML). At the heart of ML lies data, large volumes of which are needed to train accurate ASR and NLP models. Data structures used for storing voice data often allow for the representation of features of the speaker – such as gender, age and accent. Accent is a broadly defined concept, but in general is taken to be how a person’s spoken language differs from the local spoken language across criteria such as phonetics, intonation and lexicon [1] [2]. Accent data is often included in voice datasets used for ML so that the model better meets the needs of the context in which it is deployed. For example, a model intended for use in an Australian context may be trained on a dataset containing Australian-accented English.
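As an illustration of the kind of data structure described above, the following is a minimal sketch of a voice data record that carries speaker features alongside the audio and transcript. The class name, field names and the helper function are hypothetical and are not taken from any particular dataset; the choice to store accent as a list of free-text labels is one possible design among several.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoiceClip:
    """One utterance plus optional, self-described speaker features.

    All field names are illustrative assumptions, not a real dataset schema.
    """
    clip_path: str
    transcript: str
    gender: Optional[str] = None
    age_band: Optional[str] = None
    # A list rather than a single value, so a speaker can describe
    # their accent with more than one label (or with none at all).
    accents: List[str] = field(default_factory=list)


def accent_counts(clips: List[VoiceClip]) -> Dict[str, int]:
    """Count how many clips carry each accent label."""
    counts: Dict[str, int] = {}
    for clip in clips:
        labels = clip.accents or ["(unspecified)"]
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return counts


clips = [
    VoiceClip("a.wav", "turn on the light", accents=["Australian English"]),
    VoiceClip("b.wav", "what is the time",
              accents=["Australian English", "Scottish English"]),
    VoiceClip("c.wav", "play some music"),
]
print(accent_counts(clips))
```

Counting labels in this way gives a rough view of which accents a dataset represents and how often speakers leave the field empty.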

Bias – systematic and unfair discrimination against individuals or groups in favour of others [3] – may be present in voice data. These biases are often propagated into ML models trained on this data. Voice datasets and models may therefore exhibit accent bias – the inability to accurately recognise particular accents – and thereby discriminate against people who speak with those accents. This challenge is attracting nascent research interest as voice technologies proliferate and as the desire grows to build voice technologies that work well for speakers of all accents [4] [5].
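One way that differences in recognition accuracy between accents could be made visible is to compute word error rate (WER) separately for each accent group. The sketch below assumes that each evaluation sample carries an accent label, a reference transcript and the model's output; WER is a widely used ASR metric, but its use here, along with the function names and the sample data, is an assumption for illustration and is not a method described in this contribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_accent(samples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Pooled WER per accent from (accent, reference, hypothesis) triples."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for accent, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[accent] += word_edit_distance(ref, hyp)
        words[accent] += len(ref)
    # Skip groups with no reference words to avoid division by zero.
    return {a: errors[a] / words[a] for a in words if words[a] > 0}


# Made-up samples, for illustration only.
samples = [
    ("accent_a", "turn on the light", "turn on the light"),
    ("accent_b", "turn on the light", "turn on a light"),
]
print(wer_by_accent(samples))
```

A large gap between groups in such a breakdown would be one indication of accent bias, although the result depends heavily on how the accent labels themselves were assigned.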

This contribution examines current methods of representing accents in voice data and their limitations. It aims to foster debate on alternative approaches to accent representation which contribute to reducing bias, increase the diversity of accent expression and help guard against the harms of differentiating speakers by their spoken accent.

Thanks are extended to Charlotte Bradley for her review of this submission.

[1] Georgina Brown. 2014. Y-ACCDIST: An Automatic Accent Recognition System for Forensic Applications. Ph.D. Dissertation. University of York.
[2] Tracey M Derwing and Murray J Munro. 2009. Putting Accent in Its Place: Rethinking Obstacles to Communication. Language Teaching 42, 4 (2009), 476.
[3] Batya Friedman and Helen Nissenbaum. 1996. Bias in Computer Systems. ACM Transactions on Information Systems (TOIS) 14, 3 (1996), 330–347.
[4] Arthur Hinsvark, Natalie Delworth, Miguel Del Rio, Quinten McNamara, Joshua Dong, Ryan Westerman, Michelle Huang, Joseph Palakapilly, Jennifer Drexler, Ilya Pirkin, et al. 2021. Accented Speech Recognition: A Survey. arXiv preprint arXiv:2104.10747 (2021).
[5] Bret Kinsella and Ava Mutchler. 2020. Smart Speaker Consumer Adoption Report 2020. Technical Report. Voicebot.AI.
[6] Emre Sezgin, Yungui Huang, Ujjwal Ramtekkar, and Simon Lin. 2020. Readiness for Voice Assistants to Support Healthcare Delivery during a Health Crisis and Pandemic. NPJ Digital Medicine 3, 1 (2020), 1–4.