As a very early adopter of an Amazon Echo device while living in the U.S., I discovered that, even though I had been interacting with native English speakers for most of my adult life (including working as a diplomat in an English-speaking country), I was having serious difficulties making myself understood by Alexa – and, as I discovered soon afterwards, by Siri, Cortana and most other AI-powered natural language processing systems.

I soon realized that my Italian accent, not ridiculously thick but still noticeable, was throwing these “intelligent”, English-speaking systems completely off the rails. I also learned that I was not alone in my pain: I met Irish, Australian, South African and Scottish people with the same communication problem.

While this proved to be a great ice-breaker at cocktail parties, the experience led me to think more deeply about the assumptions that designers of Artificial Intelligence systems make, whether for speech recognition and natural language processing, or in other application domains.

How was it possible that allegedly “intelligent” systems were not able to perform such a simple task? Or perhaps the task is not that simple after all. But given that most human beings are able to “filter out” the foreign accents of their interlocutors from a very early age, what does this tell us about what we can expect Artificial Intelligence systems to deliver?

Furthermore, since accent is often a key predictor of socio-economic background and status, what can this episode tell us about the way in which these AI systems are trained, and about the assumptions that their designers are making about their users? Can this quite specific scenario teach us something more about the broader social, economic and political assumptions that AI system designers are (not) making, or should be making, and how?

