The future of voice recognition

29/01/13
As anyone who’s ever struggled to make themselves understood to an automated phone service will tell you, voice recognition software is far from perfect.

The technology we use to provide live subtitles is more specialised and effective than that of the average cinema ticket office, and it’s improving all the time. But there will always be issues with any voice software. What are those issues likely to be in the future?

In the short- to medium-term, we may see advances in those currently Partridge-thwarting automated voice recognition systems. However, it seems unlikely that software will be able to match human stenographers and respeakers any time soon. Some automated systems are relatively successful when following a single speaker. But on coverage featuring two or more speakers – on a TV debate, say, or chat show – it is immensely difficult to capture everything being said accurately, due to factors such as people’s tendency to talk over each other, and the fact that it can be difficult to differentiate off-screen speakers. So thankfully it looks like we human subtitlers won’t be out of a job just yet.

More generally, one big problem for any English-language program will likely always be the sheer size of the language. I don’t mean the number of words we use (although English does have one of the biggest vocabularies of any tongue – not that it’s a competition!); I mean its massive variety of accents and modes of speech. Since much of the tech is developed in the US, it’s understandable that our software is best able to interpret US English. It’s also pretty good with South-Eastern British English, since that’s quite a ubiquitous accent too. However, it can struggle with others, especially distinctive regional ones like Glaswegian or Northern Irish. Again, things are improving on this front. But given the diversity of accents just within our offices, let alone the rest of the English-speaking world, this is likely to be an ongoing challenge.

In a similar vein, globalisation will continue to provide a unique set of obstacles for subtitlers. Another implication of voice software being developed by English speakers is that the tech has difficulty understanding the sounds of other languages. As popular sports televised in Britain attract ever more players from non-English-speaking countries, it will become more important to viewers that subtitling software can accurately capture the names of footballers like Demba Ba or tennis players such as Na Li. Names like these aren’t naturally easy for our software to understand (although usually we are able to reproduce them using specific work-arounds). As Red Bee now provides subtitles on a wider range of sports thanks to its work with Sky and ESPN, we’ll be keeping a close eye on potential improvements in voice recognition across languages.

What do you think? Let me know in the comments below.

Martin Cornwell, Subtitler