Why Automatic Speech Recognition Is Not Automatic Captioning

Live captioning is challenging to deliver, no matter how it is produced. Due to its real-time nature, the text needs to be produced immediately with almost no time for reaction and correction. Any interruptions due to technical or resourcing problems result in immediate on-air outages. Delivering a quality service with high accuracy, high uptime, and low latency takes a well-resourced operation.

At Red Bee Media, we are launching our fully automated live captioning solution at IBC this year, and so it seemed a good time to write about some of the challenges we have discovered in the development of this service.

For over a decade, Red Bee Media has been producing live captions using the “respeaking” technique – that is, where a trained captioner repeats everything they hear in the programme audio into an Automatic Speech Recognition (ASR) engine. This engine has been trained to their voice, and in turn feeds our Subito platform, giving a further opportunity to correct the output as it comes out in real-time. These ideal acoustic conditions, the spoken punctuation and the delivery of speech through a single, calm voice guarantee high levels of accuracy.

With fully automated live captions, however, you remove the person from the point of immediate delivery and instead run your ASR engine directly against the programme audio. This reduces the cost of the service, but it also comes with a significant reduction in accuracy. What’s more, live ASR is inevitably less accurate than offline ASR, as it has less time to decide what has been said and cannot take the words that follow an utterance into account.

The best way to increase the accuracy of live captioning, no matter how it is produced, is through preparation and practice, and the analysis of historic output in order to find areas that need improvement. With our fully automated live captioning solution we have brought the expertise and experience of our live captioners to bear on the matter – they work with our technical team to tailor the solution to specific output. We can choose the best ASR for the given range of accents, subject material and genre, and apply genre and programme-specific vocabulary lists to both the ASR engine and our Subito software’s House-Styles for automatic substitutions.
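As a rough illustration of what an automatic substitution list does, the sketch below applies a small vocabulary mapping to raw ASR text. It is not Subito's implementation: the `HOUSE_STYLE` entries and the `apply_house_style` function are hypothetical names invented for this example.

```python
import re

# Hypothetical house-style list: raw ASR form (lowercase) -> preferred form.
HOUSE_STYLE = {
    "red bee media": "Red Bee Media",
    "i b c": "IBC",
}

def build_pattern(rules):
    # Longest phrases first, so multi-word entries win over shorter overlaps.
    keys = sorted(rules, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

def apply_house_style(text, rules=HOUSE_STYLE):
    # Replace every whole-word match with its preferred form.
    pattern = build_pattern(rules)
    return pattern.sub(lambda m: rules[m.group(1).lower()], text)

print(apply_house_style("welcome to i b c from red bee media"))
```

In practice such a list would be maintained per genre or per programme, which is the kind of tailoring described above.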

While all of the above gives us the most accurate transcription possible, we also need to manage the fact that Automatic Speech Recognition is not Automatic Captioning. Our Subito platform takes that raw, real-time text and formats it into captions with the styling appropriate for a given market. It applies timing rules and automatically omits redundant text to ensure readability and alignment with the relevant regulatory mandates, and it provides timing offsets to ensure the best synchronicity possible. Finally, it delivers these captions to a huge variety of inserters, encoders and online end-points.
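To make the gap between a transcript and a caption concrete, here is a minimal sketch of the formatting step, assuming the ASR emits words with start and end times in seconds. The `Caption` class, the `format_captions` function and the default limits (line length, lines per caption, minimum display time) are all assumptions chosen for illustration, not Subito's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Caption:
    start: float  # seconds, after the timing offset is applied
    end: float
    lines: list

def format_captions(words, max_chars=32, max_lines=2, offset=0.0, min_duration=1.0):
    """Group (text, start, end) word tuples into timed caption blocks."""
    captions = []
    lines, start, end = [], None, None

    def flush():
        # Emit the current block, holding it on screen for at least min_duration.
        # Note: a real system would also stop this from overlapping the next block.
        if lines:
            shown_end = max(end, start + min_duration)
            captions.append(Caption(start + offset, shown_end + offset, list(lines)))

    for text, w_start, w_end in words:
        if lines and len(lines[-1]) + 1 + len(text) <= max_chars:
            lines[-1] += " " + text          # word fits on the current line
        else:
            if len(lines) == max_lines:      # block is full: emit and start anew
                flush()
                lines.clear()
            if not lines:
                start = w_start              # first word sets the block's start time
            lines.append(text)
        end = w_end
    flush()
    return captions

words = [("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("this", 1.0, 1.2),
         ("is", 1.3, 1.4), ("live", 1.5, 1.8)]
for cap in format_captions(words, max_chars=11):
    print(cap)
```

Even this toy version has to decide line length, block size, display time and offset, none of which the ASR engine itself provides.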

Fully automated ASR-based captioning is not ready for mainstream broadcast television yet. However, it is improving all the time, and has reached a point where it can suit programming for smaller audiences and tighter budgets, or where regulatory requirements are not a major factor, but where the content owner nevertheless wishes to ensure accessibility to a wider audience.

Our Subito platform also opens the possibility of a mixed-mode of service, with “prime time” shows delivered via the traditional managed service and other content at “off-peak” times covered using the automated approach. In a similar vein, the automated solution can also be leveraged to guarantee uptime by providing a very low cost, 24/7 failsafe.
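The mixed-mode and failsafe ideas can be sketched as a simple selection rule. This is an illustrative assumption about how such a switch might look, not a description of Subito: the `PRIME_TIME` window, the `choose_source` function and the `managed_healthy` flag are hypothetical.

```python
from datetime import time

# Hypothetical prime-time window during which the managed service is preferred.
PRIME_TIME = (time(18, 0), time(23, 0))

def choose_source(now, managed_healthy, prime_time=PRIME_TIME):
    """Return which caption source to put on air at clock time `now`."""
    start, end = prime_time
    wants_managed = start <= now < end
    # Fall back to the automated feed off-peak, or whenever the managed
    # service is unavailable, so that captions never drop off air.
    if wants_managed and managed_healthy:
        return "managed"
    return "automated"

print(choose_source(time(20, 0), managed_healthy=True))
print(choose_source(time(20, 0), managed_healthy=False))
```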

By letting Red Bee Media take the responsibility for managing and improving this service, we can always deliver the best possible automated captioning solution for any given broadcast – with constant evolution to incorporate the latest innovations in our captioning platform and the latest advances in AI-driven Speech Recognition the market has to offer.

Hewson Maxwell, Emerging Technology Lead, Access Services
