Until now, organisations keen to move their video content online, into a form that’s portable and consumable on the average handset, have focused on presenting complete assets (or programmes), or specially curated excerpts, to audiences via either their own portal or a shared portal such as YouTube.

That’s all well and good, but it’s not particularly revolutionary, and it doesn’t necessarily reflect how media is now consumed by those whose viewing habits social media has transformed – in particular, the ‘difficult to reach’ 16-24-year-olds. Among traditional TV channel management teams it has become a nervous mantra that persuading this group to park themselves in front of a big screen and watch anything longer than 15 minutes of passive entertainment is an ever-growing challenge. Live events will drag them to the couch for fear of missing out, but what about the many thousands of hours of pre-recorded content available?

I’d dispute that anyone with a smartphone is hard to reach – but it does take a little creativity to capture their attention. Sticking a traditional TV channel online simply won’t reach a considerable section of the market; you need to reflect the consumption habits of ecosystems outside broadcast TV, where viewing patterns on connected devices are very different.

According to Ericsson ConsumerLab, 61% of consumers watch TV and video content on smartphones, with teenagers spending nearly two thirds of all TV/video viewing hours on a mobile device. A solution needs to be found that allows people to view traditional media in a new, bite-size way that facilitates this behaviour rather than frustrates it.

Online search is still heavily weighted towards finding programmes; yet it’s not really possible to search for a chapter, a memorable phrase or a mood. Temporal metadata – i.e. data relating to what’s happening on screen at any given point in a clip – is missing, and I think it’s the key to facilitating innovative methods of content discovery.

So, how can we go about providing a portable, platform-resilient mechanism to transport time-specific data about a given video clip?

There are clear precedents for this, the most straightforward of which is probably the caption data track. The captions or subtitles for a clip are simply a series of time-specific text events that must always synchronise with the video content. Most player technologies have solved the challenge of linking the ‘time-codes’ that define these events with the video clip, and many of the caption file formats these players use are extensible. If the caption file is in XML, it’s relatively trivial to include extra ‘hidden’ timed data that can convey chapter markers, speaker information, moods, products on screen, music information or data to facilitate the further understanding or exploration of the subject matter of the clip.
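As a rough illustration of the idea, the sketch below embeds extra timed data as attributes on an XML caption document and reads it back alongside the caption text. The TTML-style layout, the ‘ex:’ namespace and its attribute names are my own assumptions for illustration, not any particular vendor’s format.

```python
# A sketch of timed metadata riding alongside captions in a TTML-style XML
# document. The "ex:" namespace and its attribute names are hypothetical.
import xml.etree.ElementTree as ET

CAPTION_XML = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ex="http://example.com/ns/timed-metadata">
  <body><div>
    <p begin="00:01:05.200" end="00:01:08.000"
       ex:chapter="The heist begins" ex:speaker="Jo" ex:mood="tense">
      We don't have much time.
    </p>
    <p begin="00:01:08.500" end="00:01:11.000"
       ex:speaker="Sam" ex:music="Title theme (instrumental)">
      Then we'd better move.
    </p>
  </div></body>
</tt>"""

TT_NS = "http://www.w3.org/ns/ttml"
EX_NS = "http://example.com/ns/timed-metadata"

root = ET.fromstring(CAPTION_XML)
for p in root.iterfind(f".//{{{TT_NS}}}p"):
    caption = " ".join(p.text.split())            # normalise whitespace
    extras = {key.split("}")[1]: value            # strip the namespace URI
              for key, value in p.attrib.items()
              if key.startswith(f"{{{EX_NS}}}")}
    print(p.get("begin"), p.get("end"), caption, extras)
```

In principle, a player that doesn’t recognise the extra namespace can simply ignore it, so the same file still works as an ordinary caption track while a search index or recommendation engine mines the additional data.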

Creating this data can be done automatically, but the technologies are still pretty immature and will not provide consistent, good-quality results. Given that in most workflows a captioner is already being used to transcribe the content and generate captions, why not task them with a little extra work to capture the metadata required? Captioners already go through the mental and practical processes that generate this metadata as part of the captioning workflow – it’s simply a matter of giving them an easy way to capture it within the captioning software. In this way, curated and quality-controlled data can be created that adds genuine value to the video content.

What commercial opportunity does this create? A basic approach would be to modify the online presentation to resemble DVD playback; the viewer can skip chapters and find the scenes that include their favourite characters. This experience can be enhanced by adding commercial or thematic links: want to buy the music you’ve just heard, visit the location or simply link through to a Wikipedia explainer on a particular subject? It can all be done with a click or a save-for-later mechanism.

There’s also scope to create ‘non-linear’ viewing experiences. If we know all the scenes in all the content on a given platform, who’s in them and what mood they carry, then it’s possible to start creating personalised experiences for the platform user or viewer. Imagine a typical commuter: she knows her average train journey is 32 minutes long, she’s a massive fan of her local football team and she’s always looking for something that will make her laugh on her way to and from a demanding job. For her work she needs to stay abreast of the latest financial news – but doesn’t always have time to read every background feature. We should be able to create a pretty neatly defined profile from this kind of information (some of which she’ll volunteer herself, some of which can be inferred from her behaviour). We can build algorithms that take these needs, search through the platform’s archives and latest updates, and use the results to construct a personalised playlist for her.

This personalised playlist could include stories from a sports news service relevant to her team, a few financial background pieces, a catch-up on key scenes from a favourite drama so that she’s ready for the next episode, and a few archive comedy clips from classic shows. All of this can be presented with relevant (and brief) targeted advertising spots, and maybe a taster of a new drama that the algorithms predict she might want to follow. The playlist would be precisely the right length for her commute, could be cached so that signal black-spots don’t disturb the viewing experience, and would be tailored to engage, inform and entertain in a way that ensures she steps off the train ready for work.
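As a rough sketch of what the playlist-assembly step might look like, the snippet below greedily fills a fixed time budget with the clips whose tags best match a viewer profile. The clip records, tags and scoring rule are illustrative assumptions; a real service would draw on the temporal metadata and profile signals described above.

```python
# A sketch of profile-driven playlist assembly: fill a commute-length time
# budget with the clips that best match the viewer's interests.
# Clips, tags and the scoring rule here are illustrative, not a real catalogue.
from dataclasses import dataclass, field

@dataclass
class Clip:
    title: str
    duration_s: int                 # clip length in seconds
    tags: set = field(default_factory=set)

def build_playlist(clips, interests, budget_s):
    """Greedy fill: best-matching clips first, stop when the budget is spent."""
    ranked = sorted(clips, key=lambda c: len(c.tags & interests), reverse=True)
    playlist, remaining = [], budget_s
    for clip in ranked:
        if clip.tags & interests and clip.duration_s <= remaining:
            playlist.append(clip)
            remaining -= clip.duration_s
    return playlist

commuter_profile = {"football", "comedy", "finance"}
catalogue = [
    Clip("Weekend match highlights", 300, {"football", "sport"}),
    Clip("Morning markets briefing", 240, {"finance", "news"}),
    Clip("Classic sitcom sketch", 180, {"comedy", "archive"}),
    Clip("Drama catch-up scenes", 420, {"drama"}),
]

for clip in build_playlist(catalogue, commuter_profile, budget_s=32 * 60):
    print(clip.title, f"{clip.duration_s}s")
```

In practice the scoring would be far richer (mood, recency, episode order, advertising slots), but the shape of the problem is the same: timed metadata in, a tailored, time-boxed playlist out.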

Sounds a little futuristic and hard to achieve? Not really. The processes are pretty much in place, the underlying data formats are ready; the next step is to turn this into a genuine user experience with some engaging UX design and reliable back-end infrastructure and services.

Matt Simpson, Head of Technology, Access Services