The Channel: Tagging TV and Temporal Metadata

This article was originally published in The Channel, the magazine of the Association of International Broadcasting, and was written by Richard Kastelein with ideas and help from Paul Johnson. The full magazine is available online, as is a PDF version of this article.

Historically, TV metadata has been used to supply Electronic Programme Guides (EPGs) and has therefore been adequate for description at the show level. Typically, when the industry talks about TV metadata, it means snippets of information and images provided by companies such as Rovi and Gracenote, which can be used as descriptive editorial information, images and multimedia for a show as a whole.

But what about at the scene level? And why is temporal metadata — or Tagging TV — the new oil?

It's now all about applying metadata not just to a whole piece of content, but to individual chunks within it, such as a movie scene or song. Of course, this can be relevant both for production and for search/discovery, but the real value lies in providing contextual data on the second screen, whether that is curated or automated, factual or commercial. Let me explain further.

Over $200 bn is spent annually on TV advertising worldwide, but viewers are increasingly watching TV alongside their portable devices, with more and more attention directed away from TV spots and towards laptops, tablets and smartphones. This means those 30-second linear TV spots that agencies convince brands are worth millions are very likely to become less valuable in the future. The fragmentation of the viewing audience across an increasing number of channels is another important factor. More people are watching TV, and viewing figures are going up, but they are also watching more channels. And the number and choice of channels is not going to decrease in the future. On the contrary, it will grow, particularly with the advent of Over the Top (OTT) content being fed to the living room by new gatekeepers like Samsung, LG and Sony with smart TVs, game consoles and Blu-ray players, and by the powerhouses of Apple, Google and perhaps Microsoft.


As second-screen mobile devices draw attention from commercials, they will become hugely important in the future disruption of the industry's current value chain. This is not only because viewers are drawn there to discover and share content, gamify the TV experience and engage in new ways, but also because the second screen is bi-directional. Like the web, it can and will deliver Internet Protocol (IP) metrics that the old-school TV value chain could only dream about in the past, giving a deeper understanding of consumers and their behaviour.

Let me give you some example headlines that make my case:

2011: 70% of tablet owners are using their device and 50% of people are regularly online whilst watching TV
2011: 57% of over‐16s in the UK are using the internet for social networking
2011: 60% of TV viewers are distracted by a smartphone while watching TV
2011: 82% of TV adverts generate negative ROI 

Is there life after the 30‐Second Spot? Yes, and temporally‐tagged TV metadata is the key.

Why? Think about it. When Angelina Jolie shows up for the premiere of Kung Fu Panda 2 wearing Michael Kors, and that dress has been tagged as metadata in the timeline of the show, the tag can become a trigger for an action on the second screen, such as 'Save for Later and Buy' or 'Learn More'. When a Porsche shows up in a movie scene, perhaps it can trigger a second-screen call to action offering a free test drive. Perhaps a different model can even be shown depending on what is known about the demographic of the user, throwing in more targeted advertising to boot.
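To make the idea of a tag acting as a trigger a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Tag` fields, the `ACTIONS` and `SEGMENT_OFFERS` tables, the "family" viewer segment and the `actions_for` function are hypothetical names, not part of any real second-screen platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """A hypothetical temporal tag placed on a programme timeline."""
    start: float  # seconds from the start of the programme
    end: float    # seconds from the start of the programme
    kind: str     # e.g. "fashion" or "vehicle" (illustrative categories)
    label: str    # human-readable description of what is on screen


# Hypothetical default calls to action for each kind of tag.
ACTIONS = {
    "fashion": ["Save for Later and Buy", "Learn More"],
    "vehicle": ["Book a free test drive"],
}

# Hypothetical overrides used when something is known about the viewer.
SEGMENT_OFFERS = {
    ("vehicle", "family"): "Book a free test drive of the family model",
}


def actions_for(tag: Tag, segment: Optional[str] = None) -> List[str]:
    """Return the second-screen calls to action that a tag could trigger."""
    targeted = SEGMENT_OFFERS.get((tag.kind, segment))
    if targeted is not None:
        # A targeted offer replaces the generic ones for this viewer segment.
        return [targeted]
    return list(ACTIONS.get(tag.kind, []))


dress = Tag(start=12.0, end=20.5, kind="fashion", label="Michael Kors dress")
car = Tag(start=95.0, end=110.0, kind="vehicle", label="Porsche")
print(actions_for(dress))
print(actions_for(car, segment="family"))
```

The point of the sketch is only that the tag carries the context, while the decision about what to offer can live in a separate table that advertisers or broadcasters control.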

Now this all sounds great and perhaps even easy. But it's not. And that's why it's the new oil. The lack of standardisation in the area of TV programme information (or TV "metadata" as it is called) poses increasing problems right through the TV value chain. Everybody loses, from content producers, broadcasters, advertisers and network operators to viewers. Production companies are chock-full of creatives who don't find this extra work appealing in any sense and are not doing it. There's another chance to create contextual temporal tags at the broadcaster level, as broadcasters buy the scripts. But the infrastructure and common standards are just not there yet in the playout systems. So in many cases, it's third parties that are trying to solve this problem outside of the old school.


There are essentially two ways to tag video entertainment: curated and automated. Both have their pros and cons. Manually tagging millions of programmes and shows is going to take a decade of Mechanical Turks, but it offers the best metadata. On the other hand, there are companies that automate the process using technologies such as Speech to Text, Video Recognition, Audio Fingerprinting, Natural Language Processing and, when available, Closed Caption data to create temporal tags that provide a clear view of what is happening, and when, within a piece of video.

The automatically culled data is then cleaned up with algorithms and output to an XML file that can be used in conjunction with the timeline of a video. And this can be done in real time with live video, believe it or not. The tags can then be linked automatically by algorithms to companion content from reliable sources such as Wikipedia and IMDb, or even to eCommerce sources such as Amazon, eBay or the App Store on the device itself.
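As a rough illustration of how such an XML file might be used alongside a video timeline, the Python sketch below parses a small tag file and looks up which tags are active at a given playback position. The XML layout is invented for this example: the `tags` and `tag` elements and their `start`, `end`, `type` and `label` attributes are assumptions, not taken from TV-Anytime, MPEG-7 or any vendor format.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag file. The element and attribute names are illustrative
# only and do not follow any real industry schema.
SAMPLE_XML = """
<tags programme="Example premiere coverage">
  <tag start="12.0" end="20.5" type="fashion" label="Michael Kors dress"/>
  <tag start="18.0" end="30.0" type="person" label="Angelina Jolie"/>
  <tag start="95.0" end="110.0" type="vehicle" label="Porsche"/>
</tags>
"""


def load_tags(xml_text):
    """Parse the hypothetical tag file into a list of dicts sorted by start time."""
    root = ET.fromstring(xml_text.strip())
    tags = []
    for element in root.iter("tag"):
        tags.append({
            "start": float(element.get("start")),
            "end": float(element.get("end")),
            "type": element.get("type"),
            "label": element.get("label"),
        })
    return sorted(tags, key=lambda tag: tag["start"])


def tags_at(tags, seconds):
    """Return the tags whose time span covers the given playback position."""
    return [tag for tag in tags if tag["start"] <= seconds < tag["end"]]


timeline = load_tags(SAMPLE_XML)
for tag in tags_at(timeline, 19.0):
    print(tag["type"], tag["label"])
```

A second-screen app could call something like `tags_at` each time the playback position is synchronised, then decide what companion content or offer to show for the tags it gets back.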

Probably the best way is to use a combination of both types of tagging: automated first, then moderated or curated by humans.
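One way to picture this combined workflow is a simple review queue, sketched below in Python. It assumes that the automated tagger attaches a confidence score between 0 and 1 to each tag. That score, the 0.8 threshold and the `split_for_review` function are all assumptions made for illustration, not a description of how any particular vendor works.

```python
def split_for_review(tags, threshold=0.8):
    """Split automated tags into auto-accepted ones and ones needing a human.

    Each tag is a dict with a hypothetical "confidence" key in the range 0..1.
    """
    accepted, needs_review = [], []
    for tag in tags:
        if tag["confidence"] >= threshold:
            accepted.append(tag)
        else:
            needs_review.append(tag)
    return accepted, needs_review


# Illustrative output from an imaginary automated tagger.
automated_tags = [
    {"start": 12.0, "label": "Michael Kors dress", "confidence": 0.55},
    {"start": 18.0, "label": "Angelina Jolie", "confidence": 0.97},
    {"start": 95.0, "label": "Porsche", "confidence": 0.81},
]

accepted, needs_review = split_for_review(automated_tags)
print("published automatically:", [tag["label"] for tag in accepted])
print("sent to a human curator:", [tag["label"] for tag in needs_review])
```

Raising the threshold sends more tags to human curators and lowers the risk of a wrong tag reaching viewers, at the cost of more manual work.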

There have been attempts to create common standards for the industry around EPG metadata, and three broadcast industry initiatives have been started to tackle the problem. The earliest was DVB-SI, which is an integral part of the digitalisation of TV in Europe and other regions of the world. Two other promising TV metadata standards that build on the precedent of DVB-SI will soon be finalised. TV-Anytime, the first of these, addresses the needs that arise from high-volume, low-cost storage (e.g. PVRs and VOD services). The second, MPEG-7, is much broader in its scope, seeking to provide tools for describing all forms of multimedia content delivered by the broadest possible range of networks and terminals.


The Brussels-funded FP7 EU NoTube project aims to show how Semantic Web technologies can be used to connect TV content and the Web through Linked Open Data, as part of the trend of TV and Web convergence. The project is focussing on three schemas: BMF 2.0 (Broadcast Metadata Exchange Format); the rather outdated TV-Anytime, as an internationally agreed and accepted metadata schema in the TV consumer domain; and egtaMETA, a barely used but interesting schema from the commercial side for adverts.

NoTube is a European research project exploring the future of TV in the ubiquitous internet. It includes the BBC and IRT as well as a slew of university researchers from across Europe. Essentially, NoTube will make disparate metadata interoperable within the NoTube platform by creating the transformations required to translate metadata from external sources into TV-Anytime. In the course of the metadata enrichment process in NoTube, additional metadata is then added to the TV-Anytime metadata sets, thereby pushing adoption towards that standard.