What is the Video Intelligence API?

Analyze your videos in a few lines of code

At Google’s Cloud conference in San Francisco last month we announced the Video Intelligence API — an API that makes it easy to analyze your video content with a single API request. To show the potential of the Video API, I built a demo. The code for the demo is now available on GitHub! The demo shows how you can use the Video Intelligence API to:

See what’s happening in each scene of your video:

Search a large library of videos using metadata from the API:

The API provides access to a pre-trained model that tells you what your video is about at a high-level, along with granular data on what’s happening in each scene. Let’s say you’ve got a scene with a dog at 0:07, like this:

The Video API tells you there’s a dog in that scene, along with all the other scenes containing a dog. Here’s what the JSON response looks like for one label:

{
"description": "Dog",
"language_code": "en-us",
"locations": [ {
"segment": {
"start_time_offset": 7090474,
"end_time_offset": 8758738
},
"confidence": 0.99793893,
"level": "SHOT_LEVEL"
}

This JSON provides the segments in microseconds where this label appears in the video. For the same segment in the video, the API also tell us what breed of dog it is — it returns the label “Dashchund” with 81% confidence.

You might be thinking, couldn’t I just take frames from my video at 1fps and send it to the Vision API? You could, but don’t. That’s where the secret sauce of the Video API comes into play. In addition to the shot level annotations above, it also runs your video through a model that analyzes how different frames in the video relate to each other. If the video has scenes of costumes and candy, video level annotations can tell you it’s a video about Halloween. Finally, the API’s shot change detection feature breaks your video into scenes. This is returned as a JSON object of segments, indicating each time the camera shot changed in a video.

To learn more about the Video API check out this great talk from Google Next by Ram Ramanathan (PM on the Video API), Juhyun Lee (engineer on the Video API), and Lynne Hurwitz (engineer at Wix) for a deep dive.

On the backend, the videos for the app are stored in a Google Cloud Storage bucket. I’ve written a Cloud Function that is triggered whenever a new file is added to that bucket. The function checks to make sure the file is a video and then sends that file to the Video API for annotating. Conveniently, the Video API lets you optionally pass an outputUri parameter as part of your request — this is a Cloud Storage URL where the API will write the video annotations when it finishes processing. I take advantage of this to store the annotation JSON in a separate Cloud Storage bucket.

The frontend is a Node.js app running on App Engine using ES6, Sass, Gulp, CanvasJS, and few more client-side JS tools. I worked with Alex Wolfe on the UI — thank you for your help Alex!

After building the demo I got the opportunity to present it in front of 10,000 people (did I mention this is a cool job?!). Here’s a video from the Next keynote:

You can try out the Video API directly in the browser without writing any code. If you’ve got videos you’d like to analyze, sign up for the Video Intelligence API alpha and let me know what you think. Leave a note in the comments below or find me @SRobTweets on Twitter.