
Behind the scenes: Building the next generation of advanced video search


Published:

August 20, 2024

Topic:

Deep Dive

Intro

Omnisearch has now been around for a year and a half and the product has changed and matured pretty rapidly since those early days. Today we’re launching the most significant Omnisearch upgrade in a long time, and are marching towards our endgame of being able to search anything under the sun.

In short, Omnisearch can now search visual data like images and videos. But wait, you say, haven’t we been able to do this since at least last year? Yes and no. We’ve been able to process video materials since the very beginning of Omnisearch, but we could only process and search the audio parts.

For instance, in the video below, we could only find the spoken mentions of “book” rather than the object itself. The situation for images and documents was analogous, as we could detect and search the text embedded inside them, but had no way of finding the objects themselves.

That changes today!

The old problem - video content search 

Before we dive into the product evolution itself and enumerate all the fantastic new features developed for this release of Omnisearch, it’s worth talking about the spark that got us to start Omnisearch in the first place. 

During my time at Amazon, I kept running into a pretty basic problem. As both my teams (S3 and Alexa) were fairly technically complex, there were a lot of training materials engineers needed to go through, particularly training videos. The problem was that even though you could search videos by title or description, there was no way to find anything inside the video contents themselves. So if you wanted the exact piece of a video where “Paxos” or “gradient boosting” was explained, well, good luck!

My co-founder Matej and I grasped the problem and decided to develop a solution any company could use to power its search functionality. It turns out the timing was great. Not only has there been a secular decline in the prevalence of textual data on the web, but the COVID-19 pandemic has taken online education (our initial vertical) to a completely different level.

Product evolution

In the early days 

Omnisearch started out innocuously, with a prototype of an audio and video search API. The whole thing was fairly bare-bones: AWS services stitched together and a simple API in Python. The demo, which you can have a look at, featured uploading files from S3, trying out the API in Postman, and then integrating it into a demo site. Not half bad for a first try, but as you can see, we could still only search the audio parts of the video.

Our first big release was actually a showcase project for the API: a simple podcast search engine, which did very well on Product Hunt and even got us our first press article!

The product evolved in a couple of different ways from there. First off, we developed a real dashboard that customers could use to upload documents, perform searches, and access the API documentation. Additionally, we started supporting a huge variety of video hosting providers - unlike the previous version, you could now simply paste in a YouTube URL and the video would get processed out of the box.

And finally, based on feedback from prospective customers (especially in EdTech), we expanded our supported document types to Word and PDF documents, as well as presentations and plain text. Here’s a demo from that time:

Thinkific

We now get to the most important part of our journey from a business standpoint. At one point, we were talking to an investor about Omnisearch and mentioned that EdTech was our initial vertical. He was kind enough to introduce us to the folks at Thinkific, a public company in the EdTech space that you can think of as a Shopify for online courses.

It turns out that Thinkific was just starting to roll out its developer program, meaning they made their APIs public so that developers could write applications against them. The apps would then be sold through the app store, and course creators worldwide could install them to upgrade their sites.

We were among the first to launch an app in the app store, and it helped us get our first paying customers. Essentially, the app just integrated with customers’ online schools, indexed all of their course content, and made said content searchable for their students. As usual, here’s a demo:

If you pay attention, you’ll see another great feature: faceted search! We made it possible for customers to define their own filters and expose them to their students.
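For readers who haven't run into faceted search before, here is a minimal sketch of the general idea in Python. It is not our production code: the document shape, the facet names ("level", "format") and both helper functions are made up for illustration. Each document carries customer-defined facet values, a search narrows results to documents matching every selected filter, and per-value counts are what a filter sidebar would typically display.

```python
from collections import Counter


def filter_by_facets(documents, selected):
    """Keep documents whose facet values match every selected filter."""
    return [
        doc
        for doc in documents
        if all(doc["facets"].get(name) == value for name, value in selected.items())
    ]


def facet_counts(documents, facet_name):
    """Count how many documents fall under each value of one facet."""
    return Counter(
        doc["facets"][facet_name]
        for doc in documents
        if facet_name in doc["facets"]
    )


# Hypothetical course content with customer-defined facets.
lessons = [
    {"title": "Intro to Paxos", "facets": {"level": "advanced", "format": "video"}},
    {"title": "Gradient boosting basics", "facets": {"level": "beginner", "format": "video"}},
    {"title": "Consensus cheat sheet", "facets": {"level": "advanced", "format": "pdf"}},
]

# A student ticks "advanced" and "video" in the filter sidebar.
advanced_videos = filter_by_facets(lessons, {"level": "advanced", "format": "video"})
print([doc["title"] for doc in advanced_videos])

# Counts shown next to each value of the "format" filter.
print(dict(facet_counts(lessons, "format")))
```

In a real deployment the filtering would usually happen inside the search index rather than in application code, but the contract is the same: filters narrow the result set, counts describe it.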

v2

There were now two things we needed to address. The first one we’ll reveal in more detail in a couple of weeks (stay tuned), but it includes massive performance improvements, among other things. The second, which had been lacking all along, was that we had no way to do anything useful with visual data. Yes, we performed OCR on documents, but OCR only extracts text: it didn’t perform any sort of object detection, and couldn’t relate objects to text.

Why was this so important? First of all, as we position ourselves in other verticals like media and e-commerce, this takes on a completely new urgency, not to mention EdTech use cases like searching slides embedded in video lessons, or even something written on a whiteboard.

And finally, we knew we could never legitimately claim to be the most versatile search solution in the market without that.

Features and capabilities

The new version of Omnisearch uses pre-trained deep learning models and vector search methods that allow us to search video and image content using textual queries. As in the earlier example, while the previous version could find only a spoken mention of a word such as “banana”, the current one can find images or video frames containing an actual banana object, like so:

While it’s imperfect in handling more abstract concepts, it handles regular objects, logos, people and even fictional characters extremely well. In videos, in particular, it can find the most significant moments where the objects are present and show the navigation buttons, analogously to how the standard Omnisearch worked for audio. In fact, it seamlessly interleaves both audio and visual occurrences, providing a game-changing multimedia search experience. 
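To make the idea of vector search plus interleaving a bit more concrete, here is a toy sketch in Python. It is an illustration under loud assumptions, not a description of our actual pipeline: the three-dimensional vectors are hand-written stand-ins for embeddings that a pre-trained model would produce, and the `Hit` class, the function names and the 0.8 similarity threshold are all hypothetical. Visual hits come from comparing a query vector against per-frame vectors with cosine similarity, audio hits come from a simple transcript word match, and both are merged into a single timeline ordered by timestamp.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    timestamp: float  # seconds from the start of the video
    source: str       # "visual" or "audio"
    score: float      # higher means a better match


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_hits(query_vec, frame_index, threshold=0.8):
    """Return frames whose embedding is close enough to the query embedding."""
    hits = []
    for timestamp, frame_vec in frame_index:
        score = cosine_similarity(query_vec, frame_vec)
        if score >= threshold:
            hits.append(Hit(timestamp, "visual", score))
    return hits


def audio_hits(query, transcript):
    """Return transcript segments that mention the query word."""
    word = query.lower()
    return [
        Hit(timestamp, "audio", 1.0)
        for timestamp, text in transcript
        if word in text.lower().split()
    ]


def interleave(*hit_lists):
    """Merge hits from all sources into one timeline ordered by timestamp."""
    merged = [hit for hits in hit_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit.timestamp)


# Toy data: hand-written vectors stand in for real frame embeddings.
frame_index = [
    (2.0, [0.9, 0.1, 0.0]),   # frame showing a banana
    (8.0, [0.0, 0.2, 0.9]),   # frame showing something unrelated
    (15.0, [0.8, 0.3, 0.1]),  # another banana-like frame
]
transcript = [
    (5.0, "today we talk about fruit"),
    (11.0, "a banana is rich in potassium"),
]
query_vec = [1.0, 0.0, 0.0]  # pretend text embedding of "banana"

timeline = interleave(
    visual_hits(query_vec, frame_index),
    audio_hits("banana", transcript),
)
for hit in timeline:
    print(f"{hit.timestamp:5.1f}s  {hit.source}")
```

With this toy data, the timeline contains the two banana-like frames and the one spoken mention, in playback order. A real system would replace the linear scan with an approximate nearest-neighbor index and the word match with proper text search, but the merge step stays just as simple.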

Here are a couple of other query examples so you get a better feel for the experience. The names of the documents are deliberately gibberish so as to make sure that the algorithm searches the contents rather than titles or descriptions. Here's searching for a fictional character:

The following query returns results for "netflix". Notice that it detects both the logo inside an image, as well as the occurrence of the stylized text inside a video.

As always, you can try out the new features through our online dashboard. You can find the API documentation right there, so you can start integrating the search functionality into your site or application right away. And by all means, if you’d like to learn more and talk about how Omnisearch can benefit you and your company, make sure to book a demo.

We thus conclude another phase of our product journey. While there’s still plenty of work to do to bring visual search to the level of robustness of our tried and tested search algorithms, we are now one massive step closer to our endgame of being the “search for everything”.
