Twelve Labs Raises $12 million To Take AI Video Search To The Next Level
Video search startup Twelve Labs has just announced a new seed funding round of $12 million. This adds to an original $5 million and brings the company's total funding to $17 million to tackle the knotty problem of video search.
Traditional video search relies mostly on keywords, tags, and titles. The problem is simple: how do you convert moving images into something a computer search engine can recognize fast enough to power search? Twelve Labs thinks it has it licked. Of course, the key lies in advanced artificial intelligence.
The company has built its own ‘video understanding’ algorithm called ViSeRet, which grew out of work done by the founding team at Cornell University. Instead of using pre-labeled tags to do the Ctrl-F work of searching video contents, the team developed an AI-based process that involves chopping the video content into very small chunks.
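To make the chunking idea concrete, here is a minimal sketch of splitting a video's timeline into short, fixed-length windows that could each be indexed separately. The window length and function name are illustrative assumptions, not details of Twelve Labs' actual pipeline.

```python
# Illustrative only: carve a video's timeline into small windows so
# each window can be embedded and indexed on its own. The 2-second
# chunk length is an assumption for the sake of the example.

def chunk_timeline(duration_s: float, chunk_s: float = 2.0):
    """Yield (start, end) windows in seconds covering the full duration."""
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        yield (start, end)
        start = end

# A 7-second clip becomes four windows, the last one shorter:
print(list(chunk_timeline(7.0)))
# [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 7.0)]
```

Because each window is small, a match can point the user at the exact moment in the video rather than the whole file.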
Twelve Labs Uses A Cloud Platform
This method allows for rapid retrieval using machine learning and delivers very accurate results in a fraction of the time of traditional methods. By using a cloud processing platform, performance problems are also mitigated, allowing for very natural-language searches. Users can look for movement and actions, objects and people, sounds, text and speech, and the algorithm will return what they need. It’s clever stuff.
Video is now a deeply ingrained part of modern culture, with some reports saying that up to 80% of the world’s data is stored in video format. From Zoom calls to YouTube documentaries and TikTok dances, video is everywhere. So it makes sense to be able to search and retrieve it in more efficient ways, especially as the global archive grows to monumental levels.
In 2020, networking company Cisco estimated that it would take more than five million years to watch the total video crossing global IP networks each month. The challenge for Twelve Labs was to make video search as fast and efficient as text search, without cumbersome tasks like manually tagging components or labeling objects. Crucially, the team focused on delivering identification alongside context, which is key to accurate results.
The ViSeRet engine first converts the contents and context of a video into vector data, numbers which represent the content. When the user types in a search, the query is converted into a vector and the system finds the closest vector match from inside the video in question. The clever trick is to package all of that tech into an easy-to-use interface, which supports natural-language queries and can be delivered via an API as well.
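The closest-vector-match idea can be sketched in a few lines. The embeddings below are made up by hand for illustration; in the real system a learned neural encoder like ViSeRet produces them, and similarity search runs at much larger scale.

```python
# Toy sketch of vector search: indexed video chunks and the user's
# query are both represented as vectors, and search returns the chunk
# whose vector lies closest to the query's (cosine similarity here).
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings for three indexed chunks of one video.
chunks = {
    "0:00-0:02 dog runs on beach": [0.9, 0.1, 0.0],
    "0:02-0:04 waves crash":       [0.1, 0.9, 0.1],
    "0:04-0:06 sunset timelapse":  [0.0, 0.2, 0.9],
}

def search(query_vec):
    """Return the chunk label whose embedding best matches the query."""
    return max(chunks, key=lambda label: cosine(query_vec, chunks[label]))

# A query like "dog playing" would embed near the first chunk:
print(search([0.8, 0.2, 0.1]))  # 0:00-0:02 dog runs on beach
```

The same mechanism supports natural-language queries: as long as text and video land in the same vector space, a typed phrase can retrieve a moment of footage.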
The Company Won First Prize
The system is so good that it beat a bunch of major players in the 2021 ICCV VALUE Challenge for video retrieval. The challenge was hosted by Microsoft, and the Twelve Labs team and tech grabbed first place. Of course, the value of developing something of this nature goes well beyond winning challenges, or even conducting simple search.
Next-generation applications will demand much more sophisticated solutions than are available today. Video-to-video search, video summarization, and clever content recommendation engines are a few that spring to mind. Imagine a search engine which could take a quote from a random movie and produce a list of similar works to fit your mood or needs.
One important factor about the new technology is it is being developed outside of the big tech guns. Google and Microsoft both have lofty ambitions for the video search space, but they’re keen to keep the tech to themselves, at least for now.
Any small upstart which can challenge that control is to be applauded. Although there’s probably a pretty good chance that the startup will be bought out once it reaches critical mass down the line. Figma and Adobe, anyone?
In the meantime, the small 20-person team at Twelve Labs will be using the additional funds to continue working on the product. Among the newer developments is the ability for companies to train models on their own video content, which should make for some very interesting tools later on down the line.
The company already has customers who are paying for the service, and it also has a contract with Oracle to train models using Oracle’s cloud platform. All the signs are that the next few years could be exciting for Twelve Labs and their video search SaaS.
They’re up against some huge entities with the deepest pockets in the universe, but scrappy startups have always had a fighting chance in that kind of scenario. One way or another.