MemSQL is a real-time data warehouse and a natural fit for large-scale operational analytics. It provides millisecond response times for analytical queries and can sit in the critical path of real-time applications.
We often hear from our customers that they want to do various types of artificial intelligence (AI) and machine learning (ML) model evaluations for IoT data, as well as imagery, in real time.
A good example of this is finding similar images in a large corpus of image data: for instance, when you point a camera at a person and quickly determine whether that person is in a database. This is what is referred to as real-time facial recognition.
From Images to Feature Vectors
Efficiently extracting feature vectors from images using deep learning is a subject of ongoing facial recognition research. Here is a reference to a modern approach: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/.
For the purpose of this post, we will assume that this is a somewhat solved problem and we can efficiently extract feature vectors from any incoming image. Once those feature vectors are produced, all you need to do is insert them into a MemSQL table with the following simple schema.
CREATE TABLE features (
  id bigint(11) NOT NULL AUTO_INCREMENT,
  feature_vector binary(4096) DEFAULT NULL,
  KEY id (id) USING CLUSTERED COLUMNSTORE
);
A typical way to insert the vectors is to use Apache Spark, which enables quick parallel data transfer into MemSQL.
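Before insertion, each feature vector has to be serialized into the 4096-byte blob that the `feature_vector binary(4096)` column expects. Here is a minimal sketch of that packing step in Python using the standard `struct` module; in practice this would run inside the Spark job, and `pack_vector` is an illustrative helper name, not part of any MemSQL or Spark API.

```python
import struct

def pack_vector(features):
    """Pack 1024 32-bit floats into the 4096-byte little-endian
    binary format expected by a binary(4096) column."""
    assert len(features) == 1024
    return struct.pack("<1024f", *features)

# A dummy vector that is exactly normalized:
# sqrt(1024 * 0.03125^2) = 32 * 0.03125 = 1.0
vec = [0.03125] * 1024
blob = pack_vector(vec)
print(len(blob))  # 4096 bytes, matching binary(4096)
```

Each row's blob can then be inserted as a parameter of an ordinary `INSERT` statement.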
There are two frequently used approaches to measuring the similarity between vectors: cosine similarity (cosine of the angle between the vectors) and Euclidean distance. Cosine similarity is defined as the dot product of the vectors, divided by the product of the vector norms (length of the vectors). If the vectors are normalized, the cosine similarity is simply the dot product of the vectors (since the product of the norms is 1).
(Yes, MemSQL is the database that does dot product and cosine similarity – the term one of our customers used in the Google search that ended with their using MemSQL for a large deployment.)
To search using cosine similarity, we can simply run a query like this to find similar images:

SELECT id FROM features
WHERE DOT_PRODUCT(feature_vector, <Input>) > 0.9;
Here, <Input> is the feature vector extracted from the incoming image, and 0.9 is an experimentally tuned constant corresponding to an angle of less than 26 degrees between the stored feature vector and the input.
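The threshold-to-angle relationship is easy to verify: for normalized vectors the dot product is the cosine of the angle between them, so the maximum matching angle is simply the arccosine of the threshold.

```python
import math

# For normalized vectors, dot product == cosine of the angle,
# so a threshold of 0.9 bounds the angle at acos(0.9).
threshold = 0.9
max_angle_deg = math.degrees(math.acos(threshold))
print(round(max_angle_deg, 1))  # ~25.8 degrees, i.e., under 26
```

Tightening the threshold toward 1.0 shrinks this angle and makes the match stricter.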
Euclidean distance is also frequently used to measure similarity. It is defined as the norm of the vector resulting from the subtraction of two input vectors. The EUCLIDEAN_DISTANCE built-in can also be used to efficiently measure the similarity between vectors.
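For normalized vectors, the two measures are directly related: the squared Euclidean distance equals 2 minus twice the dot product, so both rank candidates identically. A quick sketch:

```python
import math

def euclidean_distance(a, b):
    # norm of the vector resulting from subtracting the two inputs
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])
dot = sum(x * y for x, y in zip(a, b))
# For unit vectors, ||a - b||^2 = 2 - 2 * (a . b).
print(round(euclidean_distance(a, b) ** 2, 6), round(2 - 2 * dot, 6))  # 0.222222 0.222222
```

In other words, a dot-product threshold of 0.9 is equivalent to a Euclidean distance threshold of sqrt(0.2) on normalized vectors.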
This query performs a full table scan, which seems like it might be slow, but below we share our approach to performing this computation at memory bandwidth speed.
Here is our set of assumptions:
- Memory bandwidth: 50 GB/sec
- Each image feature vector contains 1024 32-bit features, resulting in 4 KB per vector
So, if we are limited by memory bandwidth, that means we can search 12.5 million images a second per node, or 1.25 billion images a second on a 100-node cluster. Let’s verify that’s actually true. I developed a simple test by creating a MemSQL columnstore table with the schema above and populating it with 12.5 million random 4 KB normalized feature vectors. The machine I used has a 6-core Xeon E5 processor. When I ran the search query, I got a 0.25 second response time.
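The back-of-the-envelope arithmetic behind those numbers (using decimal units throughout, as in the assumptions above):

```python
# Throughput under the stated assumptions: 50 GB/sec memory
# bandwidth per node, 4 KB per feature vector.
memory_bandwidth = 50 * 10**9   # bytes/sec per node
vector_size = 4 * 10**3         # bytes per feature vector

per_node = memory_bandwidth // vector_size
cluster = per_node * 100        # scale linearly to a 100-node cluster
print(per_node, cluster)  # 12500000 1250000000
```

That is 12.5 million vectors/sec per node and 1.25 billion vectors/sec across 100 nodes, assuming the scan parallelizes cleanly.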
How can MemSQL run this faster than memory bandwidth? The answer is compression of columnstore tables. Because the random vectors were normalized, they were able to be compressed from 50GB down to a size that can be read from memory in less than 0.25 seconds.
This shows that the DOT_PRODUCT computation can be done faster than 50GB/sec, and if no compression is applied, memory bandwidth is the limiting factor.
Because you can perform this similarity computation at memory speed, compute is not necessarily your bottleneck. We realize that there are other algorithms, such as approximate nearest neighbor search, that gain efficiency by avoiding the full table scan and only lose a small amount of accuracy. However, you can achieve good practical results with a very straightforward implementation.
Currently, we are adding more primitives to enable more machine learning use cases. We are also exploring GPUs, which have much higher memory bandwidth (up to 1TB/sec) to enable real-time scoring for more complex AI/ML problems.
Try It For Yourself