Success in the mobile advertising industry is achieved by delivering contextual ads in the moment. The faster and more personalized a display ad, the better. Any delay in ad delivery means lost bids, revenue, and ultimately, customers.
Manage, a technology company specializing in programmatic mobile marketing and advertising, helps drive mobile application adoption for companies like Uber, Wish, and Amazon. In a single day, Manage generates more than a terabyte of data and processes more than 30 billion bid requests. Manage analyzes this data to know which impressions to buy on behalf of advertisers and uses machine learning models to predict the probability of clicks, app installs, and purchases.
Managing Data at Scale
At the start, Manage used MySQL to power their underlying statistics pipeline, but quickly ran into scaling issues as data volume grew. Manage then turned to Hadoop coupled with Apache Hive and Kafka for data management, analysis, and real-time data feeds. However, even with this optimized data architecture, Manage found that Hive was slow and caused hours of delay in data pipelines.
To meet customer expectations, Manage needed a solution that could deliver fresh data for reporting while concurrently allowing their analytics team to run ad hoc queries. Kai Sung, Manage CTO and co-founder, began the search for a faster database platform and found MemSQL. The Manage team quickly started prototyping on MemSQL and was in production within a few months.
Streaming Log Data from Apache Kafka
Manage uses MemSQL Streamliner, an Apache Spark solution, to first stream log data from Apache Kafka, then store it in the MemSQL columnstore for further processing. As new data arrives, the pipeline de-duplicates it and aggregates it into various summary tables within MemSQL. This data is then made available to an external reporting dashboard and reporting API. With this architecture, Manage has a highly scalable, real-time data pipeline that ingests and summarizes data as fast as it is produced.
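To make the de-duplicate-then-aggregate step concrete, here is a minimal sketch in plain Python. It is not Manage's code and does not use the Streamliner, Spark, or MemSQL APIs. The event fields (event_id, campaign, hour, type) and the function names are hypothetical, and an in-memory dictionary stands in for a summary table.

```python
from collections import defaultdict


def new_summary():
    # Hypothetical stand-in for a summary table:
    # (campaign, hour) -> {event type -> count}
    return defaultdict(lambda: defaultdict(int))


def fold_batch(events, seen_ids, summary):
    """Fold one micro-batch of log events into the running summary."""
    for event in events:
        event_id = event["event_id"]
        if event_id in seen_ids:
            # Duplicate delivery: already counted, so skip it.
            continue
        seen_ids.add(event_id)
        key = (event["campaign"], event["hour"])
        summary[key][event["type"]] += 1
    return summary


# Two illustrative batches; "a1" repeats within a batch, "a2" across batches.
batch_1 = [
    {"event_id": "a1", "campaign": "c1", "hour": "h10", "type": "impression"},
    {"event_id": "a2", "campaign": "c1", "hour": "h10", "type": "click"},
    {"event_id": "a1", "campaign": "c1", "hour": "h10", "type": "impression"},
]
batch_2 = [
    {"event_id": "a2", "campaign": "c1", "hour": "h10", "type": "click"},
    {"event_id": "a3", "campaign": "c1", "hour": "h10", "type": "impression"},
]

seen_ids = set()
summary = new_summary()
fold_batch(batch_1, seen_ids, summary)
fold_batch(batch_2, seen_ids, summary)
```

After both batches, the summary for campaign c1 in hour h10 holds two impressions and one click, because each event is counted once no matter how many times it arrives. A production pipeline would presumably bound the set of seen IDs (for example, by time window) rather than let it grow without limit.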
10x Faster Data
After implementing MemSQL, Manage reduced the delay in data freshness from two hours to 10 to 15 minutes, roughly a tenfold improvement. With MemSQL, the Manage team can now run analytics much faster and react to marketplace changes in the moment.
In an EnterpriseTech article, Kai Sung said, “We’ve built a highly scalable, real-time data pipeline that ingests and summarizes data as fast as we produce it. Our analytics team is able to run ad-hoc queries on log-level data within seconds.”
For more details, check out the EnterpriseTech article "Managing 30B Bid Requests, 1.5B Users per Day in (near) Real Time."
If you are interested in MemSQL, you can download it at www.memsql.com/download