MemSQL and Pinterest Showcase Operationalizing Spark at Strata + Hadoop World 2015
Companies demonstrate MemSQL Spark Connector to derive real-time insights on unique and interesting Pinterest engagement data
SAN FRANCISCO, February 18, 2015 - MemSQL, the leader in real-time databases for transactions and analytics, today announced a showcase with Pinterest, the visual bookmarking tool, on operationalizing Spark, to be demonstrated at Strata + Hadoop World 2015 on February 19 in San Jose, California. MemSQL introduced and open sourced the MemSQL Spark Connector so that forward-thinking, engineering-driven companies such as Pinterest could begin to operationalize results from Spark deployments.
Capturing and processing event data from high-volume websites presents unique challenges: web companies need to reliably ingest all of the information and store it in a format that analysts can easily query. Spark has recently emerged as a leading processing framework for big data deployments. It includes Spark Streaming, which is well suited to structuring data in real time. Spark is then frequently coupled with a datastore for long-term persistence. In the case of this demonstration, that datastore is MemSQL.
“At Pinterest, we’re indexing tens of billions of objects to help people discover and save creative ideas,” said Krishna Gade, Engineering Manager at Pinterest. He continued, “By combining MemSQL, Spark, and our real-time data pipeline, we can instantly deliver unique and relevant content that matches users’ interests, as we build a discovery engine.”
Using MemSQL and Spark, Pinterest was able to build a data pipeline that provides immediate insight into how users are engaging with Pins across the globe in real time. This helps Pinterest become a better recommendation engine for showing related Pins as people use the service to plan products to buy, places to go, recipes to cook, and more. In particular, this instant insight helps Pinterest, and businesses that use Pinterest, understand the most compelling use cases and provide value to their Pinner community and partners. Given the fast pace of trends in industries like retail, media, and entertainment, and the overlap across other lifestyle areas, Pinterest provides a level of user understanding and engagement that individual brands are not able to develop themselves.
From a technical perspective, Pinterest ingests site activity data through Apache Kafka, a high-throughput distributed messaging system, and feeds that data into Apache Spark via Spark Streaming. Within Spark, Pinterest can filter Pin creations and Repins while also enriching the data with geolocation information. All of this data is persisted to MemSQL, a real-time database for transactions and analytics that provides immediate query capabilities to data analysts at Pinterest. Pinterest engineers can use familiar SQL tools to query MemSQL, including joining Repins to originating Pins and understanding Repin trends across geographies.
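The filter, enrich, persist, and query steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pinterest's or MemSQL's actual code: a plain Python list stands in for the Kafka and Spark Streaming feed, an in-memory SQLite database stands in for MemSQL, and the event fields, the `GEO` lookup table, and all function names are hypothetical.

```python
import sqlite3

# Hypothetical site-activity events, standing in for messages read from Kafka.
EVENTS = [
    {"type": "pin_create", "pin_id": 1, "source_pin_id": None, "ip": "10.0.0.5"},
    {"type": "page_view", "pin_id": 1, "source_pin_id": None, "ip": "10.0.0.9"},
    {"type": "repin", "pin_id": 2, "source_pin_id": 1, "ip": "10.1.0.7"},
    {"type": "repin", "pin_id": 3, "source_pin_id": 1, "ip": "10.1.0.8"},
    {"type": "repin", "pin_id": 4, "source_pin_id": 1, "ip": "10.0.0.6"},
]

# Hypothetical IP-prefix-to-country table; a real pipeline would call a
# geolocation service or use a proper IP database.
GEO = {"10.0.": "US", "10.1.": "JP"}


def enrich(event):
    """Return a copy of the event with a 'country' field added."""
    country = next(
        (c for prefix, c in GEO.items() if event["ip"].startswith(prefix)),
        "unknown",
    )
    return {**event, "country": country}


def process(events):
    """Keep only Pin creations and Repins, then enrich each with geolocation."""
    kept = (e for e in events if e["type"] in ("pin_create", "repin"))
    return [enrich(e) for e in kept]


def persist(conn, rows):
    """Write the processed events to a SQL table (SQLite here, not MemSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pins ("
        "pin_id INTEGER PRIMARY KEY, source_pin_id INTEGER, "
        "type TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO pins VALUES (:pin_id, :source_pin_id, :type, :country)",
        rows,
    )


def repins_by_country(conn):
    """Join Repins to their originating Pins and count Repins per country."""
    return conn.execute(
        "SELECT r.country, COUNT(*) "
        "FROM pins r JOIN pins p ON r.source_pin_id = p.pin_id "
        "WHERE r.type = 'repin' "
        "GROUP BY r.country ORDER BY r.country"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    persist(conn, process(EVENTS))
    for country, count in repins_by_country(conn):
        print(country, count)
```

The self-join in `repins_by_country` mirrors the kind of query the release mentions, joining Repins to originating Pins and grouping by geography. In a real deployment the same SQL shape would run against the database the stream writes to.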
“Pinterest has shown cutting edge leadership in developing real-time data pipelines with MemSQL and Spark,” said Ankur Goyal, Director of Engineering at MemSQL. He added, “We are thrilled to be working with them to showcase the forefront of technology across operational databases like MemSQL and data processing frameworks such as Apache Spark.”
This real-time workflow combining MemSQL and Spark allows Pinterest to get an immediate view of Pinterest engagement activity around the world. With a simplified data pipeline, Pinterest maintains high throughput, easy operations, and minimal latency. Now Pinterest can share geographically relevant engagement with users and brands to drive understanding of unique, useful, and interesting content.
See It In Person
MemSQL will showcase the MemSQL Spark Pinterest demonstration at Strata + Hadoop World 2015 on February 19 at the San Jose Convention Center. Attend one of these talks or visit the MemSQL booth for more details.
MemSQL Keynote Presentation
Close Encounters with the Third Kind of Database
9:10am-9:15am on Thursday, February 19, Grand Ballroom 220
MemSQL Tutorial Session
Bringing OLAP Fully Online: Analyze Changing Datasets in MemSQL and Spark with Pinterest Demo
Includes appearances from Robert Stepeck, CTO, Novus and Yu Yang, Software Engineering, Pinterest
10:40am-11:20am on Thursday, February 19, Room LL20D
Strata + Hadoop World Show
Visit the MemSQL Booth 1015 during show expo hours.
- Wednesday, February 18, 5:00pm - 6:30pm
- Thursday, February 19, 10:00am - 4:30pm and 5:30pm - 7:00pm
- Friday, February 20, 10:00am - 4:00pm
MemSQL is the leader in real-time databases for transactions and analytics.
As a purpose-built database for instant access to real-time and historical data, MemSQL uses a familiar SQL interface and a horizontally scalable distributed architecture that runs on commodity hardware or in the cloud.
Innovative enterprises use MemSQL to better predict and react to opportunities by extracting previously untapped value in their data to drive new revenue.
MemSQL is proven in production environments across hundreds of nodes in high-velocity Big Data environments.
Based in San Francisco, MemSQL is a Y Combinator company funded by prominent investors including Accel Partners, Khosla Ventures, First Round Capital and Data Collective. Follow us @MemSQL or visit www.memsql.com.