Analytics Acceleration for Data Lakes

From Legacy to Modern Architectures

Hadoop and cloud object store data lakes are optimized for storing large data volumes but struggle with real-time analytics at scale. MemSQL enhances and accelerates analytic performance for Hadoop, AWS S3, and more.

Legacy Architecture

STEP 1
Multiple Data Sources
Applications store events in various databases
Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Batch Data Integration
Batch-oriented data collection tools slow down the synchronization process
Transform

Data Integration

Flume, Sqoop, Spark, Kafka

STEP 3
Project Complexity
Several projects and experts required to enable high-performance SQL access
Store

Data Lake

Hadoop, NoSQL, AWS S3

STEP 4
Slow Dashboards
Poor interactive performance for dashboard and visualization tools
Visualize Batch Data

Dashboard

Tableau, Looker, MicroStrategy

Accelerating analytics on existing data lake infrastructure requires a database that combines scalable, rapid data ingestion with fast queries over large data sets, all through the simplicity of SQL.

Modern Architecture Augmented with MemSQL

STEP 1
Multiple Data Sources
Application stores events into database
Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Real-Time ETL
Real-time data synchronization with exactly-once semantics for accurate results
Fast Scalable SQL
Distributed relational database leverages scalable SQL for easy analytic access
Transform + Analyze

MemSQL

Directly connect Kafka, Spark, or a change data capture tool to MemSQL
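
To make "exactly-once semantics" concrete, the sketch below shows one common way an ingest step can avoid double-counting replayed events: the rows and the source offset are committed in the same transaction, so a retried batch is skipped rather than inserted twice. This is an illustration only, not MemSQL's implementation. Python's built-in sqlite3 stands in for the SQL database, and the table names (events, ingest_offsets) and the load_batch function are hypothetical.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the analytics database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " source TEXT, src_offset INTEGER, store TEXT, amount REAL,"
        " PRIMARY KEY (source, src_offset))"
    )
    db.execute(
        "CREATE TABLE ingest_offsets (source TEXT PRIMARY KEY, next_offset INTEGER)"
    )
    return db


def load_batch(db, source, batch):
    """Load (offset, store, amount) rows, given in ascending offset order.

    Returns the number of rows actually inserted.
    """
    row = db.execute(
        "SELECT next_offset FROM ingest_offsets WHERE source = ?", (source,)
    ).fetchone()
    next_offset = row[0] if row else 0

    # Drop anything already loaded by an earlier (possibly retried) batch.
    fresh = [r for r in batch if r[0] >= next_offset]
    if not fresh:
        return 0

    # Rows and the new offset commit together, or not at all.
    with db:
        db.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            [(source, off, store, amount) for off, store, amount in fresh],
        )
        db.execute(
            "INSERT OR REPLACE INTO ingest_offsets VALUES (?, ?)",
            (source, fresh[-1][0] + 1),
        )
    return len(fresh)


db = make_db()
batch = [(0, "store-1", 9.5), (1, "store-2", 20.0)]
first = load_batch(db, "pos", batch)   # initial delivery
replay = load_batch(db, "pos", batch)  # same batch delivered again
total = db.execute("SELECT COUNT(*), SUM(amount) FROM events").fetchone()
```

Because the replayed batch is filtered against the stored offset, a dashboard query such as the SUM above sees each point-of-sale event once, even if the upstream source redelivers it.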

STEP 3
Interactive Dashboards
Compatibility with existing BI tools delivers interactive dashboards at scale
Visualize Real-Time Data

Dashboard

Tableau, Looker, MicroStrategy

Data Integration with Spark
Works with Legacy Architecture for Archiving and Data Science processing

Data Lake

The modern database solution from MemSQL provides real-time analytic performance across several data sources with scalable SQL for an integrated, cost-effective platform.

Customer Snapshot

Consumer Packaged Goods

A global consumer packaged goods company struggled to provide an accurate real-time view of their logistics, point of sale, and sentiment analysis applications. Use of Hadoop prevented rapid analysis and up-to-date visibility for their operations. MemSQL enabled real-time analytics across multiple applications leveraging rapid data synchronization and scalable SQL.

Data Analysis Before MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics
Multiple Data Sources

Pulling data from Factory, Warehouse, Shipping, Point of Sale and Distribution Data

STEP 2
Slow Running ETL
ETL jobs miss nightly load windows, resulting in incomplete data synchronization
Transform

Several SAP Data Services jobs required to transform disparate data formats

STEP 3
Performance Complexity
Query processing is too slow and developer-intensive, resulting in limited use
Store

Data stored in HDFS leveraging Apache Hive for analysis

STEP 4
Incomplete Insights
Incomplete data view and slow dashboard performance did not meet business requirements
Logistics and Distribution Dashboard

Visualized data with Tableau mobile, SAP Business Objects, and Python

The batch data movement architecture and slow query performance of Hadoop resulted in incomplete data views and a frustrating user experience for analysts and data scientists.

Data Analysis After MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics
Multiple Data Sources

Pulling data from Factory, Warehouse, Shipping, Point of Sale and Distribution Data

STEP 2
Real-Time Synchronization
Real-time data synchronization provides up-to-date data views
Fast Analytics
Fast scalable SQL delivers rapid updates and analytic query performance
Transform + Analyze

All data sources synchronized in real time using Apache Spark, AWS S3, and SAP Data Services, with standard SQL for analysis

STEP 3
Complete Visibility
Broad data access ensures dashboards, analysts, and data scientists leverage an accurate unified source
Logistics and Distribution Dashboard

Interactive visualization and analysis with Tableau mobile, SAP Business Objects, and Python

Implementing MemSQL with real-time data synchronization and fast query processing on standard SQL resulted in an accurate and responsive data lake environment.

Ready to get started?

See how MemSQL can modernize your data analytics