Analytics Acceleration for Data Lakes

From Legacy to Modern Architectures

Hadoop and cloud object store data lakes are optimized for storing large data volumes but struggle with real-time analytics at scale. MemSQL enhances and accelerates analytic performance for Hadoop, AWS S3, and more.

Legacy Architecture

STEP 1
Multiple Data Sources
Applications store events in various databases

Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Batch Data Integration
Batch-oriented data collection tools slow down the synchronization process

Transform

Data Integration

Flume, Sqoop, Spark, Kafka

STEP 3
Project Complexity
Several projects and experts are required to enable high-performance SQL access

Store

Data Lake

Hadoop, NoSQL, AWS S3

STEP 4
Slow Dashboards
Poor interactive performance for dashboard and visualization tools

Visualize Batch Data

Dashboard

Tableau, Looker, MicroStrategy

Accelerating analytics on existing data lake infrastructure requires a database that combines scalable, rapid data ingestion with fast queries over large data sets, all with the simplicity of SQL.

Modern Architecture Augmented with MemSQL

STEP 1
Multiple Data Sources
Applications store events in various databases

Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Real-Time ETL
Real-time data synchronization with exactly-once semantics for accurate results
Fast Scalable SQL
Distributed relational database leverages scalable SQL for easy analytic access

Transform + Analyze

MemSQL

Directly connect Kafka, Spark, or a change data capture tool to MemSQL
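
As a rough illustration of what exactly-once semantics means for ingestion, the sketch below applies each batch and advances a stream offset inside a single SQL transaction, so a batch that is delivered twice is stored only once. This is not MemSQL's implementation: it uses Python's built-in sqlite3 module as a stand-in SQL store, and the table names (events, stream_offsets), the function names, and the "pos" source name are hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory SQL store (stand-in for the target database)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table for data, one for per-source offsets.
    conn.execute("CREATE TABLE events (event_id TEXT, amount REAL)")
    conn.execute(
        "CREATE TABLE stream_offsets (source TEXT PRIMARY KEY, next_offset INTEGER)"
    )
    return conn


def ingest_batch(conn, source, start_offset, rows):
    """Apply a batch and advance the offset atomically.

    Returns True if the batch was applied, False if it was a replay.
    Raises ValueError if the batch would leave a gap in the stream.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        found = conn.execute(
            "SELECT next_offset FROM stream_offsets WHERE source = ?", (source,)
        ).fetchone()
        expected = found[0] if found else 0
        if start_offset < expected:
            return False  # already stored; skipping keeps results accurate
        if start_offset > expected:
            raise ValueError("gap in stream: expected offset %d" % expected)
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        conn.execute(
            "INSERT OR REPLACE INTO stream_offsets VALUES (?, ?)",
            (source, start_offset + len(rows)),
        )
    return True


# Deliver the same batch twice, as an upstream retry might.
conn = open_store()
batch = [("e1", 10.0), ("e2", 5.5)]
first = ingest_batch(conn, "pos", 0, batch)
replay = ingest_batch(conn, "pos", 0, batch)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM events").fetchone()
print(first, replay, total)
```

Because the data rows and the offset are committed together, a failure between the two cannot leave them out of step, which is the property that keeps dashboard totals from double counting.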

STEP 3
Interactive Dashboards
Compatibility with existing BI tools delivers interactive dashboards at scale

Visualize Real-Time Data

Dashboard

Tableau, Looker, MicroStrategy

Data Integration with Spark

Works with Legacy Architecture for Archiving and Data Science processing

Data Lake

The modern database solution from MemSQL provides real-time analytic performance across several data sources, with scalable SQL, in an integrated, cost-effective platform.

CUSTOMER SNAPSHOT

Consumer Packaged Goods

A global consumer packaged goods company struggled to provide an accurate real-time view of its logistics, point of sale, and sentiment analysis applications. Its use of Hadoop prevented rapid analysis and up-to-date visibility into operations. MemSQL enabled real-time analytics across multiple applications through rapid data synchronization and scalable SQL.

Data Analysis Before MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics

Multiple Data Sources

Pulling data from Factory, Warehouse, Shipping, Point of Sale and Distribution Data

STEP 2
Slow Running ETL
ETL jobs missed nightly load windows, resulting in incomplete data synchronization

Transform

Several SAP Data Services jobs required to transform disparate data formats

STEP 3
Performance Complexity
Query processing was too slow and developer-intensive, resulting in limited use

Store

Data stored in HDFS leveraging Apache Hive for analysis

STEP 4
Incomplete Insights
Incomplete data view and slow dashboard performance did not meet business requirements

Logistics and Distribution dashboard

Visualized data with Tableau mobile, SAP Business Objects, and Python

The batch data movement architecture and slow query performance of Hadoop resulted in incomplete data views and a frustrating user experience for analysts and data scientists.

Data Analysis After MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics

Multiple Data Sources

Pulling data from Factory, Warehouse, Shipping, Point of Sale and Distribution Data

STEP 2
Real-Time Synchronization
Real-time data synchronization provides up-to-date data views
Fast Analytics
Fast scalable SQL delivers rapid updates and analytic query performance

Transform + Analyze

All data sources are synchronized in real time using Apache Spark, AWS S3, and SAP Data Services, with standard SQL for analysis

STEP 3
Complete Visibility
Broad data access ensures dashboards, analysts, and data scientists leverage an accurate unified source

Logistics and Distribution Dashboard

Interactive visualization and analysis with Tableau mobile, SAP Business Objects, and Python

Implementing MemSQL with real-time data synchronization and fast query processing on standard SQL resulted in an accurate and responsive data lake environment.

Ready to get started?

See how MemSQL can modernize your data analytics