Analytics Acceleration for Data Lakes

From Legacy to Modern Architectures

Hadoop and cloud object store data lakes are optimized for storing large volumes of data but struggle to deliver real-time analytics at scale. MemSQL accelerates analytic performance for data lakes built on Hadoop, AWS S3, and other platforms.

Legacy Architecture

STEP 1
Multiple Data Sources
Applications store events in various databases

Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Batch Data Integration
Batch-oriented data collection tools slow down synchronization

Transform

Data Integration

Flume, Sqoop, Spark, Kafka

STEP 3
Project Complexity
Several projects and experts are required to enable high-performance SQL access

Store

Data Lake

Hadoop, NoSQL, AWS S3

STEP 4
Slow Dashboards
Poor interactive performance for dashboard and visualization tools

Visualize Batch Data

Dashboard

Tableau, Looker, MicroStrategy


Accelerating analytics on existing data lake infrastructure requires a database that offers rapid, scalable data ingestion and fast queries over large data sets while keeping the simplicity of SQL.
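One common way a distributed SQL database keeps queries over large data sets fast is scatter-gather execution: each table is split into partitions by a shard key, every partition computes a small partial result, and a coordinator merges those partials. The Python sketch below models that idea for a query like `SELECT store_id, AVG(amount) FROM sales GROUP BY store_id`. The `sales` table, the function names, and the CRC32-based partitioning are illustrative assumptions, not MemSQL's internals.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape for an illustrative `sales` table: (store_id, amount).
Row = Tuple[str, float]


def shard_rows(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Assign each row to a partition by hashing its shard key (store_id)."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for store_id, amount in rows:
        index = zlib.crc32(store_id.encode("utf-8")) % num_partitions
        partitions[index].append((store_id, amount))
    return partitions


def partial_aggregate(partition: List[Row]) -> Dict[str, Tuple[float, int]]:
    """Per-partition step: compute SUM(amount) and COUNT(*) per store_id."""
    acc: Dict[str, Tuple[float, int]] = {}
    for store_id, amount in partition:
        total, count = acc.get(store_id, (0.0, 0))
        acc[store_id] = (total + amount, count + 1)
    return acc


def merge_partials(partials: Iterable[Dict[str, Tuple[float, int]]]) -> Dict[str, float]:
    """Coordinator step: add up partial SUM/COUNT pairs, then divide for AVG."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for store_id, (total, count) in partial.items():
            totals[store_id][0] += total
            totals[store_id][1] += count
    return {store_id: total / count for store_id, (total, count) in totals.items()}


def avg_by_store(rows: Iterable[Row], num_partitions: int = 4) -> Dict[str, float]:
    """Equivalent of: SELECT store_id, AVG(amount) FROM sales GROUP BY store_id."""
    partitions = shard_rows(rows, num_partitions)
    # A real cluster would run these on separate nodes in parallel;
    # this sketch simply runs them one after another.
    partials = [partial_aggregate(p) for p in partitions]
    return merge_partials(partials)
```

For example, `avg_by_store([("s1", 10.0), ("s2", 4.0), ("s1", 20.0), ("s3", 7.0), ("s2", 6.0)])` yields an average of 15.0 for s1, 5.0 for s2, and 7.0 for s3. Partitions send SUM and COUNT rather than a per-partition AVG because an average of averages is wrong whenever partitions hold different numbers of rows; shipping the two sums keeps the merge exact and the data sent to the coordinator small.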

Modern Architecture Augmented with MemSQL

STEP 1
Multiple Data Sources
Applications store events in databases

Application Source

OLTP, NoSQL Datastore

Oracle, SQL Server, Cassandra

STEP 2
Real-Time ETL
Real-time data synchronization with exactly-once semantics for accurate results

Fast Scalable SQL
A distributed relational database provides scalable SQL for easy analytic access

Transform + Analyze

MemSQL

Directly connect Kafka, Spark, or a change data capture tool to MemSQL
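Exactly-once ingestion from a source such as Kafka usually rests on one pattern: the consumer's position in the stream (a per-partition offset) is committed in the same transaction as the rows it produced, so a crash or retry can never write a record twice or skip one. The sketch below models that pattern in Python, with the standard-library `sqlite3` module standing in for the target database. The `events` and `offsets` tables, `open_store`, and `load_batch` are hypothetical names; this is an illustration of the general technique, not how MemSQL implements its own ingestion.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical record shape: (source offset, payload), e.g. from one Kafka partition.
Record = Tuple[int, str]


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the target database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (payload TEXT NOT NULL)")
    # Next offset to consume for each (topic, partition).
    conn.execute(
        "CREATE TABLE offsets ("
        " topic TEXT NOT NULL, part INTEGER NOT NULL, next_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )
    return conn


def load_batch(conn: sqlite3.Connection, topic: str, part: int,
               records: List[Record]) -> int:
    """Write a batch exactly once; return the number of new rows written.

    Rows and the advanced offset commit in one transaction, so a failure
    leaves both unchanged. Records at offsets that were already committed
    (for example, a batch replayed after a retry) are skipped.
    """
    with conn:  # commit on success, roll back if an exception escapes
        row = conn.execute(
            "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, part),
        ).fetchone()
        committed = row[0] if row else 0
        fresh = [(off, payload) for off, payload in records if off >= committed]
        if not fresh:
            return 0
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [(payload,) for _, payload in fresh],
        )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, next_offset) VALUES (?, ?, ?)",
            (topic, part, max(off for off, _ in fresh) + 1),
        )
        return len(fresh)
```

Calling `load_batch` twice with the same three records writes three rows the first time and none the second, which is exactly the behavior a retry after a crash needs. Keeping the offset in the destination database, rather than only in the source system, is what makes the write and the bookkeeping atomic.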

STEP 3
Interactive Dashboards
Compatibility with existing BI tools delivers interactive dashboards at scale

Visualize Real-Time Data

Dashboard

Tableau, Looker, MicroStrategy

Data Integration with Spark

Works with the legacy architecture for archiving and data science processing

Data Lake


The modern database solution from MemSQL provides real-time analytic performance across multiple data sources, with scalable SQL, in an integrated, cost-effective platform.

CUSTOMER SNAPSHOT

Consumer Packaged Goods

A global consumer packaged goods company struggled to provide an accurate, real-time view of its logistics, point-of-sale, and sentiment analysis applications. Its reliance on Hadoop prevented rapid analysis and up-to-date visibility into operations. MemSQL enabled real-time analytics across multiple applications through rapid data synchronization and scalable SQL.

Data Analysis Before MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics

Multiple Data Sources

Pulls factory, warehouse, shipping, point-of-sale, and distribution data

STEP 2
Slow-Running ETL
ETL jobs missed nightly load windows, resulting in incomplete data synchronization

Transform

Several SAP Data Services jobs were required to transform disparate data formats

STEP 3
Performance Complexity
Query processing was too slow and developer-intensive, which limited its use

Store

Data stored in HDFS, with Apache Hive used for analysis

STEP 4
Incomplete Insights
Incomplete data view and slow dashboard performance did not meet business requirements

Logistics and Distribution Dashboard

Visualized data with Tableau Mobile, SAP BusinessObjects, and Python


The batch data movement architecture and slow query performance of Hadoop resulted in incomplete data views and a frustrating user experience for analysts and data scientists.

Data Analysis After MemSQL

STEP 1
Multiple Data Sources
Several data sources feed supply chain analytics

Multiple Data Sources

Pulls factory, warehouse, shipping, point-of-sale, and distribution data

STEP 2
Real-Time Synchronization
Real-time data synchronization provides up-to-date data views

Fast Analytics
Fast scalable SQL delivers rapid updates and analytic query performance

Transform + Analyze

All data sources synchronized in real time using Apache Spark, AWS S3, and SAP Data Services, with standard SQL for analysis

STEP 3
Complete Visibility
Broad data access ensures that dashboards, analysts, and data scientists all work from an accurate, unified source

Logistics and Distribution Dashboard

Interactive visualization and analysis with Tableau Mobile, SAP BusinessObjects, and Python


Implementing MemSQL with real-time data synchronization and fast query processing using standard SQL resulted in an accurate and responsive data lake environment.

Ready to get started?

See how MemSQL can modernize your data analytics

Data Lake Resources

New 451 Survey Sheds Light on Top Use Cases for AI and Machine Learning in Key Industries
This 451 Research report leverages the results of its most recent survey to discuss the current and future use cases for AI and ML in four key verticals: financial services, retail, healthcare, and manufacturing.

Operationalizing MemSQL - On Demand
View this on-demand webcast and learn how you can operationalize MemSQL.

Accelerate Decision Making with Real-Time Analytics on AWS
The number of sources generating continuous, streaming data has exploded in recent years. From website clickstream data to telemetry data from Internet of Things (IoT) devices, the variety, volume, and velocity of data continues to increase. In response, businesses are evolving their analytics approach from batch to real time, and turning to new tools to deliver actionable insights in seconds instead of hours or days.

Five Reasons to Switch from Oracle to MemSQL - On Demand
Databases have grown dramatically as data increases in importance. Oracle and other legacy technologies have built empires serving this need. However, growth in users and data is driving new demands that push the limits of traditional database architectures like Oracle, resulting in systems that are difficult to use, difficult to maintain, and expensive.

How Manage Accelerated Data Freshness by 10x
To meet customer expectations, Manage needed a solution that could deliver fresh data for reporting while allowing its analytics team to run ad hoc queries concurrently. Kai Sung, Manage's CTO and co-founder, began searching for a faster database platform and found MemSQL.

A Trillion Rows per Second as a Baseline for Interactive Analytics - On Demand
Watch an on-demand, architect-led MemSQL session on query and performance tuning.