MemSQL Enhances Real-Time Data Pipelines for Spark and Python in New Release
Performance boost for Spark SQL, new support for Python, and NUMA-aware deployments now part of MemSQL Ops
San Francisco, CA - December 16, 2015 - MemSQL, the leader in real-time databases for transactions and analytics, today announced significant advances for creating real-time data pipelines for Apache Spark, as well as support for the Python language and Non-Uniform Memory Access (NUMA) architectures in the latest version of MemSQL Ops. MemSQL can now run Spark SQL queries inside of the MemSQL database, provide in-browser Python programming, and automatically optimize NUMA deployments. These features drive rapid results and faster analytics for data scientists.
“The newest release of MemSQL Ops reinforces our commitment to the Spark community to deliver even faster access to real-time data and analytics,” said Eric Frenkiel, co-founder and CEO, MemSQL. “Our mission is to deliver technology that integrates advances across the open source ecosystem and that appeals to the programming community at large.”
As a transient processing framework, Spark is well suited for data analysis and model development, but it is not purpose-built for high-performance SQL. To that end, MemSQL now allows Spark SQL queries to run inside of the MemSQL database, which can improve performance by up to 50x on many workloads. By combining MemSQL with Spark, data scientists can tap a permanent, transactional datastore to feed the latest business data into their models for real-time analytics.
Moreover, the combination of Spark and MemSQL further unifies in-memory processing with in-memory storage for lightning-fast results. Users have access to a familiar SQL interface, which provides the performance and persistence to run real-time data pipelines successfully. Spark's data transformation capabilities can be used more fully when paired with distributed, in-memory stores like MemSQL than with traditional disk-based stores like HDFS.
The latest release of MemSQL Ops also features in-browser Python programming, which opens up Python’s vast library of analysis packages such as NumPy, SciPy and pandas to users running MemSQL. These libraries, as well as the prototyping speed of Python, have made Python incredibly popular among data scientists, application developers and database administrators alike.
For users running MemSQL in a NUMA environment, MemSQL Ops now offers point-and-click installation. MemSQL Ops can intelligently map MemSQL instances to CPUs that share local memory. The increased efficiency on large server deployments can accelerate queries by up to 40%. From ultra-fast query execution to efficient storage of business data, MemSQL enables users to operate with maximum efficiency in fast-paced production environments.
Read more on the MemSQL blog: www.memsql.com/blog/spark-sql-performance-boost
MemSQL is the leader in real-time databases for transactions and analytics. As a purpose-built database for instant access to real-time and historical data, MemSQL uses a familiar SQL interface and a horizontally scalable distributed architecture that runs on commodity hardware or in the cloud. Innovative enterprises use MemSQL to better predict and react to opportunities by extracting previously untapped value in their data to drive new revenue. MemSQL is deployed across hundreds of nodes in high-velocity big data environments. Based in San Francisco, MemSQL is a Y Combinator company funded by prominent investors including Accel Partners, Khosla Ventures, First Round Capital and Data Collective. Follow us @MemSQL or visit us at www.memsql.com.