Insite360 Uses MemSQL Pipelines to Deliver IoT in the Cloud – Case Study

Floyd Smith

Insite360 is a complete solution for fuel management and environmental services. It’s used in gas stations and by other suppliers, distributors, and buyers of petroleum products and other energy commodities. Insite360 is delivered as software as a service (SaaS) and is the market leader worldwide.

Insite360 customers see increased operational efficiency through the automation of relevant processes and continuous monitoring. They are able to cut costs, increase revenues, and reduce risk by using the reporting, advanced analytics, and decision support tools built into Insite360.

In terms of its IT architecture, Insite360 is in the midst of a three-step process toward the optimal architecture. In the first step, it moved operations to the cloud. In the second step, it accelerated analytics with MemSQL. In the third step, to be implemented soon, it will add ingress and transaction processing to its existing MemSQL-based analytics operation, thus “killing three birds with one stone” – MemSQL being that stone.

Jay Dave is the business intelligence team lead for the Insite360 transformation. He’s been given the opportunity to help his team innovate at a high level, and MemSQL is helping him do it.

Moving Fuel Fast

Insite360 is the market leader in fuel management and related business and operations services for gasoline and other fuels. In the US alone, nearly 400 million gallons of gasoline are sold every day – more than a billion dollars worth a day, at typical retail prices, totaling roughly $400 billion a year. Insite360 is used to support the ordering and delivery of the majority of this gasoline and related fuels, and for fueling and refueling, retail operations in gas station convenience stores, back-end business operations, delivery scheduling, and much more.

As Jay describes it, “Sensors gather data from fuel lines and dispensers. Data is collected by the automated tank gauge (ATG) and sent to Insite360 through the cloud.”

Sensor/IoT data is not the only part of the picture. The ever-fluctuating price and availability of oil and gasoline are among the most important numbers in the world. The retail operations attached to gas stations are a big part of food and sundry sales worldwide, and getting bigger. Millennials, in particular, are said to favor gas stations over grocery stores for food purchases.

Costs must be rigorously controlled, as gas stations are low-margin businesses. Yet all the cash that’s handled, and the volatile nature of the main product, means that security is an ongoing concern with robbery, burglary, organized crime, and terrorism all posing ongoing risks.

Insite360 has an ATG Alarm Management team. They remotely monitor ATG equipment and monitor compliance status to comply with local, state, and federal environmental regulations, all at low cost. With the help of this team, companies can more effectively manage regulatory and related business issues. The technicians can fix many alarms remotely and quickly, sending others out for dispatch to the field.

Insite360 Alarm Management handles more than a thousand different kinds of alarms related to fuel tank, dispenser, lines and other sources. Dozens of analysts in control rooms watch all the alarms and try to fix the problem remotely. If not, they dispatch a technician to find the problem and solve it.

So Insite360 needs all the data, from all the sources, all the time, processed instantly. And their infrastructure is rapidly evolving to allow them to deliver better and better offerings to the demanding trade they serve.

Step 1: Cloud Operations through PostgreSQL and Redshift

It makes sense for Insite360 to deliver its service through the cloud. Its customers, and the customers’ IoT “things”, such as the fuel pumps that drivers use every day, are distributed to the most far-flung corners of the globe. Only through the cloud can Insite360 get affordable access to the distributed worldwide computing infrastructure that it requires.

The first cloud architecture for InSite360 is a customer database offered by the company, which combined cloud and on-premises data sources and sent all the data into PostgreSQL, the open source, row-oriented database, hosted on AWS. Then, for analytics, the data was moved to the AWS Redshift data warehouse, which is column-oriented. Microsoft’s Power BI tool provided data visualization to Insite360’s end customers.

Insite360 replaced their original architecture using AWS services with MemSQL
The original architecture included AWS Elastic MapReduce batch processing for big data.

This setup offered many advantages. Database offerings for cloud service providers have solved some of the scalability problems that have afflicted traditional, single-node databases for decades, making SaaS offerings much more practical. The move from proof of concept to deployment in the cloud was smooth and simple, as reprovisioning was fast and easy. Creating a workable infrastructure for an offering as ambitious as Insite360 was quite an accomplishment.

The original architecture showed that Insite360 could work effectively in the cloud. However, there were five main problems with the original setup:

  1. Speed. The original architecture was slow – often too slow to meet customer expectations.
  2. Cost. The original infrastructure was also expensive to run. Running AWS Elastic MapReduce (EMR) against typical IoT big data, to prepare data for PostgreSQL, is complex operationally and can easily cost tens of thousands of dollars a month. AWS costs scale with data volume handled, even if end customers are not willing to ratchet up their payments to the same extent.
  3. Complexity. The original cloud architecture for Insite360 had lots of moving parts, each needing specialized skills from a wide range of architects, developers, and operations people, not to mention contract managers and accounting folk. Interfaces such as the connection between Oracle, running onsite, and Postgres in the cloud needed constant care and feeding from DevOps personnel.
  4. Rigidity. In the original infrastructure, Power BI connected directly to RedShift. Making this happen was a lot of work. RedShift is always column-oriented and disk-based, but some analytics operations benefit from running against row-oriented tables or from storing more data in memory, which are not possible with RedShift.
  5. Faux scalability. Cloud databases do not truly offer scalable SQL, they simply manage around the lack of scalability inherent to their SQL offerings somewhat smoothly. A truly large operation, such as Insite360’s fully global IoT architecture, can run into limitations and roadblocks that are hard, or impossible, to design around.
  6. Cloud vendor lock-in. Today, cloud means multi-cloud. Committing to AWS services means committing your data to, and optimizing your people’s skill sets for, a single cloud. As the leader in its market, Insite360 needs the flexibility that deep dependence on AWS services takes away.

Step 2: Leveraging Kafka Messaging and MemSQL-Based Analytics for Speed and Simplicity

Many users move to MemSQL for scalable compute capability in analytics, and Insite360 is no exception. Their current architecture continues to run in the cloud. However, it makes two several changes from the previous design, taking advantage of newer technology:

  • Use Kafka throughout. In the current architecture, Kafka is used as a messaging bus to convey data from both NoSQL and most SQL databases. Batch processing in Python puts the data in the new data warehouse.
  • Use AWS Database Migration Service for Oracle data. This service, introduced three years ago, replicates a source database without too much operational burden on the source database.
  • Move ETL and analytics processing to MemSQL. Instead of amalgamating data in PostgreSQL, running an ETL process to get it into RedShift, then running Power BI against Redshift, the new architecture amalgamates data in MemSQL, then runs Power BI against MemSQL directly.

The new architecture carries a heavy load. The MemSQL database runs hundreds of tables to host all the data coming from sensors, which arrive as Kafka events. Python processes Kafka messages and puts the clean and processed data into new tables.

The new architecture must meet a wide range of requirements, including the Payment Card Industry Data Security Standard (PCI DSS) and other data security requirements.

The new architecture takes advantage of the fast MemSQL database and MemSQL Pipelines.
The current architecture uses Kafka, Python, and MemSQL Pipelines for easier processing.

The new architecture addresses all of the problems found in the original architecture:

  1. Less cost. Big cost savings are achieved by eliminating the need to run AWS Elastic MapReduce (EMR) 24/7.The new architecture uses a MemSQL Kafka pipeline and Python for batch processing, with little processing cost.
  2. Less complexity. Instead of a two-headed core engine, PostgreSQL plus Redshift, MemSQL handles core analytics processing, simplifying management and maintenance.
  3. Less rigidity. Getting most data sources into a single stream of Kafka messages, all processed through Python, makes the architecture more flexible and simpler at the same time.
  4. More speed. The current architecture is much faster. With fewer moving parts, finding any bottlenecks and addressing them is much easier, and MemSQL is a very fast analytics engine.
  5. More scalability. Kafka and MemSQL are both fully scalable. Only the Oracle component is not able to scale out.
  6. Less cloud vendor lock-in. The only AWS-specific service used heavily in the current architecture is the AWS Database Migration Service. As such, the new architecture could be moved to a different cloud, on-premises, or to a blended architecture, giving Insite360 much more flexibility and negotiating power with all its vendors.

In addition to the benefits from MemSQL, Kafka is an important part of the picture. “All the sensor and IoT data is coming from the ATGs using Kafka,” says Jay. “So the Kafka pipeline is a very welcome feature of MemSQL. It gets us away from managing AWS EMR, which is not an easy task. Not running AWS EMR saves us tens of thousands of dollars a month.”

A few months ago, Insite360 began to take advantage of a new feature, the ability to backup S3 databases into MemSQL. The company has already done several backup and restore operations, and say that it “works like a charm” for the database and also for Kafka events.

The cloud application is based on a microservices architecture, with each microservice using its own data store. All the data goes into tables in MemSQL routed thru Kafka, then customers run analytics through Power BI for data visualization. Insite360 also does machine learning from MemSQL data for alarm management and natural language processing (NLP) on alarm and compliance data.

Internal customers include the product team, marketing, sales, and other groups within the company. External customers are the service stations. With the up-to-date information from Power BI, they can make informed decisions about operating and maintaining their stations.

Step 3: Using MemSQL Pipelines for Ultimate Simplicity

The need to separate transactions from analytics is nothing but an artifact of the processing limitations of traditional SQL databases such as Oracle compared to NewSQL databases such as MemSQL. However, over decades, expectations, and existing architectures, have hardened. The model of having an OLTP database, followed by an ETL process, feeding an OLAP database, is accepted as the norm.

As a result, a MemSQL implementation usually goes through two stages. In the first stage, either the OLTP or OLAP side is breaking down or already broken. Due to its high performance and scalability, MemSQL is used to either augment or replace the challenged transactional or analytical processing systems.

Then, an organization becomes more familiar with MemSQL’s strengths. In particular, it realizes that MemSQL, and its Pipelines capability, can break down the barrier between OLTP and OLAP. MemSQL can simultaneously process transactions, transform data, and respond to queries.

Once this realization takes hold, a radically simpler architecture suggests itself. The entire traditional ETL structure can be moved into MemSQL, which operates in three steps:

  • Extract. The original purpose of MemSQL Pipelines is to extract data rapidly, and with minimal load on the source database.
  • Transform. Simple transformations can be handled within a MemSQL Pipeline. The newer MemSQL capability, Pipelines to Stored Procedures, allows for more complex transformation steps.
  • Load. Data from Pipelines is loaded directly into MemSQL, and analytics queries against updated data run continually throughout.
MemSQL will run as a partly in-memory database and partly on disk in the new FuelQuest architecture.
The next-generation architecture will be even
simpler, faster, and easier to manage.

Insite360 is preparing to implement such a radically simpler architecture and reap the benefits, which include seven highly desirable attributes:

  • Up-to-date data. In the new architecture, batch processing is eliminated. Cutting out batch processing and taking advantage of unified transactions and analytics means that analytics always runs against current data.
  • Speed dial. Transaction processing and analytics speed is easily dialed up as needed by using more processors for MemSQL.
  • Lowest cost. MemSQL price/performance is stellar for either transactions or analytics. Combining transactions and analytics in a single MemSQL database cuts costs further.
  • Ultimate simplicity. Standardizing on a single database and consistent processes for bringing in and transforming data makes the architecture extremely simple and reduces the load on operations personnel.
  • Ultimate flexibility. Any data source can be added without changing the core processing and analytics architecture, due to the flexibility of MemSQL Pipelines and the processing power of stored procedures. And any analytics tool can be used, due to MemSQL’s SQL support.
  • Seamless scalability. With transaction processing and analytics support running on the same, fully scalable engine, you can literally throw hardware at problems.
  • No cloud vendor lock-in. You can set up MemSQL on any cloud or on-premises, and change hosting at any time. If you do use cloud vendor-specific services as a data source or on the analytics side, the dependency is more tactical than strategic, as you have flexibility at the core.
  • Machine learning and AI-friendly. Because data is current and bottlenecks can be eliminated by adding hardware, and because more data sources can be added as needed, machine learning programs and AI always have all the up-to-date data they need.

Next Steps for Insite360

Insite360 will be moving to the new, radically simplified architecture in the months ahead. Compared to the original cloud infrastructure, costs are expected to drop roughly in half. The number of people needed to operate the system will also be cut roughly in half to run a system that’s much more functional than the original.

One area in which this new functionality will bring benefits is enhancing the ability to do machine learning and AI. “Machine learning and AI work better with more data to aggregate,” says Jay. Insite360 uses advanced variance analytics to solve problems like missing fuel. If, for example, 50 gallons of fuel is missing out of thousands of gallons pumped, that can be due to various causes, such as theft, over-pumping, or a leak, which could be occurring in any of several different locations. Once the data is in MemSQL, Insite360 can use machine learning and AI to help operators prioritize where to look for problems first.

Improving the underlying architecture is also a positive for end customers. They get better results faster. Also, they can easily understand how data comes into, through, and out of the system. This increases their confidence in the results and their ability to use the system effectively.

All involved are looking forward to these benefits and more. The organization will indeed have “killed three birds with one stone” – MemSQL.

memsql rainbow wave
Webinar
See What’s New in MemSQL 6.7:
The No-Limits Database