Looking for a Solution to NoSQL Analytics Limitations? Here’s How to Map NoSQL JSON to SingleStoreDB Tables


Rémi Imbert

Cloud Solutions Engineer


NoSQL has attracted a great deal of interest in the database world over the past 10 years, commonly cited for its scalability and pure OLTP speed on lookup queries. But when it comes to running analytical queries — or more complex OLTP queries — NoSQL starts to fall short.

Why? Because NoSQL queries are limited to key-value lookups, which are very fast but often require an additional layer of computation on the application side to get the expected result — whereas with SQL, you can simply query for the result you want. Today, with the rise of distributed SQL databases like SingleStore, the scalability concerns that once pushed workloads toward NoSQL are much easier to address.


In this blog, we will go through best practices for moving from a NoSQL database to SingleStoreDB — including how to quickly import JSON data into SQL tables. But first, let's take a closer look at SingleStoreDB.

What Is SingleStoreDB?

SingleStoreDB is a distributed SQL database that handles both analytical (OLAP) and transactional (OLTP) workloads in the same table type. It provides fast ingestion and high query speed for complex OLTP and OLAP queries, offering a robust, scalable solution well beyond what legacy single-node databases can do. There is also a managed service that can be deployed on AWS, GCP or Microsoft Azure.

Moving Past NoSQL Limitations

NoSQL databases are more scalable than legacy SQL databases, handling hundreds of millions of transactions in high-concurrency environments for pure OLTP queries. But today, as data rules the world, you need the best insights from your database via analytics dashboards or complex OLTP queries. Unfortunately, a NoSQL database cannot deliver these insights properly, so users often add a data warehouse layer to handle analytics — and then another OLTP SQL database to handle more complex SQL queries.

All of that results in data movement, ETL processes and database sprawl, which lead to high latency and poor user experiences. What we've seen is a progression from the SQL era to the NoSQL era — and now to where SingleStore sits today: the NewSQL era.

Mapping NoSQL JSON to SingleStore Tables

SingleStoreDB provides excellent support for JSON, especially since our latest product improvements in version 8.0. Even better, you can ingest data directly from JSON files stored in cold storage — high ingest speed is one of SingleStoreDB's strongest capabilities.

To ingest data, we'll use SingleStore Pipelines. Let's export some data from a NoSQL database to an AWS S3 bucket, then ingest it into SingleStoreDB. Here is a basic .json document stored in a bucket called nosqlmigration:

{ "id": "1",
  "name": "John Smith",
  "job": "Director" ,
  "Address": { "Street": "Anger street",
    "StreetNum": "32 bis",
    "City": "London",
    "Country": "United Kingdom"
  }
}

And here is the corresponding table in SingleStoreDB:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  SHARD KEY (id)
);

We can also provide a better table definition that stores all of the address information as top-level columns, which we will populate during ingestion:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  SHARD KEY (id)
);

Now that we've defined the equivalent table in SingleStoreDB, let's ingest the data via a Pipeline:

CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id",
              "aws_secret_access_key": "your_aws_secret_access_key",
              "aws_session_token": "your_aws_session_token"}'
INTO TABLE employee
(id <- id,
 name <- name,
 job <- job,
 gender <- gender,
 address <- address,
 street <- address::street,
 streetnum <- address::streetnum,
 city <- address::city,
 country <- address::country)
FORMAT JSON;
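
Note that creating a pipeline does not start it. Here is a minimal sketch for previewing and then starting the ingestion, assuming the bucket and credentials above are valid:

-- Preview one extracted batch without writing anything to the employee table
TEST PIPELINE pipeline_migration_nosql LIMIT 1;

-- Start continuous background ingestion from the S3 bucket
START PIPELINE pipeline_migration_nosql;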

We extract the JSON object address and ingest its fields directly into dedicated table columns. That way, we can easily use those columns to run more advanced queries.
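
For instance, once the address is flattened into columns, an aggregation like this hypothetical query becomes plain SQL:

-- Count employees per country and city using the columns extracted from the JSON address
SELECT country, city, COUNT(*) AS employees
FROM employee
GROUP BY country, city
ORDER BY employees DESC;

Now, let's see what we can do if we have a more deeply nested JSON document with an array. It's pretty common for NoSQL databases to store a collection of items with an array as a field type.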

Handling a JSON Array

An array in JSON is an ordered list of values, often a list of objects each holding its own keys and values. There are multiple options for importing it properly into SingleStoreDB tables. The best option depends on which type of operation you need (aggregation, lookup, select, etc.) and how often you want to access the items in the array.

Let's use this nested JSON as an example:

{ "id": "1",
  "name": "John Smith",
  "job": "Director" ,
  "address": { "street": "Anger street",
    "streetnum": "32 bis",
    "city": "London",
    "country": "United Kingdom"
  },
 "experience": [
 { "role": "Lead Engineer",
    "company": "Json",
    "yoe": 3
  },
{ "role": "Senior Engineer",
    "company": "Avro",
    "yoe": 3
  },
{ "role": "Junior Engineer",
    "company": "Parquet",
    "yoe": 4
  }
  ]
}

Option 1: Import It as a JSON Field

This option is very performant (and even better now with our 8.0 release!) if you only need simpler operations, like lookups or aggregations.

The idea is to store the array in a column of JSON type and use that JSON column directly in your queries. The drawback of this method is that you lose some of the relational table structure that makes a SQL database so useful.
Here is an example table definition, along with queries that access the specific values you are looking for:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  experience JSON,
  SHARD KEY (id)
);

Query 1: Find the previous experiences of employee 1

SELECT emp.name, emp.job, exp.table_col::$role AS 'role',
       exp.table_col::$company AS 'company'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;

Query 2: Aggregate the total years of experience for employee 1

SELECT emp.name, SUM(exp.table_col::yoe) AS 'Total YoE'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;

Option 2: Create a Table to Represent the JSON Array

The second option is to fully embrace the relational model and build a table that represents the array. Here is the second table, which will hold the array of employee experience:

CREATE TABLE experience (
  id_employee int,
  role varchar(32),
  company varchar(32),
  yoe int,
  SHARD KEY (id_employee)
);

This table structure makes your data more readable and usable for your application or tools. You will need to rewrite the pipeline that ingests the JSON file so it interprets the array and inserts into the corresponding tables. To make it work, we will write a stored procedure that inserts into both tables, and redirect the pipeline's ingestion into this procedure.

Here is an example of a stored procedure that will insert into these two tables:

-- Stored procedure for ingesting a JSON array into multiple tables
DELIMITER //

CREATE OR REPLACE PROCEDURE employee_proc(batch QUERY(id varchar(32),
    name varchar(32), job varchar(32), gender varchar(32), address JSON,
    street varchar(32), streetnum varchar(32), city varchar(32),
    country varchar(32), experience JSON))
AS
DECLARE
  json_array ARRAY(json);
  e json;
BEGIN
  FOR batch_record IN COLLECT(batch) LOOP
    BEGIN
      -- Insert the top-level fields and the flattened address into employee
      INSERT INTO employee(id, name, job, gender, address, street, streetnum, city, country)
      VALUES (batch_record.id, batch_record.name, batch_record.job, batch_record.gender,
              batch_record.address, batch_record.address::$street,
              batch_record.address::$streetnum, batch_record.address::$city,
              batch_record.address::$country);

      -- Insert one row per array element into experience
      json_array = JSON_TO_ARRAY(batch_record.experience);
      FOR i IN 0 .. LENGTH(json_array) - 1 LOOP
        e = json_array[i];
        INSERT INTO experience(id_employee, role, company, yoe)
        VALUES (batch_record.id, e::$role, e::$company, e::yoe);
      END LOOP;
    END;
  END LOOP;
END //

DELIMITER ;

And here is the new pipeline SQL statement:

-- Pipeline for ingesting JSON into a stored procedure

CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id",
              "aws_secret_access_key": "your_aws_secret_access_key",
              "aws_session_token": "your_aws_session_token"}'
INTO PROCEDURE employee_proc
FORMAT JSON;
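
This pipeline reuses the name from the earlier example, so in practice you would drop the old pipeline before creating this one and then start it. A minimal sketch, assuming the earlier pipeline was already created and started:

-- Remove the earlier table-targeting pipeline before re-creating it with INTO PROCEDURE
STOP PIPELINE pipeline_migration_nosql;
DROP PIPELINE pipeline_migration_nosql;

-- After running the CREATE PIPELINE ... INTO PROCEDURE statement above, start it
START PIPELINE pipeline_migration_nosql;

-- Optionally check for ingestion errors
SELECT pipeline_name, error_type, error_message
FROM information_schema.pipelines_errors
WHERE pipeline_name = 'pipeline_migration_nosql';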

Query 1: Find experiences of employee 1

SELECT emp.name, emp.job, exp.role, exp.company
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;

Query 2: Aggregate the total years of experience for employee 1

SELECT emp.name, SUM(exp.yoe) AS 'Total YoE'
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;

Performance Comparison Between Both Options

Both options can achieve very fast performance in SingleStoreDB — but they come with some tradeoffs. Option one is good if the query only does light work on the JSON array. Option two makes your data schema more readable and offers more possibilities in terms of query shapes and computation. But option two has an impact on ingest speed, and joining two large tables can be costly in terms of CPU.

Here is a benchmark chart showing the performance of both options for the two queries described above. The experience table has, on average, three times as many rows as the employee table.


*The test was run against a few hundred million rows on a machine with 4 vCPUs.

In terms of pure performance for these two queries, option one is faster than option two, and it also consumes less CPU. But as mentioned, that schema is less readable, and running more complex queries against it in pure SQL would be less efficient.
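
If you want to reproduce this comparison on your own data, one approach is SingleStoreDB's query profiling, sketched below for the two aggregation queries above (assuming both schemas are loaded):

-- Profile the Option 1 query (JSON array expanded at query time)
PROFILE SELECT emp.name, SUM(exp.table_col::yoe) AS 'Total YoE'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;
SHOW PROFILE;

-- Profile the Option 2 query (pre-flattened experience table)
PROFILE SELECT emp.name, SUM(exp.yoe) AS 'Total YoE'
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;
SHOW PROFILE;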

Conclusion

In this blog, we've covered how easily SingleStoreDB handles JSON data and JSON objects. Making the move from a NoSQL database to SingleStoreDB is easier than you might think!

The main reason to make the switch is that SingleStoreDB delivers very good performance for pure lookup queries — but more importantly, it can easily handle light to heavy analytical queries under high concurrency. SingleStoreDB also offers very fast ingestion, thanks to its native Pipelines feature.

SingleStoreDB comes with a full package of features — and a fully managed service. Make the move now, and start with a free trial today.

