The Impact of Always-on Connectivity for Geospatial Applications and Analysis
Devices, Computing and Connectivity Converge
In the past ten years, technology shifts have reshaped the geospatial applications and analytics landscape:
- The iPhone and Android ecosystems have fostered a world where almost everyone is a beacon of information;
- Large-scale computing capabilities have given companies like Google and Facebook the ability to keep track of billions of things, and companies like Amazon and Microsoft are making similar computing power available to everyone;
- Global Internet coverage continues to expand, including innovative programs with balloons and solar-powered drones.
These trends are causing billion-dollar shifts in the mapping and geospatially oriented industries, for example:
In August 2015, a consortium of the largest German automakers including Audi, BMW, and Daimler (Mercedes) bought Nokia's Here mapping unit, the largest competitor to Google Maps, for $3.1 billion.
Automakers like the German consortium have a stake in owning and controlling mapping data and the driver experience, and the largest private companies, like Uber and Airbnb, depend on maps as an integral part of their applications. Source: VentureBeat
In this paper, we'll examine several showcase applications that demonstrate modern geospatial capabilities of an in-memory approach. In particular, we'll focus on transportation.
Geospatial Analytics and Transportation
Uber has shown the world what is possible when capitalizing on the trends we called out earlier: ubiquitous mobile phones, computing capabilities, and connectivity. In late 2015, Uber announced it had served 1 billion rides, and in early 2016 it was operating in 400 cities across 68 countries.
Uber began when its co-founders were unable to get a taxi one evening. The frustration was all the sharper because they held GPS-capable computers in their pockets, and there was likely a pool of labor and vehicles capable of filling the taxi gap.
Of course, what makes Uber stand out today is its ability to link millions of riders and corresponding drivers quickly, accurately, safely, and effortlessly. It is hard to see this as anything but a game changer.
While Uber's data is not publicly available, we can get a small sense of the kind of information involved from the taxi data released by the New York City Taxi and Limousine Commission.
Real-time geospatial capabilities in MemSQL identify the geographic location and characteristics of natural or constructed features and boundaries, and the objects that reside or move within them. For mobile, transportation and logistics, having instant access to real-time geospatial data can mean greater visibility into smart device application use, fuel efficiency, global supply chains and real-time inventory management. Industries gain true competitive advantage when business-critical decisions can be made as quickly as the data is captured.
The demonstration, titled Supercar, makes use of a dataset containing the details of 170 million real world taxi rides. By sampling this dataset and creating real-time records while simultaneously querying the data, Supercar simulates the ability to monitor and derive insights across hundreds of thousands of objects on the go.
By natively integrating geospatial datatypes in its relational database, MemSQL enables simple queries to derive informative results. The queries available in Supercar include:
- How many riders did we serve?
- What was the average rider wait time?
- What was the average trip distance?
- What was the average trip time?
- What was the average price/fare?
Simple Queries With Native Geospatial Intelligence
The demonstration uses the developer-focused mapping platform from Mapbox and combines it with simple SQL queries generated on the fly. For example, users can pan across the map and zoom in to a specific section, and the visible area then defines the region the query runs against.
One query example for passenger count is shown below. The coordinates of the polygon were removed for simplicity's sake, but in practice they represent the latitude and longitude of the four corners of the visible map area.
SELECT SUM(passenger_count) as result FROM trips WHERE GEOGRAPHY_INTERSECTS(pickup_location, "POLYGON((...))") OR GEOGRAPHY_INTERSECTS(dropoff_location, "POLYGON((...))")
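As a rough sketch of how the visible map area might be turned into the polygon elided in the query above, the Python helper below builds a WKT polygon from the four corners of a bounding box and pairs it with the same SQL. The function names and the west/south/east/north parameters are assumptions made for illustration, not part of Supercar. The `%s` placeholders assume a MySQL-compatible Python driver that uses that parameter style. WKT lists longitude before latitude.

```python
def viewport_polygon_wkt(west, south, east, north):
    """Return a closed WKT polygon for a rectangular map viewport.

    WKT coordinates are written as "longitude latitude"; the first corner
    is repeated at the end to close the ring.
    """
    corners = [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def passenger_count_query(west, south, east, north):
    """Build the passenger-count query and its parameters for one viewport."""
    polygon = viewport_polygon_wkt(west, south, east, north)
    sql = (
        "SELECT SUM(passenger_count) AS result FROM trips "
        "WHERE GEOGRAPHY_INTERSECTS(pickup_location, %s) "
        "OR GEOGRAPHY_INTERSECTS(dropoff_location, %s)"
    )
    # The polygon is returned as a bound parameter instead of being
    # concatenated into the SQL text.
    return sql, (polygon, polygon)
```

A web front end would call `passenger_count_query` with the map's current bounds on every pan or zoom and hand the returned pair to the driver's `execute` method.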
MemSQL Supercar Real-Time Geospatial Demo
Supercar Technical Details
"Supercar" is a simulation of 50,000 taxis roaming around the New York metro area, picking up and dropping off passengers. Each vehicle reports its geolocation to the server once a second. A "trips" thread uses real-world NYC taxi data to create requests for pickups and destinations at a clip of several hundred per second. A taxi is chosen by performing a within_distance geospatial query to find the closest 20 available vehicles with the features the rider asks for (e.g., SUV, carseat, limo). A candidate is chosen at random and the taxi starts moving to the pickup point. The price of the ride is determined dynamically based on a geofence query and recent values for supply and demand within that geofence.
Once the rider is dropped off, the taxi performs another query to determine where it is, the price of that area, and the location of another area with a higher price. Having chosen a likely place to wait for another fare, it moves toward it.
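The dispatch step described above (find the closest 20 available vehicles with the requested features, then pick one at random) can be sketched in plain Python. In Supercar this filtering is done by a geospatial query inside the database; the in-memory version below only illustrates the logic. The dictionary keys, the 2,000 metre search radius, and the function names are hypothetical.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def choose_taxi(taxis, pickup, wanted_features, radius_m=2000, k=20, rng=random):
    """Pick one taxi at random from the k closest eligible taxis.

    taxis: iterable of dicts with keys id, lon, lat, available, features (a set).
    pickup: (lon, lat) of the rider.
    Returns the chosen taxi dict, or None when nothing qualifies.
    """
    lon, lat = pickup
    candidates = []
    for taxi in taxis:
        # Skip taxis that are busy or lack any requested feature.
        if not taxi["available"] or not wanted_features <= taxi["features"]:
            continue
        dist = haversine_m(lon, lat, taxi["lon"], taxi["lat"])
        if dist <= radius_m:
            candidates.append((dist, taxi))
    candidates.sort(key=lambda pair: pair[0])
    closest = [taxi for _, taxi in candidates[:k]]
    return rng.choice(closest) if closest else None
```

Choosing at random among the nearest candidates, instead of always taking the single nearest, spreads work across nearby vehicles.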
The "pricing" thread dynamically adjusts local prices for taxi fares once a second. It looks up recent requests and taxi locations, grouped by the areas they occurred, and bumps the price of each geofence up or down based on the ratio of supply and demand. A web-based user interface plots the state of the system and allows the user to run real time analytical queries against the dataset.
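A minimal Python sketch of the pricing loop and of the taxi's search for a better-priced area follows. It assumes a fixed step per update and a floor and cap on the price multiplier. None of those values, nor the function names, come from Supercar. Comparing demand with supply directly is equivalent to checking whether their ratio is above or below one.

```python
def adjust_prices(prices, requests, available, step=0.1, floor=1.0, cap=5.0):
    """Return new per-geofence price multipliers from recent supply and demand.

    prices: dict geofence_id -> current multiplier
    requests: dict geofence_id -> recent ride requests (demand)
    available: dict geofence_id -> available taxis (supply)
    """
    updated = {}
    for zone, price in prices.items():
        demand = requests.get(zone, 0)
        supply = available.get(zone, 0)
        if demand > supply:
            price += step  # more riders than taxis: raise the price
        elif demand < supply:
            price -= step  # more taxis than riders: lower the price
        # Clamp to the allowed range and round away float noise.
        updated[zone] = round(min(cap, max(floor, price)), 2)
    return updated


def best_zone_to_wait(prices, current_zone):
    """Return the highest-priced zone, or the current one if none is higher."""
    best = max(prices, key=prices.get)
    return best if prices[best] > prices[current_zone] else current_zone
```

Running `adjust_prices` once a second over counts grouped by geofence mirrors the "pricing" thread, and `best_zone_to_wait` mirrors the idle taxi's choice of where to head next.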
An Esri Take on New York's Taxi Data
From the blog of Mansour Raad, Esri BigData, MemSQL and ArcGIS Interceptors
Last week, at the Developer Summit, we unveiled Server Object Interceptors. They have the same API as Server Object Extensions, and are intended to extend an ArcGIS Server with custom capabilities. An SOI intercepts REST and/or SOAP calls on a MapServer before and/or after it executes the operation on an SOE or SO. Think servlet filters.
A use case of an SOI associated with a published MXD is to intercept an export image operation on its MapService and digitally watermark the original resulting image. Another use case of an interceptor is to use the associated user credentials in the single-sign-on request to restrict the visibility of layers or data fields.
This is pretty neat and being the BigData Advocate, I started thinking how to use this interceptor in a BigData context. The stars could not have been more aligned than when I heard that the MemSQL folks have announced geospatial capabilities in their In-memory database. See, I knew for a while that they were spitballing native geospatial types, but the fact that they showcased it at Strata + Hadoop World made me reach back to them to see how we can collaborate.
The idea is that since ArcGIS server does not natively support MemSQL, and since MemSQL natively supports the MySQL wire protocol, I can use the MySQL JDBC driver to query MemSQL from an SOI and display the result in a map.
The good folks at MemSQL bootstrapped a set of AWS instances with their "new" engine and loaded the now-very-famous New York City taxis trips data. This (very very small) set consists of about 170 million records with geospatial and temporal information such as pickup and drop off locations and times. Each trip has additional attributes such as travel times, distances and number of passengers. It was up to me now to dynamically query and display this information in a standard WebMap on every map pan and zoom. What I mean by "standard" here is that an out-of-the-box WebMap should be able to interact with this MemSQL database without being augmented with a new layer type or any other functionality. Thus the usage of an SOI. It will intercept the call to an export image operation with a map extent as an argument in a "stand-in" MapService and will execute a spatial MemSQL call on the AWS instances. The result set is drawn on an off-screen PNG image and is sent back to the requesting WebMap for display as a layer on a map.
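The last step in the passage above, drawing a result set onto an off-screen image for a given map extent, comes down to a coordinate transform. The sketch below is written in Python, not the Java an SOI would use, and it is not the blog author's code. The function names are hypothetical, and a grid of per-pixel counts stands in for the PNG.

```python
def extent_to_pixel(x, y, extent, width, height):
    """Map a map coordinate to a pixel in an image covering `extent`.

    extent: (xmin, ymin, xmax, ymax) in the same units as x and y.
    Pixel rows grow downward, so the y axis is flipped.
    """
    xmin, ymin, xmax, ymax = extent
    col = (x - xmin) / (xmax - xmin) * (width - 1)
    row = (ymax - y) / (ymax - ymin) * (height - 1)
    return round(col), round(row)


def rasterize(points, extent, width, height):
    """Count points per pixel; points outside the extent are skipped."""
    xmin, ymin, xmax, ymax = extent
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            col, row = extent_to_pixel(x, y, extent, width, height)
            grid[row][col] += 1
    return grid
```

An image library would then color each cell of the grid and encode the result as a PNG to return to the requesting map client.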
Real-time Business Intelligence companies like Zoomdata have also shown what is possible with geospatial analytics.
TaxiStats features a real-time dashboard application with Zoomdata. The simulated pickup and drop-off data from taxis is streamed into MemSQL as rides complete. The Zoomdata business intelligence dashboard displays that data as it is collected while exploratory analytics run simultaneously on the dataset. The dashboard includes:
- Real-time data for pickups by ZIP code on the map, total volume of rides, and rides by time of day.
- A map and graph that can be filtered to explore and drill down.
- A live stream that can be paused or rewound to examine a specific time period.
TaxiStats Showcase Application
About MemSQL Geospatial
MemSQL at a Glance
MemSQL is the leader in real-time databases for transactions and analytics. As a purpose-built database for instant access to real-time and historical data, MemSQL uses a familiar SQL interface and a horizontally scalable distributed architecture that runs on commodity hardware or in the cloud. Innovative enterprises use MemSQL to better predict and react to opportunities by extracting previously untapped value in their data to drive new revenue. MemSQL is deployed across hundreds of nodes in high-velocity big data environments. Based in San Francisco, MemSQL is a Y Combinator company funded by prominent investors including Accel Partners, Khosla Ventures, First Round Capital and Data Collective. Follow us @MemSQL or visit us at www.memsql.com.
MemSQL Product Architecture
MemSQL combines real-time streaming, database, and data warehouse workloads for sub-second processing and reporting in a single, scalable, easy-to-manage database. Build real-time applications to instantly respond to dynamic business changes. Bring your data into the light of day with precision insights, faster decisions, and immediate action.
MemSQL achieves these capabilities through a unique combination of features:
A Commitment to the Enterprise
MemSQL has always maintained an enterprise focus, ensuring our database delivers the maturity and functionality to serve the most demanding workloads.
Full Transactional SQL
MemSQL is a scalable, performant database that retains the time-tested relational properties of SQL.
Multi-model and Multi-Mode
MemSQL supports multiple data models beyond SQL including key-value, document/JSON, and geospatial.
In-Memory Rowstore and Disk/SSD-based Columnstore
MemSQL features an in-memory row store and a disk/SSD-based column store in a single database, achieving extremely low latency execution while allowing for data growth.
Distributed Architecture
MemSQL supports a distributed architecture that can scale out on commodity hardware. This architecture also supports distributed query optimization and execution for the fastest analytics possible at scale.
Deploy On-Premises or in the Cloud
MemSQL can be deployed on site on commodity hardware, or on any public cloud including Amazon, Azure, Google, Digital Ocean, Softlayer and others. This provides complete flexibility for a variety of use cases.
Building Modern Database Applications with MemSQL
In addition to well-understood database models, MemSQL allows you to go beyond what previous databases or data warehouses were capable of. We'd invite you to consider some of the following options.
High-Volume Transactional Workloads
MemSQL excels at high volume transactional workloads, including those where real-time analytics come into play. With MemSQL you can ingest millions of records per second, and run queries with results accurate to the last transaction.
Data Warehouses with Live Data
In the past, data warehouses were batch-loaded with data after-the-fact. With MemSQL, you can send live data to the database and run complex analytical queries with ease, all in a non-blocking infrastructure. MemSQL allows you to take an overnight process and turn it into a continuous process.
Real-Time Data Pipelines with Apache Kafka and Spark
MemSQL Streamliner supports modern streaming workloads using the power of Apache Spark, and enables our customers to stream, persist, and analyze hundreds of terabytes of data a day without writing any code. Easily connect to Apache Kafka as a real-time message queue, or use a custom extract to pull data from your preferred source.
Starting with MemSQL 4, geospatial functions are now part of the database. This includes the three main object types of polygons, paths, and points.
For a complete reference of MemSQL geospatial functions, please refer to the Geospatial Guide.
With the advent of mobile phones, ubiquitous computing, and global internet connectivity, nearly every data point has a place. As such, geospatial analytics is becoming more important than ever.
In particular, the scale and size of emerging geospatial datasets demand a similarly scalable database. MemSQL, through its distributed architecture and support for geospatial functions, fits this demand perfectly.
Future Geospatial Developments
As geospatial demands increase, MemSQL plans to support them. This includes making geospatial functions and data types first-class citizens for real-time data pipelines, and expanding to more data models and a broader range of queries.