Problems connecting AWS EMR Spark cluster to MemSQL cluster

One of our developers is trying to use the memsql-spark-connector to connect our EMR Spark cluster to MemSQL, but we keep getting the following error:

java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.)

What permissions, security group, or other configuration changes might I be missing?

Welcome to the MemSQL forums, Ahjr! Before I get into the details of your specific issue, please let me know which version of the MemSQL Spark Connector you are using. We are just about to release the next major version of the connector, which you can see here: https://github.com/memsql/memsql-spark-connector/tree/3.0.0-beta

I highly recommend using it, as it will be released in a couple of weeks. At this point we have multiple users testing it and finding it much better than the currently released version.
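
For reference, reading a table through the 3.0 beta looks roughly like the sketch below. The option names are taken from the 3.0 branch linked above; the endpoint, credentials, and table name are placeholders you would replace with your own, and the docs on that branch are authoritative if anything differs:

```scala
// Sketch of a read via the 3.0 beta connector, assuming an existing
// SparkSession named `spark` (as in spark-shell). All values are placeholders.
val df = spark.read
  .format("memsql")
  .option("ddlEndpoint", "memsql-master.internal:3306") // master aggregator
  .option("user", "root")
  .option("password", "secret")
  .load("mydb.mytable") // database.table to read

df.show()
```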

Regarding your connection issue: that error is almost always network related. Here are the network connections that occur when using the 2.0.0 connector:

Spark driver ↔ master aggregator
Spark driver ↔ child aggregators
Spark executors ↔ all nodes in the MemSQL cluster

To find those nodes, the Spark connector uses the hostnames returned by running SHOW LEAVES and SHOW AGGREGATORS in your cluster. For this reason, when using the 2.0.0 version of the connector, you need to ensure that every node in your Spark cluster has bidirectional access to the ports MemSQL is running on for every node in the MemSQL cluster.
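
If you want to see exactly which hostnames and ports the connector will discover, you can run those commands yourself against the master aggregator:

```sql
-- Run on the master aggregator; the Host and Port columns list the
-- addresses the connector will attempt to reach from Spark.
SHOW AGGREGATORS;
SHOW LEAVES;
```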

In the 3.0.0 version of the connector we made this much easier to manage by adding more features to control which nodes we try to connect to. My suggestion is to upgrade to 3.0 first in order to see if that resolves the issue. If it doesn’t, do some investigation on your network connectivity and do some manual testing by using the mysql command line tool to test connecting to various memsql nodes from different places in your spark cluster (and where you are running the spark driver).
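
For example, something along these lines from each Spark worker and from the driver host (the hostnames are placeholders for whatever SHOW AGGREGATORS / SHOW LEAVES report; 3306 is MemSQL's default port):

```bash
# Verify that a SQL connection works from this machine to each MemSQL node.
# Hostnames below are placeholders; substitute the ones reported by
# SHOW LEAVES / SHOW AGGREGATORS, and the port MemSQL actually runs on.
mysql -h memsql-master.internal -P 3306 -u root -p -e "SELECT 1"
mysql -h memsql-leaf-1.internal -P 3306 -u root -p -e "SELECT 1"

# If the mysql client is not installed on the EMR nodes, a raw TCP check
# can still confirm whether the security groups allow the traffic:
nc -zv memsql-leaf-1.internal 3306
```

If the TCP check fails, the usual culprit on AWS is the EC2 security groups: the EMR instances' security group needs to be allowed to reach the MemSQL instances on those ports, and the hostnames MemSQL reports must resolve from the EMR side.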

Good luck!

Thanks for the comprehensive reply to what I now realize was some pretty thin information!

He got it working with the help of this post and trace logging. Appreciate the quick response.
