Welcome to the MemSQL forums, Ahjr! Before I get into the details of your specific issue, please let me know which version of the MemSQL Spark Connector you are using. We are about to release the next major version of the connector, which you can see here: https://github.com/memsql/memsql-spark-connector/tree/3.0.0-beta
I highly recommend using it, since it will be released in a couple of weeks. Multiple users are already testing it and finding it much better than the currently released version.
Regarding your connection issue: it is most likely network-related. These are the network connections that occur when using the 2.0.0 connector:
- Spark driver <-> master aggregator
- Spark driver <-> child aggregators
- Spark executors <-> all nodes in the MemSQL cluster
To make these connections, the connector uses the hostnames it finds by running `SHOW LEAVES` and `SHOW AGGREGATORS` in your cluster. For this reason, when using version 2.0.0 of the connector, you need to ensure that every node in your Spark cluster has bidirectional access to the port MemSQL is running on, for every node in the MemSQL cluster.
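To see exactly which endpoints your Spark nodes must reach, it can help to turn the results of those two queries into an explicit checklist. The sketch below is a minimal illustration, assuming you have already fetched the rows with any MySQL-protocol client as dictionaries that have `Host` and `Port` keys. The helper names `required_endpoints` and `checks_to_run`, and all hostnames, are hypothetical and not part of the connector.

```python
from typing import Iterable, List, Mapping, Tuple


# Hypothetical helper: collect every (host, port) pair the 2.0.0 connector
# may connect to. Assumes each row from SHOW LEAVES / SHOW AGGREGATORS is a
# mapping with "Host" and "Port" keys.
def required_endpoints(
    leaves: Iterable[Mapping[str, object]],
    aggregators: Iterable[Mapping[str, object]],
) -> List[Tuple[str, int]]:
    endpoints = set()
    for row in list(leaves) + list(aggregators):
        endpoints.add((str(row["Host"]), int(row["Port"])))
    # Sort so the checklist is stable and easy to compare between runs.
    return sorted(endpoints)


# Every Spark node (executors and the driver host) needs a route to every
# endpoint, so the checks to run are the cross product of the two lists.
def checks_to_run(
    spark_nodes: Iterable[str], endpoints: List[Tuple[str, int]]
) -> List[Tuple[str, str, int]]:
    return [(src, host, port) for src in spark_nodes for host, port in endpoints]


if __name__ == "__main__":
    # Sample rows with made-up hostnames, shaped like the query results.
    leaves = [{"Host": "leaf-1", "Port": 3306}, {"Host": "leaf-2", "Port": 3306}]
    aggregators = [{"Host": "agg-1", "Port": 3306}, {"Host": "agg-2", "Port": 3306}]
    endpoints = required_endpoints(leaves, aggregators)
    for src, host, port in checks_to_run(["driver-host", "executor-1"], endpoints):
        print(f"{src} must reach {host}:{port}")
```

Listing every pair is deliberately conservative: the driver strictly needs only the aggregators, but checking it against all nodes costs little and catches misconfigured hosts early.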
In version 3.0.0 of the connector, we made this much easier to manage by adding features that control which nodes the connector tries to connect to. I suggest upgrading to 3.0 first to see whether that resolves the issue. If it doesn't, investigate your network connectivity: use the mysql command-line client to manually test connecting to various MemSQL nodes from different machines in your Spark cluster, including the machine where you run the Spark driver.
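If you have many nodes to check, a small script can speed up the first pass. The sketch below swaps the mysql client for a plain TCP connection attempt using Python's standard `socket` module, so it only shows whether a port is reachable from the machine running it; it does not log in or run a query, so it is a complement to the mysql client test, not a replacement. The script name `check_ports.py` and the example hostnames are placeholders.

```python
import socket
import sys
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS resolution failures (socket.gaierror).
        return False


def report(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and return {(host, port): reachable}."""
    results = {}
    for host, port in endpoints:
        ok = can_connect(host, port, timeout)
        results[(host, port)] = ok
        status = "OK" if ok else "FAILED"
        print(f"{socket.gethostname()} -> {host}:{port} {status}")
    return results


if __name__ == "__main__":
    # Usage (placeholder hostnames):
    #   python check_ports.py memsql-agg-1:3306 memsql-leaf-1:3306
    targets = []
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        targets.append((host, int(port)))
    report(targets)
```

Run it on each executor host and on the driver host with the hostnames reported by `SHOW LEAVES` and `SHOW AGGREGATORS`; any `FAILED` line points to a host or port worth testing further with the mysql client.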