Thanks Hanson, that definitely got me much closer! I tried the following part of the documentation:
One way to verify connectivity is to run the command FILL CONNECTION POOLS on all MemSQL nodes. If this fails with the same error, then a node is unable to connect to another node.
Upon running “FILL CONNECTION POOLS”, I did indeed get an error:
ERROR 1735 (HY000): Unable to connect to leaf @127.0.0.1:3307 with user distributed, using password YES:  Cannot connect to '127.0.0.1':3307. Errno=99 (Cannot assign requested address)
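To check whether the leaf port accepts TCP connections at all, independently of MemSQL, a minimal Python sketch like the one below could be used. It relies only on the standard library; the function name `can_connect` is made up for illustration, and the host and port are simply the ones from the error above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; return (True, None) or (False, error)."""
    try:
        # create_connection opens the socket and connects, raising OSError on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and errors such as errno 99
        return False, exc


if __name__ == "__main__":
    # Host and port taken from the error message; adjust for other nodes
    ok, err = can_connect("127.0.0.1", 3307)
    print("connectable" if ok else f"not connectable: {err}")
```

If this succeeds while FILL CONNECTION POOLS still fails, the problem is more likely in how the nodes are configured to reach each other than in the firewall.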
Both nodes are on the same machine and were set up using the “cluster in a box” automated installation. I checked the UFW rules and nothing is being blocked (I even tried disabling UFW). Also, a user is set up to be able to connect from any IP, and the bind-address for both nodes is set to 0.0.0.0. Interestingly, the server is working and I can run all queries, but when I run
memsqlctl list-nodes, it says the leaf node is False for connectable and Unknown for recovery state.
I restarted the node, but after running “FILL CONNECTION POOLS” it went back to looking like that screenshot.
Does the documentation go into more depth on what to do if the nodes can’t communicate with each other?
I fixed the error with FILL CONNECTION POOLS by going into each node’s memsql.cnf file and commenting out the socket parameter so the nodes use TCP. I’m now running into a different error: when I try to sort by a column on the table, I get “ERROR 1777 (HY000): Partition records:2 has no master instance.” The table has around 17 million rows. The query uses the shard key in the WHERE clause, so it runs on around 8 million of those records. The server has over 100 GB of RAM. My hunch is that I have something misconfigured.