Issue while executing Spark Job from AWS EMR

Hi Team,

We are facing an issue while loading data from AWS EMR/Spark into MemSQL. We tried a MemSQL Pipeline with the relevant privileges granted to a normal user (CREATE PIPELINE, DROP PIPELINE, START PIPELINE, and STOP PIPELINE), but we are still unable to load data.

We also tried using the Spark Connector for MemSQL to load the data, and we hit the same problem. The actual error we get is:

Access Denied for user <user_id> to database <db_name>

If we run the job using the MemSQL root credentials, it works fine.

Can anyone suggest what privileges we are missing?

Thanks in advance.

Prashant

Unlike a typical application, Spark can sometimes query the leaf nodes in a MemSQL cluster directly. If I had to guess, the leaf nodes in your cluster don't have the user that Spark is connecting as created on them (likely the user only exists on the aggregators, as those are typically the only nodes that need it).
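If that is the case, creating the user directly on each leaf should help. A rough sketch, reusing the JDBC style from your job (the host, port, user, database, and passwords below are all placeholders; leaves often listen on port 3307 rather than 3306):

```scala
import java.sql.DriverManager

// Sketch only: run once per leaf node. All names and credentials
// here are placeholders -- substitute your own cluster's values.
def createSparkUserOnLeaf(leafHost: String, leafPort: Int,
                          rootPassword: String): Unit = {
  val conn = DriverManager.getConnection(
    s"jdbc:mysql://$leafHost:$leafPort", "root", rootPassword)
  try {
    val stmt = conn.createStatement()
    stmt.executeUpdate("CREATE USER 'user_id'@'%' IDENTIFIED BY '<password>'")
    stmt.executeUpdate("GRANT SELECT, INSERT ON db_name.* TO 'user_id'@'%'")
  } finally {
    conn.close()
  }
}
```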

Thanks for the reply.

We have added the user to the leaf nodes and granted it the same privileges as on the master aggregator.
The privileges assigned to the MemSQL user are:

SELECT, INSERT, UPDATE, DELETE, CREATE, CREATE TEMPORARY TABLES, LOCK TABLES, SHOW VIEW, CREATE PIPELINE, DROP PIPELINE, START PIPELINE, ALTER PIPELINE, SHOW PIPELINE

We are still getting the following error while running the Scala/Spark job:

20/05/28 09:31:48 ERROR ApplicationMaster: User class threw exception: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'user_id'@'%' to database 'db_name'

This job is a truncate-and-load, so after the truncate we tried using insert as well as append instead of overwrite, but we still face the same issue.

Thanks in advance.
Prashant

Hello,

I’ve linked MemSQL’s permissions matrix here, which provides all the permissions you need for a given operation.

That said, the permissions you listed above look sufficient for loading data; however, depending on exactly what the statement does, you may require further permissions.

Can you provide:

  1. Both the pipeline load statement and the Spark write statement that led to the access denied error?
  2. The MemSQL version of your cluster?

Furthermore, regarding Spark:

  • Can you confirm which version of the Spark Connector you are using? We strongly recommend the latest version, MemSQL Spark Connector 3.0.
  • I’ve linked the README which also refers to the permissions required
  • Note that if you are truncating a table using Overwrite, you require SELECT, INSERT, and DROP. If you are dropping/recreating or creating a new table from scratch, you need CREATE as well. If you are only appending, that requires just the SELECT and INSERT permissions.
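For example, with the 3.0 connector the save modes map roughly like this (a sketch only; the `truncate` option and format name should be verified against the README for the connector version you install, and `df` and the table name are placeholders):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch: permissions implied by each write mode.
def writeToMemsql(df: DataFrame): Unit = {
  // Overwrite with truncate: requires SELECT, INSERT, and DROP.
  df.write
    .format("memsql")
    .option("truncate", "true") // truncate instead of drop/recreate
    .mode(SaveMode.Overwrite)
    .save("db_name.target_table")

  // Append: requires only SELECT and INSERT.
  df.write
    .format("memsql")
    .mode(SaveMode.Append)
    .save("db_name.target_table")
}
```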

As Adam from our team noted above, if you are using parallel read/partition pushdown in the Spark Connector, the user writing from Spark also requires access to the MemSQL leaves.
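With the 3.0 connector, the connection settings look roughly as follows (a sketch; the endpoint hosts and credentials are placeholders). The `user` configured here is the account that must also exist on the leaves when partitions are read in parallel:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hosts, user, password, and database are placeholders.
// This user must exist on the leaf nodes too for parallel reads.
val spark = SparkSession.builder()
  .appName("memsql-load")
  .config("spark.datasource.memsql.ddlEndpoint", "master-agg-host:3306")
  .config("spark.datasource.memsql.dmlEndpoint", "child-agg-host:3306")
  .config("spark.datasource.memsql.user", "user_id")
  .config("spark.datasource.memsql.password", "<password>")
  .config("spark.datasource.memsql.database", "db_name")
  .getOrCreate()
```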

Best,
Roxanna

Hi Roxanna,

Thanks for your prompt reply.

As mentioned earlier, I created user_id on all nodes in the cluster with the aforementioned privileges, assuming Spark would execute parallel reads/partition pushdown.

Here is the write statement of the Spark job:

statement.executeUpdate(s"delete from $target_orc where $src_filter")
target.write.format("com.memsql.spark.connector").mode("append").save(target_memsql_table)

MemSQL version is 7.0.11.
MemSQL Spark Connector version is 2.0.1.

We are checking version 3, as mentioned in your reply, and will let you know.

Thanks in advance.
Prashant