Errors in memsql.log at DR site, AdjustReferenceDatabases: Failure updating reference databases

I started seeing the following message in memsql.log on a replicated cluster. The tables being replicated seem to be in sync. How can I identify what is causing this error?

3787503923692 2020-06-17 10:15:23.654 ERROR: ProcessARD Thread 115048: AdjustReferenceDatabases: Failure updating reference databases at LSN 231. Succeeded updating 4, needed 5.

The replication status seems to indicate that the cluster database LSN may be the sticking point, at LSN 231.

Please explain what I’m looking at from this status.

memsql> show replication status;
+-------------+------------------------------+--------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+------------+---------------------------------------------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| Role        | Database                     | Master_URI                                 | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Throttling | Slave_URI                                                     | Slave_State | Slave_CommitLSN | Slave_HardenedLSN | Slave_ReplayLSN | Slave_TailLSN | Slave_Commits |
+-------------+------------------------------+--------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+------------+---------------------------------------------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| master      | cluster                      | NULL                                       | online       | 0:231            | 0:231              | 0:46             | 0:231          |           NULL | yes       | no         | CH-P-MEM-SQL1.xxxxxxxx.com:3306/cluster                       | replicating | 0:230           | 0:231             | 0:230           | 0:231         |          NULL |
| master      | cluster                      | NULL                                       | online       | 0:231            | 0:231              | 0:46             | 0:231          |           NULL | yes       | no         | CH-P-MEM-SQL2.xxxxxxxx.com:3306/cluster                       | replicating | 0:230           | 0:231             | 0:230           | 0:231         |          NULL |
| master      | cluster                      | NULL                                       | online       | 0:231            | 0:231              | 0:46             | 0:231          |           NULL | yes       | no         | CH-P-MEM-AGGR2.xxxxxxxx.com:3306/cluster                      | replicating | 0:230           | 0:231             | 0:230           | 0:231         |          NULL |
| async slave | cluster_12702002914197157727 | MI-P-MEM-AGGR1.xxxxxxxx.com:3306/cluster   | online       | 0:231            | 0:231              | 0:46             | 0:231          |           NULL | yes       | no         | NULL                                                          | replicating | 0:769           | 0:770             | 0:769           | 0:770         |          NULL |
| master      | cluster_12702002914197157727 | NULL                                       | replicating  | 0:769            | 0:770              | 0:769            | 0:770          |           NULL | yes       | no         | CH-P-MEM-AGGR2.xxxxxxxx.com:3306/cluster_12702002914197157727 | replicating | 0:769           | 0:770             | 0:769           | 0:770         |          NULL |
| master      | cluster_12702002914197157727 | NULL                                       | replicating  | 0:769            | 0:770              | 0:769            | 0:770          |           NULL | yes       | no         | CH-P-MEM-SQL1.xxxxxxxx.com:3306/cluster_12702002914197157727  | replicating | 0:769           | 0:770             | 0:769           | 0:770         |          NULL |
| master      | cluster_12702002914197157727 | NULL                                       | replicating  | 0:769            | 0:770              | 0:769            | 0:770          |           NULL | yes       | no         | CH-P-MEM-SQL2.xxxxxxxx.com:3306/cluster_12702002914197157727  | replicating | 0:769           | 0:770             | 0:769           | 0:770         |          NULL |
| async slave | crca_core                    | MI-P-MEM-AGGR1.xxxxxxxx.com:3306/crca_core | replicating  | 0:444            | 0:445              | 0:444            | 0:445          |           NULL | yes       | no         | NULL                                                          | replicating | 0:444           | 0:445             | 0:444           | 0:445         |          NULL |
| master      | crca_core                    | NULL                                       | replicating  | 0:444            | 0:445              | 0:444            | 0:445          |           NULL | yes       | no         | CH-P-MEM-AGGR2.xxxxxxxx.com:3306/crca_core                    | replicating | 0:444           | 0:445             | 0:444           | 0:445         |          NULL |
| master      | crca_core                    | NULL                                       | replicating  | 0:444            | 0:445              | 0:444            | 0:445          |           NULL | yes       | no         | CH-P-MEM-SQL2.xxxxxxxx.com:3306/crca_core                     | replicating | 0:444           | 0:445             | 0:444           | 0:445         |          NULL |
| master      | crca_core                    | NULL                                       | replicating  | 0:444            | 0:445              | 0:444            | 0:445          |           NULL | yes       | no         | CH-P-MEM-SQL1.xxxxxxxx.com:3306/crca_core                     | replicating | 0:444           | 0:445             | 0:444           | 0:445         |          NULL |
| async slave | crca_dm                      | MI-P-MEM-AGGR1.xxxxxxxx.com:3306/crca_dm   | replicating  | 0:23639          | 0:23640            | 0:23639          | 0:23640        |           NULL | yes       | no         | NULL                                                          | replicating | 0:23639         | 0:23640           | 0:23639         | 0:23640       |          NULL |
| master      | crca_dm                      | NULL                                       | replicating  | 0:23639          | 0:23640            | 0:23639          | 0:23640        |           NULL | yes       | no         | CH-P-MEM-AGGR2.xxxxxxxx.com:3306/crca_dm                      | replicating | 0:23639         | 0:23640           | 0:23639         | 0:23640       |          NULL |
| master      | crca_dm                      | NULL                                       | replicating  | 0:23639          | 0:23640            | 0:23639          | 0:23640        |           NULL | yes       | no         | CH-P-MEM-SQL1.xxxxxxxx.com:3306/crca_dm                       | replicating | 0:23639         | 0:23640           | 0:23639         | 0:23640       |          NULL |
| master      | crca_dm                      | NULL                                       | replicating  | 0:23639          | 0:23640            | 0:23639          | 0:23640        |           NULL | yes       | no         | CH-P-MEM-SQL2.xxxxxxxx.com:3306/crca_dm                       | replicating | 0:23639         | 0:23640           | 0:23639         | 0:23640       |          NULL |
| async slave | dbatest                      | MI-P-MEM-AGGR1.xxxxxxxx.com:3306/dbatest   | replicating  | 0:403            | 0:404              | 0:403            | 0:404          |           NULL | yes       | no         | NULL                                                          | replicating | 0:403           | 0:404             | 0:403           | 0:404         |          NULL |
| master      | dbatest                      | NULL                                       | replicating  | 0:403            | 0:404              | 0:403            | 0:404          |           NULL | yes       | no         | CH-P-MEM-AGGR2.xxxxxxxx.com:3306/dbatest                      | replicating | 0:403           | 0:404             | 0:403           | 0:404         |          NULL |
| master      | dbatest                      | NULL                                       | replicating  | 0:403            | 0:404              | 0:403            | 0:404          |           NULL | yes       | no         | CH-P-MEM-SQL2.xxxxxxxx.com:3306/dbatest                       | replicating | 0:403           | 0:404             | 0:403           | 0:404         |          NULL |
| master      | dbatest                      | NULL                                       | replicating  | 0:403            | 0:404              | 0:403            | 0:404          |           NULL | yes       | no         | CH-P-MEM-SQL1.xxxxxxxx.com:3306/dbatest                       | replicating | 0:403           | 0:404             | 0:403           | 0:404         |          NULL |
+-------------+------------------------------+--------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+------------+---------------------------------------------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
19 rows in set (0.02 sec)

Hi,

The error you see in the log can happen if there are issues with replication. The system retries after these errors, so as long as the error does not keep recurring, replication is healthy.

Your output of show replication status also looks fine. I do see a bug, though: the Master_*LSN columns for the cluster replica master row seem to show the local cluster db LSNs rather than those of the primary cluster’s cluster db. If you run show databases extended on both clusters, however, you should see that the cluster replica has the correct LSN. I have also filed an internal task to investigate and fix this issue.
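When comparing the LSN values reported by the two clusters, it can help to make the comparison explicit rather than eyeballing it. The following is only a minimal sketch: it assumes that an LSN printed as 0:231 consists of two decimal integers separated by a colon, which this thread does not confirm, and the function names are hypothetical.

```python
def parse_lsn(text):
    """Split an LSN string such as '0:231' into a pair of integers.

    Assumption: both parts are decimal numbers (not confirmed in this thread).
    """
    high, low = text.strip().split(":")
    return int(high), int(low)


def lsns_match(primary_lsn, replica_lsn):
    """Return True when the two LSN strings denote the same position."""
    return parse_lsn(primary_lsn) == parse_lsn(replica_lsn)
```

You would feed it the LSN copied by hand from each cluster's output, for example lsns_match("0:231", "0:231").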

If the log message shows up in your tracelogs repeatedly, there might be a real issue. Otherwise I would assume it was temporary (a timeout, a network issue, and so on), since the system recovered afterwards; if it had not, the message would keep appearing in the logs at a regular cadence. The details of the failure should be in the preceding log messages. There are internal efforts to improve these particular log messages so that the cause of a failure is easier to understand, but for now you need to investigate the logs in more detail to diagnose the root cause. I wouldn’t worry about it if it shows up once and not repeatedly.
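To check whether the message is a one-off or keeps recurring, you could scan memsql.log for it. This is a rough sketch, not an official tool: the regular expression is modelled on the single log line quoted in the question, so other versions may format the line differently, and the "three repeats at the same LSN" threshold is an arbitrary assumption of mine.

```python
import re
from datetime import datetime

# Pattern modelled on the one log line quoted in the question (assumption:
# other occurrences of this error follow the same layout).
ARD_FAILURE = re.compile(
    r"^\d+\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+ERROR:.*"
    r"AdjustReferenceDatabases: Failure updating reference databases "
    r"at LSN (?P<lsn>\d+)\. Succeeded updating (?P<ok>\d+), needed (?P<needed>\d+)\."
)


def find_ard_failures(lines):
    """Return (timestamp, lsn, succeeded, needed) for each matching line."""
    hits = []
    for line in lines:
        m = ARD_FAILURE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            hits.append(
                (ts, int(m.group("lsn")), int(m.group("ok")), int(m.group("needed")))
            )
    return hits


def looks_persistent(hits, min_repeats=3):
    """Hypothetical heuristic: the same LSN failed min_repeats or more times."""
    counts = {}
    for _, lsn, _, _ in hits:
        counts[lsn] = counts.get(lsn, 0) + 1
    return any(n >= min_repeats for n in counts.values())
```

Pass it the lines of the log file (for example the result of open(path).readlines(), where path is wherever your memsql.log lives). If looks_persistent returns False, that matches the "showed up once, then the retry succeeded" case described above; if it returns True, the log messages just before each hit are the place to look.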
