Left join like in mySql?

I would like to understand why the left join does not react as in mysql. I have two tables that I would like to put together.

This is the statement that works for me in mysql:

SELECT * from EtlStep 
LEFT JOIN EtlStepExecution on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 AND (EtlStepExecution.PackageExecutionId = 3000014 OR EtlStepExecution.StepId is null);

The same statement does not output the one line where the StepId is null

That is the output of mysql

And this is the output in memsql

The line with the id 2000018 is not displayed.

I’d be happy if you could help me.

Hi, can you provide the create table statements and the relevant row of the data to reproduce this? We tried a similar query in memsql and it worked as expected.

Hi, Could you also run the following queries and send us the result?

explain extended SELECT * from EtlStep 
LEFT JOIN EtlStepExecution on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 AND (EtlStepExecution.PackageExecutionId = 3000014 OR EtlStepExecution.StepId is null);

`

Data EtlStep:

image

 CREATE TABLE `EtlStep` (
      `Id` bigint(11) NOT NULL AUTO_INCREMENT,
      `PackageId` bigint(20) unsigned NOT NULL,
      `StepType` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      `Name` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      `Description` text CHARACTER SET utf8 COLLATE utf8_general_ci,
      `Payload` text CHARACTER SET utf8 COLLATE utf8_general_ci,
      `NextStepId` bigint(20) unsigned DEFAULT NULL,
      `Finalize` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      PRIMARY KEY (`Id`)
    ) AUTO_INCREMENT=1

Data EtlStepExecution:

CREATE TABLE `EtlStepExecution` (
  `Id` bigint(11) NOT NULL AUTO_INCREMENT,
  `StepId` bigint(20) unsigned NOT NULL,
  `PackageExecutionId` bigint(20) unsigned NOT NULL,
  `StartTime` datetime DEFAULT NULL,
  `EndTime` datetime DEFAULT NULL,
  `Status` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `StartReason` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `FailedReason` varchar(500) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`Id`)
) AUTO_INCREMENT=1;

Here is the result:

It looks fine to me. In the explain plan above we rewrote the original (EtlStep left outer join EtlStepExecution), but we converted this to the equivalent (EtlStepExecution right outer join EtlStep). I also tested it by inserting a few rows of data and testing what passed through.

Did you have a particular result that seemed wrong?

There’s no one who can help?

Hi Hamza!

We will have someone help you on this topic very shortly – apologies for the delay!

Just to make sure this not a bug in the RIGHT JOIN that the optimizer is choosing, can you show the output of the following query:

set optimizer_disable_right_join=1; -- not sure if this session variable is available on your memsql version
SELECT * from EtlStep 
LEFT JOIN EtlStepExecution on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 AND (EtlStepExecution.PackageExecutionId = 3000014 OR       
EtlStepExecution.StepId is null);

The result is the same as before.

I am seeing a similar issue when filtering for null values - I don’t want to think this is a major processing bug, but it looks like it may be. I noticed it when writing a case statement but have now noticed that I cannot filter for null values, but if I select and group the columns, the null values appear.

The null seems to be a common denominator between our issues.

@gmcclintock: please post a specific example of what query is not behaving as you expect. At this point I can only guess, but do note that null = null evaluates to null, not to true (that is standard SQL) - you would need to use x is null. Also, that does not appear related to this thread so feel free to open a new post for it.

I tried this on memsql 6.8 and 7.0 beta 1 and could not repro the problem. Could you tell me what version of memsql you were using? In MemSQL “Select @@memsql_version” will tell us what version you were testing.

I abbreviated your scenario and here is a complete example, it returns 4 rows.

    CREATE TABLE `EtlStepExecution` (
  `Id` bigint(11) NOT NULL AUTO_INCREMENT,
  `StepId` bigint(20) unsigned NOT NULL,
  `PackageExecutionId` bigint(20) unsigned NOT NULL,
  `StartTime` datetime DEFAULT NULL,
  `EndTime` datetime DEFAULT NULL,
  `Status` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `StartReason` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `FailedReason` varchar(500) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`Id`)
) AUTO_INCREMENT=1;

CREATE TABLE `EtlStep` (
      `Id` bigint(11) NOT NULL AUTO_INCREMENT,
      `PackageId` bigint(20) unsigned NOT NULL,
      `StepType` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      `Name` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      `Description` text CHARACTER SET utf8 COLLATE utf8_general_ci,
      `Payload` text CHARACTER SET utf8 COLLATE utf8_general_ci,
      `NextStepId` bigint(20) unsigned DEFAULT NULL,
      `Finalize` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
      PRIMARY KEY (`Id`)
    ) AUTO_INCREMENT=1;


insert into EtlStepExecution
-- id, stepid, package exec id
values (300034, 2002, 30014,null,null,null,null, null ),
       (300035, 2006, 30014,null,null,null,null, null ),
       (300037, 3063, 30014,null,null,null,null, null );

insert into EtlStep
-- id, packid, 
values (3063, 2000,null,null,null,null,null,null ),
       (2002, 2000,null,null,null,null,null,null ),
       (2006, 2000,null,null,null,null,null,null ),
       (2018, 2000,null,null,null,null,null,null );

SELECT * from EtlStep
LEFT JOIN EtlStepExecution on EtlStep.Id = EtlStepExecution.StepId
where EtlStep.PackageId = 2000 AND
(EtlStepExecution.PackageExecutionId = 30014 OR 
EtlStepExecution.StepId is null);

I solved the problem. There are several entries in the table ‘EtlStepExecution’ where the id 2000018 exists.

If you execute this it will not work anymore.

insert into EtlStepExecution
-- id, stepid, package exec id
values (300038, 2002, 30015,null,null,null,null, null ),
       (300039, 2006, 30015,null,null,null,null, null ),
        (300040, 2018, 30015,null,null,null,null, null ),
       (300041, 3063, 30015,null,null,null,null, null );

The sql that works for me:

SELECT * from EtlStep
LEFT JOIN (SELECT * FROM EtlStepExecution where EtlStepExecution.PackageExecutionId = 3000014 OR
EtlStepExecution.StepId is null) EtlStepExecution 
on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 

Thank you again for your effort!

1 Like

Glad to hear you solved the problem!

They key difference between the two is whether there are rows in EtlStepExecution with StepId = 2000018 but which all have PackageExecutionId != 3000014 (they are in your second set of example data but not the first - I assume the first screenshot had a filter that didn’t show them?)

In the original query

SELECT * from EtlStep 
LEFT JOIN EtlStepExecution on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 AND (EtlStepExecution.PackageExecutionId = 3000014 OR EtlStepExecution.StepId is null);

the LEFT JOIN will match those rows, but then the WHERE filter will reject them because EtlStepExecution.PackageExecutionId is not 3000014.

In the modified query

SELECT * from EtlStep
LEFT JOIN (SELECT * FROM EtlStepExecution where EtlStepExecution.PackageExecutionId = 3000014 OR
EtlStepExecution.StepId is null) EtlStepExecution 
on EtlStep.Id = EtlStepExecution.StepId 
where EtlStep.PackageId = 2000000 

the LEFT JOIN does not match those rows, so it outputs a NULL row for the 2000018 StepId.

As far as I can tell, the MemSQL behavior was correct for the data and query you described.

Yes you’re right! :slight_smile: