How to delete duplicate records in a column-store table

We have several duplicate records in my colum-store table.

I know below query will give the unique records.

select * from (select *, row_number() over (partition by id order by columns_to_order_by) as rn from t) v where rn = 1

But please guide me how to delete the duplicate records in my table without creating tables. In oracle we used to delete using Pseudocolumn like rowid.

This is a way to do it.

memsql> create table t(a int, key(a) using clustered columnstore);
Query OK, 0 rows affected (0.23 sec)

memsql> insert t values(1),(1),(2);
Query OK, 3 rows affected (0.14 sec)
Records: 3  Duplicates: 0  Warnings: 0

memsql> select * from t;
+------+
| a    |
+------+
|    1 |
|    1 |
|    2 |
+------+
3 rows in set (0.08 sec)

memsql> delete from t where a = 1 limit 1;
Query OK, 1 row affected (0.07 sec)

memsql> select * from t;
+------+
| a    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

We don’t have a row_id pseudo column.

Also, another way to do it would be to insert all the duplicate records into a separate table, then delete them all from the source table, then insert one of each duplicate back again. That way you could do the whole thing in 3 SQL statements rather than using one statement per row, as with the DELETE… LIMIT 1 method outlined above.

is there any Pseudocolumn approach?

If ‘id’ holds unique values, this is similar to one I have. Try something similar to the below on a temp table.

I’m not sure the ‘order by’ clause will matter since you have a ‘PARTITION BY’ statement, however, you’d want to include all columns excluding the ‘id’ column after ‘PARTITION BY’. This is assuming your ‘id’ column holds unique values, and you don’t have any other columns such as an incrementing DATETIME.

DELETE FROM t WHERE id IN (SELECT id FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY <column_b>, <column_c>, <column_d> order by columns_to_order_by) AS rn FROM t) WHERE rn > 1);


If ‘id’ does not hold unique values, then try your original format:

DELETE FROM t WHERE id IN (SELECT id FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id order by columns_to_order_by) AS rn FROM t) WHERE rn > 1);

We are also introducing unique indexes for columnstore in the upcoming release. you can download a beta early next week (likely May 4th) and we will release 7.1 end of May.