I’m working with a large sample of trading data from multiple years, with each row as a single trade. Columns included in this table are the trade_datetime (MM-DD-YYYY HH:mm:ss.SSS), trade_date (MM-DD-YYYY), and trade_hour (HH).
My below query works as intended with a smaller set of data; this retains the latest trade data over a 1hr interval, then deletes everything else.
However with anything larger, 8.55M rows of data, I receive the following error in MemSQL:
ERROR 1956 ER_PARAMETER_CAPACITY_EXCEEDED: Query cannot be completed because the parameter array capacity of 1048576 was exceeded
I have plenty of resources available on this machine, along with 128GB or RAM, yet was wondering if there was a setting within MemSQL placing constraints on this parameter?
DELETE FROM tbl_trades WHERE row IN (SELECT row FROM tbl_trades A WHERE trade_datetime < (SELECT MAX(trade_datetime) FROM tbl_trades WHERE ticker = A.ticker AND trade_date = A.trade_date AND trade_hour = A.trade_hour) GROUP BY ticker, trade_price, trade_volume, trade_datetime, trade_date, trade_hour);
Here is how the table was created:
CREATE TABLE IF NOT EXISTS tbl_trades ( row INT AUTO_INCREMENT, ticker TEXT NOT NULL, trade_price DECIMAL(12,2) NOT NULL, trade_volume DECIMAL(12,2) NOT NULL, trade_datetime DATETIME(6) NOT NULL, trade_date DATE NOT NULL, trade_hour INT NOT NULL, upload_time TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, KEY(`row`) USING CLUSTERED COLUMNSTORE, SHARD KEY (`ticker`) );