What is in memory and what is on disk?

danilsh · April 4, 2019, 9:58pm

Hi, I need some clarification about what is in memory and what is on disk.

I have a table that is created from a 16GB CSV file; the table is created using ‘USING CLUSTERED COLUMNSTORE.’ I do see the column files created on the disk, and I can read from the table using the ‘SELECT’.

However, when I check the execution plan, I can see that the majority of time is spent in scanning the column files on the disk. My expectation for the ‘In Memory Database’ was that the tables would be in memory; thus there would be no disk reading during “SELECT.” However, this is not the case.

Is this because my table is created ‘USING CLUSTERED COLUMNSTORE.’? How can I make sure that my table is entirely in memory? How can I make sure that the disk IO is minimal when I do the 'SELECT ’ only?

mbhojani · April 4, 2019, 10:36pm

Hi @danilsh,

It’s important to note that MemSQL is a memory-optimized database and not completely in-memory. We support OLAP/data warehouse use cases where it makes sense to store data on disk. There are two main types of tables in MemSQL - rowstore and columnstore (doc: SingleStoreDB Cloud · SingleStore Documentation)

When you create a table with ‘USING CLUSTERED COLUMNSTORE’, you’re making a columnstore table where the data will go to disk (caveat to that is that there is some memory overhead involved). If you want your data to sit completely in memory, simply remove the ‘USING CLUSTERED COLUMNSTORE’ line, as our default table type is rowstore.

danilsh · April 16, 2019, 4:43pm

thank you very much.