Parallel stored procedures / TVF

zmeidav · September 13, 2019, 10:08pm

Hey,

Assuming I have a querying returning a bunch of IDs back.
How would I go with fetching each such ID from the same table or another table in parallel?

Sometimes fetching by ID in parallel from client side, is faster and less memory consuming than the extra join / IN predicate.
If we could define some sort of a TVF that works on a scalar and returns a ROW it would run in parallel?
CREATE FUNCTION my_func_to_fetch_by_id (a varchar)
returns table/row/row(s) as return select * from another_table where some_id=a;
i.e.: select my_func_to_fetch_by_id(my_id) from ( select my_id from big_table order by something desc limit 100);

the other alternative is to query in parallel after fetching the result set, which incurs an overhead.

hanson · September 17, 2019, 9:40pm

I think you mean “in a single query” more than “in parallel.”

Consider:

select *
from t
where t.id in (select id from …);

or

select t.*
from t, (select id …) as s
where t.id = s.id;

zmeidav · September 18, 2019, 6:46am

In some occasions it is faster to return results by given IDs by directly querying the table using IN or joins.
This is true especially for columnar tables ( with hash index ) - the IN query does not use the hash index, nor does the join.

As you can see in this post: Columnstore ordering by column performance - Help - SingleStore Forums

The best option to sort, is to do sorting on the column, and only select a single column back which can be identified.
Then in parallel run multiple selects by the returned result set.

I was hoping to accomplish that by using MemSQL Functions that will do the select * from table where id=id for me.
utilizing the parallel processing you have, instead of managing threads in the application.
Unfortunately the options you provided do not solve this use case and the VTF/ UDF cannot do what I require.

hanson · September 18, 2019, 4:55pm

I see. Yes, we know about this issue and we will be working to improve this to allow faster IN filters and nested loop joins using the hash indexes on columnstores. I’ll pass this feedback on to our development team.

hanson · September 18, 2019, 7:07pm

I’m told that IN filters are already supported for seeks on columnstore hash indexes. What version are you using?

zmeidav · September 19, 2019, 4:29am

I use the beta from early August/end of June.
How will you handle cases of hash index assembled from multiple columns? or is that not supported

zmeidav · September 19, 2019, 12:14pm

will this also be applicable for joins?

hanson · September 19, 2019, 7:37pm

The 7.0 beta 1 code has been refreshed since then. You can get the latest here:

https://www.memsql.com/7-beta-1

We don’t support multi-column hash indexes yet, but we do support index intersection, e.g. if you have

WHERE col1 = constant1 and col2 = constant2

and hash indexes on col1 and col 2, we can use them both. Still not as good as a multi-column index but may perform better than using only 1 index.

Query optimizer support to do a seek in the inner side of a nested loop join into a columnstore hash index has not yet been built. Probably that will follow fast after 7.0 is released. It won’t be in 7.0. But otherwise, columnstore hash indexes can be used effectively to do filtering before the resulting rows are fed into a join (hash, nested loop, or merge).