[binary length for cosine similarity] MemSQL does not allow a binary column length larger than 255? How to do similarity?


#1

I followed the example at https://www.memsql.com/blog/image-recognition-at-the-speed-of-memory-bandwidth/ and my feature extractor generates vectors of length 1920. However, MemSQL 6 does not seem to support this?

Version 6
CREATE TABLE features (id bigint(11) NOT NULL AUTO_INCREMENT, features_extracted binary(1920) DEFAULT NULL, user_id varchar(256) DEFAULT NULL, bbox_id varchar(256) DEFAULT NULL, KEY id (id) USING CLUSTERED COLUMNSTORE);

Error: ERROR 1074 (42000): Column length too big for column ‘features_extracted’ (max = 255); use BLOB or TEXT instead

Please help.
Steve


#2

I found a solution: use BLOB for the field type and JSON_ARRAY_PACK, as in the example here:

Steve
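
A note on sizes, as a rough sketch rather than anything from the MemSQL docs verbatim: my understanding is that JSON_ARRAY_PACK stores each element as a 32-bit little-endian float, so a 1920-element vector needs 7680 bytes, well past the 255-byte binary limit in the error above. The Python below mimics that layout on the client side; the packing format is an assumption, and `pack_float32_le` / `unpack_float32_le` are hypothetical helper names, not MemSQL functions.

```python
import struct


def pack_float32_le(vector):
    """Pack a sequence of numbers as little-endian 32-bit floats.

    Assumption: this mirrors the blob layout JSON_ARRAY_PACK produces.
    """
    return struct.pack("<%df" % len(vector), *vector)


def unpack_float32_le(blob):
    """Inverse of pack_float32_le: bytes back to a list of floats."""
    count = len(blob) // 4  # 4 bytes per 32-bit float
    return list(struct.unpack("<%df" % count, blob))


if __name__ == "__main__":
    features = [0.5] * 1920          # stand-in for a real feature vector
    blob = pack_float32_le(features)
    print(len(blob))                 # 1920 floats * 4 bytes each
```

If that assumption holds, a blob built this way could be bound to the BLOB column as a query parameter instead of sending the vector as JSON text.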


#3

I’m glad you found a solution. What kind of matching are you doing?


#4

Hi Hanson,

I use EUCLIDEAN distance, which my food dataset is NOT tuned for, so I have to redo my experiments to find a sweet spot for the threshold (18.0 for now). Is a COSINE DISTANCE somehow available?

Thanks,
Steve


#6

For COSINE SIMILARITY (CS), use the DOT_PRODUCT() function (https://docs.memsql.com/sql-reference/v6.8/dot_product/) and normalize all the input vectors to length 1. I believe you can then compute the cosine distance as 1 - CS.
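
To make the normalization step concrete, here is a minimal client-side sketch in plain Python. It is only an illustration of the math, not MemSQL code: `normalize`, `dot`, and `cosine_distance` are hypothetical helper names, and the idea is that vectors would be normalized like this before being stored, so that DOT_PRODUCT() on the stored blobs returns CS directly.

```python
import math


def normalize(vector):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - CS, where CS is the dot product of the unit-length vectors."""
    return 1.0 - dot(normalize(a), normalize(b))


if __name__ == "__main__":
    print(cosine_distance([3.0, 4.0], [6.0, 8.0]))  # same direction: close to 0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal: close to 1
```

One consequence worth knowing when retuning a threshold: for unit-length vectors, the squared Euclidean distance equals 2 - 2 * CS, so after normalization the two measures rank matches identically.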


#7

It would be so cool to have a COSINE_DISTANCE function in MemSQL …

Thanks.
Steve


#8

Why not just use DOT_PRODUCT()? Is the length normalization a problem or something?