We have a MemSQL cluster that we will be using as a data lake. To avoid a data swamp, we would like to monitor object usage so that we can evaluate what is still needed and what can be removed as the data lake grows. What is the best way to meet this requirement?
It appears that comprehensive audit logging is licensed separately and can impact performance, so it might be overkill for this purpose anyway. Trace logging can log queries and has a “partial mode” that limits overhead. Partial mode may miss some queries, but that is acceptable for this requirement. My thought is to enable trace logging in partial mode and aggregate the results from the log files into a database table, possibly using a pipeline. Would partial logging still impact performance? Do the trace logs include the user name, so that we can see who is using the objects? Are there any other solutions available now, or planned on the product roadmap?
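To make the aggregation idea concrete, here is a rough sketch of what I have in mind. I do not know the actual trace log layout, so the line format below (`user=... query=...`) is purely an assumption for illustration, as are the function name `count_object_usage` and the sample table names. The table matching is also deliberately naive (a regex over FROM/JOIN/INTO/UPDATE rather than a real SQL parser).

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical log line layout, not the real MemSQL trace format:
#   <timestamp> user=<name> query=<sql text>
LINE_RE = re.compile(r"user=(?P<user>\S+)\s+query=(?P<query>.*)$")

# Naive matcher: the identifier that follows FROM, JOIN, INTO, or UPDATE.
TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def count_object_usage(lines):
    """Return a Counter keyed by (user, table) counting referencing queries."""
    usage = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not query records
        # Use a set so a table mentioned twice in one query counts once.
        tables = {t.lower() for t in TABLE_RE.findall(match.group("query"))}
        for table in tables:
            usage[(match.group("user"), table)] += 1
    return usage


# Made-up sample lines in the assumed format.
sample = [
    "2019-01-01 10:00:00 user=alice query=SELECT * FROM sales.orders o "
    "JOIN sales.customers c ON o.cid = c.id",
    "2019-01-01 10:05:00 user=bob query=INSERT INTO sales.orders VALUES (1, 2)",
    "2019-01-01 10:06:00 INFO: unrelated message",
]
usage = count_object_usage(sample)
for (user, table), hits in sorted(usage.items()):
    print(user, table, hits)
```

The resulting (user, table, count) rows are what I would load into a usage table, either in batches or through a pipeline. Because partial mode can drop queries, the counts would only be a lower bound, which is fine for deciding what looks unused.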