S3 Pipeline with DigitalOcean Spaces


#1

Hi!

Is it possible to use DigitalOcean Spaces product to ingest data into MemSQL?
It should be mostly compatible with the S3 protocol.

Has anyone already done this?

Thanks,
Vinicius


#2

Hi there Vinicius. I can’t speak to Pipelines’ compatibility, but I’d like to ask you some questions born of pained astonishment[0].

  • If you’re currently using DOS in production, are you hitting DOS’s 200/second rate limit? If not, do you expect to?
    If you’re not, I’d like you to take very seriously the 200/second rate limit. Yes it’s GETs, not just POSTs. When you hit it, you get a 503 response from DOS, which you have to handle. Note that you never had to think about it for S3, because they just scale you. DOS does not.

  • Can Pipelines handle retry/fail (via 503 status)? If not, it’s a nonstarter[1]. The equivalent question for my team was, “Does our application handle retry/fail for objects?” We didn’t, because with S3 we never had to. We wrote it ourselves on short notice. We experienced degraded performance during this time. We talked numerous times to DOS support, who–I mean I dunno–we could never get to admit that it’s reasonable for a company to have more than 200/second non-CDN’d hits[2].
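For anyone wondering what "handle retry/fail" looks like in application code, here is a minimal sketch of retrying on 503 with exponential backoff and jitter. It is not what my team shipped, and it says nothing about how Pipelines behaves internally. The names `fetch_with_retry` and `ServiceUnavailable`, the attempt count, and the delay values are all assumptions chosen for illustration; `fetch` stands in for whatever call you make against the object store.

```python
import random
import time


class ServiceUnavailable(Exception):
    """Raised when every attempt came back with HTTP 503 (hypothetical name)."""


def fetch_with_retry(fetch, max_attempts=5, base_delay=0.5, max_delay=30.0,
                     sleep=time.sleep):
    """Call fetch() until it returns a non-503 status or attempts run out.

    fetch is any zero-argument callable returning (status_code, body).
    The defaults are illustrative assumptions, not values recommended by DOS.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 503:
            # Success or a different error: hand it back to the caller.
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients do not all retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise ServiceUnavailable(f"still 503 after {max_attempts} attempts")
```

You would wrap each GET or PUT in a small function and pass it as `fetch`. The `sleep` parameter exists only so the waiting can be swapped out in tests.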

From your message I don’t know whether you are evaluating DOS, or currently using it in production. And your workload will be different from ours. So take my anecdote with an iceberg-sized grain of salt. But you should treat that 200/second limit as radioactive.
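One way to stay clear of the limit, if you control the client, is to pace your own requests below it. The following is a rough single-threaded sketch, not a tested production component: the class name `Throttle` and the 150/second figure (some headroom under 200) are my assumptions, and it does nothing for a loader you cannot instrument.

```python
import time


class Throttle:
    """Space calls at least 1/rate seconds apart (hypothetical helper).

    Not thread-safe: share it across threads only behind a lock.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_at = clock()  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
        # Schedule the following slot one interval after this one.
        self.next_at = max(now, self.next_at) + self.interval


# Assumed headroom: 150 requests/second against a 200/second limit.
throttle = Throttle(rate=150)
```

Calling `throttle.wait()` before every request keeps a single client under the chosen rate. Several workers hitting the same Space would need to split that budget between them.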

In sum: I (a pained, concerned citizen[3]) think that if you know DOS well, you know its knife-like jagged edges (you handle 503s, you rolled your own purge-API scheduler (that’s what we did)), then you’re fine. But if you haven’t been so cut, I implore you to take seriously their documentation, and reckon with how grinding degraded performance of your object store can be.

[0] Nothing against my team–DOS bills itself as a “drop-in replacement for S3”. We just didn’t expect DOS to fall this short. Here’s a product manager for DO calling it a drop-in replacement: https://www.digitalocean.com/community/questions/api-to-purge-file-cache-from-spaces-cdn-after-file-update
[1] This is a rhetorical flourish. I have never used Pipelines.
[2] We got zillions of 503s, then we put up the CDN. No improvement, because our workload is spread across hundreds of thousands of objects, accessed variably (not good for CDN). So we iterated on retry/fail. All throughout, we were talking to DOS support.
[3] I work for neither MemSQL nor DOS nor Amazon. I’ve used S3 and DOS.


#3

Hi @curmudgeon!

Thanks for your detailed answer.

I was not aware of the 200 reqs/sec limit on DOS. That is really a bummer. I was not planning to run this in production, just looking to save money when loading initial data into a cluster outside AWS.


#4

Thanks for taking the time to answer other MemSQL users’ questions! Much appreciated.