TL;DR: s5cmd gave us a roughly 30x speedup over a custom Python thread pool when downloading S3 objects. It’s easy to use and fits right into your workflow today!
When working with code, or anything really, you always make trade-offs. One example is simplicity versus runtime efficiency, often framed as CPU cycles versus brain cycles, where the latter usually wins. But sometimes the efficiency gain is too large to ignore, and s5cmd is such a case: it’s a single new dependency with massive gains.
s5cmd is a very fast S3 and local filesystem execution tool. For those who care, it’s written in Go, a fast language from Google that compiles to small, simple binaries.
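To show how little glue is needed, here is a minimal sketch of calling s5cmd from Python to download everything under a prefix. It assumes the `s5cmd` binary is on your `PATH` and that AWS credentials are already configured. The function names, bucket, and paths are made up for illustration and are not from our actual tooling.

```python
import shutil
import subprocess


def build_s5cmd_cp_command(bucket, prefix, dest_dir, workers=None):
    """Build the argument list for a wildcard `s5cmd cp` download.

    Hypothetical helper: bucket, prefix and dest_dir are illustrative inputs.
    """
    cmd = ["s5cmd"]
    if workers is not None:
        # --numworkers is a global s5cmd flag, so it goes before the subcommand.
        cmd += ["--numworkers", str(workers)]
    key = prefix.strip("/")
    # s5cmd expands the wildcard itself, so no shell globbing is involved.
    source = f"s3://{bucket}/{key}/*" if key else f"s3://{bucket}/*"
    # A trailing slash marks the destination as a directory.
    cmd += ["cp", source, dest_dir.rstrip("/") + "/"]
    return cmd


def download_prefix(bucket, prefix, dest_dir, workers=None):
    """Run s5cmd and raise if it is missing or exits with an error."""
    if shutil.which("s5cmd") is None:
        raise RuntimeError("s5cmd binary not found on PATH")
    cmd = build_s5cmd_cp_command(bucket, prefix, dest_dir, workers)
    subprocess.run(cmd, check=True)
```

Something like `download_prefix("my-bucket", "data/2023", "downloads")` would then replace a whole hand-written download loop. Splitting out the command builder keeps the part that needs no network access easy to test.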
I can’t share specific numbers from work, but the speedup is approximately 30x compared to a (simple) custom thread pool in Python. That’s huge! Joshua Robinson reported similar numbers on his blog when comparing s5cmd with s3cmd and aws-cli.
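For context, a simple thread-pool baseline of the kind mentioned above could look roughly like this. It is a sketch and not our actual work code: it assumes boto3 is installed for the real downloader, and `download_keys`, `make_boto3_downloader`, and the default of 32 workers are names and values picked for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_boto3_downloader(bucket):
    """Return a download_one(key, path) callable backed by boto3 (assumed installed)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("s3")

    def download_one(key, path):
        client.download_file(bucket, key, path)

    return download_one


def download_keys(keys, dest_dir, download_one, max_workers=32):
    """Download every key into dest_dir using a pool of worker threads."""
    os.makedirs(dest_dir, exist_ok=True)

    def task(key):
        # Flatten the key to its file name; good enough for a sketch.
        path = os.path.join(dest_dir, os.path.basename(key))
        download_one(key, path)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in input order and re-raises worker errors.
        return list(pool.map(task, keys))
```

This works and is easy to read, but every object still goes through the Python interpreter, which is where a dedicated Go binary like s5cmd pulls ahead.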
In our case, adding a single dependency was worth it because the efficiency gains and cost reduction outweigh the cons, especially as the dependency itself is very lean.
I hope someone who’s in need of a faster S3 download/upload tool reads this and manages to speed their tooling up! 😊
Thanks for this time, Hampus Londögård