Multiprocessing TSV repair
A few days ago I wrote about optimizing a TSV repair script that took large TSV files with unquoted newline characters like this:
This TSV file has a problem:
You ever hear someone say “We can just automate it,” and feel almost certain it’s a bad idea? I’m always fighting with this. And I can’t always explain why I...
I wrote my last post about consistent hashing, which has really stuck with me. The problem statement is so simple but the solution seems at a glance to be co...
I’ve been interested in decentralized data in distributed systems lately. There’s too much data for any one storage node to handle, so you have lots of stora...
A common approach in the industry for forming a performance-oriented SLA is to describe it using average, median and expected variance. At Amazon we have ...
I saw this on HN this morning. Nearly 30 years ago Swatch created Swatch Internet time. The units are called .beats, and they’re a decimal timekeeping system...
What is it about big social media that makes me feel like I’m part of the conversation? Conversations are being had, and there are people here, so this must ...
I’ve been reading about distributed systems lately. I have a lot to catch up on. When I started making a reading list a couple months ago, I had heard about ...
In which I find out with some certainty what time it is.
A little while ago my laptop died once, and then twice, and in between each failure I had to use a spare laptop. I ended up setting up my computer fully for ...
I’ve tried lots of to-do lists, from ToDoist to fine-grained Jira tickets and all things in between. I still felt disorganized. But I found Jeff Huang’s My ...
What time is it?