A must-read followup that explores other data structures for geofencing:
2 years after the initial blog post, and we get this submitted twice in two days?
See item?id=16084090 for the brief discussion yesterday.
"Complicated S2?" That's surprising, the S2 library (https://github.com/google/s2geometry) seems really clean and fairly intuitive to me.
The last time this was posted a few commenters mentioned using PostGIS instead of Uber rolling this themselves.
If anyone's used PostGIS with a similar level of traffic/latency requirements could you comment on your experience?
4500 queries per second per node doesn't sound like much.
"100s" of linear polygon scans could likely turn into a few, plus a quad tree or spatial grid hash lookup. I don't quite understand why those approaches are considered "complex?" Every simplistic game engine does this as soon as it implements collision.
Not saying what they built doesn't work, just curious about options.
I imagine this is the service they used to implement greyballing. https://www.nytimes.com/2017/03/03/technology/uber-greyball-...
I know it's fun to hate on Uber here, and there are plenty of reasons to do so. But this is surprising to me. I was at one of the Uber tech meetups in NYC last year or maybe a year and a half ago, and it seemed like a total circus. The guy who was championing golang was up there talking about how cool go is and that they were rewriting all their microservices in it just because go is cool.
He also talked about how many problems it had caused them and all that fun stuff. I left that meetup thinking two things: 1. Who the fuck breaks stuff just to use a cool new language? 2. Good lord, I really don't ever want to work on this team. It's chaos.
Instead of indexing the geofences using R-tree or the complicated S2, we chose a simpler route based on the observation that Uber’s business model is city-centric; the business rules and the geofences used to define them are typically associated with a city. This allows us to organize the geofences into a two-level hierarchy where the first level is the city geofences (geofences defining city boundaries), and the second level is the geofences within each city.
What about Uber's business model impacted this choice of algorithm?
Edit: left only the unanswered question
From early 2016.
Excellent article! I love to see go more and more in big enterprise companies.
Have you guys looked at moving to rust for any services?
Why do you need 50ms for 99th percentile for such a simple and stateless service?
Hey, really cool tech, good work. How's building a profitable business going?
Could we ban Uber from HN? There really needs to be consequences.
Sounds legit. I heard Uber does lots of queryable stuff per second.