I wonder how accurate this is in areas where tourism is a large contributor to the local economy. You don't actually have to be much good at running a business if you've got an endless stream of new people for several months out of the year and don't need to rely on repeat business. You just rename the place and hire a new GM when the negative reviews start overwhelming you (this applies more generally than restaurants btw).
I would probably try new restaurants more frequently if I could be more sure I wasn't gonna pay $10 for a $5 burger and help buy some sleazy J1-slave-driver (owner is too nice of a word) a new Land Rover in the process.
How did he get the data? It's pretty hard to pull the reviews and the data from Yelp. I tried that so I could run some queries, but their search isn't so great and they pull a lot of stunts to prevent you from scraping.
Oh, I see he's using the Kaggle data. That's not guaranteed to be reliable.
As the author mentioned, changes in rent are a huge factor. Did the closure dates coincide with a new lease? Lease terms can range from 1 to 10 years. Seeing a distribution of restaurant age at closure could show that.
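A rough sketch of what that age-at-closure distribution could look like in Python. The records below are made up for illustration and the field layout is an assumption, not anything from the article's dataset. If closures cluster around common lease lengths, you'd expect spikes at those ages.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (name, opened, closed). Not real data.
restaurants = [
    ("A", date(2010, 3, 1), date(2013, 2, 20)),
    ("B", date(2012, 6, 15), date(2017, 7, 1)),
    ("C", date(2008, 1, 10), date(2013, 1, 5)),
    ("D", date(2014, 9, 1), date(2015, 8, 30)),
    ("E", date(2011, 5, 1), date(2016, 4, 28)),
]

def age_in_years(opened, closed):
    """Age at closure, rounded to the nearest whole year."""
    return round((closed - opened).days / 365.25)

def age_histogram(records):
    """Count how many restaurants closed at each (rounded) age."""
    return Counter(age_in_years(o, c) for _, o, c in records)

hist = age_histogram(restaurants)
for years in sorted(hist):
    # Text histogram: one '#' per restaurant that closed at this age.
    print(f"{years:2d} yr: {'#' * hist[years]}")
```

With real data you'd want finer bins (months rather than years), since rounding hides whether a closure came just before or just after a renewal date.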
The other huge factor is cost of labor. Maybe looking at the minimum wage could be another feature. The news usually has those articles about how restaurants are struggling and the incremental minimum wage increase will hurt their business. It'd be interesting to see how strong of a factor that is in restaurant closures.
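For what it's worth, attaching the minimum wage as a feature is mostly a date lookup. Here's a minimal sketch, assuming you have a schedule of effective dates for the relevant city or state. The schedule and the `min_wage_on` helper are hypothetical, made up for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical schedule: (effective date, hourly minimum wage).
# Must be sorted by effective date. Not real figures.
WAGE_SCHEDULE = [
    (date(2014, 1, 1), 8.00),
    (date(2015, 1, 1), 9.00),
    (date(2016, 1, 1), 10.00),
]
_EFFECTIVE_DATES = [d for d, _ in WAGE_SCHEDULE]

def min_wage_on(day):
    """Return the minimum wage in effect on `day`, or None if the
    date is before the first entry in the schedule."""
    i = bisect_right(_EFFECTIVE_DATES, day)
    return WAGE_SCHEDULE[i - 1][1] if i else None

# Example: the wage in effect when a restaurant closed.
print(min_wage_on(date(2015, 6, 1)))
```

The wage level on the closure date alone probably isn't that informative. The size of the most recent increase, or the time since it, might be a better feature.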
Also, factors that could be tough to get but are important:
* Cost of ingredients like meat, vegetables, etc.
* General economic conditions: are consumers going out to eat?
It sounds like they un-anonymized the data, which strikes me as slightly unethical. (I mean it's not medical data or anything, but I don't think that was the intended use of the anonymized data.)
Further, it seems like the results of this will be used to deny loans to restaurants that are not doing so great, thus ensuring that they fail because they can't get funding for renovations and improvements.
Very nice! I like how you used multiple data sources to enable a study that couldn't be done with just one.