May 28, 2026 | Research

The Project Three Star Team

[Image: Fine dining room with plated dishes and Michelin-style restaurant ambiance]

Since 1936, Michelin has updated its guidebooks to award up to three stars to each restaurant it inspects, or to take stars away. The inspection process is kept secret to avoid conflicts of interest, and stars are awarded sparingly. They have become one of the most recognized and respected awards a restaurant can achieve, and thousands of chefs around the world devote their careers to the pursuit of a single star.

In fact, chefs take the stars so seriously that renowned figures such as Gordon Ramsay have wept over losing them, and some chefs have even taken their own lives.

“I started crying when I lost my stars. It's a very emotional thing for any chef. It's like losing a girlfriend. You want her back. I think every top chef in the world, from Alain Ducasse to Guy Savoy, when you lose a star it's like losing the Champions League. There's next year. So it's not done forever that you can't win those things back. I got asked the question literally two weeks ago on holiday. What would you do if you ever lost your third star? Honestly? I would win it back. It's nice to stay focused.”
— Gordon Ramsay, Celebrity Chef

And it’s no wonder. A single star can send customers flocking to a restaurant, bringing both fame and profit. The late Joël Robuchon, who held a record 32 Michelin stars, once said that “with one Michelin star, you get about 20 percent more business. Two stars, you do about 40 percent more business, and with three stars, you’ll do about 100 percent more business.”

Customers and investors alike are therefore keen to predict which restaurants will eventually earn a Michelin star before the competition gets fierce.

In the latest training set, the imbalance is stark: 58,633 Michelin-covered examples are treated as No Star, compared with 2,709 one-star restaurants, 473 two-star restaurants, and just 139 three-star restaurants.

That makes correctly identifying a starred restaurant so difficult that a standard ML model would probably score lower accuracy than a single line of code that always predicts No Star. The model has to find rare signal without turning every expensive, highly rated restaurant into a false three-star prediction.
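The arithmetic behind that baseline can be checked directly from the class counts quoted above. The sketch below is illustrative only; the function name is ours, not part of the project's code. It computes the accuracy and the four-class macro F1 of a rule that always predicts No Star.

```python
# Class counts quoted in the article for Michelin-covered training examples.
counts = {"No Star": 58_633, "1 Star": 2_709, "2 Stars": 473, "3 Stars": 139}
total = sum(counts.values())


def always_no_star_scores(counts):
    """Accuracy and macro F1 of a rule that always predicts 'No Star'."""
    n_rows = sum(counts.values())
    true_pos = counts["No Star"]      # every real No Star row is correct
    false_pos = n_rows - true_pos     # every starred row is mislabelled as No Star
    accuracy = true_pos / n_rows
    # F1 = 2TP / (2TP + FP + FN); FN is 0 because No Star is always predicted.
    f1_no_star = 2 * true_pos / (2 * true_pos + false_pos)
    # The three star tiers are never predicted, so each contributes an F1 of 0.
    macro_f1 = f1_no_star / len(counts)
    return accuracy, macro_f1


accuracy, macro_f1 = always_no_star_scores(counts)
print(f"accuracy = {accuracy:.2%}, macro F1 = {macro_f1:.2%}")
# accuracy = 94.64%, macro F1 = 24.31%
```

The baseline is right almost 95% of the time while finding no starred restaurant at all, which is why accuracy is a poor yardstick here and macro F1 is the more informative metric.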

All validation results in this section are calculated only on restaurants in places covered by the Michelin Guide. Restaurants in non-covered locations are excluded from the benchmark.

| Metric | Value | Description |
|---|---|---|
| Validation rows | 61,954 | audited restaurant rows used after Michelin-coverage filtering |
| No-star examples | 58,633 | canonical No Star examples in Michelin-covered places |
| Starred examples | 3,321 | known 1-, 2-, and 3-star examples in the training frame |
| Scored universe | 72,798 | frontier-LLM-audited restaurant rows scored for the live table |
| No Star | 50,978 | live audited restaurant universe predicted without stars |
| 1 Star | 21,484 | live audited restaurant universe predicted at the one-star tier |
| 2 Stars | 180 | live audited restaurant universe predicted at the two-star tier |
| 3 Stars | 156 | live audited restaurant universe predicted at the three-star tier |

| Model | City Macro F1 | Pooled Macro F1 | Log loss | 3-star F1 |
|---|---|---|---|---|
| Project Three Star Model 1.5 | 19.53% | 44.70% | 0.283 | 21.16% |
| XGBoost | 21.91% | 35.14% | 0.326 | 24.68% |
| LightGBM | 21.53% | 33.89% | 0.265 | 12.43% |
| Weighted probability ensemble | 21.65% | 36.62% | 0.273 | 22.60% |
| Random Forest | 21.76% | 31.95% | 0.351 | 25.56% |
| ExtraTrees | 21.71% | 32.86% | 0.338 | 22.44% |
| Balanced Random Forest | 21.65% | 29.34% | 0.736 | 23.46% |
| Ordinal XGBoost | 21.80% | 40.33% | 0.289 | 24.77% |
| Ordinal LightGBM | 21.66% | 37.46% | 0.157 | 24.11% |
| Project Three Star Model 1.0 | 20.41% | 39.54% | 0.110 | 4.05% |
| Stacked ensemble | 16.70% | 48.22% | 0.101 | 25.41% |
| Class-prior / always-No-Star baselines | 1.13% | 24.31% | 0.240 | 0.00% |

Validation uses city-grouped out-of-fold splits so the model is tested on held-out geographies instead of memorizing a city’s restaurant mix. The currently published Project Three Star Model uses tuned class-offset decisions for public exact-tier labels. On fresh grouped validation, that public rule reaches 19.53% city Macro F1, 44.70% pooled Macro F1, and 89.79% exact-tier accuracy across the four star-tier classes. Accuracy is reported for context only; it is not the selection metric because an always-No-Star model already looks deceptively strong on accuracy.
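To make the idea of city-grouped splits concrete, here is a minimal sketch of one way to assign folds so that no city appears on both sides of a split. It is not the project's pipeline: the greedy balancing strategy, the function name, and the toy city list are all assumptions for illustration. In practice a library splitter such as scikit-learn's `GroupKFold` serves the same purpose.

```python
from collections import defaultdict


def city_grouped_folds(cities, n_folds=5):
    """Assign each row to a fold so that all rows from one city share a fold."""
    rows_by_city = defaultdict(list)
    for row_index, city in enumerate(cities):
        rows_by_city[city].append(row_index)

    # Place the largest cities first, each into the currently smallest fold,
    # which keeps fold sizes roughly balanced.
    fold_sizes = [0] * n_folds
    fold_of_row = [None] * len(cities)
    for city in sorted(rows_by_city, key=lambda c: (-len(rows_by_city[c]), c)):
        fold = fold_sizes.index(min(fold_sizes))
        for row_index in rows_by_city[city]:
            fold_of_row[row_index] = fold
        fold_sizes[fold] += len(rows_by_city[city])
    return fold_of_row


# Toy example: one city label per restaurant row (hypothetical data).
cities = ["Paris", "Paris", "Tokyo", "Lyon", "Tokyo", "Osaka", "Lyon", "Paris"]
folds = city_grouped_folds(cities, n_folds=3)

for held_out in range(3):
    test_cities = {c for c, f in zip(cities, folds) if f == held_out}
    train_cities = {c for c, f in zip(cities, folds) if f != held_out}
    assert not (test_cities & train_cities)  # no city straddles the split
```

Because every row from a city lands in the same fold, a model scored on the held-out fold has never seen that city during training.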

| Metric | Value | Description |
|---|---|---|
| City Macro F1 | 19.53% | fresh 5-fold city-group validation, all labels counted per city |
| Pooled Macro F1 | 44.70% | pooled four-class macro F1 across all validation restaurants |
| Starred recall | 92.08% | known starred restaurants recovered by the Project Three Star Model |
| Precision@100 | 99% | share of the top 100 starred-probability rows that were starred |
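
Precision@100 is described above as the share of the top 100 starred-probability rows that were starred. A minimal sketch of that calculation follows, assuming each row carries a single starred-probability score and a boolean label; the function name and the toy data are hypothetical.

```python
def precision_at_k(scores, is_starred, k=100):
    """Share of the k highest-scored rows whose true label is starred."""
    ranked = sorted(zip(scores, is_starred), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    if not top:
        raise ValueError("need at least one scored row")
    return sum(1 for _, starred in top if starred) / len(top)


# Toy example with five rows (made-up scores and labels).
scores = [0.99, 0.95, 0.90, 0.40, 0.10]
is_starred = [True, True, False, True, False]
p_at_3 = precision_at_k(scores, is_starred, k=3)
print(f"precision@3 = {p_at_3:.2%}")
```

The metric only looks at the head of the ranking, so it says nothing about how many starred restaurants sit further down the list.
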

Rows are actual tiers and columns are predicted tiers; diagonal cells are correct calls.

| | Predicted No Star | Predicted 1 Star | Predicted 2 Stars | Predicted 3 Stars |
|---|---|---|---|---|
| Actual No Star | 42,506 | 16,101 | 9 | 17 |
| Actual 1 Star | 48 | 2,595 | 37 | 29 |
| Actual 2 Stars | 0 | 417 | 32 | 24 |
| Actual 3 Stars | 0 | 96 | 14 | 29 |
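
Per-class precision, recall, and F1 can be read off a confusion matrix like the one above. The sketch below copies the printed counts and is illustrative only. Its results describe this matrix alone; the headline metrics elsewhere in the article may be computed differently (for example, averaged per fold or per city), so the two need not agree.

```python
LABELS = ["No Star", "1 Star", "2 Stars", "3 Stars"]

# Rows are actual tiers, columns are predicted tiers, copied from the matrix above.
MATRIX = [
    [42_506, 16_101, 9, 17],
    [48, 2_595, 37, 29],
    [0, 417, 32, 24],
    [0, 96, 14, 29],
]


def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for k in range(len(matrix)):
        true_pos = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column total
        actual_k = sum(matrix[k])                    # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = predicted_k + actual_k
        f1 = 2 * true_pos / denom if denom else 0.0
        metrics[k] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(MATRIX)
for k, (precision, recall, f1) in metrics.items():
    print(f"{LABELS[k]:>8}: precision={precision:.2%} recall={recall:.2%} F1={f1:.2%}")
macro_f1_from_matrix = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)
print(f"macro F1 = {macro_f1_from_matrix:.2%}")
```

For the one-star column this gives high recall with low precision: most one-star restaurants are recovered, but many No Star restaurants are also called one-star.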

The matrix also shows the central limitation: the model recovers known starred restaurants aggressively, but distinguishing 1-, 2-, and 3-star tiers is still hard because there are so few 2- and 3-star examples. We publish those rows as ranked signals, not next-guide guarantees.

Before choosing the public star-tier setup, we compared raw class probabilities, sigmoid-calibrated variants, class-offset tuning, and a weighted Project Three Star Model + LightGBM + XGBoost probability ensemble. The live public tier now follows the tuned weighted class-offset rule rather than raw highest-probability argmax, using probability quality for ranking while thresholding rare classes before publication.
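The article does not spell out the exact form of the class-offset rule, so the following is only a sketch of one common variant: blend the models' class probabilities with fixed weights, then pick the class that maximizes log-probability plus a per-class offset instead of taking the plain argmax. The weights, offsets, and probabilities below are made up for illustration and are not the project's tuned values.

```python
import math

TIERS = ["No Star", "1 Star", "2 Stars", "3 Stars"]


def weighted_ensemble(prob_rows, weights):
    """Blend per-model class probabilities with non-negative weights."""
    total_weight = sum(weights)
    n_classes = len(prob_rows[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_rows)) / total_weight
        for k in range(n_classes)
    ]


def offset_decision(probs, offsets):
    """Pick the class maximizing log(p_k) + offset_k (assumed form of the rule)."""
    scores = [math.log(max(p, 1e-12)) + b for p, b in zip(probs, offsets)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical class probabilities from three models for one restaurant.
model_probs = [
    [0.60, 0.30, 0.07, 0.03],
    [0.55, 0.35, 0.06, 0.04],
    [0.65, 0.25, 0.06, 0.04],
]
weights = [0.5, 0.25, 0.25]     # made-up ensemble weights
offsets = [0.0, 1.0, 0.5, 0.5]  # made-up per-class offsets

blended = weighted_ensemble(model_probs, weights)
raw_choice = max(range(len(blended)), key=blended.__getitem__)
tuned_choice = offset_decision(blended, offsets)
print(TIERS[raw_choice], "->", TIERS[tuned_choice])  # No Star -> 1 Star
```

In this toy case the raw argmax says No Star, while the offset rule promotes the restaurant to the one-star tier. Offsets of this kind trade precision for recall on rare classes, which is why they would be tuned on validation data and then frozen before being retested.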

[Figure: Out-of-fold calibration reliability line charts for in-guide and starred predictions]
The dotted diagonal represents perfect calibration. Probability quality matters for ranking, but the production decision was made on fresh city-grouped exact-tier validation.

Why it matters

We used probability diagnostics to check whether higher scores actually corresponded to higher hit rates before choosing a public exact-tier model.
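
A reliability check of this kind can be sketched as follows: group predictions into score bins and compare the mean predicted probability with the observed hit rate in each bin. This is a minimal illustration, assuming scores lie in [0, 1] and outcomes are 0 or 1; the bin count and the toy data are arbitrary choices, not the project's settings.

```python
def reliability_bins(scores, outcomes, n_bins=10):
    """Compare mean predicted probability with observed hit rate per score bin."""
    bins = [[] for _ in range(n_bins)]
    for score, outcome in zip(scores, outcomes):
        index = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 joins the last bin
        bins[index].append((score, outcome))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than report a meaningless rate
        mean_score = sum(s for s, _ in members) / len(members)
        hit_rate = sum(o for _, o in members) / len(members)
        table.append((mean_score, hit_rate, len(members)))
    return table


# Toy example (made-up scores and outcomes).
scores = [0.05, 0.10, 0.15, 0.80, 0.85, 0.90, 0.95]
outcomes = [0, 0, 1, 1, 1, 0, 1]
table = reliability_bins(scores, outcomes, n_bins=2)
for mean_score, hit_rate, n in table:
    print(f"mean score {mean_score:.3f} | hit rate {hit_rate:.3f} | n={n}")
```

A well-calibrated model produces rows where the mean score and the hit rate are close, which corresponds to points lying near the diagonal of a reliability chart.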

What the graph shows

LightGBM and the weighted ensemble were useful probability references, but the final public setup was chosen only after frozen weights and offsets were retested on fresh city-grouped splits.

How we use it

The public exact-tier label uses the tuned class-offset decision rule from the weighted model rather than raw probability argmax, so rare star tiers are thresholded after the frontier-LLM restaurant-only audit and before publication.

These are the highest-confidence restaurants on the live frontier-LLM-audited watchlist that currently hold no star. They are not presented as next-cycle guarantees; they are the restaurants the Project Three Star Model most strongly believes deserve a closer look.

| Rank | Restaurant | City | Current status | Star score |
|---|---|---|---|---|
| 1 | Arany Kaviár | Budapest | Selected Restaurants | 99.95% |
| 2 | Cafe Monarch | Phoenix | Not in Michelin Guide | 99.95% |
| 3 | Ishizuka | Melbourne | Not in Michelin Guide | 99.95% |
| 4 | Marie B | Prague | Selected Restaurants | 99.95% |
| 5 | Anomaly SF | San Francisco | Selected Restaurants | 99.94% |
| 6 | Onyx | Budapest | Selected Restaurants | 99.94% |
| 7 | The Water Library | Bangkok | Not in Michelin Guide | 99.94% |
| 8 | Upstairs (at Trinity) | London | Bib Gourmand | 99.94% |
| 9 | California Grill | Orlando | Not in Michelin Guide | 99.93% |
| 10 | Persona | Stockholm | Selected Restaurants | 99.93% |

The full prediction table contains the broader ranked universe across Michelin-covered and non-covered places, including guide candidates, current Michelin restaurants, and lower-confidence rows.
