Today Salon’s Machinist tech blog touched on the flaws of the many online review sites: “Why a five-star restaurant serves one-star food.”

The writer opens with an anecdote about Yelp — a review site that, in my experience, has long epitomized these flaws (see my lone Yelp review). The writer stepped into a weekend breakfast café that thirty-eight Yelp reviewers gave an average rating of 4.5 stars (out of 5 possible stars), and yet he perplexingly “gulped down limp slabs of two-star French toast, sipped at one-star coffee, and took in the ordinary two-star ambience.” What gives?

In the case of Yelp, I believe it’s the form of the “popularity contest” they’ve created, the nature of Yelp reviewers, and the dynamics Yelp has created to incent these reviewers. (Some readers here may recall that Yelp approached me in their very early days to write café reviews for them, and my reply was that I created this site back in 2003 directly out of my frustration with sites like Yelp.)

Reviewers on sites like Yelp tend to weigh unusual biases: a perceived value for the money (“all-you-can-eat soggy French toast for $2?!…Five stars!”), extra credit for obscurity and the hipness quotient that obscurity bestows on the establishment as well as the reviewer (“Frank Chu’s mom doesn’t make French toast for everybody, but we go way back and she deserves to be Yelped…Five stars!”), and so on. It has gotten to the point where I find CitySearch’s restaurant user reviews more useful than Yelp’s.

While you might say that Yelp is CitySearch’s user reviews with the “Web 2.0” veneer of social networking, unfortunately that social element creates a competition between Yelp users and serves as a major underlying driver of the reviewing process. Too often the game isn’t about good, fair, and accurate reviews (externally focused); it’s about ego and online social posturing (internally focused).

Response bias

But none of that is mentioned by the writer of the Salon article. Rather, he pursues the problem of what’s called response bias. It manifests itself in skewed ratings where, as often happens, most everybody is better than the average — a mathematical impossibility. For example, Yelp CEO Jeremy Stoppelman told the writer that “85 percent of local businesses on the site get a three-star or better average rating.”

But it’s not just star inflation either. User ratings on such sites tend to cluster at the high and low ends of the scale, with relatively few ratings in between. Perhaps ambivalent, middle-of-the-road reviews just don’t inspire any of us to submit our thoughts to a Web site. We have to love it or hate it.

All of this comes back to my motivations for personally reviewing as many cups of espresso as I could find in this town. Sure, nobody can review everything with consistency. But just how valuable is my four-star rating for a Burmese restaurant if I’ve been to very few for comparison?

So in the interest of full disclosure of my own espresso reviews here, some self-examination was in order. Evaluating my 595 (and counting) espresso ratings in San Francisco alone (which includes establishments since closed), what follows here are the average ratings for a variety of different rating criteria — each made on a scale of 0-10:

Criteria        Avg. Rating   Notes
aroma           5.339
brightness      5.533         This was a touch higher than I expected
body            5.316
crema           5.237
flavor          5.427         This was also surprisingly a touch high
correction      -0.105        Cupper’s correction was slightly negative
coffee rating   5.2654

And then there is the similar question of my café rating criteria:

Criteria        Avg. Rating   Notes
ambiance        5.939         Serious inflation!
barista         4.834         Here is where I held the toughest standards
presentation    5.602         I gotta get tougher on paper cups
savvy           4.946         Not too generous here either
cafe rating     5.3302        Saved by ambiance!
net rating      5.2978
price           $1.699        That comes to $1,010.91 for 595 espressos
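
As a quick arithmetic check, the composite figures in these two tables appear to follow from the individual averages. The sketch below is only that: it assumes (the formulas are not stated here, they are inferred from the published numbers) that the coffee rating is the plain mean of the five cup criteria plus the cupper’s correction, that the café rating is the plain mean of the four café criteria, and that the net rating is the midpoint of the two. All variable names are illustrative.

```python
# Averages copied from the two tables above.
coffee_criteria = {
    "aroma": 5.339,
    "brightness": 5.533,
    "body": 5.316,
    "crema": 5.237,
    "flavor": 5.427,
}
correction = -0.105  # cupper's correction

cafe_criteria = {
    "ambiance": 5.939,
    "barista": 4.834,
    "presentation": 5.602,
    "savvy": 4.946,
}

avg_price = 1.699    # average price per espresso, in dollars
num_espressos = 595


def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# ASSUMPTION: these three formulas are inferred from the published
# figures, not taken from a documented scoring method.
coffee_rating = mean(coffee_criteria.values()) + correction
cafe_rating = mean(cafe_criteria.values())
net_rating = (coffee_rating + cafe_rating) / 2

# The average price is rounded to three decimals, so this total is approximate.
total_spent = avg_price * num_espressos

# Published values from the tables, for side-by-side comparison.
published = {
    "coffee rating": 5.2654,
    "cafe rating": 5.3302,
    "net rating": 5.2978,
    "total spent": 1010.91,
}
computed = {
    "coffee rating": coffee_rating,
    "cafe rating": cafe_rating,
    "net rating": net_rating,
    "total spent": total_spent,
}

for name, value in computed.items():
    diff = value - published[name]
    print(f"{name}: computed {value:.5f}, published {published[name]}, diff {diff:+.5f}")
```

Under those assumptions each computed figure lands within rounding distance of the published one, which suggests the composites are simple unweighted averages rather than anything weighted toward a favorite criterion.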

All things considered — outside of my clearly biased ambiance ratings — I’m pretty happy with the results of this spot check. If you consider that 5.0 should be about the center point, these averages aren’t out of whack. And I dare you to find many 10s or 0s on our reviews.