Ten Kilograms of Chocolate

“A little weird initially”

dark chocolate

Chocolate: One of the few tastes we can all agree on. Everyone likes chocolate! It is one of the most popular foods and flavors in the world. Worldwide consumption is estimated at no less than 7.2 million metric tons per year (source). Europeans, in particular, are huge fans of chocolate. In fact, the countries with the highest per-capita chocolate consumption are all in Europe: Switzerland, leading the chart, consumes more than 10 kilos per person per year. Germany (9.2 kilos) and Lithuania (9 kilos) take second and third place (source).

We Europeans love to eat chocolate, and with demand comes supply. There are lots and lots of different chocolates to choose from. This makes it difficult to figure out which is the best chocolate. Obviously, no single chocolate is going to be the favorite of everyone, but most people don’t even know which chocolate they prefer themselves.

At the end of 2017, we started a small experiment: Run a series of blind tastings to find the best chocolate among many different brands. At the end of the event, each participant would know their favorite chocolate among the brands we sampled.

We first describe the setup and execution, followed by the results.

Setup

Humans aren’t very good at tasting multiple items at the same time. A long time ago, we tried to do a blind tasting of different diet colas. After the third glass, most tasters couldn’t remember the flavor of the first sample anymore, and the comparisons became heavily skewed.

This time we decided to reduce tastings to two samples at a time. Specifically, we would run an elimination tournament where each chocolate would compete for the “best chocolate” award.

pieces of chocolate

We had our coworkers as participants and the chocolates as competitors. In each round, the next two chocolates in each participant’s tournament would battle against each other. The participant would be the judge and select the winner, and the loser would be eliminated from the tournament (for that particular participant).
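The bracket logic for one participant can be sketched in a few lines of Python. This is not the Google Apps Script used for the event. `run_single_elimination` and the `judge` callback are hypothetical names, and the random judge only stands in for a real blind tasting. The order of the input list decides the pairings; how each participant's bracket was actually seeded isn't described, so that part is an assumption.

```python
import random
from typing import Callable, List, Tuple


def run_single_elimination(
    chocolates: List[str], judge: Callable[[str, str], str]
) -> Tuple[str, int]:
    """Play one participant's single-elimination tournament.

    `judge(a, b)` stands in for a blind tasting and must return whichever
    of `a` or `b` the participant preferred. Returns (winner, battles played).
    """
    remaining = list(chocolates)
    battles = 0
    while len(remaining) > 1:
        next_round = []
        # Pair up neighbours; the loser of each battle is eliminated.
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(judge(remaining[i], remaining[i + 1]))
            battles += 1
        # With an odd number left, the last chocolate gets a bye.
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining[0], battles


# Hypothetical example: 32 anonymous chocolates and a judge who picks at random.
chocolates = [f"Chocolate {i}" for i in range(1, 33)]
rng = random.Random(2017)
winner, battles = run_single_elimination(chocolates, lambda a, b: rng.choice([a, b]))
print(winner, battles)  # battles is always 31 for 32 competitors
```

Every battle eliminates exactly one chocolate, so the battle count depends only on the number of competitors, never on who wins.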

As an option, we also allowed participants to run a double-elimination tournament, where a chocolate would need to lose twice before being eliminated from the competition. This had two advantages:

  1. One mistake in the tasting wouldn’t eliminate a really good chocolate. It would still have a chance to survive in the loser’s bracket (and even potentially win the whole tournament after winning the loser’s bracket).
  2. More chocolate!
    Participants choosing the double-elimination variant received nearly twice as many samples as the ones running the single-elimination tournament.

The Competitors

When doing blind tastings, one wants to compare samples that resemble each other. Comparing a milk chocolate with a dark chocolate doesn't provide much value, since the outcome simply tells us whether the taster prefers milk or dark chocolate. All other properties, like texture or aftertaste, won't have any significant effect in such a comparison.

In our tasting we limited the chocolates to 70%–75% “pure” dark chocolates. By “pure” we mean chocolates without advertised flavors like “sea salt” or “mint”. Almost every chocolate adds some flavors, but as long as they weren't clearly advertised on the packaging, we allowed the chocolate to compete.

In the end we found 32 chocolates that met our criteria and would compete in the tournaments. The results section lists 35 chocolates for the following reasons:

By mistake, we initially included a chocolate that had a crunchy additive (Toms Extra med Knas). After several tastings we replaced it with a different brand and replayed all battles in which this chocolate had competed. The results already submitted were kept as bonus rounds (outside the tournament).

Similarly, after several rounds, it became clear that one of the chocolates (Levevis) just wasn't up to the task. Of its 27 battles, it had won only four. We decided to put it out of its misery and replace it with another brand. Again, we replayed the battles with the new brand and kept the results already submitted as bonus rounds.

Finally, we received an additional chocolate (Choceur) after the tournament was already in full swing. We distributed this chocolate only in bonus rounds: it wasn't integrated into any tournament, but we collected the results and comments.

cocoa beans

Participants

All in all, more than 35 people participated in the event. Of those, 29 provided significant feedback:

  • 14 finished a double-elimination tournament.
  • 10 finished a single-elimination tournament.
  • The remaining five participants finished at least 10 rounds.

Execution

With 32 competitors (chocolate brands), a single-elimination tournament requires 31 battles to find the winner. For the participants with double-elimination, this number increases to 62. The event as a whole thus required more than 1,200 comparisons. In fact, with guests and bonus rounds, we received more than 1,400 battle results. This number of tastings requires good organization.
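Here is a quick back-of-the-envelope check of these numbers. It assumes that the five participants who did not finish played exactly the minimum of 10 rounds, and it ignores guests and bonus rounds. The function names are illustrative only.

```python
def single_elimination_battles(n: int) -> int:
    # Each battle eliminates exactly one chocolate, and all but one are eliminated.
    return n - 1


def double_elimination_min_battles(n: int) -> int:
    # Each battle hands out exactly one loss. Every chocolate except the
    # champion leaves after its second loss, so at least 2 * (n - 1) battles
    # are needed (one more if the champion also loses once along the way).
    return 2 * (n - 1)


N = 32
single = single_elimination_battles(N)
double = double_elimination_min_battles(N)

# Participant counts from the "Participants" section:
# 14 double-elimination, 10 single-elimination, 5 with at least 10 rounds.
total = 14 * double + 10 * single + 5 * 10
print(single, double, total)  # 31 62 1228
```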

drawer full of chocolate

We wrote a Google Apps Script (sources here) that would help us run the tournament. Together with a Google Form, we were able to streamline the process: After every battle, participants would enter their judgment (“A” or “B” and hopefully some comments) in the Google Form, which automatically filled the individual tournaments in a Google Spreadsheet.

Whenever we administrators were ready to distribute new samples, we launched the script, which gave us a list of samples we needed to prepare. The process was optimized to the point where the papers were printed in a specific order, so that distributing the samples would follow an optimal path through all the participants. Specifically, we avoided visiting the same rooms or floors twice.
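The route itself isn't described in detail. One simple way to get the property mentioned above, never visiting a room or floor twice, is to sort the sheets by floor and then by room before printing. The `Sheet` fields, the `print_order` function and the example locations are all hypothetical; none of them are taken from the event's Apps Script.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sheet:
    participant: str
    round_no: int
    floor: int   # hypothetical field: floor of the participant's desk
    room: str    # hypothetical field: room label on that floor


def print_order(sheets: List[Sheet]) -> List[Sheet]:
    # Sorting by (floor, room) keeps every floor, and every room within a floor,
    # in one contiguous run, so walking the stack in order never revisits either.
    return sorted(sheets, key=lambda s: (s.floor, s.room, s.participant))


# Hypothetical participants and locations.
sheets = [
    Sheet("Dev", 5, floor=2, room="2.10"),
    Sheet("Ana", 5, floor=1, room="1.04"),
    Sheet("Cleo", 4, floor=2, room="2.01"),
    Sheet("Ben", 5, floor=1, room="1.04"),
]
for sheet in print_order(sheets):
    print(f"Floor {sheet.floor}, room {sheet.room}: {sheet.participant} (round {sheet.round_no})")
```

Sorting guarantees that no room or floor is visited twice. It does not necessarily minimize the walking distance within a floor, which would depend on the building layout.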

For single-elimination tournaments we generally prepared samples once a week. For double-elimination, the tastings were distributed up to three times a week. Normally, double-elimination would only double the frequency, but we initially distributed samples to all participants at the same pace and later had to catch up so that everyone would finish at the same time.

Participants received a paper with two pieces of chocolate on it:

piece of paper with 2 chocolates on it

Printed on the paper were the participant's name and the current round number, along with a short link to the Google Form.

screenshot of the Google form

The actual form at the office was slightly different: since we were all Google Apps users, the form automatically collected the email address of the person who filled it out.

For the participants, the tastings were blind. Since we didn’t reveal which chocolates competed, it was also harder to make guesses. Even chocolates with characteristic shapes were often not recognized. For example, one baking chocolate only came in a big chunk and had to be broken down into smaller pieces. If the participants had known that this chocolate was part of the tournament, they would have had an easy time recognizing it. Without that knowledge, most people didn’t even detect that it was a baking chocolate.

Once all participants had finished their tournaments, we revealed all the chocolates and distributed the annotated spreadsheets (again generated by the Google Apps Script). For every battle, the spreadsheet contained a note with the comments that the participant had written.

It took more than six months to finish the event, at which point we had received 1,402 battle results. Sometimes participants forgot to enter their results, and we had to replay the battle, so the actual number of battles was slightly higher. Given that each battle used at least 10 g of chocolate (usually closer to 15–20 g), the event consumed at least 10 kilos of chocolate.
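The 10-kilo figure is a deliberately conservative bound. Multiplying out the numbers from the paragraph above gives the following. This ignores replayed battles, so the real amounts are somewhat higher.

```python
battles = 1402          # battle results received (bonus rounds included)
min_grams = 10          # at least 10 g of chocolate per battle
typical_grams = (15, 20)

lower_bound_kg = battles * min_grams / 1000
typical_kg = tuple(battles * g / 1000 for g in typical_grams)
print(lower_bound_kg)   # 14.02
print(typical_kg)       # (21.03, 28.04)
```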

Results

In the rest of the article we present the results. The names of the participants have been changed to protect their identities.

All the data is available here (the whole spreadsheet) and here (only comments).

The Chocolates

For this event we tasted 35 chocolates. In alphabetical order:

Global Ranking

Each participant had their own tournament, with a single winner at the end. As such, the event wasn't designed to find the best overall chocolate. However, given that we ran more than 1,400 battles (blind tastings of two chocolates), we can use the data to derive a global ranking.

The easiest form of ranking simply looks at the win percentages of each chocolate. With this approach we obtain the following table:

+------+-----------------+------+--------+----------------+
| Rank | Chocolate       | Wins | Losses | Win-Percentage |
+------+-----------------+------+--------+----------------+
| 1    | Bjoernsted      | 72   | 38     | 65%            |
| 2    | Anthon Berg     | 64   | 39     | 62%            |
| 3    | Maestrani       | 62   | 38     | 62%            |
| 4    | Lindt           | 62   | 42     | 60%            |
| 5    | Coop dark       | 61   | 44     | 58%            |
| 6    | Frellsen        | 53   | 39     | 58%            |
| 7    | Spar            | 52   | 40     | 57%            |
| 8    | Merci           | 49   | 39     | 56%            |
| 9    | Moser Roth      | 51   | 41     | 55%            |
| 10   | Nespresso       | 49   | 40     | 55%            |
| 11   | Valrhona        | 46   | 39     | 54%            |
| 12   | Choceur         | 7    | 6      | 54%            |
| 13   | V (vores)       | 45   | 39     | 54%            |
| 14   | Lindt mild      | 46   | 40     | 53%            |
| 15   | Bouchard        | 47   | 43     | 52%            |
| 16   | Carletti        | 44   | 41     | 52%            |
| 17   | Manner          | 42   | 41     | 51%            |
| 18   | Odense          | 42   | 41     | 51%            |
| 19   | Toms Extra      | 46   | 45     | 51%            |
| 20   | Premieur        | 43   | 43     | 50%            |
| 21   | Carino          | 39   | 44     | 47%            |
| 22   | DmBio           | 38   | 45     | 46%            |
| 23   | Fair            | 36   | 44     | 45%            |
| 24   | Zotter          | 36   | 44     | 45%            |
| 25   | Cote d'Or       | 32   | 44     | 42%            |
| 26   | Sarotti         | 31   | 44     | 41%            |
| 27   | änglamark       | 32   | 49     | 40%            |
| 28   | Ritter          | 28   | 43     | 39%            |
| 29   | Coop 365        | 27   | 43     | 39%            |
| 30   | Suchard         | 28   | 45     | 38%            |
| 31   | Marabou Premium | 26   | 43     | 38%            |
| 32   | Toms Extra knas | 9    | 16     | 36%            |
| 33   | Alnatura        | 25   | 45     | 36%            |
| 34   | Oego            | 20   | 44     | 31%            |
| 35   | Levevis         | 4    | 23     | 15%            |
+------+-----------------+------+--------+----------------+
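Computing such a table from raw battle records is straightforward. Below is a minimal Python sketch. It assumes each result is stored as a (winner, loser) pair; the actual spreadsheet layout may differ. The example battles are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def win_percentage_ranking(
    battles: List[Tuple[str, str]]
) -> List[Tuple[str, int, int, float]]:
    """Rank chocolates by win percentage.

    Each battle is a (winner, loser) pair. Returns rows of
    (chocolate, wins, losses, win percentage), best first; ties are
    broken alphabetically so the output is deterministic.
    """
    wins = Counter(winner for winner, _ in battles)
    losses = Counter(loser for _, loser in battles)
    rows = []
    for name in set(wins) | set(losses):
        w, l = wins[name], losses[name]
        rows.append((name, w, l, 100.0 * w / (w + l)))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Hypothetical results: A beat B, A beat C, B beat C, C beat A.
example = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
for name, w, l, pct in win_percentage_ranking(example):
    print(f"{name}: {w}-{l} ({pct:.0f}%)")
```

Every win counts the same here, no matter how strong the opponent was.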

Unfortunately, this ranking is unfair to chocolates that were mostly paired against strong opponents. A better approach takes transitivity into account: if chocolate A won against chocolate B, and B won against C, then we should obtain the ranking A > B > C, regardless of how often A, B, and C won against weaker chocolates.

Constructing a global ranking that takes transitivity into account is a well-known problem. The most famous solution is the Elo rating system, which is widely used in sports such as chess, American football, tennis, or football. However, for our purposes, the Elo system has one major flaw: it takes time, that is, the order of the matches, into account.
Elo assumes that competitors change, and the system adapts the ratings after each match. That is, a win against competitor A some time ago might not have the same impact as a win against the same competitor now. This makes sense in sports, where competitors clearly perform differently over time. It does not apply to our chocolates, which (should) taste the same whenever they are eaten.
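To make the order dependence concrete, here is a minimal Elo sketch. It replays the same four hypothetical battles in two different orders. The K-factor of 32 and the starting rating of 1500 are common conventions assumed here; they were not used in the event.

```python
from typing import Dict, List, Tuple


def elo_update(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    # Expected score of the winner, from the rating difference on the usual 400-point scale.
    expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    # The winner gains what the loser loses; surprising wins move ratings more.
    ratings[winner] += k * (1.0 - expected)
    ratings[loser] -= k * (1.0 - expected)


def play(results: List[Tuple[str, str]], start: float = 1500.0) -> Dict[str, float]:
    ratings: Dict[str, float] = {}
    for winner, loser in results:  # results are processed strictly in order
        ratings.setdefault(winner, start)
        ratings.setdefault(loser, start)
        elo_update(ratings, winner, loser)
    return ratings


# Hypothetical battles: exactly the same four results, replayed in two orders.
results = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "B")]
forward = play(results)
backward = play(list(reversed(results)))
print(forward == backward)  # False: Elo ratings depend on the order of the games
```

Both runs contain exactly the same wins and losses, yet they end with different ratings.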

For our chocolate tournament we therefore use the Bradley-Terry model. It is a probability model that takes all battles into account at once, regardless of when they took place. The result assigns each chocolate a score: the higher the score, the more likely that chocolate is to beat another one.
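In the Bradley-Terry model, each chocolate i has a positive strength p_i, and the probability that i beats j is p_i / (p_i + p_j). The article does not say which software or fitting method produced the scores below. Their scale (e.g. 58.136) suggests a different normalization from the one used here.

The sketch below shows one standard way to fit the model. It is Zermelo's classic fixed-point iteration, also known as a minorization-maximization (MM) algorithm, and it rescales the strengths to average 1, which is an arbitrary choice. `bradley_terry` is a hypothetical helper name, and the battles are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bradley_terry(
    battles: List[Tuple[str, str]], max_iter: int = 10_000, tol: float = 1e-12
) -> Dict[str, float]:
    """Fit Bradley-Terry strengths p, where P(i beats j) = p[i] / (p[i] + p[j]).

    battles: (winner, loser) pairs. Only the counts matter, never the order.
    Assumes every chocolate has at least one win and one loss and that the
    "who beat whom" graph is strongly connected; otherwise no finite fit exists.
    """
    wins = Counter(winner for winner, _ in battles)
    pair_counts = Counter(frozenset(pair) for pair in battles)  # n_ij per pair
    names = sorted({name for pair in battles for name in pair})
    p = {name: 1.0 for name in names}
    for _ in range(max_iter):
        new_p = {}
        for i in names:
            # Sum over opponents j of n_ij / (p_i + p_j).
            denom = sum(n / (p[i] + p[j])
                        for pair, n in pair_counts.items() if i in pair
                        for j in pair if j != i)
            new_p[i] = wins[i] / denom
        # Strengths are only defined up to a common factor; rescale to mean 1.
        scale = len(names) / sum(new_p.values())
        new_p = {name: value * scale for name, value in new_p.items()}
        done = max(abs(new_p[name] - p[name]) for name in names) < tol
        p = new_p
        if done:
            break
    return p


# Hypothetical battles: A beat B twice, B beat C, C beat A.
results = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "B")]
strengths = bradley_terry(results)
print(sorted(strengths, key=strengths.get, reverse=True))  # ['A', 'C', 'B']
```

Because only the win counts and pairings enter the fit, shuffling the battles yields the same strengths up to floating-point rounding. This order-independence is exactly the property Elo lacks.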

The Bradley-Terry scores for our chocolates are:

+------+-----------------+--------+------------+
| Rank | Chocolate       | Score  | Win-% Rank |
+------+-----------------+--------+------------+
| 1    | Bjoernsted      | 58.136 | 1          |
| 2    | Anthon Berg     | 48.875 | 2          |
| 3    | Maestrani       | 48.492 | 3          |
| 4    | Lindt           | 40.445 | 4          |
| 5    | Coop dark       | 40.120 | 5          |
| 6    | Spar            | 39.427 | 7          |
| 7    | Nespresso       | 38.407 | 10         |
| 8    | Frellsen        | 36.812 | 6          |
| 9    | Merci           | 36.439 | 8          |
| 10   | Lindt mild      | 34.188 | 14         |
| 11   | Valrhona        | 34.147 | 11         |
| 12   | Moser Roth      | 34.023 | 9          |
| 13   | V (vores)       | 31.264 | 13         |
| 14   | Odense          | 30.450 | 18         |
| 15   | Premieur        | 30.235 | 20         |
| 16   | Toms Extra      | 29.882 | 19         |
| 17   | Manner          | 29.865 | 17         |
| 18   | Bouchard        | 29.865 | 15         |
| 19   | Choceur         | 29.784 | 12         |
| 20   | Carletti        | 29.512 | 16         |
| 21   | Carino          | 26.084 | 21         |
| 22   | Zotter          | 24.242 | 24         |
| 23   | DmBio           | 23.462 | 22         |
| 24   | Sarotti         | 20.442 | 26         |
| 25   | Fair            | 20.406 | 23         |
| 26   | Cote d'Or       | 19.763 | 25         |
| 27   | Ritter          | 18.524 | 28         |
| 28   | Coop 365        | 17.896 | 29         |
| 29   | Suchard         | 17.816 | 30         |
| 30   | änglamark       | 17.647 | 27         |
| 31   | Marabou Premium | 16.228 | 31         |
| 32   | Toms Extra knas | 15.871 | 32         |
| 33   | Alnatura        | 14.701 | 33         |
| 34   | Oego            | 12.179 | 34         |
| 35   | Levevis         | 4.368  | 35         |
+------+-----------------+--------+------------+

As can be seen, the two rankings mostly agree. The biggest difference is for Choceur, which moved from rank 12 to rank 19: the algorithm assigned it a lower score than its win percentage would suggest. Because it was distributed only as a bonus chocolate, Choceur had just 13 battles, which makes it hard to assign it a stable rank. The scores in the middle of the pack are very similar, and even minor variations can lead to big swings.
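One way to put a number on how unstable a 13-battle record is uses a confidence interval for the win percentage. This was not part of the original analysis. The sketch below uses the Wilson score interval, a common choice for small samples, and it treats battles as independent, which is an assumption. `wilson_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win proportion (z = 1.96 gives roughly 95%)."""
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    centre = (p + z2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return centre - half_width, centre + half_width


# Win/loss counts taken from the win-percentage table above.
for name, wins, losses in [("Choceur", 7, 6), ("Bjoernsted", 72, 38)]:
    lo, hi = wilson_interval(wins, wins + losses)
    print(f"{name}: {wins}/{wins + losses} wins, plausible range {100 * lo:.0f}% to {100 * hi:.0f}%")
```

With only 13 battles, Choceur's interval runs from below 30% to above 75%. That is well over twice as wide as Björnsted's interval, which is based on 110 battles.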

Review

In this section we discuss the top chocolates. During the event, participants were asked to write down their impressions of the chocolates, which we try to summarize here. These summaries won't be perfect: participants don't necessarily agree, and even the same participant sometimes gave opposing impressions in different rounds.

Interested readers can draw their own conclusions by reading all comments here. That document also contains the comments for chocolates we don’t discuss here. Note that the spreadsheet contains even more information, as it features the comments for the battles as well.

Björnsted

Score: 58.136
Price: 2.23€
URL: http://www.bjoernsted.de/en/work/edel-bitter/

Photo of Bjoernsted chocolate

Björnsted ended up having a really great event. It had the highest win percentage, as well as the highest Bradley-Terry score.

Has a great texture and a nice, strong taste of dark chocolate

Of the 24 finished tournaments, it won three, was in the final in two more, and was one of the last four chocolates for five participants.

Word cloud of comments on bjoernsted

Knack. A buttery texture emerges rather soon, with a clear, dry taste of cocoa that lasts for a long time, and some bitterness along the way.

The comments are generally positive (as was to be expected). This chocolate is neutral, with a typical dark-chocolate taste. The chocolate melts nicely.

Anthon Berg

Score: 48.875
Price: 2.01€
URL: https://tomsgroup.com/globalassets/brandsites/export-catalogue-2018-final.pdf, page 16/28 (72% Cacao)

Photo of Anthon Berg chocolate

While Anthon Berg's score is well behind Björnsted's, it still performed very well. It won two tournaments and was among the top four chocolates for five participants.

Interesting flavor, hard to put the right words to this.

It was pitted against Björnsted five times, and won two of these matches.

Word cloud of comments on Anthon Berg

Initial shock, but once gotten used to its pretty ok.

The chocolate is slightly sweet, without overdoing it, and has rich flavors. Participants reported detecting raisins, nuts, fruity notes, milk, cherries, something between cheesy and smoky, and herbal flavors.

According to the back of the packaging, the chocolate may contain traces of nuts and milk, but they are not listed as ingredients. The only listed flavor ingredient is vanilla…

[…] some note at the end of the chew that catches you like the Balrog caught Gandalf from the deep abyss.

Maestrani

Score: 48.492
Price: 1.83€
URL: https://www.maestrani-schokolade.ch/en/product-range (72% Cocoa)

Photo of Maestrani chocolate

Maestrani barely missed the second spot.

Like Anthon Berg, it won two tournaments. It also reached one other final and made the top four for two more participants.

Word cloud of comments on Maestrani

Smoooth

This is the chocolate with the most comments. As it happens, participants who wrote many comments liked this chocolate and therefore kept it in their tournaments longer.

Rather soft. Dry, honest, a bit buttery and sweet after a while, but with a taste that stays clear and well-defined.

The chocolate has a hint of bitterness, but also develops a slightly sweeter buttery aftertaste. Some participants noted a nutty flavor.

Lindt

Score: 40.445
Price: 2.94€
URL: https://www.chocolate.lindt.com/shop/excellence/excellence-cocoa-70

Lindt and Nespresso (rank 7) are the only well-known international brands that made it into the top 10. Lindt won two tournaments, reached the final two more times, and made the top four twice.

Nice and fruity but does not really melt in the mouth as nicely.

Word cloud of comments on Lindt

Some dry cocoa flavor, some bitterness, later sweetness. Still a lot of cocoa in the late aftertaste.

A bit fruity, with a distinct way of melting in the mouth (mentioned by multiple participants). It also has a distinct aftertaste (sometimes noted positively, sometimes negatively).

Coop Dark

Score: 40.120
Price: 2.54€
URL: https://butik.mad.coop.dk//kiosk/chokolade/moerk-chokolade/coop-dark-72-kakao-p-7340011427230

Coop is a cooperative based in Denmark. It operates a good portion of Danish supermarkets. “Coop Dark” is a private label that is available in their chains.

Nice and rounded even though a bit neutral without much aftertaste.

This chocolate did particularly well at the beginning of the event, when its win ratio beat all other chocolates. Over time, it had to compete with stronger chocolates, and it eventually dropped to fifth place (barely behind Lindt in fourth).

Word cloud of comments on Coop Dark

This chocolate stands out for its overwhelmingly positive comments. Even when it lost, the impressions were generally positive. However, some participants didn't like the dry texture.

Has a good smooth texture and a great strong taste of dark chocolate.

A smooth, all-round good chocolate with a typical taste of dark chocolate.

Miscellaneous

This section discusses various interesting facts and results we obtained from this event.

The most expensive chocolate was, by far, Valrhona. At 8.28€ per 100 grams, it cost more than eight times as much as the cheapest chocolate and significantly more than the next most expensive ones (Zotter at 4.86€ and Nespresso at 4.37€).

Valrhona only reached eleventh place, barely missing the top ten. In a funny twist, two participants complained about its “cheap” taste. Participants clearly disagree about this chocolate: opinions range from “Not great, Wouldn’t buy” to “Perfect. just perfect”.
The participant with the most colorful (and most professional-sounding) comments, Tom, ended up liking this chocolate the most. In fact, Tom had Nespresso (the third-most-expensive chocolate) and Valrhona in their final, and Zotter was also eliminated only very late.

The cheapest chocolate was Choceur at 0.95€. It was only distributed as a bonus chocolate (outside the individual tournaments). Interestingly, this chocolate no longer exists. At the end of 2016, VKI (Verein für Konsumenteninformation, an Austrian consumer organization) tested various dark chocolates for pollutants and reported that Choceur (as well as six other chocolates) contained (carcinogenic) mineral oils. As a consequence, Aldi/Hofer removed the chocolate from their shops (https://help.orf.at/stories/2805418/).

In the absence of Choceur, the cheapest chocolate was DmBio at 1.15€. The chocolate did ok, and was ranked in the middle of the pack, at rank 23. The comments rarely mention “cheap”, and the biggest complaint seems to be that the chocolate is “not exiting” and “uninteresting”. That said, one participant complained that “… the taste […] was kind of fishy — oysters came to mind”.

Another chocolate that doesn't exist anymore is Carino. During the event we evaluated Cariño Zartbitter 70%. When we contacted EZA (the manufacturer), they told us that this particular chocolate had been discontinued for production-related reasons (“aus produktionstechnischen Gründen”). This means that Neelix will have to find another favorite chocolate…

We couldn't find the product page for Ritter Sport's Fine Dark 73% either. It seems that this chocolate has been replaced by Ritter Sport Cocoa Selection 74%. We contacted Ritter Sport but haven't received a response so far.

Three chocolates advertise themselves as baking chocolates:

  • Carletti
  • Odense
  • Manner

Generally, participants didn't detect anything special about these chocolates. All three ranked solidly in the middle, close to each other. Still, there may be something special about their flavor or texture: Dolim ended up with all three baking chocolates among their top four contestants, with Carletti winning Dolim's tournament.

Cote d’Or and Suchard are extremely similar. As we found out, both are owned by Mondelēz International, and it feels like Cote d’Or and Suchard are just regionally adjusted versions of the same chocolate. We found one minor difference in the ingredient list, but overall the chocolates seem to taste the same. Their flavor is not universally liked, though, and the selected-comments section below features both chocolates.

Fourteen participants finished a double-elimination tournament. For half of them, the final changed the outcome: for seven participants, the winner of the winner's bracket did not end up winning the tournament.
In four tournaments, the loser of the winner's bracket final went on to win the loser's bracket final and thus faced the same opponent again (a rematch of the winner's bracket final). In all four cases, the loser's bracket winner won the tournament.

Selected Comments

Each comment is prefixed with the name of the participant who wrote it, and each prefixed comment comes from a separate battle. For example, the two comments from Miles on Manner come from different rounds.

Carino:

lursa: Taste like a cigar smells when it hasn’t been lighted yet
worf: Smoky. Weird.

Coop Dark:

nog: smells like honey and tobacco.

Cote d’Or:

Kira: “taste like iron”
Quark: “Tasted like blue cheese. Yuck!”
Michael: “ Not great, too smoky, seaweedy?”
Lursa: “This tasted like it had been smoked, yak”
Jean-Luc: “Not good. Smells and tastes smokey. “ashtray” comes to mind.”
Kurn: “Burnt and bitter”

Fair:

tom: Knack. Funny … rather anonymous at first, later on some sweetness/acid, but also an ever so subtle flavor of vomit (..no it’s not terrible, but it is funny..).

Manner:

miles: Somewhat surprised by the taste. I think I like it though.
miles: Weird. Wouldn’t buy. (though it _does_ seem to grow on me)
Lursa: just YAK!

Marabou Premium:

ezri: Absolutely awful! Tasted like bandaids

Premieur:

lursa: absolutely horrible, there were something in it that shouldn’t be there but can’t tell what it was…vradjjjj bad!
daniels: WTF is that aftertaste?

Suchard:

lursa: yak, that tasted like gasoline! really dry aftertaste
martok: This chocolate has cheesy taste and flavor. Usually I don’t like it, but this time it was surprisingly pleasant.
martok: Nice dark chocolate. The taste and the flavor are mostly cocoa, but there is some subtle yet distinct addition which reminds of something smoked or maybe about some sort of cheese. Anyways, it’s a nice addition.
jean-luc: Texture is good. Blue cheese taste. Not good.

Toms Extra:

quark: Tastes like dill…
martok: Well, it’s a nice dark chocolate. But I couldn’t throw away an image of pickled cucumbers out of my head while chewing it for some reason. Very interesting flavor. And it’s surprising that I didn’t noticed the flavor in any round before.
lursa: I think there was mynthe in this one, yearkkk
lursa: Is there mynthe in this? weird is what comes to mind:-)

Zotter:

lursa: absolutely horrible — I’d rather lick an ashtray — yak!
lursa: taske tobacco ish…yak!

Do One Yourself

All the tools needed to run a similar experiment are online: A template spreadsheet is available here, and a detailed README on how to use it is here.

If you like chocolate, also consider trying to bake this amazing chocolate almond cake.

Have fun!

Thanks to Lasse R.H. Nielsen and Matias Meno for their helpful comments.

Photos from this post: