2008-goldstein.pdf: “Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, (2008; ):
Individuals who are unaware of the price do not derive more enjoyment from more expensive wine.
In a sample of more than 6,000 blind tastings, we find that the correlation between price and overall rating is small and negative, suggesting that individuals on average enjoy more expensive wines slightly less. For individuals with wine training, however, we find indications of a non-negative relationship between price and enjoyment. Our results are robust to the inclusion of individual fixed effects, and are not driven by outliers: when omitting the top and bottom deciles of the price distribution, our qualitative results are strengthened, and the statistical-significance is improved further.
These findings suggest that non-expert wine consumers should not anticipate greater enjoyment of the intrinsic qualities of a wine simply because it is expensive or is appreciated by experts.
2009-bohannon.pdf: “Can People Distinguish Pâté From Dog Food?”, (2009-04-01; ):
Considering the similarity of its ingredients, canned dog food could be a suitable and inexpensive substitute for pâté or processed blended meat products such as Spam or liverwurst. However, the social stigma associated with the human consumption of pet food makes an unbiased comparison challenging.
To prevent bias, Newman’s Own dog food was prepared with a food processor to have the texture and appearance of a liver mousse. In a double-blind test, subjects were presented with 5 unlabeled blended meat products, one of which was the prepared dog food. After ranking the samples on the basis of taste, subjects were challenged to identify which of the 5 was dog food.
Although 72% of subjects ranked the dog food as the worst of the 5 samples in terms of taste (Newell and MacFarlane multiple comparison, p < 0.05), subjects were not better than random at correctly identifying the dog food.
“Pure Exploration for Multi-Armed Bandit Problems”, (2008-02-19):
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is to contribute to a better understanding of the performance in terms of identifying the m best arms. We introduce generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings. In the fixed-confidence setting, we provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m is larger than 1 under general assumptions. In the specific case of two armed-bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli . These results show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observed when testing fully specified alternatives. In addition, we also provide improved sequential stopping rules that have guaranteed error probabilities and shorter average running times. The proofs rely on two technical results that are of independent interest : a deviation lemma for self-normalized sums (Lemma 19) and a novel change of measure inequality for (Lemma 1).
“D-TS: Double Thompson Sampling for Dueling Bandits”, (2016-04-25):
In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As indicated by its name, D-TS selects both the first and the second candidates according to Thompson Sampling.
Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison by sampling twice from the posterior distribution. This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as its special case.
For general Copeland dueling bandits, we show that D-TS achieves 𝑂(K2 log T) regret. For Condorcet dueling bandits, we further simplify the D-TS algorithm and show that the simplified D-TS algorithm achieves 𝑂(K log T + K2 log log T) regret. Simulation results based on both synthetic and real-world data demonstrate the efficiency of the proposed D-TS algorithm.