1 Introduction
Do the following thought experiment: You are the human resources manager of a company and you are assigned the task of hiring a new employee. After advertising the position, you receive several dozen applications from candidates listing their skills and credentials (e.g., grade point average, work experience, programming skills). You can determine each candidate’s potential only after inviting him or her for an interview. Let us assume that you can interview candidates sequentially and that you can decide to stop interviewing and hire a candidate after each interview. Crucially, making the effort to interview another candidate is costly. What is the best way to organize the interview process? First, you need to decide the order in which you will be inviting candidates. Then, after each interview you need to decide whether to make an offer to one of the interviewed candidates, thus stopping your search. The first problem is an ordering problem and the second a stopping problem.
Clearly, if you could perfectly estimate the potential of the candidates on the basis of their credentials you could directly choose the best one by using decision analytic methods (e.g., Keeney & Raiffa, 1993). On the other hand, if the credentials were not at all informative, you would have to invite people at random, and your problem would reduce to an optimal stopping problem. Such models have been developed formally in statistics (DeGroot, 1970) and economics (Stigler, 1961; Lippman & McCall, 1976), and human behavior in them has been tested empirically in psychology (Rapoport & Tversky, 1966; Lee, 2006), economics (Schotter & Braunstein, 1981; Hey, 1987; Sonnemans, 1998) and marketing (Zwick, Rapoport, Lo & Muthukrishnan, 2003). Intuitively, most everyday decision-making problems lie between these two extreme cases; in reality, the attributes of the alternatives can be used to predict their utility, but only imperfectly. There often remains some amount of uncertainty that cannot be explained by the attributes.
There has been some work on ordered search: In a seminal paper, Weitzman (1979) put forward a general model for ordering alternatives and terminating search in search problems with recall, where decision makers initially have partial information about the alternatives but learn their true utility after paying a search cost. Although Weitzman’s results readily generalize to scenarios with multi-attribute alternatives, only a recent study by Dzyabura (2013) considered explicitly how sequential search can be guided by a multi-attribute utility model. Earlier, Roberts and Lattin (1991) presented a model of consideration set formation in which the decision makers include alternatives in their consideration set guided by a compensatory multi-attribute utility model. In essence, Roberts and Lattin’s model can be seen as an ordered search model in which the number of alternatives that will be searched has to be decided once and for all before any search is performed (Footnote 1). However, the authors did not connect their results to search theory. Further, Moorthy, Ratchford and Talukdar (1997) employed Weitzman’s model to derive predictions about the length of search in consumer choice. They further specified the original model and assumed that decision makers are uncertain about a brand’s utility. The consumer beliefs in their model are probabilistic, and more experienced consumers are better at differentiating brands in terms of utility. Moorthy et al. tested their predictions on data from the car market. Last, in economics, several papers started from the assumption that agents search the alternatives in an externally imposed (Arbatskaya, 2007) or subjectively defined (Bagwell & Ramey, 1994; Armstrong, Vickers & Zhou, 2009) order and studied the aggregate market behavior.
In this paper we show that, when the decision makers’ preferences can be described by a linear utility model, the problems introduced by Weitzman have an intuitive and psychologically plausible solution. Returning to our example, we will analytically show that the optimal policy is to follow the estimated utility order prescribed by your subjective utility model and then stop when the expected return from seeing one more candidate for the job turns negative. In essence, the utility models play the role of cognitive search engines, generating the order in which alternatives are examined. We formally develop this approach and apply it to three models that have been studied extensively in the field of judgment and decision making: (i) multi-attribute linear utility, (ii) equal weighting of attributes and (iii) a single-attribute heuristic. The simpler models (ii) and (iii) have been shown to perform well under some conditions (e.g., scarce information available for calibrating models) in one-shot choice problems (Barron, 1987; Gigerenzer, Todd & the ABC Research Group, 1999; Fasolo, McClelland & Todd, 2007; Katsikopoulos, 2011). We then compare the performance of the models in 12 real-world environments ranging from consumer choice to industrial experimentation and examine how the models’ expected utility order and estimation error influence their performance and length of search.
Conceptually, our approach illustrates that optimal stopping problems assuming random search and one-shot choice problems are the boundary cases of an ordered search problem with imperfect information. It provides a formal framework within which the assumptions made by ordered search models, such as those proposed by Bagwell and Ramey (1994), Moorthy et al. (1997) and Armstrong, Vickers and Zhou (2009), can be further clarified. In practice, our approach extends discrete choice models by specifying the exact search process. It is a plausible alternative to Roberts and Lattin’s (1991) theory of consideration set formation and it further advances our understanding of decision-making in environments with rank-ordered alternatives.
In what follows, as in the model presented by Weitzman (1979), we focus on a scenario in which decision makers learn the exact utility of an alternative after sampling it and can always choose alternatives that they have sampled in the past. In Section 2 we develop a formal framework of optimal ordering and stopping. In Section 3 we test three models in 12 real-world environments ranging from consumer choice to industrial experimentation and examine the ordering and estimation error components of the models. In Section 4 we discuss the possibility of connecting our findings to other work, assess the conceptual and applied implications of our approach, and finally discuss the limitations and possible extensions of our framework.
2 The theoretical framework
2.1 The environment
There are n alternatives A_1,...,A_n. Each alternative A_i, i ∈ {1,...,n}, is associated with a vector of attributes a_i = (a_i1,...,a_im) and a utility u_i of choosing it. The u_i are unknown, but the a_i are known to the decision maker. The decision maker estimates u_i by f(a_i). We assume that the estimation errors ε_i, defined by u_i = f(a_i) + ε_i, are iid Gaussian with mean µ and standard deviation σ. We call this equation the decision maker’s subjective model. For each alternative, the decision maker can learn the utility u_i only by sampling the alternative and paying a cost c. The decision maker can sample as many alternatives as desired, and she can eventually choose only among the sampled ones. If the decision maker searches k alternatives, the total cost paid is kc and the decision maker chooses, out of these k, the alternative with the highest utility.
2.2 The optimal strategy
Let S denote the set of alternatives already searched and S̄ denote the set of alternatives not yet searched. That is, S̄ = {A_1,...,A_n} ∖ S.
The decision maker’s problem is to determine the search order and the stopping rule. Let y denote the maximum utility that the decision maker can obtain from the alternatives in S, that is, y = max_{A_k ∈ S} u_k. If the decision maker sampled just one more item A_k ∈ S̄ before stopping search, the subjective expected gain (i.e., the increase in utility minus the cost) would be (probabilities and expectations below are based on the decision maker’s subjective model):

R(A_k) = P(u_k > y) × E(u_k − y | u_k > y) − c.     (1)
It is intuitive that the decision maker should keep on sampling as long as there exists an alternative A_k ∈ S̄ whose subjective expected gain satisfies R(A_k) > 0. (If R(A_k) < 0 for all A_k ∈ S̄, search should be stopped.) Given this, the decision maker should sample the alternative A_k that achieves the maximum subjective expected gain R(A_k). It turns out that, to maximize R(A_k), it suffices to select the alternative with the highest f(a_k) (for a proof, see Appendix 1):
Result 1. If for two alternatives A_i, A_j ∈ S̄ we have f(a_i) > f(a_j), then R(A_i) > R(A_j).
This result says that, if the decision maker decides to sample one more item before terminating the search, then she should sample the alternative with the maximum unconditional expectation E(u_i) = f(a_i). This suggests the following policy:
Selection rule: Order the alternatives based on their unconditional expectation. Select the items for sampling in this order.
Stopping rule: If at any stage the subjective expected gain of the next alternative in this order is negative, terminate the search.
Note that the stopping rule can be applied only if the standard deviation of the estimation error σ is estimated. On the other hand, for the selection rule to be applied, only the parameters of the multi-attribute function f(a_i) need to be estimated. For σ = 0 the decision-making problem reduces to a single choice. For c = 0 the decision maker searches all the alternatives. Note that for c = 0 and when there are only two alternatives the model reduces to the probit model (Footnote 2). If σ = ∞, P(u_i > y) is equal to 0.5.
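To make the policy concrete, the following sketch (our own Python illustration, not code from the paper) implements the selection and stopping rules. It relies on the fact that, under the Gaussian error assumption, the expected gain in equation 1 has the closed form R(A_k) = (m_k − y)Φ(z) + σφ(z) − c with m_k = f(a_k) + µ and z = (m_k − y)/σ; all function and variable names are our own.

```python
import numpy as np
from scipy.stats import norm

def expected_gain(m_k, y, sigma, cost):
    """Subjective expected gain of sampling one more alternative (equation 1).

    m_k   -- subjective expected utility of the candidate alternative, f(a_k) + mu
    y     -- highest utility discovered so far
    sigma -- standard deviation of the Gaussian estimation error
    cost  -- search cost c
    """
    if sigma == 0:
        return max(m_k - y, 0.0) - cost
    z = (m_k - y) / sigma
    # E[(u_k - y)^+] for Gaussian u_k, minus the cost of search
    return (m_k - y) * norm.cdf(z) + sigma * norm.pdf(z) - cost

def guided_search(estimates, utilities, sigma, cost):
    """Selection rule: examine alternatives in decreasing order of estimated utility.
    Stopping rule: stop when the expected gain of the next alternative is negative."""
    order = np.argsort(estimates)[::-1]
    searched = [order[0]]
    y = utilities[order[0]]          # true utility revealed after paying the cost
    for k in order[1:]:
        if expected_gain(estimates[k], y, sigma, cost) <= 0:
            break
        searched.append(k)
        y = max(y, utilities[k])
    return y, searched
```

With σ = 0 the sketch reduces to a one-shot choice of the top-ranked alternative, and with c = 0 it examines every alternative, mirroring the boundary cases noted above.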
2.3 Subjective models
In our framework, the order of the alternatives A_i is based on their subjective utility f(a_i). Thus, the decision maker’s eventual success depends upon the multi-attribute utility function he or she uses to estimate utilities and generate an order of the alternatives. We present three psychologically plausible and widely used multi-attribute utility functions:
1. Multi-attribute linear utility (MLU): f(a_i1,...,a_im) = ∑_j β_j a_ij.
MLU is one of the cornerstone models in research on multi-attribute decision making (Keeney & Raiffa, 1993). It is also the model used to derive consumer preferences in conjoint analysis surveys in marketing. MLU is the equivalent of multiple linear regression, which has been widely studied as a model of inductive inference in multi-cue learning (Hammond, Hursch & Todd, 1964).
2. Equal-weighted linear utility (EW): f(a_i1,...,a_im) = ∑_j a_ij, where all the attributes a_i1,...,a_im are normalized and brought to the same scale.
EW is a special case of MLU in which all decision weights β_j are equal. It was originally proposed as an alternative to multiple linear regression by Dawes and Corrigan (1974).
3. Single-attribute utility (SA): f(a_i1,...,a_im) = a_ij, where attribute j has the highest ecological validity among a_i1,...,a_im, as expressed by Kendall’s tau non-parametric correlation.
Versions of the SA model have been studied by Hogarth and Karelaia (2005). The SA model is akin to the lexicographic heuristic (Payne, Bettman & Johnson, 1993; Kohli & Jedidi, 2007) and the take-the-best heuristic (Gigerenzer & Goldstein, 1996; Katsikopoulos, Schooler & Hertwig, 2010). However, SA resolves ties between alternatives by choosing at random, whereas the lexicographic heuristic and take-the-best examine additional attributes.
Note that the three presented models can also be seen as cardinal estimation models, where the criterion value corresponds to the utility. Davis-Stober, Dana and Budescu (2010b) have pointed out that the notation of utility and estimation models can be used interchangeably.
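To fix ideas, the three subjective models can be calibrated from training data roughly as follows. This is a minimal sketch under our own naming conventions (the paper does not prescribe an implementation); X is assumed to be a matrix of attribute values, already normalized to the 0–1 scale required by EW, and u the vector of observed utilities.

```python
import numpy as np
from scipy.stats import kendalltau

def fit_mlu(X, u):
    """Multi-attribute linear utility: least-squares weights beta_j
    (no intercept, matching the formula above)."""
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    return lambda a: np.asarray(a) @ beta

def fit_ew(X, u):
    """Equal weighting: unit weights on attributes brought to the same scale."""
    return lambda a: np.sum(np.asarray(a), axis=-1)

def fit_sa(X, u):
    """Single attribute: keep only the attribute with the highest ecological
    validity, measured here by Kendall's tau with the utility in the training set."""
    taus = [abs(kendalltau(X[:, j], u)[0]) for j in range(X.shape[1])]
    j_best = int(np.argmax(taus))
    return lambda a: np.asarray(a)[..., j_best]
```

Each fitted model returns a function that maps an attribute vector (or a matrix of alternatives) to estimated utilities, which in turn define the search order.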
2.4 An example
We illustrate how the different components of equation 1 play out with a concrete example (see Figure 1). Consider a scenario in which a decision maker is searching in an online store to buy a single album of an up-and-coming music band she just heard about on the radio. The band has produced three albums so far. The decision maker can learn the exact utility of an album by listening to its songs. Her subjective beliefs are described by the SA model. As represented by the straight line in Figure 1, the decision maker believes that the utility of an alternative can be estimated by f(a_i) = 0.3 a_i, where a_i is the average rating of the album by other users of the site. As represented by the bell-shaped curves, she believes that the estimation error ε_i of her model is iid Gaussian with mean µ = 0 and standard deviation σ = 0.5. The decision maker first samples album K, which has the highest expected utility. She finds out that the utility of album K is 1.87, which is slightly less than its expected utility. Then she has to decide whether it is worthwhile to examine album L. Following equation 1, the expected return from sampling album L can be written as P(u_L > u_K) × E(u_L − u_K | u_L > u_K) − c. Here P(u_L > u_K) = 0.33 and E(u_L − u_K | u_L > u_K) = 0.329, and their product equals 0.109 (Footnote 3). Thus, the decision maker will examine the second album if the cost of search is lower than 0.109; otherwise she will stop search, choose album K, and never learn the actual utility of album L. Let us assume that the cost is 0.05. Then the decision maker samples album L. She finds out that the utility of L for her is 2.27, which is higher than both her expectation and the utility of album K. Thus, L replaces K as the sampled album with the highest utility (y in equation 1). Now the return from sampling album M can be written as P(u_M > u_L) × E(u_M − u_L | u_M > u_L) − c, with P(u_M > u_L) = 0.003 and E(u_M − u_L | u_M > u_L) = 0.152. The product of these two parts equals 0.0005. Thus the overall return is negative and the decision maker will stop search after sampling album L and choose it. She will never learn the realized utility of album M.
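As a check, the numbers in this example follow from the Gaussian closed form of equation 1. The expected utility of album L is not stated in the text; the value 1.65 used below is a hypothetical choice that is consistent with P(u_L > u_K) ≈ 0.33 when σ = 0.5 and y = 1.87.

```python
from scipy.stats import norm

sigma, cost = 0.5, 0.05
y = 1.87      # realized utility of album K, the best alternative so far
m_L = 1.65    # assumed expected utility of album L (illustrative, not from the text)

z = (m_L - y) / sigma
p = norm.cdf(z)                                # P(u_L > y), about 0.33
returns = (m_L - y) * p + sigma * norm.pdf(z)  # P(...) * E(...), about 0.109
print(round(p, 2), round(returns, 3), returns - cost > 0)  # net gain positive: album L is sampled
```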
3 Results
We applied the three models to study guided search in real-world problems. We examined the performance of the models in 12 data sets ranging from consumer choice to industrial experimentation. All the datasets included a variable with positive, more-is-better valence, which we treated as the utility. To implement the EW model, all the attribute values were normalized to a 0–1 scale. When the correlation between an attribute and the criterion value in the test data set was positive, we converted the lowest attribute value to 0 and the highest to 1. When the correlation was negative, we converted the highest attribute value to 0 and the lowest to 1. We used ordinary least squares to calibrate the parameters β_j of the three models. For each of the models, we estimated the standard deviation of the error component σ using the standard error of the corresponding linear regression model, computed from the residuals between the utilities and the model estimates ŷ over the T alternatives in the training set, where ŷ is the estimate of the linear regression model with parameters β_j.
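A sketch of this calibration step, under our reading of the procedure (the variable names, and the use of the mean squared residual over the T training alternatives, are our assumptions):

```python
import numpy as np

def normalize_attribute(x, utility):
    """Rescale an attribute to the 0-1 range; reverse it when it correlates
    negatively with the criterion, so that higher values always mean better."""
    x = (x - x.min()) / (x.max() - x.min())
    if np.corrcoef(x, utility)[0, 1] < 0:
        x = 1.0 - x
    return x

def calibrate_sigma(predictions, utilities):
    """Estimated standard deviation of the error component: the standard error of
    the model over the T alternatives of the training set (division by T assumed)."""
    residuals = np.asarray(utilities) - np.asarray(predictions)
    return float(np.sqrt(np.mean(residuals ** 2)))
```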
3.1 The environments
The 12 datasets analyzed are freely available from online databanks. We provide an additional reference list of the publications in which the datasets were originally reported in the supplementary material. As reported in Table 1, the datasets are characterized by a diverse number of attributes, alternatives, and intercorrelations between the attributes. We normalized the utility variable, setting the utility of the alternative with the lowest value equal to 0 and that with the highest value equal to 1. This transformation was necessary in order to achieve comparability across the datasets. The remaining variables are the attributes that were used to predict the alternatives’ utility. In a few cases we excluded variables that were not suitable as attributes in a choice problem. In the white wine quality and the red wine quality environments we reduced the number of alternatives to 200 by drawing them once at random from the initial dataset. This transformation was implemented to generate a plausible decision-making ecology. The ecological characteristics of the resulting environments were very similar to those of the complete environments reported in Table 1 (Footnote 4). We used the same subset of 200 alternatives consistently in all the analyses reported in this paper.
3.2 Performance
The performance of a model depends (1) on the order it generates, which determines how quickly it discovers high quality alternatives, and (2) on the estimated standard deviation of the error component σ, which influences the subjective probability that the utility of the next alternative to be sampled will be higher than the utility of the best alternative discovered up to that point. Clearly, when σ increases, the subjective probability and consequently the expected returns from further search also increase. Finally, the model performance depends (3) on the cost of search in the environment. Factor 3 is an environmental factor, while factors 1 and 2 reflect how well a model captures the properties of the environment.
First, we manipulated the cost of search and pitted the models against each other in eight different cost conditions. For the purposes of our simulation we adapted the technique of cross-validation to a search problem. We fixed the parameters β_j and σ corresponding to each model in half of the data set (training set) and evaluated the performance of the models in the remaining half (test set). This process was repeated 10,000 times in total. For each repetition the alternatives that were part of the training and test sets were drawn at random from the entire data set. As a result, the maximum utility that could be achieved by all models in each of the repetitions differed slightly, depending on the utilities of the alternatives that were part of the training and test sets. The models first sampled the alternative that they estimated to have the highest expected utility. Then each model decided according to equation 1 whether to proceed to search the alternative with the second highest expected utility, then the third, and so forth. When a model stopped search, the alternative with the highest utility among the alternatives searched up to that point was chosen. The measure of performance was the average utility achieved by a model over the 10,000 repetitions. In real life, this corresponds to the average performance of 10,000 decision makers, with randomly sampled experiences, who all followed the optimal policy. This measure can also be seen as an approximation of the expected payoff for a single decision maker.
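The simulation loop can be summarized by the following sketch; it is our own illustration, reusing the guided_search and calibrate_sigma helpers sketched above, and assumes that X and u hold the attributes and utilities of one environment and that fit_model is one of fit_mlu, fit_ew, or fit_sa.

```python
import numpy as np

def run_condition(X, u, fit_model, cost, n_reps=10_000, seed=0):
    """Split-half cross-validation of one subjective model in one environment:
    calibrate on a random half, search the other half, and average the utility
    of the chosen alternative over the repetitions."""
    rng = np.random.default_rng(seed)
    achieved = np.empty(n_reps)
    n = len(u)
    for r in range(n_reps):
        idx = rng.permutation(n)
        train, test = idx[: n // 2], idx[n // 2:]
        f = fit_model(X[train], u[train])               # calibrate the weights
        sigma = calibrate_sigma(f(X[train]), u[train])  # estimated error std
        best_utility, _ = guided_search(f(X[test]), u[test], sigma, cost)
        achieved[r] = best_utility
    return achieved.mean()                              # average performance
```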
Average results from the 12 environments for all eight cost conditions are presented in Table 2 and the results from each of the 12 environments individually are presented in Figure 2. The best performing model was MLU. The performance differences were largest for high search costs and they gradually attenuated as the costs decreased. Clearly, a high cost implies that the models are likely to search only a few alternatives at the beginning of the order, and the models may have different orders. As the costs decrease, all the models search further down the expected utility line and are more likely to discover the high-quality alternatives in the environment. As a result, the performance differences diminish.
There is much variability in performance in individual environments, where in several cases the simpler models outperformed MLU for the entire cost range. For example, EW performed best for most of the cost conditions in the beer aroma, cheese taste, CPU efficiency, olive oil quality, and potato taste environments, while SA performed best in the red wine quality, tea quality, and octane quality environments. Thus, although MLU performed better on average, it clearly outperformed the simpler models only in the remaining four environments. The better performance of MLU at the aggregate level was mainly driven by its superior performance in the white wine quality environment. As stressed earlier, the model-specific factors that may influence the models’ performance are on the one hand the order generated by each model and on the other hand the estimated standard deviation. In the rest of the Results section we decouple the roles of these two factors.
3.3 Impact of search order on performance and search length
To disentangle the impact of the search order from that of the estimated standard deviation of the error component σ, we first examined the average return to search (as measured by the average utility of the best searched alternative), achieved by each multi-attribute utility model, for all possible search lengths k, without implementing any search cost. Moreover, we compared these results to random search, which corresponds to the assumption made in most optimal-stopping studies. As we did for the full task, we cross-validated the models on half of the data points and we repeated our simulations 10,000 times.
As shown in Figure 3, the largest difference between the subjective models and random search is found in environments with a high R² for the best-fitting linear regression. In the environments with the highest R², such as CPU efficiency and octane quality, the best solution was almost always located in one of the first search trials by all the multi-attribute models. In contrast, in environments with low R², such as beer aroma, potato taste, and fluorescent lamp lifetime, the margin is smaller and in some cases random search performed almost as well as the multi-attribute models.
In all environments there is a close correspondence between the average return to search as a function of the search length and the performance of the model in the full task, as depicted in Figure 2. For most costs, the model with the highest average return to search is also the best performing model in the full task. In four of the five environments in which EW performed best in the full task (beer aroma, cheese taste, CPU efficiency, potato taste) it also performed best for most of the search lengths. Similarly, in the three environments where SA performed best in the full task (red wine quality, tea quality and octane quality) it also performed as well as or better than the other two models for most search lengths in the search task. Overall, there are a few cases, such as the tea quality environment, where different models lead to best performance for different search lengths k.
Note that the factors that influence the performance of a model also directly influence the length of search, as observed in Figure 4. According to the theory presented in Section 2, the utility of the best alternative secured so far, y, and the probability P(u_k > y) that the utility u_k of the next alternative A_k down the expected utility line will exceed y are inversely related. This implies that models that place high-utility alternatives earlier in the search order face lower expected gains from continued search and should, ceteris paribus, search less. Indeed, this is what we observe in Figures 3 and 4. However, to fully understand the performance and search length of the models we also need to come to grips with the role played by the standard deviation of the error component σ. The next section examines exactly that.
3.4 The role of the standard deviation σ
The second factor that has an impact on the performance of the model and the length of search is the estimated standard deviation of the error component σ̂. To ensure good performance, the models’ error component should correspond to the unexplained uncertainty in the environment. For an estimated σ̂ = 0 the model would deem that the first alternative in the order is also the best one. For σ̂ = ∞ the model would calculate P(u_k > y) = 0.5 (following equation 1). However, both these models would be maladaptive in environments in which the uncertainty that cannot be explained by f(a_i) is moderate. A lower standard deviation of the error component implies a decrease in the subjective probability P(u_k > y) and a decrease in the expected returns from finding a better alternative, E(u_k − y | u_k > y). Thus, ceteris paribus, it should lead to a shorter search.
Within models, the estimated standard deviation of the error component in our simulations is inversely related to the accuracy of the estimates of the model in the training set in which the model was fitted. Consequently, in most cases a good ordering should be accompanied by a relatively low error component and in tandem they lead to a reduced length of search. Note, however, that there could be a discrepancy between the accuracy of the estimates of a model and the average returns to search. Remember that only the best alternative discovered so far counts for the decision maker in the search task.
Between models, MLU tends to capture a larger proportion of the uncertainty that can be explained by the attributes a_i (σ̂_MLU = 0.166) and usually has a slightly lower estimated standard deviation than EW and SA (σ̂_EW = 0.191 and σ̂_SA = 0.191). As reported in Figure 4, only in the olive oil quality and potato taste environments was a heuristic model (EW) found to have an estimated standard deviation of the error component lower than the full model (MLU).
As seen in Figures 2–4, the models that achieve higher average returns to search (Figure 3) search less (Figure 4; see Table 3 for average results) and score better on the full task (Figure 2). The differences between the models’ estimated standard deviations are marginal in most environments. In half of the environments they correspond to performance differences in the average returns to search. This suggests that the error components of the models are sufficiently well calibrated to the unexplained uncertainty in the environment. There are a few cases where a different estimated standard deviation might lead the models to markedly different performance. For example, in the CPU efficiency environment EW and MLU have similar returns to search. MLU has a much lower estimated standard deviation σ̂, searches less than EW, and performs worse in the full task. This indicates that MLU may occasionally miss a very good alternative and could benefit from searching more.
3.5 Paired-comparison results
So far, we have discussed how different properties of the search models influence their performance and we have looked at specific environments to understand how these factors play out in practice. We have illustrated that the most crucial factor is the search order in which the models sample the alternatives, although this has to be accompanied by a well-calibrated estimate σ̂ of the standard deviation of the error component. But is there a way to predict which strategy is most likely to be successful in a given environment? The conditions under which different linear or heuristic models perform well in choice and inference contexts have been thoroughly investigated (for a review see Katsikopoulos, 2011). A few studies generalize beyond binary choice to choices among several alternatives or among the entire data set, but most of the existing literature has focused on binary choices. To answer our question we attempt to shed light on the connections between the novel task we presented and the well-studied paired-comparison task.
Thus, we examined whether there is a correspondence in performance between the binary choice task and the search task. We compared the performance of the three models in the binary choice task. As before, we fixed the parameters corresponding to each model in half of the dataset and evaluated their performance on all possible binary choices in the remaining half. The process was repeated 10,000 times in total.
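A model’s binary-choice performance in the test half can be scored over all possible pairs, as in the following sketch (our own code; the treatment of ties as a random choice is our assumption):

```python
import numpy as np
from itertools import combinations

def pairwise_accuracy(estimates, utilities):
    """Proportion of all pairs of test alternatives in which the model picks the
    alternative with the higher true utility; pairs with equal true utility are
    skipped, and estimation ties are scored as a coin flip (0.5)."""
    correct, total = 0.0, 0
    for i, j in combinations(range(len(utilities)), 2):
        if utilities[i] == utilities[j]:
            continue
        total += 1
        d_est = estimates[i] - estimates[j]
        d_true = utilities[i] - utilities[j]
        if d_est == 0:
            correct += 0.5
        elif d_est * d_true > 0:
            correct += 1.0
    return correct / total
```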
As seen in Table 4, on average, MLU performed best, followed by SA and then EW. In the individual environments, the performance of the models varied significantly. There were four environments in which EW performed best and one in which SA did. In general, the performance in the binary choice task is a good proxy of performance in the full search task. The model that performed best in the paired-comparison task also performed best in 7 of the 12 environments in the search task for cost equal to 1/23. The discrepancies observed can be attributed to the fact that in the full search task, the alternatives that are searched early on contribute disproportionally to the success of a model. In contrast, in binary choices all possible single choices in the data set contribute equally to the performance of the model. Hence, it is possible that the model that performs best in the search task does not maintain that superior performance in the binary choice task, and vice versa.
4 Discussion
4.1 Conceptual implications of our framework
4.1.1 Deterministic optimization and search models: A possible compromise
In optimization problems, widely studied in economics, decision makers can determine the alternative that maximizes their utility. This vision of decision making contrasts with search models in which the decision makers sample alternatives at random and stop search after encountering a good enough alternative (e.g., Simon, 1955; Chow, Robbins & Siegmund, 1971; Caplin, Dean & Martin, 2011). Random sampling may lead to violations of the revealed preference principle and unpredictability in regard to the choices of individual decision makers. Ordered search models provide a possible compromise between these two approaches. Decision makers have a well-defined utility function before the search starts. However, as long as there is some uncertainty about the exact utility of the alternatives, it may pay to sample some of them to learn their utility. In our model, the initial preferences guide the search process but are also subject to revision when the true utility of the sampled alternatives is revealed. For an external observer, such as a firm or a market analyst, ordered search is more predictable at the level of an individual decision maker than random search. In ordered search, if the model of the decision maker and the actual utility of the alternatives are known, the external observer can also predict the final decisions made, as well as the preference reversals that occur along the way.
4.1.2 Evaluating the hypotheses of ordered search models
Employing our framework, we can examine the conditions under which the assumptions postulated in previous ordered search models hold true. Bagwell and Ramey (1994) suggested that decision makers may use a simple rule of thumb and first consider buying products from firms that advertise more (Footnote 5). This corresponds to an SA model in which the amount of advertising is the most informative cue. This, indeed, is plausible in some cases. However, the SA model implies that when more informative attributes are available, the amount of advertising may be completely ignored as an attribute. Armstrong et al. (2009) assumed that consumers search products according to an abstract attribute called prominence and suggested that firms might be willing to invest to achieve prominence and improve their position in the search order. Our framework allows us to estimate the exact impact of an intervention in the attributes of a product on the firm’s prominence in the market. Finally, Moorthy et al. (1997) suggested that the expectation of the utility of a brand’s products is normally distributed. In addition, they suggested that more experienced consumers are better able to differentiate between products. Both assumptions are in line with our framework.
4.2 Applied implications of our framework
4.2.1 Sequential search and consideration set formation
Our modeling approach suggests that only the alternatives that have been sampled by the decision makers stand a chance of being selected. Similarly, several marketing scientists have advanced choice models in which the decision makers first restrict their attention to a subset of the alternative set — commonly called the consideration set. The decision makers then examine the alternatives of this subset more closely and finally choose one of the alternatives in it (e.g., Wright & Barbour, 1977; Shocker, Ben-Akiva, Boccara & Nedungadi, 1991; Gilbride & Allenby, 2004; Moe, 2006). Such models have often been found to outperform, in fitting and prediction, discrete choice models in which the decision makers are assumed to consider all the alternatives (e.g., Gilbride & Allenby, 2004). Our approach shares some of its assumptions with a popular model of consideration set formation put forward by Roberts and Lattin (1991). In their model, decision makers, whose pay-off function is described by a compensatory multi-attribute utility model with an additional error term, decide which alternatives to include in their consideration set. Similar to our approach, the decision makers examine the alternatives in order of decreasing expected utility as predicted by their utility model, paying a fixed cost for every new alternative they place in their consideration set. In contrast to our model, the decision makers do not learn the exact utility of the alternatives immediately after paying the cost. Instead, they learn all the utilities of the alternatives in the consideration set right before they choose among them. Our model can be seen as the sequential search counterpart of Roberts and Lattin’s theory of consideration set formation. Inversely, Roberts and Lattin’s model can also be understood as an ordered search model in which the search length has to be decided at the outset. The exact domain of application of sequential search and fixed-sample-size models of consideration-set formation may depend on the exact characteristics of the decision-making context and should be the subject of further empirical investigation in the future.
4.2.2 Parallels to online ranking schemes
There are clear-cut parallels between the search approach we present here and the methods used by commercial search engines and recommendation systems to pre-rank the alternatives for Internet users. Such ranking schemes implement a valence function that maps the attributes of the alternatives to their relevance or utility for the user (Burges, Shaked, Renshaw, Lazier, Deeds, Hamilton & Hullender, 2005; Hüllermeier, Fürnkranz, Cheng & Brinker, 2008; Yaman, Walsh, Littman & Desjardins, 2011). The valence functions underlying the ranking schemes are trained with clickstream decision data or other information that can reveal the preferences of the user (Sarwar, Karypis, Konstan & Riedl, 2000). Then, similar to what our theory suggests, the ranking schemes present the alternatives in decreasing order of relevance or utility. This approach is implemented without invoking a formal decision-making theory that predicts how decision makers will choose on the basis of the presented rank order. If the goal of the ranking-scheme engineers is to increase the utility derived by the users, a formal decision-making theory, such as search theory, might further inform the development of ranking schemes as well as the techniques used to train their valence functions. Indeed, there have been recent papers in machine learning on why the design of ranking techniques can benefit from taking into account how people actually decide in rank-ordered environments (Agichtein, Brill & Dumais, 2006; Chapelle, Metzler, Zhang & Grinspan, 2009). We take a step in that direction and illustrate the role of the cost of search and the uncertainty in the environment in the search process.
4.3 Connections to prescriptive and descriptive decision making
4.3.1 From choice to search
For the paired-comparison problem, where the task is to choose one of two alternatives, the accuracy of heuristics such as EW and SA has been analyzed and compared to that of the full linear model in numerous studies. Until now, one thread of the existing literature in judgment and decision making has examined binary choices in environments with binary or continuous attributes, either without making any assumptions about the mapping from an alternative’s attributes to its utility (e.g., Katsikopoulos et al., 2010) or when the mapping is characterized by noise (Hogarth & Karelaia, 2005; Hogarth & Karelaia, 2006; Rieskamp & Otto, 2006; Davis-Stober, Dana & Budescu, 2010a; Davis-Stober et al., 2010b). A second thread of the literature has focused on environments with binary or continuous attributes, where a mapping between the attributes and the utility exists but decision makers have imprecise knowledge about the attribute weights (Johnson & Payne, 1985; Martignon & Hoffrage, 2002; Hogarth & Karelaia, 2005; Baucells, Carrasco & Hogarth, 2008; Katsikopoulos, 2013).
Given the observed correspondence between the choice task and search task results, the findings of the first thread of studies on binary choice tasks may also generalize to the search task. Overall, these studies have found no large performance differences between the heuristics and the full model, and that heuristics can outperform the full model under the appropriate conditions. Both EW and SA fare especially well in out-of-sample prediction (Einhorn & Hogarth, 1975; Hogarth & Karelaia, 2005; Katsikopoulos et al., 2010). The SA model tends to perform well when a simply or cumulatively dominating alternative is present (Baucells, Carrasco & Hogarth, 2008; Şimşek, 2013; Katsikopoulos, Egozcue & Garcia, 2014), or when there exist high correlations between the single attribute and all other attributes (Hogarth & Karelaia, 2005; Davis-Stober et al., 2010a, 2010b). EW tends to perform well when the variability in cue validities is small or when there are high intercorrelations between all the attributes (Einhorn & Hogarth, 1975; Wainer, 1976). In the environments that we studied we also found support for some of these findings; for example, in the four environments in which EW performed best in binary choice, the relevant difference was small.
Although our first results suggest a relation between the search task and the choice task, additional research is required in the future to establish this relation and to identify the conditions under which differences in the relative performance of models in the two tasks are to be expected. A further follow-up to our study would be to examine when decision makers choose a certain strategy (Bröder, 2003) and how decision makers learn to act adaptively and to select a strategy that performs well in a given environment (e.g., Rieskamp & Otto, 2006).
4.3.2 Psychological plausibility of the stopping rule
So far we have shown that the proposed stopping policy is optimal. But is it psychologically plausible? When sampling from a known distribution, with or without recall, an optimally acting decision maker should always stop right after encountering an alternative with a value higher than the optimal threshold. Several variations of such optimal threshold problems have been studied extensively in psychology and economics. We have identified five studies reporting results from experiments in which subjects sampled from a known distribution with recall (Rapoport & Tversky, 1970; Schotter & Braunstein, 1981; Hey, 1987; Kogut, 1990; Sonnemans, 1998). In sum, a moderate proportion of the participants in these studies behaved in a manner consistent with the optimal stopping rule. Common discrepancies from the optimal strategy included stopping too early and exercising recall. Nonetheless, the researchers found that the average performance of the participants was near optimal. Hey (1982) and Sonnemans (1998) reported that many subjects used heuristic strategies that appeared consistent with the optimal rule and led to near-optimal performance. In the stopping policy we have presented, the decision maker should at every search step reevaluate the returns from sampling the next alternative in line. This task may appear demanding in relation to the optimal threshold rule. However, it has the same structure as a simplified version of a signal-detection problem (e.g., Green & Swets, 1966) — a task in which humans are known to perform fairly well. Thus, we believe that the stopping policy could be psychologically plausible as such, or it could be well approximated by clever heuristic algorithms. Clearly, human behavior in the task has to be investigated experimentally in the near future.
4.3.3 Alternative search algorithms
We showed that, under the homoscedasticity assumption embedded in many linear models and in the subjective linear utility models outlined in this paper, the intuitive policy of searching the alternatives in the order of their subjective expected utility is optimal. In addition, in this special case our policy coincides with another simple algorithm, called directed cognition, which has been proposed by Gabaix, Laibson, Moloche and Weinberg (2006). This algorithm searches myopically, as if the next step of the search process were also the last one. When the assumption of a homogeneously distributed error component ε does not hold, the behavior suggested by our ordering policy diverges from Weitzman’s (1979) optimal solution and from Gabaix et al.’s (2006) directed cognition algorithm. Gabaix et al. (2006) showed that, even in simple cases of indexable problems like those discussed by Weitzman (1979) and Gittins, Glazebrook and Weber (1989), decision makers are unlikely to follow the optimal algorithm and instead decide in line with the directed cognition algorithm. In the future, as in Gabaix et al. (2006), one could examine experimental scenarios in which the behavior prescribed by the three discussed algorithms diverges in order to evaluate their psychological plausibility in different decision-making environments.
4.4 Extensions and limitations
The current model could be extended to scenarios with different search costs for different alternatives or when the decision maker can choose an alternative directly without paying the sampling cost. Further, the model could be readily extended to cases when the decision maker can choose more than one alternative. In addition, it is possible to account for contexts where new alternatives become available at a later point in time. The decision maker can construct a search order for the new alternatives and consider if it is worthwhile to sample some of them, examining first the alternative with the highest expected utility.
In our model we have assumed that decision makers have direct and free access to the attributes all at once. We did not discuss cases where the decision makers have to pay a search cost to learn additional attributes before moving forward to examine further alternatives. A model of this kind with random sampling is presented by Lim, Bearden and Smith (2006) and has been investigated empirically by Bearden and Connolly (2007). Further, as in search problems, we have assumed that the decision maker learns the exact utility of an alternative after paying a search cost and examining it. However, there are many dynamic decision-making contexts where the cost is internally defined as an opportunity cost when consuming an inferior alternative (Nelson, 1970). In these environments information acquisition can be inherently noisy and decision makers may want to sample the alternatives repeatedly. Such decision-making contexts are commonly referred to as multi-armed bandits. In fact, when we change the costly sampling assumption to repeated sampling, in which the final pay-off is the sum of the experienced utilities, our framework turns into a multi-armed bandit with contextual information. This type of bandit framework has been receiving increasing attention in recent years in machine learning (Pavlidis, Tasoulis & Hand, 2008; Li, Chu, Langford & Schapire, 2010).
So far we have postulated that decision makers have stable preferences and an accurate error estimate of their model throughout the entire search process. This strong assumption may hold precisely or it may approximate the truth in many decision-making environments, especially when decision makers have long experience testing alternatives. However, in some environments decision makers may not know the utility weights but rather learn them along the way as they examine new alternatives. This approach has been followed by Dzyabura (2013), who assumed that decision makers update the estimates of the weights and the search order after each new alternative they examine. Clearly, when decision makers learn their preferences the ordering and stopping rules derived in our paper are not guaranteed to be optimal. A fully rational policy would have to anticipate the future evolution of decision makers’ preferences and then build any beliefs about preference change into the search and stopping rules. Even in simplified scenarios this approach is known to be computationally intractable. In similar preference-learning problems encountered in machine learning, greedy heuristics are implemented instead to balance preference learning and exploitation (Brochu, De Freitas & Ghosh, 2007; Brochu, Cora & De Freitas, 2010). Another approach would be to compare, for any given search length, the performance of alternative active learning algorithms that are optimal in some respect (Fedorov, 1972; Sugiyama & Nakajima, 2009).
4.5 Conclusion
In two recent publications Luan, Schooler and Gigerenzer (2011, 2014) stressed the need to integrate decision-making theories in psychology and illustrate how apparently disparate models share common conceptual ground. In the same vein, we argued that choice and search problems, which until now have been studied separately, are the boundary cases of a broader decision-making problem. We showed how three choice models that have been extensively studied in the field of judgment and decision making can guide the search for good alternatives, and we formulated their corresponding optimal stopping rule. Then we compared the performance of the models in 12 real-world environments ranging from consumer choice to industrial experimentation and illustrated how each model’s expected utility ordering and estimation error influence its performance and length of search. As in previous model comparisons in one-shot choice problems, we found that the heuristic linear models performed on average close to the multi-attribute linear utility model. Moreover, in individual environments the heuristic models often outperformed the full model. To further understand when such results are to be expected, we examined the relationship between the search problem and the well-studied binary choice problem. We found that in most cases the models that performed well in the binary choice task also did so in the search task. This suggests that previous findings on the ecological rationality of choice and inference strategies are also relevant to the search task. Finally, we discussed the connections of our model to the existing literature and suggested possible paths for future research.
Appendix 1: Proof of result 1
First notice that for any alternative A_k,

R(A_k) = P(u_k > y) × E(u_k − y | u_k > y) − c = ∫_y^∞ (u − y) dF_k(u) − c,

where F_k denotes the decision maker’s subjective distribution of u_k.
For alternatives A_i and A_j let α = f(a_i) − f(a_j). Assuming α > 0, we will show that R(A_i) > R(A_j). The distribution of u_i is the same as the distribution of u_j + α; that is, F_i(u) = F_j(u − α) ∀ u ∈ ℜ. Therefore,

R(A_i) = ∫_y^∞ (u − y) dF_j(u − α) − c = ∫_{y−α}^∞ (v + α − y) dF_j(v) − c ≥ ∫_y^∞ (v + α − y) dF_j(v) − c > ∫_y^∞ (v − y) dF_j(v) − c = R(A_j),

where the first inequality holds because the integrand is nonnegative on [y − α, y], and the second is strict because α > 0 and the Gaussian error gives P(u_j > y) > 0.
Q.E.D