Pure Hyperbolic Discount Curves Predict
“Eyes Open” Self-Control

George Ainslie
Veterans Affairs Medical Center, Coatesville PA, USA
University of Cape Town, South Africa
George.Ainslie@va.gov

Published in Theory and Decision 73, 3-34, 2012; 10.1007/s11238-011-9272-5

Based on an address at Foundations and Applications of Utility, Risk, and Decision Theory XIV
Newcastle, UK June 16, 2010

This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs of the US Government.

Abstract

The models of internal self-control that have recently been proposed by behavioral economists do not depict motivational interaction that occurs while temptation is present. Those models that include willpower at all either envision a faculty with a motivation (“strength”) different from the motives that are weighed in the marketplace of choice, or rely on incompatible goals among diverse brain centers. Both assumptions are questionable, but these models’ biggest problem is that they do not let resolutions withstand re-examination while being challenged by impulsive alternatives.

The economists’ models all attempt to make a single equilibrium preference predictable from a person’s prior incentives. This was the original purpose of these models’ hyperboloid (“β-δ”) delay discount functions, which have been widely justified by the assumption that a person’s intertemporal inconsistency (impulsiveness) can be accounted for by the arousal of appetite for visceral rewards. Although arousal is clearly a factor in some cases of intertemporal inconsistency, it cannot be blamed for others, and furthermore does not necessarily imply hyperboloid discounting. The inadequacy of β-δ functions is particularly evident in models of internal self-control. I have reviewed several of these models, and have argued for a return to the pure hyperbolic discount function as originally proposed, the relatively high tails of which can motivate a recursive process of self-prediction and thereby the formation of self-enforcing intertemporal contracts. Such a process does not require a separately motivated faculty of will, or incompatible goals among brain centers; but it also does not permit the prediction of unique preferences from prior incentives.

Key words: hyperbolic delay discounting; intertemporal inconsistency; self-control; strength of will; intertemporal bargaining; visceral reward; self as population; economic models

A principal topic in behavioral economics has been impulsiveness and self-control. It is presumed that a person maximizes expected utility or reward, which conversely defines reward as the pressure that expectation exerts on choice. Impulsiveness has been defined as a violation of stationarity, such that a person who usually prefers one consumption or reward option temporarily prefers an incompatible alternative that is poorer by her usual standards, in the absence of any new information about its value. Impulsiveness is synonymous with preference reversal or dynamic inconsistency. Self-control has meant the avoidance or control of impulsiveness.

Most behavioral economists have settled upon David Laibson’s quasi-hyperbolic delay discount function as their model of impulsiveness (1994, 1997), but more from a desire to preserve the tractability of classical economic discount functions than from either parsimony or a need to fit experience. Several models of self-control have been built explicitly or implicitly on quasi-hyperbolic curves; but these all require the person to rely on an additional source of motivation that does not depend on current prospects (as in “strength” models) or to take the impulsive option out of consideration just before the moment of choice. I will argue that a satisfactory model of self-control must account for resistance to temptation on the basis of current best prospects and while the person is paying full attention to it. Quasi-hyperbolic functions have difficulty predicting this kind of self-control, but the pure hyperbolae that Laibson originally modified (Ainslie 1975, 1992) predict its features in detail. However, the cost of returning to this function will be loss of unique predictions of a person’s choice based on knowledge of her prior incentives.

This article is organized as follows:

Since this topic is still unfamiliar to many, a short historical explanation is called for. I will describe how hyperbolic delay discount curves that were derived from behavioral research suggested an explanation for preference reversal phenomena that were originally described in economics (Section I).
I will evaluate the proposal made soon afterwards that a hybrid exponential and step function can account for these phenomena without the loss of mathematical tractability that pure hyperbolic curves entail (Section II).
I will explore hyperbolic curves’ implication that successive motivational states will be in a limited warfare bargaining relationship, which a person can partially resolve by interpreting them as an intertemporal variant of a repeated prisoner’s dilemma (recursive self-prediction), and in which the stake for cooperation is enlarged by the curves’ relatively high tails at long delays (Section III).
I will discuss evidence for intertemporal bargaining versus some conventional suggestions as the mechanism of self-control in the absence of precommitment (“willpower;” Section IV).
I will examine recent proposals within behavioral economics to account for willpower without recursive self-prediction (Section V).
Finally, I will augment my argument for recursive self-prediction by illustrating how it accounts for familiar motivational patterns that have lacked underlying mechanisms (Section VI).

Roots of hyperbolic discounting

The discounted utility theory that has been accepted as the standard valuation model in economics (Samuelson, 1937) has no place for a regular tendency to form temporary preferences. Thus it has trouble accounting for impulsiveness. The standard valuation curve as a function of delay is

$$\text{Value} = \text{Value}_0 \times \delta^{\text{Delay}} \tag{1}$$

where Value0 = value if immediate and δ = (1 – discount rate).

This function describes consistency of valuation over time in the absence of new information, and, by implication, the possibility of predicting future valuation if a valuation at any earlier point is known. Economist Robert Strotz was the first behavioral scientist to suggest that people have non-exponential delay discount curves that require them to deal strategically with their own expectable changes of preference (1956). He pointed out that a person cannot rely on her relative valuation of two future rewards staying the same as the rewards get closer: a curve depicting the current value of a prospective reward as a function of its delay is apt to stray from the exponential function depicting consistent preference. He did not specify a particular function, but noted that any function that is not exponential depicts inconsistent preferences over time. Previous writers had noticed how people sometimes pay for commitment devices to forestall their own impulses; the economist Alfred Marshall, for instance, noticed how the poor preferred to buy small amounts of coal or alcohol at retail rather than buying a stock more cheaply to keep at home (1921, p. 814). However, Strotz was the first theorist to describe the need for such commitment as the product of a fundamental dynamic inconsistency resulting from the nonexponential discount of delayed rewards.

Economists did not immediately pursue Strotz’ insight. Meanwhile other authors began to deal with dynamic inconsistency. Political scientist and philosopher Jon Elster pointed out that Ulysses’ situation in preparing to sail past the Sirens was archetypical of choice in a modern world filled with temptations (1979). Political economist Thomas Schelling wrote that “people behave sometimes as if they had two selves, one who wants clean lungs and long life and another who adores tobacco” (1980, p. 95). Economists Richard Thaler and Hersh Shefrin noted that a person’s conflicting motives tend to coalesce into a “doer” interest that tries to get immediate gratifications and a “planner” that tries to keep the doer from giving in to temptation (1981).

In a seemingly unrelated line of experiments, psychologist Richard Herrnstein reported that subjects tend to sample two concurrent streams of reward in proportion to the mean rates, amounts, and immediacies of those rewards (the “matching law;” 1961, 1997). Building on both ideas, the present author pointed out that application of the matching law to single (discrete, i.e. not streamed) choices between smaller sooner (SS) and larger later (LL) rewards results in four specific predictions (Ainslie, 1975, 1992):

1. The decline in rewarding effect with delay is described better by a discount function that is inversely proportional to delay (hyperbolic discount curve) than by a function that declines by a constant proportion of remaining value for each unit of delay (exponential discount curve). That is, data on the evident value of a single prospective reward at varying delays will be described better by a hyperbolic than by an exponential function of those delays (figure 1). The most commonly used form of the hyperbolic curve has been

$$\text{Value} = \frac{\text{Value}_0}{1 + k \times \text{Delay}} \tag{2}$$

where Value0 = value if immediate and k is degree of impatience (Mazur, 1987).

2. Preferences between some pairs of an SS reward and an LL alternative at varying delays but with a constant lag between SS and LL rewards will favor LL rewards when both are distant, but shift to SS alternatives when they become closer (figure 2).

3. During the period when an LL reward is preferred, subjects will sometimes choose behaviors whose only function is to prevent their own subsequent choice of the SS alternative.
4. Subjects choosing between a whole series of SS/LL pairs at once will have a greater tendency to choose the LL rewards than will subjects choosing between the same pairs one at a time. A subsidiary prediction is that human subjects who perceive a current choice between SS and LL rewards as a test case (an example that predicts their own future preferences between similar pairs of rewards) will be more apt to prefer the LL reward than they do when they see the pair of alternatives as an isolated choice. That is, people may create bundles of interdependent expectations by predicting their future choices recursively on the basis of each current choice. Such bundles would not change the way each component reward is discounted, but discounting of their aggregate value would move in the direction of exponentiality.
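
A short derivation may make the second prediction concrete. It is only a sketch under assumptions that are not in the original: the symbols A_S and A_L (the SS and LL amounts, with A_L > A_S), D (the delay until the SS reward), and L (the fixed lag between the two rewards) are introduced here for illustration, and the two curves compared are the exponential form Value0 × δ^Delay and the hyperbolic form Value0 / (1 + k × Delay) attributed to Mazur (1987). The SS reward is preferred when

$$
\begin{aligned}
\text{Exponential:}\quad & A_S\,\delta^{D} > A_L\,\delta^{D+L} \iff A_S > A_L\,\delta^{L} \\
\text{Hyperbolic:}\quad & \frac{A_S}{1+kD} > \frac{A_L}{1+k(D+L)} \iff D < \frac{A_S\,(1+kL) - A_L}{k\,(A_L - A_S)}
\end{aligned}
$$

The exponential condition does not contain D, so the preference cannot change as the pair of rewards draws nearer. The hyperbolic condition holds only below a threshold delay: whenever that threshold is positive, the LL reward is preferred at long delays and the SS reward at short ones. With the illustrative values A_S = 5, A_L = 10, k = 1 and L = 3, the threshold is D = (5 × 4 − 10)/(1 × 5) = 2 time units.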

Subsequent work by both economists and psychologists has largely confirmed the first three predictions, both in human subjects whose choices are spontaneous (not calculated) and in nonhuman animals. The nonhuman experiments demonstrate that the findings are not products of experimenter suggestion or cultural norms. I will say more about prediction 4 presently.

Curve fitting. Plots of indifference points between pairs of SS and LL rewards as delay is varied show less least-squares deviation from hyperbolic than from exponential curves (reviewed in Green & Myerson, 2004; Kirby, 1997). Procedures to prevent human subjects from strategically claiming false preferences, such as second bid auctions, confirm the superior fit of hyperbolic curves (e.g. Kirby & Marakovic, 1995; Kirby & Guastello, 2001). Hyperbolic curves are not found just in short range choices. People expressing preferences for public programs that would pay off at delays up to 100 years produced a similar curve (Cropper et al., 1992), as did an analysis of discount rates implied by long term public works investments (Harvey, 1994).
Preference reversal. When subjects are given choices between a given pair of SS and LL rewards with a fixed lag between them but a varying delay until the SS reward would be available, their preference regularly changes from LL to SS as this delay gets shorter (reviewed in Green & Myerson, 2004). This temporary preference for SS rewards is found in both people (Kirby & Herrnstein, 1995) and pigeons (Ainslie & Herrnstein, 1981; Green et al., 1981); even in people it has been reported over time ranges varying from days (Green et al., 1994) to decades (Green & Myerson, 2004). It can occur even when there is a delay before the earliest SS reward is available (Ainslie & Haendel, 1983; Green et al., 2005). That is, a subject who preferred $50 in one year to $100 in three years often did not prefer $50 in four years to $100 in six years, choosing the SS reward even when it would be delayed by one year but not four years. Such findings argue against the proposal, to be discussed presently, that SS rewards are temporarily preferred only because immediacy creates an arousal effect.
Precommitment. The simplest evidence of choosing purely to avoid a later choice comes from experiments in which nonhuman animals will peck a key to avoid a later choice of immediate SS versus delayed LL food (Ainslie, 1974), or press a lever that commits them to a later SS electric shock to avoid having to choose between SS and LL shocks when the SS would be immediate (Deluty et al., 1983). It is hard to strain humans’ self-control ability in the laboratory, but surveys of purchasing behavior in combination with subject self-reports have found many examples of deliberate precommitment, such as buying tempting consumables in small packages despite strong price incentives to buy in bulk (Wertenbroch, 1998; cf. Marshall, 1921, p. 814), and voluntarily arranging deadlines to avoid procrastinating unpleasant medical tests (Trope & Fishbach, 2000) or term papers (Ariely & Wertenbroch, 2002). Likewise, in much larger choices people are said to make illiquid investments that would be otherwise inefficient, in order to commit their future spending (Laibson, 1997).
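
The $50 versus $100 example in the preference reversal item can be checked numerically. The sketch below is illustrative only: the parameter values (k = 1.5 per year, δ = 0.6, β = 0.7) and all function names are assumptions chosen for the demonstration, not estimates from the cited studies. It compares a hyperbolic curve with an exponential curve and with a step-function (β-δ) curve whose β penalty applies to every reward that is not imminent.

```python
"""Illustrative check of the $50 / $100 example under three discount curves.

All parameter values and names are assumptions made for this sketch.
Delays are in years.
"""


def exponential(amount, delay, delta):
    # Value declines by a constant proportion per unit of delay.
    return amount * delta ** delay


def hyperbolic(amount, delay, k):
    # Value is inversely proportional to (1 + k * delay).
    return amount / (1 + k * delay)


def quasi_hyperbolic(amount, delay, beta, delta):
    # Step function: beta is 1 only for an imminent reward (delay == 0).
    effective_beta = 1.0 if delay == 0 else beta
    return amount * effective_beta * delta ** delay


def prefers_ss(value_fn, ss, ll, **params):
    # ss and ll are (amount, delay) pairs.
    return value_fn(*ss, **params) > value_fn(*ll, **params)


NEAR = ((50, 1), (100, 3))  # $50 in one year vs $100 in three years
FAR = ((50, 4), (100, 6))   # the same pair pushed three years further away

results = {
    "hyperbolic": tuple(prefers_ss(hyperbolic, *pair, k=1.5)
                        for pair in (NEAR, FAR)),
    "exponential": tuple(prefers_ss(exponential, *pair, delta=0.6)
                         for pair in (NEAR, FAR)),
    "quasi_hyperbolic": tuple(prefers_ss(quasi_hyperbolic, *pair,
                                         beta=0.7, delta=0.6)
                              for pair in (NEAR, FAR)),
}

for name, (near, far) in results.items():
    print(f"{name}: prefers SS when near = {near}, when far = {far}")
```

Under these assumed values only the hyperbolic curve switches from SS (near) to LL (far). Because neither reward is imminent in either pair, the β factor multiplies both alternatives and cancels, so the step-function curve behaves like the plain exponential one.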

For reasons that I will discuss shortly, research and theory have neglected the fourth prediction. Although it has the most significance for practical self-control, it has been studied the least. A particularly significant feature is that it helps to differentiate the original, “pure” hyperbolic discounting hypothesis from an often-proposed hyperboloid alternative.

Hyperboloid discounting

The shape of hyperbolic discount curves is incompatible with most computations of value in economics, which assume consistent preferences over time in the absence of new information, and hence exponential curves (Koopmans, 1960). David Laibson had the insight that predictions 2 and 3, and much published evidence about human impulsiveness, were consistent with a function that would be more acceptable to economists, since it preserves the use of limit theorems to predict unique preferences from a knowledge of a subject’s prior incentives (1994, 1997). This was a step function, borrowed from an existing model of the motives in intergenerational transfers (Phelps & Pollak, 1968), and more mathematically tractable than a pure hyperbola. In the format of the first two formulae, it would be

$$\text{Value} = \text{Value}_0 \times \beta \times \delta^{\text{Delay}} \tag{3}$$

where Value0 = value if immediate and β has one of only two values, β = 1 when reward is imminent or 0 < β < 1 at all other times; δ = 1 – discount rate (McClure et al., 2004). Accordingly, a “hyperboloid” or “quasi-hyperbolic” discounter chooses consistently until the SS reward is imminent, when its value suddenly spikes upwards by (1 – β)/β, potentially reversing an existing preference for an LL alternative (figure 3).
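
The size of the spike and the reversal it can produce may be illustrated with a minimal sketch of the step function. The values used (β = 0.8, δ = 0.95, an SS reward of 7, an LL reward of 10 arriving 5 time units later) and the function names are assumptions made for this example only.

```python
import math


def quasi_hyperbolic(value0, delay, beta, delta):
    # Step-function discounting: beta is 1 only when the reward is imminent.
    effective_beta = 1.0 if delay == 0 else beta
    return value0 * effective_beta * delta ** delay


BETA, DELTA = 0.8, 0.95              # assumed parameters
SS_AMOUNT, LL_AMOUNT, LAG = 7, 10, 5  # assumed rewards and lag

# Proportional jump in the SS value at the moment it becomes imminent.
just_before = quasi_hyperbolic(SS_AMOUNT, 1e-9, BETA, DELTA)
at_hand = quasi_hyperbolic(SS_AMOUNT, 0, BETA, DELTA)
spike = at_hand / just_before - 1


def prefers_ss(ss_delay):
    # Compare the SS reward with the LL reward that arrives LAG units later.
    ss = quasi_hyperbolic(SS_AMOUNT, ss_delay, BETA, DELTA)
    ll = quasi_hyperbolic(LL_AMOUNT, ss_delay + LAG, BETA, DELTA)
    return ss > ll


print(f"spike = {spike:.4f}, (1 - beta)/beta = {(1 - BETA) / BETA:.4f}")
for delay in (3, 1, 0):
    print(f"SS delay {delay}: prefers SS = {prefers_ss(delay)}")
```

With these assumed numbers the LL reward is preferred at every positive delay, and the preference flips to the SS reward only at delay zero, when β switches to 1.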

The two discrete components of motivation in the β-δ model fit seductively into the long tradition of two-faculty models that date back to Plato’s chariot of the soul, pulled by the well-behaved horse of reason and the unruly horse of passion (Phaedrus, 253e). When Laibson proposed his hyperboloid formula, Thaler and Shefrin had already described a two-agent model, in which a farsighted (or slowly discounting) planner sets rational policies but must rely on a shortsighted (or rapidly discounting) doer to execute them (1981). The planner was just Economic Man from conventional utility theory, but the doer did not have a clear basis, and neither did Laibson’s β factor.

George Loewenstein proposed a rationale for the doer: that it could arise from the “viscerality” of some motives (1996; Loewenstein & O’Donoghue, 2004). The viscerality hypothesis was soon married to the β spike of imminent reward in formula 3 (McClure et al., 2004), producing a model in which a person discounts the future consistently until an appetite or emotion is aroused, and then becomes markedly less patient. In an additional refinement the all-or-none spike has since been softened into a second, steep exponential discount curve from the moment of expected reward, which sums with the shallower standard discount curve (McClure et al., 2007):

$$\text{Value} = \text{Value}_0 \times \left[\omega\,\beta^{\text{Delay}} + (1 - \omega)\,\delta^{\text{Delay}}\right] \tag{4}$$

where Value0 = value if immediate, β and δ are one minus their respective discount rates, and ω is a weighting factor. I will refer to upward departure from the (shallower) exponential curve in either formula 3 or formula 4 as the β spike. The addition of one or two parameters can obviously improve how the new equations fit delay discounting behavior, removing prediction #1 as a criterion for selection, except for parsimony. The occurrence of β spikes makes predictions #2 and #3, and under some assumptions accommodates prediction #4, but, as we will see, not as well as pure hyperbolic curves do.

Adapting β-δ theory to handle rewards that are not imminently available complicates the interpretation of formulas 3 or 4, since the spike no longer occurs just at near-zero delays. The pre-reward spike of Laibson’s hyperboloid discounting becomes an upward departure that can occur any time that emotion or appetite is stimulated (McClure et al., 2004) or, presumably, whenever either arises without a stimulus. The point from which delay should be computed is then ambiguous.

Nevertheless, the β-δ function has become virtually the standard in behavioral economics. Benabou and Tirole even refounded much of the intertemporal bargaining model as it appeared in Ainslie (1992) on the assumption that the basic discount curve is β-δ (2004, p. 857). Much empirical evidence has been consistent with the model. Loewenstein and various collaborators (1996; 1999) have built on the work of psychologist Walter Mischel (e.g. Mischel & Moore, 1980) and others to show that decisions made in a state of emotional arousal have a greater tendency to favor SS alternatives. There have also been brain imaging reports that the brain valuation sites that humans share with non-humans respond only to imminent rewards, whereas the prefrontal cortical sites that are uniquely developed in humans respond to rewards in all time ranges (McClure et al., 2004, 2007). Berns et al. concisely summarize recent theories about intertemporal choice, which largely favor conditioned arousal of appetite or emotion as the driving force of inconsistent preference (2007). Thus the idea that viscerality is the source of β spikes that in turn explain impulsiveness has some support in psychological research as well as common experience. However, there are problems both with viscerality as an ultimate cause of impulsiveness and with the β-δ model itself.

Elicited arousal has limited explanatory power. Despite the frequent experience that “something came over me” that can accompany preference reversals, a stimulus-response model of temptation is a crude explanation for most cases:

Viscerality has proven to be an elusive concept. Abstract rewards such as money, Amazon coupons, and the satisfaction of curiosity have been claimed to behave as visceral rewards (Loewenstein, 1999; McClure et al., 2004).
Some impulses involve mundane activities that are not emotionally arousing—procrastination is a prominent example (Ainslie, 2010a; O’Donoghue & Rabin, 1999).
Some impulses may take longer than arousal can literally be sustained. Examples include buying a house against one’s better judgment, taking an unskilled job instead of going to college, or not contributing to a retirement plan.
In cases that do involve appetite or emotion, the arousal may be self-initiated. Although arousal is often occasioned by external events, it also occurs upon deliberately fantasizing about the activity in question, or, especially, when a person has figured out how to give herself permission for the activity—discovering a loophole, as it were, in her resolve. Much human temptation is not based on the immediate availability of consumption—even illegal drugs can be scored without undue delay—but on rationalization, of which arousal is as much the effect as it is the cause.
In the laboratory, conditioned arousal depends on stimuli (CSs) that predict a likelihood and imminence of the unconditioned stimuli (UCSs; e.g. Carter & Tiffany, 2001; Field & Duka, 2001), making this mechanism a defective model of the visceral arousal in daily life that follows exposure to non-predictive stimuli or pure suggestion. Conditioning per se is just the learning of environmental contingencies (Rescorla, 1988); it does not add incentive to those contingencies, and thus should not increase preference for SS over LL rewards. Granted, reward-augmenting arousal may come from fantasizing about consumption, and avoiding such fantasizing can be a (somewhat unstable) self-control tactic against impulses in which arousal is a factor; but “conditioned stimuli” that do not predict the possibility of imminent consumption are mere occasions for such fantasy (see Ainslie, 2010b and the previous bullet).

The authors who first put forward the beta-delta proposal later softened it into the idea that two exponentially-discounting motivational centers operate continuously in tandem, choosing more or less patiently according to the admixture of hot and cool activity (van den Bos & McClure, 2013; Loewenstein et al., 2015; see Cave, 1997). Not surprisingly, the adjustment of the three governing parameters (hot impatience, cool impatience, and balance of admixture) allows a good fit to any data, but it is hard to see why trials run under conditions of maximum heat and maximum cool should both still yield hyperbolic discount curves. [added June 2018]

Shallow discount curves need to be constructed. The δ factor that is supposed to govern discounting in the absence of visceral cues is unlikely to describe an elementary (untutored) perception of the world. Implicit in the β-δ explanation of preference reversals is the background of Economic Man, an agent whose δ rate is set by rational estimation of what he will need over his lifetime, discounting future outcomes only to the extent that their distance makes them uncertain to occur. Even without the problem of preference reversals it is hard to find in such a picture the intuitions of a flesh-and-blood organism.

The necessary basis of utility, reward, is an elementary selective process inherited from simpler organisms and observable in four-month-old infants (e.g. Darcheville et al., 1993). We are born with steep discount curves, and continue to show them when the reward is an increase in our immediate comfort, for instance relief from boredom or from noxious noise (Navarick, 1982). We must acquire by learning whatever devices let us achieve the banker-like preferences we sometimes show (e.g. Harrison et al., 2010; Coller et al., 2010). The observation that financial prudence lies at the low extreme of a wide range of reported discount rates (Frederick et al., 2002) also argues that it is constructed, the result of learned self-control techniques that compensate for steeper underlying curves. Nor can this learning be simple. Presumably, any possibility of acting directly on the discounting process would be quickly exploited to make delayed rewards loom larger, allowing a child in effect to coin reward, for instance to feel like her birthday was a month away instead of two months. After maximal exploitation of such an action, if it can occur at all, the banker-to-be will need to find indirect ways of becoming patient; and even this search is contingent on something in her motivational endowment that creates the wish to be more patient. The consistent relative values of exponential curves do not create such a wish.

My point is that exponential functions cannot be used to derive shallow discount curves from steep ones. Although they simulate a pure hyperbola’s especially steep discount rates when a reward is imminent, these functions do not preserve its especially shallow discount rates at long delays, and thus cannot make prediction 4 unless shallow curves already exist. I will argue that the high tails of hyperbolic curves create the necessary wish to be patient, and that prediction 4—the reward bundling effect—makes patience possible.

External commitment devices can play a role in our waiting for LL rewards—social pressure is the one with which we all start as children—but do not constitute self-control. I will show that the reward bundling phenomenon permits the modeling of willpower as an intertemporal bargaining process, and thus of self-control that acts consistently over time and simultaneously with temptation. Accordingly, reward bundling can sometimes flatten people’s effective discount curves to produce the patient preferences of Economic Man.

I will further argue that reward bundling is what creates both the strength and freedom of the will as we experience them, and that rejection of this property has crippled the many attempts that have been made within behavioral economics over the past decade to model internal self-control. My criticism of these attempts does not question their mathematics or the logic otherwise derived from their assumptions, but the awkwardness of the assumptions themselves. If the phenomenon of hyperbolic discounting did not provide a direct route to internal self-control we would have to accept them as the best available heuristics; but this is not the case.

Recursive self-prediction

Prediction 4a. [Subjects choosing between a whole series of SS/LL pairs at once will have a greater tendency to choose the LL rewards than will subjects choosing between the same pairs one at a time.] Pure hyperbolic functions predict this phenomenon, but functions of the β-δ sort predict it so weakly that they could be said to contradict it. Hyperbolic discount curves are not only steeper than exponential ones when delays are short, they are also flatter than both exponential and β-δ curves when delays are long. That is, the tails of the curves are higher. As described above, this property predicts people’s disproportionately high valuation of long range investments (re prediction #1), and a more robust tendency to make long range commitments against impulses than β-δ curves predict (prediction #3). However, the most important consequence of these high tails is a combining property that is much stronger than that of β-δ curves.

It is true that even with β-δ curves, the aggregate discounted value of a series of LL rewards will be greater than that of a single LL reward, and with some values could be greater than β spikes from SS alternatives. However, this is possible only if a person already has somehow achieved low rates of discounting. As I have just argued, the much shallower curves implied by financial prudence are higher order phenomena that require explanation themselves. To get from spontaneous preference to even medium term prudence, moment to moment incentives must combine over time. The shape of the discount curves from individual rewards will determine how combinations are valued.

To see the difference in combining tendency between either exponential or hyperboloid curves and true hyperbolic curves when raw discount rates are high, consider a case of round numbers. If four LL rewards of amount = 10 are available 10 time units apart, then by formulas 1, 3, or 4 a slope that would yield a value of 1.0 at a point 10 units before the first reward (δ = 0.794) would yield 1.0 + .1 + .01 + .001, or just 1.111 at the same point for the prospect of all four rewards together (figure 4). (Ex hypothesi the effect of β spikes cannot be cumulated.)

With hyperbolic curves (formula 2) a slope that gave a value of 1.0 at 10 time units before the first LL reward (k = 0.9) would yield 1.0 + .526 + .357 + .270, or 2.15 for all four (10/(1 + 0.9 × 10) + 10/(1 + 0.9 × 20) + …) (figure 5).

With a series of ten such rewards the β-δ sum is 1.111111111. It will never make it to 1.2, whereas a hyperbolic curve gives 3.08 for ten rewards and keeps climbing, albeit more slowly, for longer series. If we accept the evidence that hyperbolic discount curves from multiple delayed rewards are additive (Kirby, 2006; Mazur, 1986), then a person’s perception of her choices as being between whole series of LL versus SS rewards could greatly increase the attractiveness of the LL rewards, without any supplementary source of motivation. Single SS rewards that could overcome single LL rewards because of their proximity might not be able to do so when a series of similar choices was cumulated. Thus, hyperbolic curves from combinations of delayed rewards will often have more aggregate value than their combined SS alternatives, the tempting power of which comes only from each single SS reward as this reward becomes imminent (figure 6).
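
The round-number sums above can be reproduced with a few lines of code. This is a sketch, not part of the original analysis: the function names are invented here, and δ is taken as 10^(−0.1) ≈ 0.794 so that a reward of 10 at a delay of 10 units is worth exactly 1.0 under the exponential curve.

```python
def exponential_series(amount, first_delay, interval, n, delta):
    # Summed value of n equally spaced rewards under exponential discounting.
    return sum(amount * delta ** (first_delay + i * interval)
               for i in range(n))


def hyperbolic_series(amount, first_delay, interval, n, k):
    # Summed value of n equally spaced rewards under hyperbolic discounting.
    return sum(amount / (1 + k * (first_delay + i * interval))
               for i in range(n))


DELTA = 10 ** -0.1  # about 0.794 (assumed exact form of the text's 0.794)
K = 0.9

for n in (1, 4, 10, 100):
    exp_sum = exponential_series(10, 10, 10, n, DELTA)
    hyp_sum = hyperbolic_series(10, 10, 10, n, K)
    print(f"{n:>3} rewards: exponential {exp_sum:.9f}, hyperbolic {hyp_sum:.2f}")
```

The exponential total is a geometric series with ratio 0.1, so it converges on 10/9 ≈ 1.111 however long the series becomes, whereas the hyperbolic total keeps growing as more rewards are added.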

With exponential delay discount functions, SS rewards that were great enough to out-bid an LL alternative at some point would out-bid it at all points; the same relative values would remain if series of these alternatives were cumulated (figure 7). The same relative values would remain also if the δ components of SS rewards remained lower than the curve from alternative LL rewards but were supplemented by β spikes (figure 8).
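
The contrast between bundled hyperbolic and bundled exponential curves can be sketched numerically. Every number below is an assumption chosen for illustration (SS = 5 one time unit away, LL = 10 three units after each SS, pairs repeating every 10 units, k = 1.0, δ = 0.7), as are the function names; the additivity of discounted values is taken from the evidence cited above.

```python
def hyperbolic(amount, delay, k):
    return amount / (1 + k * delay)


def exponential(amount, delay, delta):
    return amount * delta ** delay


def bundle_value(value_fn, amount, first_delay, interval, n, rate):
    # Sum of the discounted values of n equally spaced rewards.
    return sum(value_fn(amount, first_delay + i * interval, rate)
               for i in range(n))


SS_AMOUNT, LL_AMOUNT = 5, 10  # assumed reward sizes
SS_DELAY, LAG = 1, 3          # SS is close; LL follows each SS by LAG
INTERVAL = 10                 # spacing between successive SS/LL pairs


def ss_minus_ll(value_fn, rate, n):
    # Positive result: the SS series is worth more than the LL series.
    ss = bundle_value(value_fn, SS_AMOUNT, SS_DELAY, INTERVAL, n, rate)
    ll = bundle_value(value_fn, LL_AMOUNT, SS_DELAY + LAG, INTERVAL, n, rate)
    return ss - ll


for label, fn, rate in (("hyperbolic", hyperbolic, 1.0),
                        ("exponential", exponential, 0.7)):
    single = ss_minus_ll(fn, rate, 1)
    bundled = ss_minus_ll(fn, rate, 10)
    print(f"{label}: single SS-LL = {single:+.3f}, "
          f"bundle of 10 SS-LL = {bundled:+.3f}")
```

Under these assumptions the single hyperbolic choice favors the SS reward while the bundle of ten favors the LL series. With the exponential curve every SS/LL pair stands in the same ratio, so summing the pairs cannot change which side wins.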

There is experimental evidence that bundling series of rewards together increases subjects’ patience. Psychologists Kris Kirby and Barbarose Guastello studied choices between SS and LL amounts of money in undergraduate volunteers (2001). They used an auction procedure to find the smallest SS rewards that subjects strictly preferred to LL alternatives, then offered series of choices between these SS and LL rewards in one of three patterns: (1) free condition: five separate choices at fixed intervals of several days; (2) forced condition: delivery of the same five SS or LL rewards at the same intervals, but chosen all at once at the same moment as the first choice in (1); or a third condition that I will discuss with prediction 4b. They found that a much larger proportion of subjects chose the LL rewards when choosing all at once (condition 2) than when the subjects expected separate choices (condition 1). The authors repeated the procedure using slices of pizza, and found the same pattern. They interpreted the results to show that the students were less patient toward the single choices in a series than they were toward the series as a whole.

Choice patterns reflecting hyperbolic discount functions are also observed in animals, even more robustly than they are in humans. An increase in patience with reward bundling should therefore be seen in animals as well. In an experiment to test whether this phenomenon can arise from the basic hyperbolic curves as predicted, rather than being specific to humans, Ainslie and Monterosso let rats choose between SS and LL amounts of sugar water (2003). At values close to the subjects’ indifference points, they chose the LL alternatives more often when making choices for the next three trials all at once than when choosing on a trial-by-trial basis, a result predicted by hyperbolic but not exponential discount functions. The occurrence of the bundling phenomenon in animals is especially significant because it rules out cultural bias or other factors that require higher cognitive capacities. It might be objected that there is a qualitative species difference in the operation of reward, but in fact young children respond to reward schedules in the same pattern as animals (Bentall et al., 1985), and can be observed to adapt this process gradually (Sonuga-Barke et al., 1989). I argue that this adaptation occurs largely through learning to evaluate choices categorically, a process that is rewarded first by parents and then through the self-prediction process I will discuss presently.

However, there is an obvious hitch to the strategy of choosing options in whole categories instead of singly. Under the pressure of temptation, a doer could simply propose to take the imminent SS reward on this occasion and the series of LL rewards in subsequent choices. This behavior was indeed what was expected by Ted O’Donoghue and Matthew Rabin, the first economists to make a Schelling-like game theoretic analysis of temporary preferences for SS rewards (1999, 2001); it limited the realistic choices that the procrastinators in their model could make. The solution might seem to be intuitively obvious—mere awareness of the question, “If not now, when?” But a person under the influence of nearby reward can usually tell herself, “next week, when conditions will be different.” If making a resolution were enough to preserve a plan, people would not have much trouble with impulse control. On the contrary, experience soon teaches that there has to be an enforcement mechanism that gives a resolution credibility in an ongoing competition with SS rewards. I will argue that such a mechanism depends on putting whole series of rewards at stake in each choice, a condition that is made possible by the high tails of hyperbolic curves. Both parts of prediction #4 are necessary:

Prediction 4b. [Human subjects who perceive a current choice between SS and LL rewards as a test case (an example that predicts their own future preferences between similar pairs of rewards) will be more apt to prefer the LL reward than they are when they see the pair of alternatives as an isolated choice.] To restate this hypothesis: Self-aware subjects, presumably only humans, who see their current choices as predictive of a category of similar choices in the future will thereby tie their expectation of the whole category of rewards to how they see themselves make the current choice (Ainslie, 1975, 1992, 2001). A combination of imperfect self-prediction and a tendency to temporarily prefer SS rewards sets up a limited warfare relationship among successive selves, which can be resolved by discerning (or defining) a modified repeated prisoner’s dilemma among these selves. The modification is that defection in the present case makes defection in future cases more likely, not from a motive of retaliation but by making cooperation seem likely to be wasted.

The hyperbolic relationship describes prospects with a variety of time ranges. If the time units in the above examples are days (roughly), the relevant choice might be to get drunk now on Saturday night versus feeling healthy for the rest of the week. If the time units are months, the choice might be to spend the annual bonus just received versus to pay off accumulated credit card debt. In either case the differential prospects are maximized if the person expects the series of LL rewards to be assured unless a lapse occurs, but to be jeopardized if it does. The aggregate LL reward may comprise the evident consequences of each choice, as in binges with hangovers, or may be a continuous state such as good health, adequate savings, etc. The necessary element is that the person’s expectation of getting the aggregate LL reward is put at stake when each opportunity for a relevant SS reward occurs.

Hyperbolic discount curves open the possibility that self-control arises from intertemporal bargaining, the activity in which reward-seeking processes that share some goals (e.g. long term sobriety) but not others (getting drunk tonight) maximize their individual expected rewards, discounted hyperbolically to the current moment. This limited warfare relationship is familiar in interpersonal situations, where it often gives rise to self-enforcing contracts (Telser, 1980) such as nations’ avoidance of using a nuclear weapon lest nuclear warfare become general. In interpersonal bargaining, stability is achieved in the absence of an overarching government by the parties’ recognition of repeated prisoner’s dilemma incentives. In intertemporal bargaining, personal rules arise through a similar recognition among the successive motivational states of an individual, with the difference that a future state is not motivated to retaliate, as it were, against past states that have defected. The risk of future states’ loss of confidence in the success of the personal rule, and consequent defection in their own short term interests, will present the same threat as the risk of actual retaliation. The reason that a recovering alcoholic avoids taking a single drink is not that it would move a future self to take revenge, but that it would impair the current credibility of her future sobriety, without which she does not have much current reason not to get drunk.

An implication of the reward bundling effect is that a planner and a doer exist simultaneously with regard to the same set of alternatives, depending on how the person predicts contingent reward, rather than having to take turns at absolute control as in many behavioral economic models that I will discuss presently. Also, the power of combined curves suggests a specific mechanism for the otherwise vaguely intuited phenomenon of willpower, without the intervention of any additional faculty.

For rewards that can be quantified, a combination of intertemporal bargaining and social competition can lead to a personal rule to discount the future exponentially, as long as the specified exponential discount rate is not so shallow that the person’s underlying hyperbolic function tempts her into frequent violations of her rule (Ainslie, 1991). To the extent that following such a rule becomes second nature, exponential discounting may come to be experienced as an elementary perception of the world. Despite the clear competitive advantage of learning exponential valuation, interpersonal variation in such learning seems to be observable in adult populations even in choices between significant amounts of cash (Coller et al., 2010). People’s awareness of using their current choices as test cases is also variable, probably because this process can take place without the rules’ being formalized in so many words; it is best elicited by indirect means, such as thought experiments (below, and Ainslie, 2007).

As with interpersonal negotiations, intertemporal cooperation is threatened by the availability of alternative truce lines. Under the pressure of current temptation the alcoholic may reason that drinking on New Year’s Eve would not reduce her expectation of staying sober the rest of the year. But then that might be true of her birthday too, or your birthday, or Saturdays… At least there does exist a bright line between some drinking and none, as with some smoking and none, whereas an overeater or spendthrift has much less defense against rationalizations: she has to eat some food and spend some money, and it is hard to see one diet or budget as irreplaceable. Accordingly, most smokers manage to quit for good (Garvey et al., 2002) as do about half of alcoholics (Helzer et al., 1991; Smart, 1975), aided by the bright line between some consumption and no consumption at all. By contrast, only five percent of overweight dieters manage to achieve long term weight reduction (Garner & Wooley, 1991). The feasibility of using intertemporal bargaining for self-control depends on the topography of available distinctions between one potential category of choice and another. But within that constraint, repeated prisoner’s dilemma contingencies can create a will without an organ, serving a self without a seat, just as the “will” of nations not to use nuclear weapons seems to be guided by an invisible hand.

Temptation does not then depend entirely, or even mainly, on the proximity of SS rewards, but on the person’s perception of rationales by which she can see the present case as an exception to her relevant personal rule. The greatest commitments are experienced as character traits: “I am not the kind of person who gets drunk” or “…cheats a friend.” Thus the most influential self-predictions are also those most harmed by a single lapse, the kind envisioned in Bodner & Prelec’s examples of “self-signaling” (2001). Similarly, the various addictions-anonymous organizations’ seemingly paradoxical advice to admit helplessness against addiction is aimed against the ordinary use of willpower, in which a person risks more or less of the credibility of an intention by making particular exceptions, preserving enough credibility to motivate enough compliance. “Helplessness” means that such arbitrage will always lead to loss of control, so the addict cannot “use” willpower but still sees her whole sobriety to be at stake in each choice. In the same way nations can tacitly negotiate tariff practices, but are helpless against using nuclear weapons. Different degrees of flexibility are possible in different circumstances, but the role of the current choice as a precedent is the same in all these cases.

Evidence about the mechanism of internal self-control

Internal self-control goes beyond arranging external constraints Ulysses-fashion; it may sometimes include the rather unstable tactics of not paying attention to temptations or avoiding emotional arousal, but to be reliable over time it must have a more robust foundation. This foundation gets called willpower or resolve, and various authors outside of economics have suggested mechanisms for it. Philosophers (a) such as Michael Bratman (1999) and Edward McClennen (1990) have suggested that it comes from an avoidance of reconsidering prior plans; social psychologists (b) such as Mark Muraven and Roy Baumeister (2000; Baumeister et al., 2006) have hypothesized a faculty like a muscle that simply requires exercise and the avoidance of strain; behavioral psychologists (c) such as Howard Rachlin (1995) and Gene Heyman (1996) have said that it comes from the discovery that “molar” or “global” patterns of choice (Aristotle’s “categorical choices”) are more rewarding than “molecular” or “local” ones.

However, these suggestions are inadequate. Not reconsidering plans (a) can preserve prior plans by definition, but this mechanism will not account for cases where plans to resist a temptation are reconsidered and still kept, or formed de novo, in the face of the temptation. It does not meet William James’ definition of effortful will, where “both alternatives are steadily held in view, and in the very act of murdering the vanquished possibility the chooser realizes in that instant how much he is making himself lose” (1890, v.2, p. 534). Some philosophers have lately moved toward filling in this gap with intertemporal bargaining models in line with the one described here (McClennen, 1997; Hanson, 2009).

The will-muscle (b) lacks a rationale for how its force interacts with the forces that are weighed in choosing, as well as for why it should be so much stronger against some targets than others (for instance, against smoking but not overeating). The original muscle model came from experiments in which subjects’ work on an unattractive task reduced their subsequent performance on an unrelated task that also required self-control. However, later work has shown that mere expectation of an impending unattractive task has the same effect. Subjects who expect to be in an effortful situation show the same reduction in self-control as if they had already undergone the effort, which rules out literal exhaustion (“depletion”) as a mechanism (Muraven, 2006). The strength effect is probably not a literal force, but a mental accounting phenomenon, and the entity depleted is not will but willingness. This is consistent with an intertemporal bargaining model in which a single marketplace of motivation is swayed by how expectations are categorized: When a person has been “good,” or expects to be, for a significant time, she may reduce her demands on herself; and when she has been good repeatedly, the self-reputation that she stakes against relevant impulses grows, making higher demands possible.

The behaviorists’ proposed mechanism, learning the value of categorical choice (c), does not abolish wishes for impulsive gratification. If it did, the whole history of civilization would have been different. The model I am describing is more from the tradition of behavioral psychology than from any other, but it adds a feature that is sometimes considered heretical within that field—the breakdown of a behaving organism into sequential parts (Rachlin, 2005). I am arguing that this feature provides a necessary enforcement mechanism for otherwise toothless resolutions to behave according to principles.

Direct evidence for the intertemporal bargaining hypothesis is obviously hard to obtain, since it rests on a recursive self-evaluation process that may be sensitively dependent on the person’s shifting interpretations of a given set of priors. Intuitively, the process of recursive self-prediction can be pinpointed by thought experiments, the simplest of which is Monterosso’s problem (Monterosso & Ainslie, 1999): Consider a smoker who is trying to quit, but who craves a cigarette. Suppose that an angel whispers to her that, regardless of whether or not she smokes the desired cigarette, she is destined to smoke a pack a day from tomorrow on. Given this certainty, she would have no incentive to turn down the cigarette; the effort would seem pointless. What if the angel whispers instead that she is destined never to smoke again after today, regardless of her current choice? Here, too, there seems to be no incentive to turn down the cigarette; it would be harmless. Fixing future smoking choices in either direction (or anywhere in between) evidently makes smoking the dominant current choice. Only if future smoking is in doubt does a current abstention seem worth the effort. But the importance of her current choice cannot come from any physical consequences for future choices; hence the conclusion that it matters as a test case, or precedent. Other thought experiments that lead to the same conclusion, particularly Kavka’s and Newcomb’s problems, are reviewed in Ainslie (2007). It is interesting that, in direct self-reports, people insist on their wills’ being impenetrable entities, and cannot usually report their interpretation of current choices as test cases except through the medium of such thought experiments; possible reasons that people prefer this opacity are beyond the present subject.
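The logic of Monterosso’s problem can be laid out as a small calculation. This is only a sketch under stated assumptions: the reward magnitudes are hypothetical, and the two probabilities stand for the smoker’s own prediction of future abstinence after seeing herself abstain or smoke today. The dependence is evidential rather than physical, as in the text.

```python
# Hypothetical magnitudes, in arbitrary units of current discounted reward.
CRAVING = 5.0       # immediate reward from smoking the cigarette now
FUTURE_GAIN = 20.0  # current value of a future without smoking


def best_choice(p_abstinent_if_abstain, p_abstinent_if_smoke):
    """Compare today's options given self-predicted future abstinence.

    Each argument is the smoker's subjective probability of staying
    abstinent in the future, conditional on what she sees herself do today.
    """
    abstain_value = p_abstinent_if_abstain * FUTURE_GAIN
    smoke_value = CRAVING + p_abstinent_if_smoke * FUTURE_GAIN
    return "abstain" if abstain_value > smoke_value else "smoke"


# The angel fixes the future: today's choice predicts nothing.
print(best_choice(0.0, 0.0))  # destined to smoke a pack a day
print(best_choice(1.0, 1.0))  # destined never to smoke again
print(best_choice(0.5, 0.5))  # anywhere in between, but still fixed

# No angel: today's choice is a test case that shifts her expectation.
print(best_choice(0.8, 0.3))
```

Whenever the two probabilities are equal, smoking wins by exactly the value of the craving, whatever the fixed future is. Abstaining can win only when the current choice changes the expectation by enough to outweigh the craving.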

Controlled research is more tenuous, but promising. In the bundling experiment by Kirby and Guastello, already described, a third group of student subjects received instructions for a “suggested” condition: The subjects would choose freely each week as in the “free” condition, but were also told, “the choice you make now is the best indication of how you will choose every time” (p. 159). The rate of LL choice by these subjects, to whom it was suggested that the first choice set a precedent, was intermediate between that of the subjects who got no suggestion and that of those who chose all five weekly rewards at once. Another experiment has been based on the money part of Kirby and Guastello’s design. Student subjects reported the smallest SS reward they would strictly prefer in one day over a fixed LL reward in ten days (Hofmeyr et al., 2010). The subjects were then told that they would be offered paid choices between these two options “from four to six times” at biweekly intervals, and were actually given four. Half the subjects were self-identified smokers, a rough marker for addiction proneness. The smokers became more patient when they had to make the series of biweekly choices all at once, or when it was suggested to them that “the choice you make now is the best indication of how you will choose every time,” thus perhaps compensating for their initial trend toward being less patient than the non-smokers.

At present the best evidence for the intertemporal bargaining model may simply be its parsimony in comparison with alternative models of self-control. Behavioral economists have recently proposed a number of mechanisms, but I will now argue that they fail to describe self-control that is truly simultaneous with temptation, that is, while competing options remain open. Furthermore, recursive self-prediction specifically predicts several related phenomena, which any model of internal self-control should be able to handle, but which behavioral economists have generally not considered. An exception is the article by Benabou and Tirole (2004), who follow the present author’s account (1992) except for the mechanism hypothesized to make self-prediction necessary (see next section). I will discuss some of these phenomena after examining representative behavioral economic proposals that have recently been based on β-δ discount functions.

Models of self-control without recursive self-prediction

The primary phenomenon requiring a theory of self-control is the temporary dominance of SS rewards over LL alternatives, that is, temptation. The crucial element of any model is the way that a person overcomes this temptation. The behavioral economic models of self-control that have been proposed thus far can be described in roughly three groups, based on the self-control mechanisms hypothesized, although sometimes the exact nature of the mechanism is unclear, or more than one is suggested. First, a person might be thought of as unable to exert mental influence on her future selves, giving her no options but to set up her future incentives in advance. Alternatively, forces on the impulsive side of the conflict might be thought of as arising autonomously or semi-autonomously, and pushing actively against simultaneous motives for self-control. The mechanism of control then depends on what the contending forces are assumed to be. Finally, a person might be thought of as maximizing discounted expected utility along a single dimension, in which case self-control depends on the bookkeeping process by which she estimates prospective reward. These groups could be labeled prior strategy models, dual motivation models, and re-valuation models, respectively.

None of the models that have appeared in the behavioral economic literature thus far use true hyperbolic curves, and all stop short of recursive self-prediction itself. Informal discussion has suggested that the obstacle is economists’ determination to predict unique decisions, in principle at least, from a given set of prior motives. Recursive self-prediction not only undermines the asymptotic approach that generates such predictions, it argues for their basic impossibility, depicting instead an inescapable self-referentiality (see “free will,” below). As with self-enforcing contracts of the interpersonal kind, personal rules both create and depend on regularities of choice, but are ultimately sensitive to self-predictive evidence according to an unlimited set of possible rationales. Existing behavioral economic models cannot depict an ongoing contest between self-regulation and simultaneous temptation within an individual because of their insistence that outcomes are calculable from a knowledge of prior incentives.

Prior strategy models. The model of a relatively powerless but farsighted agent is what Strotz proposed (1956), referring back to Ulysses and the Sirens. Beginning in 1999 O’Donoghue and Rabin published a series of game-theoretic analyses of impulsiveness (1999, 2000, 2001). They pointed out that people who are not aware of their tendencies to form temporary preferences for SSs (“naifs”) would always fail to complete long term plans. People who are fully aware of these tendencies (“sophisticates”) may take evasive action, but the authors proposed no way that people could increase their motivation in opposition to a surging SS reward (the β spike, above). Sophisticates have to take the prospect of preference reversal at certain choice points as given, and find prior actions that prevent themselves from arriving at those choice points, just as Strotz had originally concluded. That is, they are restricted to plans in which the choices of all future selves are in Nash equilibrium. In Thaler and Shefrin’s terms (theirs being the first of the dualistic economic models, as Benabou and Pycia, 2002, point out) the doer of one moment is the planner of future moments, with their functions distinguished only by the β spike that influences the doer but not the planner.
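The situation of naifs and sophisticates under a β spike can be sketched numerically. This is an illustration rather than O’Donoghue and Rabin’s own formulation: it assumes the usual quasi-hyperbolic form, in which an immediate reward is valued at its full amount and a reward at delay d > 0 is valued at β·δ^d times its amount, and all parameter values and reward sizes are hypothetical.

```python
def beta_delta(amount, delay, beta=0.5, delta=0.95):
    """Assumed quasi-hyperbolic value: full amount now, beta * delta**delay later."""
    if delay == 0:
        return amount
    return beta * (delta ** delay) * amount


SS, LL = 60.0, 100.0  # hypothetical reward sizes
GAP = 1               # LL arrives this many periods after SS


def prefers_ll(periods_until_ss):
    """Preference between the pair as seen periods_until_ss before SS is due."""
    return beta_delta(LL, periods_until_ss + GAP) > beta_delta(SS, periods_until_ss)


def commitment_gain(periods_until_ss, sophisticated):
    """What an agent expects to gain, in current value, from binding herself to LL.

    A naif assumes her future self will share her current preference;
    a sophisticate predicts the preference that will hold at the choice point.
    """
    if sophisticated:
        predicted_ll = prefers_ll(0)
    else:
        predicted_ll = prefers_ll(periods_until_ss)
    if predicted_ll:
        return 0.0  # she expects to choose LL anyway, so binding adds nothing
    return (beta_delta(LL, periods_until_ss + GAP)
            - beta_delta(SS, periods_until_ss))


print("prefers LL five periods ahead:", prefers_ll(5))
print("prefers LL at the choice point:", prefers_ll(0))
print("naif's expected gain from commitment:", commitment_gain(5, False))
print("sophisticate's expected gain from commitment:", commitment_gain(5, True))
```

In this sketch the preference reverses only at the moment SS becomes immediate. The naif sees no reason to commit and so arrives unprotected at the choice point, while the sophisticate is willing to pay for a prior commitment but has no means of outbidding the SS reward once it is at hand.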

Dual motivation models. Many authors have been uncomfortable with the notion that plans cannot be expected to succeed without a Nash equilibrium among all expected motivational states. Under that assumption the planner is limited to a role no more privileged than a doer. Various proposals have been made to separate the planner and doer not just by time, but by capability. A frequent solution is a dualism that pits one faculty against another, either weak flesh versus a sovereign will or a lower part of the brain restrained by a higher part.

Gul and Pesendorfer were the first behavioral economists to criticize solutions that rely on prior commitment (2001, 2004), pointing out that self-control can often be exercised in the presence of tempting choices. They proposed that self-control is achieved by “transferring resources” from the current period to a period where the person expects to change her preferences, thereby presumably funding a will-like function that can oppose the later temptation. This transfer has a measurable cost, defined as the difference between [gain from the long range goal, net of effort] and [gain from the long range goal when precommitment makes effort unnecessary]. In the face of temptation a person has three options: to rely on an external precommitment that she previously set up; to give in to the temptation; or to oppose a currently dominant impulse with a special endowment of motivation called only “self-control.” The first and second options are the same as in O’Donoghue and Rabin. The third option lets the rational agent wrestle down the impulse while it is current, as people often say they do. However, the motivation that overcomes the impulse is no longer the motivation that the person weighs in making choices. It is in a sequestered fund of “transferred resources,” which represent a cost but are not a factor in re-evaluating preference. Without a capacity to re-evaluate options at the moment of choice this model is dualistic.

Fudenberg and Levine adopted the planner-doer model and differentiated it from quasi-hyperbolic models in that there were two “subsystems” that interact to determine choice (2006). Their “long-run self” and the “short-run self” are ultimately motivated by the same utility (p. 1455), but the selves do not compete on the basis of these utilities. The interaction is that the planner constrains the choices of future doers by “choosing an appropriate utility function” for them (p. 1450). Sometimes the long run plan is enforced by physical commitments, such as by drawing from a bank account in advance the amount of money to be available in a casino; beyond such constraints, it is not clear what keeps the doer from seizing control of its utility functions to favor current consumption. Without such a mechanism, this model, too, is dualistic.

Brocas and Carrillo propose that farsighted but reward-insensitive planner functions (“principals”) can get information about the value of alternatives only by calling upon (“delegating to”) short-sighted but reward-sensitive doer functions (“agents”), at a risk of being seduced by the options that tempt the doer functions (2008). A person’s balance between impulsiveness and control then depends on the extent to which the planners invite this sampling, gathering information but sometimes finding out thereby that they face a dominant SS reward. The authors propose that the planner uses “personal rules” to limit the doer’s opportunities, but the nature of the motivation that enforces these rules is not clear.

The main problem with dual motivation models is that they do not model a single choice mechanism; they only move the determinants into either a self within the self (a sovereign will, such as self-control tout court) or a conflict of “energy systems [that] have some degree of independence from each other” (Donald McIntosh, quoted approvingly by Fudenberg and Levine, 2006, p. 1449). With dual energy systems the planner gets its authority from a special kind of anti-impulsive motivation, much like the “strength” of Baumeister’s will-muscle described in the previous section (Baumeister et al., 2006). Some authors explicitly cite Baumeister’s work (e.g. Benhabib & Bisin, 2005; Fudenberg & Levine, 2006; Loewenstein & O’Donoghue, 2007). Loewenstein and O’Donoghue suggest how this strength can be weighed against an impulse to yield a single decision value, but the source of the strength remains separate, and obscure (2007). They measure willpower as an “effort cost” that “the deliberative system must exert… to induce some behavior different from the affective [i.e. impulsive] optimum,” which they characterize also as “the cognitive effort required to induce a given deviation from the affective optimum” (ibid., p. 18). They add that strength regenerates over time, and that when it is depleted “exerting willpower becomes more costly” (p. 18).

Fudenberg and Levine have lately moved closer to a re-valuation model by framing the Baumeister group’s “strength” as “willpower stock,” a “cognitive resource” and thus not a separate motivational force (Fudenberg & Levine, 2010). Likewise, the “long-run self” now does not have its own motivation, but “maximizes the discounted sum of short-run self utilities” (p. 46), which are themselves allowed to have some long-run valence. And the history of the person’s choices can be a factor in what this sum is. These authors’ revised model recognizes utility discount rates that are inversely proportional to delay, and might seem to be heading toward reward bundling; but long- and short-run selves are still separate agents with different cognitive capacities (“the strategic naivete of the short-run self as a passive actor”—p. 49), and these selves take turns making moves in response to temptation. It is odd also that a purely cognitive stock should be depleted by exercise. Most importantly, response history is not said to affect the sums of expected reward, and the creation of history is not itself a motive in choice; the history merely reflects the depletion or (spontaneous) regeneration of willpower stock. This model still does not allow an integrated current self to evaluate and sometimes control impulses simply according to maximal expected discounted reward.

Behavioral economists have been encouraged to create dualistic models by neuropathological and recent neurophysiological studies. The brain has many areas with specialized functions, and disruption of their usual relationships by injury, surgery, or experimental manipulation often results in subjects’ over-valuation of short term rewards (Bechara, 2006). Even in the normally functioning brain, sites in the limbic system, an evolutionarily older set of structures in the borderland between cortex and midbrain, have been reported to be differentially active when rewards are imminent (McClure et al., 2004, 2007). However, none of these findings is evidence that the various brain areas fail to coordinate with one another in the normal brain to create a single marketplace of choice, as an occasional behavioral economist has pointed out (e.g. Benhabib and Bisin, 2005, p. 480). Even when some areas are abnormally stimulated, as by addictive drugs, such a marketplace seems to remain operative (Heyman, 2009). Furthermore, the functional separation of long and short term motivational centers is in doubt. Evidence that some brain valuation sites do not respond to delayed rewards has been contradicted by research showing that all valuation sites respond to delayed prospects and discount them at the same rate (Kable & Glimcher, 2007; Pine et al., 2010).

An entirely different problem with planner/doer models, particularly those that assign these roles to distinct sites in the brain, is that a given process may relate to another process as a planner while relating to still another as a doer. For instance, a planner with a goal of good long term social relationships may have to deal with doers who want to get revenge on a rival, conduct an illicit love affair, or play an ill-advised practical joke. But the processes governing those behaviors may in turn have to act as planners against doers that try to consummate them too rashly, and those doers in turn need to control urges to panic or succumb to embarrassment in the midst of these behaviors. A person’s interest often must be able to defend itself against both longer and shorter term interests, but could not do so if, as dualistic models require, the two roles demanded different information or different capacities.

There is still another problem with the models that base resistance to temptation on paying the cost of self-control per se, rather than on the relative values of the options. They make the motivation that supplements the momentary value of the LL option into something like a fuel, which is consumed as it operates, just as a flexed muscle consumes glucose. Gul and Pesendorfer (2001, 2004) propose this effort/fuel cost to be “the utility of the best foregone option” (the reward value of the temptation at the time of its maximum) minus “the utility of the option chosen” (the value of the long term reward at that time). Fudenberg & Levine would multiply the foregone utility by a steepness factor (2006, p. 1455). In both models the difference between temptation value and discounted long term value is said to be a measure of the effort that an external commitment device would save the person from spending (or burning) and hence of the value of such a device. This kind of computation confounds the differential motivation needed for self-control with whatever frustration, doubt, or pain is experienced while controlling oneself. Granted, in these models the extra motivation needed does have to exceed the “cost” level as computed in either model, and exceed it at every moment where the person is free to choose; and the utility of the “best foregone option” (= impulse) may include the avoidance of subjective effort incidental to self-control. But subjective effort is not proportional to force of will; close choices are more effortful than strongly motivated choices. The more confident the recovering alcoholic or smoker is that she will not lapse, the easier it is for her not to lapse. Nor does this change come from a reduced attractiveness of the impulse. For instance, Orthodox Jews who absolutely never smoke on the Sabbath report that this takes little effort, even though they may remain active smokers on other days (Dar et al., 2005).
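The cost accounting criticized here amounts to a simple subtraction, which can be written out as follows. This is a minimal sketch of the Gul and Pesendorfer computation as it is described in the text, not of their formal model; the function name and the utility values are hypothetical, and the steepness factor of Fudenberg and Levine is left out.

```python
def self_control_cost(best_foregone_utility, chosen_utility):
    """Cost of self-control as described in the text: the utility of the best
    foregone option minus the utility of the option chosen."""
    return best_foregone_utility - chosen_utility


# Hypothetical values at the moment when the temptation is at its maximum.
TEMPTATION_VALUE = 60.0  # utility of the SS option that is foregone
LL_VALUE = 47.5          # discounted utility of the LL option that is chosen

cost = self_control_cost(TEMPTATION_VALUE, LL_VALUE)

# In this accounting the same quantity measures what an external
# commitment device would be worth, since it would spare the effort.
commitment_device_value = cost

print("cost of self-control:", cost)
print("implied value of a commitment device:", commitment_device_value)
```

The computation depends only on the two utilities and is unaffected by how confident the person is of not lapsing.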

Instead of just outbidding the SS alternative, the long term interest is seen in the above models as consuming a resource (the reward that was set aside earlier) to suppress it. The value of external commitment devices may well include effort spared because of the certainty they bring, but there are internal as well as external sources of such certainty. Confidence in a personal rule may also produce certainty without effort. That is, the extra quantum of effort equivalent to the reward foregone by not indulging an impulse is not necessarily consumed by (or costed to) the operation of the rule. The difference in models is between deterring enemies (outbidding impulses) and fighting them (neutralizing impulses). In dual motivation models self-control does not involve a comparison of incentives but an active neutralization process, in which the will controls an impulse by paying a recurring motivational cost that matches the extra value of the β spike. In this it is reminiscent of Freud’s anticathexis, the expenditure of energy to maintain repression, in the model he called “economic” (1915, pp. 180-185). However, the prospective cost of self-control is more apt to be multifactorial, comprising intrinsically the value of the foregone impulse, but sometimes also anxiety at possible lapsing, the pangs of appetite aroused but not satisfied, the harmless reward foregone by giving the impulse a wide berth, and possibly other components of “effort,” but not including the anti-impulsive motivation itself, increases of which, as I have just pointed out, make self-control less effortful rather than more.

Re-valuation models. In these there are no separate faculties or poorly communicating brain centers, but the choice of commensurable options in a single marketplace. The behavioral economic versions of these models still assume β-δ discount functions, explicitly or implicitly. In them the job of self-control entails framing the LL alternatives in a way that increases their expected value, just as in a reward bundling model, but without the deterrent effect of staking aggregate prospects against current choices; and they are constrained by the poor combining properties of β-δ functions.

Benhabib and Bisin propose that self-control requires “active maintenance of a [long term] goal-like representation” whose cost, unlike that of Gul and Pesendorfer’s willpower, “is independent of the temptation” (2005, p. 470). Here the usual rational agent or “control process” competes with an “automatic process” of the kind that psychology describes as underlying “classical conditioning and Pavlovian responses” (pp. 463-464). They mean their model not to depend on the “visceral/rational dichotomy per se” (p. 464n), but since a Pavlovian response is unmotivated by definition and their automatic processes compete on the basis of motivation, the best interpretation of their model is that their automatic processes are viscerally rewarded. These two kinds of “processing pathways” compete strictly on the basis of prospective reward: “An executive function, or supervisory attention system, modulates the activation levels of the different processing pathways, based on the learned representation of expected future rewards” (p. 464). The executive function calls on one or the other (necessarily before the automatic pathway is active) depending on (1) what reward the automatic pathway is expected to bring (time unspecified), and (2) the cost of inhibiting the automatic pathway, which is determined not by the alternative reward, as in Gul and Pesendorfer, but by “the costs of maintaining a representation [of the long term goal] in active memory” (p. 466). Temptations that the executive function estimates to be not worth this effort are given in to.

This model does not seem to depict true willpower, in which, as William James said, the temptation is “held steadily in view” (1890, p. 534); rather its marketplace is shut down at some point before an SS reward is available, after which the still-dominant long term interest uses direction of attention as a commitment device. The sustainability of this tactic is necessarily limited. The examples the authors quote from experimental psychology are Stroop-type experiments testing the direction of attention over fractions of a second (pp. 464-465). This problem is exacerbated by the possibility that an SS option might become dominant hours or even days before it is available: the lure of a binge, for instance, or of a wasteful purchase. The model is similar to the Bratman/McClennen non-reconsideration hypothesis described above, and it suffers from the same limitations. However, the authors seem also to be edging toward recursive intertemporal bargaining. They recognize that a current choice-maker “expects from all future selves the same behavioral rule he himself adopts, and in equilibrium he sets his present consumption-saving rule accordingly” (2005, p. 477). They just do not describe how setting such a present-consumption rule can let aggregated LL rewards stay dominant in a marketplace that does not shut down during the actual choice, or how this could be a recursive process, that is, how the threat of future lapses could motivate avoidance of present consumption. Similarly, Heidhues and Koszegi discuss the motivational impact of failed attempts at self-control and uncertainty about the adequacy of future attempts (2009), necessary elements of recursive self-prediction; but these authors do not relate the two factors. That is, they do not consider whether failed attempts at self-control affect future costs or probabilities of self-control.

Benabou and Tirole accept the self-enforcing contract model I have described for pure hyperbolic curves, and most of its implications (2004). However, they seek to preserve the general form of the exponential curve, with its implication of a natural stability of intentions over time. They postulate that visceral motives generate β-δ curves, but recognize that something has to keep the β spike from being anticipated, so that a revaluation of the SS reward toward being LL would not occur (p. 857) in the way that appetite ordinarily becomes factored into the value of rewards. Therefore the intensity of the β spike has to be “imperfectly known,” and so people must have “an imperfect recall of past motives and feelings,” such that they must “draw inferences from their own past actions” (p. 848). By this roundabout route recursive self-prediction and intertemporal bargaining are made necessary, permitting the implications that I have outlined above.

Only one theorist has examined the possibility of intertemporal prisoner’s dilemmas, and he seems to have misunderstood the properties of hyperbolic curves. Andrew Musau uses exponential, quasi-hyperbolic, and hyperbolic functions to calculate the incentive for cooperation in the iterated game as the sum of the discounted values of each outcome (2009). However, he assumes that the hyperbolic discount curve becomes ever steeper relative to an exponential curve as delays increase, leading to a conclusion that there is no incentive to cooperate using any of the functions, so that no intertemporal bargaining will happen. On the contrary, although hyperbolic curves are steeper than exponential ones at short delays, they are decreasingly steep with longer delays and become less steep than exponential curves, leading to more incentive to cooperate as more delayed rewards are taken into account.
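The contrast between the two curve shapes can be checked numerically. The following is a minimal sketch, not a reconstruction of Musau's calculation: it assumes the simple hyperbolic form amount / (1 + k × delay) and the conventional exponential form amount × δ^delay, with illustrative parameters k = 1.0 and δ = 0.8 that come from neither source. Steepness is measured here as the fraction of current value lost per added unit of delay, which is itself a choice made for the example.

```python
# Illustrative sketch only. The functional forms and the parameter values
# (k = 1.0, delta = 0.8) are assumptions chosen for the example; they are not
# taken from Musau (2009) or from any calibration in this paper.

def hyperbolic(amount, delay, k=1.0):
    """Assumed pure hyperbolic form: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, delta=0.8):
    """Conventional exponential form: amount * delta ** delay."""
    return amount * delta ** delay


def proportional_drop(curve, delay):
    """Fraction of current value lost when the delay grows by one unit."""
    now = curve(1.0, delay)
    later = curve(1.0, delay + 1)
    return (now - later) / now


def summed_value(curve, n_rewards):
    """Present value of n equal unit rewards at delays 1, 2, ..., n."""
    return sum(curve(1.0, d) for d in range(1, n_rewards + 1))


if __name__ == "__main__":
    print("delay  hyperbolic_drop  exponential_drop")
    for d in (0, 1, 5, 20, 40):
        print(d, round(proportional_drop(hyperbolic, d), 4),
              round(proportional_drop(exponential, d), 4))
    print("n_rewards  hyperbolic_sum  exponential_sum")
    for n in (10, 100, 1000):
        print(n, round(summed_value(hyperbolic, n), 3),
              round(summed_value(exponential, n), 3))
```

Under these assumed parameters the hyperbolic curve loses the larger share of its value per unit of delay when delays are short and the smaller share when they are long. The summed exponential series levels off near δ / (1 − δ), whereas the summed hyperbolic series keeps growing as more delayed rewards are added.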

Recent proposals have built upon Read, Loewenstein and Rabin’s “motivated choice bracketing” (1999), the observation that choices are apt to be more patient if made between whole categories of outcomes (i.e. bracketed broadly) than between single pairs (bracketed narrowly). An agent may construct goals (Hsiaw, 2009) or reference points (Koszegi & Rabin, 2009) that represent expectations about her future choices, and that constrain future behavior by the threat of disappointing these expectations. These concepts move toward an enforcement principle for broad bracketing, since their agents are aware, if “sophisticated,” that larger categories of future rewards depend on current choices. Such approaches might operationalize the process of recursive self-prediction, but so far they have lacked an explicit role for current choices as test cases of self-control, which would be necessary for an actual bargaining model. The agents’ motivation for long term goals would also be seriously underestimated without the use of pure hyperbolic curves.
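The effect of broad versus narrow bracketing on hyperbolically discounted value can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a model from any of the papers cited: the hyperbolic form amount / (1 + k × delay), the exponential comparison curve, and every number used (an SS reward of 5 at delay 1, an LL reward of 10 at delay 4, one such pair every 10 time units, k = 1.0, δ = 0.7) are hypothetical values chosen only to make the pattern visible.

```python
# Toy illustration of narrow vs. broad bracketing. All amounts, delays, the
# spacing between pairs, k and delta are hypothetical example values.

def hyperbolic(amount, delay, k=1.0):
    """Assumed pure hyperbolic value: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, delta=0.7):
    """Exponential value for comparison: amount * delta ** delay."""
    return amount * delta ** delay


def bundle_value(curve, amount, first_delay, n_pairs, spacing=10):
    """Summed present value of n_pairs equal rewards, one every `spacing` units."""
    return sum(curve(amount, first_delay + spacing * i) for i in range(n_pairs))


def preferred(curve, n_pairs, ss=(5.0, 1), ll=(10.0, 4)):
    """Return 'SS' or 'LL', whichever whole series has the larger summed value."""
    ss_total = bundle_value(curve, ss[0], ss[1], n_pairs)
    ll_total = bundle_value(curve, ll[0], ll[1], n_pairs)
    return "LL" if ll_total > ss_total else "SS"


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, preferred(hyperbolic, n), preferred(exponential, n))
```

With these example numbers the hyperbolic chooser prefers SS when the pair is evaluated alone but LL once several pairs are summed, while the exponential chooser's preference does not depend on the size of the bracket, because every pair in an evenly spaced series is discounted in the same proportion.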

Complex phenomena predicted by intertemporal bargaining

In the behavioral economic models reviewed in the foregoing section, impulses have been cognitively blinkered processes that prevail only because a foresighted process has failed to act upon them in some way that would take their incentives out of the marketplace of choice. Certainly some tactics against SS options involve this kind of precommitment, particularly the narrowing of attention to avoid re-evaluating an intention, or the early avoidance of emotional arousal. However, in many choices both sides stay in active competition, and the SS option is preferred despite an awareness of its long term consequences. In the recursive intertemporal bargaining model proposed here either the impulsive or the patient side can prevail simply because, in a contest between interests with full cognitive abilities, it promises the greater discounted prospective reward at the moment of choice. The need for this flexibility is evident in several common observations:

Rationalization. Although the likelihood that a person will indulge in impulsive consumption is often affected by how soon it would be possible, most indulgences are continuously available without great delay. The limiting factor is then whether she regards them as permissible, that is, non-threatening to relevant larger prospects. The perception that a much larger reward, or bundle of rewards, depends on a current choice puts pressure on a person to find some reason that the current choice is not a valid test case for the larger prospect. What is commonly called rationalization is the activity of distinguishing the case at hand from a larger category: “New Year’s doesn’t count,” “someone has made me a special offer,” “I’m on vacation,” and many other forms of “just this once.” Without recursive self-prediction, this internal legalism would lack an incentive.
Repression and denial. In the β-δ models we have been considering, a person who foresees an impulse has an incentive to avoid it by directing her attention elsewhere. However, in clinical lore the motivated manipulation of attention is more apt to be in the service of covering up impulses than of preventing them. This restriction of attention in the service of short-term interests makes sense only in the context of intertemporal bargaining: A person who has attempted willpower and committed a lapse is motivated to avoid perceiving the lapse. If she succeeds in not seeing the lapse (repression) or not interpreting it as a lapse (denial) without identifying these maneuvers, she can maintain her expectation of intertemporal cooperation. Motivated self-deception has been shown to occur even with trivial motives when the criteria with which a subject tests her own performance are sufficiently ambiguous (Mijovic-Prelec & Prelec, 2010).
Mental accounts. Thaler and Shefrin described how people divide occasions for choice into mental accounts, for instance pocket money vs. savings vs. investments, to prevent the defeat of long term plans by the temptations of everyday life (1981). Any bias toward present consumption would make such accounts useful, but there would be no mechanism to maintain them unless they were self-enforcing contracts that depended on recursive self-prediction (see Ainslie, 1991). They are, in effect, personal rules, and subject to rationalization, repression, and denial. The greater salience of larger amounts to personal rules for thrift could account for the “magnitude effect,” in which people have been reported to express more patience toward larger rewards than toward smaller ones, an effect that is not seen in nonhuman animals (Green et al., 2004).
Chronic failures of will. Repeated lapses in a particular kind of circumstance are likely to result in the person’s discrimination of that circumstance from the larger bundles that are crucial to her willpower, leading to what are in effect involuntary mental accounts: circumscribed areas of dyscontrol in which the person abandons her attempts to use her will so as to preserve its credibility in other areas (“I have to smoke after a meal,” “I can’t get myself to speak in public,” etc.). People’s experience of what would seem to be voluntary choices as involuntary symptoms has otherwise puzzled theorists.
Compulsive character. A person’s uncertainty about how she will interpret a current choice when looking back from the moment of a future choice will create an incentive to give temptations a wider berth than sheer calculation would require. If she is especially afraid of impulses she may come to perceive a failure to give such a wide berth as itself something of a lapse (Ainslie, 1992, p. 188). Bodner and Prelec showed that the additional “diagnostic utility” of such “self-signaling” could motivate drift into poles of scrupulous self-control or irresistible impulsiveness (1997, 2001). Conversely, if the direction of drift is away from scrupulous self-control, the result may be a chronic circumscribed failure of will, as in the foregoing paragraph.
Great variability in patience. The degree of human patience (k in Formula 2) that has been estimated both from self-reports and from actual choices varies hugely among people, and among kinds of reward within a person (Frederick et al., 2002). This variability contrasts with the single-digit consistency observed in nonhuman animals (Mazur & Biondi, 2009; Ainslie & Monterosso, 2003). This contrast makes sense if we understand the human findings not to represent innate preferences, but preferences based to a greater or lesser extent on intertemporal bargaining practices, with more or less skill and divergent histories.
“Free will.” One argument that has been advanced for dual motivation models is that whereas even “the quasi-hyperbolic model has multiple equilibria, the dual-self equilibrium is unique” (Fudenberg & Levine, 2006). Of course the multiple equilibrium attribute applies all the more to a recursive self-prediction model. However, it may be that a technically chaotic mechanism such as recursive self-prediction is the way of the world. Small changes of perceived symbolism, and hence of category membership, may radically change the implications of the current choice as a test case, so that a person cannot be absolutely sure of what her motivation and hence her choice will be even in the near future. This would create both the introspective opacity and sense of participation in one’s own decisions that have been seen as key to the experience of free will (Holton, 2009; Ainslie, 2011). Economics has adopted the commonsense notion that people weigh their incentives and then act on them in straightforward fashion. However, after some experience with temporary preferences an agent may come to factor self-signaling into her choices as second nature, making her decision process fundamentally recursive. Perhaps, to paraphrase physicist Stanislaw Ulam, the scope of non-linear self-control is like the scope of non-elephant zoology.
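The patience parameter k mentioned under “Great variability in patience” is usually backed out of observed indifference points. A minimal sketch of that arithmetic follows. It assumes that Formula 2 has the simple hyperbolic form value = amount / (1 + k × delay), which is not reproduced in this section, and that the smaller reward is immediate; the function name and the example amounts are hypothetical.

```python
# Sketch of recovering k from one indifference point, assuming the hyperbolic
# form amount / (1 + k * delay). Setting the immediate amount equal to the
# discounted delayed amount and solving for k gives the expression below.

def implied_k(immediate_amount, delayed_amount, delay):
    """k at which delayed_amount / (1 + k * delay) equals immediate_amount."""
    if immediate_amount <= 0 or delay <= 0:
        raise ValueError("immediate_amount and delay must be positive")
    return (delayed_amount / immediate_amount - 1.0) / delay


if __name__ == "__main__":
    # Hypothetical subject indifferent between 50 now and 100 after 10 days.
    print(implied_k(50.0, 100.0, 10.0))
    # A more patient hypothetical subject: 90 now vs. 100 after 10 days.
    print(implied_k(90.0, 100.0, 10.0))
```

The unit of k is the reciprocal of whatever unit the delay is measured in, so values estimated with different time units or reward types are not directly comparable without conversion.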

Conclusions

The proposals that have descended from Laibson’s β-δ formula have not fully modeled self-control. Despite a sometimes Ptolemaic intricacy, they do not depict motivational interaction that occurs while temptation is present. Those models that include willpower at all either envision a faculty with a motivation (“strength”) different from the motives that are weighed in the marketplace of choice, or rely on incompatible goals among diverse brain centers. Both assumptions are questionable, but these models’ biggest problem is that they do not let resolutions withstand re-examination while being challenged by impulsive alternatives. I have suggested here a return to the pure hyperbolic discount function as originally proposed (Formula 2; Ainslie, 1975, 1992), which can motivate a recursive process of self-prediction and thereby the formation of self-enforcing intertemporal contracts. Such a process does not require a separately motivated faculty of will, or incompatible goals among brain centers. As the person bargains over time this process may evolve into rational planning and the experience of free will, but also into such familiar irrational phenomena as rationalization, defensive denial, circumscribed failures of will, and compulsiveness. Differences in this evolution among individuals and among topics within an individual may account for the notorious variability in human delay discount rates that experimenters elicit. But the fundamental virtue of a pure hyperbolic discounting model is the generation of self-control that is simultaneous with impulses: willpower in William James’ “eyes open” sense.

Notes

1. This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs or of the US Government. I thank Glenn Harrison, Don Ross, Jon Monterosso, and an anonymous referee for comments on earlier drafts, and Lynne Debiak for the artwork.

2. Modified from their formula 5 to express momentary value.

3. Of course, sudden appetite is also a problem for pure hyperbolic curves. I have argued elsewhere that sudden arousal without a signal that a reward is more likely or closer has to incorporate a positive feedback mechanism, in which an increasing expectation of choosing to consume leads to increasing arousal and vice versa (Ainslie, 2010b). Such a model works poorly for the conditioned arousal that is commonly imagined to underlie visceral reward, and in any case requires the kind of recursive self-prediction I will be describing here.

4. Imagining forbidden impulses is often a tempting activity in itself, regarded as a venial sin by the Catholic Church, as philosopher Richard Holton has pointed out (2009).

5. Even a single LL reward as just illustrated would have a value of .001 at delay = 40 if discounted in β-δ or exponential fashion, but .27 if discounted hyperbolically. The hyperbolic function predicts not only a greater aggregate motive but a much stronger motive for commitment in advance.

6. A β-δ agent, too, might be aware of her future unpredictability when visceral rewards became available, and, if she already had shallow devaluation curves, interpret her current choice as a test case for a bundle of future choices. β-δ theorists do start with shallow devaluation curves, but have not explored this possibility, perhaps because they do not include self-prediction in the contingencies that influence choice.

7. Others have also reported smokers to discount future money more steeply than non-smokers (Bickel et al., 1999). The experimental manipulations did not further increase the non-smokers’ patience. A possible interpretation is that the non-smokers were already avoiding impulsiveness, but the smokers were open to improvement from strategic methods.

8. This is not to say that such calculations cannot work for whole populations; the behavior of an individual water molecule is chaotic, but water still flows reliably downhill.

9. This finding is not surprising in light of the fact that animals lacking a prefrontal cortex have long been known to evaluate SS/LL reward tradeoffs (Ainslie & Herrnstein, 1981) and sometimes to commit themselves against future impulses (Ainslie, 1974; Deluty et al., 1983).

10. Economic models have never dealt with urges such as panic that are punished almost immediately. There is no room here to deal with the deus ex machina of “conditioned” (unmotivated) behaviors (see Ainslie, 2010b), but their practical effect is to create even shorter-term doers with respect to all ranges of longer term planner-doers above them.

11. Heidhues and Koszegi propose a model that does not differentiate deterrence, which might be costless (e.g. having oneself banned at nearby casinos), from effort that has to be expended. Although they interestingly include the possibility of “internal rules” (2009, p. 428), they do not say how these are motivated or what governs their cost.

12. Most processes described by psychologists as “automatic” might equally well be called “mindless,” the kind of process that leads an abstinent smoker to light up before she thinks about it, but not one that permits smoking while aware of the conflict. An attempt is sometimes made to stretch the concept to account for compulsive behaviors, but they are clearly a different phenomenon.

References

Ainslie, G. (1974) Impulse control in pigeons. Journal of the Experimental Analysis of Behavior 21, 485-489.

Ainslie, G. (1975) Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin 82, 463-496.

Ainslie, G. (1991) Derivation of "rational" economic behavior from hyperbolic discount curves. American Economic Review 81, 334-340.

Ainslie, G. (1992) Picoeconomics: The Strategic Interaction of Successive Motivational States within the Person. Cambridge: Cambridge U.

Ainslie, G. (2001) Breakdown of Will. New York, Cambridge U.

Ainslie, G. (2007) Can thought experiments prove anything about the will? In D. Spurrett, D. Ross, H. Kincaid and L. Stephens, Eds., Distributed Cognition and the Will: Individual Volition and Social Context. MIT.

Ainslie, G. (2010a) Procrastination, the basic impulse. In Andreou, Chrisoula and White, Mark, eds., The Thief of Time: Philosophical Essays on Procrastination. Oxford, pp. 11-27.

Ainslie, G. (2010b) Hyperbolic discounting versus conditioning and framing as the core process in addictions and other impulses. In D. Ross, H. Kincaid, D. Spurrett, and P. Collins, eds., What Is Addiction? MIT.

Ainslie, G. (2011) “Free will” as recursive self-prediction: Does a deterministic mechanism reduce responsibility? In J. Poland and G. Graham (eds.) Addiction and Responsibility. MIT.

Ainslie, G. and Haendel, V. (1983) The motives of the will. In E. Gottheil, K. Druley, T. Skodola, H. Waxman (eds.), Etiologic Aspects of Alcohol and Drug Abuse. Charles C. Thomas, pp. 119-140.

Ainslie, G. and Herrnstein, R. (1981) Preference reversal and delayed reinforcement. Animal Learning and Behavior, 9, 476-482.

Ainslie, G. and Monterosso, J. (2003) Building blocks of self-control: Increased tolerance for delay with bundled rewards. Journal of the Experimental Analysis of Behavior 79, 83-94.

Ainslie, G. and Monterosso, J. (2004) A marketplace in the brain? Science 306, 421-423.

Ariely, D. and Wertenbroch, K. (2002) Procrastination, deadlines, and performance: Self-control by precommitment. Psychological Science, 13(3), 219-224.

Baumeister, R. F., Gailliot, M., DeWall, C. N., and Oaten, M. (2006) Self-regulation and personality: How interventions increase regulatory success, and how depletion moderates the effects of traits on behavior. Journal of Personality, 74, 1773-1801.

Bechara, A. (2006) Broken willpower: Impaired mechanisms of decision-making and impulse control in substance abusers. In N. Sebanz and W. Prinz, eds., Disorders of Volition. MIT, pp. 399-418.

Benabou, R. and Pycia, M. (2002) Dynamic inconsistency and self-control: A planner-doer interpretation. Economics Letters, 77, 419-424.

Bénabou, R. and Tirole, J. (2004). Willpower and personal rules. Journal of Political Economy, 112, 848-886.

Benhabib, J. and Bisin, A. (2005) Modeling internal commitment mechanisms and self-control: A neuroeconomics approach to consumption-saving decisions. Games and Economic Behavior 52, 460-492.

Bentall, R.P., Lowe, C.F., and Beasty, A. (1985) The role of verbal behavior in human learning II: Developmental differences, Journal of the Experimental Analysis of Behavior 43, 165-181.

Berns, G. S., Laibson, D., and Loewenstein, G. (2007) Intertemporal choice: Toward an integrative framework. Trends in Cognitive Sciences, 11, 482-488.

Bickel, Warren K., Odum, Amy L., and Madden, G. J. (1999) Impulsivity and cigarette smoking: Delay discounting in current, never, and ex-smokers. Psychopharmacology, 146, 447-454.

Bodner, R. and Prelec, D. (1997) The diagnostic value of actions in a self-signaling model. Paper delivered at the Norwegian Research Council Working Group on Addiction, Oslo, Norway, May 26, 1995, MIT working paper, 1997.

Bodner, R. and Prelec, D. (2001) The diagnostic value of actions in a self-signaling model. In Isabelle Brocas and Juan D. Carrillo, eds., Collected Essays in Psychology and Economics, Oxford.

Bratman, M. E. (1999) Faces of Intention: Selected Essays on Intention and Agency. Cambridge, UK, Cambridge University Press.

Brocas, Isabelle and Carrillo, Juan D. (2008) The brain as a hierarchical organization. American Economic Review 98, 1312-1346.

Carter, B. L. and Tiffany, S. T. (2001) The cue-availability paradigm: The effects of cigarette availability on cue reactivity in smokers. Experimental & Clinical Psychopharmacology, 9, 183-190.

Coller, M., Harrison, G. W., and Rutström, E. E. (2010) Latent process heterogeneity in discounting behavior. Oxford Economic Papers.

Cropper, M. L., Aydede, S. K., and Portney, P. R. (1992) Rates of time preference for saving lives. American Economic Review, 82, 469-472.

Dar, R., Stronguin, F., Marouani, R., Krupsky, M., and Frenk, H. (2005) Craving to smoke in orthodox Jewish smokers who abstain on the Sabbath: A comparison to a baseline and a forced abstinence workday. Psychopharmacology, 183, 294-299.

Darcheville, Jean Claude, Riviere, Vinca, and Wearden, J. (1993) Fixed interval performance and self-control in infants. Journal of the Experimental Analysis of Behavior 60, 239-254.

Deluty, M.Z., Whitehouse, W.G., Millitz, M. and Hineline, P. (1983) Self-control and commitment involving aversive events. Behavior Analysis Letters, 3, 213-219.

Elster, J. (1979) Ulysses and the Sirens: Studies in Rationality and Irrationality. Cambridge University Press.

Field, M. and Duka, T. (2001) Smoking expectancy mediates the conditioned responses to arbitrary smoking cues. Behavioural Pharmacology, 12, 183-194.

Fisher, I. (1930) The Theory of Interest. New York: Macmillan.

Frederick, S., Loewenstein, G., and O’Donoghue, T. (2002) Time discounting and time preference: A critical review. Journal of Economic Literature 40, 351-401.

Freud, S. (1915/1956) The Unconscious. In J. Strachey and A. Freud (eds.), The Standard Edition of the Complete Psychological Works of Sigmund Freud. Hogarth Press, vol. 14, pp. 161-215.

Fudenberg, D. and Levine, D.K. (2006) A dual-self model of impulse control. American Economic Review 96, 1449-1476.

Fudenberg, D. and Levine, D. (2010) Timing and self-control. Working paper.

Garner, D.M., & Wooley, S.C. (1991). Confronting the failure of behavioral and dietary treatments of obesity. Clinical Psychology Review, Vol. 11, p. 767.

Garvey, A.J., Kinnunen, T., Quiles, Z.N. and Vokonas, P.S. (2002) Smoking cessation patterns in adult males followed for 35 years. Poster presented at the Society for Research on Nicotine and Tobacco Annual Meetings, Savannah, GA.

Green, L., Fisher, E.B., Jr., Perlow, S., and Sherman, L. (1981) Preference reversal and self-control: choice as a function of reward amount and delay. Behavior Analysis Letters 1, 43-51.

Green, Leonard, Fristoe, N., and Myerson, J. (1994) Temporal discounting and preference reversals in choice between delayed outcomes. Psychonomic Bulletin & Review 1, 386.

Green, L. and Myerson, J. (2004) A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769-792.

Green, L., Myerson, J., Holt, D. D., Slevin, J. R., and Estle, S. J. (2004) Discounting of delayed food rewards in pigeons and rats: Is there a magnitude effect? Journal of the Experimental Analysis of Behavior, 81, 39-50.

Green, L., Myerson, J., and Macaux, E. W. (2005). Temporal discounting when the choice is between two delayed rewards. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 1121-1133.

Gul, F. and Pesendorfer, W. (2001) Temptation and self-control. Econometrica, 69(6), 1403-1435.

Gul, F., and Pesendorfer, W. (2004). Self-control, revealed preference and consumption choice. Review of Economic Dynamics, 7, 243-264.

Hanson, C. (2009) Thinking about Addiction: Hyperbolic Discounting and Responsible Agency. Rodopi.

Harrison, G. W., Lau, M. I., and Williams, M. B. (2002) Estimating individual discount rates for Denmark: A field experiment. American Economic Review 92, 1606-1617.

Harvey, C. M. (1994) The reasonableness of non-constant discounting. Journal of Public Economics, 53, 31-51.

Heidhues, P. and Koszegi, B. (2009) Futile attempts at self-control. Journal of the European Economic Association, 7, 423-434.

Helzer, J.E., Burnham, A., and McEvoy, L.T. (1991). Alcohol abuse and dependence. In Robins, L. N. and Regier, D. A., (eds.). Psychiatric Disorders in America: The Epidemiologic Catchment Area Study, pp. 81-115. Free Press.

Herrnstein, R. (1961) Relative and absolute strengths of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.

Herrnstein, R. J. (1997) The Matching Law: Papers in Psychology and Economics. H. Rachlin and D. I. Laibson, eds. Sage.

Heyman, G. M. (1996) Resolving the contradictions of addiction. Behavioral and Brain Sciences, 19, 561-610.

Heyman, G. M. (2009) Addiction: A Disorder of Choice. Harvard U.

Hofmeyr, A., Ainslie, G., Charlton, R., and Ross, D. (2010) The relationship between addiction and reward bundling: an experiment comparing smokers and non-smokers. Addiction 106, 402-409.

Holton, R. (2009) Determinism, self-efficacy, and the phenomenology of free will. Inquiry, 52, 412-428.

Hsiaw, A. (2009). Goal-setting, social comparison and self-control. Retrieved on January 25, 2010 from Princeton University, Department of Economics Web site: https://www.princeton.edu/economics/seminar-schedule-by-prog/behavioral-f09/Hsiaw-Paper.pdf

James, W. (1890) Principles of Psychology, Holt.

Kable, J. W. and Glimcher, P. W. (2007) The neural correlates of subjective value during intertemporal choice. Nature Neuroscience, 10, 1625-1633.

Kirby, K. N. (1997) Bidding on the future: Evidence against normative discounting of delayed rewards. Journal of Experimental Psychology: General, 126, 54-70.

Kirby, K. N. (2006) The present values of delayed rewards are approximately additive. Behavioral Processes, 72, 273-282.

Kirby, K. N. and Guastello, B. (2001) Making choices in anticipation of similar future choices can increase self-control. Journal of Experimental Psychology: Applied, 7, 154-164.

Kirby, K. N. and Herrnstein, R. J. (1995) Preference reversals due to myopic discounting of delayed reward. Psychological Science, 6, 83-89.

Kirby, K. N. and Marakovic, N. (1995) Modeling myopic decisions: Evidence for hyperbolic delay-discounting within subjects and amounts. Organizational Behavior and Human Decision Processes, 64, 22-30.

Koopmans, T. C. (1960) Stationary ordinal utility and impatience. Econometrica 28, 287-309.

Koszegi, B. and Rabin, M. (2009) Reference dependent consumption plans. American Economic Review, 99, 909-936.

Laibson, D. (1994) Hyperbolic discounting and consumption. Ph.D. Thesis, MIT.

Laibson, D. (1997) Golden eggs and hyperbolic discounting. Quarterly Journal of Economics, 112, 443-479.

Loewenstein, G. (1996) Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes, 65, 272-292.

Loewenstein, G. F. (1999) A visceral account of addiction. In J. Elster and O.-J. Skog, eds., Getting Hooked: Rationality and Addiction. Cambridge, UK: Cambridge University Press.

Loewenstein, G. F. and O’Donoghue, T. (2004) Animal spirits: Affective and deliberative processes in economic behavior. CAE Working Paper #0414. http://ssrn.com/abstract=539843.

Loewenstein, G. F. and O’Donoghue, T. (2007) The heat of the moment: Modeling interactions between affect and deliberation. http://www.cramton.umd.edu/workshop/papers/loewenstein-odonoghue-heat-of-the-moment.pdf

Marshall, A. (1921) Industry and Trade. London: Macmillan.

Mazur, J. (2000) Tradeoffs among delay, rate, and amount of reinforcement. Behavioral Processes, 49(1), 1-10.

Mazur, J. E. (1986) Choice between single and multiple delayed reinforcers. Journal of the Experimental Analysis of Behavior, 46, 67-77.

Mazur, J.E. (1987) An adjusting procedure for studying delayed reinforcement. In M.L. Commons, J.E. Mazur, J.A. Nevin, and H. Rachlin (eds.), Quantitative Analyses of Behavior V: The Effect of Delay and of Intervening Events on Reinforcement Value. Erlbaum.

Mazur, J. E. and Biondi, D. R. (2009) Delay-amount tradeoffs in choices by pigeons and rats: Hyperbolic versus exponential discounting. Journal of the Experimental Analysis of Behavior, 91(2), 197-211.

McClennen, E. F. (1990) Rationality and Dynamic Choice. Cambridge University Press.

McClennen, E. F. (1997) Pragmatic rationality and rules. Philosophy and Public Affairs, 26, 210-258.

McClure, S. M., Ericson, K. M., Laibson, D. I., Loewenstein, G., and Cohen, J. D. (2007) Time discounting for primary rewards. The Journal of Neuroscience, 27, 5796-5804.

McClure, S. M., Laibson, D. I., Loewenstein, G., and Cohen, J. D. (2004) The grasshopper and the ant: Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503-507.

Metcalfe, J. and Mischel, W. (1999) A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106, 3-19.

Mijovic-Prelec, D., and Prelec, D. (2010). Self-deception as self-signaling: A model and experimental evidence. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1538), 227-240.

Mischel, W. and Moore, B. (1980) The role of ideation in voluntary delay for symbolically-presented rewards. Cognitive Therapy and Research, 4, 211-221.

Monterosso, J. and Ainslie, G. (1999) Beyond Discounting: Possible experimental models of impulse control. Psychopharmacology, 146, 339-347.

Monterosso, J. R., Ainslie, G., Toppi-Mullen, P., and Gault, B. (2002) The fragility of cooperation: A false feedback study of a sequential iterated prisoner's dilemma. Journal of Economic Psychology, 23(4), 437-448.

Muraven, M. (2006) Conserving self-control strength. Journal of Personality and Social Psychology 91, 524-537.

Muraven, M. and Baumeister, R. (2000) Self-Regulation and Depletion of Limited Resources: Does Self-Control Resemble a Muscle? Psychological Bulletin, 126, 247-259.

Musau, A. (2009) Modeling alternatives to exponential discounting. Munich Personal RePEc Archive (MPRA) paper no. 16416, June 2. http://mpra.ub.uni-muenchen.de/16416/

Navarick, D.J. (1982) Negative reinforcement and choice in humans. Learning and Motivation 13, 361-377.

O’Donoghue, T. and Rabin, M. (1999). Doing it now or later. The American Economic Review, 89(1), 103-124.

O’Donoghue, T. and Rabin, M. (2000) The economics of immediate gratification. Journal of Behavioral Decision-Making, 13, 233-250.

O’Donoghue, T. and Rabin, M. (2001) Choice and procrastination. The Quarterly Journal of Economics 116, 121-160.

Phelps, E. S. and Pollak, R. A. (1968) On second-best national saving and game-equilibrium growth. Review of Economic Studies, 35, 185-199.

Pine, A., Shiner, T., Seymour, B., and Dolan, R. (2010) Dopamine, time, and impulsivity in humans. The Journal of Neuroscience, 30(26), 8888-8896.

Rachlin, H. (1995) Self-control: Beyond commitment. Behavioral and Brain Sciences, 18, 109-159.

Rachlin, H. (2005) Problems with internalization. Behavioral and Brain Sciences, 28(5).

Read, D., Loewenstein, G., and Rabin, M. (1999) Choice bracketing. Journal of Risk and Uncertainty, 19, 171-197.

Samuelson, P.A. (1937) A note on measurement of utility. Review of Economic Studies, 4, 155-161.

Schelling, T.C. (1980) The intimate contest for self-command. The Public Interest, 60, 94-118.

Smart, R.G. (1975). Spontaneous recovery in alcoholics: A review and analysis of the available research. Drug and Alcohol Dependence, 1, 277–285.

Sonuga-Barke, E. J. S., Lea, S. E. G. and Webley, P. (1989) Children’s choice: Sensitivity to changes in reinforcer density. Journal of the Experimental Analysis of Behavior, 51, 185-197.

Strotz, R. H. (1956) Myopia and inconsistency in dynamic utility maximization. Review of Economic Studies, 23, 166-180.

Telser, L.G. (1980) A theory of self-enforcing agreements. Journal of Business, 53, 27-45.

Thaler, R. and Shefrin, H. (1981) An economic theory of self-control. Journal of Political Economy, 89, 392-406.

Trope, Y., and Fishbach, A. (2000) Counteractive self-control in overcoming temptation. Journal of Personality and Social Psychology, 79, 493-506.

Wertenbroch, K. (1998). Consumption self-control by rationing purchase quantities of virtue and vice. Marketing Science, 17, 317-337.

1. This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs or of the US Government. I thank Glenn Harrison, Don Ross, Jon Monterosso, and an anonymous referee for comments on earlier drafts, and Lynne Debiak for the artwork.