George Ainslie
Veterans Affairs Medical Center, Coatesville PA, USA
University of Cape Town, South Africa
George.Ainslie@va.gov
Published in Neuroethics, (2017) 10(1): 143-153
Lewis ascribes the stubborn persistence of addictions to habit, itself a normal process that does not imply lack of responsiveness to motivation. However, he suggests that more dynamic processes may be involved, for instance that “our recurrently focused brains inevitably self-organize.” Given hyperbolic delay discounting, a reward-seeking internal marketplace model describes two processes, also normal in themselves, that may give rise to the “deep attachment” to addictive activities that he describes: (1) People learn to interpret current choices as test cases for how they can expect to choose in the future, thus recruiting additional incentive (willpower) against a universal tendency to temporarily prefer smaller, sooner to larger, later rewards. However, when this incentive is not enough, the same interpretation creates incentive to abandon the failed area, leading to the abstinence violation effect and a localized weak will. (2) Normal human value does not come entirely, or even mainly, from expectation of external rewards, but is generated endogenously in imagination. Hyperbolic discounting provides an account of how we learn to cultivate the hedonic importance of occasions for endogenous reward by building appetite. In this account, expectations of the far future have to be rewarded endogenously if they are to be as important as currently rewarded alternatives; and this importance is prone to collapse. Both will and hedonic importance are recursive and thus hard to study by controlled experiment, but they do represent modelable, reward-based hypotheses about the dynamic nature of habit.
In Addiction and the Brain, Lewis argues that addiction is a pattern of choice, rather than involuntary behavior imposed by a disease. He describes the neural changes in addicts’ brains that have been held to demonstrate the disease model, and points out that all changes in people’s behavior must have neural substrates. The changes seen in addiction just reflect “recurrent desire for a single goal,” and are reversible. He acknowledges the role of dispositional and environmental factors in making some people more liable to addiction, but points out that everyone has a tendency to overvalue imminent rewards. He ascribes addictions’ especial resistance to change to the “deep learning” of the relevant neural connections. He argues that this depth is the product of habit, so belief in a disease model of addiction is mistaken.
His points are all well taken. We are learning more about the great plasticity of neural connections. The fact that most addicts give up this behavior without therapy argues that, whatever neural changes may have energized the habit, they have not eliminated the basic process of choice. Conversely, his point that people all have an innate susceptibility to addictive choices is supported by research suggesting that, if we define addiction broadly, half of Americans suffer from it [1]. He also avoids invoking an overarching “self,” whose choices are merely influenced, rather than determined, by reward. However, just to say that “addiction is an outcome of learning… that has been accelerated and/or entrenched through recurrent pursuit of highly attractive goals” is not to deal with the entrenchment process itself. Why do addictive activities form trenches, “like the ruts carved by rainwater in the garden,” while other activities shift when the reward for them shifts? Why, indeed, do addictive habits sometimes shift with great suddenness, either to sobriety or to relapse [2]? The word habit describes only their persistence, but it is a place to begin.
Three kinds of habits are familiar. All imply a pattern of reward, since choice without reward leads to response alternation or exploratory behavior rather than repetition [3]. Call them routine habits, good habits, and bad habits.
Routine habits are subroutines that you learn for navigating familiar paths to reward with a minimum of attention. Repeated rewarded behaviors get more and more efficient and require less and less attention. We learn many of these to form words, ride a bicycle, and drive to work while thinking of something else. This has been studied experimentally by seeing how long it takes subjects to change choices as contingencies of reward change, or the extent to which subjects ignore how initial choices affect opportunities at second or third choice points [4]. As Lewis describes, the development of habits is accompanied by a shift of neural activity in midbrain striatal areas from “planning” or “voluntary” to “habitual” systems [5]. A similar shift has been described from “goal-directed” or “model-based” to “model-free” systems [6].
Some authors propose routine habits as an explanation for why addictions persist in the face of contrary incentives (e.g. [7]). In making frequent choices to get small amounts of money in the laboratory, people with either addictions or obsessive-compulsive disorder (OCD) have been shown to respond to changed cues more slowly than healthy controls [6]. However, this is not a promising hypothesis. Although routinely habitual behaviors are sometimes called automatic or robotic, “mindless” would be a better word. It is easy to call off the subroutine when you have to stop at the grocery store on the way to work. Although in some animal experiments routine habits persist despite nonreward [8], and ablation of the brain center that switches to model-based choice can increase this persistence [9], these subjects have not been confronted with the strong disincentives that might be comparable to the costs of addiction; and in human experiments the effects of the relevant brain damage have been moderate to slight [10] (discussed in [11]).
Good habits (or hard habits) are those that take effort—keeping a diary every night or jogging every day or getting out of bed when the clock radio plays a certain theme every morning—and seem to require an excuse to break on a particular day, lest they be harder to begin again. The test of a good habit might be that you feel better off afterward, or even that you feel a slight rush of pleasure when an external circumstance prevents you from doing it today. This rush of pleasure is evidence that the habit is not something you simply prefer; but nevertheless abandoning or “breaking” the habit feels like a loss. The behaviors of sobriety may be routine habits for someone who is not tempted by alcohol, but good habits of great significance for an alcoholic, and a great loss if they are broken.
Bad habits (or lazy habits) are those that resist effort, either because the effort feels too great or is too hard to focus. Bad habits make you feel less well off afterwards. I have the bad habit of continuing to read news after I’ve finished the stories I care about. I believe it persists because the one type shades imperceptibly into the other. But I know I’ve done it when I get the stale feeling of wasted time. This is a trivial example; Lewis’ addictive habits lie at the other end of a continuum, as he points out. Although someone may call an activity that she actually prefers a bad habit—cracking her knuckles or putting her feet on the furniture, or even smoking or drinking—the term has motivational meaning only when a converse good habit has been broken. You make an effort and it is not enough. Bad habits are the junk heap of failed good habits.
Good and bad habits are familiar, but are not well modeled in the motivational literature. To begin with, they clearly have more to them than plain repetition. The observation that they often gain force with repetition needs explanation itself. Lewis seems unsure how much to rely simply on rewarded repetition as the explanation for why addictive habits resist new learning. In the book he introduces the intervening variable of compulsion: “The brain is certainly built to make any action, repeated enough times, into a compulsion. But the emotional heart of addiction—in a word, desire—makes compulsion inevitable, because unslaked desire is the springboard to repetition, and repetition is the key to compulsion” (p. 33). But is repetition sufficient for compulsion? He seems to be saying yes, but elsewhere he often hints at something more dynamic: “a brain that changes itself”; “habits link with other habits”; “bad habits self-organize like any other habits”; “habits that become self-perpetuating and self-stabilizing”; and perhaps most significantly, “Our recurrently-focused brains inevitably self-organize” (my italics).
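$$\text{Value} = \frac{\text{Value}_0}{1 + k\,\text{Delay}} \qquad (1)$$
where Value₀ is the value if the reward were immediate, Delay is the delay to the reward, and k is the degree of impatience [18].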
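Assuming Formula (1) has the standard hyperbolic form Value = Value₀ / (1 + k·Delay), a short sketch makes its key property concrete: compared with an exponential curve of the same initial steepness, the hyperbolic curve falls fast at first but keeps a much higher tail. The function names, the 100-unit reward, and the shared rate of 0.1 per time unit are illustrative assumptions, not parameters from the source.

```python
import math


def hyperbolic_value(value0: float, delay: float, k: float) -> float:
    """Hyperbolically discounted value: Value0 / (1 + k * delay)."""
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return value0 / (1.0 + k * delay)


def exponential_value(value0: float, delay: float, rate: float) -> float:
    """Exponentially discounted value, for comparison: Value0 * exp(-rate * delay)."""
    if delay < 0 or rate < 0:
        raise ValueError("delay and rate must be non-negative")
    return value0 * math.exp(-rate * delay)


# Both curves have the same slope at delay 0 (k = rate = 0.1 per time unit),
# but the hyperbolic tail stays far higher at long delays.
for delay in (0, 10, 50, 100):
    h = hyperbolic_value(100.0, delay, k=0.1)
    e = exponential_value(100.0, delay, rate=0.1)
    print(f"delay={delay:3d}  hyperbolic={h:8.3f}  exponential={e:8.3f}")
```

At a delay of 100 units the hyperbolic value is still about 9 percent of the immediate value, while the exponential value has fallen below 0.01 percent.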
Hyperbolic discounting creates conflict between options that will be rewarded imminently and alternatives that will be rewarded more, but later. Choices that have been shaped by later, larger (LL) rewards reflect the “objective” value of the rewards, but are apt to be overturned by the lure of smaller, sooner (SS) rewards as the pairs of alternative outcomes draw nearer (rightmost pair in Fig. 1). The SS rewards could be said to be the basis of an interest that opposes an LL interest, by analogy to economic interests in markets. Trivially, any behaviors that are learned to get a reward could be called its interest, but the term is useful only where one interest has an incentive to interfere with another one. The key property of these interests is that an SS interest does not die out if its goal is not chosen at a distance, because its relative influence will grow as it draws closer. The SS interest will not be extinguished even if it fails to get its goal after many tries, as long as it has a chance of succeeding. This conflict can be demonstrated with nonhuman animals in the laboratory [19], but is probably not significant in nature because their long term interests—to hoard, migrate, build dams—are served by inborn instincts that reward the necessary behaviors immediately. In effect, hoarding is fun. Humans, by contrast, have to learn their long term interests. Innate instincts clearly keep some influence—hoarding may still be fun—but these incentives are likely to be distractions from the kinds of goals that can be seen over months or years, and so to be the basis of short term interests.
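The preference reversal described above can be reproduced numerically. This sketch assumes a hypothetical SS reward of 5 units and an LL reward of 10 units arriving 3 time units later, with k = 1 for the hyperbolic curve and an exponential rate of 0.2 for comparison; none of these numbers come from the source. Under hyperbolic discounting the LL option wins at a distance and loses as the pair draws near, whereas under exponential discounting the ratio of the two values never changes, so no reversal occurs.

```python
import math

# Hypothetical pair: a smaller-sooner (SS) reward and a larger-later (LL)
# reward that arrives GAP time units after it.
SS_AMOUNT, LL_AMOUNT, GAP = 5.0, 10.0, 3.0


def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, rate: float = 0.2) -> float:
    return amount * math.exp(-rate * delay)


def preferred(discount, time_to_ss: float) -> str:
    """Which option has the higher discounted value when the SS is time_to_ss away."""
    ss = discount(SS_AMOUNT, time_to_ss)
    ll = discount(LL_AMOUNT, time_to_ss + GAP)
    return "SS" if ss > ll else "LL"


for t in (10, 5, 3, 1, 0):
    print(t, preferred(hyperbolic, t), preferred(exponential, t))
# Hyperbolic: LL at t = 10, 5, 3, then SS at t = 1 and 0 (the curves cross at t = 2).
# Exponential: LL at every distance, because LL/SS stays fixed at 2 * exp(-0.6).
```

The persistence of the SS "interest" in the text corresponds to the fact that, however often LL is preferred at a distance, the hyperbolic curves still cross once the pair is close enough.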
Reports of hyperbolic discounting and its variants (for instance hyperboloid discounting, [20, 21]) have led to widespread awareness of precommitting behaviors that serve long term interests, which can be demonstrated in elementary form in nonhumans [22, 23] and with more sophistication in humans [20, 24]. When acting in your long term interest, you have simple ways to forestall temptations: keep your attention away from them so they do not enter consideration, inhibit the relevant appetite or emotion before it gets too strong, or find external influences (for instance Alcoholics Anonymous) or commitments (for instance disulfiram, sold as Antabuse). The wishes of your family and friends are major incentives in normal self-control. But all of these methods have serious limitations: Attention is hard to divert for long; appetites and emotions are rewarding in their own right; other people have their own agendas, and neither they nor physical commitments may be available when you need them.
The most effective impulse control has a large internal component. People do learn to choose consistently in practice, at least when dealing with money transactions, lest someone else who has learned this skill take advantage of their impulsiveness. But the learning is easier in some topics than in others, and some people become more skillful at it than others. Most importantly, its practice is not ahistorical, so someone’s record at practicing it in the past affects her potential to do it now.
“Habit” from Intertemporal Bargaining
The few existing models of internal self-control have suggested that a person either 1) has a separate motivational faculty that exerts its “strength” [25] or 2) avoids weighing her incentives after the moment of change [26]. I have detailed the problems with these models elsewhere [12]. Alternatively, with the relatively flat tails of hyperbolic discount curves, just making a whole series of SS/LL choices at once gives a boost to the LL interest (Fig. 1), a phenomenon that consistency-maintaining (exponential) discount curves would not produce. The expected additive effect has been found in both nonhumans [27] and humans [28, 29].
Limitations of Intertemporal Bargaining
Probably only humans use our own past and present behaviors as cues to predict what we will do in the future. Even for humans it seems like a jury-rigged method for planning, not shaped specifically by the needs of long term consistency—in the way, for instance, that animals have evolved longer memories for flavors than for other information so as to identify a poison they ate hours before becoming sick (“bait-shyness” [32]). Self-control by identifying intertemporal prisoners' dilemmas is a kludge. As we might expect, it has major limitations. Intertemporal bargains only compensate for the underlying hyperbolic discount function, rather than changing it. Recruiting incentive by interpreting current choices as test cases creates resolve that is often effective but is also brittle. In the face of strong temptations this tactic is apt to backfire:
The latter two side effects can be expected to limit the power of the will, leaving long term interests at the mercy of delay discounting.
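The additive effect of bundling described earlier, where choosing a whole series of SS/LL pairs at once boosts the LL interest, can be sketched by summing discounted values over the series. The reward sizes, the spacing of 10 time units between pairs, the 1-unit lead before the first SS, and the discount parameters (k = 1, exponential rate 0.5) are all hypothetical. A single choice favors SS under both curves; summing ten pairs flips the hyperbolic preference to LL, while the exponential preference cannot flip because every pair contributes the same LL/SS ratio.

```python
import math

SS_AMOUNT, LL_AMOUNT = 5.0, 10.0  # hypothetical reward sizes
GAP = 3.0       # each LL arrives this long after its paired SS
SPACING = 10.0  # interval between successive SS/LL pairs
LEAD = 1.0      # how far off the first SS is at the moment of choice


def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, rate: float = 0.5) -> float:
    return amount * math.exp(-rate * delay)


def bundle_values(discount, n_pairs: int) -> tuple[float, float]:
    """Summed present values of choosing SS (or LL) in every one of n_pairs future pairs."""
    ss_total = sum(discount(SS_AMOUNT, LEAD + i * SPACING) for i in range(n_pairs))
    ll_total = sum(discount(LL_AMOUNT, LEAD + i * SPACING + GAP) for i in range(n_pairs))
    return ss_total, ll_total


for name, fn in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    for n in (1, 10):
        ss, ll = bundle_values(fn, n)
        winner = "LL" if ll > ss else "SS"
        print(f"{name:11s} pairs={n:2d}  SS={ss:.2f}  LL={ll:.2f}  -> {winner}")
```

With these numbers the single hyperbolic choice is SS (2.50 vs. 2.00) and the ten-pair bundle is LL (about 3.78 vs. 4.27).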
Addictions are especially apt to be ascribed to habit when they persist despite progressive increase in their prospective harm. Prospection is often said to be blunted in addicts, but this is not an adequate explanation. An addiction often leaves a person’s factual expectations intact, or at least no less accurate than other people’s—she often expects to get cancer, HIV, jail time or death. As for predicting unhappiness, most people are poor at predicting how their actual moods will be only a few weeks hence. Subjects’ predictions of future experience are distorted by being “essentialized” (failing to take account of detail), by not allowing for fatigue, and by not imagining changing circumstances [38]. The short answer seems to be that addicts can predict future unhappiness as well as others can, but care less about it. However, this observation confronts us with our lack of knowledge about how valuation of the future normally takes place.
A reward-based analysis in particular is complicated by the apparent steepness of the inborn discount rate. The data we have only suggest the nature of this process, but they still make it clear that a straightforward delay discounting model is not adequate. I propose a modification that incorporates the detachment of forward-looking motivation from objective evidence, while maintaining the assumption of strict determination by a single reward-comparing mechanism. However, I do abandon the behaviorist discipline that reward must come from external events.
When current comfort is at stake, its demands tend to overwhelm other motives. Most studies of delayed gratification deal with surplus value. Whether an experimental subject chooses $50 now or $100 in a year, her resources for sustaining a good mood over the following few hours will remain the same. The discount rate for actual comfort vs. discomfort is much steeper. The single digit annual discount rates that are adequate to sell people financial investments clearly apply only to surplus wealth—that beyond what is needed to sustain current hedonic tone. Four year old children, who can metarepresent others’ beliefs [39] and tell distances to past events, still have difficulty waiting a few minutes to get two marshmallows instead of one [40]. Even adult subjects who strongly prefer six bits of food to two when both are immediate will usually not choose the six bits if they will be delayed two minutes—in marked contrast to the same subjects’ patience for sums of money [41]. In the extreme, the desire for continuing a crack cocaine high has often been enough to bankrupt someone, a modern equivalent of the Biblical Esau selling his birthright for a mess of pottage. More mundanely, we have little tolerance for the boredom of a bad lecture or getting stuck in traffic, times when our usual supply of entertainment is interrupted. Volunteer subjects will often choose not to wait two minutes to quadruple their access to a video game [42] and are similarly impatient to get relief from unpleasant noise [43]. Playwrights notoriously have to design not just a plot that develops over two hours or so, but smart dialogue that provides payoffs from minute to minute—cf. the “flip value” required of novelists. At the discount rates implied by people’s impatience with actual discomfort, the conventional exponential formula makes the value of an experience that is even a few days away infinitesimal. 
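The last claim is simple arithmetic. As an illustrative assumption, treat refusal to wait two minutes for four times the reward (as with the video game access above) as the minimum impatience: under an exponential curve that requires 4·e^(−2r) ≤ 1, so r ≥ ln 2 per minute. The helper names below are hypothetical. The resulting discount factor for three days away is at most 2^(−4320), which is why the sketch works in base-10 logarithms rather than calling `math.exp` directly.

```python
import math


def min_exponential_rate(ratio: float, delay: float) -> float:
    """Smallest rate r at which `ratio` units after `delay` are worth no more than
    1 unit now under exponential discounting: ratio * exp(-r * delay) <= 1."""
    return math.log(ratio) / delay


def log10_discount_factor(rate: float, delay: float) -> float:
    """log10 of exp(-rate * delay); working in logs avoids floating-point underflow."""
    return -rate * delay / math.log(10)


r = min_exponential_rate(4.0, 2.0)  # per minute; equals ln 2, about 0.693
three_days = 3 * 24 * 60            # 4320 minutes

# math.exp(-r * three_days) underflows to 0.0 here, so report the exponent instead.
print(f"r >= {r:.3f}/min; discount factor at 3 days <= 10^{log10_discount_factor(r, three_days):.0f}")
```

The bound comes out near 10^−1300, infinitesimal by any standard.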
And yet people often deprive themselves seriously for distant goals, even resist torture—or give up addictions. How can we understand this contrast?
The relatively high tails of hyperbolic discount curves raise the value of distant events relative to what it would be with exponential curves, but this would still not be enough for events that are expected after days to compete with events that are expected after minutes [44]. Call the realm that is distant enough that expected options cannot compete, even with the help of bundles, the far future. Expectations for the far future have to bring into the present not only the picture of future events but also a significant share of their likely motivational impact. Beliefs about the risks of smoking, for instance, must create a significant fraction of the incentive created by facing the diagnosis of cancer if they are to compete with an immediate nicotine sensation. In effect, the discounted value of distant prospects has to be amplified to compete with attractions at hand. To compete in real time with these attractions—even to avoid the boredom of waiting for distant outcomes—the models that govern expectations of the future must pay off currently, that is, must be games that pay in the same league as video games or tasty snacks. Since the reward in these scenarios is evidently not discounted continuously from the far future, it must be endogenous—generated in imagination.
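A matching hyperbolic calculation shows why even bundles do not rescue far future prospects. Under the same illustrative assumption (refusing to wait two minutes for four times the reward), Formula (1) requires 4 / (1 + 2k) ≤ 1, so k ≥ 1.5 per minute. The sketch values one hypothetical 4-unit reward three days away, and then a bundle of such rewards arriving daily for a year, against an immediate alternative worth 1. The helper names are hypothetical; the matched exponential value, about 4 × 2^(−4320), would be far smaller still.

```python
def min_hyperbolic_k(ratio: float, delay: float) -> float:
    """Smallest k at which `ratio` units after `delay` are worth no more than 1 unit now,
    from ratio / (1 + k * delay) <= 1."""
    return (ratio - 1.0) / delay


def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    return amount / (1.0 + k * delay)


MINUTES_PER_DAY = 24 * 60
k = min_hyperbolic_k(4.0, 2.0)  # 1.5 per minute

# One 4-unit reward three days off, and a bundle of such rewards every day for a year.
single = hyperbolic_value(4.0, 3 * MINUTES_PER_DAY, k)
bundle = sum(hyperbolic_value(4.0, (3 + day) * MINUTES_PER_DAY, k) for day in range(365))

print(f"single: {single:.5f}  year-long bundle: {bundle:.4f}  immediate alternative: 1.0")
```

The single reward is worth well under 0.001 and the year-long bundle only on the order of 0.01, both far short of the immediate alternative, which is the sense in which far future prospects "cannot compete, even with the help of bundles."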
Far Future Expectations Depend on Endogenous Reward
The motivational effect of scenarios varies with a mental process that may be independent of its predictive accuracy. Conventional bookkeeping makes the reward value of any option depend ultimately on the extent to which it predicts hardwired rewards, a set not restricted to food, comfort, drugs, and sex but still innately configured, non-assignable [45, 46]. A soft currency of imagined rewards might seem to need backing by a hard currency that is outside of a person’s control, lest she short-circuit the reward process and divert reward from its adaptive purpose. But there are many concrete cases of eating food or gratifying sexual desire where we can consume rewards at will, and in those cases the constraint is our appetites for them. An analogous constraint, combined with a hyperbolic impatience to gratify the appetite before the best time, permits a model of endogenous rewards that stand on their own [47, 48]. It can be argued that the great majority of “secondary” rewards in a wealthy society do not predict hardwired primary rewards, but occur by an endogenous process.
According to this model, we set up endogenous rewards in imagination by the same process as in daydreaming, but controlling the hyperbolically discounted urge to cash them in early, as it were, by attaching them to infrequent and unpredictable occasions. This attachment is a betting process, the terms of which are enforced by the same recursive self-prediction as personal rules—the cost of cheating at solitaire, or of saying that a frightening movie is “only a story,” is to reduce the stake of similar bets in the future. Likewise, the cost of laxness in testing the reality of far future expectations, or of not imagining the future at all, is to reduce our stake in predictive evidence—the occasions for endogenous reward in scenarios. This stake could be called hedonic importance, which we experience when we “give importance” to a project and “find it important” in turn. With endogenous reward, it is the aptness of occasions to pace appetite that determines their value as external goods. This aptness for being bet on is thus the counterpart of hardwired rewardingness. The aptness has its own determinants, both the probability pattern of the occasions and the importance that they have recursively developed [47, 48]. These determinants overlap with realism, but must ultimately serve the need of far future scenarios to be good stories in order to compete with imminent alternatives. Given the limitless potential of endogenous reward, the importance of the occasions on which it hinges can cumulate to enormous values—spending thousands to spot a white tiger in nature, climbing Mount Everest, perhaps dying a martyr, or just obtaining portents of a full, satisfying life in the future. Conversely, investment in hardwired drug effects or in the challenges that build appetite for gambling or video games can undermine the hedonic importance of future prospects.
Since this importance is not determined by discounting the objective value of future prospects, but by current imagination that is at most inspired by such value, it is subject to the same drift or collapse as the importance of sports teams or romantic quests. That is, the notion that future goods are “really” worth present sacrifice is not a perception but a construct: recursively determined hedonic importance.
The neural correlates of scenarios in self-control are just beginning to be visible: more patient choice has been found to be correlated with activity in the ventromedial prefrontal cortex (PFC) when subjects imagine future events [49]. Similarly, presenting subjects with words naming their own expected future events during an intertemporal choice task causes more patient choice, accompanied by activity in the ventromedial PFC and anterior cingulate gyrus (an “episodic imagery network”) and increased coupling between this gyrus and the hippocampus [50]. These findings are tantalizing, but the motivational contingencies that induce and constrain the activity of imagination cannot themselves be seen. Even activities that pay off in the very short term may be rewarded endogenously, as shown in some experiments with what is ostensibly secondary reward. In one, subjects chose between amounts of expected money ranging from $0.01 to $0.24 at delays ranging from 2 s to 64 s. The value of delayed amounts declined as a hyperbolic function of the delay, even though the money itself would not be delivered until the end of the experiment, and obviously could not be spent until even later [51]. As the authors pointed out, the substantial behavioral impact of these meaningless delays indicated that the reward announcements were valued for their own sake, like points in a video game. In a similar experiment using winnings signaled by pictures, the ventral striata of half the subjects showed discounting, but the brains of the other half did not, the latter half realistically reflecting the uniform delay to actual delivery [52]. The observation that half of subjects did not show discounting also suggests that this valuation did not occur through a passive process such as secondary reward. As for the impact of these short term endogenous rewards on welfare, the addictive potential of immediate endogenous rewards from video games and other apps is just becoming apparent [53, 54].
Hyperbolic Discounting Is Still Present in Scenarios
Although it is clearly impossible that far future prospects are discounted continuously over their expected delays, their value in scenarios still tends to be rated hyperbolically—for instance, in reported preferences on the order of $4000 now vs. $10,000 in ten years [55]. The same hyperbolic pattern is seen when subjects value future health [56], climate change [57], or procrastination [58]. However, the fact that the impatience factor k in subjects’ discount formulas (Formula (1)) varies by hundreds-fold (e.g. [17]) indicates that they are not reporting the raw feel of the various outcomes; they are apparently imagining discount factors according to personal meanings. (The variation of k among members of a nonhuman species is in the single digits [59].) Another telling finding, if replicated, is that brain imaging subjects who evaluate amounts versus delays of money with respect to a future reference point have been reported to discount hyperbolically in relation to that point as if to the present. That is, if offered $20 in 60 days vs. $30–60 at 180 days, the shape of their curve to the 60 day point is the same as that of the curve to the present moment when they choose $20 now vs. $30–60 at 120 days, albeit with amplitudes reduced proportionately [60]. This result suggests that scenarios allow subjects to move a make-believe “now” to a future point, and thus that subjects in other discounting studies might feel similarly detached from the literal contingencies of delay. The hyperbolic discounting of outcomes in the far future seems to be somewhat constructed, rather than anchored in raw anticipation.
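How k is read off such preferences can be shown with the example above. If one treats "$4000 now vs. $10,000 in ten years" as an indifference point (an assumption; the source gives it only as an order of magnitude), solving 4000 = 10,000 / (1 + 10k) gives k = 0.15 per year. The sketch then shows how hypothetical subjects whose k differs a hundredfold would value the same delayed $10,000. The function names are hypothetical.

```python
def implied_k(immediate: float, delayed: float, delay: float) -> float:
    """Solve immediate = delayed / (1 + k * delay) for k, given an indifference point."""
    if immediate <= 0 or delay <= 0:
        raise ValueError("immediate amount and delay must be positive")
    return (delayed / immediate - 1.0) / delay


def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    return amount / (1.0 + k * delay)


# Treating "$4000 now vs. $10,000 in ten years" as an indifference point:
k_years = implied_k(4000.0, 10_000.0, 10.0)
print(f"implied k = {k_years:.2f} per year")

# Hypothetical subjects whose k spans a hundredfold range.
for k in (0.015, 0.15, 1.5):
    value = hyperbolic_value(10_000.0, 10.0, k)
    print(f"k = {k:<5}  present value of $10,000 in 10 years = ${value:,.0f}")
```

A hundredfold spread in k moves the present value of the same offer from about $8,700 down to $625, a range far wider than any plausible difference in how the money itself would feel.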
Certainly the hyperbolic value function shows up in studies that do not involve delay of payment, or even payment at all—for instance in the report that volunteer subjects value a hypothetical past prize as a hyperbolic function of its supposed recency [61], or where subjects report willingness to make altruistic gifts as a hyperbolic function of a wholly dimensionless attribute, “social distance” [62]. The hyperbolic shape seems to suggest itself to people’s scenarios involving quantity. The important point is that the hyperbolic discounting reported in choices about the far future appears to be just a widespread feature of scenarios, not the presumably innate psychophysical discount function itself (such as that demonstrated in nonhumans, for instance [19]). It seems to be learned readily, perhaps by simple analogy, but is nevertheless elective just as an exponential pattern of discounting is.
A person chooses behaviors for three kinds of incentive, all of which are well known, and the first two of which are well studied: The behavior may be intrinsically rewarding; it may be instrumental in getting other rewards; or it may acquire hedonic importance through endogenous reward. A professional athlete is rewarded by physical sensations including endorphins as she performs her activities, by pay, and by the occasions for endogenous reward provided by events in the play. A fan watching the athlete is rewarded only by endogenous reward, which, however, can reach great intensity as her history of fandom increases the play’s hedonic importance. This importance grows or shrinks by recursive self-prediction. Any of these incentives can lead to choices that reduce long term reward—impulses that sometimes become addictions. When the incentives are perceived as RPDs they are apt to give rise to intertemporal bargains (personal rules), which are also enforced by recursive self-prediction [48].
Subsequent defections may make the impulses worse, sometimes entrenching an addiction. Discriminating the two recursive mental processes—intertemporal bargaining and hedonic importance—by controlled experiment is probably impossible, though it might be parametrically modeled, Turing fashion. My point here is that addictive “habits” are not explained by repetition, but by something more dynamic.
Rewarded repetition itself does not insulate an activity against extinction; in fact at the most elementary level, unpredictability is what does this [63]. I have argued here that Lewis’ “deep trench” is based on two recursive phenomena:
These two factors are apt to be how “synaptic networks are not only self-reinforcing but mutually reinforcing…to form a web that holds addiction in place.” As Lewis points out, addictive choices are still motivated.
This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs or of the US Government.
1. From ventral to dorsal striatum in rats, or the analogous dorsomedial to dorsolateral striatum in humans [3].
2. Even pains and negative emotions must compete for attention by a positive value up front, experienced as an urge [65].
3. The “intrinsic” rewards that roboticists have begun to model are still inborn, “inherently interesting or enjoyable” [66, 67].
4. People sometimes value even recent experiences by some means other than the summation of momentary values found over multiple trials with nonhumans [68]. In a pioneering project to observe directly how people evaluate visceral experiences, Kahneman and his coworkers found that “decision utility” is not the integral of momentary experiences [69]. That is, a subject’s estimate of how painful a just-passed laboratory procedure was is approximately the average of her most extreme and her most recent memories of it. So, for instance, adding a period of lesser discomfort at the end of a colonoscopy leads subjects to rate it less aversive. The subjects seem to have been sampling their component experiences rather than adding them up, and doing so without regard to their durations.
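The contrast in this note, between integrating momentary discomfort and sampling only its worst and final moments, can be sketched with made-up ratings. The per-minute discomfort scores below are hypothetical, and the peak-end score is modeled as the mean of the worst and final moments, the usual statement of Kahneman's peak-end rule (summing the two instead would give the same ranking). Extending the procedure with milder discomfort raises the total but lowers the peak-end score, while the durations themselves play no role in that score.

```python
def total_discomfort(moments: list[float]) -> float:
    """Duration-weighted total: sum of momentary ratings, one per minute."""
    return sum(moments)


def peak_end(moments: list[float]) -> float:
    """Peak-end score: mean of the worst moment and the final moment (duration is ignored)."""
    return (max(moments) + moments[-1]) / 2


# Hypothetical per-minute discomfort ratings (0 = none, 10 = worst).
short_procedure = [2, 4, 6, 8, 8]
extended = short_procedure + [5, 3, 2]  # same procedure plus a milder tail

for name, ratings in (("short", short_procedure), ("extended", extended)):
    print(f"{name:8s} total={total_discomfort(ratings):3.0f}  peak-end={peak_end(ratings):.1f}")
# short:    total 28, peak-end 8.0
# extended: total 38, peak-end 5.0
```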