Updated SPM Command Line, Including Beta Series Option

I've updated the code of the previously discussed previous script in the previous post, previously, including the addition of more help text to explain certain options and termination of the script if the timing file is not found.

However, the biggest change is a block of code to convert the timings files so that each trial is now a regressor, before entering them into the GLM. With so many regressors, the GLM will look a little funky; possibly something like this:

Note that, since there are so many regressors and so many scans, not all of them are going to be labeled individually; but it should look as though there is only one estimation per column, or blip of white underneath each regressor. Also, due to space constraints, the parameter estimability may not be completely visible, but when each trial is being estimated individually, you should still get a beta for each trial. Check the output directory to make sure that you have the number of betas you think you should have: one for each trial, plus the number of session constants (usually equal to the number of runs for the subject).
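If you want to know ahead of time how many betas to expect, a quick tally helps. The following is a minimal Python sketch, not the script itself: the function names, the condition labels, and the timings are all made up for illustration, and it assumes no nuisance regressors (such as motion parameters) are in the model.

```python
from typing import Dict, List, Tuple


def split_into_trial_regressors(
    onsets_by_condition: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Return one (name, onset) pair per trial instead of one entry per condition."""
    regressors = []
    for condition, onsets in onsets_by_condition.items():
        for trial_index, onset in enumerate(onsets, start=1):
            regressors.append((f"{condition}_trial{trial_index:03d}", onset))
    return regressors


def expected_beta_count(runs: List[Dict[str, List[float]]], do_betaseries: bool) -> int:
    """Task regressors across all runs, plus one session constant per run."""
    task_regressors = 0
    for run in runs:
        if do_betaseries:
            task_regressors += len(split_into_trial_regressors(run))
        else:
            task_regressors += len(run)  # one averaged regressor per condition
    return task_regressors + len(runs)


# Hypothetical onsets (in seconds) for two runs with two conditions each.
runs = [
    {"left": [10.0, 42.5, 80.0], "right": [25.0, 61.0]},
    {"left": [12.0, 55.0], "right": [30.0, 70.5, 95.0]},
]
print(expected_beta_count(runs, do_betaseries=True))   # 10 trials + 2 constants
print(expected_beta_count(runs, do_betaseries=False))  # 4 conditions + 2 constants
```

If the number of beta images in your output directory does not match this kind of count, the timing files are the first place to look.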

The updated script can be found here; as always, send me any comments if you encounter any serious issues or bugs. I recommend using it as a replacement for the previously discussed previous script, since it can either estimate each trial individually or average them together, depending on whether you set the DO_BETASERIES flag to 1 or 0.

All of this is explained clearly in the following video:

A Computational Model of Arbitration between Model-Based and Model-Free Learning (Featuring Django Unchained!)

Decision-making has fascinated both neuroscientists and economists for decades; and in particular, what makes this such an intriguing topic isn't when people are making good decisions, but when they are screwing up in major ways. Although making terrible decisions doesn't necessarily bar you from having success - just look at our past six or seven presidents - making fewer of them can sometimes make your life easier, especially when it comes to avoiding decisions that could be bad for you, such as licking a steak knife.

A recent Neuron paper by Lee, Shimojo, and O'Doherty examined how the brain switches between relying on habitual actions to make decisions, versus generating a cognitive model of which decisions might be associated with which outcomes, and making a decision based on your prediction about what should be most optimal, similar to making a decision-tree or flowchart outlining all the different possibilities associated with each action. These decision-making strategies are referred to as model-free and model-based decision systems, respectively; and reliance on only one system, especially in a context where that system might be inappropriate, would lead to inefficiencies and sometimes disastrous consequences, such as asking out your girlfriend's sister. O'Doherty, who seems to churn out high-impact papers with the effortlessness of a Pez dispenser, has been working on these and related problems for a while; and this most recent publication, to me, represents an important step forward in computational modeling and how such decision-making processes are reified in the brain.

Before discussing the paper, let me clarify a couple of important distinctions about the word "errors," particularly since one of the layers of the model discussed in the paper calculates different kinds of error. When computational modelers talk about errors, they can mean several different things. The most common description of an error, however, is some sort of discrepancy between what an organism is trying to do, or what an individual is expecting, and what that organism actually does or actually receives. Errors of commission, which simply means screwing up or making an unintended mistake, have been extensively studied, especially in popular decision-making and reaction-time paradigms such as the Stroop task; but recently other forms of error have been defined, such as reward prediction error, which is the discrepancy between what was expected and what was actually received. The authors contrast this reward prediction error with a related concept called state prediction error, which is the discrepancy between an internal model of the environment and the actual state that someone is in. So, actions that are appropriate or likely to be rewarded in one state may no longer be valid once the state is detected to have shifted or somehow changed.
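For those who prefer to see the two errors as arithmetic, here is a minimal Python sketch. The reward prediction error is the textbook "received minus expected"; the state prediction error is written as one minus the estimated probability of the state you actually ended up in, which is a common formulation in this literature but is my assumption about the details rather than something taken from the paper. The state names, probabilities, and learning rate are invented.

```python
def reward_prediction_error(expected_value: float, reward: float) -> float:
    """Discrepancy between the reward received and the reward expected."""
    return reward - expected_value


def state_prediction_error(transition_probs: dict, observed_state: str) -> float:
    """One minus the estimated probability of the state that was actually reached."""
    return 1.0 - transition_probs.get(observed_state, 0.0)


def update_transition_probs(transition_probs: dict, observed_state: str,
                            learning_rate: float) -> dict:
    """Shift probability toward the observed state; the estimates still sum to 1."""
    spe = state_prediction_error(transition_probs, observed_state)
    updated = {}
    for state, prob in transition_probs.items():
        if state == observed_state:
            updated[state] = prob + learning_rate * spe
        else:
            updated[state] = prob * (1.0 - learning_rate)
    return updated


# Hypothetical internal model: we are fairly sure an action leads to state_A.
beliefs = {"state_A": 0.9, "state_B": 0.1}
print(state_prediction_error(beliefs, "state_B"))        # large: state_B was unexpected
print(update_transition_probs(beliefs, "state_B", 0.5))  # beliefs shift toward state_B
print(reward_prediction_error(expected_value=0.25, reward=1.0))
```

Note that the state prediction error says nothing about reward; it only tracks how surprising the new state is.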

While this may sound like so much jargon and namby-pamby scientific argot, state prediction errors and reward prediction errors are actually all around us, if we have eyes to see. To take one example, near the end of Django Unchained, our protagonist, Django, has killed all of Calvin Candie's henchmen in a final climactic shootout in the CandyLand foyer. Stephen, thinking that Django has spent all six revolver rounds in the shootout - including a particularly sadistic dismemberment of Billy Crash - believes that he still has some options left open for dealing with Django, such as continuing to talk trash. However, when Django reveals that he has a second revolver, Stephen's internal model of his environment needs to update to take this new piece of information into account; actions that would have been plausible under the previous state he believed himself to be in are no longer viable.

A reward prediction error, on the other hand, can be observed in the second half of the scene, where Django lights the dynamite to demolish the CandyLand mansion. After walking some distance away from the manse, Django turns around to look at the explosion; clearly, he predicts the house to explode in an enormous fireball, and also predicts it to occur at a certain time. If the dynamite failed to go off, or if it went off far too early or too late, the result would be a prediction error. This distinction between the binary occurrence/non-occurrence of an event and its temporal aspect has been detailed in a recent computational model of prediction and decision-making behavior by Alexander & Brown (2011), and also illustrates how a movie such as Django Unchained can not only provide wholesome entertainment for the whole family, but also serve as a teaching tool for learning models.

This brings us to the present paper, which attempted to locate where in the brain such an arbitration process is done in order to select a model-based or model-free decision system. A model-free system, as described above, takes less cognitive effort and control, since using habitual or "cached" behaviors to guide decisions is relatively quick and automatic; model-based systems, on the other hand, require more cognitive control and mapping out prospective outcomes associated with each decision, but can be more useful than reflexive behaviors when more reflection is appropriate.

The task required participants to make either a left or right button press, which would make a new icon appear on the screen, and after a few button presses, a coin would appear. However, the coin was only rewarding in certain circumstances; in one condition, or "state," only certain colors of coins would be accepted and turned into rewards, while in the other condition, any type of coin would be rewarding. This was designed to favor either model-free or model-based control in certain situations, and also to compare how an arbitration model would correlate with behavior that either is more flexible under model-based conditions, or more fixed under model-free conditions, using a dynamical threshold to shift behavior from model-based to model-free systems over time. The arbitration model also computes the reliability of the model-based and model-free systems to determine which should be implemented, which is affected by prediction errors on previous trials.

Figure 2 from Lee et al showing how prediction errors are computed and then used to calculate the reliability of either a model-based or model-free system, which in turn affects the probability of implementing either system.
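To make the flow from prediction errors to reliability to system choice a little more concrete, here is a deliberately simplified Python sketch. It is not the authors' model: I am assuming reliability is just a running estimate of one minus the absolute prediction error, and that the probability of using the model-based system is a logistic function of the reliability difference. The paper's own estimators are more involved, and the parameter names and trial values here are hypothetical.

```python
import math


def update_reliability(reliability: float, prediction_error: float,
                       rate: float = 0.2) -> float:
    """Move reliability toward 1 - |PE|; errors are assumed scaled to [-1, 1]."""
    target = 1.0 - min(abs(prediction_error), 1.0)
    return reliability + rate * (target - reliability)


def prob_model_based(rel_mb: float, rel_mf: float, sharpness: float = 5.0) -> float:
    """Logistic arbitration: the more reliable system gets the larger share."""
    return 1.0 / (1.0 + math.exp(-sharpness * (rel_mb - rel_mf)))


rel_mb, rel_mf = 0.5, 0.5
# Hypothetical trials: small state prediction errors, large reward prediction errors.
for spe, rpe in [(0.1, 0.8), (0.05, 0.9), (0.1, 0.7)]:
    rel_mb = update_reliability(rel_mb, spe)  # model-based system looks trustworthy
    rel_mf = update_reliability(rel_mf, rpe)  # model-free system keeps being wrong
p_mb = prob_model_based(rel_mb, rel_mf)
print(rel_mb, rel_mf, p_mb)
```

In this toy run the model-free system keeps getting surprised by rewards, so its reliability drops and control tilts toward the model-based system.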

The authors then regressed the computational signals against the fMRI data, in order to see where such computational signals would load onto observed brain activity during trials requiring either more or less model-based or model-free strategies. The reliability signals from the model-free and model-based systems were found to load on the inferior lateral PFC (ilPFC) and right frontopolar cortex (FPC), suggesting that these two cortical regions might be involved in the arbitration process to decide which system to implement, with the more reliable system being weighted more.
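If "regressing a computational signal against the data" sounds mysterious, the basic recipe is short: put the trial-by-trial model value at each trial onset, convolve with a hemodynamic response, and fit the result to a voxel's timecourse. Below is a bare-bones Python sketch of that idea, assuming a crude gamma-shaped response rather than a real canonical HRF, a fabricated noiseless voxel, and a single regressor; an actual analysis would mean-center the modulator and include the unmodulated task regressor as well.

```python
import math


def simple_hrf(n_points: int = 16, tr: float = 2.0) -> list:
    """Crude single-gamma response peaking near 5 s; a stand-in for a canonical HRF."""
    shape = [(i * tr) ** 5 * math.exp(-(i * tr)) for i in range(n_points)]
    peak = max(shape)
    return [value / peak for value in shape]


def convolve(sticks: list, kernel: list) -> list:
    """Convolve a stick function with a kernel, truncated to the scan length."""
    out = [0.0] * len(sticks)
    for i, height in enumerate(sticks):
        if height == 0.0:
            continue
        for j, weight in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += height * weight
    return out


def ols_slope(x: list, y: list) -> float:
    """Slope of the least-squares line of y on x (with an intercept)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


n_scans = 60
onset_scans = [5, 20, 35, 50]         # hypothetical trial onsets, in scans
reliability = [0.2, 0.8, 0.5, 0.9]    # hypothetical model signal on each trial
sticks = [0.0] * n_scans
for scan, value in zip(onset_scans, reliability):
    sticks[scan] = value
regressor = convolve(sticks, simple_hrf())
voxel = [2.0 * r + 1.0 for r in regressor]  # fabricated voxel that tracks the signal
beta = ols_slope(regressor, voxel)
print(beta)
```

A voxel whose beta for this regressor is reliably nonzero across subjects is one that "loads on" the computational signal.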

Figure 4, ibid, with panel A depicting orthogonal reliability signals for both model-based and model-free systems in bilateral ilPFC. Panel B shows a region of rostral anterior cingulate cortex associated with the difference in reliability between the two systems, and both the ilPFC and right FPC correlated with the highest reliability index for a particular trial for whichever system was implemented during that trial.

Next, a psychophysiological interaction (PPI) analysis was conducted to see whether signals in specific cortical or subcortical regions modulated the activity of model-free or model-based signals, which revealed that when the probability of a model-free state was high, there was a corresponding negative correlation between both the ilPFC and right FPC and regions of the putamen also observed to encode model-free signals; significantly, no effects were found for the reverse condition when the probability of model-based activity was high, suggesting that the arbitrator functions primarily by affecting the model-free system.
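At its core, a PPI model adds one extra regressor: the seed region's timecourse multiplied by the psychological variable. Here is a minimal Python sketch of building that interaction term, under the simplifying assumption that both variables are simply mean-centered and multiplied scan by scan; SPM's PPI routine also deconvolves the seed signal first, which this sketch skips. The numbers are invented.

```python
def ppi_regressors(seed_timecourse: list, psych_variable: list):
    """Return centered seed, centered psychological variable, and their product."""
    mean_seed = sum(seed_timecourse) / len(seed_timecourse)
    mean_psych = sum(psych_variable) / len(psych_variable)
    seed_centered = [s - mean_seed for s in seed_timecourse]
    psych_centered = [p - mean_psych for p in psych_variable]
    interaction = [s * p for s, p in zip(seed_centered, psych_centered)]
    return seed_centered, psych_centered, interaction


# Hypothetical seed signal and a 0/1 indicator for "model-free control is likely".
seed = [1.0, 2.0, 3.0, 4.0]
model_free_likely = [0.0, 0.0, 1.0, 1.0]
seed_c, psych_c, interaction = ppi_regressors(seed, model_free_likely)
print(interaction)
```

All three columns go into the GLM for the target region; a negative weight on the interaction column is what a "negative correlation when the probability of a model-free state was high" looks like in practice.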

In total, these results suggest that reliability signals for different decision systems are modulated by activity in the frontocortical regions, and that signals for the model-based and model-free systems themselves are encoded by several different cortical regions, including the orbital PFC for model-based system activity, and supplementary motor area and dorsolateral PFC for model-free activity. In addition, the ventromedial PFC appears to encode a weighted signal of both model-based and model-free signals, tying together how subcortical and value-computing structures may influence the decision to either implement a model-based or model-free system, incorporating reliability information from frontopolar regions about which system should be used. Which, on the face of it, can be particularly useful when dealing with revolver-wielding, dynamite-planting psychopaths.

Link to paper

Establishing Causality Between Prediction Errors and Learning

You've just submitted a big grant, and you anxiously await the verdict on your proposal, which is due any day now. Finally, you get an email with the results of your proposal. Sweat drips from your brow and onto your hands and onto your pantlegs and soaks through your clothing until you look like some alien creature excavated from a marsh. You read the first line - and then read it again. You can't believe what you just saw - you got the grant!

Never in a million years did you think this would happen. The proposal was crap, you thought; and everyone else you sent it to for review thought it was crap, too. You can just imagine their faces now as they are barely able to restrain their choked-back venom while they congratulate you on getting the big grant while they have to go another year without funding and force their graduate students to work part-time at Kilroy's for the summer and get hit on by sleazy patrons with slicked-back ponytails and names like Tony and Butch and save money by moving into that rundown, cockroaches-on-your-miniwheats-infested, two-bedroom apartment downtown with five roommates and sewage backup problems on the regular.

This scenario illustrates a key component of reinforcement learning known as prediction error: Organisms tend to associate outcomes with particular actions - sometimes randomly, at first - and over time come to form a cause-effect relationship between actions and results. Computational modeling and neuroimaging have implicated dopamine (DA) as a critical neurotransmitter responsible for making these associations, as shown in a landmark study by Schultz and colleagues back in 1997. When you have no prediction about what is going to happen, but a reward - or punishment - appears out of the blue, DA tracks this occurrence by increasing firing, usually originating from clusters of DA neurons in midbrain areas such as the ventral tegmental area (VTA). Over time, these outcomes can become associated with particular stimuli or particular actions, and DA firing drifts to the onset of the stimulus or action. Other types of predictions and violations you may be familiar with include certain forms of humor, items failing to drop from the vending machine, and the Houdini.

Figure 1 reproduced from Schultz et al. (1997). Note that when a reward is predicted but no reward occurs, DA firing drops precipitously.
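The drift of the error signal from reward to cue, and the dip when an expected reward is withheld, both fall out of a few lines of temporal-difference learning. The following Python sketch is a toy version under my own assumptions: one value per time step after cue onset, no discounting, a cue that arrives unpredictably, and made-up parameter values. It is meant as an illustration, not as the model used in the original study.

```python
def run_td_trials(n_trials: int = 200, n_steps: int = 5, alpha: float = 0.2,
                  reward: float = 1.0, omit_last: bool = False) -> list:
    """Return, per trial, the errors at cue onset and at each step up to reward time."""
    values = [0.0] * n_steps  # values[i]: predicted future reward i steps after the cue
    history = []
    for trial in range(n_trials):
        deliver = not (omit_last and trial == n_trials - 1)
        # The cue itself is unpredicted, so the value just before it is zero.
        errors = [values[0] - 0.0]
        for step in range(n_steps):
            is_last = step == n_steps - 1
            r = reward if (is_last and deliver) else 0.0
            next_value = 0.0 if is_last else values[step + 1]
            delta = r + next_value - values[step]
            values[step] += alpha * delta
            errors.append(delta)
        history.append(errors)
    return history


learned = run_td_trials()
print(learned[0][0], learned[0][-1])    # first trial: error at cue, error at reward
print(learned[-1][0], learned[-1][-1])  # last trial: error at cue, error at reward

omitted = run_td_trials(omit_last=True)
print(omitted[-1][-1])                  # error when the expected reward never arrives
```

Early on, the error sits at the time of reward; after training, it sits at the cue and vanishes at the reward, and withholding the reward produces a negative error at the moment it should have appeared.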

In spite of a large body of empirical results, most reinforcement learning experiments have difficulty establishing a causal link between DA firing and the learning process, often due to relatively poor temporal resolution. However, a recent study in Nature Neuroscience by Steinberg et al (2013) used a form of neuronal activation known as optogenetics to stimulate neurons with pulses of light during critical time periods of learning. One aspect of learning, known as blocking, presented an opportunity to use the superior temporal resolution of optogenetics to test the role of DA in reinforcement learning.

To illustrate the concept of blocking, imagine that you are a rat. Life isn't terribly interesting, but you get to run around inside a box, run on a wheel, and push a lever to get pellets. One day you hear a tone, and a pellet falls down a nearby chute; and it turns out to be the juiciest, moistest, tastiest pellet you've ever had in your life since you were born about seven weeks ago. The same thing happens again and again, with the same tone and the same super-pellet delivered into your cage. Then, at some point, right after you hear the tone you begin to see light flashed into your cage. The pellet is still delivered; all that has changed is now you have a tone and a light, instead of just the tone. At this point, you begin to get all hot and excited whenever you hear the tone; however, the light isn't really doing it for you, and about the light you couldn't really care less. Your learning toward the light has been blocked; everything is present to learn an association between the light and the uber-pellet, but since you've already been highly trained on the association between the tone and the pellet, the light doesn't add any predictive power to the situation.
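Blocking is the classic showcase for the Rescorla-Wagner learning rule, in which every cue present on a trial shares a single prediction error. Here is a short Python sketch of the rat's predicament using that rule; the learning rate and trial counts are arbitrary choices of mine.

```python
def rescorla_wagner(trials, alpha: float = 0.3, reward_magnitude: float = 1.0) -> dict:
    """Update associative strengths; all cues on a trial share one prediction error."""
    strengths = {}
    for cues, rewarded in trials:
        prediction = sum(strengths.get(cue, 0.0) for cue in cues)
        outcome = reward_magnitude if rewarded else 0.0
        error = outcome - prediction
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + alpha * error
    return strengths


# Blocking group: tone alone first, then tone and light together.
blocking = [(("tone",), True)] * 50 + [(("tone", "light"), True)] * 50
# Control group: tone and light together from the start.
control = [(("tone", "light"), True)] * 50

blocked = rescorla_wagner(blocking)
unblocked = rescorla_wagner(control)
print(blocked["light"], unblocked["light"])
```

Because the tone already predicts the pellet perfectly, there is almost no error left over for the light to learn from; in the control group, the light picks up about half of the association.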

What Steinberg and colleagues did was to optogenetically stimulate DA neurons whenever rats were presented with the blocked stimulus; in the example above, the light stimulus. This induced a prediction error that was then associated with the blocked object - and rats later presented with the blocked object exhibited similar learning behavior to that stimulus as they did to the previously trained cue - in the example above, the tone stimulus - lending direct support to the theory that DA serves as a prediction error signal, rather than a salience or surprise signal. Follow-up experiments showed that optogenetic stimulation of DA neurons could also interfere with the extinction process, when stimuli are no longer associated with a reward, but still manipulated to precede a prediction error. Taken together, these results are a solid contribution to reinforcement learning theory, and have prompted the FDA to recommend more dopamine as part of a healthy diet.
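One way to think about the stimulation result is as an extra, artificial term added to the prediction error on compound trials. The Python sketch below makes that assumption explicit within a simple shared-error learning rule; the size of the stimulation term and the other parameters are invented, and the sketch says nothing about the actual magnitudes in the study.

```python
def light_strength_after_compound(stimulation: float = 0.0, alpha: float = 0.3,
                                  n_pretrain: int = 50, n_compound: int = 20) -> float:
    """Associative strength of the light after tone pretraining and compound trials."""
    tone = light = 0.0
    for _ in range(n_pretrain):
        tone += alpha * (1.0 - tone)  # tone alone comes to predict the pellet
    for _ in range(n_compound):
        # Assumed: stimulation acts as an additive bump to the shared error.
        error = 1.0 - (tone + light) + stimulation
        tone += alpha * error
        light += alpha * error
    return light


print(light_strength_after_compound(stimulation=0.0))  # blocked: essentially no learning
print(light_strength_after_compound(stimulation=0.5))  # artificial error: light gains strength
```

With no stimulation the light stays blocked; with an injected error it acquires strength even though the pellet was fully predicted, which is the qualitative pattern the experiment reports.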

And now, what you've all been waiting for - a gunfight scene from Django Unchained.