fMRI Power Analysis with NeuroPower

One of my biggest peeves is complaints about how power analyses are too hard. I often hear things like "I don't have the time," or "I don't know how to use Matlab," or "I'm being held captive in the MRI control room by a deranged physicist who thinks he is taking orders from the Quench button."

Well, Mr. Whiny-Pants, it's time to stop making excuses - a new tool called NeuroPower lets you do power analyses quickly and easily, right from your web browser. The steps are simple: Upload a result from your pilot study, enter a few parameters - sample size, correction threshold, credit card number - and, if you listen closely, you can hear the electricity of the Internet go booyakasha as it finishes your power analysis. Also, if a few days later you notice some odd charges on your credit card statement, I know nothing about that.

The following video will help you use NeuroPower and will answer all of your questions about power analysis, including:

  • What is a power analysis?
  • Why should I do a power analysis?
  • Why shouldn't I do a power analysis on a full dataset I already collected?
  • How much money did you spend at Home Depot to set up the lighting for your videos?
  • What's up with the ducks? "Quack cocaine"? Seriously?

All this, and more, now in 1080p. Click the full screen button for the full report.

The Noose Tightens: Scientific Standards Being Raised

For those of you hoping to fly under the radar of reviewers and get your questionable studies published, I suggest that you do so with a quickness. A new editorial in Nature Neuroscience outlines the journal's updated criteria for methods reporting, which removes the limit on the methods section of papers, mandates reporting the data used to create figures, and requires statements on randomization and blinding. In addition, the editorial board takes a swipe at the current level of statistical proficiency in biology, asserting that

Too many biologists [and neuroscientists] still do not receive adequate training in statistics and other quantitative aspects of their subject. Mentoring of young scientists on matters of rigor and transparency is inconsistent at best. In academia, the ever-increasing pressures to publish and obtain the next level of funding provide little incentive to pursue and publish studies that contradict or confirm previously published results. Those who would put effort into documenting the validity or irreproducibility of a published piece of work have little prospect of seeing their efforts valued by journals and funders; meanwhile, funding and efforts are wasted on false assumptions.

What the editors are trying to say, I think, is that a significant number of academics, and particularly graduate students, are the most lazy, benighted, pernicious race of little odious vermin that nature ever suffered to crawl upon the surface of the earth; to which I might add: This is quite true, but although we may be shiftless, entitled, disgusting vermin, it is more accurate to say that we are shiftless, entitled, disgusting vermin who simply do not know where to start. While many of us learn the basics of statistics sometime during college, much is not retained, and remedial graduate courses do little to prepare one for understanding the details and nuances of experimental design that can influence the choice of statistics that one uses. One may argue that the onus is on the individual to teach himself what he needs to know in order to understand the papers that he reads, and to become informed enough to design and write up a study at the level of the journal for which he aims; however, this implies an unrealistic expectation of self-reliance and tenacity for today's average graduate student. Clearly, blame must be assigned: The statisticians have failed us.

Another disturbing trend in the literature is a recent rash of papers encouraging studies to include more subjects, to aid both statistical reliability and experimental reproducibility. Two articles in the last issue of NeuroImage - one by Michael Ingre, one by Lindquist et al. - as well as a recent Nature Neuroscience article by Button et al., take Karl Friston's 2012 Ten Ironic Rules article out to the woodshed, claiming that small sample sizes are more susceptible to false positives, and that instead larger samples should be recruited and effect sizes reported. More to the point, the underpowered studies that do get published tend to report only inordinately large effects, as null effects simply go unreported.

All of this is quite unnerving to the small-sample researcher, and I advise him to crank out as many of his underpowered studies as he can before larger sample sizes become the new normal, and one of the checklist criteria for any high-impact journal. For any new experiments, of course, recruit large sample sizes, and when reviewing, punish those who use smaller sample sizes, using the reasons outlined above; for then you will have still published your earlier results, but manage to remain on the right side of history. To some, this may smack of Tartufferie; I merely advise you to act in your best interests.

The Will to (FMRI) Power

Power is like the sun: Everybody wants it, everybody finds in it a pleasant burning sensation, and yet nobody really knows what it is or how to get more of it. In my introductory statistics courses, this was the one concept - in addition to a few other small things, like standard error, effect size, sampling distributions, t-tests, and ANOVAs - that I completely failed to comprehend. Back then, I spoke as a child, I understood like a child, I reasoned like a child; but then I grew older, and I put away childish things, and resolved to learn once and for all what power really was.

1. What is Power?

The textbook definition of statistical power is the probability of rejecting the null hypothesis when it is, in fact, false; and everyone has a vague sense that, as the number of subjects increases, power increases as well. But why is this so?

To illustrate the concept of power, consider two partially overlapping distributions, shown in blue and red:

The blue distribution is the null distribution, stating that there is no effect, or difference; the red distribution, on the other hand, represents the alternative hypothesis that there is some effect or difference. The red dashed line marks the cutoff for our rejection region, beyond which we would reject the null hypothesis; and we can see that the more density of the alternative distribution that lies beyond this cutoff, the greater probability we have of randomly drawing a sample that leads to a rejection of the null hypothesis, and therefore the greater our statistical power.

However, the sticking point is this: How do we determine where to place our alternative distribution? Potentially, it could be anywhere. So how do we decide where to put it?

One approach is to make an educated guess; there is nothing wrong with this, provided that the guess is solidly grounded in theory, and it may be appropriate if you do not have the time or resources to run an adequate pilot sample for a power calculation. Another approach is to estimate the mean of the alternative distribution from the results of other studies; but, assuming that those results were significant, they are more likely to have been sampled from the upper tail of the alternative distribution, and are therefore likely to overestimate its true mean.
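To see why significant results from other studies tend to overshoot, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true effect of 0.3, the 15 subjects per study, unit variance, a one-sided z-test, and the function name simulate_published_effects, which is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_published_effects(true_effect=0.3, n=15, alpha=0.05,
                               n_studies=20000, seed=0):
    """Simulate many small studies and compare all estimates with the significant ones.

    Assumes unit variance, so each study's estimate is drawn from a normal
    distribution centered on the true effect with standard error 1/sqrt(n).
    """
    rng = random.Random(seed)
    se = 1 / sqrt(n)
    # Smallest estimate that a one-sided z-test would call significant.
    cutoff = NormalDist().inv_cdf(1 - alpha) * se
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    significant = [e for e in estimates if e > cutoff]
    return mean(estimates), mean(significant)

all_mean, sig_mean = simulate_published_effects()
print(f"mean of all estimates:         {all_mean:.2f}")
print(f"mean of significant estimates: {sig_mean:.2f}")
```

Under these assumptions the average over all simulated studies sits near the true effect, while the average over only the significant ones has to exceed the cutoff, which is itself larger than the true effect. That is the upper-tail problem in miniature.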

A third approach is to estimate the mean of the alternative distribution based on a sample - which is the logic behind doing a pilot study. This is often the best estimate we can make of the alternative distribution, and, provided that you have the time and resources to carry out such a pilot study, is the best option for estimating power.

Once the mean of the alternative distribution has been established, the next step is to determine how power can be affected by changing the sample size. Recall that the standard error, or standard deviation of your sampling distribution of means, is inversely related to the square root of the number of subjects in your sample; and, critically, that the standard error is assumed to be the same for both the null distribution and the alternative distribution. Thus, increasing the sample size leads to a reduction in the spread of both distributions, which in turn leads to less overlap between the two distributions and again increases power.

Result of increasing the sample size from 4 to 10. Note that there is now less overlap between the distributions, and that more of the alternative distribution now lies to the right of the cutoff threshold, increasing power.
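The same idea can be sketched in a few lines of Python. This is a simplified illustration rather than the calculation behind the figure: it assumes a one-sided z-test with known variance, a standardized effect size of 0.8 picked arbitrarily, and a hypothetical helper named power_one_sided.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(effect_size, n, alpha=0.05):
    """Approximate power of a one-sided z-test.

    effect_size is the assumed mean of the alternative distribution in
    standard-deviation units; n is the number of subjects.
    """
    std_normal = NormalDist()
    # Cutoff threshold, expressed in standard-error units.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Mean of the alternative distribution in standard-error units; it moves
    # away from the null as n grows because the standard error shrinks.
    shift = effect_size * sqrt(n)
    # Area of the alternative distribution to the right of the cutoff.
    return 1 - std_normal.cdf(z_crit - shift)

for n in (4, 10):
    print(f"n = {n:2d}: power = {power_one_sided(0.8, n):.2f}")
```

With these assumed numbers, going from 4 to 10 subjects takes power from a little under one half to roughly 0.8, purely because the standard error shrinks with the square root of the sample size.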

2. Power applied to FMRI

This becomes an even trickier issue when dealing with neuroimaging data, where gathering a large number of pilot subjects can be prohibitively expensive, and the funding of grants depends on reasonable estimates from a power analysis.

Fortunately, a tool called fmripower allows the researcher to calculate power estimates for a range of potential future subjects, given a small pilot sample. The interface is clean, straightforward, and easy to use, and the results are useful not only for grant purposes, but also for a sanity check of whether your effect will have enough power to warrant going through with a full research study. If achieving power of about 80% requires seventy or eighty subjects, you may want to rethink your experiment, and possibly collect another pilot sample that includes more trials of interest or a more powerful design.

A few caveats about using fmripower:

  1. This tool should not be used for post-hoc power analyses; that is, calculating the power associated with a sample or full dataset that you already collected. This type of analysis is uninformative (since we cannot say with any certainty whether our result came from the null distribution or a specific alternative distribution), and can be misleading (see Hoenig & Heisey, 2001).
  2. fmripower uses a default atlas when calculating power estimates, which parcellates cortical and subcortical regions into dozens of smaller regions of interest (ROIs). While this is useful for visualization and learning purposes, it is not recommended to use every single ROI; unless, of course, you correct for the number of ROIs used by applying a method such as Bonferroni correction (e.g., dividing your Type I error rate by the number of ROIs used).
  3. When selecting an ROI, make sure that it is independent (cf. Kriegeskorte, 2009). This means choosing an ROI based on either anatomical landmarks or atlases, or from an independent contrast (i.e., a contrast that does not share any variance or correlate with your contrast of interest). Basing your ROI on your pilot study's contrast of interest - that is, the same contrast that you will examine in your full study - will bias your power estimate, since any effect leading to significant activation in a small pilot sample will necessarily be very large.
  4. For your final study, do not include your pilot sample, as this can lead to an inflated Type I error rate (Mumford, 2012). A pilot sample should be used for power estimates only; it should not be included in the final analysis.
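As a rough illustration of the Bonferroni correction mentioned in the second caveat, the sketch below divides the Type I error rate by the number of ROIs and shows how the cutoff gets stricter. The ROI counts are made up, the helper bonferroni_alpha is hypothetical, and none of this is fmripower's own code.

```python
from statistics import NormalDist

def bonferroni_alpha(alpha, n_rois):
    """Per-ROI Type I error rate after Bonferroni correction."""
    if n_rois < 1:
        raise ValueError("n_rois must be at least 1")
    return alpha / n_rois

# Hypothetical ROI counts, chosen only for illustration.
for n_rois in (1, 10, 48):
    corrected = bonferroni_alpha(0.05, n_rois)
    # One-sided critical z value implied by the corrected threshold.
    z_crit = NormalDist().inv_cdf(1 - corrected)
    print(f"{n_rois:2d} ROIs: alpha = {corrected:.5f}, critical z = {z_crit:.2f}")
```

The more ROIs you test, the further out the cutoff moves, and the less of the alternative distribution lies beyond it. This is another reason to pick a small number of independent ROIs in advance.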

Once you've experimented around with fmripower and gotten used to its interface, either with SPM or FSL, simply load up your group-level analysis (either FSL's cope.feat directory and corresponding cope.nii.gz file for the contrast of interest, or SPM's .mat file containing the contrasts in that second-level analysis), choose an unbiased ROI, your Type I error rate, and whether you want to estimate power for a specific subject number or across a range of subjects. I find the range much more useful, as it gives you an idea of the sweet spot between the number of subjects and power estimate.
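To get a feel for what a power estimate across a range of subjects looks like, here is a toy sketch. It is not how fmripower computes anything: it assumes a one-sided z-test with known variance, a standardized effect size of 0.5 standing in for whatever your pilot ROI gives you, and two hypothetical helpers, power_curve and smallest_n_for_power.

```python
from math import sqrt
from statistics import NormalDist

def power_curve(effect_size, subject_range, alpha=0.05):
    """Map each candidate sample size to its approximate one-sided power."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    return {
        n: 1 - std_normal.cdf(z_crit - effect_size * sqrt(n))
        for n in subject_range
    }

def smallest_n_for_power(curve, target=0.8):
    """Return the smallest sample size in the curve that reaches the target power."""
    for n in sorted(curve):
        if curve[n] >= target:
            return n
    return None  # no sample size in the range reaches the target

curve = power_curve(effect_size=0.5, subject_range=range(10, 81, 10))
for n, power in curve.items():
    print(f"n = {n:2d}: power = {power:.2f}")
print("smallest n reaching 80% power:", smallest_n_for_power(curve))
```

Scanning the printed curve is the quickest way to spot the sweet spot: the point after which adding more subjects buys you very little extra power.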

Thanks to Jeanette Mumford, whose obsession with power is an inspiration to us all.

Mine's Bigger: Connectivity Analysis Uses Sample Size of 439 Subjects

The next time you look at your dwindling scanning budget and realize you need to start coming in at 9:00pm on Fridays to pay a reduced scanning rate, just remember that there are other researchers out there who scan hundreds of subjects for a single study. (This isn't supposed to make you feel better; it's just a fact.)

A recent connectivity analysis by Dennis and colleagues recruited four hundred and thirty-nine subjects for a cross-sectional study to determine changes in connectivity from the ages of twelve to thirty. Overall, older participants showed decreasing long-range connectivity between regions, increased modularity (a measure of subdivision within regions), and hemispheric differences in global efficiency, consistent with developmental theories that short-range connections are pruned during adolescence while long-range connections are strengthened.

However, the observed hemispheric differences in global efficiency contrasted with previous findings:

Our results are contrary to those of Iturria-Medina et al. (2011), who found greater global efficiency in the right hemisphere, but these were from a relatively small sample of 11 subjects, and our sample is over 40 times larger. [Emphasis added.]

Boom. Over forty times larger, son. Sit the hell down. "But forty times eleven is four hundred forty." Yeah, well, the jury's still out on mathematics. "But we ran a well-controlled study!" WHAT PART OF FORTY TIMES LARGER DON'T YOU UNDERSTAND?!

Proof that size equals power can be found here.