Using R to Do Your Statistics and Crush Your Enemies (Maybe)

Over the course of my checkered career as a graduate student drudge, one of the best resources I have found for learning R, and, more importantly, actually getting it to do useful stuff, is the R guide from the Personality Project over at Northwestern University. I encourage anyone interested in R to check it out, especially since my own experience with R got off to a rocky start; my introductory graduate course in statistics used R, but the instruction was so spotty and the concepts so difficult to understand that one day, instead of calculating a simple t-test like I wanted to, I accidentally ended up bypassing the Pentagon's firewall and starting a countdown for a nuclear warhead to be launched at Zimbabwe, which was stopped remotely at the last second by Edward Snowden.

The point is that R is a powerful language and that, once you become even partially familiar with it, you will be able to carry out basic statistical tests quickly and easily. One of the most instructive sections of the website, for me, is the one on ANOVAs, since I often use this to compare beta weights extracted across different regions of interest and test for double dissociations. Other sections give advice on how to restructure your data to be analyzed in different ways by R, linear regression, and multivariate statistics.

P.S. Some of the examples require links to datasets on the R project website which may no longer be properly linked (e.g., the ANOVA examples use commands like [datafilename = ""], but give errors when attempting to read them into a table). I've converted some of them to my personal website, which should make them able to fit into tables without any errors. So, for example, you would use a command like [datafilename="""], and so on for the other datasets.

P.P.S. I was planning to make a short video touring the personality project website and a few of the examples, but I've caught a cold recently, and right now my voice sounds mucusy and gravelly and full of sputum. While it may be pleasing for the ladies to hear my voice like this, it isn't as useful for instructional purposes; and really, that's what I'm all about.

Stats Videos

Over the past semester, I have been using online tutorials as a supplement for my Introductory Statistics course. Partly to share my knowledge with the rest of the world, partly to deflect any questions from students by asking "Well, did you watch the video yet?" While there is the possibility that such easy access to statistical concepts may reinforce truancy, absenteeism, and sloth, as well as fostering society's increasingly pathological dependence on technology, I try to be a bit more sanguine, thinking that it will serve as a useful tool for learning and memorization.

One reason behind making these is to make sure that I actually understand the concepts; another reason is to provide a reusable source of statistics material for students who do not fully grasp everything in one class sitting (and who does?); and yet another reason is to show off my wardrobe. (Hint: I rotate about two, maybe three shirts. You can see where my priorities are.)

Hot, Hot, HOT: Blog Dedicated to MVPA

In my whole world-wide wandering, I have concluded that nobody really knows what multivoxel pattern analysis (MVPA) is. For example, a couple of days ago I asked the checkout cashier at Target what MVPA was; he looked at me like some loutish knight beriddled by a troll.

I had nearly given up trying to understand what it was, until I stumbled upon this blog dedicated to vivisecting MVPA so that you can peer into the inner workings of this gruesome monstrosity. After having read this blog for the better part of an afternoon, I have concluded that it is able to make your wildest dreams come true and can answer any questions you could possibly have about MVPA, including discussions on the effect of different experiment parameters on classification accuracy, what a searchlight algorithm is, and what MVPA detects, exactly. It also includes, as far as I can tell, the only convincing argument in favor of allowing people to keep midgets as pets.

The author is a far more dedicated instructor than I am, with large swaths of R code accompanying most of the tutorials so that you can get a better feel for what's going on. I give this baby a 10/10.

SPM Official (!) Videos

Now in video form!

I don't know how I missed this, but apparently there are official SPM videos up on the SPM website (if you can believe it) - very similar to what I have been producing the past few months. It still eludes me how they managed to steal my idea over a year before I implemented it, but there you go. I haven't actually watched the casts; more like, I've skipped around to a few points in each, with the sound off, because I'm considerate and I don't want to disturb my labmates. (They have also threatened to beat me up and give me a swirly if I unmuted the volume on my computer.)

In any case, although conspicuously lacking the raw sex appeal of my tutorials, these guys still seem to do a good job in explaining the software and the concepts behind it, even if they do tend to speak at times with an accent. "If the sound is off, how do you know they speak with an accent?" It's called being cultured. (Turns up Enya; begins getting pummeled by labmates.)

A link to the videos can be found here; not that I'm insecure or anything, but please don't allow them to replace me.

Stats Videos (Why do you divide samples by n-1?)

Because FMRI analysis requires a strong statistical background, I've added a couple of videos going over the basics of statistical inference, and I use both R and Excel to show the output of certain procedures. In this demo, I go over why the sum of squares of a sample is divided by n-1, a concept not covered in many statistics textbooks, but an important topic for understanding both statistical inference and where degrees of freedom come from. This isn't a rigorous proof, just a demonstration of why dividing by n-1 gives an unbiased estimate of the population variance.
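For those who want the algebra next to the demonstration, here is a short sketch. It assumes n independent draws x_1, ..., x_n from a population with mean \mu and variance \sigma^2; those symbols are my own shorthand here and do not come from the video.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \sigma^2
\end{align*}
```

The second line uses the fact that the sample mean of n independent draws has variance \sigma^2/n. Measuring deviations from the sample mean rather than the true mean uses up one degree of freedom, which is why dividing by n would come out too small on average.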

SPM Tutorial 1: The Basics

As the majority of neuroimagers use SPM for data analysis, I have begun another set of introductory tutorials geared toward the beginning SPM user. SPM is what is used in my lab (despite my plugging for AFNI on a regular basis), and while I believe it has its shortcomings - just like any other data analysis package - it has several benefits as well.

Most important, it is run through the Matlab interface. While this may be seen as a hindrance - especially since Matlab is commercial software, thereby making the price of Matlab the price of SPM - I believe that several advantages derive from using Matlab. First, Matlab is an important piece of software that not only serves as the workhorse for SPM, but also allows more complex and sophisticated data analysis, whether that data has been processed in AFNI, FSL, or any other package. Second, while SPM can be used solely through the graphical interface for most purposes, Matlab allows the user to run automated processes from the command line; and a better grasp of the Matlab syntax will make one a better programmer, in addition to strengthening one's intuition about the link between what is output to the Matlab terminal and what is being processed within SPM. Last, Matlab's use of matrices provides a clearer link between the raw FMRI data and the operations performed on that data. While Unix can simulate matrices through complex array computations (at least, I think - I've never tried it), the matrices output into Matlab are easier to comprehend and manipulate.

Because of this last point, I believe that SPM has a distinct advantage over the other packages. However, its benefits will disclose themselves only to the diligent, inquiring user who desires to harness the power of Matlab to augment and enhance their data analysis, rather than merely leaving it as a background process to be taken care of by the facade of SPM's graphical user interface. The only drawback is that, for the neuroimager who has been using a bash shell his entire life, learning a new programming environment can be daunting, irritating, and, in some cases, life-threatening. However, there is no need to fear - for those with extensive programming experience, even within one programming language, there are several crossovers into Matlab; and even for programming novitiates, I believe that Matlab can provide a safe, warm, womb-like environment to guide you over the initial hurdles of programming.

By beginning another series on a third software package, one may well ask whether there will ever be any sort of order imposed on this scattering of walkthroughs and tutorials. I admit that I make them more or less as they come to me, often in thrall of a laudanum-steeped vision; and that it does, in fact, feel as though I am merely binding together several of my most unruly children under one roof. Over the next few months I intend to create a stricter hierarchy for what tutorials should follow which, and I intend to create more ordered playlists that users can click through; but for now, it is an information dump. Partly to help others, yet more often to help me, as I remember material much better if I teach it. But are not the most rewarding acts those which satisfy the needs of all involved?

AFNI Tutorial: to3d

In the beginning, a young man is placed upon the scanning table as if in sacrifice. He is afraid; there are loud noises; he performs endless repetitions of a task incomprehensible. He thinks only of the coercively high amount of money he is promised in exchange for an hour of meaningless existence.

The scanner sits in silent judgment and marks off the time. The sap of life rushes to the brain, the gradients flip with terrible precision, and all is seen and all is recorded.

Such is the prologue for data collection. Sent straight into the logs of the server: Every slice, every volume, every run. All this should be marked well, as these native elements shall evolve into something far greater.

You will require three ingredients for converting raw scanner data into a basic AFNI dataset. First, the number of slices: Each volume comprises several slices, each of which measures a separate plane. Second, the number of volumes: Each run of data comprises several volumes, each of which measures a separate timepoint. Third, the repetition time: Each volume is acquired after a certain amount of time has elapsed.

Once you have assembled your materials, use to3d to convert the raw data into a BRIK/HEAD dataset. A sample command:
to3d -prefix r01 -time:zt 50 206 3000 alt+z *000006_*.dcm
This command means: "AFNI, I implore you: Label my output dataset r01; there are 50 slices per volume, 206 volumes per run, and each volume is acquired every 3000 milliseconds; slices are acquired interleaved in the z-direction; and harvest all volumes which contain the pattern 000006_ and end in dcm. Alert me when the evolution is complete."
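As a rough sketch of how the three ingredients slot into the command, the script below keeps them in variables and only prints the to3d line instead of running it. The variable names are my own, and the values are the ones from the sample command; substitute your own scanner parameters.

```shell
#!/bin/sh
# Acquisition parameters taken from the sample command above.
slices=50      # slices per volume
volumes=206    # volumes (timepoints) per run
tr_ms=3000     # repetition time in milliseconds
prefix=r01     # label for the output dataset

# Total run length in seconds: number of volumes times the TR.
run_seconds=$(( volumes * tr_ms / 1000 ))

# Assemble the command as text. Inside double quotes the shell does not
# expand the * wildcards, so the file pattern is kept literally.
cmd="to3d -prefix $prefix -time:zt $slices $volumes $tr_ms alt+z *000006_*.dcm"

echo "Run length: $run_seconds seconds"
echo "$cmd"
```

Printing the command first is a cheap sanity check: if the run length does not match what you remember sitting through in the scanner room, one of the three numbers is probably wrong.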

More details and an interactive example can be found in the following video.


Yesterday I was surprised to find AFNI message boards linking to my first blog post about AFNI. I felt as though the klieg lights had suddenly been turned on me, and that hordes of AFNI nerdlings would soon be funneled into this cramped corner of cyberspace. If you count yourself among their number, then welcome; I hope you enjoy this blog and find it useful.

However, there are a few disclaimers I should state up front:

  1. I do not work for AFNI; I am merely an enthusiastic amateur. If you post any questions either on this blog or on Youtube I will be more than willing to answer them. However, if it is something over my head that I can't answer, then I will suggest that you try the official AFNI message board - it is policed 24/7 by the AFNI overlords, and they will hunt down and answer your questions with terrifying quickness.
  2. I am by no means an AFNI or fMRI expert; as far as you're concerned, I could be an SPM saboteur attempting to lead you astray. When I write about something, you should do your own research and come to your own conclusions. That being said, when I do post about certain topics I try to stick to what I know and to come clean about what I don't know. I hope you can appreciate that, being a guy, this is difficult for me.
  3. This blog is not just about AFNI and fMRI; it is about my brain - it is about life itself. I reserve the right to post about running, music, Nutella, Nutella accessories (including Graham-cracker spoons), books, relationship advice, and other interests. If you have a request about a certain topic, then I will be happy to consider it; however, do not expect this blog to be constrained to any one topic. Like me, it is broad. It sprawls. If you come desiring one thing and one thing only, you will be sorely disappointed; then shall you be cast into outer darkness, and there will be a wailing and gnashing of teeth.

My goal is to identify, target, and remove needless obstacles to understanding. As I have said before, the tutorials are targeted at beginners - though eventually we may work our way up to more sophisticated topics - and I try to present the essential details as clearly as possible. As you may have noticed at some point during your career, there are an elite few who have never had any trouble understanding fMRI analysis; they are disgusting people and should be avoided. For the rest of us, we may require additional tools to help with the basics; and I hope that the tutorials can help with that.

Good luck!

Unix for Neuroimagers: Shells and Variables

First, a few updates:

1) We just finished our first week of the semester here, and although things haven't been too busy, it may be a couple of weeks before I get back on a steady updating schedule. I'll do what I can to keep dropping that fatty knowledge on the regular, and educating your pale, soy-latte-white, Famous Dave's BBQ-stained faces on how to stay trill on that data and stack that cheddah to the ceiling like it's your job. And if you got one of those blogs dedicated to how you and your virgin-ass Rockband-playing frat brothers with names like Brady and Troy and Jason eating those cucumber salad sandwiches or whatever and you drop a link to this site, I'll know it. You show me that love, and I show it right the hell back.

2) While you're here, how about you donate a piece of that stack to the American Cancer Society. I mean, damn; I'm out there seven days a week on those roads, sweating and suffering, but you - you're at work procrastinating again, wringing your snow-bunny white hands over whether you should drop out of graduate school or just toughen it out and graduate in eight years, and while you're at it possibly take a swipe at that new Italian breezey who just entered the neuroscience program. Donate first, worry about those problems later.

3) We got another performance for you all this November, including Schumann's Adagio and Allegro for cello and piano, Respighi's Adagio con Variazioni, and the Debussy cello sonata. Time and location TBA. Also, more music videos will be uploaded soon, but while you're waiting, you can listen to the latest Mozart Fantasie in D Minor, which has proved one of my most popular videos to date; last I checked, it had 57 views, which I think qualifies for viral status. We goin' worldwide, baby! World-WIDE!!

4) AFNI tutorials are next on the docket, after wrapping up the intro Unix tutorials for neuroimagers, and possibly doing a couple more FSL tutorials on featquery, FSL's ROI analysis tool. Beyond that, there isn't much else I have to say about it; now that you've mastered the basics, you should be able to get the program to jump through whatever hoops you set up for it and to do whatever else you need. There are more complex and sophisticated tools in FSL, to be sure, but that isn't my focus; I will, on the other hand, be going into quite a lot of detail with AFNI, including how to run functional connectivity and MVPA analyses. It will take time, but we will get there; as with the FSL tutorials, I'll start from the bottom up.

Anyway, the latest Unix tutorial covers the basics on shells and variables. Shells are just ways of interfacing with the Unix OS; different shells, such as the t-shell (tcsh) and bash shell, do the same thing, but have different syntax and different nomenclature for how they execute commands. So, for example, an if/else statement in the t-shell looks different from a similar statement in the bash shell.
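To make the difference concrete, here is a minimal sketch of an if/else statement in Bourne-style syntax (bash or sh), with the tcsh version of the same test in comments. The function name check_dir and the directory paths are made up for illustration.

```shell
#!/bin/sh
# Bourne-style (bash/sh) if/else: report whether a directory exists.
check_dir() {
    if [ -d "$1" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

check_dir /                            # the root directory always exists
check_dir /no/such/dir_for_this_demo   # a path assumed not to exist

# The same test in tcsh, left as comments because sh cannot run it:
#   if ( -d $dir ) then
#       echo "found"
#   else
#       echo "missing"
#   endif
```

Same logic, different punctuation: square brackets, then/fi in one; parentheses, then/endif in the other.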

Overall, there's no need to worry too much about which shell you use, although AFNI's default is tcsh, so you may want to get yourself used to that before doing too much with AFNI. I myself use tcsh virtually all of the time, except for a few instances where bash is the only tool that works for the job (running processes on IU's supercomputer, Quarry, comes to mind). There are lots of tcsh haters out there for reasons that are beyond me, but for everything that I do, it works just fine.

As for variables, this is one of the first things you get taught in any intro computer science class, and those of you who have used other software packages, such as R or Matlab, already know what a variable is. In a nutshell, a variable is a thing that has a value. The value can be a string, or a letter, or a number, or pretty much anything. So, for example, when I type in the command
set x=10
in the t-shell, the variable is x, and the value is now 10. If I wish to extract the value from x at any time, I prepend a dollar sign ('$') to it, in order to tell Unix that what follows is a variable. You can also use the 'echo' command to dump the value of the variable to the standard output (i.e., your terminal). So, typing
echo $x
returns the following:
10
which is the value that I assigned to x.

From there, you can build up more complicated scripts and, by having the variable as a placeholder in various locations in your script, only have to change the value assigned to it in order to change the value in each of those locations. It makes your programming more flexible and easier to read and understand, and is critical to know if you wish to make sense of the example scripts generated by AFNI's "uber" scripts.
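Here is a small sketch of the placeholder idea, written in bash/sh syntax rather than tcsh (note the assignment has no "set" and no spaces around the equals sign). The subject label and file names are hypothetical.

```shell
#!/bin/sh
# bash/sh assignment; the tcsh equivalent would be: set subject=sub01
subject=sub01   # hypothetical subject label

# The variable is reused in several places; change it once above
# and every line below follows along.
anat="${subject}_anat.nii"
func="${subject}_run1.nii"
outdir="results/${subject}"

echo "Anatomical: $anat"
echo "Functional: $func"
echo "Output dir: $outdir"
```

To process a different subject you would edit only the first assignment, which is more or less what the variables at the top of a generated analysis script are for.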

With all of the tutorials so far, you have essentially all of the fundamentals you need to operate FSL. Really, you only need to understand how to open up a terminal and make sure your path is pointing to the FSL binaries, but after that, all you need to do is understand the interface, and you can get by with pointing and clicking. However, a more sophisticated understanding is needed for AFNI, which will be covered soon. Very soon. Patience, my pretties.
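If you are not sure whether your path is set up, a quick check along these lines may help. It is only a sketch: the helper on_path is my own, and "fsl" is an assumed name for the launcher, so swap in whichever FSL program your installation provides.

```shell
#!/bin/sh
# Report whether a program can be found on the current PATH.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not on PATH"
    fi
}

on_path ls    # a standard utility, shown for comparison
on_path fsl   # assumed launcher name; adjust to your installation
```

If the second line reports that the program is not on your path, the usual culprit is a missing or mistyped entry in your shell's startup file.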

Unix for Neuroimagers (Part 1): Fundamentals

As promised, we have begun a new series of tutorials aimed at the beginning neuroimager, who may think that he can get by without learning any computer science whatsoever. This is, unfortunately, not true, and those who wish to pursue a career in cognitive neuroscience need to at least learn the fundamentals; if not for constructing your own scripts, then at least for being able to understand and interpret what others have done. Those who refuse to go beyond their comfort zone and test the waters of what must seem to them a foreign and unfriendly language, will forever be relegated to pushing the Big Button that executes commands or scripts without really knowing what is going on. To enter this field without a working knowledge of programming is to be partially blind.

To that end, these first few screencasts will guide the beginner over the initial hurdles of navigating their computer environment solely through the command line interface. At the basic level of navigation and file manipulation, some simple analogies can be drawn between typing in commands such as cd and ls, and pointing and clicking within the standard Windows or Macintosh operating systems that most people are used to. However, at the more abstract levels involving if/else statements and for loops, there are no ready analogies I can think of, and at this point the Unix shell must be treated as any other foreign language; the only way to learn it is by doing, and by consistent repetition.
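For a taste of what a for loop looks like before you watch the screencasts, here is a minimal sketch in bash/sh syntax. The run labels are invented for the example; the loop simply builds a dataset name for each one.

```shell
#!/bin/sh
# Loop over three run labels and collect a dataset name for each.
run_names=""
for run in 01 02 03; do
    run_names="$run_names r$run"
done

echo "Datasets:$run_names"
```

The point is less the output than the shape: one line names the list, and the body between do and done is repeated once per item.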

Usually I try to supplement the videos with some written commentary, but in this case, I believe that there is a very good introductory resource already out there, which can be found here. Walking through all of these tutorials will probably take less than an hour, but it is worth coming back to them from time to time to reinforce what you have learned.

For those of you who are new to neuroimaging or programming and may doubt whether it will ever really stick, let me assure you that it will, in time. Five years ago I was engaged in a research project the summer of my senior year; however, I knew nothing about computer programming. Nothing. I had taken a class of Java in high school, but barely managed to scrape by with a B-, and that was with copying all of the homeworks from one of the Asian kids in the class under threat of death. I was so useless, I didn't even know how to use Excel (no lie), and needed my advisor to hold my hand through doing the most basic analyses in SPSS. It was humiliating at first, because I knew that many of my colleagues were well versed in all of these tools which appeared as second nature to them, and I felt as though they looked down upon me as some hateful and nasty piece of filth clinging to the soles of their shoes.

However, over time I came to teach myself, more through necessity than anything else. My first job out of college required the use of Unix, the experience with which I greatly exaggerated to a ridiculous degree in my application (I had skimmed through a 1980 Unix textbook borrowed from the public library, and said that I had "experience" with Unix); and I set about to learn whatever I could and whatever I saw was most needed. What I found was that, out of the vast Pacific of commands and options present in Unix, only a minuscule fraction of them were required to successfully carry out the work that I needed to do. More important was molding my mind to accommodate the language and understand how to think through operations before translating my will into a series of typed commands. If I, who had virtually no programming background, a terrible memory, and the attention span of a bag full of puppies, could learn the basics within a year, then so can you. Persist.

This could be you!

There is no need to go it entirely alone. As with any skill, much of the fruits of your labors and lightning flashes of insight will only come through long and solitary hours of study and practice; however, it is as important to learn how to identify external resources, whether they float within the ether of the Internet or are made incarnate in the large, stinking, fleshy pile of humanity which sits next to you in your laboratory. One particularly useful resource I have found is, a message board related to all things Unix. More than once have I posted problems which I had been breaking my head against for many hours or days, only to have a complete solution posted within a matter of seconds. As long as you are willing to tolerate a little condescension and can put up with some invisible nerdling berating you for sloppy indentation, you should be able to solve most of your problems quite easily.

Good luck, comrade.