There are many things I love about my job. Grading is not one of them. Dealing with revisions of journal articles is another.
Right now, I’m confronted with the task of looking at a draft of a paper, last seen sometime in the fall of 2009, because it was under review at some journal for ages. Now it’s come back to us with the request to perform some additional analysis. Sometimes you get lucky and “additional analysis” means “add one variable and see if that changes your results”. Most of the time you don’t get that lucky, and “additional analysis” means “add a set of variables and respecify [trans: measure them differently] a set of other existing variables, and run the model in another three ways”. With this one, we got really not lucky, and “additional analysis” meant “reconfigure your experimental materials and collect a bunch of new data and run all of the models all over again with the new data to see if it changes anything”.
At least it was a request for additional analysis, and not a “thanks, but no thanks”. Rejections aren’t any fun for the researchers. They’re also not a lot of fun for editors. Which is why I’m writing a blog post instead of writing the rejection for an article that was submitted to a journal of which I am an associate editor.
Fortunately, my paper (with the request for additional analysis) is really our paper (meaning, I’m part of a research team). So there were three of us to tinker with the experimental materials and collect new data, and three of us to deal with the requested writing revisions to the draft. Sadly, there are only two of us to deal with the additional statistical analysis, since a recurring star of this show, “Roy,” is one of the team of authors and is strictly a Big Picture kind of guy. Envision him as the CEO, and me and the other guy as Operations Managers. Roy doesn’t do stats. This is actually good, in a way, because I am the Designated Computer Guru for the household, and handle all technical support and training requests for things like Excel, Anti-Virus software, and YouTube operations. Statistical analysis software is a cut above these items in terms of difficulty and complexity, and face it: it’s possible to shepherd Big Picture Guys through low-level technical stuff, but it’s almost never worth the hassle. You have to stop and unglaze their eyeballs every two minutes to start with. And then wake them up in another five. And the end result is usually a slap on the back, a “Jolly Good Work! Keep It Up!” and the sight of a rapidly retreating back.
So it’s just as well that Roy doesn’t want to get embroiled in the stats.
The problem is that Flunky One (me) and Flunky Two (other guy on the project) were trained for different stats than the ones this paper is using. We’re both Regression People. I am a pretty literate Regression Person, in that I know many, many different types of regressions, including some fairly exotic practices that were developed by Social Scientists to deal with the prolific and pervasive violations of the assumptions of statistical analyses developed for Natural Sciences. I have 24 credits of graduate statistics, quite enough for a Master’s degree in just that. But…they are all some kind of regression. I think Flunky Two is in the same boat, with the exception that he has some grounding in basic experimental stats as well, and understands ANOVAs.
The problem with this paper is that we got all excited about our research question and our Super Duper Sophisticated Experimental Instruments – which really are very sound, despite the misguided and limited views of the reviewer for this paper – and we didn’t stop to think about how we were going to analyze the data while we were building the experiments.
It was one of those Field of Dreams “build it and they will come” moments…only we didn’t realize that until we built the dataset and sat down to do the analysis.
Regression is a good thing, like a hammer. You can use a hammer for many things other than driving a nail. But you cannot use a hammer to remove a wood screw (at least, not without destroying the wood).
What we have is a dataset with multiple dependent variables. You can’t just bust it up and look at them one at a time, because each subject provided a response to both dependent variables, so the responses are likely to be correlated (the answers come from a common individual), and running separate analyses would ignore that correlation. You can’t use plain regression for this. You have to use a specialized tool called MANOVA (multivariate analysis of variance).
Without going into the specifics of MANOVA (I heard that sigh of gratitude!) it’s something that you can do with SPSS, my preferred analysis tool. Yes, it’s slightly less powerful than SAS, but it’s 95% less of a pain in the ass than SAS too, and usually doesn’t require any actual programming, which is good, because there is only room for so much stuff in my brain, and all of the slots in my personal hash table are pretty much full.
So I sighed, and started to do some original research on MANOVA. Only a reckless idiot uses a statistical analysis tool that he (or she) doesn’t understand. Playing with fire, that is. Some weeks later, I had a sufficient grasp to start the analyses.
Only problem was that MANOVA had disappeared from the SPSS menus in a recent upgrade.
Another few weeks of research revealed that MANOVA was now available through the command line. I hate writing code for statistical analysis. I have had to do it, I know how to do it, but I hate doing it. And I hated every second of doing it then, especially as I couldn’t locate the syntax for the set of options that I wanted, and the output from the procedure looked entirely – not almost entirely, but entirely – different from all of the output I could find online (because I needed some guided help in interpreting the numbers).
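For the curious (and for future me), the syntax-only MANOVA incantation looked something like this. The variable names here are hypothetical, since the real ones vanished with my old e-mail, and the exact subcommands may vary by SPSS version:

```
* Two dependent variables, one between-subjects factor with 2 levels.
* Variable names are placeholders, not the real study variables.
MANOVA dv1 dv2 BY treatment(1,2)
  /PRINT=SIGNIF(MULTIV UNIV)
  /DESIGN.
```

The /PRINT=SIGNIF(MULTIV UNIV) line asks for both the multivariate tests (Wilks’ lambda and friends) and the follow-up univariate tests on each dependent variable.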
I battled with this for maybe six weeks, hating it all the while. Then I got a new release of SPSS, and this one indicated that the MANOVA functionality had been folded into Multivariate General Linear Models.
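In the newer releases, a roughly equivalent model runs through the GLM command, which is what the Analyze > General Linear Model > Multivariate menu path generates behind the scenes. Again, the variable names are placeholders:

```
* Same hypothetical model, expressed in GLM syntax.
GLM dv1 dv2 BY treatment
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=treatment.
```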
Translating everything over required another couple of weeks of research before I was ready to run it. And run it I did, and socked the results into the paper, and we sent the paper off.
The same paper that has now come back to haunt me. Because, in the intervening period, I bought a new computer and migrated all (or almost all) of the files over. And I have had to change e-mail platforms twice during that period (forced changes from work), and neither transition offered the option of neatly importing all of my old e-mails. The e-mails about stuff like how I ran the analysis, and which models we chose, and where the hell the numbers in the tables came from, basically. So I’ve got the old draft and the old data file (I think it’s the right old file, but since there are four versions of it, it’s hard to tell). And all of this is from the fall of 2009.
Now, having to go back and rework statistical analyses you performed back in the Dark Ages is actually pretty common, so I’ve got a Master Plan for such occasions.
Step 1. Load up the data file.
Step 2. Re-run the analyses exactly as they were run for the last version.
Step 3. Check the output against the tables in the paper to make sure they are the same.
Then and only then can I start tinkering with the new stuff.
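In theory, Steps 1 through 3 are painless if the whole analysis lives in a saved syntax file instead of a trail of menu clicks, so that two years later you can just open it and hit Run. A sketch, with a hypothetical file path and hypothetical variable names:

```
* replication.sps - re-run the fall 2009 models against the archived data.
GET FILE='C:\project\data_fall2009.sav'.
* Hypothetical model; the real one is exactly what I am trying to reconstruct.
GLM dv1 dv2 BY treatment
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=treatment.
```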
Naturally, due to the myriad sources of confusion discussed above, the output and the tables for this job are not the same. Which means I’m in the unenviable position of having to reconstruct, from the ground up, what the hell I did with this data two freaking years ago. Yeah, good luck with that, I can hear you saying.
Where the hell can I find Mr. Peabody’s Way-Back Machine, that’s what I want to know.