Neolefty: January 2010 Archives

January 24, 2010

Keeping computers from ending science's reproducibility

Here's a really good description of one of the key problems that the people I work with are wrestling with right now:

Ars Technica: Keeping computers from ending science's reproducibility

Basically, scientific exploration is relying more and more on computation. Back when experiments were just manipulation of the physical world, scientists knew how to tell other scientists how to reproduce their results. But computers have turned out to be really messy, and it's hard to describe exactly how they were used to perform a particular experiment.

One of the main things we (that is, the people I work with at NCSA) are trying to provide is an accurate -- even reproducible -- description of the computer processes that led to particular data or conclusions. It's the kind of thing that Joe Futrelle can foam at the mouth about, yet only arouse concerned looks from the people around him, because it sounds so esoteric and fiddly. I thought this article did a good job of explaining why it's vital to the process of scientific investigation. Here are the first two paragraphs:

In recent years, scientists may have inadvertently given up on a key component of the scientific method: reproducibility. That's an argument that's being advanced by a number of people who have been tracking our increasing reliance on computational methods in all areas of science. An apparently simple computerized analysis may now involve a complex pipeline of software tools; reproducing it will require version control for both software and data, along with careful documentation of the precise parameters used at every step. Some researchers are now getting concerned that their peers simply aren't up to the challenge, and we need to start providing the legal and software tools to make it easier for them.

In the past, reproduction was generally a straightforward affair. Given a list of reagents, and an outline of the procedure used to generate some results, other labs should be able to see the same things. If a result couldn't be reproduced, then it could be a sign that the original result was so sensitive to the initial conditions that it probably wasn't generally relevant; more seriously, it could be viewed as a sign of serious error or fraud. In any case, the ability to reproduce a given result is key to its general acceptance and, since a successful experiment is often the foundation of further research, often essential for pushing a field forward.
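To make "careful documentation of the precise parameters used at every step" a bit more concrete, here is a minimal Python sketch of the idea. It is not taken from the article or from any NCSA software; the names `run_step`, `scale`, and `log` are hypothetical, and the "analysis" is a toy. The point is only that each step records its parameters, fingerprints of its input and output data, and the interpreter version, so someone else could check that they reproduced the same thing.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes (a fingerprint of the data)."""
    return hashlib.sha256(data).hexdigest()


def run_step(name, func, input_bytes, params, log):
    """Run one pipeline step and append a provenance record to `log`.

    Hypothetical helper: the record layout is an assumption, not a standard.
    """
    output_bytes = func(input_bytes, **params)
    log.append({
        "step": name,
        "params": params,
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_bytes),
        "python_version": platform.python_version(),
    })
    return output_bytes


def scale(data: bytes, factor: int) -> bytes:
    """Toy analysis step: multiply each whitespace-separated integer by `factor`."""
    numbers = [int(token) * factor for token in data.decode().split()]
    return " ".join(str(n) for n in numbers).encode()


log = []
result = run_step("scale", scale, b"1 2 3", {"factor": 2}, log)

print(result.decode())
print(json.dumps(log, indent=2))
```

A real pipeline would also need to pin the versions of every tool and library involved, which is the "version control for both software and data" part of the quote; this sketch only records the Python version.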

Posted by Billy at 11:43 PM