Replication, or the lack thereof, in Economics

My scientist friends have always been puzzled by my responses to questions about replicating studies in Economics. It’s just not done very often. In fields like astrophysics and biology, replication is nearly as important as the novel finding itself, and in some cases more important, but not so in Economics. I’ve seen evidence that other social sciences are similar, and there was some recent debate about the replication of psychology experiments and the failure to reach the same conclusions using similar methodologies. (There were other pieces on this, but this is one that I found today.) In short, journals favor novel and interesting outcomes, so obvious or unsurprising results are far less likely to be published. The publication of novel results leads to a power imbalance (she already published this, so she’s the expert and gets the soapbox). No one wants to fund or highlight research that’s already been done. Replications that confirm are boring, and replications that challenge established findings have to be 110% on everything.

It’s really hard to challenge established findings. Look at how long (three years after publication) and how many papers it took for Emily Oster to admit her paper on missing women and Hepatitis B was wrong. Regardless, she still has a job and now tenure at Chicago. Or look at how many papers have been written challenging Donohue and Levitt’s abortion paper, and they still stand by it.

I got a bit far afield, though. Economists are not generally in favor of duplication of effort. If someone’s doing it already, unless you can do it a lot better, you shouldn’t really do it. Hence persistent ideas of comparative advantage and gains from trade.

However, the recent spate of randomized control trials, particularly in development settings, has prompted more and more debate about the validity of these experiments and appears to have resulted in at least one group that’s eager to test and replicate in order to confirm (or deny?) the validity of certain projects.

Clearly, there are limits to what can be replicated using existing data, and limited funding to collect new data using similar methods. It’s unclear to me how they will choose appropriate experiments to reproduce or test, and as much faith as economists tend to put in a sample size of one, I’d bet we won’t be too happy with a sample size of two, but I think it’s a good start. The Development Impact Blog by the World Bank will keep up with the process of replication, so it’s worth following if you’re interested. I know I’ll be watching.

h/t @JustinWolfers

Though kind of dated now, Daniel Hamermesh’s paper on replication in economics is here.

RCTs and placebo effects

A few weeks ago, a paper was posted on the CSAE 2012 Conference website that seemed to fly in the face of much of the current research that is happening in development economics. The advent of RCTs (randomized control trials) brought about a significant change in the way we do policy analysis, but also in the costs of it. This paper suggested that RCTs were capturing placebo effects. Just as people feel better when they believe they are taking curative medicines, those benefiting from RCTs experience placebo effects from knowing they are part of an experiment.

The answer, according to the researchers, is to conduct a double-blind experiment, where neither the researchers nor the participants know whether they are part of the treatment or the control group.

The paper garnered a lot of attention early on. Many colleagues and others had the immediate, short reaction of “wow” and “yikes”, and so did I. Berk Ozler, at the Development Impact Blog, has a good review of the paper up with a great, punny title. Among other problems:

First, it turns out that the modern seeds are treated with a purple powder in the market in Morogoro (to prevent cheating and protect the seed from insect damage during storage), so the experimenters sprayed the traditional seeds with the same purple powder. As you can immediately tell, this is less than ideal. First, as this is a not a new product, farmers in the blind RCT are likely to infer that the seeds they were given are modern seeds. Given that beliefs are a major part of the story the authors seem to want to tell, this is not a minor detail. Second, if the purple powder really does protect the seeds from insect damage, the difference between the MS and TS is now reduced.

Berk’s analysis is well worth a read. Kim Yi Dionne also addresses placebo effects, though in a different paper.

Update: the original post said that this paper was forthcoming in Social Science and Medicine. This is not the case. Sorry for the confusion and thanks to Marc Bellemare for catching it.

Update #2: The Economist has a nice review of this paper up as well on the Free Exchange blog. It doesn’t touch most of the analysis issues, but it does explain well why double-blind experiments might not be useful in Economics. h/t @cdsamii

Public Randomization

A significant issue in conducting randomized control trials in a community is fairness. The idea behind RCTs is to mimic the medical model by scientifically ascertaining just how useful a treatment might be. In the case of development economics, this could be a subsidy or an extra year of education, for example. In order to eliminate (or at least reduce) the effect of confounding factors, the researcher randomizes over the population, picking a representative sample to receive the treatment and comparing their results to those of people who did not receive the treatment.
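For readers who like to see the mechanics, here is a minimal sketch of that randomize-and-compare logic in Python. Everything in it is made up for illustration: the household IDs, the simulated outcomes, the assumed effect size (TRUE_EFFECT), and the function names are my own assumptions, not taken from any actual study.

```python
import random
from statistics import mean


def assign_treatment(unit_ids, n_treated, seed=0):
    """Randomly pick n_treated units for treatment; everyone else is a control."""
    rng = random.Random(seed)
    units = list(unit_ids)
    treated = set(rng.sample(units, n_treated))
    control = set(units) - treated
    return treated, control


def difference_in_means(outcomes, treated, control):
    """Average outcome among the treated minus average outcome among controls."""
    treated_avg = mean(outcomes[i] for i in treated)
    control_avg = mean(outcomes[i] for i in control)
    return treated_avg - control_avg


# Hypothetical population of 1,000 households, half assigned to treatment.
households = range(1000)
treated, control = assign_treatment(households, n_treated=500, seed=42)

# Simulated outcomes: a noisy baseline plus an assumed treatment effect.
TRUE_EFFECT = 2.0
noise = random.Random(1)
outcomes = {
    i: noise.gauss(10, 3) + (TRUE_EFFECT if i in treated else 0.0)
    for i in households
}

estimate = difference_in_means(outcomes, treated, control)
print(f"Estimated treatment effect: {estimate:.2f}")
```

Because assignment is random, the two groups should look alike on average in everything except the treatment, so the difference in average outcomes is a reasonable estimate of the effect. The sketch also makes the fairness problem concrete: half of the households are left out by a draw they had no say in.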

While in theory this should give us the best answer as to how to combat poverty, or get children to school, or determine the effect on whatever outcome we hope to affect, it’s also problematic. The process of randomization necessarily leaves some people out, essentially denying them help that could be life-saving or life-transforming. It might also provide benefits that researchers view as small, but that are capable of creating divisions in a community, or perhaps jealousy, suspicion, or bitterness.

Different RCTs deal with this in different ways. Some do nothing. Some hope the treatment group doesn’t notice. Some tell the control group that they will get the treatment after the analysis is done; others take this course but without informing the treatment or control groups. All of these solutions have their issues, which depend on the type of treatment. In some cases, control respondents might change their answers to certain questions to appear more sympathetic, or more deserving of the treatment. Or they might anticipate how the treatment is going to affect them in the future and have their answers reflect their hopes rather than their actual state.

As in all survey data, the mere act of asking the question affects the answer.

Last week, Kim Yi Dionne, a professor at TAMU, posted on her blog about making the randomization process public. While I don’t think it solves the problem of people changing their answers to what they think they should be (either to make the treatment look better or worse), it does deal with the bitterness and competition that can often arise out of randomly selected treatment groups.

I especially love the education component of it.

[A Malawian research supervisor] posed a question to the audience: if he wanted to know how the papayas in the village tasted, would he have to eat every papaya from every tree (pointing to the nearby papaya trees)? Some villagers laughed, many said “ayi” (no) aloud. He said, instead he would eat one or two from one tree, then take from another tree, but probably not take one from every tree in the village so that he could know more about the papayas in this village.*

Every mentor I have had for research in the developing world has been adamant that we share findings with the community whose participation was requisite to our success. But rarely do we take the opportunity to educate about how we came to our conclusions, hoping the conclusions themselves will suffice.

I think it’s brilliant.