The measure of a market (a really old one)

A year ago, about this time, I was on my way to Ottawa for the Canadian Economic Association and Canadian Network of Economic History Meetings. They coincided, so I presented papers at both and took the opportunity to adamantly assert that I was not an economic historian.

A year later, I have a book chapter, a working paper, a paper (almost–I’m just waiting for confirmation it was sent out) under review, and a paper idea percolating that all belong under the label Economic History. I’m trying to get the paper idea in shape to submit to the CNEH meetings again this year, with the deadline fast approaching.

While my coauthors and I were hard at work on the paper on financial portfolios in the early 18th century, the one that is (almost) under review, a seminar participant at Stanford asked my coauthor what an optimal portfolio would look like. We didn’t know. None of us is really a finance person. I got involved with the project because it had a gender component and the others on the papers are economic historians.

I took it upon myself to pick the brain of my colleague who teaches finance at Gettysburg and found myself quickly immersed in the heady world of portfolio optimization, betas, alphas, indices, Markov matrices, and so much more. From my understanding of the literature, the S&P500 represents the closest thing we have to an optimal portfolio, and so creating a similar index for the time period we’re interested in should provide the answer to the seminar participant’s question.

What I find particularly interesting about the index method of portfolio construction is that the S&P500 in particular is thought to provide an accurate picture (returns and growth-wise) of a balanced portfolio of all assets–not just financial instruments. If you were to put a big chunk of your money in a fund that purchased stocks following the S&P’s market-capitalization weighting, you would actually be bringing your portfolio out of balance by buying things like real estate or durable investment goods.

The idea, I believe, is that the stock market has “evolved” such that it captures the risk and reward of all those types of instruments–not just the stocks themselves. I find this assumption, particularly given the dramatic dips and peaks in the stock market we’ve seen over the past four or five years, to be heroic at best, but it becomes more problematic when we turn to 1700s finance.

The financial instruments available in the period for which I have data are incredibly few in number and even more limited in scope. Besides a bank or two, they are joint-stock, chartered trading companies, whose fortunes lie entirely in the wind and the water and the ability of colonists to extract resources from the colonies. There’s no ability to invest in steel or textiles or the machines that make them. I don’t have information about real estate or other investments for most of the people in the sample, and I certainly don’t have their prices. So, our optimal portfolio can really only be for the available stocks, not for the entire gamut of instruments.
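To make the index idea concrete, here is a minimal sketch of a market-capitalization-weighted index over a handful of stocks. The company labels, share counts, and prices are all invented for illustration and are not from our data, and the calculation covers price changes only (no dividends), which is a simplifying assumption.

```python
# Hypothetical illustration: labels, share counts, and prices are made up.

def cap_weights(shares, prices):
    """Weight each stock by its share of total market capitalization."""
    caps = {name: shares[name] * prices[name] for name in shares}
    total = sum(caps.values())
    return {name: cap / total for name, cap in caps.items()}


def index_return(shares, prices_start, prices_end):
    """Price return of a cap-weighted index, using start-of-period weights."""
    weights = cap_weights(shares, prices_start)
    return sum(
        weights[name] * (prices_end[name] / prices_start[name] - 1.0)
        for name in shares
    )


shares = {"bank": 1000, "trading_co_a": 3000, "trading_co_b": 1000}
prices_start = {"bank": 100.0, "trading_co_a": 100.0, "trading_co_b": 100.0}
prices_end = {"bank": 110.0, "trading_co_a": 105.0, "trading_co_b": 120.0}

weights = cap_weights(shares, prices_start)
period_return = index_return(shares, prices_start, prices_end)
print(weights, period_return)
```

With so few companies, one large trading company can dominate the weights, which is part of why an index like this can only speak to the available stocks.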

I doubt that I’ll be the one to rewrite modern portfolio theory, and I do think this is the best place to start, but it’s not ideal. Story of an economics paper, I guess.

Spatial auto-correlation is not causation

There’s a strong tendency in human nature to draw distinctions along dichotomous lines. Good and evil, black and white, ugly and pretty. We all know that these distinctions only really work in children’s fiction, and even then tend to fall flat, but we try anyway. In teaching, particularly a new subject, those dichotomies are both useful and can lead to the downfall of a lesson.

In that vein, the instructor in my spatial econometrics workshop last week presented two significant data issues that a researcher might encounter in using spatial data: spatial heterogeneity and spatial dependence.

By way of definition: spatial heterogeneity is simply that there is something about an area or a piece of space that is different than the spaces around it. My dichotomizing, learning mind went immediately to the idea of observables. Clearly, if we are trying to include spatial information–location–in a regression, we know that the area has certain characteristics. As long as we explicitly control for these in our regression (and believe they are accurately measured), it doesn’t present much of a problem.

However, this is not always the case due to the level of analysis problem. In a general econometric specification, we control for the unit of spatial analysis that is relevant–county, Metropolitan Statistical Area (MSA), state, whatever it may be. By choosing the level and assigning a dummy variable, perhaps, we assume that all those characteristics are captured uniquely, but also that they are assigned independently to the spatial unit. Take for instance the distribution of the African-American population in the United States. Regression analysis that uses that variable as a covariate assumes that the number of African-Americans in Georgia is independent from the number of African-Americans in South Carolina, which makes little intuitive sense. Both were states with large plantation economies that employed Black slaves from Africa in production of goods. It makes sense that these two states, spatially proximate, would also have similar factors leading to their demographic makeup. Thus, spatial heterogeneity: areas in the South have higher Black populations than in the North.

The corollary to spatial heterogeneity is spatial dependence. Like spatial heterogeneity, we see patterns occur in certain variables, but rather than an outside, perhaps observable and easily measurable factor that accounts for the clustering, there’s something inherent about the place itself that causes proximate areas to change their realization of some variable. Think of housing prices. Housing prices are higher in places with certain amenities (close to transportation, mountains, whatever), but housing prices are also higher in areas with higher housing prices. Perhaps homeowners see their neighbors selling their houses for more and thus put them on the market for more. Or buyers see houses in the area with higher values and thus are willing to spend more. This spills over county and other lines, too.

Both of these problems, regardless of how strict that line is between the two, manifest in spatial auto-correlation. The variation we see in each variable for two spatially proximate observations is less than the variation for two independent observations because the information comes from the same place. Some of this we can control for, some of it we can’t, and some of it we can try to control for with the tools I’ll discuss in coming days.
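One common way to put a number on spatial auto-correlation is Moran's I. That statistic is my choice for this illustration, not something this post has covered. The sketch below assumes four hypothetical areas lined up in a row, each a neighbor only of the areas directly beside it, with made-up values.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    total_sq = sum(d * d for d in dev)
    return (n / s0) * (cross / total_sq)


# Hypothetical neighbor structure: area i touches area i-1 and area i+1.
neighbors = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbors always differ

i_clustered = morans_i(clustered, neighbors)
i_alternating = morans_i(alternating, neighbors)
print(i_clustered, i_alternating)
```

The clustered pattern gives a positive value and the alternating pattern a negative one. Note that the statistic only describes the pattern: it says nothing about whether heterogeneity or dependence produced it.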

Regardless, it’s important to remember that the realization of spatial heterogeneity and spatial dependence is the same mathematically. Statistically, we cannot differentiate between whether some unobservable variable caused everything to be higher, or whether each observation is exerting an effect on its neighbors (a butterfly flaps its wings…). So, even with acknowledgement of these problems, we have not established causation.

A familiar refrain is, thus, minimally modified: spatial auto-correlation is not causation.

A note on correlation and causation: (see Marc Bellemare’s primer for a more detailed explanation)

Anyone who has ever taken a statistics course is familiar with the refrain that correlation is not causation. It’s a common refrain because it’s something that is often ignored when statistics are cited in news articles and personal anecdotes. My favorite example of this is that ice cream sales and murder rates are highly correlated. Only the biggest of scrooges would believe that ice cream sales caused murder rates to increase. In the abridged words of Elle Woods, happy people don’t kill people. And in my words, ice cream makes people happy.

They do move together, though, which is essentially the definition of correlation. When ice cream sales go up, murder rates go up; when murder rates go down, ice cream sales go down. Not because one causes the other, but rather because of the seasonality of both variables. More homicides occur in the summertime, and more ice cream is sold in the summertime.
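A small simulation can show the seasonality point. Everything here is invented: both series are built from a shared seasonal cycle plus independent noise, so by construction neither causes the other. The sketch compares the raw correlation with the correlation after subtracting each calendar month's average.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def remove_month_means(series):
    """Subtract each calendar month's average from a monthly series."""
    means = [sum(series[m::12]) / len(series[m::12]) for m in range(12)]
    return [v - means[i % 12] for i, v in enumerate(series)]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Ten simulated years of a shared seasonal cycle.
season = [math.sin(2 * math.pi * (m % 12) / 12) for m in range(120)]

# Made-up series: each depends on the season, not on the other.
ice_cream = [100 + 50 * s + rng.gauss(0, 10) for s in season]
homicides = [20 + 5 * s + rng.gauss(0, 1) for s in season]

raw_corr = pearson(ice_cream, homicides)
adj_corr = pearson(remove_month_means(ice_cream), remove_month_means(homicides))
print(raw_corr, adj_corr)
```

The raw correlation should come out high, while the seasonally adjusted one should sit much closer to zero, since only independent noise is left once the shared cycle is removed.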

Spatial Econometrics: The Miniseries

Last week, I spent three days in a workshop (or short course) on spatial econometrics at the University of Colorado’s interdisciplinary population center, the Institute of Behavioral Science. At the beginning of last semester, many of my methods students expressed interest in doing their research papers on a topic with a significant spatial component. I would have loved for them to incorporate spatial analysis, but it was a topic I had touched only tangentially and I didn’t feel qualified to learn it at the same time as teaching that (incredibly demanding) course for the first(ish) time. In addition, having just attended the PAA meetings in San Francisco, I’ve been looking for ways to expand my econometric skills and incorporate spatial data into my work. It was really fantastic. I don’t know whether they’ll be hosting the event again next summer, but do keep a lookout if you’re interested. I thought it was extremely helpful. And fun (see nerdy tweets from last week about loving matrix algebra). Paul Voss, of the University of North Carolina’s Population Center, Elisabeth Root, and Seth Spielman were all great.

I posted a short introduction to spatial econometrics last week based on my readings for the first class and am now excited to share some of the things I learned, so over the next few weeks, I’ll post some of my thoughts in a mini-series on spatial econometrics. This post will be updated with a list of posts in the series, so do follow along.

Experts, please keep me honest! This stuff is very cool, but I’m still a newbie.

Preliminary outline (subject to change):

  1. An introduction to Spatial Econometrics
  2. Spatial Autocorrelation is Not Causation
  3. The Weights Matrix for Spatial Analysis
  4. Some Notes on Terminology in Spatial Econometrics

Percolating

My short course this week at CU’s pop center was incredible and exhausting and incredibly exhausting. The easiest part, for me, was thinking about spatial models using matrix algebra, if that’s any indication of what we did all week. I’m fairly certain I forgot how to write Stata code and learned just enough R and GeoDa to be dangerous, but you can bet that’s not the last of me.

Next week should bring to fruition ideas and blog posts that have been percolating: teen pregnancy, more spatial econometrics (separately, although, that gives me an idea…), and some 1720s finance, as well as back to your regularly scheduled programming.

Have a safe and happy holiday weekend!

Replication, or the lack thereof, in Economics

My scientist friends have always been puzzled by my responses to questions about replicating studies in Economics. It’s just not done very often. In fields like astrophysics and biology, replication is almost as important, if not more important in some cases, as the novel finding itself, but not so in Economics. I’ve seen evidence that other social sciences are similar and there was some recent debate about the replication of psychology experiments and the failure to come to the same conclusions using similar methodologies. (There were other pieces on this, but this is one that I found today). In short, journals favor novel and interesting outcomes, so obvious or unsurprising results are far less likely to be published. The publication of the novel results leads to a power imbalance (she already published this, so she’s the expert and gets the soapbox). No one wants to fund or highlight research that’s already been done. Replications that confirm are boring and replications that challenge established findings have to be 110% on everything.

It’s really hard to challenge established findings. Look at how long (three years after publication) and how many papers it took for Emily Oster to admit her paper on missing women and Hepatitis B was wrong. Regardless, she still has a job and now tenure at Chicago. Or how many papers have been written challenging Donohue and Levitt’s abortion paper and they still stand by it.

I got a bit far afield, though. Economists are not generally in favor of duplication of effort. If someone’s doing it already, unless you can do it a lot better, you shouldn’t really do it. Hence persistent ideas of comparative advantage and gains from trade.

However, the recent spate of randomized control trials, particularly in development settings, has prompted more and more debate about the validity of these experiments and appears to have resulted in at least one group that’s eager to test and replicate in order to confirm (or deny?) the validity of certain projects.

Clearly, there are limits to what can be replicated using existing data, and limited funding to collect new data using similar methods. It’s unclear to me how they will choose appropriate experiments to reproduce or test, and as much faith as economists tend to put in a sample size of one, I’d bet we won’t be too happy with a sample size of two, but I think it’s a good start. The Development Impact Blog by the World Bank will keep up with the process of replication, so it’s worth following if you’re interested. I know I’ll be watching.

h/t @JustinWolfers

Though kind of dated now, Daniel Hamermesh’s paper on replication in economics is here.

On anticipating divorce, again

Related to my post earlier this week, a new working paper shows that women in the US respond to increased divorce rates by working harder. Knowledge of high divorce rates appears to be enough to incentivize working harder in anticipation of even a probabilistic one-earner household. I haven’t had the chance to read the paper itself (I will, but 68 pages!?), but Ezra Klein discusses it here:

Why would this be the case? Researchers believe it’s because marriage provides “implicit social insurance” for women, who are still more likely to be the secondary income-earners in the U.S. and Europe. So in the U.S., where divorce rates are higher, “women have a higher incentive to obtain work experience in case they find themselves alone in the future,” they write. “European women anticipate not getting divorced as often and hence find less reason to insure themselves by working as much as American women.”

A longer treatment of the paper by the authors is on the VoxEU website.

Referenced: Chakraborty, Indraneel, Hans A Holter and Serhiy Stepanchuk (2012). “Marriage Stability, Taxation and Aggregate Labor Supply in the US vs. Europe”, Working Paper.

An introduction to spatial analysis

After my first, rather disastrous, year of graduate school in Boulder, I almost transferred to Geography. Or at least, I thought a lot about it. While the math in Economics was kind of kicking my butt, everyone working with graphs and maps seemed so blissfully happy. Ultimately, I stuck it out in Economics, and am extremely glad that I did, but I haven’t lost my love of maps and have always been curious about spatial research.

Next week, I’ll be doing a three-day workshop at the University of Colorado’s Institute of Behavioral Science. Many of my economics professors were associated with IBS, but none really did spatial analysis, so I was left to find out some of it on my own. A few years ago, I helped design a survey on handwashing and other hygiene behaviors for a group building latrines and protecting water sources in Nepal. The data are fascinating and though we started analyzing them, everyone had limited use of one of the two tools necessary to do spatial regression. I had the Stata skills and my coauthors had limited GIS skills, but combining them wasn’t going to happen. This short course is hopefully the next step in getting those papers off the ground and into journals, but also more importantly, back to the community where we did the research. Though we’ve presented some findings to them, I’m sure there are many more insights to be had with these data.

With that, I’ll be reading a lot of spatial analysis papers over the next week. The syllabus has hundreds of pages of reading, much of which I’ve printed out and am planning for my long trip back to Colorado next week, but I’m willing to share the “lite” version with you all.

For definitional purposes, spatial analysis is “the formal quantitative study of phenomena that manifest themselves in space,” according to Luc Anselin. More informatively, I think, spatial analysis allows us to “interpret what ‘near’ and ‘distant’ mean in a particular context” and showcase whether and how proximity or location have an effect on an outcome we’re interested in.

Anselin divides spatial analysis into two categories–data-driven analysis and model-driven analysis, and highlights the challenges of each, which I imagine will get plenty of air time next week and are a little bit daunting to a student and devotee of econometrics:

Indeed, the characteristics of spatial data (dependence and heterogeneity) often void the attractive properties of standard statistical techniques. Since most EDA techniques are based on an assumption of independence, they cannot be implemented uncritically for spatial data…As a result, many results from the analysis of time series data will not apply to spatial data.

Model-driven analysis seems much more up my alley and suited to regression, but the main problem, which I encountered in my own research, “is how to formalize the role of ‘space.'”

At a basic level, the ideas and tools used in spatial regression seem fairly consistent with my view of econometrics in general. There are tradeoffs to employing different models and assumptions, and measurement error is alive and well. Notably, although this could be out of date by now: “Spatial effects in models with limited dependent variables, censored and truncated distributions, or in models that have count data have been largely ignored…multivariate dependent distributions other than the normal are highly complex.” More to come, I’m sure. My colleague has already told me I have to teach him in the Fall, and I’m hoping to be able to incorporate some of this into my Methods class, so get ready for some spatial econometrics here.

As an aside, if you happen to be in Colorado, check out these cool solar events that are happening, including a world-record-breaking attempt at the most people in one place to watch a solar eclipse together at CU’s Folsom Stadium. Or, well, you could just go look at it where you are, too.

Referenced: Anselin, Luc. 1989. “What Is Special about Spatial Data? Alternative Perspectives on Spatial Data Analysis.” Conference Proceedings, Spatial Statistics: Past, Present, and Future. Institute of Mathematical Geography, Syracuse University.

Anticipating divorce

This Journal of Human Resources paper by Elizabeth Ananat and Guy Michaels is a few years old now, but as I’m readying my first dissertation chapter for submission, I’ve been reading up and reminding myself of various literatures and it seemed appropriate. Ananat and Michaels present an intuitive, causal story for how divorce causes women to live in poverty. It seems pretty straightforward: the break-up of a marriage means women are more likely to live in a household without income from someone else, but also that women compensate for such income losses by going back to work, moving in with siblings, etc.

Divorce increases the probability of living in a household without other earners. In fact, we estimate that breakup of the first marriage significantly increases the likelihood that a woman lives in a household with less than $5,000 of annual income from others—the likelihood rises from just over 5 percent for those whose first marriage is intact to nearly 50 percent for those whose first marriage breaks up. However, women can and do respond to income loss from divorce by combining with other households, through paths including remarriage or moving in with a roommate, sibling, or parents. Moreover, women further compensate through private (for example, alimony and child support) and public (for example, welfare) transfers, and by increasing their own labor supply.

I use the same logic to say that as long as she has some idea that the divorce (or union dissolution in my case as I include unmarried couples) is imminent, a woman should make compensatory decisions regarding the future loss of income, not just the immediate loss of income.

Referenced: Ananat, Elizabeth O. and Guy Michaels (2008). “The Effect of Marital Breakup on the Income and Poverty of Women with Children.” Journal of Human Resources 43.3: 611-629.

The downfall of data

The PAAs last week were all about data. The exhibits at the conference were sponsored by various longitudinal surveys such as the PSID, the Mexican Family Life Survey, RAND FLS and more. As I perused the poster sessions, it was amazing how many posters came from employees at the US Census Bureau. Having interviewed there last year, I was aware of their numbers, but the PAAs bring to light just how much work they are doing at the Census to illuminate American life. Beyond that, presentations used the Fragile Families and Child Wellbeing data, as I do, the NLSY, the ACS, the Mexican Migration Project, and so many more. The concentration on data was unlike anything I’ve seen at any other conference. Theory was definitely not a big focus.

So, it’s with sadness that today I saw the news that the House voted to cut funding for the American Community Survey, a Census Bureau instrument that tracks all sorts of data about Americans. I received the survey at my home in Boulder shortly after the decennial Census. My roommates, feeling survey fatigued, refused to fill it out, but I, being the economist and possible eventual end-user of this data, went ahead. I also encouraged friends and family to fill out their Census forms.

This comes on the heels of funding being cut for the NLSY (though restored for FY 2012), a concurrent distaste for political science research in the House, and doesn’t bode well for other demographic endeavors. Economists, sociologists, anthropologists, biostatisticians, public health researchers, epidemiologists, political scientists and more depend on these data–from studies already in existence and to-be-collected–to do meaningful and interesting research. While (sometimes) privately funded, small-scale longitudinal studies like the Fragile Families study provide a good snapshot of groups, only nationwide, representative studies can help us to know what is going on in the country as a whole.

The link above claimed the survey was an unconstitutional invasion of privacy. Which is absolute crap. The US government does things that are far more invasive than ask how many years you went to school and how many flush toilets you have. And far less useful.

Update: John Sides talks about his NSF Grant and similarly cut funding for political science research on the Monkey Cage blog.

Amendment 1 passes and the PAAs

This space has been quiet for the past few days and will likely remain so for the next couple. I’ve just returned from San Francisco and have a stack of papers to grade before my principles exams come in on Friday.

The PAAs were really fantastic. I got to meet some really talented scholars, listen to some interesting paper presentations and got really good feedback on my own paper. Conferences are hard. There’s so much time sitting and listening, something academics aren’t very accustomed to. Despite years of training, it seems to be a skill we lose upon graduation, our hands itching to shoot up and say our piece. But my discussant (who I discovered was a CU grad, too!) was great and I’m so grateful for everyone’s attendance and the spirited discussion at the end of the session. I hope to be able to return next year for the meetings in New Orleans and make this a part of my regular circuit. In particular, the Economic Demography workshop, on the Wednesday just prior to the start of the meetings, showcased six quality papers that were so great to hear. My advisor organized it, natch.

On a sadder note, Amendment 1 passed in North Carolina today. My Facebook and Twitter feeds were full of disappointment from all sides and, as it’s one of my former homes, I took it a little personally. It’s hard to see how national polls show a plurality supporting gay marriage or at least civil unions when states keep passing ridiculous laws that will be difficult for the next generation to dismantle. At the same time, how is it reasonable for a state to put forth such a controversial amendment when the side that would have likely opposed it has a candidate running essentially unopposed? Maybe that’s the point, but really, not cool, North Carolina.

I will try to post more on the PAAs, my paper, the Economic Demography conference and more in the next week. I’d love to share some details about some of the papers I saw presented as well, and hope those will come into the public eye soon.

For now, though, thank goodness for Chocolate Fudge Brownie.