Recently there has been speculation that the rise in computational power will put us scientists out of business, or at least seriously change our business. Chris Anderson probably has the most extreme view with his End of Theory article. A more reasonable take is the Rise of the Data Scientist by Nathan Yau. The basic argument in both cases is that with modern computation and access to extremely large datasets, it is possible to work out the underlying relationships computationally. Chris Anderson argues that this is pretty much all you need to do science; traditional theory building and testing isn’t necessary. Nathan Yau doesn’t go that far, but he does argue that the hot new job in science will be the kind of person who can do this.
I’m not sure I agree. Admittedly, I’ve done my share of large-scale data analysis and statistics. I think the advent of advanced computational tools is allowing us to do things that we couldn’t do otherwise. But I think being a “data scientist” isn’t enough. Theory is what drives the questions that we ask, and what allows us to put our findings into perspective. Yes, much like Google, you can tell which color of blue works best for links with a large-scale experiment. But do you really want to have to do that experiment every time you deploy a web page?
Basically, theory speeds up research. You can probably do all the research you wish without having a strong theoretical background. But if you do understand the theory, you can get a LOT more done, and get it done faster. Having good theory-driven questions allows you to focus on only the data and statistics that are most important. In my work with Emilee Rader on del.icio.us, it wasn’t until we had the theory-driven question about how tags are created that our quantitative analysis started to produce really interesting results. Having theory allows you to focus on the results that are most interesting, and understand why they are most interesting. And having theory allows you to skip analyses that are almost certainly going to come out as expected.
Being a data scientist will allow you to get research done. But if you want to get more research done, learn theory also.