The Rise of the Economist Programmer
"It's written in Fortran 77," she added almost as an afterthought. The look on my face must have been a dead giveaway, because she reached up onto her shelf and pulled off a dusty how-to book and handed it to me. Why on earth was I being asked to use a programming language that was written over 20 years ago and had long since been replaced by languages like C++ and the up-and-coming Java?
I started my Ph.D. in Finance at the peak of Y2K and the dot-com boom. Programmers were in short supply, and in late 1999 a finance or economics student who was also a seasoned programmer was not unheard of, but such a person (and ergo I) wasn't the norm. The complex web of skills that powers the big data analytics projects we undertake now is taken for granted, but back then most economists only dabbled in programming. While economics professors regularly took leadership roles in building the statistical packages many of us use today, the general expectation was that the more someone knew about economics or finance, the less they knew about programming. And that’s why I was using Fortran 77 to begin what ultimately became the foundation of my career as a data scientist.
Many years prior, an economist who knew Fortran had coded a few lines that ran a regression of stock returns on market returns to calculate a stock’s beta. It was nothing special, but it was well written, and it produced reliable results that matched every other study of the time. Economists, nothing if not devout students of how to manage scarce resources, faced a choice between learning how to code a regression from scratch and reusing the existing market beta code. They chose to hand that piece of Fortran 77 code down like a holy relic.
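The original Fortran routine isn't reproduced here, but the calculation it performed is standard: under the market model, a stock's beta is the ordinary least squares slope from regressing its returns on market returns, which equals their covariance divided by the variance of market returns. The sketch below is a minimal modern illustration in Python, assuming simple periodic returns supplied as two equal-length lists; the function name market_beta and the sample numbers are hypothetical.

```python
def market_beta(stock_returns, market_returns):
    """Estimate alpha and beta by ordinary least squares for the market model:
    r_stock = alpha + beta * r_market + error."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    # Covariance and variance numerators; the 1/(n-1) factors cancel in the ratio.
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var
    alpha = mean_s - beta * mean_m  # intercept passes through the means
    return alpha, beta


# Hypothetical example: stock returns built to be exactly 0.001 + 1.5 * market.
market = [0.010, -0.020, 0.015, 0.005, -0.010]
stock = [0.001 + 1.5 * m for m in market]
alpha, beta = market_beta(stock, market)
print(round(alpha, 6), round(beta, 6))  # prints: 0.001 1.5
```

Because the sample data are constructed to lie exactly on a line, the regression recovers the slope and intercept used to build them; real return series would include noise, and practitioners typically use many more observations (for example, several years of monthly returns).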
Programmers, on the other hand, care about code quality, and they tend to see things in economists' code that drive them mad: it's overly complicated, sometimes redundant and often inefficient. However inelegant it is, though, this code is functional and gets the job done; it gets the right answers. And for a data scientist, that’s what matters most.
Coding has evolved dramatically, and the day-to-day work of a data scientist rarely includes lower-level programming. Indeed, it's possible to complete the entire design, development, selection, testing and fine-tuning of a complex predictive analysis without leaving the comfort of a graphical user interface. Much of what once would have required a dual PhD can now be done by a master’s student in any economics program. The tools for doing data science are becoming increasingly democratized.
I sat down with my colleague Jamie Stewart, Vice President of GX Software Solution Management, to talk about this phenomenon. He sees the same change in our industry:
"With today's software, you don't need to know how to code; you can plug and play using a more intuitive web interface that requires limited coding skills to get started. The elitism is breaking down – if you come out with a theory you have to be prepared for more criticism and feedback than in years past. You used to be taken at your word as an established expert, daring people – ‘try to prove me wrong’. Now anyone has the ability to make an informed critique of your work. Tools are available now that let you go more in-depth than you used to. A lot of established theories are being stress tested that we couldn't test before."
People have understood how to work with regularly formatted, periodic data for years. The next big hurdle isn't learning the software; it's mining the data.
We have text files, PDFs, Twitter feeds and other unstructured data: you can't simply throw that into a linear regression and prove your hypothesis. Part of why we’re seeing machine learning become so central to finance is that the data coming in isn't the easy, preformatted data. But that unstructured, unformatted data is where the next great value lies. Many of the tools for unstructured data, like Python, are free; Python is well suited to working with textual data, and there are certainly free databases to play around with. All of the basic stuff is out there. If you aren't looking to do a deep historical analysis, you can do a lot of research with limited resources. Algorithms and the platforms that house them are becoming smart enough to pick appropriate transformations of data, select appropriate variables and make a good guess at the algorithm that would help you most. Jamie advises, "The top academics, like asset managers, need to look for new sources of information because of that data democratization. You have to do something unique to justify your expertise."
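Before text can go into a regression or classifier, it has to be turned into numbers. One common first step is a bag-of-words representation, in which each document becomes a row of word counts over a shared vocabulary. The sketch below is a minimal standard-library Python illustration under simple assumptions (lowercased, letters-and-apostrophes tokens only); the function names are hypothetical, and a real pipeline would typically add steps such as stop-word removal or term weighting.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Turn raw text documents into a sorted shared vocabulary and a count matrix,
    with one row per document and one column per vocabulary word."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical headlines used only for illustration.
docs = ["Stocks crash as rates rise", "Rates rise again"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)  # ['again', 'as', 'crash', 'rates', 'rise', 'stocks']
print(matrix)      # [[0, 1, 1, 1, 1, 1], [1, 0, 0, 1, 1, 0]]
```

Each row of the matrix is now an ordinary numeric feature vector, which is the kind of input a linear model can accept; the representation deliberately ignores word order, which is one reason context still has to come from somewhere else.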
If the algorithm falls into the realm of active machine learning, where the system asks a human to label the examples it is least certain about, the expertise of an economist can fine-tune the performance of the system. Processing and programming look very different when you don’t consider the purpose of the analysis. Even a single word like "crash" means different things to the finance industry than to the auto insurance industry. Context matters. When you are building an automated system, it's important to take the audience into account. "Machines," as Jamie added, "for the time being, don’t have context unless we teach it to them."
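To make the "crash" example concrete, here is a deliberately simple Python sketch of teaching a machine context: it guesses which sense of "crash" a sentence uses by counting words from two hand-built lexicons. The lexicons, sense labels and function name are all hypothetical, and a production system would more likely learn context from labeled data than rely on fixed word lists.

```python
import re

# Hypothetical context lexicons: this is the "context we teach the machine".
CONTEXT_WORDS = {
    "financial market": {"stock", "stocks", "market", "markets", "index",
                         "shares", "selloff", "investors"},
    "vehicle accident": {"car", "cars", "driver", "highway", "collision",
                         "insurance", "claim", "vehicle"},
}


def sense_of_crash(sentence):
    """Guess which sense of 'crash' a sentence uses by counting context words
    from each lexicon. Returns None if 'crash' is absent or the evidence ties."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not any(w.startswith("crash") for w in words):
        return None
    scores = {sense: sum(w in lexicon for w in words)
              for sense, lexicon in CONTEXT_WORDS.items()}
    best = max(scores, key=scores.get)
    # Refuse to guess when no sense wins outright (including the all-zero case).
    if list(scores.values()).count(scores[best]) > 1:
        return None
    return best


print(sense_of_crash("Investors fear a stock market crash"))  # financial market
print(sense_of_crash("The driver filed an insurance claim after the crash"))  # vehicle accident
print(sense_of_crash("It was a crash"))  # None
```

Returning None on a tie is a design choice: it flags the sentences a domain expert should review, rather than letting the system guess without context.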
What the data scientist brings is domain knowledge and an overall sanity check. Jamie pointed out that the machine alone has no context that is useful for financial applications. Going to the next level means asking critical questions and challenging existing hypotheses, and that’s where the real difficulty lies.
Economists don’t have a reputation for being the best programmers in the world. But they can still be amazing data scientists. And the way the world is heading, this might be for the best.
As head of Quantextual Research at State Street Global Exchange, Stephen Lawrence blends machine learning and big data with the contextual knowledge of human insight to streamline the investment research process. When not listening to U2, Steve listens to audiobooks on innovation, economics and the science fiction writings of William Gibson.