From Surveys to Big Data, the Evolution of Data Capture

July 10, 2018

Surveys were arguably the first form of organized data capture, used by ancient civilizations to count crops, soldiers and more.

The first known census was undertaken by the Babylonians in 3800 BC. The Roman Republic used the census as a list to keep track of all adult males fit for military service. The Domesday Book, also known as the “Great Survey,” was ordered by William I of England in 1086 AD so that he could properly tax the land he had conquered. Even the Incas, who did not have a written language, had a way to record numeric and non-numeric information about their empire with the quipu, a knotted string. In the US, the decennial census is constitutionally mandated. More than just a headcount, the census is often still the lynchpin of governmental bookkeeping.

After surveys and censuses, the next evolution of data capture was more administrative. It was collected not just by a government about its populace but by individual organizations as a system of self-reporting. Spanish and English merchants needed to know how much value their ships carried (and how much could be lost if they sank at sea) in order to obtain lines of credit from European banks.

Manufacturers in the Industrial Revolution needed to know how much raw material they had to order to keep their factories up and running. And today’s investors look at the daily net asset value (NAV) to track fund performance.

Now, thanks to the digital era, we have big data. Unlike a survey, which you respond to, or administrative data, which is selectively tracked, big data is organic.

Transactional big data, information on the things people do, tells us more about them than the things they say. As Roberto Rigobon, Society of Sloan Fellows Professor of Management and Professor of Applied Economics at the MIT Sloan School of Management, said at a recent State Street conference, “The beauty of organic big data is that the answer is likely true. We alter our responses when someone is asking us a question.” The human mind is notoriously imperfect thanks to a multitude of psychological forces. We are all subject to self-serving bias: a tendency to inflate our attributes and downplay our flaws, seeing the world as we want to see it (or the way we want others to see us in it) rather than the way things actually are.


Moreover, the way in which questions are asked often prompts a certain type of answer. Depending on what you ask, how you ask it and to whom you are talking, your data may be biased in any number of directions.

Organic big data includes satellite imagery, GPS tracking, online search behavior, consumer product prices, credit card transactions and more. It’s data that is not always gathered or reported on purpose, but it is being tracked. It’s data that is a by-product of behaviors, actions or patterns.

As we move into an increasingly interconnected world, the sheer amount of data being generated and the speed at which it’s being generated are unprecedented. By 2020, about 1.7 megabytes of new information will be created every second for every human on Earth, most of it unstructured and organic. Our digital universe will grow tenfold, from 4.4 zettabytes to 44 zettabytes (44 trillion gigabytes). All of that information is being collected and stored by industries and governments to better understand what people do and why they do it. As the Harvard Business Review wrote, “Handling Big Data means dealing with unprecedented volumes of information — whether terabytes of retail customer data, hospital records, or financial transactions — and complex decisions regarding questions to ask, frameworks and tools to use, and insights to seek.”

In an increasingly data-driven world, companies have the responsibility to protect consumer data and privacy. Consumers and clients are often willing to trade their organic big data, such as their GPS location information, for something of value, such as local deals or coupons. The burden rests with the company to ensure it is living up to that trust and delivering a valuable and secure customer experience. Recent data security breaches have shown that the public is willing to give companies only so much leeway with personal information, and that good data governance goes hand in hand with data collection.

For example, people are willing to share their organic big data with health-monitoring apps because it helps them live a healthier lifestyle. They may or may not be willing to have their personal information then shared with insurance companies. In a world where data is an asset, companies have a fiduciary duty to protect their consumers’ data, much in the same way a bank has to protect clients’ investments.

The Roman surveyors of antiquity, while certainly masters of data in their own time, could never have imagined a data world quite like this.

Megan Czasonis | State Street Associates

Megan Czasonis is a Managing Director in the Portfolio and Risk Research team at State Street Associates (SSA). Megan collaborates regularly with SSA’s academic partners to develop and implement new research on asset allocation, risk management, and investment strategy.