The world is moving toward collecting ever more data about nearly every aspect of our lives.
This data often improves our ability to analyze the world around us, but there are also situations in which more data makes our decisions and understanding of the world worse, not better.
This insight is the focus of a growing and somewhat counterintuitive field of study that examines scenarios in which less data leads to better statistical models or better decision making. When actually making decisions with data, often "less is more."
"In an ideal world with an ideal person or algorithm processing the data, more data is better," said Jerker Denrell, a professor of strategy and decision making at the University of Warwick. "However, we're not always in that world."
The phrase "less is more" when it comes to making decisions was popularized by the German psychologist Gerd Gigerenzer, whose career has revolved around identifying situations in which less data leads to better decisions. The conclusions are relevant for both human decision makers and for algorithms.
This field finds that a small number of data points is often extremely useful, and that each additional data point tends to be less useful than the last. Unless the extra data is used carefully, it can muddle the picture rather than improve it.
A recent paper in the field asked: "To predict who will perform well in a particular job, is it always better for an employer to use as much information as possible about job candidates?" The answer, from authors Felipe Csaszar, Diana Jue-Rajasingh and Michael Jensen at the University of Michigan's Ross School of Business, is no.
Their paper models the issue of statistical discrimination, an economic theory dating to the 1970s that argued one reason for the persistence of discrimination was that a company seeking to maximize profit as its only goal would use all available information about, say, job candidates. A hiring manager, in pursuit of that goal, might use race or gender, even if subconsciously, to make the best prediction.
Race and gender are poor predictors of performance compared with skill. But they might be more easily observed, which might lead a hiring manager to overweight them, the authors say. In this case, statistical discrimination, which recommends using both types of information to get the best prediction, does the opposite, they say.
Research that shows how human decisions are seldom 100% rational has led to increased reliance on robotic algorithms that attempt to suck in as much data as possible. In some cases, these algorithms work wonders at removing emotion and prejudice. At other times, they codify error. In her book "Weapons of Math Destruction" the mathematician Cathy O'Neil gives examples of emotionless algorithms producing discriminatory or useless results.
There are many instances where less data has proved to be better. In the early 1990s, Dr. Gigerenzer compared pairs of large German cities, one with a professional soccer team and one without. The city with the team is bigger 87% of the time. People who rely on this simple trick guess which city is bigger more accurately than people who try to recall reams of complex urban detail. Sports teams, it turns out, are more correlated with population than many things people might know, such as whether a city is a state capital or on a major highway.
(Here's a U.S. version: Tally a city's number of pro football, baseball, basketball and hockey teams. Among pairs of the 50 largest cities, 89% of the time, the city with more teams is bigger. A useful tip if you ever need to guess whether Pittsburgh is larger than Nashville.)
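The team-counting rule above can be sketched in a few lines of code. This is a minimal illustration, not the researchers' actual procedure; the team counts are real (Pittsburgh fields NFL, MLB and NHL teams; Nashville fields NFL and NHL teams), but the data structure is an assumption for the example.

```python
def guess_bigger(city_a, city_b):
    """Heuristic from the article: guess that the city with more
    major-league teams is the bigger city. Returns None on a tie,
    where the heuristic is silent."""
    if city_a["teams"] == city_b["teams"]:
        return None
    return city_a if city_a["teams"] > city_b["teams"] else city_b

pittsburgh = {"name": "Pittsburgh", "teams": 3}  # Steelers, Pirates, Penguins
nashville = {"name": "Nashville", "teams": 2}    # Titans, Predators

guess = guess_bigger(pittsburgh, nashville)
print(guess["name"])  # Pittsburgh
```

Note the rule uses a single, easily observed cue and ignores everything else; per the article, that cue alone wins 89% of pairings among the 50 largest U.S. cities.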
Or take a company seeking to predict which customers are still active -- that is, interested and likely to continue buying from the firm -- and which customers have lost interest in the company's products. Researchers have found that an incredibly simple rule of thumb -- whether someone purchased from the company in the past nine months -- better predicts whether customers are active than cutting-edge complex models.
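The nine-month rule of thumb described above (sometimes called a "hiatus heuristic" in this literature) amounts to a one-line classifier. The cutoff of roughly 270 days and the example dates below are illustrative assumptions.

```python
from datetime import date

def is_active(last_purchase: date, today: date, hiatus_days: int = 270) -> bool:
    """Classify a customer as active iff their last purchase falls
    within the hiatus window (~9 months by default)."""
    return (today - last_purchase).days <= hiatus_days

today = date(2022, 11, 5)
print(is_active(date(2022, 6, 1), today))  # True: bought about 5 months ago
print(is_active(date(2021, 9, 1), today))  # False: well over 9 months ago
```

The appeal is that a single threshold on one variable, chosen sensibly, can rival models fit on a customer's entire purchase history.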
An example in investing is the 1/N rule, which says that if you want to buy, say, 12 stocks, you should give each a 1/12th allocation in your portfolio. A team from the London Business School compared this with 14 models that use vast amounts of data to try to find superior allocations, and found that none of the 14 beat the 1/N rule.
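The 1/N rule is trivial to state in code, which is part of the point: it estimates nothing from historical data, so it has nothing to overfit. The ticker names below are placeholders.

```python
def one_over_n(tickers):
    """1/N rule: give each of the N chosen assets an equal weight."""
    n = len(tickers)
    if n == 0:
        raise ValueError("need at least one asset")
    return {t: 1.0 / n for t in tickers}

weights = one_over_n(["AAA", "BBB", "CCC", "DDD"])
print(weights)  # each of the 4 assets gets weight 0.25
```

The data-hungry alternatives must estimate means and covariances from noisy historical returns; the estimation error they absorb is one reason the naive equal split is so hard to beat.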
Of course, it isn't always easy to recognize which data points to lop off. Earlier in the pandemic, the Massachusetts Institute of Technology dropped its requirement that applicants submit an SAT (or ACT) score. Critics have long argued such tests are poor predictors of academic success and a barrier for people from disadvantaged backgrounds.
But after two years, MIT brought back the SAT this spring. The dean of admissions said MIT had discovered the SAT did a better job of predicting success for "students from these groups relative to other things we can consider. The reason for this is that educational inequality impacts all aspects of a prospective student's preparation and application, not just test-taking." The test, it turns out, was better than having the right extracurriculars, letters of recommendation or advanced classes, which tilted even more heavily in favor of more-advantaged students.
Though it isn't always easy to figure out which data to discard, it is an important area of research as the numbers collected about our lives grow.
"It's something like a win-win-win that we are showing," said Dr. Csaszar. "If you use a simpler decision-making process, you become better at making predictions, and you become more fair, because you don't take into account race or other things that are discriminatory. There's an alignment between simple, fair and accurate." [1]
1. Zumbrun, Josh. "The Numbers: When It Comes to Data, Less Can Be More." The Wall Street Journal, Eastern edition, New York, N.Y., 5 Nov. 2022, p. A2.