Welcome to the Age of Better Data

Research by Tuck Associate Dean Praveen Kopalle finds the retail revolution isn’t just about big data, but also about better data and the theory needed to harness it.

For years now, the popular press has been heralding the coming age of data supremacy in retail. The reports are often framed in Orwellian language and invariably focus on size. It is, after all, the big data revolution. 

But new research by Tuck Associate Dean Praveen Kopalle posits that the real revolution depends more on the quality of data than on its quantity, and argues that as data becomes ever more voluminous and varied, retailers must rely on theory to fully harness its massive potential. In “The Role of Big Data and Predictive Analytics in Retailing,” forthcoming in The Journal of Retailing, Kopalle and his co-authors describe five major data dimensions that, used together, can give retailers a remarkably deep and nuanced understanding of customer behavior.

“What changes the game is not the individual components, but the interplay between time, location, channel, product, and customer,” Kopalle says. “That is the game-changer.”
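To make that interplay concrete, here is a minimal sketch in Python (the record layout and field names are illustrative, not drawn from the paper) of a transaction record that carries all five dimensions, so any purchase can be sliced along time, location, channel, product, customer, or any combination of them.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Transaction:
    """One purchase event, described along all five data dimensions."""
    timestamp: datetime    # time: when the purchase happened
    store_location: str    # location: where it happened
    channel: str           # channel: in-store, web, mobile app, etc.
    product_sku: str       # product: what was bought
    customer_id: str       # customer: who bought it
    price_paid: float
    quantity: int

# Because every record carries all five dimensions, cross-dimensional
# questions become simple filters, e.g. what a given customer buys on
# mobile at lunchtime near a given store.
example = Transaction(
    timestamp=datetime(2018, 6, 1, 12, 15),
    store_location="Central London",
    channel="mobile",
    product_sku="SKU-0042",
    customer_id="C-1001",
    price_paid=3.25,
    quantity=1,
)
```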


The same factors that make big data so powerful—its sheer scale and diversity—also make it difficult to harness. That is where theory comes into play. As Kopalle puts it, “The substantive theories in retailing tell us what to look for in the data and the statistical theories tell us where and how to look in the data.”

One of the statistical theories goes back more than 200 years to an English minister named Thomas Bayes who, with quill pens in leather-bound ledgers, worked out the probability theorem that underpins much of today’s data-driven retail world. With Bayesian analysis and ample processing power, managers can turn mountains of raw data into more effective and profitable sales strategies.
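As a minimal sketch of that machinery (the prior and the numbers are illustrative, not from the paper), the snippet below applies Bayes’ reasoning in its simplest retail form: a beta-binomial update of a retailer’s belief about how likely a customer is to redeem an offer.

```python
def update_belief(alpha: float, beta: float, redeemed: int, ignored: int):
    """Beta-binomial update: add redemptions to alpha, non-redemptions to beta."""
    return alpha + redeemed, beta + ignored

# Prior belief: roughly a 10% redemption rate, held weakly.
alpha, beta = 1.0, 9.0

# New evidence: 3 of the last 20 offers sent to this customer were redeemed.
alpha, beta = update_belief(alpha, beta, redeemed=3, ignored=17)

# The posterior mean is the updated estimate of the redemption probability.
print(f"Estimated redemption probability: {alpha / (alpha + beta):.2f}")  # ~0.13
```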

Targeted Marketing

Working across data dimensions, retailers can build predictive models that target customers with uncanny precision. “Now retailers are able to uncover price elasticity at the individual level, so the coupon that I get will be different than the coupon that you get,” Kopalle says.
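A minimal sketch of what an individual-level elasticity estimate can look like (the purchase history and variable names are invented for illustration; this is not the authors’ model): regress log quantity on log price for one customer, and the slope is that customer’s price elasticity.

```python
import numpy as np

# Illustrative purchase history for a single customer:
# prices faced on past trips and the quantities bought.
prices = np.array([2.00, 2.50, 3.00, 3.50, 4.00])
quantities = np.array([10, 8, 7, 5, 4])

# Log-log demand: log(q) = intercept + elasticity * log(p).
elasticity, intercept = np.polyfit(np.log(prices), np.log(quantities), 1)

print(f"Estimated price elasticity: {elasticity:.2f}")  # about -1.3
# An elasticity near -1.3 says a 1% price increase cuts this customer's
# purchases by roughly 1.3%, which helps decide how deep a personalized
# coupon needs to be to change their behavior.
```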

Kopalle describes an offer sent to the mobile phones of Starbucks customers in Central London. The message (“It’s lunchtime!”) came with a discount code and directions to the nearest Starbucks. The retailer leveraged all five data dimensions to tailor that offer to a very specific subset of peckish Londoners. Such offers will only become more common as retailers collect more data about their customers’ preferences and learn to make better use of it.

This predictive capability can be a double-edged sword. In 2012, The New York Times published a story describing how the retailer Target was able to determine which customers were pregnant based only on their shopping habits. The initiative backfired when Target sent pregnancy-related offers to a teenager whose family did not know she was pregnant. Kopalle and his co-authors, Eric T. Bradlow of the Wharton School and Manish Gangwar and Sudhir Voleti, both of the Indian School of Business, include a discussion of ethics in the paper and suggest that customers should have to opt in to data collection programs.

Randomizing Retail

Though the power and pervasiveness of big data are not in dispute, rigorous scientific studies on the subject are quite rare. One reason is that retail is a poor laboratory. Retail data typically isn’t random; it reflects previous pricing and marketing decisions.

“Retailers are not randomly setting prices,” Kopalle says. “They are not foolish. They are using their experience and intuition to set profitable prices. If the prices are not set randomly and you’re using that data to estimate consumer behavior, you may not get the right result.”
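A small simulation makes the point (a hedged sketch with made-up parameters, not the study’s analysis): when prices move with anticipated demand, a naive log-log regression gets the elasticity badly wrong, even with the wrong sign; when prices are randomized, it recovers something close to the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_elasticity = -2.0

# Demand shocks the retailer can partly anticipate (holidays, weather, ...).
demand_shock = rng.normal(0.0, 0.3, n)

def quantities_sold(prices):
    """Log quantity sold, driven by price, demand shocks, and noise."""
    return 3.0 + true_elasticity * np.log(prices) + demand_shock + rng.normal(0.0, 0.1, n)

def naive_elasticity(prices):
    """Slope of a simple log-log regression of quantity on price."""
    slope, _ = np.polyfit(np.log(prices), quantities_sold(prices), 1)
    return slope

# Case 1: the manager raises prices when demand is expected to be high.
endogenous_prices = 3.0 * np.exp(0.4 * demand_shock)
print("Endogenous prices:", round(naive_elasticity(endogenous_prices), 2))  # ~ +0.5, wrong sign

# Case 2: prices are assigned at random, as in a field experiment.
random_prices = 3.0 * np.exp(rng.normal(0.0, 0.12, n))
print("Randomized prices:", round(naive_elasticity(random_prices), 2))      # ~ -2.0, near the truth
```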

To control for just such factors, Kopalle and his colleagues conducted a randomized field test in partnership with a large national chain in the United States. The 13-week experiment included 42 stores divided randomly into test and control groups. The researchers used a data-based theoretical model to optimize prices for 788 individual products in 14 categories from vitamins to soup, with the goal of maximizing total profitability.

The test stores were significantly more profitable, showing a 40.7-cent gross margin dollar improvement per SKU (stock-keeping unit). Extrapolated to the enterprise level—10,000 SKUs per store in 100 stores—that adds up to real money. “We estimate the total margin improvement per year is $7.8 million,” Kopalle says.
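As a minimal sketch of how such a lift is computed (the store split, margins, and numbers below are simulated stand-ins, not the study’s data): average the per-SKU gross margin dollars in the test and control stores over the trial period and take the difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n_skus = 788

# Simulated gross margin dollars per SKU over the trial, one row per store.
# The experiment split 42 stores between test and control; an even 21/21
# split is assumed here purely for illustration.
control_margin = rng.normal(5.00, 1.0, size=(21, n_skus))
test_margin = rng.normal(5.40, 1.0, size=(21, n_skus))  # stores with optimized prices

lift_per_sku = test_margin.mean() - control_margin.mean()
print(f"Gross margin improvement per SKU: ${lift_per_sku:.2f}")

# Scaling a per-SKU lift to the chain is then a multiplication:
# lift per SKU x SKUs per store x number of stores, annualized to
# whatever reporting period the retailer uses.
```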

Welcome to the age of better data.

“The planets are well aligned to make much better decisions in retailing,” says Kopalle, “leading to higher profitability and at the same time giving consumers what they prefer.”