Small Data Science

How to extract actionable knowledge from little or no data.

“Big Data” is everywhere. If you don’t use it, you don’t count. At least so it seems online.

But what if you have little or no data? Use Small Data Science! Small data requires a different mode of thinking than big data.

Microconf Europe 2015 offered an opportunity to give a presentation on “Small Data Science”. The talk went from being just an idea to being voted in by the attendees as a guest talk. Here are the slides for “Small Data Science”: small-data-science-mceu15.

My definition of “Small Data Science” is pragmatic:

How to extract actionable knowledge from little or no data?

Let’s examine this definition and see where this leads. Extracting actionable knowledge implies that you already have some background knowledge. Build on this and formulate a hypothesis first.

A hypothesis is a testable assumption.

As an example, look at these two similar hypotheses:

Hypothesis 1: Shopify shops need Adwords help.

Hypothesis 2: Shopify shops use Adwords help.

Hypothesis 1 is difficult to test, and probably needs big data. How do you measure a need? And what is a need for Adwords help? Besides, if Shopify shops need Adwords help, do they actually purchase a service to help them?

On the other hand, hypothesis 2 is much easier to test. Have a look at the Shopify marketplace! You can easily see that multiple companies already offer different Adwords services, and that Shopify shops have already bought these services. Without doing an experiment, but with a good hypothesis and a tiny bit of data (and a web search), your hypothesis is verified.

This example leads to another hint:

Use all background knowledge you can get.

As an application of this rule, Dan Norris wrote in his book “The 7 Day Startup”:

“Solve Problems where People are already paying for solutions”

This rule can help you find a market for your product or service fast. Be aware that this rule helps you find problems to solve. Sometimes the rule is applied in reverse: build a solution that people are already paying for. That’s usually a bad business proposition. The trick is to solve “established” problems, but offer a different solution.

When you have no data, heuristics can be of great help. Heuristics are rules of thumb. Their main advantage is that they mostly work. Their main disadvantage is that they sometimes don’t work. I think for small data, the most useful heuristic is:

Find the most important reason and ignore the rest

This rule asks you to focus on the most important thing your data can tell you. Don’t wrangle the data so hard that you can extract whatever conclusion you like. Small data don’t support complex hypotheses. Heuristics often work because they were developed across many different small data samples in different situations. That means they pool lots of separate data sets into a rule of thumb.
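The heuristic above can be sketched in a few lines of Python. This is only an illustration, loosely modelled on one-reason decision rules: the cue names, their ordering, and the two product ideas are hypothetical and are not taken from the talk. The function walks through the cues from most to least important and decides on the first one that tells the two options apart; everything after that is ignored.

```python
from typing import Dict, List, Optional

# Hypothetical cues, ordered from most to least important.
CUES: List[str] = ["people_already_pay", "competitors_exist", "founder_knows_domain"]


def one_reason_choice(option_a: Dict, option_b: Dict, cues: List[str]) -> Optional[str]:
    """Pick an option using the first cue that discriminates; ignore the rest."""
    for cue in cues:
        a_value = option_a["cues"][cue]
        b_value = option_b["cues"][cue]
        if a_value != b_value:
            # The first cue that differs decides; later cues are never inspected.
            return option_a["name"] if a_value > b_value else option_b["name"]
    return None  # No cue discriminates, so the heuristic gives no answer.


# Two made-up product ideas, for illustration only.
idea_a = {
    "name": "adwords_help",
    "cues": {"people_already_pay": True, "competitors_exist": True, "founder_knows_domain": False},
}
idea_b = {
    "name": "novel_widget",
    "cues": {"people_already_pay": False, "competitors_exist": True, "founder_knows_domain": True},
}

choice = one_reason_choice(idea_a, idea_b, CUES)
print(choice)
```

Note that the second idea wins on the last cue, yet that never matters: the most important cue already decided. That is the point of the heuristic, and also its risk when the cue ordering is wrong.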

Comparing Small and Big data

As a generalization, when you have little data, high uncertainty and many alternatives, make it as simple as possible. If you have a lot of data, low uncertainty and few alternatives, you can build a complex (and hopefully accurate) model.

Small data	Big data
high uncertainty	low uncertainty
many alternatives	few alternatives
←	→
Make it simple	Make it complex
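
The "high uncertainty" entry in the small-data column can be made tangible with a short simulation. The sketch below is an illustration under stated assumptions: the true rate of 20%, the sample sizes of 10 and 1000, and the number of repetitions are all made up. It repeatedly draws a sample and estimates the rate from it, then compares how much the estimates scatter for small and for large samples.

```python
import random
import statistics
from typing import List


def estimated_rates(true_rate: float, sample_size: int, repetitions: int,
                    rng: random.Random) -> List[float]:
    """Estimate a rate from `repetitions` independent samples of `sample_size`."""
    rates = []
    for _ in range(repetitions):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        rates.append(hits / sample_size)
    return rates


rng = random.Random(42)  # Fixed seed so the run is repeatable.

# Assumed values, for illustration only.
small = estimated_rates(true_rate=0.2, sample_size=10, repetitions=1000, rng=rng)
large = estimated_rates(true_rate=0.2, sample_size=1000, repetitions=1000, rng=rng)

spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"n=10:   estimates range from {min(small):.2f} to {max(small):.2f}, spread {spread_small:.3f}")
print(f"n=1000: estimates range from {min(large):.2f} to {max(large):.2f}, spread {spread_large:.3f}")
```

With only ten observations, individual estimates land far from the true rate often enough that a detailed model fitted to one such sample would mostly be fitting noise. That is the case for keeping small-data models simple.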

Small Data Thinking

Your mode of thinking influences what conclusions you reach when reasoning with small data. Here’s a list of mind tricks to help you make better decisions:

Bet against yourself – If you need to estimate a number, try to rephrase the question in terms of a bet for money: Would I bet $ for/against? When you do this, your brain gets into a more analytic mode of thinking, weighing the possible imaginary loss of $. This often results in more accurate estimates.
Use counts to estimate unknowns, not percentages. It seems that most people (except statisticians, especially Bayesians) find it difficult to reason with chances, probabilities and percentages. All these problems can also be reformulated in terms of counts. It seems our brain is naturally more inclined to estimate something like 20 out of 100 people than to say 20%.
Reason backwards and use pattern matching. Basically this says to learn from examples and to build a logical model. This only works reliably if you have sufficient background knowledge.
Beware of the “Law of Small Numbers”. When faced with just a few data points, people often jump to conclusions, assuming that a pattern in the data has a cause. The law of small numbers says that people reason about small samples as if they were big, statistically valid data. Unfortunately that often leads to intuitive, but wrong conclusions.
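
The advice to use counts instead of percentages can be sketched as a small helper that turns a question stated in percentages into one stated in counts. All numbers below are hypothetical and chosen only to keep the arithmetic round: out of 1000 shops, 10% are assumed to use Adwords help, a visible signal (say, a listed Adwords app) is assumed to show up for 80% of those shops and for 10% of the others.

```python
from typing import Dict


def natural_frequencies(population: int, base_rate: float,
                        hit_rate: float, false_alarm_rate: float) -> Dict[str, int]:
    """Convert rates into whole-number counts for an imagined population."""
    with_condition = round(population * base_rate)
    without_condition = population - with_condition
    true_positives = round(with_condition * hit_rate)
    false_positives = round(without_condition * false_alarm_rate)
    return {
        "with_condition": with_condition,
        "without_condition": without_condition,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": true_positives + false_positives,
    }


# Hypothetical rates, for illustration only.
counts = natural_frequencies(population=1000, base_rate=0.10,
                             hit_rate=0.80, false_alarm_rate=0.10)

print(f"{counts['with_condition']} out of 1000 shops use Adwords help.")
print(f"{counts['true_positives']} of those show the signal.")
print(f"{counts['false_positives']} of the other {counts['without_condition']} show it too.")
print(f"So {counts['true_positives']} out of {counts['flagged']} shops with the signal use Adwords help.")
```

Stated as counts, the answer can be read off directly: fewer than half of the shops showing the signal actually use Adwords help, which is easy to miss when the same problem is phrased as 10%, 80% and 10%.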

Here’s the reading list:

“Risk Savvy: How to Make Good Decisions” by Gerd Gigerenzer
“How to Measure Anything: Finding the Value of Intangibles in Business” by D.W. Hubbard
“Priceless: The Myth of Fair Value (and How to Take Advantage of it)” by W. Poundstone
“The 7 Day Startup: You Don’t Learn Until you Launch” by Dan Norris