
# How to Lie with Statistics – Misleading Ways to Use Numbers


Why Trust Tech Report

Tech Report is one of the oldest hardware, news, and tech review sites on the internet. We write helpful technology guides, unbiased product reviews, and report on the latest tech and crypto news. We maintain editorial independence and consider content quality and factual accuracy to be non-negotiable.

The internet makes it easier than ever to find useful information. Unfortunately, it’s also easy to find half-truths and lies. Learning about how to lie with statistics will show you the most common methods to misrepresent data. Next time you see it, you’ll know what to look for.

As the saying goes, ‘A half-truth is the best lie.’ As you’ll see, the top methods for how to lie with statistics typically rely on factual data, which makes it all the more difficult to spot misinformation.

So, let’s see how to lie with statistics, and let’s dissect some real-life examples. This way, you’ll be better prepared to separate fact from fiction.


## How to Lie with Statistics – Top Ways to Misrepresent Data

There are multiple techniques for how to lie with statistics. The numbers don’t have to be fabricated, either. As you’ll see, we can use factual data but still twist the truth either willingly or by mistake.

We can do this in two ways – either by misrepresenting the data itself or by using flawed reasoning to interpret findings. First, let’s see how to lie with statistics by misrepresenting the numbers.

### Misusing ‘Averages’

In everyday speech, we often say ‘average’ to denote a typical value representative of a bigger data set. But in a mathematical sense, ‘average’ could refer to several different measures, primarily the mean or the median.

Each of these ‘averages’ could represent different numbers, so the distinction is important.

Here’s how each is calculated:

1️⃣ The mean adds all the values in a set and divides the total sum by the count of values.

For example, the mean of {2, 30, 40, 300} is (2 + 30 + 40 + 300) / 4 = 93.

2️⃣ The median arranges the values in a set from smallest to largest and takes the middle value based on the total number of values.

In a set of 5 values, the median would be the third value in the middle. If the set has an even number of values, we must calculate the mean value of the two middle numbers. For example, the median of {2, 30, 40, 300} is (30 + 40) / 2, which equals 35.
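Both calculations above can be sketched in a few lines of Python using the standard library’s `statistics` module:

```python
from statistics import mean, median

values = [2, 30, 40, 300]

# Mean: the sum of all values divided by their count.
print(mean(values))    # (2 + 30 + 40 + 300) / 4 = 93

# Median: the middle value of the sorted set; with an even count,
# the mean of the two middle values.
print(median(values))  # (30 + 40) / 2 = 35
```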

The mean and median averages can vary considerably depending on the range of values in the total data set. Opting to use one over the other can skew the readers’ perception, even if the numbers being used are factually correct.

Using the mean as an ‘average’ could be especially misleading if the data set contains major outliers.

This often happens when reporting average salary figures, for example. Including the salaries of the top earners can skew the mean value considerably, making it seem like the ‘average’ salary is higher than it actually is.

Say you have a group of 10 people. Two people earn \$20k/year, six people earn \$70k/year, and two people earn \$300k/year.

Calculating the mean value would give us an ‘average’ salary of \$106k/year. But looking at the actual salary distribution in the data set, you wouldn’t say the average person in the group earns six figures a year when the majority earn \$70k/year.

That’s why using the median is preferable when dealing with outliers.
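To see how much the choice of ‘average’ matters, here’s the salary example from above worked out in Python, using the same numbers as the text:

```python
from statistics import mean, median

# Salaries in $k/year: two people at 20, six at 70, two at 300.
salaries = [20, 20, 70, 70, 70, 70, 70, 70, 300, 300]

print(mean(salaries))    # 106 -> the 'average' is skewed by the two top earners
print(median(salaries))  # 70  -> matches what most of the group actually earns
```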

#### Real Example

PolitiFact shows in a 2024 article how ‘average’ salary figures can be misleading. In an April 8 speech, President Joe Biden stated that the average salary in the semiconductor field is \$110,000.

There’s some truth here; the ‘average’ (mean) salary in the semiconductor industry is over \$100k/year. However, this figure includes all the jobs within the industry, including those high-paid ones that do require a degree.

According to PolitiFact, people without a college degree in the semiconductor industry earn around \$40k/year. Those with an associate degree earn up to \$70k, while those with graduate degrees make up to \$160k.

The highest salaries skew the mean average higher, so the numbers don’t tell the whole story. In this case, the industry’s ‘average’ salary doesn’t actually apply to the average worker.

### Conflating Percentages

Conflating percentage changes with percentage point changes is another subtle way to paint different pictures using the same data.

Here’s how they differ:

1️⃣ The percentage change measures the relative change between two values and expresses the result as a percentage of the original value.

For example, if the share of unemployed people in a given city went from 4% to 8%, that would represent a doubling of the original value – a +100% percentage change. If the share went from 4% to 2%, that’s a halving of the original value, so a -50% percentage change.

2️⃣ The percentage point change measures the absolute difference between two percentages.

Using the percentage point change in the previous scenario, if the share of unemployed workers went from 4% to 8%, that would be an increase of 4 percentage points. If the share halved from 4% to 2%, that’s a change of -2 percentage points.

It would be equally correct to report the example above as a +100% change or a +4 percentage point change. But each number could have a different impact on readers, with bigger numbers being more attention-grabbing.
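Here’s a quick sketch of both calculations; the function names are just illustrative:

```python
def percentage_change(old, new):
    """Relative change, expressed as a percentage of the original value."""
    return (new - old) / old * 100

def percentage_point_change(old, new):
    """Absolute difference between two percentages."""
    return new - old

# Unemployment going from 4% to 8%:
print(percentage_change(4, 8))        # 100.0 -> the share doubled
print(percentage_point_change(4, 8))  # 4     -> four percentage points
```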

Using the relative percentage change could overstate the magnitude of a given change, especially when the absolute change is otherwise small.

If we don’t know the absolute percentage point difference, we’re left wondering about the change’s true impact.

#### Real Example

Using the percentage change is common in medical reporting and is not necessarily meant to mislead the average reader.

The relative value change provides a standardized way to compare and communicate the effects of interventions across different studies or populations.

However, you can imagine how a headline reporting only the relative change could make the findings seem much worse to the layperson if we don’t know the actual baseline risk and the absolute percentage point change in risk.

Let’s say the absolute risk of Type 2 diabetes in the general population is 3%. A 50% relative risk increase would then bump the risk of developing diabetes to 4.5%, an increase of just 1.5 percentage points. One figure sounds far more worrisome than the other.
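Working through those hypothetical diabetes numbers:

```python
# Hypothetical figures from the text: a 3% baseline risk in the
# general population and a reported 50% relative risk increase.
baseline_risk = 3.0       # absolute risk, %
relative_increase = 50.0  # relative increase, %

new_risk = baseline_risk * (1 + relative_increase / 100)
print(new_risk)                  # 4.5 -> new absolute risk, %
print(new_risk - baseline_risk)  # 1.5 -> increase in percentage points
```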

In other cases, using the percentage change can mislead readers and sway public opinion on a given topic.

A 52% increase in the murder rate sounds disastrous, but what would this actually mean for the average Tallahassee resident?

According to PolitiFact, this 52% increase compares the murder rate in 2002–2009, before Andrew Gillum became mayor, with that of 2010–2017, throughout his tenure. In 2002–2009, the murder rate in Tallahassee was 4.6 murders per 100,000 citizens.

From 2010 to 2017, the murder rate increased to 7 murders per 100,000 citizens. This translates to a percentage change of 52%.

However, this doesn’t mean the absolute murder rate in Tallahassee is now anywhere near 50%. At 7 per 100,000, the murder rate would still be under 0.1 per 1,000 citizens.

### Using Misleading Graphs

Numbers can tell us a lot about a topic, but pictures help us better visualize the data. Graphs also help us communicate information in a quick and easy-to-read format.

You don’t even have to study the numbers because the visuals show you the information at a glance. Except the format of a graph may not always be so straightforward; sometimes, graphs can give you the wrong idea until you look closer.

There are two main ways to create a misleading graph, even while using the correct data:

#### Using Unlabeled Graphs

Unlabeled graphs omit critical information needed to accurately interpret the data. Without all the context surrounding the visuals, we can’t actually understand the meaning or relevance of the information. Consider this example:

Here, we have a graph that tells us basically nothing except that the average temperature is going up. Is this increase something to worry about?

We can’t tell, because we don’t know the timeframe or real magnitude of the change, so we can’t draw an informed conclusion from this picture.

However, the information used to plot this graph is factual. We used the average high temperature in Washington, D.C., from January to June to create this graph.

The vertical axis represents the temperature change, and the horizontal axis represents each of the six months. With this information, we now know this chart shows us a seasonal fluctuation in temperature.

#### Using Graphs Not Starting at 0

Graphs measure value changes on a scale, typically starting from 0 as a reference point.

When graphs don’t start at 0, this can distort the proportions of bars and lines and exaggerate the visual change between two values. This distortion can happen even when the absolute change is very small.

Starting the graph at a different reference point can make even minute differences look dramatic.

Bonus points if the graphs use fancy visuals, which can sometimes lead to unintentionally hilarious results like this:

This graph starts at 5’0’’ as a reference point, which distorts the absolute difference between the heights on the chart. As a result, a roughly 5-inch difference makes Indian women appear dramatically shorter than Latvian women.

Here’s how the chart would look if it started from 0:

As you can see, the visual difference between the bars is much smaller now. Note that the vertical axis represents the same heights as before, but in centimeters.
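The distortion is easy to quantify. Using hypothetical heights of 5’0’’ (60 inches) and 5’5’’ (65 inches), here’s how the apparent bar-height ratio changes with the axis baseline:

```python
def bar_ratio(short, tall, axis_start):
    """How many times taller the 'tall' bar looks than the 'short' bar
    on a chart whose vertical axis starts at axis_start."""
    return (tall - axis_start) / (short - axis_start)

short, tall = 60, 65  # hypothetical average heights, in inches

print(round(bar_ratio(short, tall, 0), 2))  # 1.08 -> honest proportions
print(bar_ratio(short, tall, 55))           # 2.0  -> the gap looks twice as big
```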

#### Real Example

Below, you can see an example of a graph that doesn’t start at zero. To make things worse, the chart’s y-axis is also unlabeled, so we can’t see at first glance where the data range starts or stops.

This chart shows us the difference in the number of people with a job and people on welfare. The problem is that the graph makes a 6.8% difference look like a 500% difference because of the distorted scale.

### Cherry Picking Information

‘Cherry picking’ refers to the act of using only the information that supports one’s point, leaving out the bigger context and contradictory data.

While the information presented might be true by itself, we can still arrive at erroneous conclusions if we don’t know the bigger picture.

Cherry-picking can occur at any point when working with data, including when collecting, analyzing, and referencing data to draw conclusions.

#### How It Works

Let’s say we wanted to show that protein powder improves heart health, so we compared two groups of people – those who do and those who don’t consume protein powder.

We chose to only look at those who consume protein powder while also going to the gym. This introduces sampling bias, because gym-goers typically partake in other heart-healthy habits, like regular exercise and a balanced diet.

Our analysis might show that people who use protein powder have better heart health. But we can’t tell if the results are due to the protein or other differences between gym goers and the broader population.

Another way to support our premise is to select only the studies that show protein powder is heart-healthy.

For example, we could have 10 studies that show protein powder has no effects on heart health and 2 studies that say protein powder improves heart health. We then reference only the latter two studies, even though the majority of the data points in a different direction.

#### Real Example

One of the best examples of data cherry-picking is the infamous chocolate hoax study, which found that eating chocolate helps with weight loss.

However, according to study lead author John Bohannon, the study design was intentionally shoddy, and the research team selectively picked positive findings to support the benefits of chocolate.

As explained by the author, if you use a small sample of people and measure a large number of metrics, it’s almost guaranteed to find a noteworthy result by chance.
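This ‘measure enough things and something will stick’ effect is easy to quantify. Assuming each metric independently has a 5% chance of producing a false positive, the odds that at least one of k metrics comes up ‘significant’ by luck alone grow quickly:

```python
alpha = 0.05  # false-positive chance per metric (a common significance threshold)

# Probability that at least one of k independent metrics
# turns up 'significant' purely by chance:
for k in (1, 5, 20):
    p_any = 1 - (1 - alpha) ** k
    print(k, round(p_any, 2))  # 1 -> 0.05, 5 -> 0.23, 20 -> 0.64
```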

The same concept applies to any other topic where zooming in or out on a data set alters the findings. This often happens in discussions about climate change, for example.

Using the same data but focusing only on a limited timespan can give the illusion of a stable trend. While a statement might be true, it could still misrepresent the wider picture.

### Using A ‘Semi-attached Figure’

A ‘semi-attached figure’ is an unrelated piece of information cited to support a claim. Two topics might seem connected at first glance, but closer inspection shows the cited data doesn’t prove the point.

> ‘If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing.’
>
> – Darrell Huff, author of ‘How to Lie with Statistics’

Semi-attached figures can be difficult to spot because they make intuitive sense. As a general rule, when someone uses data from a study on one topic to try to prove a different point, that’s a semi-attached figure.

#### What It Looks Like

Imagine a random commercial promoting a biotin-infused shampoo. The ad starts off by mentioning the importance of biotin, truthfully claiming that biotin is an essential vitamin that plays a role in hair health.

Then, the ad claims their new shampoo formula contains biotin. Based on this information, we might be tempted to infer this shampoo helps us achieve healthy-looking hair. Both statements in the ad are true, but our conclusion might be incorrect.

While biotin is an essential nutrient for hair health, the information cited doesn’t prove that topical application of biotin-infused shampoo makes hair healthier. Maybe the shampoo works; maybe it doesn’t. Based on the data provided, we can’t support either conclusion.

#### Real Example

In 2009, an ad for Kellogg’s Frosted Mini-Wheats cereal claimed the product could boost children’s attentiveness by nearly 20%. The claim was based on a study commissioned by Kellogg, which showed that children who ate the cereal improved their attentiveness.

The problem is the data didn’t actually support the claim. The cited study didn’t compare Kellogg’s cereal to other cereal or other breakfast foods. In fact, the control group in the study received no breakfast at all.

After being sued and reaching a settlement, Kellogg agreed to drop the 20% claim. However, the company could still claim that kids who eat a filling breakfast (like Mini-Wheats) are more attentive than kids who skip breakfast.

A similar case occurred with Coca-Cola’s VitaminWater advertising a few years back when the company made product health claims based on the vitamin content in the beverage.

## Logical Fallacies When Using Statistics

Logical fallacies are common reasoning errors used when building arguments. The data we use could be factual, and the resulting argument might sound convincing, but the conclusion is logically flawed.

Logical fallacies can be innocent mistakes but equally insidious as intentionally misrepresenting data. Let’s see some of the most common logical fallacies and how to lie with statistics using them.

### Correlational Fallacies

A correlational fallacy means inferring a cause-and-effect relationship between two elements simply because they appear to be associated.

For example, we might notice that people who own ashtrays are more likely to get lung cancer. In this scenario, ashtray ownership and lung cancer rates are correlated, but it would be factually wrong to claim one causes the other.

In this case, smoking is the third factor that influences both ashtray ownership and increased rates of lung cancer.

Reverse causation is another type of correlational fallacy where two events are correlated, but the cause and effect are reversed. The observed effect is assumed to be the cause and vice versa.

For example, there’s a correlation between painkiller use and chronic headaches because people with chronic headaches take painkillers to relieve the pain. Saying that painkillers cause headaches would be a reverse causation fallacy.

#### The Post Hoc Fallacy

The post hoc fallacy is closely related to correlational fallacies. It draws a cause-and-effect conclusion from two events simply because one occurs after the other.

It sounds something like this: event A happened, then event B happened; therefore, A caused B. Here’s how it would look in practice…

Sugar consumption in the US started going down in 2000. A few years later, the obesity rate went up. Therefore, eating less sugar made Americans put on weight. And we have a fancy-looking chart to show you the correlation.

Maybe we should all start eating more sugar to prevent weight gain. That’s obviously not the case. This just goes to show that two things may not be causally related, even if they happen in close succession.

#### Real Example

We could easily find correlations to support any claim, including the opposite of what we know to be true. For example, the smoker’s paradox or the obesity paradox are both based on correlations between smoking or overweight status and better health outcomes.

Multiple studies have also found a correlation between lower weight and increased mortality, particularly in heart failure patients.

These findings are sometimes interpreted as ‘people who are overweight are less likely to die than normal-weight people’ or even ‘overweight people live longer.’

But the correlation between higher BMI and better survival doesn’t prove that being overweight causes better outcomes. In fact, this might be a case of reverse causation, where lower weight is seen as a cause rather than a consequence of poorer health.

### The Base Rate Fallacy

The base rate fallacy is a reasoning error where someone focuses on isolated information (e.g., an event occurring in a specific case) and ignores the general picture (the base rate at which the event occurs at large).

For example, say you flip a coin 10 times, and each toss has a 50/50 baseline chance of landing on heads or tails.

Your coin lands on heads 7 times in a row, so you might conclude the next 3 flips are also likely to be heads. But the base rate hasn’t changed: each new flip still has a 50% chance of landing on tails.
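The independence of each flip can be checked with a line of arithmetic: the chance of extending a 7-heads streak to 8 heads is just the probability of one more head.

```python
p = 0.5  # fair coin

p_streak_7 = p ** 7  # probability of 7 heads in a row
p_streak_8 = p ** 8  # probability of 8 heads in a row

# Conditional probability of the 8th head, given the 7-head streak:
print(p_streak_8 / p_streak_7)  # 0.5 -> the streak changes nothing
```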

#### Real Example

Statistics have been used to show that the vaccinated population in the UK suffered more deaths compared to the unvaccinated population. This is factually true, but the assertion that the vaccine is ineffective or dangerous would be incorrect.

According to Reuters, the vast majority of people in the UK had taken the vaccine, so vaccinated people far outnumber unvaccinated ones. The statistic therefore ignores the base rate of vaccinated people in the total population.

Since the vaccinated population is much larger to begin with, more deaths are expected in this group in absolute terms. Adjusting for group size gives a fairer comparison.

When comparing the mortality rate per 100,000 people, the mortality rate in the unvaccinated population was higher.

## What Information Should You Trust?

It’s always best to go straight to the source and check the facts yourself instead of taking somebody’s claims at face value. But this is easier said than done.

All studies have limitations that can affect results; credible sources acknowledge these shortcomings.

Although the scientific method helps us make sense of the world around us, our conclusions are only as accurate as the data they’re based on. Someone can correctly use a source that supports their claim, but the facts can still be wrong.

Data is simply inaccurate sometimes. That’s why we see contradictory information on the same topic and why you could find sources to support any idea.

But there are ways to better understand it all. The evidence hierarchy helps us prioritize information and understand where different findings fit into the larger context.

As a general rule, information from studies higher up on the evidence hierarchy is more reliable. This factsheet from FACSIAR explains how the different study types compare.

The studies at the base of the pyramid provide weaker evidence and carry more limitations than the studies at the top. This doesn’t mean they’re guaranteed to be wrong, but it’s more likely.

### Comparing Levels of Evidence

Case studies are an example of ‘weaker’ evidence. They’re low on the evidence pyramid because they are more likely to be influenced by unintentional bias.

Such studies typically involve smaller groups of participants and rely on methods like interviews or observations, which can have limitations that can impact study findings.

Smaller groups may not adequately represent the broader population, so the findings could be skewed by chance. Data collection methods like surveys can also lead to inaccurate findings because of participants’ subjective interpretation of survey questions.

Small study findings don’t always pan out when repeated on larger populations. However, they provide insight into potential trends and play a crucial role in generating new hypotheses for future research.

Meta-analyses are at the other end and provide the strongest level of evidence, so their findings are generally more reliable.

A meta-analysis pools together multiple studies that address the same question, which boosts the overall sample size and lowers the risk of biased results.

Each meta-analysis can encompass hundreds or even thousands of studies, offering a thorough summary of the prevailing findings on a research topic.

Thanks to this, they give us a quick overview of the scientific consensus and help us avoid confirmation bias when faced with contradictory information.

But even these comprehensive analyses aren’t perfect. The quality of a meta-analysis depends on the quality of the studies in it. A collection of predominantly low-quality data will give us misleading results, a phenomenon often referred to as ‘garbage in, garbage out.’

The key takeaway is that no research method is perfect, and science can get it wrong sometimes. But those truly seeking knowledge will change their opinions when mounting evidence proves them wrong, rather than clinging to incorrect beliefs for personal comfort.

## In Conclusion

Understanding how data can be manipulated helps us discern truth from fiction. The most common methods include misleading average figures, conflating percentage changes, and using distorted graphs or cherry-picked information.

But it’s also important to look beyond surface-level statistics and consider the context and methodology behind the data. Even seemingly credible numbers and studies can lie, so it’s best to do our due diligence instead of taking ‘facts’ at face value.

Ultimately, critical thinking and a skeptical approach to statistics will help us navigate the sea of information and avoid being deceived by interest groups or false advertising.



#### Diana Ploscaru Statistics & Tech Content Contributor

Diana is a seasoned writer with over four years of freelancing experience. Using her keen interest in statistics and data analysis, she specializes in crafting informative and practical content across various interesting topics.

She's also passionate about workflow optimization, constantly researching and trying out the newest tools and project management software. Because it's always exciting to find new ways to streamline daily tasks!

In her free time, she enjoys studying foreign languages and going for hour-long walks to reach her daily step goal.
