Statistical inference

What is statistical inference?

Statistical inference is a way to use information from a small group of things to learn about a bigger group. It helps us make smart guesses about what we haven’t seen yet, using what we have seen. For example, if we study a few plants and find out how tall they grow, we can use that to guess how tall plants of the same kind usually grow.

How is this different from just describing what we see?

Describing what we see is called descriptive statistics. It only talks about the data right in front of us. Statistical inference goes further: it tries to understand the whole group, including the parts we did not see.

Using "inference" in machine learning

In machine learning, people sometimes use the word "inference" to mean making a prediction with a model that has already been trained. Building the model from data, which statisticians call inference, is called "training" or "learning" there. This is a different use of the word, but it still means using what we know to figure out something new.

Introduction

Statistical inference helps us learn about a large group by looking at just a small part of it. We use samples of data to make guesses about the whole group. First, we choose a model to explain how the data might look. Then we use that model to make guesses.

Some common ways to share what we learn include:

A single best guess called a point estimate
A range of likely values, like a confidence interval
A credible interval, a range that holds most (for example, 95%) of our belief about where the true value lies
Saying a guess is wrong by rejecting a hypothesis
Grouping data points together or sorting them into categories
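
The first two items in the list can be sketched in a few lines of Python. This is only an illustration: the plant heights are made-up numbers, the function name is hypothetical, and the interval relies on a Normal approximation, which is an assumption that works best with larger samples (for a sample this small, a t-based interval would usually be a little wider).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) for the average of the whole group.

    Assumption: the sample average is roughly Normal (large-sample approximation).
    """
    n = len(sample)
    estimate = mean(sample)                    # single best guess (point estimate)
    standard_error = stdev(sample) / sqrt(n)   # typical error of that guess
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Hypothetical plant heights in centimetres (made-up data).
heights_cm = [41.2, 39.8, 44.1, 40.5, 42.7, 38.9, 43.3, 41.8, 40.1, 42.4]

estimate, low, high = mean_with_confidence_interval(heights_cm)
print(f"point estimate: {estimate:.2f} cm")
print(f"95% confidence interval: {low:.2f} to {high:.2f} cm")
```

Asking for more confidence (for example 0.99 instead of 0.95) makes the range wider, because a wider range is needed to be more sure of catching the true value.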

Models and assumptions

Statistical inference uses basic ideas or "assumptions" to work. A statistical model is a set of ideas about how data is created and what we can learn from it. We often start with simple descriptions, called descriptive statistics, before making deeper guesses.

Statisticians talk about three levels of these ideas or models:

Fully parametric: We assume that the data come from a family of probability distributions described by just a few unknown numbers, called parameters. For example, we might assume that values in a group follow a Normal distribution with an unknown average and spread, and that the data were collected by simple random sampling. Generalized linear models are a widely used and flexible family of parametric models.
Non-parametric: We make fewer assumptions about how the data are created. For example, every continuous distribution has a median (middle value), which we can estimate with the median of our sample or with the Hodges–Lehmann–Sen estimator. This estimator works well when the data come from simple random sampling.
Semi-parametric: These assumptions sit between fully parametric and non-parametric ones. For example, we might assume only that a group's average is finite. We might also assume that the average response follows a straight line in some other value (a parametric assumption) but assume nothing specific about how spread out the responses are. The well-known Cox model uses semi-parametric assumptions.
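
As a rough illustration of the non-parametric level, the sketch below computes the one-sample Hodges–Lehmann estimate, which is the median of all pairwise averages of the data (each value is also paired with itself). The data and the function name are made up for this example. Strictly speaking, this estimate targets the median only when the pattern is symmetric, so treat that as an assumption.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann estimate: median of all pairwise averages."""
    pairwise_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(sample, 2)
    ]
    return median(pairwise_averages)


# Made-up measurements; the last value is an outlier.
data = [2.1, 2.4, 2.2, 2.6, 2.3, 9.0]

print("sample average:", sum(data) / len(data))
print("sample median:", median(data))
print("Hodges-Lehmann estimate:", hodges_lehmann(data))
```

The one unusual value pulls the average up a lot, while the median and the Hodges–Lehmann estimate stay close to the bulk of the data. No Normal pattern was assumed anywhere.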

No matter which level of guess we use, our conclusions are only good if our guesses match how the data was really created.

When a sample is large, the averages we compute from it tend to follow a Normal pattern, even if the data themselves do not. This rule is called the central limit theorem. It lets us use Normal-based methods when we have lots of data points.
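
A small simulation can make the central limit theorem easier to believe. The sketch below draws many samples from a lopsided (exponential) pattern, takes the average of each sample, and checks how many averages land inside the band that a Normal pattern would predict. The sample size, the number of repeats, and the choice of the exponential pattern are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_sample_means(sample_size, repetitions, seed=0):
    """Draw many samples from a skewed (exponential) pattern; return their averages."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    ]


means = simulate_sample_means(sample_size=50, repetitions=2000)

# For an exponential pattern with rate 1, the true average and spread are both 1,
# so the average of 50 values should have spread 1 / sqrt(50).
expected_spread = 1.0 / sqrt(50)
z = NormalDist().inv_cdf(0.975)
share_inside = sum(abs(m - 1.0) <= z * expected_spread for m in means) / len(means)

print(f"average of the sample averages: {sum(means) / len(means):.3f}")
print(f"share inside the Normal 95% band: {share_inside:.3f}")
```

If the averages really do look Normal, the share inside the band should be close to 0.95, even though single exponential values look nothing like a Normal pattern.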

Randomization-based models

When data come from a planned random design, we can study how a statistic (a number computed from the data) would change under every possible outcome of that random plan. This lets us draw conclusions without needing extra assumptions. It works well in surveys and experiments. In Bayesian inference, randomization helps make sure samples match the group they come from.
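
One common way to put this idea into practice is a permutation (randomization) test. The sketch below is a minimal version under stated assumptions: units were assigned to two groups at random, the statistic is the difference in group averages, and random shuffles stand in for the full list of possible assignments. The measurements and the function name are made up.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Return (observed difference in averages, two-sided p-value)."""
    rng = random.Random(seed)

    def average(values):
        return sum(values) / len(values)

    observed = average(group_a) - average(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # one random re-assignment of units to groups
        diff = average(pooled[:n_a]) - average(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_shuffles + 1)
    return observed, p_value


# Made-up measurements from a randomized experiment.
treated = [24.1, 25.3, 26.0, 24.8, 25.9, 26.4]
control = [22.0, 23.1, 21.8, 22.7, 23.5, 22.4]

observed, p_value = permutation_test(treated, control)
print(f"observed difference: {observed:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

The only thing the test relies on is the random assignment itself. No pattern such as the Normal is assumed for the measurements.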

Randomization lets us make clear and fair rules for analysis. Many experts like using randomization when it is possible. But sometimes, planned experiments cost too much without giving better results. A good observational study can sometimes be better than a badly run randomized experiment.

Analyzing planned experiments often uses statistical models, but we need to know the randomization plan to pick the right model. Ignoring the plan can give wrong results.

Model-free methods give another way to study data from planned experiments. These methods change and learn from the data as they go. For example, in simple straight-line models, we can study how one thing changes with another, using either random or fixed designs, and still get good results under some conditions.

Paradigms for inference

Several schools of statistical inference have become established. These schools, or "paradigms," do not exclude one another. Methods that work well in one paradigm often make sense in the others too.

Bandyopadhyay and Forster describe four main paradigms: the classical (or frequentist) paradigm, the Bayesian paradigm, the likelihoodist paradigm, and the AIC-based (Akaike information criterion) paradigm.

Frequentist inference

This paradigm judges how believable a claim is by imagining that the same sampling or experiment is repeated many times. By looking at how the data would vary across those repeats, we can measure how trustworthy a result is.
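
The "repeat it many times" idea can be tried out on a computer. In the sketch below we pretend to know the true average, draw many samples, build a 95% confidence interval from each one, and count how often the interval contains the true average. The true values, the sample size, and the use of a Normal-based interval are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage_of_intervals(true_mean=10.0, true_sd=2.0, sample_size=40,
                          repetitions=2000, confidence=0.95, seed=1):
    """Share of repeated experiments whose interval contains the true average."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        centre = mean(sample)
        half_width = z * stdev(sample) / sqrt(sample_size)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / repetitions


coverage = coverage_of_intervals()
print(f"share of intervals that contain the true average: {coverage:.3f}")
```

The share should come out near 0.95. That is what "95% confidence" means in this paradigm: it describes how the method behaves over many repeats, not the chance that any single interval is right.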

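Examples of frequentist inference

p-value for measuring evidence against a hypothesis
Confidence interval for interval estimation
Null hypothesis significance testing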

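Bayesian inference

This paradigm uses probabilities to describe how strongly we believe something. We start with a belief held before seeing the data, called the prior, and update it with the data to get a new belief, called the posterior.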

Examples of Bayesian inference

Credible interval for interval estimation
Bayes factors for model comparison
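
A credible interval can be sketched with a very small example. Here we assume yes-or-no data (successes and failures), start from a flat Beta(1, 1) belief about the success rate, and update it with the counts. The counts and the function name are made up, and the interval is read off from random draws, so its ends are approximate.

```python
import random


def beta_credible_interval(successes, failures, prior_a=1.0, prior_b=1.0,
                           level=0.95, draws=20000, seed=2):
    """Return (posterior mean, lower, upper) for a success rate.

    Assumption: a Beta prior with yes-or-no data, so the updated belief is
    Beta(prior_a + successes, prior_b + failures).
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + failures
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower_index = int((1 - level) / 2 * draws)
    upper_index = int((1 + level) / 2 * draws) - 1
    return a / (a + b), samples[lower_index], samples[upper_index]


# Made-up data: 14 seeds sprouted and 6 did not.
posterior_mean, low, high = beta_credible_interval(successes=14, failures=6)
print(f"posterior mean: {posterior_mean:.3f}")
print(f"95% credible interval: {low:.3f} to {high:.3f}")
```

Unlike a confidence interval, this range is a direct statement of belief: given the prior and the data, we think the true rate lies inside it with about 95% probability.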

Likelihood-based inference

Likelihood-based inference is a way to estimate the unknown values (parameters) of a statistical model using the data we see. Likelihoodism uses the likelihood function, which tells us how probable our data would be for each possible value of the parameters. In likelihood-based inference, the goal is to find the parameter values that make the observed data most probable.

The steps in likelihood-based inference usually are:

Creating the statistical model: We decide what we assume about the data and which values (parameters) we don’t know.
Building the likelihood function: We use our model to work out how probable the observed data are for each possible set of parameter values.
Finding the best estimates: We use math to find the parameter values that make the data most probable. These are called maximum likelihood estimates.
Checking how sure we are: We measure how much we can trust our estimates, for example with standard errors or intervals.
Checking our model: We make sure our assumptions about the data make sense.
Making conclusions: We use what we found to tell us about the real world or to test ideas.
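
The first few steps above can be sketched for one simple case. We assume the data are waiting times that follow an exponential pattern with one unknown rate. The waiting times are made up, and the search simply tries a grid of candidate rates, which is a plain stand-in for the math that would normally be used.

```python
from math import log, sqrt


def exponential_log_likelihood(rate, data):
    """Log of the likelihood for an exponential model with the given rate."""
    return len(data) * log(rate) - rate * sum(data)


def maximum_likelihood_rate(data, candidates):
    """Pick the candidate rate that makes the observed data most probable."""
    return max(candidates, key=lambda rate: exponential_log_likelihood(rate, data))


# Step 1: the model (exponential waiting times) and made-up data.
waiting_times = [0.8, 1.9, 0.4, 2.7, 1.1, 0.6, 3.2, 1.5]

# Steps 2 and 3: evaluate the likelihood on a grid and keep the best rate.
candidates = [i / 1000 for i in range(1, 5001)]
best_rate = maximum_likelihood_rate(waiting_times, candidates)

# For this model the best rate is also known exactly: 1 / (sample average).
closed_form = len(waiting_times) / sum(waiting_times)

# Step 4: a rough standard error, rate / sqrt(n), from the curvature of the
# log-likelihood at its peak (a large-sample approximation).
standard_error = best_rate / sqrt(len(waiting_times))

print(f"grid-search estimate: {best_rate:.3f}")
print(f"closed-form estimate: {closed_form:.3f}")
print(f"approximate standard error: {standard_error:.3f}")
```

The grid search and the exact formula should agree to within the grid spacing. With only eight data points the standard error is large, which is a reminder that the estimate is still quite uncertain.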

AIC-based inference

The Akaike information criterion (AIC) helps us pick the best statistical model for our data. Given several models, AIC estimates how good each one is compared to the others. This gives us a way to do model selection.

AIC is based on information theory: it tells us how much information we lose when we use a model to describe what really happened to create the data. It balances how well the model fits the data and how simple the model is.
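
The usual formula is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the largest value of the likelihood for that model. A lower AIC is better. The sketch below compares two Normal models on made-up data: one that fixes the average at 5 and estimates only the spread (k = 1), and one that estimates both the average and the spread (k = 2). The data, the fixed value of 5, and the function names are assumptions chosen for illustration.

```python
from math import log, pi


def normal_max_log_likelihood(data, fixed_mean=None):
    """Largest log-likelihood of a Normal model, with the average fixed or free."""
    n = len(data)
    centre = sum(data) / n if fixed_mean is None else fixed_mean
    variance = sum((x - centre) ** 2 for x in data) / n  # best-fitting variance
    return -0.5 * n * (log(2 * pi * variance) + 1)


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: penalty for parameters minus twice the fit."""
    return 2 * n_parameters - 2 * log_likelihood


# Made-up measurements.
data = [4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2]

ll_fixed = normal_max_log_likelihood(data, fixed_mean=5.0)  # k = 1 (spread only)
ll_free = normal_max_log_likelihood(data)                   # k = 2 (average and spread)

aic_fixed = aic(ll_fixed, n_parameters=1)
aic_free = aic(ll_free, n_parameters=2)

print(f"AIC with the average fixed at 5: {aic_fixed:.3f}")
print(f"AIC with the average estimated:  {aic_free:.3f}")
```

The model with the free average always fits at least as well, but here the improvement is too small to pay for the extra parameter, so the simpler model gets the lower AIC. This is the balance between fit and simplicity described above.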

Other paradigms for inference

Minimum description length

The minimum description length (MDL) idea comes from information theory. MDL picks the model that lets us describe the data in the shortest way.

The MDL idea has been used in many areas, like making codes for communication, linear regression, and finding patterns in large amounts of data.

Fiducial inference

Fiducial inference was an older way to make conclusions from data. Later work showed that this approach had limits, but it can still be useful in some cases. Some researchers have tried to connect the early fiducial argument to newer theories.

Structural inference

Building on older ideas from 1938 to 1939, George A. Barnard developed an approach called "structural inference" or "pivotal inference." Donald A. S. Fraser later developed a more general theory of structural inference using group theory.

Inference topics

Statistical inference helps us understand data and make good guesses about larger groups of information.

Key topics include statistical assumptions, statistical decision theory, estimation theory, and statistical hypothesis testing. Other topics are revising opinions in statistics, design of experiments, the analysis of variance, and regression. We also study survey sampling and summarizing statistical data.

Predictive inference

Predictive inference is a way to guess what might happen next by looking at what has already happened. It uses past information to make predictions about the future.

At first, this method focused on things we could see and measure. Later, a new idea changed how people thought about it. This idea, called exchangeability, says that future observations should behave like past observations. It became well known in the English-speaking world after a 1937 paper by Bruno de Finetti was translated into English in 1974. Since then, many experts have supported this way of thinking.