Bernoulli distribution

In probability theory and statistics, the Bernoulli distribution is a way to describe the results of a simple yes-or-no experiment. It is named after the Swiss mathematician Jacob Bernoulli. Imagine flipping a coin: the result can only be heads or tails. In the Bernoulli distribution, we call heads "1" and tails "0". The chance of getting heads is called p, and the chance of getting tails is called q, which is simply 1 minus p.

This distribution is very useful because many real-life situations can be thought of as yes-or-no questions. For example, a student might pass a test (1) or fail it (0). Each possible outcome has its own probability. When the coin is fair, the chances of heads or tails are both 50%, but with an unfair coin, these chances can be different.
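One simple way to simulate a single Bernoulli trial is to draw a uniform random number in [0, 1) and report 1 when it falls below p. The sketch below uses Python's standard `random` module. The function name `bernoulli_trial` is a hypothetical helper chosen for illustration, not a library function.

```python
import random

def bernoulli_trial(p: float, rng: random.Random) -> int:
    """Return 1 with probability p and 0 with probability 1 - p (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    return 1 if rng.random() < p else 0

# Simulate 10,000 flips of an unfair coin that lands heads (1) 30% of the time.
rng = random.Random(42)  # fixed seed so the run is reproducible
flips = [bernoulli_trial(0.3, rng) for _ in range(10_000)]
print(sum(flips) / len(flips))  # close to 0.3, but usually not exactly 0.3
```

With a fair coin you would pass p = 0.5 instead. The fraction of 1s drifts toward p as the number of flips grows.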

The Bernoulli distribution is a special case of the binomial distribution, which counts the successes in a fixed number of independent trials. The Bernoulli distribution covers just one trial. It is also a two-point distribution, because the result can only be one of two values, here 0 or 1.

Properties

The probability mass function of a Bernoulli experiment, along with its corresponding cumulative distribution function

If a random variable X follows the Bernoulli distribution, it can take only two values, 1 or 0. The probability of getting 1 is p, written P(X = 1) = p, and the probability of getting 0 is q = 1 - p, written P(X = 0) = q. Here p can be any number from 0 to 1.
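The two probabilities can be combined into one formula, f(k; p) = p^k (1 - p)^(1 - k) for k in {0, 1}. The Python sketch below evaluates that probability mass function and the matching cumulative distribution function, using exact fractions to avoid rounding. The names `bernoulli_pmf` and `bernoulli_cdf` are illustrative, not taken from any library.

```python
from fractions import Fraction

def bernoulli_pmf(k: int, p: Fraction) -> Fraction:
    """P(X = k) for a Bernoulli(p) variable: p**k * (1 - p)**(1 - k) (illustrative helper)."""
    if k not in (0, 1):
        return Fraction(0)  # X never takes any other value
    return p**k * (1 - p) ** (1 - k)

def bernoulli_cdf(x: float, p: Fraction) -> Fraction:
    """P(X <= x): a step function that jumps at 0 and again at 1 (illustrative helper)."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return 1 - p  # only the outcome 0 is at or below x
    return Fraction(1)

p = Fraction(3, 10)
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # 3/10 7/10
print(bernoulli_cdf(0.5, p))                     # 7/10
```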

In other words, it is the binomial distribution with a single trial (n = 1) instead of many. The Bernoulli distribution also belongs to a group of distributions known as the exponential family.
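One way to see the exponential-family form is to rewrite the probability mass function f(k; p) = p^k (1 - p)^(1 - k), for k in {0, 1} and 0 < p < 1, as a single exponential. The following is a short sketch of this standard rewriting:

```latex
\begin{aligned}
f(k; p) &= p^{k}(1-p)^{1-k} \\
        &= \exp\bigl(k \ln p + (1-k)\ln(1-p)\bigr) \\
        &= \exp\Bigl(k \ln\tfrac{p}{1-p} + \ln(1-p)\Bigr)
\end{aligned}
```

So the natural parameter is the log-odds η = ln(p/(1 - p)), the sufficient statistic is k itself, and the remaining term is -ln(1 - p) = ln(1 + e^η).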

Mean

The expected value of a Bernoulli random variable is p. This is because the variable can be 1 with probability p or 0 with probability q, where q is 1 minus p.

We find the expected value by multiplying each possible value by its probability and adding them together:

Expected value = Probability of 1 × 1 + Probability of 0 × 0
= p × 1 + q × 0
= p.

So, the expected value of a Bernoulli random variable is p.
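The same sum over outcomes can be written directly in code. This sketch uses exact fractions so the result can be compared with p without rounding error. `expected_value` is an illustrative name, not a library function.

```python
from fractions import Fraction

def expected_value(p: Fraction) -> Fraction:
    """E[X] = sum of (value * probability) over the two outcomes of a Bernoulli(p) variable."""
    q = 1 - p
    outcomes = {1: p, 0: q}  # value -> probability
    return sum(value * prob for value, prob in outcomes.items())

for p in (Fraction(1, 2), Fraction(3, 10), Fraction(9, 10)):
    print(p, expected_value(p))  # the mean printed next to each p equals p
```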


Variance

The variance of a Bernoulli distribution tells us how spread out the results are. If the chance of getting a "yes" (or 1) is p, the variance is p(1 - p).

We can find this with the formula Var(X) = E[X²] - (E[X])². Because X is always 0 or 1, X² = X, so E[X²] = E[X] = p. This gives p - p², which simplifies to p(1 - p), or pq. Since p(1 - p) is largest when p = 1/2, the variance of any Bernoulli distribution lies between 0 and 1/4: it is 0 when p is 0 or 1, and it reaches 1/4 for a fair coin.
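The shortcut Var(X) = E[X²] - (E[X])² can be checked numerically. The sketch below computes both expectations from the two outcomes, then scans a grid of p values to confirm that the variance matches p(1 - p) and never exceeds 1/4. The function name `bernoulli_variance` is an illustrative choice.

```python
from fractions import Fraction

def bernoulli_variance(p: Fraction) -> Fraction:
    """Var(X) = E[X^2] - (E[X])^2 for X ~ Bernoulli(p) (illustrative helper)."""
    q = 1 - p
    mean = 1 * p + 0 * q            # E[X]
    mean_sq = 1**2 * p + 0**2 * q   # E[X^2]; equals E[X] because 0^2 = 0 and 1^2 = 1
    return mean_sq - mean**2

grid = [Fraction(n, 100) for n in range(101)]
variances = [bernoulli_variance(p) for p in grid]
assert all(v == p * (1 - p) for p, v in zip(grid, variances))  # matches p(1 - p)
best = max(grid, key=bernoulli_variance)
print(best, bernoulli_variance(best))  # 1/2 1/4
```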

Skewness

The skewness of a Bernoulli distribution shows how lopsided the probabilities of its two outcomes are. For 0 < p < 1 it equals (q - p)/√(pq), which can also be written (1 - 2p)/√(p(1 - p)). The skewness is 0 when p = 1/2, so the distribution is balanced. It is positive when p < 1/2, because most of the probability sits at 0 and the rarer value 1 forms the tail, and it is negative when p > 1/2.
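A quick way to check the closed form (q - p)/√(pq) is to compare it with the definition of skewness, E[(X - μ)³]/σ³, evaluated over the two outcomes. The sketch below does this in floating point. Both function names are illustrative, and the formula is only defined for 0 < p < 1, where σ > 0.

```python
import math

def skewness_from_definition(p: float) -> float:
    """E[(X - mu)^3] / sigma^3 computed from the two outcomes of X ~ Bernoulli(p)."""
    q = 1 - p
    mu = p
    sigma = math.sqrt(p * q)
    third_central = q * (0 - mu) ** 3 + p * (1 - mu) ** 3
    return third_central / sigma**3

def skewness_closed_form(p: float) -> float:
    """(q - p) / sqrt(p q); defined only for 0 < p < 1."""
    q = 1 - p
    return (q - p) / math.sqrt(p * q)

for p in (0.1, 0.5, 0.9):
    # positive for p < 1/2, zero at p = 1/2, negative for p > 1/2
    print(p, round(skewness_closed_form(p), 4))
```

Expanding the third central moment by hand gives q(-p)³ + p q³ = pq(q - p), which is why the two functions agree.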

Higher moments and cumulants

The Bernoulli distribution describes a single yes-or-no experiment: the result is "yes" (which we call 1) with probability p, or "no" (which we call 0) with probability q = 1 - p.

Moments and cumulants are numbers that describe the shape of a distribution beyond its mean and variance. For the Bernoulli distribution they are especially simple. Because X is always 0 or 1, X^k = X for every power k ≥ 1, so every raw moment E[X^k] equals p. The central moments, measured around the mean p, are E[(X - p)^k] = q(-p)^k + p q^k. The first three cumulants are p, pq and pq(q - p).
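Both moment formulas can be checked by summing over the two outcomes. The sketch below uses exact fractions and p = 1/4 as an arbitrary example value. `raw_moment` and `central_moment` are illustrative names.

```python
from fractions import Fraction

def raw_moment(k: int, p: Fraction) -> Fraction:
    """E[X^k] for X ~ Bernoulli(p), summed over the outcomes 0 and 1."""
    return (1 - p) * Fraction(0) ** k + p * Fraction(1) ** k

def central_moment(k: int, p: Fraction) -> Fraction:
    """E[(X - p)^k] for X ~ Bernoulli(p); equals q(-p)^k + p q^k."""
    q = 1 - p
    return q * (0 - p) ** k + p * (1 - p) ** k

p = Fraction(1, 4)
print(*[raw_moment(k, p) for k in range(1, 5)])      # 1/4 1/4 1/4 1/4
print(*[central_moment(k, p) for k in range(1, 5)])  # 0 3/16 3/32 21/256
```

The second central moment, 3/16, is the variance pq, and the third, 3/32, is pq(q - p).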

Entropy and Fisher's Information

Entropy

Entropy tells us how much surprise or uncertainty there is in an outcome. For the Bernoulli distribution, which has just two possible outcomes (like yes/no or success/failure), the entropy is H = -p log p - q log q. With base-2 logarithms it is measured in bits.

The entropy is highest when both outcomes are equally likely, as with a coin that is equally likely to land heads or tails: at p = 1/2 it reaches its maximum of 1 bit. We cannot predict the result well, so there is the most surprise. When one outcome is certain, for example a coin that always lands heads, the entropy is zero because there is no surprise at all.
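The entropy formula H = -p log₂ p - q log₂ q is easy to evaluate directly. The sketch below uses the usual convention that 0 · log₂ 0 = 0, so a certain outcome contributes nothing. `bernoulli_entropy_bits` is an illustrative name.

```python
import math

def bernoulli_entropy_bits(p: float) -> float:
    """H(X) = -p*log2(p) - q*log2(q) in bits, using the convention 0*log2(0) = 0."""
    q = 1 - p
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(bernoulli_entropy_bits(p), 4))  # peaks at 1.0 bit when p = 0.5
```

The output is symmetric: p and 1 - p give the same entropy, because swapping the labels 0 and 1 does not change how surprising the result is.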

Fisher's Information

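Fisher information tells us how much information the observed outcomes carry about an unknown value, here the success probability p. For the Bernoulli distribution it equals 1/(p(1 - p)), the reciprocal of the variance. It is smallest, equal to 4, when the chances of success and failure are equal (like with a fair coin), and it grows without limit as p approaches 0 or 1. This makes sense: when one outcome is nearly certain, the results vary very little, so each observation pins down p precisely, while when both outcomes are equally likely, the results are most variable and each one narrows down p the least.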
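Fisher information can be computed straight from its definition, the expected square of the score d/dp ln f(k; p). The sketch below does this with exact fractions for a few values of p. The helper names `score` and `fisher_information` are illustrative.

```python
from fractions import Fraction

def score(k: int, p: Fraction) -> Fraction:
    """Derivative of ln f(k; p) = k ln p + (1 - k) ln(1 - p) with respect to p."""
    return Fraction(k) / p - Fraction(1 - k) / (1 - p)

def fisher_information(p: Fraction) -> Fraction:
    """I(p) = E[score^2], averaged over k = 1 (probability p) and k = 0 (probability 1 - p)."""
    return p * score(1, p) ** 2 + (1 - p) * score(0, p) ** 2

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, fisher_information(p))  # 100/9, then 4, then 100/9
```

Each printed value equals 1/(p(1 - p)), since p · (1/p)² + q · (1/q)² = 1/p + 1/q = 1/(pq).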

Related distributions

If you have several independent trials, each with the same chance of success p, the total number of successes follows a binomial distribution. The Bernoulli distribution is the special case with just one trial.
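The link to the binomial distribution can be checked exactly: build the distribution of X1 + ... + Xn one independent Bernoulli trial at a time, and compare it with the binomial formula C(n, k) p^k (1 - p)^(n - k). The helper names below are illustrative, and n = 5, p = 1/3 are arbitrary example values.

```python
from fractions import Fraction
from math import comb

def sum_of_bernoullis(n: int, p: Fraction) -> list[Fraction]:
    """Distribution of X1 + ... + Xn for independent Bernoulli(p) trials, built one trial at a time."""
    dist = [Fraction(1)]  # before any trial, the count is 0 with probability 1
    for _ in range(n):
        new = [Fraction(0)] * (len(dist) + 1)
        for successes, prob in enumerate(dist):
            new[successes] += prob * (1 - p)  # this trial fails
            new[successes + 1] += prob * p    # this trial succeeds
        dist = new
    return dist

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, Fraction(1, 3)
assert sum_of_bernoullis(n, p) == [binomial_pmf(k, n, p) for k in range(n + 1)]
assert sum_of_bernoullis(1, p) == [1 - p, p]  # n = 1 is just the Bernoulli distribution
```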

The categorical distribution extends the Bernoulli idea to more than two possible outcomes. The beta distribution is closely linked: it is the conjugate prior for the success probability p, so it is often used to describe which values of p are plausible. The geometric distribution counts how many trials are needed to get the first success. When the success probability is one-half, the transformation 2X - 1, which turns 0 into -1 and keeps 1 as 1, gives a Rademacher distribution.
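The Rademacher connection can be seen by pushing each Bernoulli probability through the map x -> 2x - 1, which sends 0 to -1 and 1 to 1. The sketch below does this for a fair coin, p = 1/2. `bernoulli_dist` and `transform` are hypothetical helper names chosen for illustration.

```python
from fractions import Fraction
from typing import Callable

def bernoulli_dist(p: Fraction) -> dict[int, Fraction]:
    """Outcome -> probability for a Bernoulli(p) variable."""
    return {0: 1 - p, 1: p}

def transform(dist: dict[int, Fraction], g: Callable[[int], int]) -> dict[int, Fraction]:
    """Distribution of g(X): move each outcome's probability to g(outcome)."""
    out: dict[int, Fraction] = {}
    for value, prob in dist.items():
        out[g(value)] = out.get(g(value), Fraction(0)) + prob
    return out

rademacher = transform(bernoulli_dist(Fraction(1, 2)), lambda x: 2 * x - 1)
mean = sum(v * pr for v, pr in rademacher.items())
variance = sum(v * v * pr for v, pr in rademacher.items()) - mean**2
print(rademacher, mean, variance)  # -1 and 1 each with probability 1/2; mean 0, variance 1
```

With any other p the same map still produces values -1 and 1, but with unequal probabilities, so the result is only Rademacher when p = 1/2.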