Bernoulli distribution
Adapted from Wikipedia
In probability theory and statistics, the Bernoulli distribution is a way to describe the results of a simple yes-or-no experiment. It is named after the Swiss mathematician Jacob Bernoulli. Imagine flipping a coin: the result can only be heads or tails. In the Bernoulli distribution, we call heads "1" and tails "0". The chance of getting heads is called p, and the chance of getting tails is called q, which is simply 1 minus p.
This distribution is very useful because many real-life situations can be thought of as yes-or-no questions. For example, a student might pass a test (1) or fail it (0). Each possible outcome has its own probability. When the coin is fair, the chances of heads or tails are both 50%, but with an unfair coin, these chances can be different.
The Bernoulli distribution is a special case of something called the binomial distribution, which deals with experiments that can have more than one trial. In the Bernoulli distribution, we only look at one single trial. It is also a type of two-point distribution because the results can only be one of two values, like 0 or 1.
Properties
If a random variable, called X, follows the Bernoulli distribution, it can only take two values: 1 or 0. The chance of getting 1 is called p, a number between 0 and 1, and the chance of getting 0 is called q, which is simply 1 minus p.
The Bernoulli distribution is the binomial distribution with just one try instead of many. It is also part of a group of distributions known as the exponential family.
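The two-outcome rule above can be sketched in a few lines of Python. This is only an illustration: the function names `bernoulli_pmf` and `bernoulli_sample` are made up for this example and do not come from any particular library.

```python
import random

def bernoulli_pmf(x, p):
    """Probability that a Bernoulli(p) variable equals x."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if x == 1:
        return p          # chance of "yes"
    if x == 0:
        return 1.0 - p    # chance of "no", called q
    return 0.0            # no other value can occur

def bernoulli_sample(p, rng=random):
    """Run one trial: return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

# Ten trials of an unfair coin that lands heads 30% of the time.
rng = random.Random(42)  # fixed seed so the run can be repeated
print([bernoulli_sample(0.3, rng) for _ in range(10)])
```

Passing a seeded `random.Random` object keeps the example repeatable; leaving out the `rng` argument uses Python's shared random generator instead.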
Mean
The expected value of a Bernoulli random variable is p. This is because the variable can be 1 with probability p or 0 with probability q, where q is 1 minus p.
We find the expected value by multiplying each possible value by its probability and adding them together:
Expected value = Probability of 1 × 1 + Probability of 0 × 0
= p × 1 + q × 0
= p.
So, the expected value of a Bernoulli random variable is p.
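As a rough check of this calculation, the sketch below computes the weighted sum directly and compares it with the average of many simulated trials. The function names, the number of trials, and the seed are arbitrary choices made for this illustration.

```python
import random

def expected_value(p):
    """Weighted sum: each value times its probability."""
    q = 1.0 - p
    return 1 * p + 0 * q

def simulated_mean(p, trials=100_000, seed=0):
    """Average of many simulated Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials

print(expected_value(0.3))   # exact value from the formula
print(simulated_mean(0.3))   # should land close to the exact value
```

The simulated average will not match p exactly, but it should get closer as the number of trials grows.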
Variance
The variance of a Bernoulli distribution tells us how spread out the results are. For a Bernoulli distribution, the variance is calculated as p times (1 - p). This means if the chance of getting a "yes" (or 1) is p, then the variance is p(1-p).
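We can find this by looking at the expected value of X squared and subtracting the square of the expected value of X. Because X is always 0 or 1, X squared is the same as X, so the expected value of X squared is also p. Subtracting the square of the mean gives p - p², which can be written as p(1-p) or pq. The variance for any Bernoulli distribution will always be between 0 and 1/4, and it reaches 1/4 only when p is exactly 1/2.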
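The steps of this calculation can be followed in a short sketch. It uses Python's `fractions.Fraction` so the arithmetic is exact; the function name `variance` and the list of test values are just choices for this example.

```python
from fractions import Fraction

def variance(p):
    """Variance of a Bernoulli(p) variable: E[X^2] - (E[X])^2."""
    second_moment = p  # X is 0 or 1, so X^2 = X and E[X^2] = p
    mean = p
    return second_moment - mean ** 2

# Try a few values of p and see that the variance never exceeds 1/4.
for p in [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]:
    print(p, variance(p))
```

Among the values tried here, the largest variance appears at p = 1/2, and the variance drops to 0 when the outcome is certain.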
Skewness
The skewness of a Bernoulli distribution shows how the probabilities are balanced between its two possible outcomes. It is calculated as (q - p) divided by the square root of pq. When p is 1/2 the skewness is 0, which means the distribution is balanced. When p is less than 1/2 the skewness is positive, and when p is greater than 1/2 it is negative. The formula cannot be used when p is exactly 0 or 1, because then pq is 0.
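The standard formula for the skewness of a Bernoulli distribution is (q - p) / √(pq). The sketch below evaluates it for a few values of p; the function name `bernoulli_skewness` is only an illustrative choice.

```python
import math

def bernoulli_skewness(p):
    """Skewness (q - p) / sqrt(p * q) of a Bernoulli(p) variable."""
    if not 0.0 < p < 1.0:
        # With p = 0 or p = 1 the variance is 0, so skewness is undefined.
        raise ValueError("skewness is undefined when p is 0 or 1")
    q = 1.0 - p
    return (q - p) / math.sqrt(p * q)

for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_skewness(p))
```

A positive result means the rare outcome is 1, a negative result means the rare outcome is 0, and a result of zero means both outcomes are equally likely.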
Higher moments and cumulants
The Bernoulli distribution describes simple yes-or-no experiments. For any experiment, the result can be "yes" (which we call 1) with a probability of p, or "no" (which we call 0) with a probability of q = 1 - p.
To describe the distribution in more detail, we use tools called "moments" and "cumulants". The k-th moment is the expected value of X raised to the power k. Because X is always 0 or 1, raising it to any power leaves it unchanged, so every moment of a Bernoulli random variable (for k = 1, 2, 3, and so on) is equal to p.
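One fact about the moments is easy to check by hand: the k-th moment is the expected value of X to the power k, and since 1 to any power is 1 and 0 to any positive power is 0, the answer is always p. A minimal sketch, with the function name `raw_moment` chosen only for this example:

```python
def raw_moment(p, k):
    """k-th moment of a Bernoulli(p) variable: E[X**k]."""
    q = 1.0 - p
    # Weighted sum over the two possible values, 1 and 0.
    return (1 ** k) * p + (0 ** k) * q

for k in range(1, 6):
    print(k, raw_moment(0.3, k))
```

Every line of output shows the same number, because the only value that contributes to the sum is X = 1.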
Entropy and Fisher's Information
Entropy
Entropy tells us how much surprise or randomness there is in something. For the Bernoulli distribution, which has just two possible outcomes (like yes/no or success/failure), entropy measures this surprise.
The entropy is calculated as -p log p - q log q. It is highest when the chances of both outcomes are equal, for example a coin that is equally likely to land heads or tails. This means we can’t predict the result very well, so there’s more surprise. When logarithms with base 2 are used, a fair coin has an entropy of exactly 1 bit. When one outcome is certain, like a coin that always lands heads, the entropy is zero because there’s no surprise at all.
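The usual formula for the entropy of a Bernoulli distribution is -p log p - q log q. The sketch below uses base-2 logarithms, so the answer is in bits; the function name `bernoulli_entropy` is an illustrative choice, and an outcome with probability 0 is treated as adding nothing to the sum, which is the usual convention.

```python
import math

def bernoulli_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    total = 0.0
    for prob in (p, 1.0 - p):
        if prob > 0.0:  # an impossible outcome adds no surprise
            total -= prob * math.log2(prob)
    return total

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, bernoulli_entropy(p))
```

The fair coin (p = 0.5) gives the largest value in this list, and the two certain cases (p = 0 and p = 1) give zero.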
Fisher's Information
Fisher information tells us how much information we can get about an unknown value, here the success chance p, from watching random events. For one Bernoulli trial, the Fisher information is 1 divided by pq. It is smallest, exactly 4, when the chances of success and failure are equal (like with a fair coin), and it grows as p gets close to 0 or 1. This makes sense because the results vary the most when p is one-half, so more trials are needed to estimate p to the same precision.
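For a single Bernoulli trial, the Fisher information about p works out to 1 / (p(1 - p)). The sketch below simply evaluates that standard formula at a few values of p; the function name `fisher_information` is chosen only for this example.

```python
def fisher_information(p):
    """Fisher information about p from one Bernoulli(p) trial."""
    if not 0.0 < p < 1.0:
        # The formula divides by p * (1 - p), which is 0 at the endpoints.
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, fisher_information(p))
```

The value is 4 at p = 0.5 and larger for the other values tried, which matches the fact that the variance p(1 - p) is largest at p = 0.5.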
Related distributions
If you have several independent tests, each with the same chance of success, their total successes follow a binomial distribution. The Bernoulli distribution is a special case of this with just one test.
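The link between the two distributions can be checked with a small calculation. The sketch below builds the distribution of the number of successes by adding one Bernoulli test at a time, then compares it with the usual binomial formula. The function names are made up for this illustration, and `math.comb` requires Python 3.8 or newer.

```python
import math

def add_one_trial(pmf, p):
    """Update the success-count probabilities after one more Bernoulli(p) test."""
    new = [0.0] * (len(pmf) + 1)
    for k, prob in enumerate(pmf):
        new[k] += prob * (1.0 - p)  # the extra test fails: count stays at k
        new[k + 1] += prob * p      # the extra test succeeds: count becomes k + 1
    return new

def successes_pmf(n, p):
    """Distribution of total successes in n independent Bernoulli(p) tests."""
    pmf = [1.0]  # with no tests, zero successes is certain
    for _ in range(n):
        pmf = add_one_trial(pmf, p)
    return pmf

def binomial_pmf(n, p):
    """The binomial formula, for comparison."""
    return [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]

print(successes_pmf(1, 0.3))  # one test: just the Bernoulli distribution
print(successes_pmf(5, 0.3))
print(binomial_pmf(5, 0.3))
```

With n = 1 the list has only two entries, the chances of 0 and of 1, which is the Bernoulli distribution itself.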
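The categorical distribution extends the Bernoulli idea to more than two possible outcomes. The Beta distribution is closely linked: it is often used to describe how likely different values of the success chance p are, and updating a Beta distribution with Bernoulli results gives another Beta distribution. The geometric distribution counts how many independent tests are needed to get the first success. When the success chance is one-half, replacing each result X with 2X - 1 (so 0 becomes -1 and 1 stays 1) gives a Rademacher distribution.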
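Two of these links are simple enough to sketch in code. The change that produces a Rademacher result is usually written as 2X - 1, which sends 0 to -1 and keeps 1 as 1, and the geometric count is found by repeating tests until the first success. The function names below are illustrative, and the second function assumes p is greater than 0, since otherwise it would never stop.

```python
import random

def to_rademacher(x):
    """Map a Bernoulli result (0 or 1) to a Rademacher result (-1 or +1)."""
    return 2 * x - 1

def trials_until_first_success(p, rng):
    """Count Bernoulli(p) tests up to and including the first success.

    Assumes p > 0; with p = 0 a success never happens.
    """
    count = 1
    while rng.random() >= p:  # this test failed, so try again
        count += 1
    return count

rng = random.Random(7)  # fixed seed so the run can be repeated
print([to_rademacher(x) for x in (0, 1)])
print([trials_until_first_success(0.25, rng) for _ in range(10)])
```

With a success chance of 0.25, the counts in the second line should average out to about 4 over many repetitions, since the mean of this geometric count is 1/p.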
This article is a child-friendly adaptation of the Wikipedia article on Bernoulli distribution, available under CC BY-SA 4.0.