Where has this been all my life? The Law of Total Probability is really cool, and it seems accessible enough to be taught in high school, where I think it would be very useful, although I’ve never seen it there. For example, from the book Causal Inference in Statistics we get this nice problem (in addition to the quote below): “Suppose we roll two dice, and we want to know the probability that the second roll (R2) is higher than the first (R1).” The Law of Total Probability makes answering this much more straightforward.

To understand this ‘law,’ we should start by understanding two simple things about probability. First, for any two mutually exclusive events (the events can’t happen together), the probability of \(\mathtt{A}\) or \(\mathtt{B}\) is the sum of the probability of \(\mathtt{A}\) and the probability of \(\mathtt{B}\):

\(\mathtt{P(A\text{ or }B)\,\,\,\,=\,\,\,\,\,\,P(A)\,\,\,\,+\,\,\,\,P(B)}\)

Second, thinking now of two mutually exclusive events “A and B” and “A and not-B”, we can write the probability of \(\mathtt{A}\) this way, since if \(\mathtt{A}\) is true, then either “A and B” or “A and not-B” must be true:

\(\mathtt{P(A)=P(A,B)\,\,+\,\,P(A,\text{not-}B)}\)

In many situations, however, \(\mathtt{B}\) can take on more than two values (for example, the six possible values of one die roll, \(\mathtt{B_1}\)–\(\mathtt{B_6}\)), even while we’re considering just one event \(\mathtt{A}\) (for example, rolling a 4). As long as the events \(\mathtt{B_1}\) through \(\mathtt{B_n}\) are mutually exclusive and together cover every possibility, the Law of Total Probability tells us that

\(\mathtt{P(A)=P(A,B_1)+\cdots+P(A,B_n)}\).

If we pull a random card from a standard deck, the probability that the card is a Jack [\(\mathtt{P(J)}\)] will be equal to the probability that it’s a Jack and a spade [\(\mathtt{P(J,C_S)}\)], plus the probability that it’s a Jack and a heart [\(\mathtt{P(J,C_H)}\)], plus the probability that it’s a Jack and a club [\(\mathtt{P(J,C_C)}\)], plus the probability that it’s a Jack and a diamond [\(\mathtt{P(J,C_D)}\)].
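The card example can be checked by brute force. The following is a minimal sketch, assuming a standard 52-card deck where every card is equally likely; the names `deck` and `prob` are just illustrative helpers, not anything from the book.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "clubs", "diamonds"]

# Every (rank, suit) pair, each assumed equally likely.
deck = list(product(RANKS, SUITS))

def prob(event):
    """Probability of an event, given as a true/false test on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

# P(J), counted directly.
p_jack = prob(lambda card: card[0] == "J")

# P(J, suit) for each of the four mutually exclusive suits.
p_jack_and_suit = {
    suit: prob(lambda card, suit=suit: card[0] == "J" and card[1] == suit)
    for suit in SUITS
}

# Law of Total Probability: the joint probabilities add up to P(J).
total = sum(p_jack_and_suit.values())

print(p_jack, total)  # 1/13 1/13
```

Each of the four joint probabilities is \(\mathtt{\frac{1}{52}}\), and their sum matches the directly counted \(\mathtt{P(J)=\frac{1}{13}}\).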

Now with Conditional Probabilities

Where this gets good is when we throw conditional probabilities into the mix. We can make use of the fact that \(\mathtt{P(A,B)=P(A|B)P(B)}\), where \(\mathtt{P(A|B)}\) means “the probability of A given B.” For example, the probability of randomly pulling a Jack, given that you pulled a spade, is \(\mathtt{\frac{1}{13}}\), and the probability of randomly pulling a spade is \(\mathtt{\frac{1}{4}}\). Thus, the probability of pulling the Jack of spades is \(\mathtt{\frac{1}{13}\cdot \frac{1}{4}=\frac{1}{52}}\). We can, therefore, rewrite the Law of Total Probability this way:

\(\mathtt{P(A)=P(A|B_1)P(B_1)+\cdots+P(A|B_n)P(B_n)}\)
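The conditional form is just a weighted sum, which is easy to write as a small function. This is a sketch only; `total_probability` is a hypothetical helper name, and it assumes the \(\mathtt{B_i}\) are mutually exclusive and cover every possibility, so their probabilities sum to 1.

```python
from fractions import Fraction

def total_probability(conditionals, priors):
    """Return P(A) from the lists [P(A|B_i)] and [P(B_i)].

    Assumes the B_i are mutually exclusive and exhaustive.
    """
    if len(conditionals) != len(priors):
        raise ValueError("need one conditional probability per B_i")
    if sum(priors) != 1:
        raise ValueError("the P(B_i) must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))

# Card example: P(J | suit) = 1/13 and P(suit) = 1/4 for each of the four suits.
p_jack = total_probability([Fraction(1, 13)] * 4, [Fraction(1, 4)] * 4)

print(p_jack)  # 1/13
```

Using `Fraction` instead of floats keeps the arithmetic exact, so the check that the \(\mathtt{P(B_i)}\) sum to 1 doesn’t trip over rounding.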

And now we’re ready to determine the probability given in the opening paragraph, \(\mathtt{P(R2>R1)}\), the probability that a second die roll is greater than the first die roll: \[\mathtt{P(R2>R1)=P(R2>R1|R1=1)P(R1=1)+\cdots+P(R2>R1|R1=6)P(R1=6)}\]

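If the first roll is \(\mathtt{k}\), then exactly \(\mathtt{6-k}\) of the six possible second rolls are higher, so \(\mathtt{P(R2>R1|R1=k)=\frac{6-k}{6}}\), and each value of the first roll has probability \(\mathtt{\frac{1}{6}}\). The final result is \(\mathtt{\frac{5}{6}\cdot \frac{1}{6}+\frac{4}{6}\cdot \frac{1}{6}+\frac{3}{6}\cdot \frac{1}{6}+\frac{2}{6}\cdot \frac{1}{6}+\frac{1}{6}\cdot \frac{1}{6}+\frac{0}{6}\cdot \frac{1}{6}=\frac{5}{12}}\).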
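As a sanity check, here is a short sketch that computes the answer two ways: once with the Law of Total Probability, conditioning on the first roll, and once by simply counting all 36 equally likely outcomes. It assumes two fair six-sided dice; the variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Law of Total Probability: sum over r1 of P(R2 > R1 | R1 = r1) * P(R1 = r1).
# Given R1 = r1, exactly 6 - r1 of the six faces are higher.
p_law = sum(Fraction(6 - r1, 6) * Fraction(1, 6) for r1 in faces)

# Brute force: count the (r1, r2) pairs where the second roll is higher.
outcomes = list(product(faces, repeat=2))
p_count = Fraction(sum(1 for r1, r2 in outcomes if r2 > r1), len(outcomes))

print(p_law, p_count)  # 5/12 5/12
```

Both approaches agree: 15 of the 36 outcomes have the second roll higher, and \(\mathtt{\frac{15}{36}=\frac{5}{12}}\).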