Reading through Judea Pearl’s Book of Why, along with Causal Inference in Statistics, co-authored by Pearl, I came across a nifty, new-to-me explanation of the famous Monty Hall Problem.

To make it clearer, we can start with a toy mathematical problem, \(\mathtt{a+b=c}\), which I model with the causal diagram below. A causal diagram is, of course, a little inappropriate for modeling a mathematical equation, but it’s also one that students may first use implicitly to think about operations and equations (and, without proper instruction, it’s one adults may use all their lives).

Here, we will think of the variables \(\mathtt{a}\) and \(\mathtt{b}\) as independent. Changing the number we substitute for \(\mathtt{a}\) does not affect our choice for \(\mathtt{b}\) and vice versa. However, \(\mathtt{a}\) and \(\mathtt{c}\) are dependent, and \(\mathtt{b}\) and \(\mathtt{c}\) are dependent as well. Increasing or decreasing \(\mathtt{a}\) or \(\mathtt{b}\) alone (or together) will have an effect on \(\mathtt{c}\).

But once we fix \(\mathtt{c}\), or “condition on \(\mathtt{c}\)” as Pearl would write, then \(\mathtt{a}\) and \(\mathtt{b}\) become dependent variables. That’s at once clear as a bell and pretty wacky. If we fix \(\mathtt{c}\) at 10, then changing \(\mathtt{a}\) will change \(\mathtt{b}\) and vice versa. But \(\mathtt{a}\) and \(\mathtt{b}\) were entirely independent prior to knowing what \(\mathtt{c}\) was. Afterwards, they’re dependent on each other. The diagram that represents this situation (above) is what Pearl calls a “collider.”

Pearl et. al also use a more everyday example:

Suppose a certain college gives scholarships to two types of students: those with unusual musical talents and those with extraordinary grade point averages. Ordinarily, musical talent and scholastic achievement are independent traits, so, in the population at large, finding a person with musical talent tells us nothing about that person’s grades. However, discovering that a person is on a scholarship changes things; knowing that the person lacks musical talent then tells us immediately that he is likely to have high grade point average. Thus, two variables that are marginally independent become dependent upon learning the value of a third variable (scholarship) that is a common effect of the first two.

For the Monty Hall problem, the collider model looks identical. And the correct model helps us see two things (forgiving some [I hope minor] mathematical sloppiness).

First, the model helps us see that Monty opening the door does not change the probability that you have made the correct initial choice (which is \(\mathtt{\frac{1}{3}}\)). It’s even accurate to say that the probability that the car is behind any of the three doors hasn’t changed from \(\mathtt{\frac{1}{3}}\). The arrows are pointing the wrong way for those things to be possibilities. In and of itself, the correct model should prevent us from upgrading the probability from \(\mathtt{\frac{1}{3}}\) to \(\mathtt{\frac{1}{2}}\) after the freebie goat door is opened.

Second, when Monty opens a door to reveal a goat (thus fixing the value of “door Monty opens”), now changing your choice of door changes the probability that the car is behind that door, since these two variables are now dependent.

Thus, since the probability of being correct must change when I change my door selection, it must change from \(\mathtt{\frac{1}{3}}\) to something else. And since the other door is the only other option, and all the probabilities in the situation must add to \(\mathtt{1}\), switching must change the probability to \(\mathtt{\frac{2}{3}}\).