Graph

Probability graphical model

$P(D,I,G,S,L) = P(D)P(I|D)P(G|D,I)P(S|D,I,G)P(L|D,I,G,S).$

$P(A,B,C,D) = P(A)P(B)P(C|A,B)P(D|B)$

Independent parameters.

How many independent parameters are required to uniquely define the CPD of C (the conditional probability distribution associated with the variable C) in the same graphical model as above, if A, B, and D are binary, and C and E have three values each?

here’s a brief explanation: A multinomial distribution over m possibilities x1,…,xm has m parameters, but m−1 independent parameters, because we have the constraint that all parameters must sum to 1, so that if you specify m−1 of the parameters, the final one is fixed. In a CPD P(X

Y), if X has m values and Y has k values, then we have k distinct multinomial distributions, one for each value of Y, and we have m−1 independent parameters in each of them, for a total of k(m−1). More generally, in a CPD P(X

Y1,…,Yr), if each Yi has ki values, we have a total of k1×…×kr×(m−1) independent parameters.

Example: Let’s say we have a graphical model that just had X→Y, where both variables are binary. In this scenario, we need 1 parameter to define the CPD of X. The CPD of X contains two entries P(X=0) and P(X=1). Since the sum of these two entries has to be equal to 1, we only need one parameter to define the CPD.

Now we look at Y. The CPD for Y contains 4 entries which correspond to: P(Y=0

X=0),P(Y=1

X=0),P(Y=0

X=1),P(Y=1

X=1). Note that P(Y=0

X=0) and P(Y=1

X=0) should sum to one, so we need 1 independent parameter to describe those two entries; likewise, P(Y=0

X=1) and P(Y=1

X=1) should also sum to 1, so we need 1 independent parameter for those two entries.

Therefore, we need 1 independent parameter to define the CPD of X and 2 independent parameters to define the CPD of Y.

*Inter-causal reasoning.

Consider the following model for traffic jams in a small town, which we assume can be caused by a car accident, or by a visit from the president (and the accompanying security motorcade).

Calculate P(Accident = 1

Traffic = 1) and P(Accident = 1

Traffic = 1, President = 1). Separate your answers with a space, e.g., an answer of

0.15 0.25

means that P(Accident = 1

Traffic = 1) = 0.15 and P(Accident = 1

Traffic = 1, President = 1) = 0.25. Round your answers to two decimal places and write a leading zero, like in the example above.

To calculate the required values, we can apply Bayes’ rule. For instance,

P(A=1|T=1,P=1)=P(A=1,T=1,P=1)P(T=1,P=1)=P(A=1,T=1,P=1)P(A=0,T=1,P=1)+P(A=1,T=1,P=1). We can then use the chain rule of Bayesian networks to substitute the correct values in, e.g.,

P(A=1,T=1,P=1)=P(P=1)×P(A=1)×P(T=1|P=1,A=1) This example of inter-causal reasoning meshes well with common sense: if we see a traffic jam, the probability that there was a car accident is relatively high. However, if we also see that the president is visiting town, we can reason that the president’s visit is the cause of the traffic jam; the probability that there was a car accident therefore drops correspondingly.