Law of Iterated Expectations & Covariance

Law of Iterated Expectations

\(E[Y] = E_X[E[Y |X]].\)

The notation \(E_X[.]\) indicates the expectation over the values of \(X\). Note that \(E[Y|X]\)

is a function of \(X\).

Proof for Law of Iterated Expectations

Proof for discrete random variables:

\(E[E[Y|X]]=\sum\limits_{x} E[Y|X=x]P(X=x)
\\= \sum\limits_{x} \sum\limits_{y} yP(Y=y|X=x)P(X=x)
\\= \sum\limits_{x} \sum\limits_{y} \dfrac{yP(X=x,Y=y)}{P(X=x)}P(X=x)
\\= \sum\limits_{y} \sum\limits_{x} yP(X=x,Y=y)
\\= \sum\limits_{y} yP(Y=y)
\\= E(Y).\)

Proof for continuous random variables:

\(E[E[Y|X]]=\int_{-\infin}^{\infin}(\int_{-\infin}^{\infin}yf_{Y|X}(y|x)dy)f_X(x)dx
\\= \int_{-\infin}^{\infin}(\int_{-\infin}^{\infin}y\dfrac{f(x,y)}{f_X(x)}dy)f_X(x)dx
\\= \int_{-\infin}^{\infin} \int_{-\infin}^{\infin}yf(x,y)dxdy
\\= \int_{-\infin}^{\infin} y \int_{-\infin}^{\infin}f(x,y)dxdy
\\= \int_{-\infin}^{\infin} y f_Y(y)dy
\\= E(Y).\)

The process of the proving includes the concept of conditional expectation, which can be learned from this article.

Covariance

In any bivariate distribution,

\(Cov[X, Y] = Cov_X[X, E[Y| X]] = \int_x(x - E[X]) E[Y| X]f_X(x) dx.\)

(Note that this is the covariance of \(x\) and a function of \(x\).)

Proof for discrete random variables:

\(Cov[X,E[Y|X]] = E[X-E[X]][E[Y|X]-E[E[Y|X]]]
\\=E[X-E[X]][E[Y|X]-E[Y]]
\\=E\{[X-E[X]]E[Y|X]-[X-E[X]]E[Y]\}
\\=E[X-E[X]]E[Y|X]-E[X-E[X]]E[Y]
\\=E[XE[Y|X]-E[X]E[Y|X]]-E[X-E[X]]E[Y]
\\=E[XE[Y|X]]-E[X]E[E[Y|X]]-E[X-E[X]]E[Y]
\\=E[XE[Y|X]]-E[X]E[Y]-E[X-E[X]]E[Y]
\\=E[x\sum\limits_y y P(Y=y|X=x)]-E[X]E[Y]
\\=\sum\limits_x \{x[\sum\limits_y y P(Y=y|X=x)]P(X=x)\}-E[X]E[Y]
\\=\sum\limits_x \{x[\sum\limits_y \dfrac{yP(X=x,Y=y)}{P(X=x)}] P(X=x) \}-E[X]E[Y]
\\=\sum\limits_x \sum\limits_y x y P(X=x,Y=y)-E[X]E[Y]
\\=E[X Y]-E[X]E[Y]
\\=Cov[X,Y].\)

Key Steps: \(E[XE[Y|X]]=E[X Y]\), \(E[E[Y|X]]=E[Y]\).

Proof for continuous random variables:

\(Cov[X,E[Y|X]]=E[XE[Y|X]]-E[X]E[Y]
\\=E[x \int_{-\infin}^{\infin} y f_{Y|X}(y|x)dy]-E[X]E[Y]
\\=E[x \int_{-\infin}^{\infin} y \dfrac{f(x, y)}{f_X(x)}dy]-E[X]E[Y]
\\=\int_{-\infin}^{\infin}[x \int_{-\infin}^{\infin} y \dfrac{f(x, y)}{f_X(x)}dy]f_X(x)dx-E[X]E[Y]
\\=\int_{-\infin}^{\infin} \int_{-\infin}^{\infin} x y f(x, y)dydx-E[X]E[Y]
\\=E[X Y]-E[X]E[Y]
\\=Cov[X,Y].\)

Inference

If random variable \(\epsilon\) is mean independent of random variable \(X\), then \(\epsilon\) and \(X\) are linear irrelevant i.e. \(E[\epsilon|X] = E[\epsilon](=0) \Rightarrow \rho_{\epsilon X}=0\)

Proof

\(E[\epsilon|X] = E[\epsilon](=0), Cov(\epsilon, X)=Cov(E[\epsilon|X],X) = Cov(E[\epsilon],X) = 0 \Rightarrow \rho_{\epsilon X} = 0 .\)

Decomposition of Variance OR Law of Total Variance

In a joint distribution,

\(Var[Y] = Var_X[E[Y| X]] + E_X[Var[Y| X]].\)

Proof for Law of Total Variance

The proof above uses the law of iterated expectations several times. A deeper and more direct understanding of the Law of Total Variance and whose relation to the K-means cluster and OLS can be found in this article.