# Covariance and the Covariance Matrix


## 1. Covariance

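The covariance of two random variables $X$ and $Y$ is

$$
\operatorname{cov}(X, Y)=\mathrm{E}[(X-\mu)(Y-\nu)]
$$

where $\mu=\mathrm{E}[X]$ and $\nu=\mathrm{E}[Y]$ are the means of $X$ and $Y$.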
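As a rough illustration of the definition, the expectation can be replaced by an average over paired observations. The sketch below reuses the two vectors from the NumPy example in section 2. The names `x`, `y`, `cov_population` and `cov_sample` are chosen here for illustration only.

```python
import numpy as np

# Three paired observations of two variables (the values used in section 2).
x = np.array([-2.1, -1.0, 4.3])
y = np.array([3.0, 1.1, 0.12])

# Deviations from the sample means, standing in for (X - mu) and (Y - nu).
dx = x - x.mean()
dy = y - y.mean()

# Plain average of the products: divides by m.
cov_population = np.mean(dx * dy)

# Unbiased sample estimate: divides by (m - 1) instead.
m = x.size
cov_sample = np.sum(dx * dy) / (m - 1)

print(cov_population)  # approximately -2.857
print(cov_sample)      # approximately -4.286
```

The second value is the off-diagonal entry that `np.cov(x, y)` returns with its default settings.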

## 2. Covariance Matrix

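For $n$ random variables $x_1, \ldots, x_n$, the covariance matrix $\mathbf{C}$ collects all pairwise covariances:

$$
\mathbf{C}=\left[\begin{array}{cccc}
\operatorname{cov}\left(x_1, x_1\right)&\operatorname{cov}\left(x_1, x_2\right)&\ldots&\operatorname{cov}\left(x_1, x_n\right) \\
\operatorname{cov}\left(x_2, x_1\right)&\operatorname{cov}\left(x_2, x_2\right)&\ldots&\operatorname{cov}\left(x_2, x_n\right) \\
\vdots&\vdots&\ddots&\vdots \\
\operatorname{cov}\left(x_n, x_1\right)&\operatorname{cov}\left(x_n, x_2\right)&\ldots&\operatorname{cov}\left(x_n, x_n\right)
\end{array}\right]
$$

Each entry is

$$
c_{i j}=\operatorname{cov}\left(x_i, x_j\right)=\mathrm{E}\left[\left(x_i-\mu_i\right)\left(x_j-\mu_j\right)\right]
$$

and, writing the variables as a random vector $\mathbf{X}$,

$$
\mathbf{C} =\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{X}-\mathrm{E}[\mathbf{X}])^{\mathrm{T}}\right]
$$

In practice the expectations are replaced by sample averages. From $m$ observations, each entry is estimated as $\hat{c}_{i j}=\frac{1}{m-1} \sum_{k=1}^{m}\left(x_{i k}-\bar{x}_i\right)\left(x_{j k}-\bar{x}_j\right)$, which is where the factor $\frac{1}{m-1}$ enters.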
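The matrix form can be checked numerically. The following is a minimal sketch, assuming the data are stored with one variable per row and one observation per column (the layout `np.cov` uses by default). The random data and the names `X_centered` and `C` are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
X = rng.normal(size=(3, 500))    # n = 3 variables (rows), m = 500 observations (columns)

m = X.shape[1]
# Subtract each variable's sample mean: the sample analogue of X - E[X].
X_centered = X - X.mean(axis=1, keepdims=True)
# Sample analogue of E[(X - E[X])(X - E[X])^T], normalized by (m - 1).
C = X_centered @ X_centered.T / (m - 1)

print(C.shape)                    # (3, 3)
print(np.allclose(C, C.T))        # True: the matrix is symmetric
print(np.allclose(C, np.cov(X)))  # True: agrees with NumPy's default estimator
```

The diagonal of `C` holds the sample variances of the individual variables.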

Nomenclatures differ. Some statisticians, following the probabilist William Feller in his two-volume book *An Introduction to Probability Theory and Its Applications* [2], call the matrix $\mathrm{K}_{\mathbf{X X}}$ (the matrix $\mathbf{C}$ above) the variance of the random vector $\mathbf{X}$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $\mathbf{X}$.

$$
\operatorname{var}(\mathbf{X})=\operatorname{cov}(\mathbf{X}, \mathbf{X})=\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{X}-\mathrm{E}[\mathbf{X}])^{\mathrm{T}}\right]
$$

Both forms are quite standard, and there is no ambiguity between them. The matrix $\mathrm{K}_{\mathbf{X X}}$ is also often called the variance-covariance matrix, since the diagonal terms are in fact variances.

By comparison, the notation for the cross-covariance matrix between two vectors is

$$
\operatorname{cov}(\mathbf{X}, \mathbf{Y})=\mathrm{K}_{\mathbf{X Y}}=\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{Y}-\mathrm{E}[\mathbf{Y}])^{\mathrm{T}}\right]
$$
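A sample version of the cross-covariance matrix can be sketched in the same way. The helper `cross_cov` below is a hypothetical function written for this note, not a NumPy API. It assumes both inputs hold one variable per row and share the same paired observations along the columns.

```python
import numpy as np

def cross_cov(X, Y):
    """Sample cross-covariance of X (p x m) and Y (q x m); returns a p x q matrix."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)  # center each row of X
    Yc = Y - Y.mean(axis=1, keepdims=True)  # center each row of Y
    return Xc @ Yc.T / (m - 1)              # unbiased normalization

rng = np.random.default_rng(0)     # illustrative random data
X = rng.normal(size=(2, 200))      # 2 variables, 200 observations
Y = rng.normal(size=(3, 200))      # 3 variables, same 200 observations

K_xy = cross_cov(X, Y)
K_yx = cross_cov(Y, X)

print(K_xy.shape)                  # (2, 3)
print(np.allclose(K_xy, K_yx.T))   # True: K_XY is the transpose of K_YX
```

Unlike $\mathrm{K}_{\mathbf{X X}}$, the cross-covariance matrix is generally neither square nor symmetric. Calling `np.cov(X, Y)` stacks the rows of both inputs and returns the full 5 x 5 matrix, whose upper-right 2 x 3 block equals `K_xy`.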

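As an example, take two variables with three observations each:

$$
x_1 = [-2.1, -1, 4.3], \qquad x_2 = [3.0, 1.1, 0.12]
$$

Stacking them as rows gives the data matrix

$$
\mathbf{X}=\left[\begin{array}{ccc}
-2.1&-1&4.3 \\
3.0&1.1&0.12
\end{array}\right]
$$

```python
>>> import numpy as np
>>> x1 = [-2.1, -1, 4.3]
>>> x2 = [3, 1.1, 0.12]
>>> X = np.stack((x1, x2), axis=0)  # each row is one variable

>>> np.cov(X)
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.144133]])

>>> np.cov(x1, x2)  # same result when the variables are passed separately
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.144133]])

>>> np.cov(x1, bias=False)  # divide by m - 1
array(11.71)

>>> np.cov(x1, bias=True)  # divide by m
array(7.80666667)

>>> np.cov(x1, ddof=0)  # equivalent to bias=True
array(7.80666667)
```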


`numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)`

- `bias`: with the default `bias=False`, the covariance is normalized by (m-1), where m is the number of observations given for each random variable (the unbiased estimate). If `bias=True`, normalization is by m instead.
- `ddof`: if not `None`, the default value implied by `bias` is overridden. Note that `ddof=1` will return the unbiased estimate, even if both `fweights` and `aweights` are specified, and `ddof=0` will return the simple average (normalization by the actual number of observations m). See the notes in the NumPy documentation for details. The default value is `None`.
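The interaction between `bias` and `ddof` can be checked against a hand computation. This is a small sketch using one of the vectors from the example above. The variable names (`sum_sq`, `unbiased`, `biased`, `override`) are illustrative.

```python
import numpy as np

x1 = np.array([-2.1, -1.0, 4.3])
m = x1.size
sum_sq = np.sum((x1 - x1.mean()) ** 2)  # sum of squared deviations from the mean

unbiased = np.cov(x1)                     # default: divide by (m - 1)
biased = np.cov(x1, bias=True)            # divide by m
override = np.cov(x1, bias=True, ddof=1)  # an explicit ddof takes precedence over bias

print(np.isclose(unbiased, sum_sq / (m - 1)))  # True
print(np.isclose(biased, sum_sq / m))          # True
print(np.isclose(override, unbiased))          # True
```

In general the normalization is by (m - ddof) when no weights are given, so `ddof=1` reproduces the unbiased estimate and `ddof=0` the simple average.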

## 3. Pearson Correlation Coefficient

$$
R_{i j}=\frac{c_{i j}}{\sqrt{c_{i i} c_{j j}}}
$$

The values of $R$ are between -1 and 1, inclusive.

In NumPy, it can be computed directly with the `numpy.corrcoef` function.
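To connect the formula to the function, the sketch below builds $R$ from the covariance matrix by hand and compares it with `np.corrcoef`. It reuses the example data from section 2. The names `d`, `R_manual` and `R_numpy` are illustrative.

```python
import numpy as np

# Two variables (rows), three observations each.
X = np.array([[-2.1, -1.0, 4.3],
              [3.0, 1.1, 0.12]])

C = np.cov(X)                  # covariance matrix, entries c_ij
d = np.sqrt(np.diag(C))        # standard deviations sqrt(c_ii)
R_manual = C / np.outer(d, d)  # R_ij = c_ij / sqrt(c_ii * c_jj)

R_numpy = np.corrcoef(X)

print(np.allclose(R_manual, R_numpy))     # True
print(np.allclose(np.diag(R_manual), 1))  # True: each variable correlates perfectly with itself
```

Because the same normalization factor appears in the numerator and the denominator, the result does not depend on whether the covariances were divided by m or by (m - 1).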
