# Calculating covariance matrices with Python

#### Thomas Dickson

#### 2 minute read

Covariance matrices, also known as variance-covariance matrices, quantify the relationship between two or more variables. This is another of my posts where I want to solve a problem using more fundamental numpy linear algebra functions rather than `np.cov`. The key source for the notation below is from here.

To create a variance-covariance matrix from $\boldsymbol{X}$, which is an $n \times k$ matrix holding ordered sets of raw data, you start with transforming the raw data into deviation scores in matrix $\boldsymbol{x}$.

\[\boldsymbol{x} = \boldsymbol{X} - \boldsymbol{1} \boldsymbol{1'} \boldsymbol{X} (1/n)\]

Where:

- $\boldsymbol{1}$ is an $n \times 1$ vector of ones.
- $\boldsymbol{1'}$ is the transpose of $\boldsymbol{1}$.
- $\boldsymbol{x}$ is an $n \times k$ matrix of deviation scores: $x_{11}, x_{12}, \ldots, x_{nk}$.
- $\boldsymbol{X}$ is an $n \times k$ matrix of raw data.
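To see what the centering term does, note that $\boldsymbol{1}\boldsymbol{1'}\boldsymbol{X}(1/n)$ simply replaces every entry of $\boldsymbol{X}$ with its column mean. A minimal sketch of this step, using the same small data matrix as the examples below:

```python
import numpy as np

# Raw data: n = 2 observations of k = 2 variables
X = np.array([[1, 2], [3, 6]])
n = X.shape[0]

# 1 is the n x 1 vector of ones; 1 1' X / n holds the column means of X
ones = np.ones((n, 1))
col_means = ones @ ones.T @ X / n

# Deviation scores: each column of x now sums to zero
x = X - col_means
print(x)
```

Subtracting the column means directly (`X - X.mean(axis=0)`) would give the same result via broadcasting; the explicit $\boldsymbol{1}\boldsymbol{1'}$ product just mirrors the matrix notation.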

We then need to calculate $\boldsymbol{x’x}$ which is the $k \times k$ deviation sums of squares and cross products matrix for $\boldsymbol{x}$. Then we divide each term in the deviation sums of squares and cross products matrix by $n$ to create the variance-covariance matrix $\boldsymbol{V}$.

\[\boldsymbol{V}=\boldsymbol{x'x}(1/n)\]

Where:

- $\boldsymbol{V}$ is a $k \times k$ variance-covariance matrix.
- $\boldsymbol{x'}$ is the transpose of $\boldsymbol{x}$.
- $\boldsymbol{x'x}$ is the deviation sums of squares and cross products matrix.
- $n$ is the number of measurements/rows/scores in each column of the data matrix $\boldsymbol{X}$.

Now that we understand how to use linear algebra to calculate the variance-covariance matrix, let's look at how we can use Python. Numpy has the function `np.cov` that calculates the covariance matrix. Note that the `bias` parameter in `np.cov` normalises by $n$ if `True`, rather than the default of $n-1$.

```python
import numpy as np

X = np.array([[1, 2], [3, 6]])
np.cov(X.T, bias=True)
```

```
array([[1., 2.],
       [2., 4.]])
```

The convenience of this function is not enough for me - I want to check that I can get as close to the linear algebra as possible.

```python
n = X.shape[0]
# np.ones((n, n)) is the 1 1' matrix of ones; multiplying by X and
# dividing by n gives a matrix whose rows are the column means of X.
x = X - np.dot(np.ones((n, n)), X) * 1 / n
var_co_var_x = np.dot(x.T, x) * 1 / n
var_co_var_x
```

```
array([[1., 2.],
       [2., 4.]])
```
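Rather than eyeballing the two outputs, the agreement between the manual calculation and `np.cov` can be confirmed programmatically with `np.allclose` (a small self-contained check, not part of the original walkthrough):

```python
import numpy as np

X = np.array([[1, 2], [3, 6]])
n = X.shape[0]

# Manual route: deviation scores, then x'x / n
x = X - np.dot(np.ones((n, n)), X) / n
V = np.dot(x.T, x) / n

# Library route: np.cov with bias=True normalises by n
print(np.allclose(V, np.cov(X.T, bias=True)))  # True
```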

Sweet.