Calculating covariance matrices with Python

Thomas Dickson

2 minute read



Covariance matrices, also known as variance-covariance matrices, are used to quantify the relationship between two or more variables. This is another of my posts where I want to solve a problem using more fundamental numpy linear algebra operations rather than reaching straight for np.cov. The key source for the notation below is here.

To create a variance-covariance matrix from $\boldsymbol{X}$, which is an $n \times k$ matrix holding ordered sets of raw data, you start with transforming the raw data into deviation scores in matrix $\boldsymbol{x}$.

\[\boldsymbol{x} = \boldsymbol{X} - \boldsymbol{1} \boldsymbol{1'} \boldsymbol{X} (1/n)\]

Where:

- $\boldsymbol{1}$ is an $n \times 1$ column vector of ones, so $\boldsymbol{1} \boldsymbol{1'}$ is an $n \times n$ matrix of ones
- $n$ is the number of observations (rows) in $\boldsymbol{X}$

We then need to calculate $\boldsymbol{x'x}$, which is the $k \times k$ deviation sums of squares and cross-products matrix for $\boldsymbol{x}$. Dividing each term in that matrix by $n$ gives the variance-covariance matrix $\boldsymbol{V}$.

\[\boldsymbol{V}=\boldsymbol{x'x}(1/n)\]

Where:

- $\boldsymbol{V}$ is the $k \times k$ variance-covariance matrix: the diagonal elements are the variances of each variable and the off-diagonal elements are the covariances between pairs of variables

Now that we understand how to use linear algebra to calculate the variance-covariance matrix, let's look at how we can use Python. Numpy has the function `np.cov`, which calculates the covariance matrix. Note that `np.cov` normalises by $n$ when `bias=True` rather than the default $n - 1$.

```python
import numpy as np

X = np.array([[1, 2], [3, 6]])
np.cov(X.T, bias=True)
```

```
array([[1., 2.],
       [2., 4.]])
```
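To see what the `bias` parameter changes, here is a minimal sketch comparing the default $n - 1$ normalisation against `bias=True` on the same example matrix:

```python
import numpy as np

# Same example matrix as above: rows are observations, columns are variables.
X = np.array([[1, 2], [3, 6]])

# Default: normalises by n - 1 (sample covariance).
print(np.cov(X.T))
# -> [[2., 4.],
#     [4., 8.]]

# bias=True: normalises by n, matching V = x'x (1/n) above.
print(np.cov(X.T, bias=True))
# -> [[1., 2.],
#     [2., 4.]]
```

With only $n = 2$ observations, the two normalisations differ by a factor of two.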

The convenience of this function is not enough for me - I want to check that I can reproduce the result by following the linear algebra as closely as possible.

```python
# Deviation scores: x = X - 1 1' X (1/n), where 1 1' is an n x n matrix of ones.
x = X - np.dot(np.ones((X.shape[0], X.shape[0])), X) * 1 / X.shape[0]
# Variance-covariance matrix: V = x'x (1/n)
var_co_var_x = np.dot(x.T, x) * 1 / X.shape[0]
var_co_var_x
```

```
array([[1., 2.],
       [2., 4.]])
```
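As a final sanity check, the two equations above can be run end to end and compared against numpy's built-in. This is a self-contained sketch of the same calculation:

```python
import numpy as np

X = np.array([[1, 2], [3, 6]])
n = X.shape[0]

# Deviation scores: x = X - 1 1' X (1/n), with 1 an n x 1 column of ones,
# so np.ones((n, n)) plays the role of 1 1'.
x = X - np.ones((n, n)) @ X / n

# Variance-covariance matrix: V = x'x (1/n)
V = x.T @ x / n

# The manual result matches numpy's built-in with bias=True (normalise by n).
assert np.allclose(V, np.cov(X.T, bias=True))
print(V)
# -> [[1., 2.],
#     [2., 4.]]
```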

Sweet.