Covariance Matrix Estimation

Overview

Accurate covariance estimation is critical for portfolio optimization and risk management. The challenge is balancing responsiveness with stability.

Exponential Weighting

Most common approach: weight recent observations more heavily, using a decay factor $\lambda \in (0, 1)$ and treating returns $r_t$ as zero-mean.

\[\Sigma_t = (1-\lambda) \sum_{i=0}^{\infty} \lambda^i r_{t-i} r_{t-i}^T\]

Typical half-life: 21-63 days, with decay factor $\lambda = \exp(-\log(2)/HL) = 2^{-1/HL}$, so that the weight on each new observation is $1 - \lambda$.
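The infinite sum above is rarely evaluated directly, because it satisfies the recursion $\Sigma_t = \lambda \Sigma_{t-1} + (1-\lambda) r_t r_t^T$. A minimal NumPy sketch of that update follows, taking $\lambda$ as the decay factor with $\lambda^{HL} = 1/2$. The function names, the zero-mean treatment of returns, and the choice to seed the recursion with the first outer product are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def decay_from_halflife(halflife):
    """Decay factor lambda such that lambda ** halflife == 0.5."""
    return 0.5 ** (1.0 / halflife)

def ewma_cov(returns, halflife=21):
    """Recursive EWMA covariance of a (T x N) array of returns, assumed zero-mean."""
    r = np.asarray(returns, dtype=float)
    lam = decay_from_halflife(halflife)
    # Seed with the first outer product (an assumption; other seeds are common)
    sigma = np.outer(r[0], r[0])
    for row in r[1:]:
        # Sigma_t = lambda * Sigma_{t-1} + (1 - lambda) * r_t r_t^T
        sigma = lam * sigma + (1.0 - lam) * np.outer(row, row)
    return sigma

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(scale=0.01, size=(500, 3))
sigma = ewma_cov(sample, halflife=21)
```

Because each step is a convex combination of positive semi-definite matrices, the result stays symmetric and positive semi-definite by construction.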

Advantages:

Simple to implement
Automatically adapts to changing volatility
No arbitrary lookback window

Disadvantages:

Can be too reactive during volatile periods
May underweight important historical regimes

Shrinkage

Combine the sample covariance with a structured estimator (the shrinkage target), using an intensity $\delta \in [0, 1]$:

\[\Sigma_{shrunk} = \delta \Sigma_{target} + (1-\delta) \Sigma_{sample}\]

Common targets:

Scaled identity matrix (zero correlations, equal variances)
Constant correlation model
Factor model
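As a sketch of how one of these targets might be built, the example below constructs a constant correlation target from a sample covariance matrix and blends the two with a hand-picked $\delta$. The function names `constant_correlation_target` and `shrink`, the synthetic data, and the value `delta=0.3` are all assumptions made for illustration.

```python
import numpy as np

def constant_correlation_target(sample_cov):
    """Target that keeps the sample variances and uses one average correlation."""
    s = np.asarray(sample_cov, dtype=float)
    std = np.sqrt(np.diag(s))
    corr = s / np.outer(std, std)
    n = s.shape[0]
    # Average of the off-diagonal correlations (the diagonal contributes n)
    mean_corr = (corr.sum() - n) / (n * (n - 1))
    target = mean_corr * np.outer(std, std)
    np.fill_diagonal(target, np.diag(s))
    return target

def shrink(sample_cov, target, delta):
    """Linear shrinkage: delta * target + (1 - delta) * sample."""
    return delta * np.asarray(target) + (1.0 - delta) * np.asarray(sample_cov)

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
data = rng.normal(size=(60, 4))
sample_cov = np.cov(data, rowvar=False)
target = constant_correlation_target(sample_cov)
shrunk = shrink(sample_cov, target, delta=0.3)  # delta chosen by hand here
```

Since this target keeps the sample variances on its diagonal, shrinkage toward it changes only the covariances, not the individual asset volatilities.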

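Ledoit-Wolf optimal shrinkage intensity (the $\delta$ that minimizes expected squared Frobenius error), shown for a fixed target and an unbiased sample covariance:

\[\delta^* = \frac{\sum_{i,j} \text{Var}(\Sigma_{sample,ij})}{\sum_{i,j} \mathbb{E}[(\Sigma_{sample,ij} - \Sigma_{target,ij})^2]}\]

In practice the expectations are replaced by sample estimates and $\delta^*$ is clipped to $[0, 1]$.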
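For the scaled-identity target, the ratio above can be estimated directly from the data. The sketch below follows the estimator usually attributed to Ledoit and Wolf (2004) for that target: the numerator is an estimate of the sampling variance of the sample covariance entries, and the denominator is the squared distance between sample and target. The function name, the 1/T covariance convention, and the synthetic data are assumptions; other targets need a different numerator.

```python
import numpy as np

def ledoit_wolf_identity(returns):
    """Shrink the sample covariance toward mu * I, estimating delta from the data."""
    x = np.asarray(returns, dtype=float)
    x = x - x.mean(axis=0)             # demean each asset
    t, n = x.shape
    sample = x.T @ x / t               # sample covariance (1/T convention)
    mu = np.trace(sample) / n          # average variance sets the target scale
    target = mu * np.eye(n)
    # Denominator: squared Frobenius distance between sample and target
    d2 = np.sum((sample - target) ** 2)
    # Numerator: estimated sampling variance of the sample covariance entries
    b2 = sum(np.sum((np.outer(row, row) - sample) ** 2) for row in x) / t**2
    # Clip so that the intensity stays in [0, 1]
    delta = 0.0 if d2 == 0 else min(b2, d2) / d2
    return delta * target + (1.0 - delta) * sample, delta

# More assets than observations, so the sample covariance is singular
rng = np.random.default_rng(2)
panel = rng.normal(size=(30, 50))
shrunk_cov, delta = ledoit_wolf_identity(panel)
```

With fewer observations than assets the sample covariance cannot be inverted, while the shrunk estimate is positive definite whenever the estimated intensity is above zero.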

Factor Models

For large universes, use factor structure:

\[\Sigma = B F B^T + D\]

where $B$ is the matrix of factor loadings, $F$ is the factor covariance matrix, and $D$ is the diagonal matrix of idiosyncratic variances.
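One way to assemble this decomposition from data is sketched below, assuming the factor returns are observed: loadings come from a least-squares regression of asset returns on factor returns, $F$ is the sample covariance of the factors, and $D$ holds the residual variances. The function name, the use of plain OLS, and the `ddof=1` residual variance are simplifying assumptions; production risk models typically estimate each piece more carefully.

```python
import numpy as np

def factor_model_cov(asset_returns, factor_returns):
    """Covariance B F B^T + D from (T x N) asset and (T x K) factor returns."""
    r = np.asarray(asset_returns, dtype=float)
    f = np.asarray(factor_returns, dtype=float)
    r = r - r.mean(axis=0)
    f = f - f.mean(axis=0)
    # Loadings by least squares: r ~ f @ coef, where coef has shape (K, N)
    coef, *_ = np.linalg.lstsq(f, r, rcond=None)
    loadings = coef.T                                     # B, shape (N, K)
    factor_cov = np.atleast_2d(np.cov(f, rowvar=False))   # F, shape (K, K)
    resid = r - f @ coef
    idio_var = resid.var(axis=0, ddof=1)                  # diagonal of D
    return loadings @ factor_cov @ loadings.T + np.diag(idio_var)

# Synthetic data: 20 assets driven by 3 factors plus noise
rng = np.random.default_rng(3)
factors = rng.normal(scale=0.01, size=(250, 3))
true_loadings = rng.normal(size=(3, 20))
assets = factors @ true_loadings + rng.normal(scale=0.005, size=(250, 20))
sigma_fm = factor_model_cov(assets, factors)
```

Under these choices the model reproduces each asset's sample variance exactly, and only the off-diagonal entries are restricted to what the factors explain.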

Advantages:

Reduces dimensionality
More stable estimates
Economic interpretation

Disadvantages:

Requires factor selection
May miss important pair-wise relationships

Practical Implementation

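import numpy as np
import pandas as pd

def exp_weighted_cov(returns, halflife=21, min_periods=252):
    """
    Calculate exponentially weighted covariance matrix.

    Parameters
    ----------
    returns : pd.DataFrame
        Asset returns (dates x assets)
    halflife : float
        Half-life for exponential weighting, in observations
    min_periods : int
        Minimum number of observations required for an estimate

    Returns
    -------
    pd.DataFrame
        Covariance matrix (assets x assets) as of the last date
    """
    # Fail loudly instead of silently returning an all-NaN matrix
    if len(returns) < min_periods:
        raise ValueError(
            f"Need at least {min_periods} observations, got {len(returns)}"
        )

    # Smoothing factor: the weight placed on the newest observation
    alpha = 1 - np.exp(-np.log(2) / halflife)

    # Exponentially weighted covariance, indexed by (date, asset);
    # pandas demeans with the weighted mean and bias-corrects by default
    cov = returns.ewm(alpha=alpha, min_periods=min_periods).cov()

    # Return most recent estimate
    return cov.loc[returns.index[-1]]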

Choosing Half-Life

Shorter half-life (10-21 days):

More responsive to regime changes
Higher turnover
Risk of overreacting to noise

Longer half-life (42-126 days):

More stable
Lower turnover
Risk of missing regime changes
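To make the trade-off concrete, a half-life can be translated into an effective number of observations, $(1+\lambda)/(1-\lambda)$, which is the reciprocal of the sum of squared exponential weights. The sketch below tabulates this for a few half-lives, along with the share of total weight carried by the latest 21 observations. The function name and the 21-observation window are illustrative assumptions.

```python
import numpy as np

def ewma_summary(halflife, window=21):
    """Decay factor, effective sample size, and weight on the latest `window` obs."""
    lam = 0.5 ** (1.0 / halflife)
    # Reciprocal of the sum of squared weights (1 - lam) * lam ** i
    effective_obs = (1.0 + lam) / (1.0 - lam)
    # Total weight on the most recent `window` observations
    recent_weight = 1.0 - lam ** window
    return lam, effective_obs, recent_weight

for hl in (10, 21, 42, 126):
    lam, n_eff, w_recent = ewma_summary(hl)
    print(f"HL={hl:>3}  lambda={lam:.4f}  "
          f"effective obs={n_eff:6.1f}  weight on last 21 obs={w_recent:.1%}")
```

A short half-life concentrates the estimate on few effective observations, which is the source of both its responsiveness and its noise.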

Recommendation: Use a half-life of 21-42 days for futures, with adaptive scaling for regime changes.