The Perils of Estimating Correlation Directly from Historical Returns

Assume that you need to estimate the correlation between two assets, A and B. Maybe you need this estimate for risk management, portfolio allocation, or some other purpose. You decide to estimate the correlation from five years of monthly returns. The rationale is that five years provide a reasonable number of data points without reaching too far back, into a period when the correlation structure was likely different.

Unbeknownst to you, the true distribution of returns for both assets is normal with a mean of 8.0% and a standard deviation of 20.0%.  Furthermore, the assets have a constant correlation of zero.

Using simulation, we can study the distribution of historical five-year correlation estimates that may arise even though the true correlation is zero. We repeat the following process 100,000 times:

  1. Simulate five years (60 months) of returns for each asset, assuming zero correlation
  2. Calculate the sample correlation between the two return series over the five-year period
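The two-step procedure above can be sketched in Python with NumPy. This is a minimal sketch, not the original simulation code: treating 8% and 20% as annual figures and scaling them to monthly values (0.08/12 and 0.20/sqrt(12)), the random seed, and the chunk size are all assumptions, and `row_corr` and `simulate_correlations` are hypothetical helper names. Because the sample correlation is unchanged by shifting or rescaling either series, the choice of mean and volatility does not affect the estimates. Returns are generated in chunks so that memory stays modest while still producing 100,000 estimates.

```python
import numpy as np

def row_corr(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pearson correlation between matching rows of two 2-D arrays."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

def simulate_correlations(n_sims=100_000, n_months=60, mean=0.08 / 12,
                          vol=0.20 / np.sqrt(12), seed=0, chunk=10_000):
    """Sample correlations of two independent normal monthly return series.

    Assumption: 8% mean and 20% volatility are annual figures scaled to
    monthly values; the sample correlation does not depend on them.
    """
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for start in range(0, n_sims, chunk):
        m = min(chunk, n_sims - start)
        # Step 1: m independent 60-month histories per asset (true correlation 0)
        a = rng.normal(mean, vol, size=(m, n_months))
        b = rng.normal(mean, vol, size=(m, n_months))
        # Step 2: sample correlation of each pair of histories
        out[start:start + m] = row_corr(a, b)
    return out

corrs = simulate_correlations()
lo, hi = np.percentile(corrs, [2.5, 97.5])
print(f"central 95% band: [{lo:.2f}, {hi:.2f}]")
print(f"share inside [-0.21, 0.21]: {np.mean(np.abs(corrs) <= 0.21):.3f}")
print(f"min {corrs.min():.2f}, max {corrs.max():.2f}")
```

The same `corrs` array can be passed to any plotting library to draw a histogram of the estimates.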

The resulting estimates are highly variable. Roughly 90% of them fall between -0.21 and 0.21, and the central 95% spans about -0.25 to 0.25. The minimum correlation estimate in the simulation was -0.54 and the maximum was 0.59.
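The width of these bands can also be checked without simulation. For bivariate normal data with zero true correlation, r·sqrt(n−2)/sqrt(1−r²) follows a Student t distribution with n−2 degrees of freedom; the sketch below instead uses Fisher's z-transformation, a close approximation that needs only the Python standard library. This is a swapped-in analytic cross-check, not part of the original analysis, and `null_corr_band` is a hypothetical helper name.

```python
from math import sqrt, tanh
from statistics import NormalDist

def null_corr_band(n_obs: int, coverage: float) -> float:
    """Approximate half-width of the central `coverage` band of sample
    correlations when the true correlation is zero.

    Uses Fisher's z-transformation: atanh(r) is roughly Normal with
    mean 0 and variance 1 / (n_obs - 3) for bivariate normal data.
    """
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2)  # two-sided critical value
    return tanh(z_crit / sqrt(n_obs - 3))

for coverage in (0.90, 0.95):
    print(f"{coverage:.0%} band, 60 months: +/-{null_corr_band(60, coverage):.3f}")
```

For 60 monthly observations this gives roughly ±0.21 for the central 90% and ±0.25 for the central 95%. Doubling the window to 120 months only narrows the 95% band to about ±0.18, because the band shrinks roughly with the square root of the number of observations.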

The full distribution of correlation estimates from the simulation is presented in the following histogram.

This degree of estimation error arises even when the return structure is highly simplified (normally distributed returns, constant correlation). The problem becomes more pressing still once more realistic features are introduced:

  • Correlation regime changes
  • Non-linear return relationships (think of correlations going to 1 in market crashes)
  • Non-normality

In particular, correlation regime changes and non-linear return relationships shorten the historical period that is appropriate for estimating correlations, and the resulting smaller sample only exacerbates estimation error.
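The trade-off created by a regime change can be illustrated with a small, purely hypothetical simulation. In this sketch the true correlation is 0 for the first 36 months and 0.5 for the last 24 (both values are made up for illustration), and returns are standardized, which does not affect correlations. It compares estimates from the full 60-month window with estimates from the 24-month post-change window only; `regime_change_estimates` and `row_corr` are hypothetical helper names, and NumPy is assumed.

```python
import numpy as np

def row_corr(x, y):
    """Pearson correlation between matching rows of two 2-D arrays."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

def regime_change_estimates(n_sims=20_000, n_months=60, recent=24,
                            rho_old=0.0, rho_new=0.5, seed=1):
    """Hypothetical regime change: true correlation is rho_old for the first
    n_months - recent months and rho_new for the last `recent` months.
    Returns estimates from the full window and from the recent window only."""
    rng = np.random.default_rng(seed)
    rho = np.concatenate([np.full(n_months - recent, rho_old), np.full(recent, rho_new)])
    a = rng.standard_normal((n_sims, n_months))
    noise = rng.standard_normal((n_sims, n_months))
    # Standardized returns: corr(a[:, t], b[:, t]) equals rho[t] in every month
    b = rho * a + np.sqrt(1 - rho ** 2) * noise
    return row_corr(a, b), row_corr(a[:, -recent:], b[:, -recent:])

full, recent = regime_change_estimates()
for label, est in (("full 60 months", full), ("last 24 months", recent)):
    lo, hi = np.percentile(est, [2.5, 97.5])
    print(f"{label}: mean {est.mean():.2f}, central 95% band [{lo:.2f}, {hi:.2f}]")
```

With these made-up parameters, the full window averages out near 0.2, a blend of the two regimes that describes neither, while the 24-month window is centred near the current 0.5 but with a noticeably wider spread.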