Pearson’s Correlation Coefficient (CORREL): Measuring Linear Relationships Between Assets

Pearson’s correlation coefficient, often denoted as ‘r’, is a statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. In finance, these variables are typically the price series of two different assets (e.g., the closing prices of Stock A and Stock B, or a stock and a market index) over a specified N-period window.
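
For reference, the standard sample formula for Pearson’s r over a window of N paired observations is sketched below. The symbols are notation introduced here, not taken from the text above: x_i and y_i are the i-th observations of the two series, and the barred terms are their means over the window.

```latex
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
```

The numerator captures how the two series move together around their means, and the denominator rescales that co-movement by each series’ own variability, which is what bounds r between -1 and +1.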

The value of Pearson’s ‘r’ ranges from -1 to +1:

+1 (Perfect Positive Correlation): Indicates a perfect positive linear relationship. When one asset’s price increases, the other asset’s price increases proportionally. They move in perfect lockstep in the same direction.
-1 (Perfect Negative Correlation): Indicates a perfect negative linear relationship. When one asset’s price increases, the other asset’s price decreases proportionally. They move in perfect lockstep but in opposite directions.
0 (No Linear Correlation): Indicates no linear relationship between the movements of the two assets. Their price changes are uncorrelated, although not necessarily independent: a non-linear dependence can still exist.
Values between 0 and +1 indicate varying degrees of positive linear correlation (e.g., +0.7 is a strong positive correlation, +0.2 is a weak positive correlation).
Values between 0 and -1 indicate varying degrees of negative linear correlation (e.g., -0.7 is a strong negative correlation, -0.2 is a weak negative correlation).

It’s crucial to remember that correlation measures linear association. Two variables could have a strong non-linear relationship but still have a correlation coefficient close to zero. Also, correlation does not imply causation.
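
A minimal sketch of both points, using synthetic data rather than market prices. The helper name pearson_r is hypothetical; it simply implements the textbook definition with NumPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from the textbook definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Synthetic inputs, symmetric around zero
x = np.linspace(-1.0, 1.0, 201)

r_linear = pearson_r(x, 2.0 * x + 1.0)    # exact increasing line
r_inverse = pearson_r(x, -3.0 * x + 5.0)  # exact decreasing line
r_parabola = pearson_r(x, x ** 2)         # y fully determined by x, but not linearly

print(f"linear:   {r_linear:+.3f}")
print(f"inverse:  {r_inverse:+.3f}")
print(f"parabola: {r_parabola:+.3f}")
```

The two straight lines give coefficients at the extremes of the range, while the parabola gives a value close to zero even though y depends entirely on x.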

Usage: Understanding the correlation between different assets is fundamental in finance for several reasons:

Portfolio Diversification: A cornerstone of modern portfolio theory is diversification, which aims to reduce overall portfolio risk. This is often achieved by combining assets that have low or, ideally, negative correlations with each other. If assets move independently or oppositely, losses in one asset might be offset by gains in another.
Pairs Trading: This strategy involves identifying two assets that have historically exhibited a high correlation. When the correlation temporarily weakens and the prices of these assets diverge significantly from their historical relationship, a trader might go long on the underperforming asset and short on the outperforming asset, betting that their prices will eventually reconverge.
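Risk Assessment: Knowing how an asset correlates with broader market indices (e.g., S&P 500) or other risk factors helps in assessing its systematic risk and how it might behave in different market environments. Beta, the usual measure of systematic risk, equals the correlation between the asset’s returns and the market’s returns multiplied by the ratio of their volatilities.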
Hedging: Correlation can help in finding assets that can be used to hedge an existing position. For example, if you hold an asset, you might short a highly positively correlated asset or go long a highly negatively correlated asset to offset potential losses.
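
The diversification and risk-assessment points above can be made concrete with two standard formulas, both stated for returns rather than prices: the volatility of a two-asset portfolio, and beta expressed through correlation. The sketch below is illustrative only; the function names, weights, and volatilities are hypothetical inputs chosen for the example.

```python
import math

def two_asset_volatility(w1, w2, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio from weights, volatilities, and correlation."""
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error

def beta_from_correlation(rho, sigma_asset, sigma_market):
    """Beta = correlation scaled by the ratio of asset to market volatility."""
    return rho * sigma_asset / sigma_market

# Hypothetical inputs: equal weights, both assets with 20% volatility
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.5, 0.5, 0.20, 0.20, rho)
    print(f"rho = {rho:+.1f} -> portfolio volatility = {vol:.4f}")

# Hypothetical inputs: correlation 0.8, asset volatility 30%, market volatility 15%
print(f"beta = {beta_from_correlation(0.8, 0.30, 0.15):.2f}")
```

With these inputs, portfolio volatility falls steadily as the correlation moves from +1 toward -1, which is the effect diversification relies on.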

To calculate a rolling correlation, you need two price series of equal length that are aligned on the same dates. Each output value is then computed from the most recent N observations of both series.

TA-Lib Function: The Technical Analysis Library (TA-Lib) provides a function to calculate the rolling Pearson’s correlation coefficient:

talib.CORREL(prices1, prices2, timeperiod=N)

prices1: The first array or series of prices.
prices2: The second array or series of prices. It must be the same length as prices1.
timeperiod: The lookback period (number of observations) over which to calculate the correlation. TA-Lib’s default is 30.
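
To see what a rolling correlation produces without installing TA-Lib, the sketch below uses pandas’ rolling(...).corr(...) on synthetic data. It assumes, without verifying it here, that talib.CORREL computes the same windowed Pearson coefficient; the variable names and the window of 10 are choices made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic, reproducible price-like series (not market data)
rng = np.random.default_rng(seed=0)
shared = rng.normal(size=60)
prices1 = pd.Series(100.0 + np.cumsum(shared + 0.5 * rng.normal(size=60)))
prices2 = pd.Series(50.0 + np.cumsum(shared + 0.5 * rng.normal(size=60)))

window = 10  # plays the role of timeperiod=N
rolling_corr = prices1.rolling(window).corr(prices2)

# The first (window - 1) outputs are NaN because the window is not yet full
print(rolling_corr.isna().sum())

# Each later value is Pearson's r over the trailing `window` observations
manual_last = np.corrcoef(prices1.iloc[-window:], prices2.iloc[-window:])[0, 1]
print(np.isclose(rolling_corr.iloc[-1], manual_last))
```

The output has the same length as the inputs, so it can be plotted against the same date index as the prices.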

Code Example (Calculation & Plot with yfinance Data):

The following Python code demonstrates how to fetch data for two assets using yfinance, align their price series, calculate their rolling correlation, and then plot one asset’s price along with the correlation coefficient.

import yfinance as yf
import talib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# --- 1. Data Fetching and Alignment ---
asset1_symbol = "AAPL"  # Example: Apple Inc.
asset2_symbol = "MSFT"  # Example: Microsoft Corp. (or use a market index like "SPY")

data_start_date = "2023-01-01"
data_end_date = "2024-05-01"  # End date for the download (yfinance treats it as exclusive)

try:
    # auto_adjust=False keeps the unadjusted 'Close' column (alongside 'Adj Close')
    data_asset1 = yf.download(asset1_symbol, start=data_start_date, end=data_end_date, auto_adjust=False, progress=False)
    data_asset2 = yf.download(asset2_symbol, start=data_start_date, end=data_end_date, auto_adjust=False, progress=False)

    if data_asset1.empty or data_asset2.empty:
        raise ValueError("Data download failed for one or both assets.")

    # yfinance may return MultiIndex columns (field, ticker); drop the ticker level to flatten them
    if isinstance(data_asset1.columns, pd.MultiIndex) and data_asset1.columns.nlevels > 1:
        data_asset1.columns = data_asset1.columns.droplevel(level=1)
    if isinstance(data_asset2.columns, pd.MultiIndex) and data_asset2.columns.nlevels > 1:
        data_asset2.columns = data_asset2.columns.droplevel(level=1)

    # Align data: Use 'Close' prices and join on date index, then drop NaNs from merged data
    close_prices_asset1 = data_asset1[['Close']].rename(columns={'Close': f'Close_{asset1_symbol}'})
    close_prices_asset2 = data_asset2[['Close']].rename(columns={'Close': f'Close_{asset2_symbol}'})
    
    aligned_data = close_prices_asset1.join(close_prices_asset2, how='inner').dropna()

    if aligned_data.empty or len(aligned_data) < 2: # Need at least 2 points for correlation
        raise ValueError("Not enough overlapping data after alignment.")
        
    prices1 = aligned_data[f'Close_{asset1_symbol}']
    prices2 = aligned_data[f'Close_{asset2_symbol}']
    aligned_date_index = aligned_data.index

except Exception as e:
    print(f"Error in data fetching or alignment: {e}")
    # Fall back to an empty frame so the calculation and plotting below are skipped
    aligned_data = pd.DataFrame()


if not aligned_data.empty:
    # --- 2. Pearson's Correlation Coefficient (CORREL) Calculation ---
    time_period_correl = 30    # Example: 30-day rolling correlation

    if len(prices1) >= time_period_correl:
        correl_values = talib.CORREL(
            prices1, 
            prices2,
            timeperiod=time_period_correl
        )
        
        indicator_name = f"CORREL({time_period_correl})"
        print(f"\n--- {indicator_name} - Pearson's Correlation ({asset1_symbol} vs {asset2_symbol}) ---")
        
        # TA-Lib's CORREL returns NaNs for the first (timeperiod-1) values
        valid_correl_values = correl_values[~np.isnan(correl_values)]
        if len(valid_correl_values) >= 5:
            print(f"Output {indicator_name} (last 5 valid): {valid_correl_values[-5:].round(3)}")
        elif len(valid_correl_values) > 0:
            print(f"Output {indicator_name} (all valid): {valid_correl_values.round(3)}")
        else:
            print(f"Output {indicator_name}: No valid values calculated (all NaNs).")

        # --- 3. Plotting ---
        # The correl_values array will have NaNs at the beginning. Matplotlib will plot available data.
        # The aligned_date_index corresponds to the full length of prices1, prices2, and correl_values.
        
        fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True,
                                 gridspec_kw={'height_ratios': [2, 1]}) # Price chart taller

        # Plot Price of Asset 1 (for context)
        axes[0].plot(aligned_date_index, prices1, label=f'{asset1_symbol} Close Price', color='blue')
        axes[0].set_title(f'{asset1_symbol} Price and Rolling Correlation with {asset2_symbol}')
        axes[0].set_ylabel(f'{asset1_symbol} Price')
        axes[0].legend(loc='upper left')
        axes[0].grid(True, linestyle=':', alpha=0.6)

        # Plot Correlation Coefficient
        axes[1].plot(aligned_date_index, correl_values, label=indicator_name, color='purple')
        axes[1].axhline(1.0, color='red', linestyle='--', linewidth=0.8, label='Perfect Positive (+1)')
        axes[1].axhline(0, color='gray', linestyle=':', linewidth=0.8, label='No Correlation (0)')
        axes[1].axhline(-1.0, color='green', linestyle='--', linewidth=0.8, label='Perfect Negative (-1)')
        axes[1].set_ylim(-1.1, 1.1) # Correlation ranges from -1 to 1
        axes[1].set_ylabel('Correlation Coefficient')
        axes[1].set_xlabel('Date')
        axes[1].legend(loc='lower left')
        axes[1].grid(True, linestyle=':', alpha=0.6)

        plt.tight_layout()
        plt.show()

    else:
        print(f"\nSkipping CORREL plot: Insufficient aligned data (need >= {time_period_correl} points).")
        print(f"Available aligned data points: {len(prices1)}.")
else:
    print("\nSkipping CORREL plot: Data preparation failed.")

Explanation of the Code:

Import Libraries: Includes yfinance, talib, numpy, matplotlib.pyplot, and pandas.
Data Fetching and Alignment:
- Data for two specified assets (asset1_symbol, asset2_symbol) is downloaded using yf.download(). auto_adjust=False keeps the unadjusted ‘Close’ column, and droplevel flattens the columns if yfinance returns them as a MultiIndex.
- Crucially, the ‘Close’ prices of the two assets are extracted and then aligned using a pd.DataFrame.join(..., how='inner'). This ensures that only dates where both assets have price data are kept. dropna() is called on the merged data to remove any remaining NaNs. This step is vital for a correct correlation calculation.
- Error handling is included for data download and alignment.
Pearson’s Correlation Calculation:
- time_period_correl (e.g., 30) is set for the rolling window.
- A check if len(prices1) >= time_period_correl: ensures enough aligned data points.
- talib.CORREL(prices1, prices2, timeperiod=time_period_correl) calculates the rolling correlation. The output will have NaN values for the first timeperiod - 1 entries.
- The last few valid correlation values are printed.
Plotting:
- A two-panel plot is created using plt.subplots().
- Price Plot (axes[0]): The top subplot displays the closing prices of the first asset (prices1) to provide price context.
- Correlation Plot (axes[1]): The bottom subplot displays the calculated rolling correlation coefficient.
  - Horizontal lines are drawn at +1, 0, and -1 to mark perfect positive, no, and perfect negative correlation levels, respectively.
  - The Y-axis is set from -1.1 to +1.1 to clearly show the correlation range.
- Standard plot elements (titles, labels, legends, grid) enhance readability.
- plt.tight_layout() adjusts subplot spacing.
- plt.show() displays the chart.
Insufficient Data Handling: Messages are printed if data is insufficient at various stages.
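
The alignment step described above is easy to check in isolation. The following sketch uses small hypothetical frames (made-up dates and prices, with one missing value) to show what join(..., how='inner') followed by dropna() keeps.

```python
import pandas as pd

# Hypothetical close prices on partly different trading days
asset_a = pd.DataFrame(
    {"Close_A": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
asset_b = pd.DataFrame(
    {"Close_B": [20.0, float("nan"), 22.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-08"]),
)

inner = asset_a.join(asset_b, how="inner")  # keeps only dates present in both frames
aligned = inner.dropna()                    # then removes rows with a missing price

print(len(inner), len(aligned))
print(aligned)
```

Only 2024-01-03 survives both steps here: the other dates are either missing from one frame or carry a NaN price, and either case would distort a rolling correlation.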

By calculating and visualizing the rolling correlation, traders and investors can gain valuable insights into how different assets interact, aiding in diversification strategies, pairs trading idea generation, and overall risk management.