Pearson’s Correlation Coefficient (CORREL): Measuring Linear Relationships Between Assets

Pearson’s correlation coefficient, often denoted ‘r’, is a statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. In finance, these variables are typically the price series of two assets (e.g., the closing prices of Stock A and Stock B, or of a stock and a market index) over an N-period window. Because two price series that both trend can show a high correlation even when their day-to-day moves are unrelated, many analysts correlate returns rather than price levels.
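For a window of N paired observations (x_i, y_i) with sample means x̄ and ȳ, the standard definition of Pearson’s r is:

```latex
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
```

The numerator is N times the (population) covariance and the denominator is N times the product of the two standard deviations, so r is the covariance rescaled to lie in [-1, 1].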

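The value of Pearson’s ‘r’ ranges from -1 to +1:
  • +1: perfect positive linear relationship (the points lie exactly on an increasing straight line).
  • 0: no linear relationship.
  • -1: perfect negative linear relationship (the points lie exactly on a decreasing straight line).
Values in between indicate weaker linear relationships, with the sign giving the direction.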
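The ends of this range can be checked numerically. The sketch below computes r directly from its definition; pearson_r is a hypothetical helper written for illustration, not part of TA-Lib. Any exact increasing straight line gives +1, any exact decreasing one gives -1, and a noisy upward relationship lands in between.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed directly from the definition (deviations from the mean)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = pearson_r(x, 3.0 * x + 2.0)              # exact increasing line
r_neg = pearson_r(x, -0.5 * x + 10.0)            # exact decreasing line
r_mixed = pearson_r([1, 2, 3, 4], [2, 4, 5, 4])  # noisy upward relationship

print(round(r_pos, 6), round(r_neg, 6), round(r_mixed, 4))
```

For the four noisy points the result is about 0.718: clearly positive, but well short of a perfect linear fit.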

It’s crucial to remember that correlation measures linear association. Two variables could have a strong non-linear relationship but still have a correlation coefficient close to zero. Also, correlation does not imply causation.
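A quick NumPy check illustrates the first point, using only numpy.corrcoef: y = x² is completely determined by x, yet over a range symmetric around zero its Pearson correlation with x is essentially zero, while an exact straight line gives +1.

```python
import numpy as np

# A perfect but non-linear (U-shaped) relationship: y is fully determined by x.
x = np.linspace(-1.0, 1.0, 201)   # symmetric around zero
y = x ** 2

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(x, y).
r_quadratic = np.corrcoef(x, y)[0, 1]

# For comparison, an exact linear relationship.
r_linear = np.corrcoef(x, 2.0 * x + 1.0)[0, 1]

print(f"r for y = x^2:    {r_quadratic:+.6f}")
print(f"r for y = 2x + 1: {r_linear:+.6f}")
```

The rising and falling halves of the parabola cancel out in the covariance, which is exactly the kind of relationship a linear measure cannot see.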

Usage: Understanding the correlation between assets is fundamental in finance, in particular for building diversified portfolios, generating pairs-trading ideas, and managing risk.

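Calculating a rolling correlation requires two input price arrays of equal length whose elements are aligned by date, so that position i in both arrays refers to the same trading day. Each output value is the correlation over the most recent N pairs.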
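As a minimal sketch of that alignment requirement, assume two small hypothetical price series whose trading calendars differ (the dates and values are made up for illustration). An inner join on the date index, the same approach the full example below uses, produces two equal-length arrays whose positions line up:

```python
import pandas as pd

# Hypothetical closes: A has no row for 2024-01-04, B has no row for 2024-01-08.
a = pd.Series([10.0, 10.5, 10.2, 10.8],
              index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-05", "2024-01-08"]),
              name="A")
b = pd.Series([20.0, 20.4, 20.9, 20.6],
              index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
              name="B")

# The inner join keeps only dates present in both series, so position i in
# each output array refers to the same trading day.
aligned = a.to_frame().join(b.to_frame(), how="inner").dropna()
prices1 = aligned["A"].to_numpy()
prices2 = aligned["B"].to_numpy()

print(aligned.index.strftime("%Y-%m-%d").tolist())
print(len(prices1), len(prices2))
```

Only 2024-01-02, 2024-01-03, and 2024-01-05 are common to both series, so both arrays have length 3.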

TA-Lib Function: The Technical Analysis Library (TA-Lib) provides a function to calculate the rolling Pearson’s correlation coefficient:

talib.CORREL(prices1, prices2, timeperiod=N)
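To make the rolling calculation concrete, here is a NumPy-only sketch of what a rolling Pearson correlation computes. It is an illustration, not TA-Lib’s source: rolling_pearson is a hypothetical helper, and it follows the same output layout described below for CORREL (the first timeperiod - 1 entries are NaN). For ordinary inputs its values should match talib.CORREL up to floating-point rounding; for a window where one series is constant this sketch returns NaN, and TA-Lib’s handling of that case may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_pearson(x, y, timeperiod=30):
    """Rolling Pearson correlation of two equal-length 1-D arrays.

    Entry i uses observations i - timeperiod + 1 .. i of both inputs;
    the first timeperiod - 1 entries are NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    if timeperiod < 2:
        raise ValueError("timeperiod must be at least 2")
    out = np.full(x.shape, np.nan)
    if len(x) < timeperiod:
        return out
    # Each row of wx / wy is one window of length timeperiod.
    wx = sliding_window_view(x, timeperiod)
    wy = sliding_window_view(y, timeperiod)
    dx = wx - wx.mean(axis=1, keepdims=True)
    dy = wy - wy.mean(axis=1, keepdims=True)
    num = (dx * dy).sum(axis=1)
    den = np.sqrt((dx ** 2).sum(axis=1) * (dy ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        out[timeperiod - 1:] = num / den  # 0/0 -> NaN for a constant window
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0, 8.0])
r = rolling_pearson(x, y, timeperiod=4)
print(np.round(r, 4))
```

Vectorizing over windows with sliding_window_view keeps the sketch short; a running-sum update, which TA-Lib uses internally, avoids recomputing each window and scales better for long series.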

Code Example (Calculation & Plot with yfinance Data):

The following Python code demonstrates how to fetch data for two assets using yfinance, align their price series, calculate their rolling correlation, and then plot one asset’s price along with the correlation coefficient.

import yfinance as yf
import talib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# --- 1. Data Fetching and Alignment ---
asset1_symbol = "AAPL"  # Example: Apple Inc.
asset2_symbol = "MSFT"  # Example: Microsoft Corp. (or use a market index like "SPY")

data_start_date = "2023-01-01"
data_end_date = "2024-05-01" # End date (exclusive in yfinance)

try:
    # auto_adjust=False keeps the raw 'Close' column; adjusted prices go to 'Adj Close'
    data_asset1 = yf.download(asset1_symbol, start=data_start_date, end=data_end_date, auto_adjust=False, progress=False)
    data_asset2 = yf.download(asset2_symbol, start=data_start_date, end=data_end_date, auto_adjust=False, progress=False)

    if data_asset1.empty or data_asset2.empty:
        raise ValueError("Data download failed for one or both assets.")

    # Recent yfinance versions return MultiIndex columns (Price, Ticker); drop the Ticker level
    if isinstance(data_asset1.columns, pd.MultiIndex) and data_asset1.columns.nlevels > 1:
        data_asset1.columns = data_asset1.columns.droplevel(level=1)
    if isinstance(data_asset2.columns, pd.MultiIndex) and data_asset2.columns.nlevels > 1:
        data_asset2.columns = data_asset2.columns.droplevel(level=1)

    # Align data: Use 'Close' prices and join on date index, then drop NaNs from merged data
    close_prices_asset1 = data_asset1[['Close']].rename(columns={'Close': f'Close_{asset1_symbol}'})
    close_prices_asset2 = data_asset2[['Close']].rename(columns={'Close': f'Close_{asset2_symbol}'})
    
    aligned_data = close_prices_asset1.join(close_prices_asset2, how='inner').dropna()

    if aligned_data.empty or len(aligned_data) < 2: # Need at least 2 points for correlation
        raise ValueError("Not enough overlapping data after alignment.")
        
    # Plain float64 NumPy arrays for talib.CORREL
    prices1 = aligned_data[f'Close_{asset1_symbol}'].to_numpy(dtype=float)
    prices2 = aligned_data[f'Close_{asset2_symbol}'].to_numpy(dtype=float)
    aligned_date_index = aligned_data.index

except Exception as e:
    print(f"Error in data fetching or alignment: {e}")
    # Fall back to an empty DataFrame so the calculation and plot below are skipped
    aligned_data = pd.DataFrame()


if not aligned_data.empty:
    # --- 2. Pearson's Correlation Coefficient (CORREL) Calculation ---
    time_period_correl = 30    # Example: 30-day rolling correlation

    if len(prices1) >= time_period_correl:
        correl_values = talib.CORREL(
            prices1, 
            prices2,
            timeperiod=time_period_correl
        )
        
        indicator_name = f"CORREL({time_period_correl})"
        print(f"\n--- {indicator_name} - Pearson's Correlation ({asset1_symbol} vs {asset2_symbol}) ---")
        
        # TA-Lib's CORREL returns NaNs for the first (timeperiod-1) values
        valid_correl_values = correl_values[~np.isnan(correl_values)]
        if len(valid_correl_values) >= 5:
            print(f"Output {indicator_name} (last 5 valid): {valid_correl_values[-5:].round(3)}")
        elif len(valid_correl_values) > 0:
            print(f"Output {indicator_name} (all valid): {valid_correl_values.round(3)}")
        else:
            print(f"Output {indicator_name}: No valid values calculated (all NaNs).")

        # --- 3. Plotting ---
        # The correl_values array will have NaNs at the beginning. Matplotlib will plot available data.
        # The aligned_date_index corresponds to the full length of prices1, prices2, and correl_values.
        
        fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True,
                                 gridspec_kw={'height_ratios': [2, 1]}) # Price chart taller

        # Plot Price of Asset 1 (for context)
        axes[0].plot(aligned_date_index, prices1, label=f'{asset1_symbol} Close Price', color='blue')
        axes[0].set_title(f'{asset1_symbol} Price and Rolling Correlation with {asset2_symbol}')
        axes[0].set_ylabel(f'{asset1_symbol} Price')
        axes[0].legend(loc='upper left')
        axes[0].grid(True, linestyle=':', alpha=0.6)

        # Plot Correlation Coefficient
        axes[1].plot(aligned_date_index, correl_values, label=indicator_name, color='purple')
        axes[1].axhline(1.0, color='red', linestyle='--', linewidth=0.8, label='Perfect Positive (+1)')
        axes[1].axhline(0, color='gray', linestyle=':', linewidth=0.8, label='No Correlation (0)')
        axes[1].axhline(-1.0, color='green', linestyle='--', linewidth=0.8, label='Perfect Negative (-1)')
        axes[1].set_ylim(-1.1, 1.1) # Correlation ranges from -1 to 1
        axes[1].set_ylabel('Correlation Coefficient')
        axes[1].set_xlabel('Date')
        axes[1].legend(loc='lower left')
        axes[1].grid(True, linestyle=':', alpha=0.6)

        plt.tight_layout()
        plt.show()

    else:
        print(f"\nSkipping CORREL plot: Insufficient aligned data (need >= {time_period_correl} points).")
        print(f"Available aligned data points: {len(prices1)}.")
else:
    print("\nSkipping CORREL plot: Data preparation failed.")

Explanation of the Code:

  1. Import Libraries: Includes yfinance, talib, numpy, matplotlib.pyplot, and pandas.
  2. Data Fetching and Alignment:
    • Data for two specified assets (asset1_symbol, asset2_symbol) is downloaded using yf.download() with auto_adjust=False, so the raw ‘Close’ column is returned alongside a separate ‘Adj Close’ column. If the result has MultiIndex columns, the ticker level is removed with droplevel(level=1).
    • Crucially, the ‘Close’ prices of the two assets are extracted and aligned with pd.DataFrame.join(..., how='inner'), which keeps only dates on which both assets have price data; dropna() then removes any remaining NaNs. This step matters because talib.CORREL pairs values by array position: misaligned dates would correlate prices from different days, and NaNs inside the inputs would corrupt the output.
    • Error handling is included for data download and alignment.
  3. Pearson’s Correlation Calculation:
    • time_period_correl (e.g., 30) is set for the rolling window.
    • The check if len(prices1) >= time_period_correl ensures there are enough aligned points for at least one full window.
    • talib.CORREL(prices1, prices2, timeperiod=time_period_correl) calculates the rolling correlation. The output will have NaN values for the first timeperiod - 1 entries.
    • The last few valid correlation values are printed.
  4. Plotting:
    • A two-panel plot is created using plt.subplots().
    • Price Plot (axes[0]): The top subplot displays the closing prices of the first asset (prices1) to provide price context.
    • Correlation Plot (axes[1]): The bottom subplot displays the calculated rolling correlation coefficient.
      • Horizontal lines at +1, 0, and -1 mark perfect positive correlation, no correlation, and perfect negative correlation, respectively.
      • The Y-axis is set from -1.1 to +1.1 to clearly show the correlation range.
    • Standard plot elements (titles, labels, legends, grid) enhance readability.
    • plt.tight_layout() adjusts subplot spacing.
    • plt.show() displays the chart.
  5. Insufficient Data Handling: Messages are printed if data is insufficient at various stages.
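A common variant of the calculation above correlates daily returns instead of price levels. The sketch below assumes two pandas Series already aligned on the same dates; rolling_return_correlation is a hypothetical helper name, and it uses pandas’ Series.rolling(...).corr(...) rather than TA-Lib. The toy prices are made up so the result is easy to check.

```python
import numpy as np
import pandas as pd

def rolling_return_correlation(prices1, prices2, window=30):
    """Rolling Pearson correlation of simple daily returns (hypothetical helper).

    prices1 and prices2 are pandas Series already aligned on the same index.
    """
    # Simple returns; the first value of each series is NaN.
    returns1 = prices1 / prices1.shift(1) - 1.0
    returns2 = prices2 / prices2.shift(1) - 1.0
    # Windowed Pearson correlation; a window containing the leading NaN
    # has fewer than `window` valid points, so the first `window` outputs are NaN.
    return returns1.rolling(window).corr(returns2)

idx = pd.date_range("2024-01-01", periods=12, freq="D")
p1 = pd.Series([100, 101, 99, 102, 104, 103, 105, 107, 106, 108, 110, 109],
               index=idx, dtype=float)
p2 = 2.0 * p1  # exactly twice p1, so both have identical returns

corr = rolling_return_correlation(p1, p2, window=5)
print(corr.round(6))
```

Because p2 is exactly twice p1, both series have identical returns, so every defined window gives a correlation of 1 (up to floating-point rounding); the first five entries are NaN because the first return is undefined.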

By calculating and visualizing the rolling correlation, traders and investors can see how the linear relationship between two assets strengthens, weakens, or changes sign over time, which informs diversification strategies, pairs-trading idea generation, and overall risk management.