
Evaluating Adaptive Kalman Filter Strategy Consistency with Rolling Backtests

A common method for testing a trading strategy is to run a single backtest over a long period. While useful, this can sometimes mask underlying weaknesses. A strategy might perform exceptionally well during a specific market regime (like a bull run) but fail miserably in others, yet the overall result might still look positive.

To gain a deeper, more honest understanding of a strategy’s performance, we can use a rolling backtest. This method involves testing a strategy with a fixed set of parameters over sequential, non-overlapping windows of time. It helps answer a critical question: “Does this strategy work consistently across different market conditions?”

This article breaks down a Python framework designed to perform exactly this type of rolling analysis using backtrader and yfinance.

Section 1: The Rolling Backtest Framework

The core of the system is the run_rolling_backtest function. Instead of running one large backtest, it iterates through time, running many smaller, independent backtests on consecutive windows.

import pandas as pd
import yfinance as yf
import backtrader as bt
import dateutil.relativedelta as rd


def run_rolling_backtest(
    ticker="BTC-USD",
    start="2018-01-01",
    end="2025-12-31",
    window_months=3,
    strategy_params=None
):
    """
    Runs a backtest on sequential, non-overlapping time windows.
    """
    strategy_params = strategy_params or {}
    all_results = []
    start_dt = pd.to_datetime(start)
    end_dt = pd.to_datetime(end)
    current_start = start_dt

    while True:
        # Define the end of the current window
        current_end = current_start + rd.relativedelta(months=window_months)
        if current_end > end_dt:
            break

        print(f"\nROLLING BACKTEST: {current_start.date()} to {current_end.date()}")

        # Download data for the current window (yfinance treats `end` as exclusive)
        data = yf.download(ticker, start=current_start, end=current_end, progress=False)
        # Skip windows with too few bars. The 90-bar threshold assumes daily data
        # for a 24/7 market such as crypto with 3-month windows; a 3-month window
        # of stock data (~63 trading days) would always be skipped.
        if data.empty or len(data) < 90:
            print("Not enough data.")
            current_start += rd.relativedelta(months=window_months)
            continue

        # Recent yfinance versions return (field, ticker) MultiIndex columns;
        # drop the ticker level so backtrader sees plain Open/High/Low/Close/Volume
        if isinstance(data.columns, pd.MultiIndex):
            data = data.droplevel(1, axis=1)

        # Set up and run a standard backtrader backtest for the window
        # (AdaptiveKalmanFilterStrategy is shown later in the article)
        feed = bt.feeds.PandasData(dataname=data)
        cerebro = bt.Cerebro()
        cerebro.addstrategy(AdaptiveKalmanFilterStrategy, **strategy_params)
        cerebro.adddata(feed)
        cerebro.broker.setcash(100000)
        cerebro.broker.setcommission(commission=0.001)
        cerebro.addsizer(bt.sizers.PercentSizer, percents=95)

        start_val = cerebro.broker.getvalue()
        cerebro.run()
        final_val = cerebro.broker.getvalue()
        ret = (final_val - start_val) / start_val * 100

        # Store the result of this window's backtest
        all_results.append({
            'start': current_start.date(),
            'end': current_end.date(),
            'return_pct': ret,
        })

        print(f"Return: {ret:.2f}% | Final Value: {final_val:.2f}")

        # Move the window forward
        current_start += rd.relativedelta(months=window_months)

    return pd.DataFrame(all_results)

The logic is straightforward but powerful:

Iterate Through Time: A while loop moves a time window (e.g., 3 months) from the specified start date to the end date.
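Isolate Data: In each iteration, it downloads data only for that specific window. Every window therefore starts from scratch, with fresh cash, no open positions, and indicators that must warm up again before the strategy can trade.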
Run Independent Backtest: It then executes a complete backtrader backtest on that isolated data using a given strategy (in this case, AdaptiveKalmanFilterStrategy) with a fixed set of parameters.
Record Performance: The percentage return for that single window is calculated and stored.
Roll Forward: The window is then “rolled” forward, and the process repeats on the next slice of time.

The final output is a pandas DataFrame where each row represents the performance of the strategy during a specific period, providing a clear history of its consistency.
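To see exactly which windows the loop visits, the date arithmetic can be pulled out on its own. This is a minimal sketch, not part of the framework: `rolling_windows` is a hypothetical helper that reproduces the loop's stepping (back-to-back windows, stopping before any window that would run past `end`) without downloading anything.

```python
from datetime import date
from dateutil.relativedelta import relativedelta


def rolling_windows(start, end, window_months=3):
    """Yield (window_start, window_end) pairs the way run_rolling_backtest advances:
    consecutive, non-overlapping, and never extending past `end`."""
    current_start = start
    while True:
        current_end = current_start + relativedelta(months=window_months)
        if current_end > end:
            break  # a window that would overrun `end` is dropped, not truncated
        yield current_start, current_end
        current_start = current_end  # next window begins where this one ended


windows = list(rolling_windows(date(2018, 1, 1), date(2019, 1, 1)))
for s, e in windows:
    print(s, "->", e)
# 2018-01-01 -> 2018-04-01
# 2018-04-01 -> 2018-07-01
# 2018-07-01 -> 2018-10-01
# 2018-10-01 -> 2019-01-01
```

Because yfinance treats `end` as exclusive, consecutive windows share a boundary date without sharing a bar. Note also that any months left over at the end of the range that do not fill a complete window are never tested.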

Section 2: Analyzing the Results

A list of returns is just data. We need tools to transform it into insights. The framework provides two helper functions for this purpose.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


def report_stats(df):
    """
    Calculates and prints key performance statistics from the rolling results.
    """
    returns = df['return_pct']
    stats = {
        'Mean Return %': np.mean(returns),
        'Median Return %': np.median(returns),
        'Std Dev %': np.std(returns),  # population std (ddof=0)
        'Min Return %': np.min(returns),
        'Max Return %': np.max(returns),
        # Mean / std of per-window returns: no risk-free rate, not annualized
        'Sharpe Ratio': np.mean(returns) / np.std(returns) if np.std(returns) > 0 else np.nan
    }
    print("\n=== ROLLING BACKTEST STATISTICS ===")
    for k, v in stats.items():
        print(f"{k}: {v:.2f}")
    return stats


def plot_return_distribution(df):
    """
    Creates a histogram to visualize the distribution of returns.
    """
    sns.set(style="whitegrid")
    plt.figure(figsize=(10, 5))
    sns.histplot(df['return_pct'], bins=20, kde=True, color='dodgerblue')
    plt.axvline(df['return_pct'].mean(), color='black', linestyle='--', label='Mean')
    plt.title('Rolling Backtest Return Distribution')
    plt.xlabel('Return %')
    plt.ylabel('Frequency')
    plt.legend()
    plt.tight_layout()
    plt.show()

These functions are critical for interpretation:

report_stats: This function calculates vital statistics across all the periods.
- Mean Return: The average performance per window.
- Standard Deviation: The most important metric here. It measures the consistency of the returns. A low standard deviation means the strategy performs similarly across different periods. A high value indicates its performance is erratic and unpredictable.
- Sharpe Ratio: The mean divided by the standard deviation of the per-window returns, giving a simple measure of risk-adjusted return. As computed here, it uses no risk-free rate and is not annualized. A higher value is the goal.
plot_return_distribution: This function creates a histogram, which is a powerful visual tool. It allows you to quickly see:
- Central Tendency: Where do the returns for most periods cluster?
- Skewness: Is the strategy characterized by many small wins and a few large losses, or vice-versa?
- Outliers: Are there extreme winning or losing periods that dominate the average?
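Because the Sharpe figure in report_stats is a per-window ratio, it is not directly comparable to the annualized Sharpe ratios usually quoted elsewhere. A minimal sketch of the calculation with made-up returns, plus two adjustments you may want when comparing: scaling by the square root of the number of windows per year (which assumes independent windows) and using the sample standard deviation (`ddof=1`). `per_window_sharpe` and `annualize` are hypothetical helpers, not part of the framework.

```python
import numpy as np
import pandas as pd


def per_window_sharpe(returns_pct, ddof=0):
    """Mean / std of per-window returns, as in report_stats (no risk-free rate).
    ddof=0 matches np.std's default; ddof=1 gives the sample standard deviation."""
    r = np.asarray(returns_pct, dtype=float)
    sd = r.std(ddof=ddof)
    return r.mean() / sd if sd > 0 else np.nan


def annualize(ratio, window_months=3):
    """Scale a per-window ratio to a yearly one, assuming independent windows."""
    return ratio * np.sqrt(12 / window_months)


# Illustrative numbers only, not backtest results
df = pd.DataFrame({"return_pct": [4.0, -2.0, 6.0, 0.0]})
s = per_window_sharpe(df["return_pct"])
print(f"per-window: {s:.4f}, annualized: {annualize(s):.4f}")
# per-window: 0.6325, annualized: 1.2649
print(f"with sample std: {per_window_sharpe(df['return_pct'], ddof=1):.4f}")
# with sample std: 0.5477
```

With 3-month windows there are four windows per year, so the annualization factor is 2. With only a handful of windows, the choice between population and sample standard deviation also moves the ratio noticeably.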

Section 3: How to Use and Interpret

The if __name__ == '__main__': block demonstrates how to use the framework. A user can specify their ticker, date range, window size, and the parameters for the strategy they wish to test.

if __name__ == '__main__':
    # Run the rolling backtest with default settings
    # (3-month windows for BTC-USD from 2018-01-01 to 2025-12-31)
    df = run_rolling_backtest()

    # Print the results table
    print("\n=== ROLLING BACKTEST RESULTS ===")
    print(df)

    # Calculate and print summary statistics
    stats = report_stats(df)
    
    # Visualize the return distribution
    plot_return_distribution(df)

The Strategy: Adaptive Kalman Filter
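In equation form, each bar handled by next() in the class below runs one predict, adapt, update cycle of a constant-velocity Kalman filter. The notation is my own, read off the code: $z_t$ is the closing price, $\sigma_t$ the rolling standard deviation of 1-bar log returns (floored at $10^{-8}$), $\delta$ is `delta`, $Q_{\text{scale}}$ is `Q_scale_factor`, and $Q_0 = \delta I_2$ as set in `__init__`. Because the code predicts before it adapts, the process noise computed on bar $t$ only enters the prediction on bar $t+1$, while $R_t$ is used immediately.

```latex
\begin{aligned}
&\text{State:} && \hat{\mathbf{x}}_t = \begin{bmatrix} \hat p_t \\ \hat v_t \end{bmatrix},\quad
F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},\quad
H = \begin{bmatrix} 1 & 0 \end{bmatrix} \\
&\text{Predict:} && \hat{\mathbf{x}}_{t|t-1} = F\,\hat{\mathbf{x}}_{t-1}, \qquad
P_{t|t-1} = F P_{t-1} F^\top + Q_{t-1} \\
&\text{Adapt:} && R_t = R_{\text{base}}\,(1 + R_{\text{scale}}\,\sigma_t), \qquad
Q_t = \delta\,(1 + Q_{\text{scale}}\,\sigma_t^2)\, I_2 \\
&\text{Update:} && y_t = z_t - H\,\hat{\mathbf{x}}_{t|t-1}, \qquad
S_t = H P_{t|t-1} H^\top + R_t, \qquad
K_t = P_{t|t-1} H^\top / S_t \\
& && \hat{\mathbf{x}}_t = \hat{\mathbf{x}}_{t|t-1} + K_t\, y_t, \qquad
P_t = (I_2 - K_t H)\, P_{t|t-1}
\end{aligned}
```

The trading rule then reads only the sign of the filtered velocity: go (or stay) long when $\hat v_t > 0$ and short when $\hat v_t < 0$.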

Here is the backtrader strategy class for the adaptive Kalman filter that we have seen before:

import numpy as np
import backtrader as bt


class LogReturns(bt.Indicator):
    # Assumption: minimal stand-in for the LogReturns indicator from an earlier
    # article (not shown here). The strategy only needs a `logret` line holding
    # log(close[t] / close[t - period]); the original may differ in details.
    lines = ('logret',)
    params = dict(period=1)

    def __init__(self):
        # One value needs `period` earlier bars plus the current one
        self.addminperiod(self.p.period + 1)

    def next(self):
        self.lines.logret[0] = np.log(self.data[0] / self.data[-self.p.period])


class AdaptiveKalmanFilterStrategy(bt.Strategy):
    # declare plot lines and subplots
    lines = (
        'kf_price',
        'kf_velocity',
        'adaptive_R',
        'adaptive_Q0',
        'adaptive_Q1',
    )
    plotlines = dict(
        kf_price    = dict(_name='KF Price',    subplot=False),
        kf_velocity = dict(_name='KF Velocity', subplot=True),
        adaptive_R  = dict(_name='R',           subplot=True),
        adaptive_Q0 = dict(_name='Q[0,0]',      subplot=True),
        adaptive_Q1 = dict(_name='Q[1,1]',      subplot=True),
    )

    params = dict(
        vol_period     = 20,
        delta          = 1e-4,
        R_base         = 0.1,
        R_scale        = 1.0,
        Q_scale_factor = 0.5,
        initial_cov    = 1.0,
        printlog       = False,
    )

    def log(self, txt, dt=None, doprint=False):
        if self.params.printlog or doprint:
            dt = dt or self.datas[0].datetime.date(0)
            print(f'{dt.isoformat()} {txt}')

    def __init__(self):
        # data
        self.data_close = self.datas[0].close

        # --- Kalman state & matrices ---
        self.x = np.zeros(2)  # [level, velocity]
        self.P = np.eye(2) * self.params.initial_cov
        self.F = np.array([[1., 1.],
                           [0., 1.]])
        self.H = np.array([[1., 0.]])
        self.I = np.eye(2)
        self.initialized = False

        # Initialize Q and R so they'll exist before first next()
        self.Q = np.eye(2) * self.params.delta
        self.R = self.params.R_base

        # --- Indicators ---
        # 1-bar log returns
        self.log_returns = LogReturns(self.data_close, period=1)
        # rolling volatility of the log returns
        self.volatility  = bt.indicators.StandardDeviation(
            self.log_returns.logret,
            period=self.params.vol_period
        )

    def _initialize_kalman(self, price):
        self.x[:] = [price, 0.0]
        self.P    = np.eye(2) * self.params.initial_cov
        self.initialized = True
        self.log(f'KF initialized at price={price:.2f}', doprint=True)

    def next(self):
        price = self.data_close[0]

        # --- wait for enough bars to init KF & vol ---
        if not self.initialized:
            if len(self) > self.params.vol_period and not np.isnan(self.volatility[0]):
                self._initialize_kalman(price)
            return

        vol = self.volatility[0]
        # if vol or price is NaN, push NaNs to keep plot aligned
        # (getattr() needs attribute names, so iterate over the declared line names)
        if np.isnan(vol) or np.isnan(price):
            for name in ('kf_price', 'kf_velocity', 'adaptive_R',
                         'adaptive_Q0', 'adaptive_Q1'):
                getattr(self.lines, name)[0] = np.nan
            return

        # --- Predict (uses the Q adapted on the previous bar) ---
        self.x = self.F.dot(self.x)
        self.P = self.F.dot(self.P).dot(self.F.T) + self.Q

        # --- Adapt Q & R to current volatility ---
        vol = max(vol, 1e-8)
        self.R = self.params.R_base * (1 + self.params.R_scale * vol)
        qvar = self.params.delta * (1 + self.params.Q_scale_factor * vol**2)
        self.Q = np.diag([qvar, qvar])

        # --- Update ---
        y = price - (self.H.dot(self.x))[0]
        S = (self.H.dot(self.P).dot(self.H.T))[0, 0] + self.R
        K = self.P.dot(self.H.T) / S
        self.x = self.x + (K.flatten() * y)
        self.P = (self.I - K.dot(self.H)).dot(self.P)

        # --- Record lines ---
        self.lines.kf_price[0]    = self.x[0]
        self.lines.kf_velocity[0] = self.x[1]
        self.lines.adaptive_R[0]  = self.R
        self.lines.adaptive_Q0[0] = self.Q[0, 0]
        self.lines.adaptive_Q1[0] = self.Q[1, 1]

        # --- Trading: full long & short, following the sign of the velocity ---
        vel = self.x[1]
        if not self.position:
            if vel > 0:
                self.log(f'BUY (vel={vel:.4f})')
                self.buy()
            elif vel < 0:
                self.log(f'SELL SHORT (vel={vel:.4f})')
                self.sell()
        elif self.position.size > 0 and vel < 0:
            self.log(f'CLOSE LONG & SELL SHORT (vel={vel:.4f})')
            self.close()
            self.sell()
        elif self.position.size < 0 and vel > 0:
            self.log(f'CLOSE SHORT & BUY LONG (vel={vel:.4f})')
            self.close()
            self.buy()

    def stop(self):
        self.log(f'Ending Portfolio Value: {self.broker.getvalue():.2f}', doprint=True)
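To check the filter's behaviour without backtrader, the same arithmetic can be run over plain arrays. This is a sketch that re-implements the filter part of next() outside the framework: `adaptive_kalman_velocity` is a hypothetical helper, it takes precomputed volatility values as input (in the strategy these come from the rolling standard deviation of log returns), and unlike the strategy it starts filtering from the first bar instead of waiting `vol_period` bars. On a noiseless straight-line price series, the velocity estimate should settle at the line's slope, which makes a handy sanity check.

```python
import numpy as np


def adaptive_kalman_velocity(prices, vol, delta=1e-4, R_base=0.1, R_scale=1.0,
                             Q_scale_factor=0.5, initial_cov=1.0):
    """Return the filtered velocity for each bar, mirroring the strategy's next():
    predict with the previous Q, adapt Q and R from this bar's volatility, update."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    I = np.eye(2)
    x = np.array([prices[0], 0.0])  # level starts at the first price, zero velocity
    P = np.eye(2) * initial_cov
    Q = np.eye(2) * delta
    velocities = [0.0]
    for price, v in zip(prices[1:], vol[1:]):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Adapt noise levels to the current volatility
        v = max(v, 1e-8)
        R = R_base * (1 + R_scale * v)
        qvar = delta * (1 + Q_scale_factor * v**2)
        Q = np.diag([qvar, qvar])
        # Update with the new price
        y = price - (H @ x)[0]
        S = (H @ P @ H.T)[0, 0] + R
        K = P @ H.T / S
        x = x + K.flatten() * y
        P = (I - K @ H) @ P
        velocities.append(x[1])
    return np.array(velocities)


t = np.arange(200)
vel = adaptive_kalman_velocity(100 + 0.5 * t, np.full(200, 0.01))
print(f"final velocity: {vel[-1]:.3f}")  # should settle near the slope, 0.5
```

Since the trading rule only looks at the sign of this velocity, a series that trends up should keep the strategy long and a series that trends down should keep it short.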

[Figures: rolling backtest results for BTC-USD, ETH-USD and SOL-USD]

Interpreting the Output:

When analyzing the results, you are looking for signs of a robust strategy:

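A positive mean return backed by a positive median. A mean modestly above the median reflects mild right skew; a mean far above it means a few outlier windows are carrying the average.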
A low standard deviation of returns, indicating consistent performance.
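A high Sharpe Ratio (ideally > 1.0), suggesting good returns for the amount of risk taken. Note that report_stats computes this ratio from raw per-window returns; with 3-month windows, a per-window value of 1.0 is roughly 2.0 once annualized (scaling by the square root of four windows per year, assuming the windows are independent).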
A return distribution that is roughly bell-shaped or slightly skewed to the right (many small wins, few large wins) and has a small spread.
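Several of these checks can be computed directly from the results DataFrame. A minimal sketch, assuming the `return_pct` column produced by run_rolling_backtest; `consistency_summary` and its hit-rate metric (the share of windows with a positive return) are hypothetical additions rather than part of the framework, and the numbers below are illustrative, not backtest results.

```python
import pandas as pd


def consistency_summary(df):
    """Summarise how evenly returns are spread across windows (hypothetical helper)."""
    r = df["return_pct"]
    return {
        "windows": len(r),
        "hit_rate": (r > 0).mean(),                  # share of windows with a positive return
        "mean": r.mean(),
        "median": r.median(),
        "mean_minus_median": r.mean() - r.median(),  # a large gap: a few windows dominate
        "skew": r.skew(),                            # > 0: longer right tail (occasional big wins)
    }


# Illustrative returns only: four modest windows and one outlier
df = pd.DataFrame({"return_pct": [1.0, 2.0, 1.5, -1.0, 12.0]})
for k, v in consistency_summary(df).items():
    print(f"{k}: {v:.2f}" if isinstance(v, float) else f"{k}: {v}")
```

In this made-up example the mean (3.10) is more than double the median (1.50) because a single 12% window contributes most of the total, and the positive skew flags that long right tail.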

By testing a strategy across many different market environments, this rolling backtest framework provides a much more rigorous and honest assessment of its potential, helping traders build more durable and reliable automated systems.