updated 13 May 2017

You've got a Backtest. Now What?

  • is this result realistic?
  • have we overfit (fooled ourselves)?
  • what range of values might we expect out of sample?
  • how much confidence do we have in our model?

Monte Carlo and the bootstrap

Sampling from limited information

  • estimate the 'true' properties of a distribution from incomplete information
  • evaluate the likelihood (test the hypothesis) that a particular result is
    • not the result of chance
    • not overfit
  • understand confidence intervals for other descriptive statistics on the backtest (see the sketch after this list)
  • simulate different paths the results might have taken if the ordering had been different
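
For instance, a plain bootstrap of daily P&L can already put a confidence interval around a summary statistic. The sketch below uses a hypothetical stand-in vector daily_pl and a simple daily Sharpe-style ratio; it is an illustration of the idea only, not tied to the examples later in this talk:

# minimal sketch: bootstrap confidence interval for a backtest statistic
# 'daily_pl' is a hypothetical stand-in for a vector of daily P&L
set.seed(42)
daily_pl  <- rnorm(250, mean = 5, sd = 100)

sharpe    <- function(x) mean(x) / sd(x)   # simple daily Sharpe-style ratio

boot_stat <- replicate(1000, sharpe(sample(daily_pl, replace = TRUE)))
quantile(boot_stat, c(0.025, 0.5, 0.975))  # 95% bootstrap interval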

History of Monte Carlo and bootstrap simulation

  • Laplace was the first to describe the mathematical properties of sampling from a distribution
  • Mahalanobis extended this work in the 1930's to describe sampling from dependent distributions, and anticipated the block bootstrap by examining these dependencies
  • Monte Carlo simulation was developed by Stan Ulam and John von Neumann (with computation by Françoise Ulam) as part of the hydrogen bomb program in 1946 (Richard Rhodes, Dark Sun, p.304)
  • computational implementation of Monte Carlo simulation was constructed by Nicholas Metropolis on the ENIAC and MANIAC machines
  • Metropolis was an author in 1953 of the distribution sampler that W.K. Hastings extended to the modern Metropolis-Hastings form in 1970
  • Maurice Quenouille and John Tukey developed 'jackknife' simulation in the 1950's
  • Bradley Efron described the modern bootstrap in 1979

Simulation with Daily P&L (without replacement)

Without replacement:

  • results will have the same mean and final P&L, but different path
  • allows inference on likely error bounds of various performance estimates

Without replacement but with in-place perturbation:

  • provides wider variation in the individual returns
  • e.g. sample without replacement, then perturb the samples with a wide-tailed random draw (see the sketch below)
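
A minimal sketch of both variants, using a hypothetical stand-in vector daily_pl rather than mcsim()'s actual implementation:

# without replacement: a pure permutation, same mean and final P&L, new path
set.seed(42)
daily_pl  <- rnorm(250, mean = 5, sd = 100)   # hypothetical daily P&L
path_perm <- cumsum(sample(daily_pl, replace = FALSE))

# without replacement, then perturb each sample with a wide-tailed random draw
perturbed <- sample(daily_pl, replace = FALSE) +
             0.1 * sd(daily_pl) * rt(length(daily_pl), df = 3)
path_pert <- cumsum(perturbed)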

Empirical Example

  • Bollinger Bands demo from quantstrat
  • only one instrument in the demo
  • levels (scales) into positions
  • allows for flat.to.flat and flat.to.reduced trade sizing

Empirical Example, No replacement

# resample the daily P&L without replacement, one observation at a time (l = 1)
nrsim <- mcsim( Portfolio = "bbands"
              , Account = "bbands"
              , n = 1000
              , replacement = FALSE
              , l = 1, gap = 10)

# as above, but shuffle the P&L in blocks of 10 days (l = 10)
nrblocksim <- mcsim( Portfolio = "bbands"
                   , Account = "bbands"
                   , n = 1000
                   , replacement = FALSE
                   , l = 10, gap = 10)

P&L Quantiles:

                0%    25%    50%      75%    100%
nrsim         -Inf      0      0   0.0017     Inf
nrblocksim    -Inf      0      0   0.0016     Inf

Empirical Example, No replacement, cont.

Simulation with Daily P&L (with replacement)

  • simple sampling with replacement provides multiple paths
  • block sampling, with replacement (see the sketch after this list)
    • mimics some of the autocorrelation structure of returns
    • may create deeper drawdowns if down streaks are effectively repeated
  • choosing block size
    • some multiple of the average holding period; 1/5 to 1/4 of the average holding period is a good first guess
    • a block size equal to the span of observed significant autocorrelation
    • a variable block size drawn from a distribution centered on one of the above, with tails
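
A minimal sketch of block resampling with replacement, again on a hypothetical stand-in vector daily_pl rather than mcsim()'s internals; the block length here plays the same role as the l argument in the calls below:

# block bootstrap sketch: draw overlapping blocks of length l, with replacement
set.seed(42)
daily_pl  <- rnorm(250, mean = 5, sd = 100)   # hypothetical daily P&L
l         <- 10                               # block length
n_blocks  <- ceiling(length(daily_pl) / l)
starts    <- sample(length(daily_pl) - l + 1, n_blocks, replace = TRUE)
resampled <- unlist(lapply(starts, function(s) daily_pl[s:(s + l - 1)]))
resampled <- resampled[seq_along(daily_pl)]   # trim to the original length
path      <- cumsum(resampled)                # one block-resampled equity path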

Empirical Example, with replacement

# resample the daily P&L with replacement, one observation at a time (l = 1)
rsim <- mcsim( Portfolio = "bbands"
             , Account = "bbands"
             , n = 1000
             , replacement = TRUE
             , l = 1, gap = 10)

# block resampling with replacement, block length of 10 days
rblocksim <- mcsim( Portfolio = "bbands"
                  , Account = "bbands"
                  , n = 1000
                  , replacement = TRUE
                  , l = 10, gap = 10)

P&L Quantiles:

                0%    25%    50%      75%    100%
rsim          -Inf      0      0   0.0013     Inf
rblocksim     -Inf      0      0   0.0012     Inf

Empirical Example, With replacement, cont.

Disadvantages of Sampling from portfolio P&L

  • not transparent
  • potentially unrealistic
  • really only a statistical confidence model
  • path won't line up with historical market regimes

Simulation with per-Symbol P&L

Un-/Lightly Correlated Symbols:

  • if per-symbol returns are not highly correlated, it can make sense to simulate each instrument separately
  • this allows confidence intervals to be constructed on backtest statistics such as MAE/MFE, largest loser, drawdowns, etc.

Correlated Symbols:

  • resample the portfolio time index, as in the portfolio method
  • construct per-symbol P&L by extracting the daily P&L for each symbol using the resampled portfolio time index (see the sketch after this list)
  • allows inference and confidence bands to be drawn on a per-symbol basis, a deeper analysis than looking only at portfolio resampling results
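
A minimal sketch of the shared-index approach, assuming pl_by_symbol is a hypothetical matrix of daily P&L with one row per date and one column per symbol (in practice this would be extracted from the portfolio object):

# resample the shared time index once, then apply it to every symbol,
# so cross-sectional dependence between symbols is preserved
set.seed(42)
pl_by_symbol <- matrix(rnorm(250 * 3, mean = 2, sd = 50), ncol = 3,
                       dimnames = list(NULL, c("SYM1", "SYM2", "SYM3")))

idx       <- sample(nrow(pl_by_symbol), replace = TRUE)
resampled <- pl_by_symbol[idx, , drop = FALSE]
paths     <- apply(resampled, 2, cumsum)      # per-symbol resampled P&L paths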

Dis/advantages of simulating each instrument

  • resampling from symbol equity curves does not respect min/max position constraints
  • may effectively flip long/short
  • holding periods may not reflect backtest dynamics
  • symbol and market correlations need to be considered for realism

Simulation with trades

  • resampling P&L of the trades (see the sketch after this list)
  • resampling entries and exits
    • round turn size, direction, and duration are sampled from the trades
    • also resample from any flat periods
    • applied in order to market data as new transactions at the then-prevalent price
  • compare trade expectations in the random-trade model to backtest expectations
  • examine drawdowns and tail risk, as in the other simulation types
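
A minimal sketch of the simplest variant, resampling per-trade P&L to put a distribution around trade expectancy; trade_pl is a hypothetical vector of per-round-turn net P&L (in practice it could be taken from blotter's per-trade statistics):

# resample per-round-turn P&L and look at the distribution of trade expectancy
set.seed(42)
trade_pl <- c(rnorm(40, mean = 150, sd = 400),   # hypothetical winners
              rnorm(60, mean = -50, sd = 200))   # hypothetical losers

backtest_expectancy <- mean(trade_pl)

boot_expectancy <- replicate(1000, mean(sample(trade_pl, replace = TRUE)))
quantile(boot_expectancy, c(0.05, 0.5, 0.95))    # where does the backtest sit?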

Dis/advantages of bootstrapping trades

Disadvantages:

  • much more complicated to model trade dynamics
  • maintaining constraints (e.g. max position) is harder

Advantages:

  • can more closely compare the strategy to random entries and exits with the same overall dynamics
  • creates a distribution around the trading dynamics, not just the daily P&L
  • best for modeling "skill vs. luck"

Empirical Example

# resample round turns without replacement
nrtxsim <- txnsim( Portfolio = "bbands"
                 , n = 100
                 , replacement = FALSE)

# resample round turns with replacement
wrtxsim <- txnsim( Portfolio = "bbands"
                 , n = 100
                 , replacement = TRUE)

Comments:

  • without replacement samples the identical number of trades, randomizing the start dates
  • with replacement samples enough trades to match the correct total duration
  • entry and exit prices are set from the prevailing market price at the time of each simulated trade

Empirical Example, Without replacement

P&L Quantiles:

      0%      25%     50%     75%    100%
  -19134    -4774    -701    4435   15730

Empirical Example, With replacement

P&L Quantiles:

      0%      25%     50%     75%    100%
  -19134    -4774    -701    4435   15730

Checking for Overfitting via the simulation

Extras for later

  • outline of trade resampling process
  • using resampled market data, with or without multi-asset dependence, to train or run the system
  • k-fold cross validation
  • combinatorially symmetric cross-validation (CSCV) and probability of backtest overfitting (PBO) from Bailey et al. (2014)

References

Aronson, David. 2006. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley.

Bailey, David H., and Marcos López de Prado. 2012. “The Sharpe Ratio Efficient Frontier.” Journal of Risk 15 (2): 13. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1821643.

———. 2014. “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality.” Journal of Portfolio Management, Forthcoming. http://www.davidhbailey.com/dhbpapers/deflated-sharpe.pdf.

Bailey, David H., Jonathan M. Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2014. “The Probability of Backtest Overfitting.” http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253.

Harvey, Campbell R., and Yan Liu. 2015. “Backtesting.” SSRN. http://ssrn.com/abstract=2345489.

White, Halbert L. 2000. “System and Method for Testing Prediction Models and/or Entities.” Google Patents. http://www.google.com/patents/US6088676.