Inefficient Markets VII: predictability (HEC)
Damien Challet
October 29, 2014
Previous episodes
Adaptive Complex Market Hypothesis
Models and mathematical methods for
Heterogeneity
Learning
Interaction
Price predictability
Dynamics of predictability
Geanakoplos & Farmer (2008)
Persistence?
Predictability: where from?
Wrong backtesting
Finite buying power
Market impact
Behavioural biases
Laziness or ingenuity
Backtesting difficulties
The Rise of the Machines
Legal constraints
Trading constraints
Who cares?
From ideas to trading:
1. Data
2. Backtesting
3. Portfolio of strategies
4. Build portfolio draft
5. Risk management → final portfolio
6. Trading
What can go wrong
1. Data
2. Backtesting
3. Portfolio of strategies
4. Build portfolio draft
5. Risk management → final portfolio
6. Trading
How Science works
1. Acquire data
2. Lots of data
3. Find patterns: graphs, statistics
4. Design a model, test it, validate it
5. New data that invalidate the model: find a new model
How Making Money works
1. Acquire data
2. Lots of data
3. Find patterns: graphs, statistics
4. Design a trading strategy
5. Test it, validate it
6. New data that invalidate the strategy: find a new strategy
Theory vs practice
Theoretical [mathematical] finance
1. Assumptions, mostly wrong
2. Fancy theorems
3. Works some of the time
NOTA BENE: Always respect the nature of real markets
Wrong assumptions: example I
GARCH models
fit a GARCH model to real data
find negative volatility
jumps
long-range memory
Solution:
other volatility models (Heston, etc.)
include long-range memory: FIGARCH, Zumbach & Lynch
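One way a "negative volatility" can appear: the GARCH(1,1) conditional variance recursion $\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta\sigma_{t-1}^2$ only stays positive when $\omega > 0$ and $\alpha, \beta \ge 0$, and an unconstrained fit is not forced to respect that. The minimal sketch below uses hypothetical function names and hypothetical parameter values; it fits nothing, it only replays the recursion on simulated returns with a valid and with an invalid (negative $\alpha$) parameter set.

```python
import math
import random

def garch11_variance(returns, omega, alpha, beta, sigma2_0):
    """Conditional variance path: sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = [sigma2_0]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate r[t] = sigma[t] * z[t], z ~ N(0, 1), with GARCH(1,1) variance."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns

# Hypothetical parameters, for illustration only
r = simulate_garch11(1000, omega=1e-6, alpha=0.08, beta=0.90)
valid = garch11_variance(r, 1e-6, 0.08, 0.90, sigma2_0=5e-5)     # omega > 0, alpha, beta >= 0
invalid = garch11_variance(r, 1e-6, -0.50, 0.90, sigma2_0=5e-5)  # unconstrained alpha < 0
print(min(valid) > 0, min(invalid) < 0)
```

With $\alpha < 0$, every large squared return pulls $\sigma_t^2$ down, and it soon crosses zero, which is meaningless as a variance.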
Example I: volatility response function
Power-law decay, bumps at typical human time scales
Taken from Zumbach and Lynch (2001)
Example I: volatility mugshots
Time reversal asymmetry:
volatility at large time horizons influences volatility at smaller time horizons
taken from Borland et al. (2005)
Wrong assumptions: example II
Risk management
cf. the 2007-2008 crisis
Speculation
wrong tools: EMA?
missed opportunities
you are the fish: cf. option pricing
Nature of markets: Random walks
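Price returns
$$\log p_{t+1} = \log p_t + r_{t+1}$$
IID returns: $\log p_t$ is a random walk
Log-normal RW
$$P(\log p_t) = \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-\frac{(\log p_t - \log p_1)^2}{2\sigma^2 t}}$$
Diffusion
$$E[(\log p_t - \log p_1)^2] \propto \sigma^2 t, \qquad E[|\log p_t - \log p_1|] \propto \sigma\sqrt{t}$$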
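A quick Monte Carlo check of the diffusion law, assuming i.i.d. Gaussian returns with a hypothetical $\sigma = 1\%$ per step. After $t$ steps the mean squared displacement of $\log p$ should be close to $\sigma^2 t$, and the mean absolute displacement close to $\sqrt{2/\pi}\,\sigma\sqrt{t}$ ($\sqrt{2/\pi}$ is the Gaussian prefactor of the $\sigma\sqrt{t}$ law). Function names and sample sizes are illustrative.

```python
import math
import random

def simulate_log_prices(n_steps, sigma, rng):
    """log p_{t+1} = log p_t + r_{t+1}, with i.i.d. r ~ N(0, sigma^2) and log p_1 = 0."""
    log_p = [0.0]
    for _ in range(n_steps):
        log_p.append(log_p[-1] + rng.gauss(0.0, sigma))
    return log_p

def displacement_moments(n_paths, n_steps, sigma, seed=0):
    """Monte Carlo E[(log p - log p_1)^2] and E[|log p - log p_1|] after n_steps."""
    rng = random.Random(seed)
    sq = ab = 0.0
    for _ in range(n_paths):
        d = simulate_log_prices(n_steps, sigma, rng)[-1]  # log p_1 = 0
        sq += d * d
        ab += abs(d)
    return sq / n_paths, ab / n_paths

sigma, t = 0.01, 50
msd, mad = displacement_moments(4000, t, sigma)
# Both ratios should be close to 1 and to sqrt(2/pi) ~ 0.80 respectively
print(msd / (sigma ** 2 * t), mad / (sigma * math.sqrt(t)))
```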
Predictability detection
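Biased Gaussian random walk ($y_t = x_t r_{t,1}$), $E_1(y) = \mu$:
$$\sum_{t=1}^{\Delta t} y_t \simeq \mu\,\Delta t \pm \sigma\sqrt{\Delta t}$$
Time needed to detect bias
$$\mu\,\Delta t \ge K\sigma\sqrt{\Delta t} \;\Longrightarrow\; \sqrt{\Delta t} \ge K\,\frac{\sigma}{\mu} \propto \frac{1}{S}$$
Statistics point of view
$$\sqrt{\Delta t}\,\frac{\mu}{\sigma} \sim \text{t-stat}$$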
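The detection condition makes a handy back-of-the-envelope calculator. A sketch with hypothetical numbers: a daily mean gain of 0.05% and a daily volatility of 1% (so $S = 0.05$ per day), with $K = 2$. The bias then becomes detectable after $(K/S)^2 = 1600$ days, roughly 6.3 years of 252 trading days.

```python
import math

def detection_time(mu, sigma, K=2.0):
    """Smallest Delta t with mu * Delta t >= K * sigma * sqrt(Delta t):
    Delta t = (K * sigma / mu)**2 = (K / S)**2, with S = mu / sigma."""
    return (K * sigma / mu) ** 2

def t_stat(mu, sigma, dt):
    """sqrt(Delta t) * mu / sigma."""
    return math.sqrt(dt) * mu / sigma

# Hypothetical daily strategy: mu = 0.05%, sigma = 1% per day
dt = detection_time(mu=0.0005, sigma=0.01, K=2.0)
print(round(dt), round(dt / 252, 1))  # days, then years of 252 trading days
```

Halving the Sharpe ratio multiplies the required time by four, which is why weak biases are so hard to establish.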
Sharpe ratio
Sharpe (1994): Sharpe ratio ≡ signal-to-noise ratio
Raw
$$S = \frac{E_{t_0\to t_1}(x r)}{\sqrt{\mathrm{var}(x r)}}$$
With respect to benchmark B ($r_B$: benchmark return)
$$S_B = \frac{E_{t_0\to t_1}(x r - r_B)}{\sqrt{\mathrm{var}(x r - r_B)}}$$
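A minimal sketch of both ratios for per-period positions $x_t$ and returns $r_t$. The function name and the use of the population standard deviation are assumptions. The result is per period; multiplying by $\sqrt{252}$ for daily data is a common annualisation convention that itself assumes i.i.d. gains.

```python
from statistics import mean, pstdev

def sharpe_ratio(positions, returns, benchmark=None):
    """Raw Sharpe ratio E(x r) / sqrt(var(x r)); if benchmark returns r_B are given,
    the excess gains x r - r_B are used instead (per-period values)."""
    gains = [x * r for x, r in zip(positions, returns)]
    if benchmark is not None:
        gains = [g - rb for g, rb in zip(gains, benchmark)]
    return mean(gains) / pstdev(gains)

x = [1, 1, 1, 1]
r = [0.01, 0.03, 0.01, 0.03]
print(sharpe_ratio(x, r), sharpe_ratio(x, r, benchmark=[0.01] * 4))
```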
Improving a Sharpe ratio: compressed sensing
from wired.com/magazine/2010/02/ff_algorithm/all/1
Improving a Sharpe ratio: Karhunen-Loeve Transform
From C. Maccone, Deep Space Flight and Communications:
Exploiting the Sun as a Gravitational Lens
Nature of markets: mean-reverting and trending
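Mizuno et al. (2007)
1. Autoregressive fit
$$P_t = \bar P_t + \eta_t, \qquad \bar P_t = \sum_{k=1}^{K} w_k P_{t-k} \quad \text{(optimal moving average)}$$
$$E(\eta_t \eta_{t'}) = \sigma_\varepsilon^2\, \delta_{t,t'}$$
$w_k$: from the Yule-Walker equations, which ensure uncorrelated noise
2. Moving average of moving average
$$\Phi_t = \frac{1}{M}\sum_{\tau=0}^{M-1} \bar P_{t-\tau}$$
3. Is $P$ attracted or repulsed by $\Phi_t$?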
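A sketch of steps 1 and 2, assuming the Yule-Walker equations are solved with the Levinson-Durbin recursion on the demeaned series; this is not Mizuno et al.'s code, and all names are hypothetical. The weights are then applied to the raw series, which is only sensible for price levels when the fitted weights sum to about one (otherwise subtract the mean first). The usage part checks the fit on a simulated AR(1) series.

```python
import random

def autocovariance(x, lag):
    """Sample autocovariance at the given lag, normalised by len(x)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

def yule_walker(x, K):
    """AR(K) weights w_1..w_K solving the Yule-Walker equations (Levinson-Durbin recursion)."""
    r = [autocovariance(x, k) for k in range(K + 1)]
    w, err = [], r[0]
    for k in range(1, K + 1):
        kappa = (r[k] - sum(w[j] * r[k - 1 - j] for j in range(k - 1))) / err
        w = [w[j] - kappa * w[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa ** 2
    return w

def optimal_moving_average(P, w):
    """bar P_t = sum_{k=1}^K w_k P_{t-k}, for t = K, ..., len(P) - 1."""
    K = len(w)
    return [sum(w[k] * P[t - 1 - k] for k in range(K)) for t in range(K, len(P))]

def moving_average_of_moving_average(Pbar, M):
    """Phi_t = (1/M) sum_{tau=0}^{M-1} bar P_{t-tau}, wherever M values are available."""
    return [sum(Pbar[t - M + 1:t + 1]) / M for t in range(M - 1, len(Pbar))]

# Usage on a hypothetical zero-mean AR(1) series with coefficient 0.6
rng = random.Random(1)
x = [0.0]
for _ in range(20000):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
w = yule_walker(x, 2)  # should be close to [0.6, 0.0]
Phi = moving_average_of_moving_average(optimal_moving_average(x, w), M=5)
print(w, len(Phi))
```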
Detecting mean-reverting and trending
Empirical fact:
$$P_{t+1} - P_t = -\frac{b_t}{M-1}\,(P_t - \Phi_t) + f_t, \qquad E_t(f) = 0$$
Detecting mean-reverting and trending
Empirical fact:
$$P_{t+1} - P_t = -\frac{b_t}{M-1}\,\frac{1}{2}\,\frac{d}{dP_t}(P_t - \Phi_t)^2 + f_t, \qquad E_t(f) = 0$$
Detecting mean-reverting and trending
measure of $b_t$: long memory
$b_t < 0$ → over-diffusion at short times (trending)
$b_t > 0$ → under-diffusion at short times (mean-reverting)
cf. variance ratio tests
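The sign of $b$ can be estimated by least squares, since $P_{t+1} - P_t$ is linear in $-(P_t - \Phi_t)/(M-1)$ with slope $b$. The sketch below simulates that equation with a constant, hypothetical $b$ and recovers it. As simplifications, $\Phi_t$ is the mean of the last $M$ raw prices rather than of the optimal moving averages $\bar P$, and $b$ is assumed constant over the sample; all names are hypothetical.

```python
import random

def simulate_puck(n, b, M, noise=1.0, seed=0):
    """Simulate P_{t+1} = P_t - b/(M-1) (P_t - Phi_t) + f_t, f_t ~ N(0, noise^2),
    with Phi_t the mean of P_{t-M+1..t} (raw prices, a simplification)."""
    rng = random.Random(seed)
    P = [0.0] * M  # flat initial history
    for _ in range(n):
        phi = sum(P[-M:]) / M
        P.append(P[-1] - b / (M - 1) * (P[-1] - phi) + rng.gauss(0.0, noise))
    return P

def estimate_b(P, M):
    """No-intercept least squares for b in P_{t+1} - P_t = b * z_t + f_t,
    with regressor z_t = -(P_t - Phi_t) / (M - 1)."""
    szy = szz = 0.0
    for t in range(M - 1, len(P) - 1):
        phi = sum(P[t - M + 1:t + 1]) / M
        z = -(P[t] - phi) / (M - 1)
        szy += z * (P[t + 1] - P[t])
        szz += z * z
    return szy / szz

# b > 0: P attracted by Phi (mean-reverting); b < 0: repelled (trending)
b_mr = estimate_b(simulate_puck(20000, b=0.5, M=5), M=5)
b_tr = estimate_b(simulate_puck(20000, b=-0.5, M=5, seed=1), M=5)
print(b_mr, b_tr)
```

Estimating $b$ on rolling windows instead of the whole sample would give the time-dependent $b_t$.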
Always respect the underlying process
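Over/under-diffusive processes (approximate generalization)
Biased Gaussian random walk ($y_t = x_t r_{t,1}$), $E_1(y) = \mu$:
$$\sum_{t=1}^{\Delta t} y_t \simeq \mu\,\Delta t \pm (\sigma\,\Delta t)^H$$
Time needed to detect bias
$$\mu\,\Delta t \ge K(\sigma\,\Delta t)^H \;\Longrightarrow\; (\Delta t)^{1-H} \ge K\,\frac{\sigma^H}{\mu} \propto \frac{1}{S}$$
Statistics point of view
$$(\Delta t)^{1-H}\,\frac{\mu}{D_H} \sim \text{t-stat}$$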
How to measure H?
Hurst exponent: tricky
R: more than 7 estimation methods
8th: Mizuno
Variance ratio tests
H determines the style of strategies (trend following vs mean-reverting)
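A simple variance-ratio estimate of $H$, assuming the variance of $q$-period returns scales as $q^{2H}$, so that $\mathrm{VR}(q) \simeq q^{2H-1}$. This crude two-point estimator is an illustration, not one of the methods available in R; the block length $q$, the names and the test series are hypothetical. Negatively autocorrelated returns give $H < 1/2$ (mean-reverting style), positively autocorrelated ones $H > 1/2$ (trend following).

```python
import math
import random

def variance_ratio(returns, q):
    """VR(q) = var(q-period returns) / (q * var(1-period returns)), non-overlapping blocks."""
    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)
    blocks = [sum(returns[i:i + q]) for i in range(0, len(returns) - q + 1, q)]
    return var(blocks) / (q * var(returns))

def hurst_from_vr(returns, q):
    """If var(q-period returns) ~ q^(2H), then VR(q) ~ q^(2H - 1)."""
    return 0.5 + math.log(variance_ratio(returns, q)) / (2.0 * math.log(q))

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]            # random-walk increments
e = [rng.gauss(0.0, 1.0) for _ in range(20001)]
mean_rev = [e[t] - 0.5 * e[t - 1] for t in range(1, 20001)]  # negatively autocorrelated
print(hurst_from_vr(iid, 10), hurst_from_vr(mean_rev, 10))
```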
Strategy: definition
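A trading strategy $x_t$
$x_t$ = position to hold at time $t$
Simplest case: $x_t \in \{-1, 0, +1\}$
Example: signal $s_t$,
$$x_t = \begin{cases} +1 & \text{if } s_t > +\theta \\ -1 & \text{if } s_t < -\theta \\ 0 & \text{otherwise} \end{cases}$$
Start with $\theta = 0$.
Several signals $s_{i,t}$ (BEWARE):
$$x_t = \sum_{i,k} a_{i,k}\, s_{i,t-k}$$
OR, with $\Theta$ the Heaviside step function,
$$x_t = \sum_i \left[\Theta(s_{i,t} - \theta) - \Theta(-\theta - s_{i,t})\right]$$
Backtest gain: $x_t r_t$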
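The position rules can be sketched as follows; the function names are hypothetical. `vote_position` applies the symmetric thresholds $\pm\theta$ of the single-signal rule to each signal and sums the votes, and positions are assumed to earn the return from $t$ to $t+1$.

```python
def threshold_position(s, theta=0.0):
    """x_t = +1 if s_t > theta, -1 if s_t < -theta, 0 otherwise."""
    if s > theta:
        return 1
    if s < -theta:
        return -1
    return 0

def vote_position(signals, theta=0.0):
    """Net vote of several signals s_{i,t}, each thresholded at +/- theta."""
    return sum(threshold_position(s, theta) for s in signals)

def linear_position(a, history):
    """x_t = sum_{i,k} a[i][k] * history[i][k], where history[i][k] holds s_{i,t-k}."""
    return sum(a_ik * s_ik for a_i, h_i in zip(a, history) for a_ik, s_ik in zip(a_i, h_i))

def backtest_gains(positions, next_returns):
    """Per-period gains x_t * r_{t,1}: position x_t earns the return from t to t+1."""
    return [x * r for x, r in zip(positions, next_returns)]

print(vote_position([0.5, -0.5, 0.3], theta=0.2), backtest_gains([1, -1, 0], [0.01, 0.02, 0.03]))
```

The linear form yields real-valued positions that need scaling or capping before trading, one reason for the BEWARE.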
Strategy goodness: measures
Backtesting: “what if?”
$x_t$: strategy, position at time $t$
Log-gain from $t_0$ to $t_1$:
$$G_{t_0\to t_1} = \sum_{t=t_0}^{t_1} x_t r_{t,1}$$
Performance measures of $x$
Percentage of positive trades
$$\phi = \frac{1}{t_1 - t_0}\sum_{t=t_0}^{t_1} \Theta(x_t r_{t,1})$$
Gain ratio
$$\gamma = \frac{\sum_{t=t_0}^{t_1} x_t r_{t,1}}{\sum_{t=t_0}^{t_1} |x_t r_{t,1}|}$$
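A sketch of the three measures; the function name is hypothetical. It assumes $\Theta(0) = 0$, so flat periods ($x_t = 0$) do not count as positive trades, and it normalises $\phi$ by the number of periods. $G$ is a log-gain only if the $r_{t,1}$ are log-returns.

```python
def performance(positions, next_returns):
    """Return (G, phi, gamma) for positions x_t and returns r_{t,1}."""
    gains = [x * r for x, r in zip(positions, next_returns)]
    G = sum(gains)                                          # log-gain
    phi = sum(1 for g in gains if g > 0) / len(gains)       # fraction of positive trades
    gamma = G / sum(abs(g) for g in gains)                  # gain ratio, in [-1, 1]
    return G, phi, gamma

G, phi, gamma = performance([1, -1, 1, 0], [0.01, 0.02, -0.005, 0.03])
print(G, phi, gamma)
```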
Backtesting
Freeman, J. D. Behind the smoke and mirrors: Gauging the integrity of investment simulations. Financial Analysts Journal (1992), 26–31.
Leinweber, D. J. Stupid data miner tricks: overfitting the S&P 500. The Journal of Investing 16, 1 (2007), 15–22.
Backtesting
Methodological problem?
IN sample only (Leinweber, Nerds on Wall Street: Math, Machines and Wired Markets, 2009):
Bangladesh butter production vs next-year S&P 500 returns: R² = 0.75
if butter production was up 1%, the S&P 500 was up 2% the next year;
conversely, if butter production was down 10%, you could predict the S&P 500 would be down 20%.
Bangladesh butter production + US cheese production: R² = 0.95
Bangladesh butter production + US cheese production + Bangladesh sheep population: R² = 0.99
OUT of sample: R² = 0
OVERFITTING
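A small simulation in the spirit of the butter story (not Leinweber's data): pure-noise "next returns" are regressed on $k$ unrelated random series. With nested models, in-sample $R^2$ can only grow as regressors are added, whereas out-of-sample $R^2$ (measured against the out-of-sample mean) typically stays near zero or turns negative. Sample sizes, seed and function names are hypothetical.

```python
import random

def ols(X, y):
    """Least-squares coefficients: solve (X'X) beta = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta

def r_squared(X, y, beta):
    """1 - SS_res / SS_tot, with SS_tot measured around the mean of this sample."""
    pred = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def overfit_experiment(n_in=20, n_out=20, k_values=(1, 3, 10), seed=0):
    """Regress noise on k unrelated random series: {k: (IN R^2, OUT R^2)}."""
    rng = random.Random(seed)
    n = n_in + n_out
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    Z = [[rng.gauss(0.0, 1.0) for _ in range(max(k_values))] for _ in range(n)]
    results = {}
    for k in k_values:
        X = [[1.0] + row[:k] for row in Z]  # intercept + first k regressors (nested models)
        beta = ols(X[:n_in], y[:n_in])      # fit on the IN sample only
        results[k] = (r_squared(X[:n_in], y[:n_in], beta),
                      r_squared(X[n_in:], y[n_in:], beta))
    return results

for k, (r2_in, r2_out) in overfit_experiment().items():
    print(f"k={k:2d}  IN R^2={r2_in:.2f}  OUT R^2={r2_out:.2f}")
```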
Backtesting
IN and OUT samples:
Sliding windows
Alternate windows
Add:
Transaction costs
Market impact
Other costs
Also beware of:
Wrong data
Subtly wrong data
Trading problems
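A sketch of sliding in/out-of-sample windows and of a proportional transaction cost charged on position changes $|x_t - x_{t-1}|$. In a real backtest, parameters would be fitted on each in-sample range and evaluated on the following out-of-sample range. The cost model is a hypothetical simplification: market impact and other costs are not modelled, and the position before the first period is assumed flat.

```python
def sliding_windows(n, n_in, n_out, step=None):
    """(in-sample, out-of-sample) index ranges sliding through n periods;
    by default the window moves by n_out, so out-of-sample periods do not overlap."""
    step = n_out if step is None else step
    windows = []
    start = 0
    while start + n_in + n_out <= n:
        windows.append((range(start, start + n_in),
                        range(start + n_in, start + n_in + n_out)))
        start += step
    return windows

def net_gains(positions, next_returns, cost):
    """x_t r_{t,1} - cost * |x_t - x_{t-1}|, starting from a flat position."""
    out, prev = [], 0
    for x, r in zip(positions, next_returns):
        out.append(x * r - cost * abs(x - prev))
        prev = x
    return out

windows = sliding_windows(10, n_in=4, n_out=2)
gains = net_gains([1, 1, -1], [0.01, 0.02, 0.01], cost=0.001)
print([(list(i), list(o)) for i, o in windows], gains)
```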
Google Trends: Search Volume Index
GT: example
Google Hedge Fund
In "Googled: The End of the World As We Know It" by Ken Auletta:
Sergey Brin: "We should run a hedge fund."
Eric Schmidt: "Sergey, among your many ideas, this is the worst"
Sergey Brin: "No, we can do it because we have so much information."
Eric Schmidt: "[...] legal complications [...] NO!"
GT and predictability: claims
Keywords: ticker, company names
Bordino et al. (2011): increase in SVI → increase in traded volume
Da et al. (2013) [2004-2008]: increase in SVI → higher stock prices in the next 2 weeks
Joseph et al. (2011) [2005-2008]: increase in SVI → higher stock prices in the next week
Takeda et al. (2013) [2008-2011]: weak for future returns, strong for future volume
Kristoufek (2013) [2004-2013]: portfolio weight $\sim \mathrm{SVI}^{-\alpha}$
Preis et al. (2013) [2004-2011]: fancy keywords; relative increase in SVI → lower index in the next week
Counter-example
A practitioner point of view
1. Trading strategies
2. Backtest period
3. Assets
4. Keywords
5. Download GT data
6. Timescale of returns
7. Parameters
8. Input GT data only
9. Input past returns only
10. Input both
11. Compare
Prediction: past returns vs GT data
Nowcasting: Choi and Varian (2009)
Forecasting: Da et al. (2013)
1. Trading strategies
Linear methods
Conditional predictability
Ensemble learning methods
2+3. Backtest period, assets
2. Backtest period
Whole period
Sliding in/out-of-sample periods
3. Choice of assets
Index components: S&P 100
4. Keywords
Recipe for disaster:
1. Think of finance-related keywords: finance, debt, CDS, bonds, crisis
2. Use Google Sets: finance → marketing, real estate, insurance, accounting, debt consolidation, investing, [...]
4. Keywords: example
Preis et al. (2013): contrarian strategy
4. Keywords: null hypothesis?
1. 100 classic cars
2. 100 classic arcade video games
3. 200 classic illnesses/ailments
Illness/ailment       | t-stat | Car               | t-stat | Arcade game     | t-stat
multiple sclerosis    | -2.1   | Chevrolet Impala  | -1.9   | Moon Buggy      | -2.1
muscle cramps         | -1.9   | Triumph 2000      | -1.9   | Bubbles         | -2.0
premenstrual syndrome | -1.8   | Jaguar E-type     | -1.7   | Rampage         | -1.7
alopecia              | 2.2    | Iso Grifo         | 1.7    | Street Fighter  | 2.3
gout                  | 2.2    | Alfa Romeo Spider | 1.7    | Crystal Castles | 2.4
bone cancer           | 2.4    | Shelby GT 500     | 2.4    | Moon Patrol     | 2.7
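Such t-stats are what multiple testing predicts. Under the null, about 4.6% of t-stats exceed 2 in absolute value, so among 400 keywords (100 + 100 + 200) one expects roughly 18 "significant" ones by chance. The sketch below makes simplifying assumptions (random $\pm 1$ positions on i.i.d. Gaussian returns, independent across strategies, whereas real keyword strategies all trade the same returns) and adds a Bonferroni threshold for comparison; names and sizes are hypothetical.

```python
import random
from statistics import NormalDist

def t_stat(gains):
    """t-statistic of the mean gain: sqrt(n) * mean / sample standard deviation."""
    n = len(gains)
    m = sum(gains) / n
    sd = (sum((g - m) ** 2 for g in gains) / (n - 1)) ** 0.5
    return n ** 0.5 * m / sd

def null_t_stats(n_strategies=400, n_periods=500, seed=0):
    """t-stats of strategies with no predictive power (random +/-1 positions)."""
    rng = random.Random(seed)
    return [t_stat([rng.choice((-1, 1)) * rng.gauss(0.0, 0.01) for _ in range(n_periods)])
            for _ in range(n_strategies)]

def bonferroni_threshold(n_tests, alpha=0.05):
    """|t| threshold keeping the two-sided family-wise error rate below alpha
    (normal approximation to the t distribution)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

stats = null_t_stats()
n_significant = sum(abs(t) > 2.0 for t in stats)  # about 400 * 0.046 expected
print(n_significant, bonferroni_threshold(400))
```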
4. Keywords
[Figure: in-sample t-stat as a function of k for the keywords "debt" and "Moon Patrol"]
4. Keywords: example
[Figure: cumulated performance, 2004-2012, for four keyword sets: Preis et al., Cars, Games, Illnesses]
4. Keywords
KISS:
Symbols
Company names
Key products
5. GT data
1. Weekly
2. Starts in 2004
3. Data not available before 2008-08
4. File format change in 2012-01
before:
Nov 27 2005, 1.14, 5%
Dec 4 2005, 1.00, 5%
after:
2005-11-27 - 2005-12-03,31
2005-12-04 - 2005-12-10,28
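A minimal parser for the two layouts above, assuming exactly these line formats; the function name is hypothetical, and the third column of the old layout is ignored because its meaning is not given here. For the same week the values are on different scales in the two layouts (1.14 vs 31), so series downloaded before and after the change should not be mixed without rescaling.

```python
from datetime import date, datetime

def parse_gt_line(line: str) -> "tuple[date, float]":
    """Return (week_start, value) from either layout:
    old: 'Nov 27 2005, 1.14, 5%'      new: '2005-11-27 - 2005-12-03,31'"""
    line = line.strip()
    if line[:4].isdigit():  # new layout starts with an ISO date
        week, value = line.rsplit(",", 1)
        start = week.split(" - ")[0]
        return datetime.strptime(start, "%Y-%m-%d").date(), float(value)
    fields = [f.strip() for f in line.split(",")]
    return datetime.strptime(fields[0], "%b %d %Y").date(), float(fields[1])

old_series = [parse_gt_line(s) for s in ["Nov 27 2005, 1.14, 5%", "Dec 4 2005, 1.00, 5%"]]
new_series = [parse_gt_line(s) for s in ["2005-11-27 - 2005-12-03,31", "2005-12-04 - 2005-12-10,28"]]
print(old_series, new_series)
```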
5. GT daily data
[Figure: daily GT data for AAPL, 2008-12-01 to 2009-04-30 (two panels)]
5. GT daily data: delay
(downloaded 2014-01-20, 09:02:00 UTC)
Prediction: binary inputs
[Figure: cumulated performance, 2006-2014, for three input sets: GT+returns, GT, returns]
Backtest: GT + returns
[Figure: GT data + price returns, 2006-2012: performance, net exposure, gross exposure, number of stocks]
Prediction: GT data only
[Figure: GT data, 2006-2012: performance, net exposure, gross exposure, number of stocks]
Prediction: returns only
[Figure: price returns, 2006-2012: performance, net exposure, gross exposure, number of stocks]
Prediction: comparison
[Figure: GT data vs price returns, 2006-2012: performance, net exposure, gross exposure, number of stocks]
Market state: Clustering
$N$ objects, $i = 1, \dots, N$
$T$ properties $x_{i,t}$, $t = 1, \dots, T$
Normalisation: $E(x_i) = 0$, $E(x_i^2) = 1$
Group objects into $K$ clusters, $s_i \in \{1, \dots, K\}$
Similarity measure?
Clustering
K-means
Fix $K$
Find $s_i \in \{1, \dots, K\}$ that minimise the cost function
$$H = \sum_s \sum_i \delta_{s_i,s}\,(X_s - x_i)^2, \qquad X_s = \frac{1}{n_s}\sum_i \delta_{s_i,s}\,x_i, \qquad n_s = \sum_i \delta_{s_i,s}$$
($n_s$: number of objects in cluster $s$)
Minimisation?
Value of $K$?
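One common answer to "Minimisation?" is Lloyd's algorithm, sketched below for fixed $K$: alternately assign each object to the nearest centroid and recompute each $X_s$ as the mean of its members. Neither step can increase $H$, but the result depends on the initialisation; the deterministic farthest-point initialisation and the toy data are assumptions, and the choice of $K$ is left open.

```python
def kmeans(points, K, n_iter=100):
    """Lloyd's algorithm for H = sum_s sum_i delta(s_i, s) |X_s - x_i|^2, K fixed.
    Initialisation (an assumption): first point, then repeatedly the point
    farthest from the centroids chosen so far."""
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    centers = [list(points[0])]
    while len(centers) < K:
        far = max(points, key=lambda p: min(d2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        # assignment step: each object goes to its nearest centroid
        new = [min(range(K), key=lambda s: d2(p, centers[s])) for p in points]
        if new == labels:
            break
        labels = new
        # update step: X_s = mean of the n_s objects in cluster s
        for s in range(K):
            members = [p for p, lab in zip(points, labels) if lab == s]
            if members:
                centers[s] = [sum(col) / len(members) for col in zip(*members)]
    H = sum(d2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, H

# Toy data (hypothetical): six objects with T = 2 properties each
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers, H = kmeans(points, K=2)
print(labels, H)
```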
Maximum likelihood clustering
Marsili (2003)
Correlation matrix $C_{ij} = E(x_i x_j) \ge 0$
$C_{ij}$ has $O(N^2)$ coefficients (too many)
Clustering by correlations: dimensionality reduction
Cluster = objects with ∼ the same cross-correlation
Ansatz: $C$ block-diagonal
$$C_{i,j} = \begin{cases} 1 & i = j \\ c_s & s_i = s_j,\ i \neq j \\ 0 & s_i \neq s_j \end{cases}$$
$$n_s = \sum_i \delta_{s,s_i}, \qquad c_s = \sum_{i,j} \delta_{s,s_i}\,\delta_{s,s_j}\,C_{i,j}$$
Clustering
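Stochastic model for $x_{i,t}$
$$\tilde x_{i,t} = \frac{\sqrt{g_{s_i}}\,\eta_{s_i,t} + \varepsilon_{i,t}}{\sqrt{1 + g_{s_i}}}$$
$\eta$ and $\varepsilon$ iid, $\sim \mathcal N(0, 1)$
Cross-correlation inside cluster $s$
$$C_{ij} = \frac{g_{s_i}\,\delta_{s_i,s_j} + \delta_{i,j}}{1 + g_{s_i}}$$
Model of time series given by
$G = \{g_1, \dots, g_K\}$: how many clusters, correlation
$S = \{s_1, \dots, s_N\}$: cluster attribution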
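A simulation sketch of this model, assuming the cluster attribution and couplings are given as a list `S` and a dict `G` (hypothetical names and values). The empirical correlation of two objects in the same cluster should approach $g_s/(1+g_s)$, and that of objects in different clusters zero.

```python
import math
import random

def simulate_clusters(S, G, T, seed=0):
    """x[i][t] = (sqrt(g_s) * eta[s][t] + eps[i][t]) / sqrt(1 + g_s) with s = S[i];
    eta (one common factor per cluster) and eps are i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    x = [[0.0] * T for _ in S]
    for t in range(T):
        eta = {s: rng.gauss(0.0, 1.0) for s in sorted(set(S))}
        for i, s in enumerate(S):
            g = G[s]
            x[i][t] = (math.sqrt(g) * eta[s] + rng.gauss(0.0, 1.0)) / math.sqrt(1.0 + g)
    return x

def correlation(a, b):
    """Sample Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)

S = [0, 0, 0, 1, 1]
G = {0: 3.0, 1: 1.0}
x = simulate_clusters(S, G, T=20000)
print(correlation(x[0], x[1]), 3.0 / 4.0)  # same cluster, g = 3
print(correlation(x[3], x[4]), 1.0 / 2.0)  # same cluster, g = 1
print(correlation(x[0], x[3]))            # different clusters
```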
Clustering
Model of time series given by
G = {g1 , · · · , gK } how many clusters, correlation
S = {s1 , · · · , sN } cluster attribution
Likelihood
$$P(x \mid S, G) = \prod_{t=1}^{T} E_{\eta,\varepsilon}\left[\prod_{i=1}^{N} \delta(x_{i,t} - \tilde x_{i,t})\right] \propto P(S, G \mid x)$$
Write the Dirac functions as exponentials:
$$\delta(x) = \int_{-\infty}^{+\infty} \frac{dk}{2\pi}\, e^{ikx}$$
Clustering: maximum likelihood
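Gaussian integration →
$$P(S, G \mid x) \propto e^{T\mathcal L\{S,G\}}$$
$$\mathcal L\{S,G\} = -\frac{1}{2}\sum_s\left[(1+g_s)\left(n_s - \frac{g_s c_s}{1+g_s n_s}\right) + \ln(1+g_s n_s) - n_s\ln(1+g_s)\right]$$
Log-likelihood $\mathcal L$; maximisation: $\partial\mathcal L/\partial g_s = 0$
$$\hat g_s = \begin{cases} \dfrac{c_s - n_s}{n_s^2 - c_s} & n_s > 0 \\ 0 & n_s = 0 \end{cases}$$
$$\mathcal L_c(S) = \frac{1}{2}\sum_{s:\,n_s>0}\left[\ln\frac{n_s}{c_s} + (n_s - 1)\ln\frac{n_s^2 - n_s}{n_s^2 - c_s}\right]$$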
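A sketch computing $\mathcal L_c(S)$ and $\hat g_s$ from a correlation matrix, with $c_s$ the sum of $C_{ij}$ over all pairs in cluster $s$ (diagonal included). For singletons the formula for $\hat g_s$ is $0/0$; here both $\hat g_s$ and the log-likelihood term are set to zero, a convention assumed for the sketch. As a check, two objects with correlation $\rho$ give $\hat g/(1+\hat g) = \rho$. Function names are hypothetical.

```python
import math

def cluster_stats(C, S):
    """n_s (cluster sizes) and c_s = sum_{i,j in s} C_ij (diagonal included)."""
    n, c = {}, {}
    for i, si in enumerate(S):
        n[si] = n.get(si, 0) + 1
        for j, sj in enumerate(S):
            if sj == si:
                c[si] = c.get(si, 0.0) + C[i][j]
    return n, c

def g_hat(n_s, c_s):
    """ML coupling (c_s - n_s) / (n_s^2 - c_s); singletons (0/0) are set to 0."""
    return (c_s - n_s) / (n_s ** 2 - c_s) if n_s > 1 else 0.0

def log_likelihood_c(C, S):
    """L_c(S) = 1/2 sum_s [ln(n_s/c_s) + (n_s - 1) ln((n_s^2 - n_s)/(n_s^2 - c_s))]."""
    n, c = cluster_stats(C, S)
    total = 0.0
    for s in n:
        if n[s] > 1:  # singletons contribute 0
            total += math.log(n[s] / c[s]) + (n[s] - 1) * math.log(
                (n[s] ** 2 - n[s]) / (n[s] ** 2 - c[s]))
    return 0.5 * total

C = [[1.0, 0.5], [0.5, 1.0]]
together = log_likelihood_c(C, [0, 0])  # one cluster of two correlated objects
apart = log_likelihood_c(C, [0, 1])     # two singletons
g = g_hat(2, 3.0)                       # c_s = 1 + 1 + 2 * 0.5
print(together, apart, g / (1 + g))
```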
Clustering: maximum log-likelihood
Problem: $s_i$ is discrete. Maximize $\mathcal L_c$ w.r.t. $S$?
Enumerate: $O(K^N)$
Random search
1. Start with arbitrary $S$
2. Propose $s_i \to s$ for all $i$
3. Compute differences in $\mathcal L_c$ for each $i$
4. Keep the single move that improves $\mathcal L_c$ the most
5. Stop when no move improves $\mathcal L_c$
Merging algorithm
1. Start with $N$ clusters, $s_i = i$
2. Merge the two clusters $s', s''$ for which $\mathcal L_c$ improves the most
3. Repeat $N - 1$ times
Clustering
Merging algorithm
1. Start with $N$ clusters, $s_i = i$
2. Merge the two clusters $i, j$ for which $\mathcal L_c$ improves the most
3. Repeat $N - 1$ times
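A sketch of the merging algorithm that exploits the additivity $\mathcal L_c = \sum_s l_s$: merging $r$ and $s$ into $q$ only changes $\mathcal L_c$ by $l_q - l_r - l_s$. Here $c_q = c_r + c_s + 2\sum_{i\in r,\,j\in s} C_{ij}$ is updated from the correlation matrix rather than recomputed from the $x_i$ (the two are equivalent), and the partition with the highest $\mathcal L_c$ seen along the way is returned. Names and the two-block test matrix are hypothetical.

```python
import math
from itertools import combinations

def lc_term(n_s, c_s):
    """Contribution l_s of one cluster to L_c (0 for singletons)."""
    if n_s < 2:
        return 0.0
    return 0.5 * (math.log(n_s / c_s)
                  + (n_s - 1) * math.log((n_s ** 2 - n_s) / (n_s ** 2 - c_s)))

def merge_clusters(C):
    """Start from N singletons and merge, N - 1 times, the pair of clusters whose
    merger increases L_c the most (or decreases it the least).
    Returns (best L_c, best partition) over all steps."""
    N = len(C)
    clusters = {i: ([i], C[i][i]) for i in range(N)}  # label -> (members, c_s)
    lc = 0.0                                          # all singletons: L_c = 0
    best = (lc, [[i] for i in range(N)])
    for _ in range(N - 1):
        top = None
        for r, s in combinations(clusters, 2):
            (mr, cr), (ms, cs) = clusters[r], clusters[s]
            cq = cr + cs + 2.0 * sum(C[i][j] for i in mr for j in ms)
            gain = lc_term(len(mr) + len(ms), cq) - lc_term(len(mr), cr) - lc_term(len(ms), cs)
            if top is None or gain > top[0]:
                top = (gain, r, s, cq)
        gain, r, s, cq = top
        clusters[r] = (clusters[r][0] + clusters[s][0], cq)
        del clusters[s]
        lc += gain
        if lc > best[0]:
            best = (lc, [list(m) for m, _ in clusters.values()])
    return best

# Hypothetical correlation matrix: two blocks of 3 objects, rho = 0.6 inside, 0 across
rho = 0.6
C = [[1.0 if i == j else (rho if i // 3 == j // 3 else 0.0) for j in range(6)] for i in range(6)]
best_lc, partition = merge_clusters(C)
print(best_lc, partition)
```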
Clustering
$\mathcal L_c = \sum_s l_s$: superposition of terms
Merge $r$ and $s$ into $q$:
1. $n_q = n_r + n_s$
2. $c_q$: recompute from $x_i$
3. Merge $l_r$ and $l_s$ into $l_q$; three cases:
$l_q > l_r + l_s$
$l_q < l_r + l_s$, $l_q > \max(l_r, l_s)$
$l_q < l_r + l_s$, $l_q < \max(l_r, l_s)$: no links in dendrogram
Clustering: assets
Clusters: economic sectors
1 electric and computers
2 electric and computers
3 mixed
4 gold
5 banks
Clustering: days
$x_{i,t}$: $N \times T$ matrix
$N \simeq T$: transpose and cluster
Clusters of days → states
Clustering: days
Day states: the way sectors co-move
Clustering: days
Date → state
5 meaningful states + 1 random state
date        cluster  state
1990/01/02  1        1
1990/01/03  44       6
1990/01/04  5        4
1990/01/05  2        2
Claim: after a crash, the market goes through the same sequence of states
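Clustering of days: predictability?
At the close of time $t$, state $\mu_t$
$E(r_t \mid \mu_t)$ very significantly non-zero
Is $E(r_{t+1,1} \mid \mu_t)$ significantly non-zero?
$$E(r_{t+1,1} \mid \mu_t) = \sum_\nu W(\mu_t \to \nu)\, E(r_{t+1} \mid \nu)$$
where $W(\mu_t \to \nu)$ changes slowly as a function of time
Raw Sharpe ratio
$$S_{\mu,\mathrm{raw}} = \frac{E(r_{t+1,1} \mid \mu)}{\sqrt{\mathrm{var}(r_{t+1,1} \mid \mu)}}$$
Benchmark: $E(r_{t+1,1})$, $\delta r_{t+1,1} = r_{t+1,1} - E(r_{t+1,1})$
$$S_\mu = \frac{E(\delta r_{t+1,1} \mid \mu)}{\sqrt{\mathrm{var}(\delta r_{t+1,1} \mid \mu)}}$$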
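A sketch of the state-conditional Sharpe ratios, assuming `states[t]` holds the state $\mu_t$ at the close of day $t$ and `next_returns[t]` holds $r_{t+1,1}$ (hypothetical names). With `benchmark=True`, returns are demeaned over the whole sample before conditioning, as in $S_\mu$; population variances are an assumption. With many states and few days per state, these ratios are noisy, so their significance has to be checked.

```python
import math
from collections import defaultdict

def state_sharpe(states, next_returns, benchmark=True):
    """S_mu = E(r | mu) / sqrt(var(r | mu)) for each state mu, where r = r_{t+1,1}.
    With benchmark=True, r is replaced by delta r = r - E(r) (whole-sample mean)."""
    r = list(next_returns)
    if benchmark:
        m = sum(r) / len(r)
        r = [v - m for v in r]
    groups = defaultdict(list)
    for mu, v in zip(states, r):
        groups[mu].append(v)
    out = {}
    for mu, vals in groups.items():
        m = sum(vals) / len(vals)
        var = sum((v - m) ** 2 for v in vals) / len(vals)
        out[mu] = m / math.sqrt(var) if var > 0 else float("nan")
    return out

states = [1, 1, 2, 2]
next_returns = [0.02, 0.04, 0.0, -0.02]
raw = state_sharpe(states, next_returns, benchmark=False)
bench = state_sharpe(states, next_returns)
print(raw, bench)
```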
Clustering of days: predictability?
For stock $i$: $H_i = E(S_{i,\mu})$
Some predictability