
Monday, June 10, 2013

Backtesting: Description, getting data and uniform weights


For this post, I use Python and pandas to backtest a trading strategy described in the paper
"Facts and Fantasies about Commodity Futures" by Gorton and Rouwenhorst.

It won't be an exact replication of their results because I don't have access to the same data.
From this paper, I will replicate their monthly rebalanced equi-weight index, and look at their long/short index based on the futures rolls in a following post.

First, I need some data. For that I use the website www.quandl.com: you can get many time series from there, and there is a nice Python API.

I wrote a simple script to get the 1st and 2nd futures for the products I want. They are then stored locally in an HDF5 file. There is a limit on the number of calls you can make to the Quandl website, so it is better to store the data locally and read it from there. If you register, you get a token that increases your download limit. It should be used in the Quandl.get line; otherwise remove
authtoken ="INSERTYOURTOKENHERE"
altogether from the call.


import Quandl
import pandas as pd


energy = str('CL B NG HO RB EH BZ M CER')
grain = str('C W S SM BO O RR')
soft = str(' SB KC CC CT OJ')
equity = str(' ES SP MD NK IBV YM DJ RF RG RV TF')
fx = str(' AD CD SF EC BP JY NE MP BR RU')
rates = str('ED EM FF TU FV TY US UL SA I3')
metal = str('GC SI HG  PL PA')
agri = str('LC LN FC DA LB')
index =str('AW GI CI DX VX')

ticks=[energy,grain,soft,equity,fx,rates,metal,agri,index]

store = pd.HDFStore('future.h5')

for t in ticks:
   symbols=t.split()
   for symb in symbols:
       print 'Getting '+symb
       data = Quandl.get("OFDP/FUTURE_"+symb+"1",authtoken ="INSERTYOURTOKENHERE" ,
                  collapse="daily",trim_start='1980-01-01',
                  trim_end='2010-01-01', returns='pandas')
       store[symb+'1'] = data
       data = Quandl.get("OFDP/FUTURE_"+symb+"2",authtoken ="INSERTYOURTOKENHERE" ,
                  collapse="daily",trim_start='1980-01-01',
                  trim_end='2010-01-01', returns='pandas')
       store[symb+'2'] = data
       
store.close()

Then, with another script, we can backtest the strategies. We need a bunch of imports and two lambda functions that will make our life easier. This is inspired by examples in the book "Python for Data Analysis" by Wes McKinney, the author of pandas.

import pandas as pd
import pandas.io.data as web
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
import datetime


compound = lambda x: (1+x).prod()-1
daily_sr = lambda x: x.mean()/x.std()
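As a quick sanity check, with made-up numbers: compound turns a series of periodic returns into one cumulative return, and daily_sr is a non-annualized Sharpe ratio (mean over standard deviation):

```python
import pandas as pd

compound = lambda x: (1 + x).prod() - 1
daily_sr = lambda x: x.mean() / x.std()

rets = pd.Series([0.01, 0.02, -0.005])   # three made-up periodic returns
cum = compound(rets)                     # (1.01 * 1.02 * 0.995) - 1
sr = daily_sr(rets)                      # per-period Sharpe, not annualized
print(cum)                               # about 0.025
```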


Then we load the data from our file, as well as the risk-free rate (3-month T-bills):


#tickers i'll need
energy = str('CL HO NG RB B')
metal = str('GC SI HG  PL PA')
grain = str('C O W S BO SM')
agri = str('LC LN FC LB')
soft = str('CC CT SB KC  OJ')

ticks=[energy,metal, grain,agri,soft]
store = pd.HDFStore('future.h5')

d1=defaultdict(dict)
d2=defaultdict(dict)

#get the data from my database
for t in ticks:
   symbols=(t.upper()).split()
   for symb in symbols:
       price=store.get(symb+'1')['Settle']
       d1[symb+'1']=price.asfreq('B').fillna(method='pad')
       price=store.get(symb+'2')['Settle']
       d2[symb+'1']=price.asfreq('B').fillna(method='pad')

       
store.close()

#get the risk free rate
rf_yearly = web.DataReader("TB3MS", "fred", start=datetime.datetime(1980, 1, 1),
                     end=datetime.datetime(2009, 12, 31))['TB3MS']
rf=rf_yearly.asfreq('B').fillna(method='pad')



#prepare data       
px = pd.DataFrame(d1)
px2 = pd.DataFrame(d2)

The strategy is the following. The data starts on the 1st of Jan 1980.
-On the last business day of Jan, we check which futures have been active for at least a week. We then split 1 dollar uniformly between the active ones at the close that day; say we have N active futures.
-The portfolio is closed on the last business day of Feb and a new one is reopened the same day, changing the amount invested in each active future so that it is uniform again.
-The Feb profit on each future is (1/N) times the cumulative return of that future from the last day of Jan until the last day of Feb.
-Each portfolio invests 1 in total, but since futures are fully margined, the profit on the cash for one month is 1/12 of the 3-month T-bill rate available on the last day of Jan.
-The portfolio profit is the sum of the individual futures' profits plus the return on the cash posted as collateral.
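With made-up numbers, one month of the bookkeeping above looks like this (N=2 active futures and a hypothetical 5% annualized T-bill rate):

```python
n = 2                       # active futures on the last day of Jan
fut_rets = [0.03, -0.01]    # cumulative Feb return of each future (made up)
tbill = 5.0                 # 3-month T-bill rate in percent, last day of Jan (made up)

collat = tbill / (12 * 100)                        # one month of interest on the collateral
port_ret = sum(r / n for r in fut_rets) + collat   # 0.5*0.03 + 0.5*(-0.01) + 0.05/12
print(port_ret)
```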

I use a function to calculate the portfolio weights as it will change depending on the strategy.


def uniform_weight(price):
    lookback=5
    mom_ret = price.pct_change(lookback)
    norm = mom_ret.count(axis=1)
    uniform = 0*mom_ret+1
    return uniform/norm

This function takes a DataFrame of prices. .pct_change(lookback) calculates the percentage change over the period, here 5 days. If a future is not traded, its price is NaN and so is the pct_change.
.count(axis=1) counts the number of non-NaN entries for each date of the frame (that's what axis=1 means), that is, the number of active futures. uniform is equal to NaN where mom_ret is, and equal to 1 otherwise. The function therefore returns 1/N for the traded futures and NaN otherwise.

We are ready:
We get the monthly profit from the cash collateral.
From the futures prices, we calculate the daily returns and then compound them into monthly returns, using the compound lambda function declared earlier.
Given the daily prices, we calculate the portfolio weights, and then replace the NaN weights by 0.


freq= '1M' 
collat = rf.resample(freq,how='first')/(12*100)

daily_rets = px.pct_change()
returns = daily_rets.resample(freq, how=compound)


#risk free rate
results=pd.DataFrame({'Risk free' :collat})

#generate monthly rebalanced
port = uniform_weight(px)
port = port.fillna(0.0)
port = port.shift(1).resample(freq,how='first')
port_rets = (port * returns).sum(axis=1)+collat
results['Uniform']= port_rets

The .shift().resample() combination seems a bit scary but it is simple:
the weights calculated on the last business day of January are delayed (shift operator) to the 1st of Feb. The weights for the whole month of Feb are then set to these delayed weights, and the series is subsampled to monthly frequency.
The PL is calculated as the sum of the weights times the monthly compounded returns of each future, plus the collateral PL.
We could easily add transaction costs if we wanted.
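To convince ourselves, here is the shift-and-sample logic on a toy daily series. I use a groupby on monthly periods, which gives the same "first value of each month" as the resample(freq, how='first') call above, and works on any recent pandas:

```python
import pandas as pd

# Business days spanning the Jan/Feb 2013 boundary (Jan 31 is the last one in Jan)
idx = pd.bdate_range('2013-01-28', '2013-02-05')
w = pd.Series(range(len(idx)), index=idx, dtype=float)   # pretend daily weights

shifted = w.shift(1)   # the last-business-day-of-Jan weight slides onto Feb 1
monthly = shifted.groupby(shifted.index.to_period('M')).first()
print(monthly)         # the Feb entry is the weight computed on Jan 31
```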

One can get some statistics using the daily_sr lambda function we defined at the start; one should multiply by sqrt(12), as our returns are monthly.
We find a Sharpe ratio of 0.5.
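For reference, the annualization is just this (with made-up monthly returns):

```python
import numpy as np
import pandas as pd

daily_sr = lambda x: x.mean() / x.std()

monthly_rets = pd.Series([0.012, -0.004, 0.008, 0.015, -0.010, 0.006])  # made up
sharpe = daily_sr(monthly_rets) * np.sqrt(12)   # annualize a monthly Sharpe
print(sharpe)
```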

However, we probably made a mistake by trusting the data source for the futures roll.
Say the Feb future closed yesterday at 100 and closes today at 101.
Say the March future closes today at 110 and tomorrow at 111.1.
Say we roll today at the close:
-The wrong way of rolling is to take yesterday's close of 100, today's close of 110 and tomorrow's close of 111.1. Calculating returns would give 10% for today and 1% for tomorrow.
-The right way is to realize that today's return was 1% (we held the Feb future until today's close), that rolling costs nothing because buying/selling futures generates no cash flows, and that tomorrow's return is 1% (as we hold the Mar future).

In order to correct for that, we can simply glue the prices together and compute returns. The returns of this glued front-future series are correct except on the day of the roll. We replace the wrong return at the roll date by the proper one (the return on the Mar future in the example).
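The numbers from the example above, in code (all prices made up):

```python
import pandas as pd

days = ['yesterday', 'today', 'tomorrow']
feb = pd.Series([100.0, 101.0, float('nan')], index=days)   # Feb future closes
mar = pd.Series([float('nan'), 110.0, 111.1], index=days)   # Mar future closes

# Wrong: the vendor's "front future" series jumps from Feb to Mar at the roll
front = pd.Series([feb['yesterday'], mar['today'], mar['tomorrow']], index=days)
naive = front.pct_change()          # 10% "return" today is just the roll gap

# Right: each day's return comes from the contract we actually held
correct = pd.Series({'today': feb['today'] / feb['yesterday'] - 1,
                     'tomorrow': mar['tomorrow'] / mar['today'] - 1})
print(correct)                      # 1% on both days
```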

To do so, I got all the futures for the tickers I wanted and proceeded to roll manually, as described in the paper: on the last business day of January, if I hold the future expiring in February, I roll into a March future position.
I glued the prices and also stored corrected returns (the return on the last day of Jan is calculated on the Feb future, and the return on the first day of Feb on the March future).

I reran the strategy and got a Sharpe ratio of 0.016.
Note that I ran from 1980 to 2009, not 1959 to 2004 as in the paper, and I do not have exactly the same futures.

That's already a long post so I'll stop here.










