CAPM, a Review Note that we will be using the Sharpe-Linter version of CAPM:
Recall that we declare R_{f} (the risk-free rate) to be non-stochastic. Let us begin. We will create a generic function to analyze some given stock. Data Import We will first import our utilities import pandas as pd import numpy as np Let’s load the data from our market (NYSE) as well as our 10 year t-bill data. t_bill = pd.read_csv("./linearity_test_data/10yrTBill.csv") nyse = pd.read_csv("./linearity_test_data/NYSEComposite.csv") nyse.head() Date Close 0 11/7/2013 16:00:00 9924.37 1 11/8/2013 16:00:00 10032.14 2 11/11/2013 16:00:00 10042.95 3 11/12/2013 16:00:00 10009.84 4 11/13/2013 16:00:00 10079.89 Excellent. Let’s load in the data for that stock. def load_stock(stock): return pd.read_csv(f"./linearity_test_data/{stock}.csv") load_stock("LMT").head() Date Close 0 11/7/2013 16:00:00 136.20 1 11/8/2013 16:00:00 138.11 2 11/11/2013 16:00:00 137.15 3 11/12/2013 16:00:00 137.23 4 11/13/2013 16:00:00 137.26 Raw Data And now, let’s load all three stocks, then concatenate them all into a big-ol DataFrame.
load data df = { "Date": nyse.Date, "NYSE": nyse.Close, "TBill": t_bill.Close, "LMT": load_stock("LMT").Close, "TWTR": load_stock("TWTR").Close, "MCD": load_stock("MCD").Close } # convert to dataframe df = pd.DataFrame(df) # drop empty df.dropna(inplace=True) df Date NYSE TBill LMT TWTR MCD 0 11/7/2013 16:00:00 9924.37 26.13 136.20 44.90 97.20 1 11/8/2013 16:00:00 10032.14 27.46 138.11 41.65 97.01 2 11/11/2013 16:00:00 10042.95 27.51 137.15 42.90 97.09 3 11/12/2013 16:00:00 10009.84 27.68 137.23 41.90 97.66 4 11/13/2013 16:00:00 10079.89 27.25 137.26 42.60 98.11 ... ... ... ... ... ... ... 2154 10/24/2022 16:00:00 14226.11 30.44 440.11 39.60 252.21 2155 10/25/2022 16:00:00 14440.69 31.56 439.30 39.30 249.28 2156 10/26/2022 16:00:00 14531.69 33.66 441.11 39.91 250.38 2157 10/27/2022 16:00:00 14569.90 34.83 442.69 40.16 248.36 2158 10/28/2022 16:00:00 14795.63 33.95 443.41 39.56 248.07 [2159 rows x 6 columns] Log Returns Excellent. Now, let’s convert all of these values into daily log-returns (we don’t really care about the actual pricing.)
log_returns = df"NYSE", "TBill", "LMT", "TWTR", "MCD".apply(np.log, inplace=True) df.loc[:, ["NYSE", "TBill", "LMT", "TWTR", "MCD"]] = log_returns df Date NYSE TBill LMT TWTR MCD 0 11/7/2013 16:00:00 9.202749 3.263084 4.914124 3.804438 4.576771 1 11/8/2013 16:00:00 9.213549 3.312730 4.928050 3.729301 4.574814 2 11/11/2013 16:00:00 9.214626 3.314550 4.921075 3.758872 4.575638 3 11/12/2013 16:00:00 9.211324 3.320710 4.921658 3.735286 4.581492 4 11/13/2013 16:00:00 9.218298 3.305054 4.921877 3.751854 4.586089 ... ... ... ... ... ... ... 2154 10/24/2022 16:00:00 9.562834 3.415758 6.087025 3.678829 5.530262 2155 10/25/2022 16:00:00 9.577805 3.451890 6.085183 3.671225 5.518577 2156 10/26/2022 16:00:00 9.584087 3.516310 6.089294 3.686627 5.522980 2157 10/27/2022 16:00:00 9.586713 3.550479 6.092870 3.692871 5.514879 2158 10/28/2022 16:00:00 9.602087 3.524889 6.094495 3.677819 5.513711 [2159 rows x 6 columns] And now, the log returns! We will shift this data by one column and subtract. returns = df.drop(columns=["Date"]) - df.drop(columns=["Date"]).shift(1) returns.dropna(inplace=True) returns NYSE TBill LMT TWTR MCD 1 0.010801 0.049646 0.013926 -0.075136 -0.001957 2 0.001077 0.001819 -0.006975 0.029570 0.000824 3 -0.003302 0.006161 0.000583 -0.023586 0.005854 4 0.006974 -0.015657 0.000219 0.016568 0.004597 5 0.005010 -0.008476 0.007476 0.047896 -0.005622 ... ... ... ... ... ... 2154 0.005785 0.004940 -0.023467 -0.014291 0.001349 2155 0.014971 0.036133 -0.001842 -0.007605 -0.011685 2156 0.006282 0.064420 0.004112 0.015402 0.004403 2157 0.002626 0.034169 0.003575 0.006245 -0.008100 2158 0.015374 -0.025590 0.001625 -0.015053 -0.001168 [2158 rows x 5 columns] Risk-Free Excess Recall that we want to be working with the excess-to-risk-free rates R_{T}-R_{f}, where R_{T} is some security. So, we will go through and subtract everything by the risk-free rate (and drop the RFR itself): risk_free_excess = returns.drop(columns="TBill").apply(lambda x: x-returns.TBill) risk_free_excess NYSE LMT TWTR MCD 1 -0.038846 -0.035720 -0.124783 -0.051603 2 -0.000742 -0.008794 0.027751 -0.000995 3 -0.009463 -0.005577 -0.029747 -0.000307 4 0.022630 0.015875 0.032225 0.020254 5 0.013486 0.015952 0.056372 0.002854 ... ... ... ... ... 2154 0.000845 -0.028406 -0.019231 -0.003591 2155 -0.021162 -0.037975 -0.043738 -0.047818 2156 -0.058138 -0.060308 -0.049017 -0.060017 2157 -0.031543 -0.030593 -0.027924 -0.042269 2158 0.040964 0.027215 0.010537 0.024422 [2158 rows x 4 columns] Actual Regression It is now time to perform the actual linear regression! We will use statsmodels’ Ordinary Least Squares API to make our work easier, but we will go through a full regression in the end. import statsmodels.api as sm CAPM Regression: Lockheed Martin Let’s work with Lockheed Martin first for regression, fitting an ordinary least squares. Remember that the OLS functions reads the endogenous variable first (for us, the return of the asset.)
add a column of ones to our input market excess returns nyse_with_bias = sm.add_constant(risk_free_excess.NYSE) # perform linreg lmt_model = sm.OLS(risk_free_excess.LMT, nyse_with_bias).fit() lmt_model.summary() OLS Regression Results ============================================================================== Dep. Variable: LMT R-squared: 0.859 Model: OLS Adj. R-squared: 0.859 Method: Least Squares F-statistic: 1.312e+04 No. Observations: 2158 AIC: -1.263e+04 Df Residuals: 2156 BIC: -1.262e+04 Df Model: 1 Prob (F-statistic): 0.00 Covariance Type: nonrobust Log-Likelihood: 6318.9 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const 0.0004 0.000 1.311 0.190 -0.000 0.001 NYSE 0.9449 0.008 114.552 0.000 0.929 0.961 ============================================================================== Based on the constants row, we can see that—within 95\% confidence—the intercept is generally 0 and CAPM applies. However, we do see a slight positive compared to the market. Furthermore, we can see that the regression has a beta value of 0.9449 — according the CAPM model, it being slightly undervarying that the market.
CAPM Regression: MacDonald’s # perform linreg mcd_model = sm.OLS(risk_free_excess.MCD, nyse_with_bias).fit() mcd_model.summary() OLS Regression Results ============================================================================== Dep. Variable: MCD R-squared: 0.887 Model: OLS Adj. R-squared: 0.887 Method: Least Squares F-statistic: 1.697e+04 No. Observations: 2158 AIC: -1.310e+04 Df Residuals: 2156 BIC: -1.309e+04 Df Model: 1 Prob (F-statistic): 0.00 Covariance Type: nonrobust Log-Likelihood: 6551.1 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const 0.0003 0.000 1.004 0.315 -0.000 0.001 NYSE 0.9651 0.007 130.287 0.000 0.951 0.980 ============================================================================== Same thing as before, we are within 95\% confidence having a intercept of 0 (with a slight positive edge), and it looks like MacDonald’s vary a little bit more than Lockheed Martin. The food industry is probably a tougher business than that in defense. CAPM Regression: Twitter Lastly, to analyze the recently delisted Twitter!
perform linreg twtr_model = sm.OLS(risk_free_excess.TWTR, nyse_with_bias).fit() twtr_model.summary() OLS Regression Results ============================================================================== Dep. Variable: TWTR R-squared: 0.522 Model: OLS Adj. R-squared: 0.522 Method: Least Squares F-statistic: 2357. No. Observations: 2158 AIC: -8610. Df Residuals: 2156 BIC: -8599. Df Model: 1 Prob (F-statistic): 0.00 Covariance Type: nonrobust Log-Likelihood: 4307.1 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const -0.0002 0.001 -0.346 0.730 -0.002 0.001 NYSE 1.0173 0.021 48.549 0.000 0.976 1.058 ============================================================================== Evidently, Twitter is much more variable. It looks like it has a nontrivial bias (the intercept being -0.001 being within the 95\% confidence band — that the security is possibly significantly underperforming the CAPM expectation in the market.) Furthermore, we have a positive beta value: that the asset is more variable than the market.
Manual Checking We can also use the betas formula to manually calculate what we expect for the beta values (i.e. as if they were one IID random variable.) risk_free_cov = risk_free_excess.cov() risk_free_cov NYSE LMT TWTR MCD NYSE 0.001143 0.001080 0.001163 0.001103 LMT 0.001080 0.001188 0.001116 0.001083 TWTR 0.001163 0.001116 0.002264 0.001155 MCD 0.001103 0.001083 0.001155 0.001200 Finally, to construct the beta values. Recall that:
and that:
get the market variance (covariance with itself) market_variation = risk_free_cov.NYSE.NYSE # calculate betas betas = {"LMT": (risk_free_cov.LMT.NYSE/market_variation), "TWTR": (risk_free_cov.TWTR.NYSE/market_variation), "MCD": (risk_free_cov.MCD.NYSE/market_variation)} # and make dataframe betas = pd.Series(betas) betas LMT 0.944899 TWTR 1.017294 MCD 0.965081 dtype: float64 Apparently, all of our assets swing less than the overall NYSE market! Especially Lockheed—it is only 94.4\% of the market variation. Furthermore, it is interesting to see that Twitter swings much more dramatically compared to the market.
Equal-Part Fund We will now create two funds with the three securities, one with equal parts and one which attempts to maximizes the bias (max returns) while minimizing the beta variance value compared to the market. First, let’s create a baseline fund in equal parts. Here it is: fund_1_returns = returns.LMT + returns.TWTR + returns.MCD fund_1_returns 1 -0.063167 2 0.023420 3 -0.017149 4 0.021384 5 0.049750 ... 2154 -0.036409 2155 -0.021132 2156 0.023917 2157 0.001720 2158 -0.014596 Length: 2158, dtype: float64 We will calculate the excess returns of this fund: fund_1_excess = fund_1_returns-returns.TBill fund_1_excess 1 -0.112813 2 0.021600 3 -0.023310 4 0.037041 5 0.058226 ... 2154 -0.041349 2155 -0.057265 2156 -0.040503 2157 -0.032449 2158 0.010994 Length: 2158, dtype: float64 Performance of the Equal-Part Fund # perform linreg fund_1_model = sm.OLS(fund_1_excess, nyse_with_bias).fit() fund_1_model.summary() OLS Regression Results ============================================================================== Dep. Variable: y R-squared: 0.473 Model: OLS Adj. R-squared: 0.473 Method: Least Squares F-statistic: 1935. No. Observations: 2158 AIC: -7735. Df Residuals: 2156 BIC: -7724. Df Model: 1 Prob (F-statistic): 3.01e-302 Covariance Type: nonrobust Log-Likelihood: 3869.5 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const 0.0007 0.001 0.841 0.401 -0.001 0.002 NYSE 1.1290 0.026 43.993 0.000 1.079 1.179 ============================================================================== Surprisingly, we have now created a significantly riskier investment that, though riskier, generates a much higher probability of reward (+0.001 is now within the 99\% band!) A More Optimized Fund To me, this is the payoff of this assignment. We will now use CAPM to create the “best” fund combination—given some variance, the funds which match CAPM. To do this, let’s create a generic linear combination of the assets. import sympy as sym x = sym.Symbol('x') y = sym.Symbol('y') z = sym.Symbol('z') fund_2_returns = xreturns.LMT + yreturns.TWTR + zreturns.MCD fund_2_returns 1 0.0139260753744255x - 0.0751364261353569y - ... 2 -0.00697525170622448x + 0.0295704573211193y ... 3 0.000583132897928884x - 0.0235859990058791y ... 4 0.000218587198947517x + 0.016568426347233y +... 5 0.00747599199607762x + 0.0478955096700351y -... ... 2154 -0.0234665578621085x - 0.0142913301107561y +... 2155 -0.00184214468578059x - 0.0076045993852194y ... 2156 0.00411172646842317x + 0.0154024001854269y +... 2157 0.00357547337231878x + 0.0062445563228315y -... 2158 0.00162509910496933x - 0.0150529686289622y -... Length: 2158, dtype: object Excellent. We will also calculate the excess returns of this fund: fund_2_excess = fund_2_returns-returns.TBill Y = fund_2_excess.to_numpy() Y [0.0139260753744255x - 0.0751364261353569y - 0.00195664549320096z - 0.0496463208073039 -0.00697525170622448x + 0.0295704573211193y + 0.000824317408861575z - 0.00181917459665826 0.000583132897928884x - 0.0235859990058791y + 0.00585367525146019z - 0.00616055581298536 ... 0.00411172646842317x + 0.0154024001854269y + 0.00440300114913317z - 0.0644196927849867 0.00357547337231878x + 0.0062445563228315y - 0.0081004573348249z - 0.0341688956152497 0.00162509910496933x - 0.0150529686289622y - 0.00116834209450634*z + 0.0255902303732043] We cast this type to a numpy array because we are about to perform some matrix operations upon it. Optimizing the Optimized Fund: Linreg Now, let us perform the actual linear regression ourselves. Recall that the pseudoinverse linear regression estimator is:
We have already our Y as a vector (above), and our X is: X = nyse_with_bias.to_numpy() X 1.00000000e+00 -3.88457302e-02] [ 1.00000000e+00 -7.42217926e-04] [ 1.00000000e+00 -9.46284244e-03] ... [ 1.00000000e+00 -5.81378271e-02] [ 1.00000000e+00 -3.15429207e-02] [ 1.00000000e+00 4.09643405e-02 We now have our matrices, let’s perform the linear regression! linear_model = np.linalg.inv((X.transpose()@X))@X.transpose()@Y linear_model [0.000544056413840724x - 6.62061061591867e-5y + 0.000429966553373172z - 0.000178620725465344 0.0457830563134785x + 0.118178191274045y + 0.0659651260604729z + 0.899115719100281] Excellent. So we now have two rows; the top row represents the “bias”—how much deviation there is from CAPM, and the bottom row represents the “rate”—the “beta” value which represents how much excess variance there is. Optimizing the Optimized Fund: Picking Optimizing Parameters We can will solve for a combination of solutions to give us specific values of returns vs risk. For instance, we can fix the variance to 1 (i.e. we can vary as much as the market.) We subtract one here for the solver, which expects the expressions equaling to 0. risk_expr = linear_model[1] - 1 risk_expr 0.0457830563134785x + 0.118178191274045y + 0.0659651260604729z - 0.100884280899719 Now, we will set a certain earning value, and solve for possible solutions. We will try to get the largest possible bias without needing to short something (i.e. cause a negative solution). By hand-fisting a value, it seems 0.001 is a good bet. deviance_expr = linear_model[0] - 0.001 deviance_expr 0.000544056413840724x - 6.62061061591867e-5y + 0.000429966553373172z - 0.00117862072546534 Optimizing the Optimized Fund: Optimize! solution = sym.solvers.solve([deviance_expr, risk_expr], x,y,z) solution {x: 2.16803104555387 - 0.819584899551304z, y: 0.0137520589394366 - 0.24067066980814z} We have one degree of freedom here: how much MacDonald’s we want! Let’s say we want none (which would, according to this, be an equally efficient solution.) How Does Our Fund Do? This would create the following plan:
for our case z_val = 0 # numerical solutions s_x = solution[x].subs(z,z_val) s_y = solution[y].subs(z,z_val) # solution fund_2_nobias_nomac = s_xreturns.LMT + s_yreturns.TWTR + z_val*returns.MCD fund_2_nobias_nomac.mean() 0.001185050286566688 Recall that this is the performance of the balanced portfolio:
fund_1_returns.mean() 0.0009224705380695683 So, for market-level risk (\beta =1, instead of the balanced portfolio’s \beta =1.1290), this is a pretty good deal! Some Plots Finally, let’s plot the prices of our various funds: import matplotlib.pyplot as plt import matplotlib.dates as mdates import seaborn as sns from datetime import datetime sns.set() fund_2_price = s_xdf.LMT + s_ydf.TWTR + z_val*df.MCD fund_1_price = df.LMT + df.TWTR fund_l_price = df.LMT fund_t_price = df.TWTR dates = df.Date.apply(lambda x:datetime.strptime(x, "%m/%d/%Y %H:%M:%S")) sns.lineplot(x=dates, y=fund_2_price.apply(sym.Float).astype(float)) sns.lineplot(x=dates, y=fund_1_price.apply(sym.Float).astype(float)) sns.lineplot(x=dates, y=fund_l_price.apply(sym.Float).astype(float)) sns.lineplot(x=dates, y=fund_t_price.apply(sym.Float).astype(float)) plt.gca().xaxis.set_major_locator(mdates.YearLocator()) plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y')) plt.gca().set_ylabel("Price") plt.show() Recall that we didn’t actually buy any MacDonald’s. So, we have—blue, being our “optimal” portfolio, yellow, our balanced portfolio, green, being Lockheed, and red, being Twitter. Our portfolio works surprisingly well!