
The Need for Better Indexes

How Passive is Passive Investing

Passive investing always hits my ear wrong. Not in the sense that low-fee index-tracking products are easy to beat or a bad idea (they are hard to beat and often a prudent investment strategy). My hiccup on passive investing is: is it really passive? Or what does passive even mean? Anytime you allocate capital to more than one security you have a budgeting decision to make.

Let’s use the S&P 500 as our example because it’s widely followed and is the underlying index for an incredibly large amount of money in index-tracking products (viz., ETFs, mutual funds, and futures). To form the S&P 500 index we need two things: constituents (in this case 500 companies) and weights, or how much of our hypothetical money to invest in each company’s stock. Let’s look past the first decision of choosing 500 U.S.-domiciled companies and focus on the second choice of how to divvy up our investment among the 500 stocks, because it’s often overlooked.

It’s well known that the S&P 500 uses the market capitalization (i.e., size) of each company to determine the weights. Intro investing classes will often teach this as an example of calculating returns of multiple securities. But it’s not often asked: is this a good idea? Or more precisely, is market cap weighting an optimal (or even sensible) way of constructing an index? Most investors are so conditioned to the S&P 500 market cap weights that they don’t even realize there are alternative methods, such as giving each company an equal weight. I propose that market cap weighting is not optimal and often results in an unbalanced or poorly diversified index. Let’s start by setting up some data to build on this thesis.
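To make the two weighting schemes concrete, here is a minimal sketch using four made-up companies. The market caps are hypothetical numbers chosen for illustration, and the `_demo` names are not part of the analysis that follows.

```r
# hypothetical market caps in $ billions (illustrative, not real data)
mkt_cap_demo <- c(A = 900, B = 300, C = 200, D = 100)
# market cap weights: each company's share of the total market cap
cap_wgt_demo <- mkt_cap_demo / sum(mkt_cap_demo)
# equal weights: every company gets 1 / N
eq_wgt_demo <- rep(1 / length(mkt_cap_demo), length(mkt_cap_demo))
names(eq_wgt_demo) <- names(mkt_cap_demo)
# compare the two budgets side by side
print(round(rbind(cap = cap_wgt_demo, equal = eq_wgt_demo), 3))
```

Under cap weighting the largest company takes 60% of the budget, while equal weighting gives each company 25%.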

Data gathering

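# required packages (inferred from the functions used in this post)
library(quantmod)      # getSymbols, xts, lag.xts
library(RColorBrewer)  # brewer.pal
library(magrittr)      # %>%
library(DT)            # datatable
library(rvest)         # read_html, html_table
library(stringi)       # stri_extract_first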

# vector of tickers
ticker_vec <- c("XLY", "XLP", "XLE", "XLF", "XLV", "XLI", "XLB", "XLRE", 
                "XLK", "XLU")
n_tickers <- length(ticker_vec)
# sector names
label_vec <- c("Consumer Disc.", "Consumer Staples", "Energy", "Financials", 
               "Health Care", "Industrials", "Materials", "Real Estate", 
               "Info Tech", "Utilities")
# capital weights as a column vector
cap_wgt <- matrix(ncol = 1, data = c(12.94, 6.89, 6.19, 14.65, 14.10, 9.82, 
                                     2.84, 2.76, 26.92, 2.89) / 100)

price_dat <- lapply(ticker_vec, "getSymbols.yahoo", from = "1970-01-01",
                    to = "2018-05-01", periodicity = "daily", 
                    auto.assign = FALSE)
price_mat <- do.call("cbind", price_dat)
# keep every 6th column (the adjusted close of each ticker)
price <- price_mat[, seq(from = 6, to = ncol(price_mat), by = 6)]
# transform price into discrete returns
ret <- price / lag.xts(price, k = 1) - 1
ret <- na.omit(ret)

There are a couple of things to point out here. We’re using sectors as our building blocks as opposed to 500 stocks. One of my favorite investment jargon aphorisms is “don’t parachute into a jungle,” meaning don’t overwhelm yourself with complexity at the beginning of your journey; start with a simple example and then slowly foray into the thicket. The cap_wgt array holds the corresponding capital weights, which I took from the SPDR Sector website as of May 1st, 2018.

spec_col <- brewer.pal(n_tickers, "Spectral")
pie(cap_wgt, label_vec, col = spec_col, cex = 0.8, border = "white", 
    main = "Sector Capital Weights")

Looking at the capital weights we can already see some imbalances, with the larger pie slices going to Info Tech, Health Care, and Financials.

Let’s see how things look from a volatility perspective. Before getting into details I’d like to motivate everyone to take volatility seriously as a risk measure. Volatility detracts from compounding wealth. Take the classic trick question of a $100 portfolio that’s up 10% Monday then down 10% Tuesday. Starting the day Wednesday it would be easy to heuristically misfire and say our portfolio is back to $100. But we’re actually starting our day at $99. If you’ve never encountered this geometric return example go ahead and work through the problem; you can even change the order, down 10% on day 1 then up 10% on day 2, and arrive at the same result. The loss gets worse with larger volatility. Plug in 50% instead of 10% and your portfolio is down to $75. Assuming a log-normal distribution you can use stochastic calculus to derive geometric return = arithmetic return - half of variance. This is what’s going on in our examples, to a close approximation. The (population) standard deviation of 10% and -10% is 10%, and the arithmetic average is 0%, so the geometric return is roughly 0 - (10%)^2 / 2 = -0.5% per day, or about -1% over the two days, which matches the drop from $100 to $99. Yes, these are just examples, and most financial returns (especially equity) don’t exactly follow the normal distribution, but that doesn’t change the well-grounded premise that volatility is detrimental to accumulating wealth.
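The arithmetic is easy to check in a few lines of R. This is a small sketch; `end_value` is a helper name made up for this example, and the variance adjustment is an approximation rather than an exact identity for discrete returns.

```r
# two-day path: up r one day, down r the other (the order does not matter)
end_value <- function(start, r) start * (1 + r) * (1 - r)
print(end_value(100, 0.10))
print(end_value(100, 0.50))

# exact per-day geometric return for the +10% / -10% path
r <- c(0.10, -0.10)
geo_exact <- prod(1 + r)^(1 / length(r)) - 1
# approximation: arithmetic mean minus half of the (population) variance
pop_var <- mean((r - mean(r))^2)
geo_approx <- mean(r) - pop_var / 2
print(c(exact = geo_exact, approx = geo_approx))
```

The exact and approximate per-day figures agree to within a couple of thousandths of a percent for the 10% case; the gap widens as the swings get larger.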

Moving back to our volatility weight calculation, we’ll define the component contribution to risk from sector i as \(cctr_i\):

\[cctr_i = x_i\frac{\partial{\sigma}_p}{\partial{x_i}}\]

\[volatility\ weight_i = \frac{cctr_i}{\sum_{j=1}^{N} cctr_j}\]

where

\[N = total\ number\ of\ sectors\]
\[x_i = sector_i\ capital\ weight\]
\[\sigma_p = \sqrt{x'\Omega x}\ = volatility\ of\ the\ index\]
\[\Omega=sector\ covariance\ matrix\]
\[x = column\ of\ capital\ weights\]

If your calculus and linear algebra are a bit rusty don’t fret the details. Volatility weights (often referred to as a risk budget) are simply a snapshot of how much risk (proxied by volatility) is coming from each constituent, or in our case sector. There’s a more elegant way to simplify the equation, but for the volatility weights to make sense to me I have to walk through the steps: marginal contribution to risk (the partial derivative in cctr) -> component contribution to risk -> risk weights. If this is your first time calculating a risk budget you might find the intermediary steps helpful.
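Before running this on real data, it can help to see the three steps on a toy index. The sketch below uses two hypothetical assets with made-up weights, volatilities, and correlation; the `_demo` names are illustrative only.

```r
# hypothetical two-asset index: 60/40 capital weights
x_demo <- matrix(c(0.6, 0.4), ncol = 1)
vol_demo <- c(0.20, 0.10)   # assumed asset volatilities
rho_demo <- 0.3             # assumed correlation
# covariance matrix Omega built from volatilities and correlation
omega_demo <- diag(vol_demo) %*%
  matrix(c(1, rho_demo, rho_demo, 1), 2, 2) %*% diag(vol_demo)
# index volatility: sqrt(x' Omega x)
sigma_p_demo <- sqrt(as.numeric(t(x_demo) %*% omega_demo %*% x_demo))
# marginal contribution to risk: d sigma_p / d x = Omega x / sigma_p
mctr_demo <- (omega_demo %*% x_demo) / sigma_p_demo
# component contribution to risk: weight times marginal contribution
cctr_demo <- x_demo * mctr_demo
# volatility weights: each component's share of the total
vol_wgt_demo <- cctr_demo / sum(cctr_demo)
print(round(vol_wgt_demo, 4))
```

The component contributions add up to the index volatility, which is why dividing by either `sum(cctr_demo)` or `sigma_p_demo` gives the same weights. In this toy case a 60% capital weight turns into roughly 84% of the risk.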

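# sector covariance matrix of daily returns
xcov <- cov(ret, use = "complete.obs")
# index variance x' Omega x, converted from a 1x1 matrix to a scalar
index_variance <- t(cap_wgt) %*% xcov %*% cap_wgt %>% as.numeric()
index_vol <- sqrt(index_variance)
# marginal contribution to risk: derivative of index vol w.r.t. each weight
mctr <- t(cap_wgt) %*% xcov / index_vol
# component contribution to risk: weight times marginal contribution
cctr <- t(mctr) * cap_wgt
# risk weights; the cctr sum to index_vol, so this equals cctr / sum(cctr)
risk_wgt <- cctr / index_vol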
pie(risk_wgt, label_vec, col = spec_col, cex = 0.8, border = "white",
    main = "Sector Volatility Weights")

Financials jump out as increasing from their capital weight to their volatility weight. Health Care and Info Tech remain large parts of the volatility budget.

# utility function to format %
fPercent <- function(x) paste0(formatC(x * 100, digits = 2, format = "f"), "%")
# create a data frame of capital and risk weights
df <- apply(cbind(cap_wgt, risk_wgt), 2, "fPercent") %>% data.frame()
colnames(df) <- c("Capital Weight", "Risk Weight")
rownames(df) <- label_vec
# use DT datatable for a nice html output
datatable(df, options = list(dom = "t", paging = FALSE, searching = FALSE,
                             ordering = FALSE,
                             columnDefs = list(
                               list(className = "dt-center",
                                    targets = 0:2))))

From a capital perspective, the top 2 sectors (Info Tech and Financials) are responsible for 41.57% of the index. From a volatility perspective, the top 2 sectors (Info Tech and Financials) are responsible for 50.24% of the index volatility. Advocates of passive investing commonly tout the diversification benefits of the index. An ETF that tracks the S&P 500 allows you to gain exposure to 10 sectors with one investment. But if over half of our volatility is coming from 2 of these sectors, is our investment portfolio really balanced?
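A quick way to check a top-2 share is to sort the weights and sum the two largest. The sketch below does this for the capital weights listed in the data gathering step; `top_share` and the `_demo` names are hypothetical helpers for this example, not part of the original code.

```r
# sector labels and capital weights as quoted in the data gathering step
label_demo <- c("Consumer Disc.", "Consumer Staples", "Energy", "Financials",
                "Health Care", "Industrials", "Materials", "Real Estate",
                "Info Tech", "Utilities")
cap_demo <- c(12.94, 6.89, 6.19, 14.65, 14.10, 9.82,
              2.84, 2.76, 26.92, 2.89) / 100
# return the names and combined share of the n largest weights
top_share <- function(w, n = 2) {
  top <- sort(w, decreasing = TRUE)[1:n]
  list(names = names(top), share = sum(top))
}
res_demo <- top_share(setNames(cap_demo, label_demo))
print(res_demo$names)
print(round(res_demo$share, 4))
```

The same helper can be pointed at a named vector of risk weights to get the volatility-side figure.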

Let’s shift to stocks instead of sectors. We’ll use the Dow Jones to keep things manageable with 30 stocks. The Dow is famous for price weights: stocks with higher prices get a higher weight. To avoid spending too much time gathering data we’re going to approximate the Dow capital weights with the historic prices of its current constituents (adjusted for dividends and share splits). This falls short of 100% accuracy for several reasons, including that companies move in and out of the Dow Jones universe of 30 stocks (e.g., Apple replaced AT&T in 2015), but it should give us a good approximation. We’ll calculate a rolling risk contribution of the 3 stocks with the largest risk weights from a trailing 63-day covariance estimate.

# url with html table of Dow Jones constituents (accessed 5/10/2018)
url <- "http://money.cnn.com/data/dow30/"
# read html and extract table as list of data frames
html <- read_html(url)
tbl <- html_table(html)
# the second list contains our data
company <- tbl[[2]]$Company
# separate the ticker from company name
ticker_vec <- stri_extract_first(company, regex = "\\w+")
# download historical prices
price <- getSymbols(ticker_vec[1], periodicity = "daily", 
                    from = "1970-01-01", to = "2018-05-01", 
                    auto.assign = FALSE)[, 6]
for (i in 2:length(ticker_vec)) {
  price <- cbind(price, getSymbols(ticker_vec[i], periodicity = "daily", 
                                   from = "1970-01-01", 
                                   to = "2018-05-01", auto.assign = FALSE)[, 6])
}
# use common time-period (intersection of dates)
price <- na.omit(price)
# estimate capital weights as price / sum of all prices
tot_price <- rowSums(price)
cap_wgt <- price / tot_price
# convert to discrete returns
ret <- price / lag.xts(price, 1) - 1
ret <- ret[2:NROW(ret), ]
# keep cap weights on same time-period as returns (omit the first data point)
cap_wgt <- cap_wgt[2:NROW(cap_wgt), ]
# place holder for rolling risk weights
risk_wgt <- matrix(nrow = dim(ret)[1] - 62, ncol = dim(ret)[2])
j <- 1
# loop through each return to calculate a rolling risk weight
# use 63 trading days as rolling estimate for covariance
for (i in 63:dim(ret)[1]) {
  roll_ret <- ret[(i - 62):i, ]
  w <- matrix(cap_wgt[i, ], ncol = 1)
  xcov <- cov(roll_ret)
  # here we simplify the risk weight calculation to avoid unnecessary calculations in a loop
  risk_wgt[j, ] <- (w * (xcov %*% w)) / (t(w) %*% xcov %*% w)[1]
  j <- j + 1
}

# utility function to calculate risk contribution of the top 3 risk weights
topN <- function(x, n = 3) sort(x, decreasing = TRUE)[1:n] %>% sum()
# apply the function to all rows of risk weights
risk_contr <- apply(risk_wgt, 1, "topN")
# transform to xts
risk_contr <- xts(risk_contr, index(cap_wgt[63:NROW(cap_wgt)]))


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1935  0.2379  0.2723  0.2773  0.3109  0.4107

If all 30 companies contributed an equal or near equal amount of risk, the top 3 risk drivers would contribute about 10% of total risk (3 / 30 = 10%). However, on average 28% of the total risk comes from 10% of the companies and during times of stress (2008 - 2009) 3 stocks were responsible for over 40% of the total volatility.
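The 10% benchmark can be reproduced with a stylized index. This is a sketch under made-up assumptions: 30 uncorrelated stocks with equal weights and an assumed 2% daily volatility, then the same index with one stock made three times as volatile. The `_demo` names and `top_n_demo` (a pipe-free copy of the top-N sum) are illustrative.

```r
# sum of the n largest risk weights
top_n_demo <- function(x, n = 3) sum(sort(x, decreasing = TRUE)[1:n])
# simplified risk weight formula: w_i * (Omega w)_i / (w' Omega w)
risk_wgt_demo <- function(w, xcov) {
  as.numeric((w * (xcov %*% w)) / as.numeric(t(w) %*% xcov %*% w))
}
n_demo <- 30
w_demo <- matrix(rep(1 / n_demo, n_demo), ncol = 1)
# balanced case: equal volatility, zero correlation
xcov_eq <- diag(0.02^2, n_demo)
top3_eq <- top_n_demo(risk_wgt_demo(w_demo, xcov_eq))
# concentrated case: first stock is three times as volatile
xcov_conc <- xcov_eq
xcov_conc[1, 1] <- 0.06^2
top3_conc <- top_n_demo(risk_wgt_demo(w_demo, xcov_conc))
print(round(c(balanced = top3_eq, concentrated = top3_conc), 4))
```

In the balanced case the top 3 stocks carry exactly 10% of the risk. Tripling the volatility of a single stock is enough to push the top-3 share to about 29%, even though capital weights are unchanged.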

risk_contr <- apply(risk_wgt, 1, "topN", n = 15)
risk_contr <- xts(risk_contr, index(cap_wgt[63:NROW(cap_wgt)]))


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6842  0.7368  0.7566  0.7588  0.7801  0.8674

As we increase the number of stocks included in our top risk contributors we continue to see a large portion of risk coming from a smaller number of stocks. If we look at 15 companies, or half of the index, we see that on average about 75% of risk comes from these companies. The other way to look at this is that, on average, half of the index is responsible for only about 25% of the volatility. From a volatility perspective this is not a balanced index.

I believe the next step in the investing landscape is to create more sensible indexes that are built with diversification as an end goal. Stay tuned for future posts where I’ll propose some schemes for capital weights that result in a more risk-balanced index.