Commodities Analysis

Published by Alice Seaborn on Tuesday, 22 June, 2021

A crude analysis of precious metals commodities vis a vis their relative risk level against the market. The data wrangling that occurs therein can be expanded upon to answer more complex questions should the need arise. All data for gold , silver, and the benchmark broad US index (^GSPC) come from Yahoo finance and reach 20 years from the date of publication (April 27, 2021).

Table of Contents

  1. Abstract
  2. Dataset
  3. Analysis
    1. Comparative Returns
    2. Risk Return Analysis
  4. Conclusions


Study into the relationship between gold and silver commodities in comparison to US broad equities (S&P500). Emphasis is placed on what separates and correlates commodities prices. Ideally, a diversified portfolio of bullion can be constructed that leverages the strengths of individual precious metals.


Historical price and volume data for gold and silver was downloaded from Yahoo finance and market data for ^GSPC was downloaded for the same time period. The data was loaded from CSV files into Pandas DataFrames. Missing data entries were dropped and all dates were converted from strings into DateTime objects. Strangely, the market index was inverted against the commodities data. The GSPC dataframe was reversed to match the other datasets and the index was reset.


GSPC['Date'] = GSPC['Date'].apply(lambda x: dt.datetime.strptime(x, '%m/%d/%y'))
GOLD['Date'] = GOLD['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d'))
SILV['Date'] = SILV['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d'))

GSPC = GSPC.iloc[::-1]
GSPC.reset_index(inplace=True, drop=True)

The next issue was the subject of trading days. Although the null entries were dropped, the datasets do not always have the same null entries in the same places. Ergo, an inner join on the dataset for all matching dates was performed. First, the market index was joined against silver and then a portfolio dataframe object was constructed using the previous join and a final join against gold.

GSPC_SILV = pd.merge(left=GSPC[['Date', 'Close']], left_on='Date',
         right=SILV[['Date', 'Adj Close']], right_on='Date')

Portfolio = pd.merge(left=GSPC_SILV[['Date', 'Close', 'Adj Close']], left_on='Date',
         right=GOLD[['Date', 'Adj Close']], right_on='Date', suffixes=(" SILV", " GOLD"))

Portfolio.rename(inplace=True, columns={"Close": "GSPC", "Adj Close SILV": "SILV",
                                        "Adj Close GOLD": "GOLD"})
2000-08-30 1502.59 4.930 273.899994
2000-08-31 1517.68 5.003 278.299988
2000-09-01 1520.77 5.004 277.000000
2000-09-05 1507.08 4.998 275.799988


The goal of this shown study is to figure out how precious metals commodities

Comparative Returns

The returns each asset were calculated and kept as a float and not a percentage. Ergo, we can use these returns directly without modification in future calculations.

Portfolio["GSPC Daily Return"] = Portfolio["GSPC"].shift(1) / Portfolio["GSPC"] - 1
Portfolio["SILV Daily Return"] = Portfolio["SILV"].shift(1) / Portfolio["SILV"] - 1
Portfolio["GOLD Daily Return"] = Portfolio["GOLD"].shift(1) / Portfolio["GOLD"] - 1
Date GSPC SILV GOLD GSPC Daily Return SILV Daily Return GOLD Daily Return
2000-08-30 1502.59 4.930 273.899994 NaN NaN NaN
2000-08-31 1517.68 5.003 278.299988 -0.009943 -0.014591 -0.015810
2000-09-01 1520.77 5.004 277.000000 -0.002032 -0.000200 0.004693
2000-09-05 1507.08 4.998 275.799988 0.009084 0.001200 0.004351

The cumulative returns for the commodities were then calculated for use in future analyses.

Portfolio["GSPC Cum. Return"] = 1 + Portfolio["GSPC Daily Return"]
Portfolio["SILV Cum. Return"] = 1 + Portfolio["SILV Daily Return"]
Portfolio["GOLD Cum. Return"] = 1 + Portfolio["GOLD Daily Return"]
Date GSPC SILV GOLD GSPC Daily Return SILV Daily Return GOLD Daily Return GSPC Cum. Return SILV Cum. Return GOLD Cum. Return
2000-08-30 1502.59 4.930 273.899994 NaN NaN NaN NaN NaN NaN
2000-08-31 1517.68 5.003 278.299988 -0.009943 -0.014591 -0.015810 0.990057 0.985409 0.984190
2000-09-01 1520.77 5.004 277.000000 -0.002032 -0.000200 0.004693 0.997968 0.999800 1.004693
2000-09-05 1507.08 4.998 275.799988 0.009084 0.001200 0.004351 1.009084 1.001200 1.004351

In order to compare the price growth of the commodities against each other, the datasets were normalized prior to plotting. A new dataframe was constructed to store the normalized datasets.

def normalize(df):
    minima = df[0]
    maxima = df.max()
    return (df - minima) / (maxima - minima)

Normalized = pd.DataFrame()
Normalized['Date'] = Portfolio['Date']
Normalized['GSPC'] = normalize(Portfolio['GSPC'])
Normalized['SILV'] = normalize(Portfolio['SILV'])
Normalized['GOLD'] = normalize(Portfolio['GOLD'])
2000-08-30 0.000000 0.000000 0.000000
2000-08-31 0.005620 0.001672 0.002475
2000-09-01 0.006771 0.001695 0.001744
2000-09-05 0.001672 0.001558 0.001069

With the normalized price changes calculated, the commodity price growth can be plotted against each other. As shown below, sharp declines in index performance appear to correlate with sharp declines in the strike price of silver but not with gold. In other words, although gold and silver appear to be strongly correlated, gold appears to offer better protections against market-wide volatility.


Risk Return Analysis

How do the volatilities of silver and gold compare to the market as a whole? (β\beta) How well do silver and gold perform with respect to the market? (α\alpha)

Beta is an assessment of the volatility of an asset with respect to the market. - β<1\beta < 1 indicates asset is less volatile than the market - β=1\beta = 1 indicates asset is as volatile as the market - β>1\beta > 1 indicates asset is more volatile than that market

Beta can be expressed as the coefficient of the line of best fit whereas the alpha can be found in the same equation as the intercept (or the x0x^0 coefficient if you want to be that pedantic). Let RiR_i represent the historical daily returns of an investment and RmR_m represent the historical daily returns of a broad market index such as S&P 500 or the Russell 5000. Then the beta can be expressed like so:

Ri=βRm+α R_i = \beta \cdot R_m + \alpha

The issue with the approach above lies in the fact that uncorrelated assets will yield meaningless linear regressions, as measured by a low (<<1<< 1) R2R^2 value. Luckily, the beta can be found using alternative statistical methods, such as the covariance of the stock against the market when controlling for the variance of the stock itself:

β=cov(Ri,Rm)var(Rm) \beta = \frac{\text{cov}\left( R_i, R_m \right)}{\text{var}\left( R_m \right)}

Similarly, the alpha can be expressed arithmetically as the expected return of an investment RiR_i in excess of the expected market return RmR_m less the risk-free rate of return RfR_f1 and with respect to the excess risk of the investment βi\beta_i. See below:

α=(Ri(Rf+βi(RmRf))) \alpha = \left( R_i \cdot \left( R_f + \beta_i \cdot \left( R_m - R_f \right) \right) \right)

Leveraging the linear regression tools made available through scipy, a function for plotting the correlation between two datasets was created and standardized for all comparisons used in this study.

def plot_correlation(data, x, y):
    m, b, r, p, se = scipy.stats.linregress(data[x][1::].values, data[y][1::].values)
    seaborn.jointplot(data=data, x=x, y=y, kind="reg");
    return m, b, r, p, se

Weirdly enough, the returns for neither gold nor silver are highly correlated with the broader market. Consequently, beta assessments cannot be completed through linear regression.


y=0.0169x+0.0001y=-0.0169x+-0.0001 (R2R^2: 0.0002)


y=0.0661x+0.0001y=-0.0661x+-0.0001 (R2R^2: 0.0114)

However, gold and silver are correlated with each other and suggest that silver is the more volatile of the two commodities since β=1.4127>1\beta = 1.4127 > 1. This silver volaility could be attributed to the lower relative price of silver against gold. Silver is easier to buy and to sell and therefore one could expect a larger volume of trades to be placed for the commodity. The resulting liquidity is a possitive for the investor but also encourages a greater degree of short-term trading and speculation, leading to price swings that don't accurate reflect changes in the underlying value of the asset.


y=1.4127x+0.0003y=1.4127x+0.0003 (R2R^2: 0.6098)

Just my guess, let's check it out by comparing the trade volume data. I extracted just the volume information and then dropped any missing records from both datasets for the commodity.

GOLD_VOL = GOLD[["Date", "Volume"]]
SILV_VOL = SILV[["Date", "Volume"]]
GOLD_VOL.replace(to_replace=0, value=np.nan, inplace=True)
GOLD_VOL.dropna(axis=0, inplace=True)
SILV_VOL.replace(to_replace=0, value=np.nan, inplace=True)
SILV_VOL.dropna(axis=0, inplace=True)

Now the datasets can be plotted. However, not a lot can be asertained from these plots in and of themselves.


The volume datasets were normalized and their fundamental statistical characteristics were compared.

Normalized['GOLD Volume'] = normalize(GOLD['Volume'])
Normalized['SILV Volume'] = normalize(SILV['Volume'])
pd.DataFrame(data={"Volume Mean": [int(GOLD_VOL['Volume'].mean()), int(SILV_VOL['Volume'].mean())],
    "Norm. Volume STD": [round(Normalized['GOLD Volume'].std(), 4)*100, round(Normalized['SILV Volume'].std(), 4)*100],
    "Norm. Volume Var": [round(Normalized['GOLD Volume'].var(), 4)*100, round(Normalized['SILV Volume'].var(), 4)*100],
    "Daily Return STD": [round(Portfolio['GOLD Daily Return'].std(), 4)*100, round(Portfolio['SILV Daily Return'].std(), 4)*100],
    "Daily Return Var": [round(Portfolio['GOLD Daily Return'].var(), 4)*100, round(Portfolio['SILV Daily Return'].var(), 4)*100]},
    index=["GOLD", "SILV"])
Volume Mean Norm. Volume STD Norm. Volume Var Daily Return STD Daily Return Var
GOLD 4628 6.33 0.40 1.12 0.01
SILV 1675 5.09 0.26 2.02 0.04

The quantity of low trade volume days in this data strongly arouses my suspicion. Nonetheless, it is clear that gold is the more popular of the two commodities from the perspective of their average daily trading volumes. When examining the standard deviations of their returns and volumes after normalizing the data for comparison, gold actually appears to have a higher volume volitility but greater price stability when compared to silver. Ergo, I reject my previous claim that liquidity could contribute to greater volatility in the underlying asset since, if this hypothesis were true, we would expect high volume volatility in silver when compared against gold.

Options chains provide another route for testing this hypothesis. The volume of options purchased close to the expiration date of the contract could be compared to see the relative speculative interest in each commodity.


After seeing the above correlations, I have a better understanding of these assets. A diversified portfolio of precious metals commodities ought to offer a reasonable rate of return while also hedging against market risk.

  1. Typically, the 10+ year US treasury bond or the prime rate set by the US Federal Reserve are used for the risk-free rate of return.