Back in the good old days, pandas-datareader used to be the cool-kids way to pull finacial data. Alas, all good things must come to an end. Now, due to Yahoo Finance and Google not-cooperating, this API has now become defunct. However, if you're like me, you're still craving financial data.
Instead of relying on another person's API, I've launched a personal postgreSQL database on AWS RDS to store data for this site and my personal projects. Over the past three days, I've loaded in all historical S&P data and launched a backend server to update the data daily (blog post here if you're interested in reading more in-depth on that process).
Using PostgreSQL, the database driver chosen for my RDS, I ran some technical analysis on the S&P. For results of the below queries, click here to download an excel with visualizations.
First up, let's take a look at the structure of the data.
SELECT *
FROM visualizations.stock_prices
LIMIT 1;
ticker | date | open | high | low | close | adj_close | volume |
---|---|---|---|---|---|---|---|
^GSPC | 1927-12-30 | 17.66 | 17.66 | 17.66 | 17.66 | 17.66 | 0 |
To finish our warm-up, let's get the average stock price per year.
SELECT
EXTRACT('year' from s.date) as trading_year,
AVG(close) AS average_closing_price
FROM visualizations.stock_prices s
GROUP BY 1
ORDER BY 1;
trading_year | average_closing_price |
---|---|
1927.0 | 17.660000 |
1928.0 | 19.937200 |
1929.0 | 26.123936 |
1930.0 | 21.033626 |
1931.0 | 13.631071 |
1932.0 | 6.906720 |
1933.0 | 9.061736 |
1934.0 | 9.832008 |
1935.0 | 10.576640 |
1936.0 | 15.439881 |
Now, for our first window function, we'll use first_value to determine the date with the highest close price for any day of that year.
/* Max Cloing Price with Date Per Year */
WITH prices_with_max_close_date AS (
SELECT
EXTRACT('year' from date) AS trading_year,
date,
close,
first_value(date) over (PARTITION BY EXTRACT('year' from date) ORDER BY EXTRACT('year' from date), close DESC) AS max_close_date
FROM visualizations.stock_prices
)
SELECT
trading_year,
date,
close as max_closing_value
FROM prices_with_max_close_date
WHERE date = max_close_date
ORDER BY 1;
trading_year | date | max_closing_value |
---|---|---|
1927.0 | 1927-12-30 | 17.660000 |
1928.0 | 1928-12-31 | 24.350000 |
1929.0 | 1929-09-16 | 31.860001 |
1930.0 | 1930-04-10 | 25.920000 |
1931.0 | 1931-02-24 | 18.170000 |
1932.0 | 1932-09-07 | 9.310000 |
1933.0 | 1933-07-18 | 12.200000 |
1934.0 | 1934-02-06 | 11.820000 |
1935.0 | 1935-11-19 | 13.460000 |
1936.0 | 1936-11-09 | 17.690001 |
Another good pathway of analysis is to look at the deviation between days. We'll use lag to determine the days with the 25 largest changes in closing value from the prior day.
/* 25 Largest Changes in Closing Value */
WITH prices_with_daily_delta AS (
SELECT
date,
LAG(close) OVER (ORDER BY date) AS previous_day_stock_price,
close,
close / LAG(close) OVER (ORDER BY date) - 1 AS daily_price_delta
FROM visualizations.stock_prices
)
SELECT *
FROM prices_with_daily_delta
WHERE date >= '1957-03-04' /* Start of the S&P, there were too many dates in the 1930s without this filter! */
ORDER BY ABS(daily_price_delta) DESC
LIMIT 25;
date | previous_day_stock_price | close | daily_price_delta |
---|---|---|---|
1987-10-19 | 282.700012 | 224.839996 | -0.204669 |
2020-03-16 | 2711.020020 | 2386.129883 | -0.119841 |
2008-10-13 | 899.219971 | 1003.349976 | 0.115800 |
2008-10-28 | 848.919983 | 940.510010 | 0.107890 |
2020-03-12 | 2741.379883 | 2480.639893 | -0.095113 |
2020-03-24 | 2237.399902 | 2447.330078 | 0.093828 |
2020-03-13 | 2480.639893 | 2711.020020 | 0.092871 |
1987-10-21 | 236.830002 | 258.380005 | 0.090994 |
2008-10-15 | 998.010010 | 907.840027 | -0.090350 |
2008-12-01 | 896.239990 | 816.210022 | -0.089295 |
Now, as lag is a very important function for analysis, we'll use it again to determine the average daily deviation by decade.
/* Average Deviation by Decade */
WITH prices_with_daily_delta AS (
SELECT
date,
LAG(close) OVER (ORDER BY date) AS previous_day_stock_price,
close,
close / LAG(close) OVER (ORDER BY date) - 1 AS daily_price_delta
FROM visualizations.stock_prices
)
SELECT
EXTRACT('decade' FROM date) AS trading_year,
AVG(daily_price_delta) * 100 AS average_daily_deviation
FROM prices_with_daily_delta
GROUP BY 1
ORDER BY 1;
trading_year | average_daily_deviation |
---|---|
192.0 | 0.055115 |
193.0 | 0.002548 |
194.0 | 0.016748 |
195.0 | 0.053272 |
196.0 | 0.019334 |
197.0 | 0.009966 |
198.0 | 0.052885 |
199.0 | 0.060328 |
200.0 | -0.001166 |
201.0 | 0.046627 |
Let's take a quick break from window functions to look at the top intra-day trading range percentage.
/* Max Intra-Day Trading Range */
SELECT
date,
open,
low,
high,
(high - low) / open as intraday_trading_range_over_opening
FROM visualizations.stock_prices
ORDER BY 5 DESC
LIMIT 25;
date | open | low | high | intraday_trading_range_over_opening |
---|---|---|---|---|
1987-10-19 | 282.700012 | 224.830002 | 282.700012 | 0.204705 |
1987-10-20 | 225.059998 | 216.460007 | 245.619995 | 0.129565 |
2008-10-28 | 848.919983 | 845.270020 | 940.510010 | 0.112190 |
2008-11-13 | 853.130005 | 818.690002 | 913.010010 | 0.110558 |
2008-10-10 | 902.309998 | 839.799988 | 936.359985 | 0.107014 |
2008-10-13 | 912.750000 | 912.750000 | 1006.929993 | 0.103183 |
2008-10-09 | 988.419983 | 909.190002 | 1005.250000 | 0.097185 |
1987-10-21 | 236.830002 | 236.830002 | 259.269989 | 0.094751 |
1962-05-29 | 55.500000 | 53.130001 | 58.290001 | 0.092973 |
2008-10-15 | 994.599976 | 903.989990 | 994.599976 | 0.091102 |
For our grand finale, let's look at the 200-day moving average using the avg window function.
The 200-Day Moving average is a technical indicator used to gauge long-term price trends. If the price is consistently trading above the average, this can be seen as an upward trending market. Conversely, markets trading below the 200-day moving average are viewed to be in a downtrend. When the current price crosses below the 200-day moving average, this is seen as a nexus point indicating a change in price-trends.
/* 200-day close moving-average crossings */
WITH prices_with_ma AS (
SELECT
date,
close,
AVG(close) OVER (ORDER BY date ROWS BETWEEN 200 - 1 PRECEDING AND CURRENT ROW) AS close_200d_ma
FROM visualizations.stock_prices
WHERE date >= '1957-03-04' /* Start of the S&P, there were too many dates in the 1930s without this filter! */
),
prices_with_ma_lagged AS (
SELECT
*,
close - close_200d_ma AS close_minus_200d_ma,
LAG(close - close_200d_ma) OVER (ORDER BY date) AS close_minus_200d_ma_lagged
FROM prices_with_ma
)
SELECT *
FROM prices_with_ma_lagged
WHERE (close_minus_200d_ma > 0 and close_minus_200d_ma_lagged < 0)
OR (close_minus_200d_ma < 0 and close_minus_200d_ma_lagged > 0)
OR close_minus_200d_ma = 0;
date | close | close_200d_ma | close_minus_200d_ma | close_minus_200d_ma_lagged |
---|---|---|---|---|
1957-03-04 | 44.060001 | 44.060001 | 0.000000 | NaN |
1957-03-08 | 44.070000 | 44.158000 | -0.088000 | 0.029999 |
1957-03-14 | 44.070000 | 44.047778 | 0.022222 | -0.004999 |
1957-03-18 | 43.849998 | 44.030000 | -0.180002 | 0.001999 |
1957-03-19 | 44.040001 | 44.030833 | 0.009168 | -0.180002 |
1957-03-25 | 43.880001 | 44.032500 | -0.152499 | 0.017334 |
1957-03-27 | 44.090000 | 44.028889 | 0.061111 | -0.115294 |
1957-08-12 | 46.330002 | 46.476814 | -0.146812 | 0.441873 |
1958-04-18 | 42.709999 | 42.662900 | 0.047099 | -0.441650 |
1959-09-15 | 56.680000 | 56.943600 | -0.263600 | 0.067402 |
Window functions are an advanced but rewarding tool needed in any data scientist's arsenal.
Thanks for reading!