S&P Technical PostgreSQL Analysis using Window Functions

Posted on March 27, 2020

Back in the good old days, pandas-datareader used to be the cool-kids way to pull finacial data. Alas, all good things must come to an end. Now, due to Yahoo Finance and Google not-cooperating, this API has now become defunct. However, if you're like me, you're still craving financial data.

Instead of relying on another person's API, I've launched a personal postgreSQL database on AWS RDS to store data for this site and my personal projects. Over the past three days, I've loaded in all historical S&P data and launched a backend server to update the data daily (blog post here if you're interested in reading more in-depth on that process).

Using PostgreSQL, the database driver chosen for my RDS, I ran some technical analysis on the S&P. For results of the below queries, click here to download an excel with visualizations.

First up, let's take a look at the structure of the data.

SELECT *

FROM visualizations.stock_prices

LIMIT 1;
ticker date open high low close adj_close volume
^GSPC 1927-12-30 17.66 17.66 17.66 17.66 17.66 0

To finish our warm-up, let's get the average stock price per year.

SELECT
  EXTRACT('year' from s.date) as trading_year,
  AVG(close) AS average_closing_price


FROM visualizations.stock_prices s

GROUP BY 1

ORDER BY 1;
trading_year average_closing_price
1927.0 17.660000
1928.0 19.937200
1929.0 26.123936
1930.0 21.033626
1931.0 13.631071
1932.0 6.906720
1933.0 9.061736
1934.0 9.832008
1935.0 10.576640
1936.0 15.439881

First Value

Now, for our first window function, we'll use first_value to determine the date with the highest close price for any day of that year.

/* Max Cloing Price with Date Per Year */
WITH prices_with_max_close_date AS (
  SELECT
    EXTRACT('year' from date) AS trading_year,
    date,
    close,
    first_value(date) over (PARTITION BY EXTRACT('year' from date) ORDER BY EXTRACT('year' from date), close DESC) AS max_close_date
  FROM visualizations.stock_prices
)
SELECT
  trading_year,
  date,
  close as max_closing_value

FROM prices_with_max_close_date

WHERE date = max_close_date

ORDER BY 1;
trading_year date max_closing_value
1927.0 1927-12-30 17.660000
1928.0 1928-12-31 24.350000
1929.0 1929-09-16 31.860001
1930.0 1930-04-10 25.920000
1931.0 1931-02-24 18.170000
1932.0 1932-09-07 9.310000
1933.0 1933-07-18 12.200000
1934.0 1934-02-06 11.820000
1935.0 1935-11-19 13.460000
1936.0 1936-11-09 17.690001

Lag

Another good pathway of analysis is to look at the deviation between days. We'll use lag to determine the days with the 25 largest changes in closing value from the prior day.

/* 25 Largest Changes in Closing Value */
WITH prices_with_daily_delta AS (
  SELECT
    date,
    LAG(close) OVER (ORDER BY date) AS previous_day_stock_price,
    close,
    close / LAG(close) OVER (ORDER BY date) - 1 AS daily_price_delta

  FROM visualizations.stock_prices
)
SELECT *

FROM prices_with_daily_delta

WHERE date >= '1957-03-04' /* Start of the S&P, there were too many dates in the 1930s without this filter! */

ORDER BY ABS(daily_price_delta) DESC

LIMIT 25;
date previous_day_stock_price close daily_price_delta
1987-10-19 282.700012 224.839996 -0.204669
2020-03-16 2711.020020 2386.129883 -0.119841
2008-10-13 899.219971 1003.349976 0.115800
2008-10-28 848.919983 940.510010 0.107890
2020-03-12 2741.379883 2480.639893 -0.095113
2020-03-24 2237.399902 2447.330078 0.093828
2020-03-13 2480.639893 2711.020020 0.092871
1987-10-21 236.830002 258.380005 0.090994
2008-10-15 998.010010 907.840027 -0.090350
2008-12-01 896.239990 816.210022 -0.089295

Now, as lag is a very important function for analysis, we'll use it again to determine the average daily deviation by decade.

/* Average Deviation by Decade */
WITH prices_with_daily_delta AS (
  SELECT
    date,
    LAG(close) OVER (ORDER BY date) AS previous_day_stock_price,
    close,
    close / LAG(close) OVER (ORDER BY date) - 1 AS daily_price_delta

  FROM visualizations.stock_prices
)
SELECT
  EXTRACT('decade' FROM date) AS trading_year,
  AVG(daily_price_delta) * 100 AS average_daily_deviation

FROM prices_with_daily_delta

GROUP BY 1

ORDER BY 1;
trading_year average_daily_deviation
192.0 0.055115
193.0 0.002548
194.0 0.016748
195.0 0.053272
196.0 0.019334
197.0 0.009966
198.0 0.052885
199.0 0.060328
200.0 -0.001166
201.0 0.046627

Break Time

Let's take a quick break from window functions to look at the top intra-day trading range percentage.

/* Max Intra-Day Trading Range */
SELECT
  date,
  open,
  low,
  high,
  (high - low) / open  as intraday_trading_range_over_opening

FROM visualizations.stock_prices

ORDER BY 5 DESC

LIMIT 25;
date open low high intraday_trading_range_over_opening
1987-10-19 282.700012 224.830002 282.700012 0.204705
1987-10-20 225.059998 216.460007 245.619995 0.129565
2008-10-28 848.919983 845.270020 940.510010 0.112190
2008-11-13 853.130005 818.690002 913.010010 0.110558
2008-10-10 902.309998 839.799988 936.359985 0.107014
2008-10-13 912.750000 912.750000 1006.929993 0.103183
2008-10-09 988.419983 909.190002 1005.250000 0.097185
1987-10-21 236.830002 236.830002 259.269989 0.094751
1962-05-29 55.500000 53.130001 58.290001 0.092973
2008-10-15 994.599976 903.989990 994.599976 0.091102

Avg

For our grand finale, let's look at the 200-day moving average using the avg window function.

The 200-Day Moving average is a technical indicator used to gauge long-term price trends. If the price is consistently trading above the average, this can be seen as an upward trending market. Conversely, markets trading below the 200-day moving average are viewed to be in a downtrend. When the current price crosses below the 200-day moving average, this is seen as a nexus point indicating a change in price-trends.

/* 200-day close moving-average crossings */
WITH prices_with_ma AS (
  SELECT
    date,
    close,
    AVG(close) OVER (ORDER BY date ROWS BETWEEN 200 - 1 PRECEDING AND CURRENT ROW) AS close_200d_ma

  FROM visualizations.stock_prices

  WHERE date >= '1957-03-04' /* Start of the S&P, there were too many dates in the 1930s without this filter! */
),
prices_with_ma_lagged AS (
SELECT
  *,
  close - close_200d_ma AS close_minus_200d_ma,
  LAG(close - close_200d_ma) OVER (ORDER BY date) AS close_minus_200d_ma_lagged


FROM prices_with_ma
)
SELECT *

FROM prices_with_ma_lagged

WHERE (close_minus_200d_ma > 0 and close_minus_200d_ma_lagged < 0)
   OR (close_minus_200d_ma < 0 and close_minus_200d_ma_lagged > 0)
   OR close_minus_200d_ma = 0;
date close close_200d_ma close_minus_200d_ma close_minus_200d_ma_lagged
1957-03-04 44.060001 44.060001 0.000000 NaN
1957-03-08 44.070000 44.158000 -0.088000 0.029999
1957-03-14 44.070000 44.047778 0.022222 -0.004999
1957-03-18 43.849998 44.030000 -0.180002 0.001999
1957-03-19 44.040001 44.030833 0.009168 -0.180002
1957-03-25 43.880001 44.032500 -0.152499 0.017334
1957-03-27 44.090000 44.028889 0.061111 -0.115294
1957-08-12 46.330002 46.476814 -0.146812 0.441873
1958-04-18 42.709999 42.662900 0.047099 -0.441650
1959-09-15 56.680000 56.943600 -0.263600 0.067402

Window functions are an advanced but rewarding tool needed in any data scientist's arsenal.

Thanks for reading!