There are several strategies for acquiring data programmatically in the wild. One of the most common is to find an API (Application Programming Interface) that allows access to an entity's database.
An example is the MTA's turnstile data page. The site serves text files with CSV data for subway turnstile traffic. The URL structure is:
http://web.mta.info/developers/data/nyct/turnstile/turnstile_[YYMMDD].txt
To access multiple weeks at once, we can write a function that interpolates each week's date stamp in place of [YYMMDD]. Each file is stamped with a Saturday, formatted as a six-digit date, e.g. Saturday, January 11th, 2020 → 200111.
import pandas as pd

def mta_data(week_numbers):
    """Fetch and combine weekly MTA turnstile files."""
    url = "http://web.mta.info/developers/data/nyct/turnstile/turnstile_{}.txt"
    frames = list()
    for week_number in week_numbers:
        # Interpolate the week's date stamp into the URL and read the CSV.
        file_url = url.format(week_number)
        df = pd.read_csv(file_url)
        frames.append(df)
    # Stack all weekly frames into a single DataFrame.
    return pd.concat(frames)
Let's access the last four weeks of data as an example (as of January 13th, 2020).
week_numbers = [191221, 191228, 200104, 200111]
df = mta_data(week_numbers)
df.head()
 | C/A | UNIT | SCP | STATION | LINENAME | DIVISION | DATE | TIME | DESC | ENTRIES | EXITS
---|---|---|---|---|---|---|---|---|---|---
0 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 12/14/2019 | 03:00:00 | REGULAR | 7309003 | 2477349
1 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 12/14/2019 | 07:00:00 | REGULAR | 7309008 | 2477362
2 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 12/14/2019 | 11:00:00 | REGULAR | 7309080 | 2477433
3 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 12/14/2019 | 15:00:00 | REGULAR | 7309289 | 2477498
4 | A002 | R051 | 02-00-00 | 59 ST | NQR456W | BMT | 12/14/2019 | 19:00:00 | REGULAR | 7309595 | 2477541
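Rather than typing week numbers by hand, we can compute them. One possible approach, sketched below, assumes the files are stamped with Saturdays and uses the standard-library datetime module; the helper name recent_week_numbers is our own invention, not part of the MTA's API.

```python
from datetime import date, timedelta

def recent_week_numbers(end_date, weeks=4):
    """Hypothetical helper: return YYMMDD strings for the `weeks`
    most recent Saturdays on or before end_date, oldest first."""
    # Step back from end_date to the most recent Saturday (weekday() == 5).
    offset = (end_date.weekday() - 5) % 7
    last_saturday = end_date - timedelta(days=offset)
    # Walk backwards one week at a time, then format oldest-first.
    saturdays = [last_saturday - timedelta(weeks=w) for w in range(weeks)]
    return [d.strftime("%y%m%d") for d in reversed(saturdays)]

recent_week_numbers(date(2020, 1, 13))
# → ['191221', '191228', '200104', '200111']
```

The output matches the hardcoded list above, so the two can be swapped: df = mta_data(recent_week_numbers(date.today())) would pull the latest four files, assuming the most recent one has already been posted.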