0%

【How 2】 Vol. 1. How 2 get all tradable tickers in US markets

Hi everyone, this is the first article for the 【How 2】 column.

Every time that I have questions that popped into my mind, I always go Google and try the luck. It would take quite some time to filter the outdated answers, situation not applied answers, …etc. so that I can start forming the answer that help solve my question.

So, here’s the corner for accumulating all these small notes that would help people who have the same questions of “How to …..”

Let’s get started.

Intro

While working on the trading automation bot, there’s one common factor that is essential to acquire for all the stocks that are considered as your potential targets. That is stock price.

There are many third party service and data warehouse that allows you to fetch the stock hloc (high, low, open, close) price through their APIs. The format of these APIs go by:
yfinance

1
2
3
4
5
6
import yfinance as yf

msft = yf.Ticker("MSFT")

# get historical market data
hist = msft.history(period="max")

tiingo

1
2
3
4
5
6
import requests
headers = {
'Content-Type': 'application/json'
}
requestResponse = requests.get("https://api.tiingo.com/tiingo/daily/MSFT/prices?startDate=2019-01-02&token=<your_token>", headers=headers)
print(requestResponse.json())

quandl

1
2
3
4
5
6
7
8
import quandl

stock_tickers = [
'MSFT',
]
mydata = quandl.get(stock_tickers, start_date = '2019-03-19', end_date='2019-03-21')

mydata.loc[:,(mydata.columns.str.contains('Close'))].T

If you have a pair of good eyes, you’ll notice what we’re trying to tackle here. All the APIs are called by given the ticker of the stock.

Ticker is a brief symbol or code to represent a specific stock/company. So before using your preferable APIs to get the price data, you need to know the ticker of the stock beforehand. As we’re working with code, we would like to get a group or a list of tickers to feed to the code that will automatically start processing the data for us.

So the question today would be:

“How to get a list of tickers listed in NYSE or Nasdaq?”

Solution

As stated in the article [How to get all common stock tickers], that actually Nasdaq is maintaining the list of the listed stocks and all the preliminary data in the text files. So the idea would be get the content from remote FTP, and parse those content into the format we need.

Example:
nasdaqlisted.txt

Symbol Security Name Market Category Test Issue Financial Status Round Lot Size ETF NextShares
AACG ATA Creativity Global - American Depositary Shares, each representing two common shares G N N 100 N N
AACQ Artius Acquisition Inc. - Class A Common Stock S N N 100 N N

cont…

otherlisted.txt

ACT Symbol Security Name Exchange CQS Symbol ETF Round Lot Size Test Issue NASDAQ Symbol
A Agilent Technologies, Inc. Common Stock N A N 100 N A
AA Alcoa Corporation Common Stock N AA N 100 N AA
AAA Listed Funds Trust AAF First Priority CLO Bond ETF P AAA Y 100 N AAA

cont…

Apparently, we need a list of tickers. That shouldn’t be hard. However, as this the primitive data maintained by Nasdaq, there are something we need to pay attention to before processing the data.

  1. We need to screen out the Test Issue stocks that is actually not a real company
  2. To save time later while we processing the stock fundamental data to find out the better quality stock, we can first remove those companies whose financial status are either bankrupt or deficient.
  3. We remove the stocks that are not listed in our target exchange
  4. We remove the stocks that have either . or $ in their symbol. These are not the stocks that listed in the market that we’re paying attention to.

Two methods: Bash v.s. python

In Bash

1
2
3
4
5
6
7
echo "[\"$(echo "$(
echo -en "$(
curl -s --compressed 'ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt' | tail -r | tail -n+2 | tail -r | tail -n+2 | perl -pe 's/ //g' | tr '|' ' ' | awk '{printf $1" "} {print $4}'
)\n$(
curl -s --compressed 'ftp://ftp.nasdaqtrader.com/SymbolDirectory/otherlisted.txt' | tail -r | tail -n+2 | tail -r | tail -n+2 | perl -pe 's/ //g' | tr '|' ' ' | awk '{printf $1" "} {print $7}'
)" | grep -v 'Y$' | awk '{print $1}' | grep -v '[^a-zA-Z]' | sort
)" | perl -pe 's/\n/","/g')\"]

In Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
import pandas as pd
import subprocess
from io import StringIO

def __get_tradable_tickers() -> list:
# * Get all the text from Nasdaq
proc = subprocess.Popen([
"curl", "-s", "--compressed", "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt"
],
stdout=subprocess.PIPE
)
nasdaq_text, err = proc.communicate()

# * Get all the text from Nasdaq
proc = subprocess.Popen([
"curl", "-s", "--compressed", "ftp://ftp.nasdaqtrader.com/SymbolDirectory/otherlisted.txt"
],
stdout=subprocess.PIPE
)
other_text, err = proc.communicate()

# * Convert them into DataFrame
nasdaqlisted = pd.read_csv(StringIO(nasdaq_text.decode('utf-8')), sep='|', header=0)
otherlisted = pd.read_csv(StringIO(other_text.decode('utf-8')), sep='|', header=0)

# * Remove the
# * 1. test issue stocks,
# * 2. not in GM market,
# * 3. financially broke company,
# * 4. ticker that has "." or "$"
# * from the DataFrame
nasdaqlisted = nasdaqlisted[\
(~nasdaqlisted['Test Issue'].str.contains('Y', na=False))&\
(~nasdaqlisted['Symbol'].str.contains('\.|\$', na=False, regex=True))&\
(nasdaqlisted['Market Category'].eq('G'))&\
(nasdaqlisted['Financial Status'].eq('N'))
]

# * Remove the
# * 1. test issue stocks,
# * 2. not in NYSE,
# * 3. ticker that has "." or "$"
# * from the DataFrame
otherlisted = otherlisted[\
(~otherlisted['Test Issue'].str.contains('Y', na=False))&\
(~otherlisted['ACT Symbol'].str.contains('\.|\$', na=False, regex=True))&\
(otherlisted['Exchange'].isin(['A', 'N', 'P']))
]

# * To remove the duplicated tickers by using set
tickers = list(set(nasdaqlisted['Symbol'].values.tolist() + otherlisted['ACT Symbol'].values.tolist()))

return tickers

Take away

You can take away as many as possible. The codes in the pictures are free to use.

Reference

Enjoy reading? Some donations would motivate me to produce more quality content