Wednesday 28 November 2018

build a REST API service to provide market data for yourself

The framework I used is Python Flask, a microframework that is very efficient for building REST API services. I considered doing this with AWS API Gateway, which can provide elastic scalability; however, your application ends up tightly coupled to AWS that way. I wanted to keep my application independent and portable, so I only used the EC2 (Elastic Compute Cloud) service.
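To give a sense of how little code Flask needs, here is a minimal sketch of such a service. The route name is my own illustration, and the CSV location follows the data/daily/%s_price.csv convention described later in this post; it is not necessarily the exact endpoint of the final service:

# Minimal Flask sketch of a market-data endpoint.
# The route name is illustrative; the CSV location follows the
# data/daily/<SYMBOL>_price.csv convention used later in this post.
import os
from flask import Flask, abort, send_file

app = Flask(__name__)

@app.route('/api/v1/asx/history/<symbol>')
def history(symbol):
    # Serve the locally stored daily price file for the requested symbol.
    file_name = os.path.abspath('data/daily/%s_price.csv' % symbol.upper())
    if not os.path.isfile(file_name):
        abort(404)
    return send_file(file_name, mimetype='text/csv')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)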

How to access the EC2 instance in AWS

After building an instance in AWS, the next step is to access the instance and install software such as Python, Python modules and the REST API software.


I am not going to reinvent the wheel; instead I will use the existing official Amazon documentation, which is quite useful and informative:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html

Prior to this process, we need to prepare the private key file and convert it to a .ppk file, following the instructions in the link above under 'To convert your private key'.

With PuTTY, we can access the terminal to install software and run applications. The first thing we need to do is install Python 3; we can follow this instruction to accomplish it:


My system in EC2 is Ubuntu, so I used the following command:

`$ sudo apt-get install python3`


WinSCP is a powerful tool for downloading and uploading files to and from remote computers. Simply follow the instructions above; it will prompt you to import settings and keys from PuTTY during the process. Then we can exchange files with the AWS EC2 instance:






create an instance in AWS for free


It takes ages for a normal computer to run through the script and download all the listed historical prices, hence I turned to Amazon Web Services. AWS has a one-year free tier, so we can apply for this option to lower our cost. There is another advantage: you can access your personal data wherever you are.
There are many articles on the internet explaining how to apply for and set up a free EC2 instance in AWS, so I won't do anything redundant here. This is the one I found very informative:
For me, the operating system I set up is Ubuntu. You need to save the private key file to your computer for later use when accessing the AWS instance.
Some tips:
  • Be aware of the geographic region you choose. I forgot to choose this and it defaulted to 'Ohio' in America; it takes a little money and some trouble to migrate to another region, so you need to think about this in the first place.


  • You'd better establish just one instance, since the one-year free tier allows you to run EC2 for 750 hours per month, so one instance is just right.


  • There is no need to use the S3 service for data storage, since EC2 is enough for that. I once created an S3 bucket and uploaded all those files; the requests exceeded 2,000 and I was charged a little.


  • If you register a domain name somewhere and want to link it to the IP address in AWS, you are using the Route 53 service, and this will charge you 0.55 USD per month. Another point is that you need a fixed IP address to bind to your domain name. To do this, you need to 'Allocate new address':


  • Also check whether the previous IP address has been released, otherwise you will be charged by Amazon. To check this, right-click the IP address; 'Release address' should be greyed out.




get historical data with Python

When we want share prices, the first thing we may ask ourselves is: how many companies are listed on the ASX? Here is what we can see on the ASX official website:
(There is a constant in stock.cons.py that records this.)
So we can download historical market data based on this list. But be aware: although there are 2,257 companies listed in this spreadsheet, there are lots of 'dead' codes, which may have been delisted without the document being updated or may have no trades at all, so there are actually only about 1,800 live equity codes. We will use some data-cleaning techniques to filter out these 'dead' instruments.
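As an illustration of that cleaning step, one simple rule is to drop any code whose downloaded price file is missing or empty. This is a sketch of the idea, not the exact filter used in the repository:

# Illustrative 'dead code' filter: keep only symbols whose downloaded
# price file exists and contains at least one row. A sketch of the
# idea, not the exact logic in the repository.
import os
import pandas as pd

def is_alive(symbol):
    file_name = 'data/daily/%s_price.csv' % symbol
    if not os.path.isfile(file_name):
        return False
    try:
        df = pd.read_csv(file_name)
    except pd.errors.EmptyDataError:
        return False
    return not df.empty

codelist = ['Z1P', 'CBA']  # normally read from the ASX list file
live_codes = [s for s in codelist if is_alive(s)]
print(live_codes)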
Some preconditions should be met before you run this Python script, and I will explain the most important parts of the code.
ASXScrapShareDailyPrice.py is the script you can run. Before that, a few things need to be carried out:
  1. go to https://www.python.org/downloads/ to download the latest Python version; 3.7.1 should be fine;
  2. use pip install to set up these modules: selenium, bs4, lxml, pandas;
  3. download PhantomJS (http://phantomjs.org/download.html), unzip it and put it in the folder /usr/local/bin/phantomjs, or any other folder you can specify in the code (a quick smoke test follows this list):
    browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
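Before running the full scraper, it is worth confirming that selenium can drive PhantomJS at all. A minimal smoke test, assuming the executable path above (adjust it to wherever you unzipped PhantomJS):

# Smoke test: confirm selenium can drive PhantomJS before running the
# scraper. Adjust executable_path to your own PhantomJS location.
from selenium import webdriver

browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
browser.get('https://au.finance.yahoo.com/quote/Z1P.AX/history')
print(browser.title)  # should print the Yahoo Finance page title
browser.quit()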
To run the Python script:

cd /aushare

python ASXScrapShareDailyPrice.py
These two lines build the right link for obtaining daily historical data; the '%s' placeholders are for customization inside the code. DAILY_PRICE is the place to get the full history if the script has never been executed before, while the purpose of DAILY_PRICE_DELTA is to keep your historical data file up to date.
DAILY_PRICE_DELTA = "https://au.finance.yahoo.com/quote/%s.AX/history?period1=%s&period2=%s&interval=1d&filter=history&frequency=1d"
DAILY_PRICE = 'https://au.finance.yahoo.com/quote/%s.AX/history?period1=0&period2=%s&interval=1d&filter=history&frequency=1d'
These lines read all the codes listed on the ASX:
df = pd.read_csv(ASXLIST_FILE_NAME, header=1)
codelist = df['ASX code'].values
for symbol in codelist:
If you have already executed the script and a historical price data file exists, the script will read the existing file, get the latest date, and fetch data from that point onwards, so no work is duplicated for the existing data. This is implemented here:
if os.path.isfile(file_name):
    # A file already exists: read it, find the most recent date, and
    # only fetch prices from the following day onwards (period2, today's
    # epoch timestamp, is computed earlier in the script).
    df = pd.read_csv(file_name, header=0, index_col=0, date_parser=_parser,
                     skipfooter=1, engine='python')
    if df.empty:
        continue
    df.reset_index(inplace=True)
    recent_date = df['Date'].max()
    print(recent_date)
    s1 = recent_date + timedelta(days=1)
    print(s1)
    period1 = int(time.mktime(s1.timetuple()))
    url = DAILY_PRICE_DELTA % (symbol, period1, period2)
    no_of_pagedowns = 2
else:
    # No existing file: fetch the full history from period 0 up to today.
    period2 = int(time.mktime(s2.timetuple()))
    url = DAILY_PRICE % (symbol, period2)
    no_of_pagedowns = 50
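For context, no_of_pagedowns exists because the Yahoo history table lazy-loads: more rows only render as the page scrolls. Here is a simplified sketch of the scroll-and-parse pattern; the real script's loop may differ in detail, and url builds on the constants shown above:

# Sketch of the scroll-and-parse pattern: scroll the page
# no_of_pagedowns times so the lazy-loaded table fills up, then hand
# the HTML to BeautifulSoup. Details may differ from the actual script.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = DAILY_PRICE % ('Z1P', period2)  # built as shown above
no_of_pagedowns = 50                  # full history needs more scrolling

browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
browser.get(url)
body = browser.find_element_by_tag_name('body')
for _ in range(no_of_pagedowns):
    body.send_keys(Keys.PAGE_DOWN)  # trigger another batch of rows
    time.sleep(0.2)                 # give the page time to render
soup = BeautifulSoup(browser.page_source, 'lxml')
table = soup.find('table')          # the historical prices table
browser.quit()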
After downloading the data, we need to filter out the duplicates. I noticed there are quite a lot of invalid duplicated prices in Yahoo Finance, but we should not complain about it since it is free. This line cleans out the duplicated data:
result.drop_duplicates(inplace=True)
The final format and location of the historical data file are defined in stock.cons.py:
DAILY_PRICE_FILE = 'data/daily/%s_price.csv'
So if the symbol name is 'Z1P', the file will be located in data/daily and the file name will be Z1P_price.csv.
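Loading a stored file back for analysis is then a one-liner in pandas (a sketch, assuming Yahoo's usual Date/Open/High/Low/Close/Adj Close/Volume columns):

# Load a stored history file back into pandas for analysis.
import pandas as pd

df = pd.read_csv('data/daily/Z1P_price.csv', parse_dates=['Date'])
print(df.tail())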
Some people (like me) prefer to analyse the share market using fundamental financial reports. To fetch the balance sheet, annual revenue report and cash flow, you need to run these Python scripts with Jupyter Notebook (to install Jupyter you can go here: http://jupyter.org/):
ASXDataScrapBalance.ipynb
ASXDataScrapRevenue.ipynb
ASXDateScrapCashflow.ipynb
The mechanism of these scripts is almost the same as ASXScrapShareDailyPrice, but much simpler.
Please note: don't use a thread pool when downloading data; this can overwhelm the server and you will get error code 429 (Too Many Requests) or even an IP ban.
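A safer pattern is a plain sequential loop with a pause between requests, backing off when the server answers 429. This is a sketch of the idea rather than code from the repository (the scraper itself goes through PhantomJS, but the same principle applies):

# Sequential, throttled downloading: one request at a time with a
# pause, and a back-off retry when the server answers 429.
import time
import requests

def fetch(url, pause=1.0, max_retries=3):
    for attempt in range(max_retries):
        resp = requests.get(url)
        if resp.status_code == 429:            # Too Many Requests
            time.sleep(pause * (attempt + 1))  # back off, then retry
            continue
        resp.raise_for_status()
        time.sleep(pause)                      # be polite between calls
        return resp.text
    raise RuntimeError('still throttled after %d retries' % max_retries)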
All the code is on GitHub for educational purposes:


Sunday 25 November 2018

ASX has a free API for delayed data

I have ceased the market data service for now. However, ASX does have a free API for obtaining delayed real-time prices.


e.g.

https://www.asx.com.au/asx/1/share/Z1P

result:

{"code":"Z1P","isin_code":"AU000000Z1P6","desc_full":"Ordinary Fully Paid","last_price":0.967,"open_price":0.96,"day_high_price":0.975,"day_low_price":0.96,"change_price":0.002,"change_in_percent":"0.207%","volume":168651,"bid_price":0.965,"offer_price":0.97,"previous_close_price":0.965,"previous_day_percentage_change":"0.521%","year_high_price":1.34,"last_trade_date":"2018-11-26T00:00:00+1100","year_high_date":"2018-02-01T00:00:00+1100","year_low_price":0.635,"year_low_date":"2017-12-21T00:00:00+1100","year_open_price":0.02438,"year_open_date":"2014-02-28T11:00:00+1100","year_change_price":0.94262,"year_change_in_percentage":"3,866.366%","pe":0,"eps":-0.0784,"average_daily_volume":758458,"annual_dividend_yield":0,"market_cap":290215070,"number_of_shares":300741005,"deprecated_market_cap":290215000,"deprecated_number_of_shares":300741005,"suspended":false}


For how to obtain historical data for private usage, please check the blog regularly for updates.

This service is suspended until further notice

I am sorry that this free service is suspended due to some non-technical issues. Please wait for further notice.


If you are interested, I will describe how to obtain this data at low cost for your private research purposes. Please let me know.

Tuesday 13 November 2018

update on equity daily historical data

Now the equity daily historical price can be traced back as early as 1970 where it exists. The data will be updated daily for all of the roughly 2,000 ASX-listed companies. Enjoy!

How to download daily historical data

Wednesday 7 November 2018

how to get historic EOD prices on an ASX share

A new feature has been added: now we can download all EOD prices for an ASX share in .csv format.

HTTP Request
GET http://www.biglion.com.au/api/v1/asx/history/download/

Query parameters:
symbol: share code, such as ANZ.
e.g.
GET http://www.biglion.com.au/api/v1/asx/history/download/?symbol=ANZ
This will return a csv file downloaded to your local machine.
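From Python, the same download looks like this (a sketch using the requests module):

# Sketch: call the endpoint above and save the returned CSV locally.
import requests

url = 'http://www.biglion.com.au/api/v1/asx/history/download/'
resp = requests.get(url, params={'symbol': 'ANZ'})
resp.raise_for_status()
with open('ANZ_history.csv', 'wb') as f:
    f.write(resp.content)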