Python - Web scraping concurrently to improve my code?

January 15, 2013

So I'm pulling statistics of NFL players. The table shows a maximum of 50 rows, so I have to filter it down to make sure I don't miss any stats, which means I'm iterating through the pages to collect the data by season, position, team, and week.

I figured out how the URL changes so I can cycle through these, but the iteration process takes a long time. So I was thinking: since we're able to open multiple web pages at one time, couldn't I run these processes in parallel, so that each process simultaneously collects the data from its page, stores it in a temp_df, and all of them are merged at the end? That would replace collecting one URL, merging, then the next URL, merging, and so on, one at a time. As it stands, it iterates 6,144 times if I'm not iterating through positions; with positions, it's over 36,000 iterations.

But I'm stuck on how to implement it, or whether it's even possible.
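The idea described above (fetch several pages at once, keep each page's result separately, and merge once at the end) is a standard fan-out/fan-in pattern. Below is a minimal, stdlib-only sketch of it using `concurrent.futures.ThreadPoolExecutor`; threads suit this kind of job because the time is spent waiting on the network rather than computing. `fake_fetch` is a hypothetical stand-in for `requests.get` plus `pd.read_html`, so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(page_id):
    """Hypothetical stand-in for downloading and parsing one stats page."""
    time.sleep(0.1)  # simulate network latency
    return [(page_id, "row %d" % page_id)]


def fetch_all(page_ids, max_workers=8):
    """Fetch every page concurrently, then merge the per-page results once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fake_fetch in parallel but yields results in input order.
        per_page = list(pool.map(fake_fetch, page_ids))
    # Merge step: flatten the per-page results into one list at the end.
    return [row for rows in per_page for row in rows]


start = time.perf_counter()
rows = fetch_all(range(16))
elapsed = time.perf_counter() - start
print(len(rows), "rows in %.2f s" % elapsed)
```

With 8 workers, the 16 simulated pages should finish in roughly two rounds of 0.1 s instead of sixteen sequential ones. Against a real site it is wise to keep `max_workers` modest rather than opening hundreds of connections at once.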

Here's the code I'm using. I eliminated the cycle through positions to give an idea of how it works; for quarterbacks, p = 2.

So it starts at season 2005 = 1, team 1 = 1, week 1 = 0, and iterates through to the last season, 2016 = 12, team 32 = 33, and week 16 = 17:

import io

import pandas as pd
import requests

seasons = list(range(1, 13))
teams = list(range(1, 33))
weeks = list(range(0, 17))

frames = []  # one DataFrame per page, merged once at the end

p = 2  # position code for quarterbacks
for s in seasons:
    for t in teams:
        for w in weeks:
            url = 'https://fantasydata.com/nfl-stats/nfl-fantasy-football-stats.aspx?fs=2&stype=0&sn=%s&scope=1&w=%s&ew=%s&s=&t=%s&p=%s&st=fantasypointsfanduel&d=1&ls=fantasypointsfanduel&live=false&pid=true&minsnaps=4' % (s, w, w, t, p)
            html = requests.get(url).text
            df_list = pd.read_html(io.StringIO(html))
            temp_df = df_list[-1]  # the stats table is the last table on the page
            temp_df['nfl season'] = str(2017 - s)
            frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once instead.
qb_df = pd.concat(frames, ignore_index=True)

file = 'player_data_fanduel_2005_to_2016_qb.xlsx'  # pandas 2.x no longer writes .xls
qb_df.to_excel(file)
print('\ndata has been saved.')
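A quick way to see how many requests the loops above make is to enumerate the (season, team, week) combinations with `itertools.product`, using the same ranges as the code. Note that `range(0, 17)` yields 17 values (0 through 16), so the ranges as written give 12 × 32 × 17 = 6,528 combinations; the 6,144 figure in the question corresponds to 16 weeks.

```python
from itertools import product

seasons = range(1, 13)  # 12 season indices
teams = range(1, 33)    # 32 team indices
weeks = range(0, 17)    # 17 week values, 0 through 16

# Every tuple is one page request in the nested loops.
combos = list(product(seasons, teams, weeks))
print(len(combos))  # 6528
print(combos[0])    # (1, 1, 0)
```

Since each tuple is one sequential HTTP round trip, the total run time is roughly the number of combinations times the average response time, which is why running the requests concurrently helps.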

1/ Create a dict of the season, team, and week combinations and their URLs.

2/ Use a multiprocessing Pool to call the URLs and get the data.

Or use a dedicated scraping tool such as Scrapy.
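The two steps above might look like the following sketch. It reuses the URL template, the ranges, and the `2017 - s` season label from the question's code and, like that code, assumes the stats table is the last table on each page. `build_urls`, `fetch_table`, `scrape_all` and `main` are hypothetical names, and the `.xlsx` output assumes an Excel writer such as openpyxl is installed.

```python
import io
from multiprocessing import Pool

# URL template from the question; the %s slots are season, week, week, team, position.
URL_TEMPLATE = (
    "https://fantasydata.com/nfl-stats/nfl-fantasy-football-stats.aspx"
    "?fs=2&stype=0&sn=%s&scope=1&w=%s&ew=%s&s=&t=%s&p=%s"
    "&st=fantasypointsfanduel&d=1&ls=fantasypointsfanduel"
    "&live=false&pid=true&minsnaps=4"
)


def build_urls(position, seasons=range(1, 13), teams=range(1, 33), weeks=range(0, 17)):
    """Step 1: map every (season, team, week) key to its page URL."""
    return {
        (s, t, w): URL_TEMPLATE % (s, w, w, t, position)
        for s in seasons
        for t in teams
        for w in weeks
    }


def fetch_table(item):
    """Step 2 worker: download one page and return its stats table."""
    # Imported inside the worker so build_urls() works without these packages.
    import pandas as pd
    import requests

    (season, _team, _week), url = item
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    df = pd.read_html(io.StringIO(response.text))[-1]  # last table, as in the question
    df["nfl season"] = str(2017 - season)
    return df


def scrape_all(urls, processes=8):
    """Fetch all pages in a process pool, then concatenate once at the end."""
    import pandas as pd

    with Pool(processes) as pool:
        frames = pool.map(fetch_table, urls.items())
    return pd.concat(frames, ignore_index=True)


def main():
    """Hypothetical entry point: scrape quarterbacks (p = 2) and save to Excel."""
    qb_df = scrape_all(build_urls(position=2))
    qb_df.to_excel("player_data_fanduel_2005_to_2016_qb.xlsx")
    print("\ndata has been saved.")
```

A script using this should call `main()` from under an `if __name__ == "__main__":` guard, because `multiprocessing` may start its workers by re-importing the script. Since the workers spend most of their time waiting on the network, `multiprocessing.pool.ThreadPool`, which offers the same `map` interface using threads instead of processes, is a drop-in alternative.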
