urllib2: unable to download file, Python .xlsx file downloads as zero bytes
After running the code, the downloaded file is 0 bytes. I tried writing the response directly too, and also tried using a buffer.
What am I doing wrong, and what else can I try? Please help.
```python
import urllib2
from bs4 import BeautifulSoup
import os
import pandas as pd

storepath = '/home/vinaysawant/bankifsccodes/'

def downloadfiles():
    # I removed the trailing / I had; it gives a 404 page
    url = 'https://rbi.org.in/scripts/bs_viewcontent.aspx?id=2009'
    conn = urllib2.urlopen(url)
    html = conn.read().decode('utf-8')

    soup = BeautifulSoup(html, "html.parser")
    # select all elements whose href attribute holds a URL starting with http://
    for link in soup.select('a[href^="http://"]'):
        href = link.get('href')

        # make sure it has one of the correct extensions
        if not any(href.endswith(x) for x in ['.csv', '.xls', '.xlsx']):
            continue

        filename = href.rsplit('/', 1)[-1]
        print href

        print("downloading %s to %s..." % (href, filename))
        #urlretrieve(href, filename)
        u = urllib2.urlopen(href)
        f = open(storepath + filename, 'wb')
        meta = u.info()
        file_size = int(meta.getheaders("content-length")[0])
        print "downloading: %s bytes: %s" % (filename, file_size)
        print("done.")

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            # progress line, rewritten in place with backspaces
            status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            status = status + chr(8) * (len(status) + 1)
            print status,

        f.close()
    exit(1)

downloadfiles()
```

I also tried `import urllib` followed by `urllib.urlretrieve(url)`, and I tried using urllib2 and urllib3 as well.
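The link-filtering step of the question's code (keep anchors whose href ends in .csv, .xls or .xlsx, then use the last path segment as the filename) can be sketched in Python 3 with only the standard library's `html.parser`, in case BeautifulSoup is not available. This is a minimal sketch: `LinkCollector` and `find_data_links` are hypothetical names, and unlike the original `a[href^="http://"]` selector it resolves relative links with `urljoin` and ignores any query string when checking the extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import posixpath

EXTENSIONS = (".csv", ".xls", ".xlsx")


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_data_links(html, base_url):
    """Return (absolute_url, filename) pairs for CSV/Excel links in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    results = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # relative links become absolute
        path = urlsplit(absolute).path
        # Check the extension on the path only, so "?v=2" doesn't hide it.
        if not path.lower().endswith(EXTENSIONS):
            continue
        results.append((absolute, posixpath.basename(path)))
    return results
```

For example, `find_data_links('<a href="files/IFSC1.xlsx">IFSC</a>', "https://example.com/scripts/page.aspx")` returns `[('https://example.com/scripts/files/IFSC1.xlsx', 'IFSC1.xlsx')]`. Each returned URL is the one to open for the download, not the page URL.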
I'm not familiar with pandas or urllib2, but since there is no answer to this question yet: I think the problem is that you are trying to download the first url.

`url='https://rbi.org.in/scripts/bs_viewcontent.aspx?id=2009'` is defined here, and it doesn't change.

After `u = urllib2.urlopen(url)` you try to download the thing associated with that url, via `buffer = u.read(block_sz)`. Instead of that, I guess you should download `href`, so try changing this:

```python
u = urllib2.urlopen(url)
```

to this:

```python
u = urllib2.urlopen(href)
```
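Following the answer's suggestion, here is a Python 3 sketch (urllib2 became `urllib.request` in Python 3) that opens each file's own href rather than the page url, streams it in fixed-size blocks like the question's `while` loop, and fails loudly instead of silently leaving a 0-byte file. `download_file` is a hypothetical helper; it assumes the href is an absolute URL whose last path segment is a usable filename. It only detects an empty or truncated download; it does not by itself explain why a server might send no data.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def download_file(href, dest_dir, block_size=8192):
    """Stream the resource at href (the file link, not the page URL) into dest_dir.

    Returns (path, bytes_written). Raises OSError if nothing was received
    or if the byte count disagrees with a Content-Length header.
    """
    filename = posixpath.basename(urlsplit(href).path)
    if not filename:
        raise ValueError("cannot derive a filename from %r" % href)
    dest = os.path.join(dest_dir, filename)

    written = 0
    with urllib.request.urlopen(href) as resp, open(dest, "wb") as out:
        # Content-Length is optional, e.g. for chunked HTTP responses.
        expected = resp.headers.get("Content-Length")
        while True:
            chunk = resp.read(block_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)

    if written == 0:
        os.remove(dest)  # don't leave a 0-byte file behind
        raise OSError("received 0 bytes from %s" % href)
    if expected is not None and written != int(expected):
        raise OSError("expected %s bytes from %s, got %d" % (expected, href, written))
    return dest, written
```

Inside the `for link in ...` loop this would replace the `urlopen`/`while` block, e.g. `download_file(href, storepath)`. Raising on zero bytes makes the failing URL visible immediately, which is the information needed to tell whether the wrong URL (the page instead of the file) is being opened.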