python - HTML table scraping and exporting to CSV: AttributeError
I'm trying to scrape an HTML table with BeautifulSoup on Python 3.6 in order to export it to CSV, as in the scripts below. I used a former example, trying to fit it to my case.
    url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/ccou/tipo_modello/u/cod_quadro/03'
    html = urlopen(url).read
    soup = BeautifulSoup(html(), "lxml")
    table = soup.select_one("table.tabfin")
    headers = [th.text("iso-8859-1") for th in table.select("tr th")]

But I receive an AttributeError.
    AttributeError: 'NoneType' object has no attribute 'select'
Then I try to export to CSV with

    with open("abano_spese.csv", "w") as f:
        wr = csv.writer(f)
        wr.writerow(headers)
        wr.writerows([[td.text.encode("iso-8859-1") for td in row.find_all("td")] for row in table.select("tr + tr")])

What's wrong with this? I'm sorry if there's a stupid error, I'm an absolute beginner with Python.
Thank you all.
There is a problem with scraping the web site of the Ministero dell'Interno. Let's try this code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/ccou/tipo_modello/u/cod_quadro/03'
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "lxml")
    print(soup.prettify())

You get:
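    La sua richiesta è stata bloccata dai sistemi posti a protezione del sito web.
    Si prega di assicurarsi dell'integrità della postazione utilizzata e riprovare.

(In English: "Your request has been blocked by the systems that protect the web site. Please verify the integrity of the workstation you are using and try again.")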
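Scraping seems not to be welcome, or they think there is something nasty in your request. The returned page contains no table with class tabfin, so select_one returns None. That is the reason why table is None in your code and why you get the AttributeError.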
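To make this failure easier to spot, one option is to check the result of select_one before using it. The following is only a minimal sketch: the helper name find_table and the two inline sample pages are hypothetical stand-ins for the real responses, and html.parser is used here just so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup


def find_table(html, selector="table.tabfin"):
    """Return the first element matching `selector`, or raise a clear error.

    `find_table` is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(selector)  # returns None when nothing matches
    if table is None:
        # Include the start of the page text so a block message is easy to recognise.
        snippet = soup.get_text(" ", strip=True)[:200]
        raise ValueError(
            f"No element matches {selector!r}; page text starts with: {snippet!r}"
        )
    return table


# Hypothetical sample pages standing in for the real responses.
blocked_page = "<html><body><p>La sua richiesta è stata bloccata</p></body></html>"
normal_page = (
    '<html><body><table class="tabfin"><tr><th>Voce</th></tr></table></body></html>'
)

print(find_table(normal_page).th.text)
try:
    find_table(blocked_page)
except ValueError as exc:
    print(exc)
```

With a check like this, a blocked request fails with a message that shows what the server actually sent back, instead of an AttributeError on None further down the script.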
Possible solution:
**Step 1: Before starting anything else, please check if the Ministero dell'Interno's data policy allows a script to consume their data, otherwise this is not the way to get what you need.**
Step 2: You can try to pass custom headers with your request in order to act like a browser. E.g.,
    import requests
    from bs4 import BeautifulSoup

    headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')

Now you have your soup. Note that there are 3 different <table class="tabfin"> elements in the page. I guess you need the second one:

    table = soup.select("table.tabfin")[1]

In this way, it works. Excuse me if I sound a bit pedantic, but I'm afraid such an approach may not be compliant with their data license. Please check it before scraping.
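Once the table is available, the export step from the question also needs a small change on Python 3: csv.writer expects strings, so calling .encode() on each cell would write the bytes representation (b'...') into the file. A possible approach is to keep the cells as text and choose the encoding when opening the file. The sketch below is an assumption-laden illustration, not the original poster's code: SAMPLE_HTML, table_to_rows and write_csv are hypothetical names, and the sample table only imitates the shape of the real one.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page content.
SAMPLE_HTML = """
<table class="tabfin">
  <tr><th>Voce</th><th>Importo</th></tr>
  <tr><td>Personale</td><td>1.000,00</td></tr>
  <tr><td>Acquisto di beni</td><td>250,50</td></tr>
</table>
"""


def table_to_rows(table):
    """Return (headers, rows) as lists of stripped strings."""
    headers = [th.get_text(strip=True) for th in table.select("tr th")]
    rows = []
    for tr in table.select("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows without <td> cells (the header row) are skipped
            rows.append(cells)
    return headers, rows


def write_csv(headers, rows, fileobj):
    """Write the header row followed by the data rows to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(headers)
    writer.writerows(rows)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.select_one("table.tabfin")
headers, rows = table_to_rows(table)

# An in-memory buffer keeps the example self-contained. For a real file,
# open("abano_spese.csv", "w", newline="", encoding="iso-8859-1") could be
# passed instead, which moves the encoding choice into open().
buffer = io.StringIO()
write_csv(headers, rows, buffer)
print(buffer.getvalue())
```

Cells that contain commas, such as Italian-formatted amounts, are quoted automatically by csv.writer, so they stay in a single column.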