Python - HTML table scraping and exporting to CSV: AttributeError


I'm trying to scrape an HTML table with BeautifulSoup on Python 3.6 in order to export it to CSV, as in the scripts below. I used an earlier example and tried to fit it to my case.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/ccou/tipo_modello/u/cod_quadro/03'
    html = urlopen(url).read
    soup = BeautifulSoup(html(), "lxml")
    table = soup.select_one("table.tabfin")
    headers = [th.text("iso-8859-1") for th in table.select("tr th")]

But I receive an AttributeError:

AttributeError: 'NoneType' object has no attribute 'select'

Then I try to export to CSV with:

    import csv

    with open("abano_spese.csv", "w") as f:
        wr = csv.writer(f)
        wr.writerow(headers)
        wr.writerows([[td.text.encode("iso-8859-1") for td in row.find_all("td")] for row in table.select("tr + tr")])

What's wrong with this? I'm sorry if it's a stupid error, I'm an absolute beginner with Python.

Thank you all.

There is a problem with scraping the web site of the Ministero dell'Interno. Let's try this code:

    url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/ccou/tipo_modello/u/cod_quadro/03'
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "lxml")
    print(soup.prettify())

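You get:

La sua richiesta è stata bloccata dai sistemi posti a protezione del sito web.
Si prega di assicurarsi dell'integrità della postazione utilizzata e riprovare.

(In English: "Your request has been blocked by the systems protecting the web site. Please check the integrity of the workstation you are using and try again.")

Scraping does not seem to be welcome, or the site thinks there is something nasty in your request. That is the reason why table is None in your code and why you get the AttributeError.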
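To make the failure mode concrete, here is a minimal sketch of how select_one returns None when the selector matches nothing, and how a guard can turn that into a readable error. The helper name find_table and both HTML strings are made up for illustration (the real block page is not reproduced here), and html.parser is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Illustrative stand-ins, not the real responses from the site.
BLOCKED_HTML = "<html><body><p>La sua richiesta è stata bloccata</p></body></html>"
TABLE_HTML = '<table class="tabfin"><tr><th>Voce</th></tr></table>'


def find_table(html, selector="table.tabfin"):
    """Return the first element matching selector, or raise a descriptive error."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(selector)  # None when nothing matches
    if table is None:
        raise ValueError(
            f"no element matches {selector!r}; the page may be a block or error page"
        )
    return table


# The page with a table yields a Tag we can keep working with.
print(find_table(TABLE_HTML).th.text)

# The block page has no table.tabfin, so the guard raises instead of
# letting a later table.select(...) fail with AttributeError.
try:
    find_table(BLOCKED_HTML)
except ValueError as exc:
    print("lookup failed:", exc)
```

Checking for None right after the lookup keeps the error close to its cause, which is easier to debug than an AttributeError a line later.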

Possible solution:

**Before starting anything else, please check whether the Ministero dell'Interno's data policy allows a script to consume their data; otherwise this is not the way to get what you need.**

Step 2: You can try to pass custom headers with your request so that it acts like a browser, e.g.:

    import requests

    headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
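Since the question uses urlopen rather than requests, the same idea can probably be sketched with the standard library alone. The helper names build_request, looks_blocked and fetch_html are hypothetical, the marker string is taken from the block message quoted above and may not be stable, and the iso-8859-1 decoding is only an assumption carried over from the question.

```python
from urllib.request import Request, urlopen

# Same browser-like User-Agent string as in the requests example.
USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) "
    "Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"
)

# Assumption: the block page keeps containing this phrase.
BLOCK_MARKER = "stata bloccata"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def looks_blocked(text):
    """Heuristic check for the block message instead of the real page."""
    return BLOCK_MARKER in text.lower()


def fetch_html(url, timeout=30):
    """Fetch the page and decode it (encoding assumed, see lead-in)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("iso-8859-1")
```

A caller could run looks_blocked(fetch_html(url)) before parsing and stop early if the site answered with the block message.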

Now you have a soup. Note that there are 3 different <table class="tabfin"> elements in the page. I guess you need the second one:

    table = soup.select("table.tabfin")[1]
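With a table in hand, the CSV export from the question could be sketched roughly as follows. On Python 3 the csv module expects text, so the cells are left as str rather than being encoded to bytes. The sample HTML, its column names and the helpers table_to_rows and write_csv are invented for illustration; the real table may be laid out differently (for example with th cells inside data rows).

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample standing in for one of the table.tabfin elements.
SAMPLE_HTML = """
<table class="tabfin">
  <tr><th>Voce</th><th>Importo</th></tr>
  <tr><td>Personale</td><td>1.000,00</td></tr>
  <tr><td>Servizi</td><td>2.500,50</td></tr>
</table>
"""


def table_to_rows(table):
    """Split a table Tag into a header list and a list of data rows."""
    headers = [th.get_text(strip=True) for th in table.select("tr th")]
    rows = []
    for tr in table.select("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows without td cells, such as the header row
            rows.append(cells)
    return headers, rows


def write_csv(out, headers, rows):
    """Write the header and rows to any text file-like object."""
    writer = csv.writer(out)
    writer.writerow(headers)
    writer.writerows(rows)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.select("table.tabfin")[0]
headers, rows = table_to_rows(table)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_csv(buffer, headers, rows)
print(buffer.getvalue())
```

To write a real file, the buffer can be replaced with open("abano_spese.csv", "w", newline="", encoding="iso-8859-1"), which lets the file object handle the encoding.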

This way, it works. Excuse me if I sound a bit pedantic, but I'm afraid such an approach may not be compliant with their data license. Please check before scraping.

