html - Not able to select links from a module on a website using BeautifulSoup -

March 15, 2014

I have built a scraper to extract links from a company's website (I have permission), but when I try to add in the URL where the jobs are posted, I'm only able to retrieve some of the links. It seems the jobs are stored in some kind of module, so I can't access them using my scraper.

In the HTML, "parbase section" is the name of the module I can't seem to access.

Question

Why is my scraper not able to pull the URLs for the job posts from the link I have provided below?

The link to the job postings is here: https://www.pwc.dk/da/karriere/ledige-stillinger.html

Code for the scraper

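import requests
from bs4 import BeautifulSoup

url = "http://www.pwc.dk/da/karriere/ledige-stillinger.html"
r = requests.get(url)

# Parse the static HTML returned by the server
soup = BeautifulSoup(r.content, "html.parser")

# Collect every anchor tag and print it as an HTML link
links = soup.find_all("a")
for link in links:
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))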

As the webpage is a JavaScript-heavy one, you need to use Selenium to gatecrash it. Install Selenium and give this a try:

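from selenium import webdriver
from bs4 import BeautifulSoup

# Let a real browser run the page's JavaScript, then parse the rendered HTML
driver = webdriver.Chrome()
driver.get("https://www.pwc.dk/da/karriere/ledige-stillinger.html")
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# Each job title link sits inside an element with class "vbtitle"
for item in soup.select(".vbtitle a"):
    print(item.get("href"))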
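If the job list is filled in by JavaScript some time after the page loads, reading driver.page_source straight after driver.get() may still miss it. The sketch below is one possible variation, not the answer's own method: it waits explicitly until the anchors matching the ".vbtitle a" selector from the answer are present. The function names fetch_job_links and absolute_links, and the 15-second timeout, are assumptions made for illustration.

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url, skipping empty values and duplicates."""
    resolved = []
    for href in hrefs:
        if not href:
            continue
        full = urljoin(base_url, href)
        if full not in resolved:
            resolved.append(full)
    return resolved


def fetch_job_links(url, selector=".vbtitle a", timeout=15):
    """Open url in Chrome, wait for the selector to match, return the links.

    The selector default comes from the answer above; the timeout is an
    assumed value and may need tuning for a slow connection.
    """
    # Imported here so absolute_links() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Blocks until at least one matching element exists in the DOM,
        # or raises TimeoutException once the timeout has elapsed.
        anchors = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        hrefs = [a.get_attribute("href") for a in anchors]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
    return absolute_links(url, hrefs)
```

Calling fetch_job_links("https://www.pwc.dk/da/karriere/ledige-stillinger.html") would then return the job URLs as a list, assuming Chrome and a matching driver are available on the machine.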
