html - Not able to select links from a module on a website using BeautifulSoup
I have built a scraper to extract links from a company's website (I have permission), but when I try to add in the URL where the jobs are posted, I'm not able to retrieve the job links. It seems the jobs are stored in some kind of module that I can't access using my scraper.
The "html parbase section" is the HTML name of the module that I can't seem to access.
Question
Why is my scraper not able to pull the URLs of the job posts from the link I have provided below?
The link to the job postings is here: https://www.pwc.dk/da/karriere/ledige-stillinger.html
Code for my scraper:
    import requests
    from bs4 import BeautifulSoup

    url = "http://www.pwc.dk/da/karriere/ledige-stillinger.html"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Print every anchor found in the static HTML returned by the server
    links = soup.find_all("a")
    for link in links:
        print("<a href='%s'>%s</a>" % (link.get("href"), link.text))
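The webpage is a JavaScript-heavy one: the job links are most likely rendered in the browser after the initial HTML has loaded, so requests never sees them. You need to use Selenium to gatecrash it. Install Selenium and give this a try: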
    from selenium import webdriver
    from bs4 import BeautifulSoup

    # Let a real browser run the page's JavaScript, then parse the result
    driver = webdriver.Chrome()
    driver.get("https://www.pwc.dk/da/karriere/ledige-stillinger.html")
    soup = BeautifulSoup(driver.page_source, "lxml")
    driver.quit()

    for item in soup.select(".vbtitle a"):
        print(item.get("href"))
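If the loop prints nothing, one possible cause is that page_source was read before the scripts finished adding the job links. The variant below is only a sketch under that assumption: it waits for the selector to match before reading the page. The function names extract_links and fetch_rendered_html and the 15-second timeout are made up for illustration, and the ".vbtitle a" selector is taken from the snippet above, so adjust it if the page markup differs.

```python
from bs4 import BeautifulSoup


def extract_links(html, selector=".vbtitle a"):
    """Return the href of every element matching the CSS selector.

    Hypothetical helper; the default selector is the one used in the answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select(selector) if a.get("href")]


def fetch_rendered_html(url, selector=".vbtitle a", timeout=15):
    """Load the page in Chrome and wait until the selector matches something.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    # Imported lazily so extract_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Raises TimeoutException if nothing matches within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even after a timeout


# Example usage (opens a real browser window):
# html = fetch_rendered_html("https://www.pwc.dk/da/karriere/ledige-stillinger.html")
# for href in extract_links(html):
#     print(href)
```

Keeping the parsing in its own function also makes it easy to check the selector against a saved copy of the page without starting a browser each time.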