Get every href from the same div in Python

python web-scraping

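I have this soup: the web page shows company references in a grid view (16 rows x 5 columns), and I want to retrieve the URL and the title of each reference. The problem is that all 5 references in a row sit inside a single class named "row", and when I scrape the page I only get the first reference of each row instead of all 5. Here is my code so far: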

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'http://www.slimstock.com/nl/referenties/'
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
info_block = soup.find_all("div", attrs={"class": "row"})
references = pd.DataFrame(columns=['Company Name', 'Web Page'])

for entry in info_block:
    try:
        title = entry.find('img').get('title')
        url = entry.a['href']
        urlcontent = BeautifulSoup(requests.get(url).content, "lxml")
        row = [{'Company Name': title, 'Web Page': url}]
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
        references = pd.concat([references, pd.DataFrame(row)], ignore_index=True)
    except Exception:
        pass

Is there a way to solve this?

I think you should iterate over the "img" or the "a" elements inside each row. You can write something like this:

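for entry in info_block:
    try:
        # Visit every <a> in the row, not only the first one
        for a in entry.find_all("a"):
            title = a.find('img').get('title')
            url = a.get('href')
            urlcontent = BeautifulSoup(requests.get(url).content, "lxml")
            row = [{'Company Name': title, 'Web Page': url}]
            # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
            references = pd.concat([references, pd.DataFrame(row)], ignore_index=True)
    except Exception:
        # Any failure (e.g. a link without an <img>) skips the rest of this row
        pass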
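
To see why the question's code returned one reference per row, here is a small self-contained sketch. The markup in SAMPLE_HTML is made up for illustration (the real page's structure is only assumed to look like this), and extract_references is a hypothetical helper name. It uses the "html.parser" backend so that it runs without lxml, and a CSS selector instead of nested loops; treat it as one possible variant, not as the method used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: each "row" div holds
# several <a> tags, each wrapping an <img> whose title is the company name.
SAMPLE_HTML = """
<div class="row">
  <a href="http://example.com/a"><img title="Company A"></a>
  <a href="http://example.com/b"><img title="Company B"></a>
</div>
<div class="row">
  <a href="http://example.com/c"><img title="Company C"></a>
  <a href="http://example.com/no-logo">text-only link</a>
</div>
"""


def extract_references(html):
    """Return one dict per link inside a "row" div that wraps a titled <img>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # "div.row a[href]" matches every <a> with an href under a div of class "row"
    for a in soup.select("div.row a[href]"):
        img = a.find("img")
        if img is None or not img.get("title"):
            continue  # skip links that carry no titled image
        rows.append({"Company Name": img["title"], "Web Page": a["href"]})
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_row = soup.find("div", class_="row")

# tag.a is shorthand for tag.find("a"), so it yields only the first match
first_only = first_row.a["href"]
# find_all("a") yields every <a> in the row
all_in_row = [a["href"] for a in first_row.find_all("a")]

references = extract_references(SAMPLE_HTML)
```

The resulting list of dicts can be passed to pd.DataFrame(references) in one go, which avoids growing a DataFrame row by row.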

Thanks, it worked! May I ask an additional question, since I'm very new to this? At the bottom of the page there is a "Show more..." option. If it were a button class, I would use Selenium and call driver.findElement(By.cssSelector("input[value=\"Show more...\"]")).click(). But it isn't. It sits inside an element, which is inside another element, which is inside yet another element. How can I scrape the page so that "Show more..." is "clicked" automatically? Thanks in advance.

@joasa AFAIK you can't do that right now, because you are working with a static HTML page. You have to "press the button" some other way (e.g. with Selenium).

Ah, OK, I will try to find a way to do it with Selenium, although these nested element instances confuse me. Thanks for your help, cheers.