Python: scrape a table and navigate to its links to get more information


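I'm not sure whether what I want to do is even possible, but here goes. I'm trying to navigate through this table (simplified) and pull information from it.

What I need to do is navigate to each link in the table, like this one:

<a href="/show_customer/11111">Erin</a>

and get the customer's email address from this HTML form: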

<div class="field">
   <div class = "label">Email</div>
   <p>XXXX@email.com</p>
   </div>



and add it to the corresponding row in my CSV.

Any help would be greatly appreciated.

You have to make an HTTP request for each href found in the td cells. Here is one way to modify your existing code:

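# Python 3 import; on Python 2 use "from urllib2 import urlopen" instead.
from urllib.request import urlopen

from bs4 import BeautifulSoup

# table_1, columns and table come from your existing code.
for row in table_1.find_all('tr'):
    tds = row.find_all('td')
    # Collect the href of every link in the row. find_all() returns a
    # ResultSet, so .get('href') must be called on each <a> tag in turn.
    links = [a.get('href') for a in row.find_all('a', href=True)]
    try:
        data = [td.get_text() for td in tds]
        for field, value in zip(columns, data):
            print("{}: {}".format(field, value))
        # For every href make a request, get the page,
        # and create a BeautifulSoup object from it
        for link in links:
            link_soup = BeautifulSoup(urlopen(link), 'html.parser')

            # Use link_soup to get the email by navigating
            # the div and p, and add it to your data

        table.append(data)
    except Exception:
        print("Bad string value")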
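
The email lookup that the loop leaves as a comment could be sketched roughly as follows. This assumes the customer page contains the field markup shown in the question; extract_email and sample are hypothetical names used only for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup


def extract_email(page_html):
    """Return the text of the <p> that follows the 'Email' label, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Look at every label div and stop at the one whose text is "Email".
    for label in soup.find_all("div", class_="label"):
        if label.get_text(strip=True) == "Email":
            # The address sits in the <p> right after the label div.
            value = label.find_next_sibling("p")
            if value is not None:
                return value.get_text(strip=True)
    return None


# Assumed sample input: the field markup quoted in the question.
sample = """
<div class="field">
   <div class = "label">Email</div>
   <p>XXXX@email.com</p>
</div>
"""
print(extract_email(sample))
```

Inside the loop, something like data.append(extract_email(page_html)) would then place the address in the same row before it is appended to table. Returning None when no Email label is found leaves it to the caller to decide how to fill the CSV cell.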

Note that your href values are relative to the site's URL. So after extracting each href, you have to prepend the site's URL to it to form a valid URL.
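
One way to build the full URL is urljoin from the standard library. This is a minimal sketch assuming Python 3; BASE_URL and absolute_url are placeholder names, and http://example.com stands in for the real site address, which the question does not give.

```python
from urllib.parse import urljoin

# Hypothetical site address; substitute the real one.
BASE_URL = "http://example.com"


def absolute_url(href, base=BASE_URL):
    """Resolve a relative href such as /show_customer/11111 against the site URL."""
    return urljoin(base, href)


print(absolute_url("/show_customer/11111"))
```

urljoin leaves an href that is already absolute unchanged, so the same call is safe for both kinds of link.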

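Thanks for your help! The error I get is "AttributeError: 'ResultSet' object has no attribute 'get'" on the links line. Maybe I don't have the basics down for this yet. Any pointers?

It still raises the same error. Thanks for the effort, though. I really appreciate it!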
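
The AttributeError quoted in the comment comes from calling .get() on the result of find_all(), which is a list-like ResultSet rather than a single tag. A small illustration, using a made-up one-row table (the variable names are only for this sketch):

```python
from bs4 import BeautifulSoup

# Made-up row containing one link, modelled on the link in the question.
row = BeautifulSoup(
    '<tr><td><a href="/show_customer/11111">Erin</a></td></tr>',
    "html.parser",
)

anchors = row.find_all("a")  # ResultSet: a list of Tag objects

error_seen = False
try:
    anchors.get("href")  # .get() exists on a Tag, not on the ResultSet
except AttributeError:
    error_seen = True

# Calling .get() on each Tag in turn works.
hrefs = [a.get("href") for a in anchors]
print(error_seen, hrefs)
```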
