Scraping data with multiple values under the same class using Python BeautifulSoup

python web-scraping

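Hope all is well.

In the program below, if I use a for loop with findAll, I can get output for product_title and product_header; otherwise I only get the first value from the website.

For product_tableheader and product_tablevalues, I get "Tag Not Found". I have tried hard to find a solution to this. Please help me, guys.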

import requests
from bs4 import BeautifulSoup

class ProductTracker:
    def __init__(self, url):
        self.url = url
        self.user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'}
        self.responce = requests.get(url=self.url, headers= self.user_agent).text
        self.soup = BeautifulSoup(self.responce, 'lxml')

    def product_title(self):
        title = self.soup.find('h1', {'class': 'view-header__primary-heading'})
        if title is not None:
            return title.text
        return "Tag Not Found"

    def product_header(self):
        # for tabletitle in self.soup.findAll('h3', attrs={'class': 'search-table-view__heading'}).text:
        # find() returns only the first matching <h3>
        tabletitle = self.soup.find('h3', {'class': 'search-table-view__heading'})
        if tabletitle is not None:
            return tabletitle.text
        return "Tag Not Found"

    def product_tableheader(self):
        tableheader = self.soup.find('span', {'class': 'search-table-view__cell-title'})
        if tableheader is not None:
            return tableheader.text
        return "Tag Not Found"

    def product_tablevalues(self):
        tablevalues = self.soup.find_all('tr', class_=lambda value: value and value.startswith("search-table-view__web-parent-table-row"))
        values_lst = []
        for tablevalue in tablevalues:
            try:
                values_lst.append(tablevalue.td.text.strip())
            except AttributeError:
                # Row has no <td> child; skip it
                pass
        if values_lst:
            return values_lst
        return "Tag Not Found"

material = ProductTracker(url = "https://www.grainger.com/category/power-transmission/bearings/ball-bearings/radial-ball-bearings")
print(material.product_title())
print(material.product_header())
print(material.product_tableheader())
print(material.product_tablevalues())

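Use find_all instead of findAll to find all the headings. findAll is only the older alias of find_all; both return a list of tags, so calling .text on the result fails and you have to loop over the matches instead. Here is an example with product_header: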

    def product_header(self):
        # find_all() returns every matching <h3>; collect the text of each one
        tabletitles = self.soup.find_all('h3', {'class': 'search-table-view__heading'})
        table_titles_list = []
        for title in tabletitles:
            table_titles_list.append(title.text)
        if table_titles_list:
            return table_titles_list
        return "Tag Not Found"

Output for the headings:

['NTN Single Row Radial Ball Bearings, Metric Series', 'BL Single Row Radial Ball Bearings, Metric Series', 'BL Single Row Radial Ball Bearings, Inch Series', 'DAYTON Single Row Radial Ball Bearings, Metric Series', 'SKF Single Row Radial Ball Bearings, Metric Series', 'DAYTON Single Row Radial Ball Bearings, Inch Series', 'DAYTON Single Row Flanged Radial Ball Bearings, Inch Series', 'NTN Single Row Radial Ball Bearings, Inch Series', 'TIMKEN Single Row Radial Ball Bearings, Metric Series', 'DAYTON Single Row Flanged Radial Ball Bearings, Metric Series', 'MRC Single Row Radial Ball Bearings, Inch Series', 'SKF Double Row Radial Ball Bearings, Metric Series', 'SKF Single Row Radial Ball Bearings, Inch Series', 'MRC Single Row Radial Ball Bearings, Metric Series', 'FAG BEARINGS Double Row Radial Ball Bearings, Metric Series', 'SNR Single Row Radial Ball Bearings, Metric Series', 'RBC Single Row Radial Ball Bearings, Inch Series', 'FAG BEARINGS Single Row Radial Ball Bearings, Metric Series', 'TORRINGTON BEARINGS Single Row Radial Ball Bearings, Metric Series']
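To make the difference between find and find_all concrete, here is a minimal sketch that runs without any network access. The HTML string is a hypothetical stand-in for the page (only the class name is taken from the question), and html.parser is used instead of lxml so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two headings sharing one class.
SAMPLE_HTML = """
<div>
  <h3 class="search-table-view__heading">NTN Single Row</h3>
  <h3 class="search-table-view__heading">SKF Double Row</h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first matching tag.
first = soup.find("h3", class_="search-table-view__heading")
first_text = first.get_text(strip=True)

# find_all() returns every match; read the text of each tag in turn.
all_texts = [
    heading.get_text(strip=True)
    for heading in soup.find_all("h3", class_="search-table-view__heading")
]

print(first_text)
print(all_texts)
```

The list comprehension does the same job as the explicit for loop in product_header above.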

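The tables on the page are loaded dynamically, so the HTML that requests downloads does not contain them. You have to use selenium to capture the details from the tables.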
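Before switching to a browser, it can help to confirm that the markup really is missing from the raw response. The sketch below is one way to check this; has_class_prefix is a hypothetical helper, and both HTML strings are made-up stand-ins for "what requests returns" and "what the browser renders".

```python
from bs4 import BeautifulSoup


def has_class_prefix(html, tag, prefix):
    """Return True if any <tag> in html has a class starting with prefix."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all(
        tag,
        # The filter receives None for tags that have no class attribute.
        class_=lambda value: value is not None and value.startswith(prefix),
    )
    return len(matches) > 0


# Hypothetical raw response: the table shell exists but has no header cells.
static_html = "<table><tr><td>placeholder</td></tr></table>"

# Hypothetical rendered page: the header cell is present.
rendered_html = (
    '<table><tr><th class="search-table-view__cell-title">Bore Dia.</th></tr></table>'
)

in_static = has_class_prefix(static_html, "th", "search-table-view__cell")
in_rendered = has_class_prefix(rendered_html, "th", "search-table-view__cell")
print(in_static, in_rendered)
```

If the same check on the text returned by requests.get comes back False while the element is visible in the browser's inspector, the content is most likely added by JavaScript after the page loads.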

Change the __init__ function to:

    # Needs at the top of the file: import time; from selenium import webdriver
    def __init__(self, url):
        self.url = url
        self.user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'}
        #self.responce = requests.get(url=self.url, headers= self.user_agent).text
        driver = webdriver.Chrome()
        driver.get(url)
        time.sleep(5)  # crude wait for the JavaScript-rendered tables
        self.responce = driver.page_source
        driver.quit()  # end the browser session
        self.soup = BeautifulSoup(self.responce, 'lxml')


Change the product_tableheader function to:

    def product_tableheader(self):
        tableheaders = self.soup.find_all('th', class_=lambda value: value and value.startswith("search-table-view__cell"))
        header_lst = []
        for tableheader in tableheaders:
            try:
                # Header text nested as <div><a><span>
                header_lst.append(tableheader.div.a.span.text.strip())
            except AttributeError:
                try:
                    # Fall back to the text of the <div> itself
                    header_lst.append(tableheader.div.text.strip())
                except AttributeError:
                    pass
        if header_lst:
            return header_lst
        return "Tag Not Found"

Output:

['Bore Dia.', 'Outside Dia.', 'Width', 'Seal/Shield Type', 'Item #', 'Price', 'Bore Dia.', 'Outside Dia.', 'Width', 'Seal/Shield Type', 'Item #', 'Price', 'Bore Dia.', 'Outside Dia.', 'Width', 'Seal/Shield Type', 'Item #', 'Price', 'Bore Dia.', 'Outside Dia.', 'Width', 'Seal/Shield Type', 'Item #', 'Price']
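The flat list above repeats the same column names once per table. If it is more convenient to keep the headers of each table together, one option is to search inside each table element separately. This is only a sketch: headers_per_table is a hypothetical helper, and the HTML is a simplified stand-in that assumes each group of headers sits inside its own table tag.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup: two tables with partly different columns.
SAMPLE_HTML = """
<table><thead><tr>
  <th class="search-table-view__cell"><div>Bore Dia.</div></th>
  <th class="search-table-view__cell"><div>Width</div></th>
</tr></thead></table>
<table><thead><tr>
  <th class="search-table-view__cell"><div>Bore Dia.</div></th>
  <th class="search-table-view__cell"><div>Price</div></th>
</tr></thead></table>
"""


def headers_per_table(html):
    """Return one list of header texts per <table> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = []
    for table in soup.find_all("table"):
        # Calling find_all on a tag searches only that tag's descendants.
        cells = table.find_all(
            "th",
            class_=lambda value: value is not None
            and value.startswith("search-table-view__cell"),
        )
        grouped.append([cell.get_text(strip=True) for cell in cells])
    return grouped


grouped_headers = headers_per_table(SAMPLE_HTML)
print(grouped_headers)
```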

You can extract the table values in a similar way by changing the tag you search for and the list name. Happy coding!
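As a rough illustration of that last step, the sketch below collects every cell of every matching row instead of only the first td. The row class prefix comes from the question; table_values is a hypothetical helper, and the sample rows and their values are invented for the example.

```python
from bs4 import BeautifulSoup

# Invented sample rows; only the class name is taken from the question.
SAMPLE_HTML = """
<table>
  <tr class="search-table-view__web-parent-table-row">
    <td>10 mm</td><td>26 mm</td><td>8 mm</td>
  </tr>
  <tr class="search-table-view__web-parent-table-row">
    <td>12 mm</td><td>28 mm</td><td>8 mm</td>
  </tr>
</table>
"""


def table_values(html):
    """Return the cell texts of each matching row as a list of lists."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all(
        "tr",
        class_=lambda value: value is not None
        and value.startswith("search-table-view__web-parent-table-row"),
    )
    # row.find_all("td") yields every cell, not just the first one.
    return [[cell.get_text(strip=True) for cell in row.find_all("td")] for row in rows]


values = table_values(SAMPLE_HTML)
print(values)
```

On the real page the same function would have to be given the selenium page source, since the rows are not present in the plain requests response.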
