Python: problem scraping product dimensions from a WooCommerce online shop

python selenium web-scraping

For some reason, my web scraper doesn't pick up the product dimensions. HTML:

and

When the product has no dimensions, the output is:

dimensions: -

But when the product does have dimensions, the output is only:

dimensions:
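A likely cause of the empty `dimensions:` line (an assumption, since the question's code was not preserved): WooCommerce puts the dimensions in the "Additional information" tab, which is hidden until clicked, and Selenium's `WebElement.text` only returns rendered, visible text. A hidden cell therefore yields an empty string instead of raising an exception, so the `-` fallback never kicks in. The hypothetical helper `format_dimensions` below treats "element not found" (`None`) and "element found but blank" the same way, so blank results at least fall back to `-`.

```python
from typing import Optional


def format_dimensions(raw: Optional[str]) -> str:
    """Normalize scraped dimension text; return "-" when nothing usable was found.

    `raw` is None when the element was not found, and may be "" when the element
    exists but Selenium's `.text` returned nothing because it was hidden.
    """
    if raw is None:
        return "-"
    # str.split() with no argument splits on any whitespace, including
    # non-breaking spaces, so this also collapses newlines and repeated spaces
    cleaned = " ".join(raw.split())
    return cleaned or "-"


print("dimensions:", format_dimensions(""))  # prints "dimensions: -"
```

In Selenium, `element.get_attribute("textContent")` returns the text even while the element is hidden, so feeding that value into a helper like this is one alternative to clicking the tab first.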

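As far as I can see, you are using Selenium. Is there any reason not to use bs4 (Beautiful Soup) or another web-scraping module?

If you need to get past some kind of JavaScript challenge or similar, I strongly recommend that you:

1. Use Selenium to load the page.
2. Use the Beautiful Soup module to extract the information you need.

In my experience, whenever I need to do some web scraping for a personal project, I find Beautiful Soup easier to use, and it is well documented (and works fine together with Selenium).

Here is an example program that does what you need: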

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

options = Options()
# Use --headless in order to hide the browser window
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

# Get the page and obtain its source
driver.get("http://example.com/woocom")
source = driver.page_source
driver.quit()  # the HTML is now in `source`, so the browser can be closed

# Use BeautifulSoup to create an object that contains
# every element in the web page
page_object = BeautifulSoup(source, features="html.parser")

# If there is more than one td with the "product_dimensions" class,
# get all of them and loop over them to collect their text
dimensions = []
product_dimensions = page_object.find_all("td", class_="product_dimensions")
for element in product_dimensions:
    dimensions.append(element.get_text(strip=True))

# If there is only one td with the "product_dimensions" class, use "find"
# instead of "find_all"; it returns None when the element is missing
element = page_object.find("td", class_="product_dimensions")
product_dimensions = element.get_text(strip=True) if element else "-"
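The BeautifulSoup lookup above works on the page source, which typically contains the markup of every WooCommerce tab, including the hidden "Additional information" panel, so the parser does not care whether the tab was clicked. If you would rather avoid the bs4 dependency, the same "collect every td.product_dimensions" step can be sketched with the standard library's `html.parser`. `DimensionsParser`, `extract_dimensions` and the sample HTML are hypothetical stand-ins, not markup copied from the shop.

```python
from html.parser import HTMLParser
from typing import List


class DimensionsParser(HTMLParser):
    """Collect the text of every <td> whose class list contains "product_dimensions"."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &times; becomes ×
        self.values: List[str] = []
        self._inside = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "product_dimensions" in classes:
            self._inside = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._inside:
            # Join the collected text and collapse whitespace
            self.values.append(" ".join("".join(self._buf).split()))
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)


def extract_dimensions(html: str) -> List[str]:
    """Return the text of all td.product_dimensions cells (like find_all)."""
    parser = DimensionsParser()
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical markup: the tab panel is hidden with CSS, but it is still in the source
sample = """
<div id="tab-additional_information" style="display: none;">
  <table class="shop_attributes">
    <tr><th>Abmessungen</th><td class="product_dimensions">30 &times; 20 &times; 10 cm</td></tr>
  </table>
</div>
"""
print(extract_dimensions(sample))  # ['30 × 20 × 10 cm']
```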

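If you don't need to get past any JavaScript or anything similar, just replace

driver.get("http://example.com/woocom")
source = driver.page_source

with

source = requests.get("http://example.com/woocom").text

(Remember to import the requests library. You also no longer need the Selenium driver setup or driver.page_source, because requests.get() returns a Response object whose .text attribute already holds the page source.)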
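The requests-only route described above can be wrapped in a small function. This is a sketch, not the answerer's code: `fetch_html` is a hypothetical name, and the 10-second timeout is an arbitrary choice. It also shows the step that is easy to miss, namely that `requests.get()` hands back a `Response`, and the HTML string is its `.text`.

```python
def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download `url` with requests and return the decoded HTML body."""
    # Imported inside the function so this sketch can be loaded (and tested)
    # without requests installed; a normal script would import it at the top.
    import requests

    response = requests.get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # requests.get() returns a Response object; .text is the page source as str
    return response.text
```

Usage would be `source = fetch_html("http://example.com/woocom")`, after which `BeautifulSoup(source, features="html.parser")` works exactly as in the Selenium version.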

I hope this helps. When asking a question, though, please try to provide more information so that others can help you.

You need to click the "Additional information" ("Zusätzliche Informationen") tab to access the value of that element.

Using a CSS selector:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://designerparadies.de/produkt/schultertasche-trunk-aus-leder/'
d = webdriver.Chrome()
d.get(url)
# Open the "Additional information" tab so the dimensions cell becomes visible
d.find_element(By.CSS_SELECTOR, '[href*=additional_information]').click()
print(d.find_element(By.CSS_SELECTOR, '.product_dimensions').text)
d.quit()

Using XPath (only the tab click changes; By is imported from selenium.webdriver.common.by):

d.find_element(By.XPATH, "//*[contains(@class, 'additional_information_tab')]").click()

[Image: Additional information tab]

The "product_dimensions" class is only used once on the whole product page, so it is the first one.

That's strange. Can you check whether adding a wait helps? Also, on which line does this happen?

Added a line. Maybe I made a mistake implementing your code. This is what it looks like now. Oh, it doesn't show the line where it happens, but it never happened before, so it must be in the "size" part of the code:

size = ''
try:
    driver.find_element_by_xpath("//*[contains(@class, 'additional_information_tab')]").click()
    time.sleep(2)
    size = driver.find_element_by_xpath("//td[contains(@class, 'product_dimensions')]").text
except Exception as e:
    size = '-'

You don't use = with click; it's an event. What happens if you use my code exactly as it is?

I didn't use = on the click event; I use it on the next line to store the value. I just tested your code and it returns the dimensions. Could you show me how to implement it in my code? I accepted the answer, and thanks for all your help so far :)
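On the suggestion to add a wait instead of the fixed `time.sleep(2)`: Selenium ships `WebDriverWait` together with `expected_conditions` for exactly this. The stdlib-only helper below, `wait_until` (a hypothetical name, not a Selenium API), sketches the same polling idea so it can run without a browser: call a condition repeatedly until it returns a truthy value, or give up after a timeout.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.25,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # e.g. non-empty dimension text
        except ignored:
            pass  # e.g. the element is not in the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With Selenium, a call along the lines of `wait_until(lambda: d.find_element(By.CSS_SELECTOR, '.product_dimensions').text, ignored=(NoSuchElementException,))`, with `NoSuchElementException` from `selenium.common.exceptions`, would return the dimensions as soon as the cell has visible text, rather than always sleeping two seconds and hoping that was enough.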