Python: scraped text split across lines by <br> tags


Hi everyone, I'm trying to scrape a page and the data on it, but I can't get the expected data on a single line: there are br tags in the text, so the data moves onto a new line. Any help would be appreciated.

Here is my code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time
import random
import re

driver = webdriver.Chrome(r"C:\Users\Acer\AppData\Local\Programs\Python\Python36-32\chromedriver.exe")
with open(r"E:\office\Zoro\Zoro11.txt" ,'r')as f:
    content = f.readlines()
    content = [x.strip() for x in content]
    currentIndex = 0
    with open(r"E:\office\Zoro\Zoro16.csv", 'a')as f1:
        f1.write("Product Url" + "," + "Main category"+"," + "Sub category"+"," + "Leaf category"+","+"Title" + "," + "Productid"+","+"Manufacturer Name" + "," + "Manufacturer's number"+","+"reviews"+","+"Price"+","+"smalldesc"+","+"Product Dimensions"+","+"Main Image"+","+"Sub Image")
        f1.write("\n")
        for link in content[currentIndex:]:
            driver.get(link)
            time.sleep(5)
            aj = driver.find_element_by_class_name('zcl-breadcrumb__list').text.replace("Home\n","").replace("\n",">").replace(",","")
            try:
                title = driver.find_element_by_xpath("//*[@data-za='product-name']").text.replace(",","").replace("\n","")
            except:
                title = "No title"
            try:
                brand = driver.find_element_by_xpath("//*[@data-za='product-brand-name']").text.replace(",","")
            except:
                brand = "No Brand"
            try:
                productid = driver.find_element_by_xpath("//*[@data-za='PDPZoroNo']").text.replace(",", "")
            except:
                productid = "No productid"

            maincategory = aj[:aj.find(">")]
            Leafcategory = aj[aj.rindex('>')+1:]
            Subcategory = aj[(aj.find(">"))+1:aj.rindex('>')]
            mfrnu = driver.find_element_by_xpath("//*[@data-za='PDPMfrNo']").text.replace(",", "")
            try:
                mainimage = driver.find_element_by_xpath('//*[@id="app"]/div[3]/div[2]/div/div/main/div[1]/div[3]/div/div[1]/div/div/div/div/div/img').get_attribute('src')
            except:
                mainimage = "No mainimage"
            try:
                price = driver.find_element_by_class_name('product-price__price').text
            except:
                price = "No price"
            try:
                smalldesc = driver.find_element_by_class_name('product-attributes').text.replace("\n",";").replace(",","")
            except:
                smalldesc = "No desc"
            try:
                specification = driver.find_element_by_css_selector('.product-specifications__table.table.table-striped').text.replace("\n",";").replace(",","").strip()
                print(specification)

            except:
                specification = "No specs"
            try:
                productdesc = driver.find_element_by_css_selector('.product-description__text').text.replace(",","")
                
            except:
                productdesc = "No productfields"
            f1.write(link + "," + maincategory + "," + Subcategory + ","+ Leafcategory + ","+title + "," + productid + "," + brand + "," + mfrnu + "," + price +"," + smalldesc +"," + specification +"," + productdesc +"," + mainimage +"\n")
In the code above, for productdesc I get the following output:

Pro Series Swivel Head LED Work Light
  280 lumens
Beam Distance 54 meters
Run time 3 hrs
The expected output is all on one line, separated by spaces.
The link I'm scraping is

It looks like you only need to update the line that reads the description. Add the newline replacement the same way you do for the other fields:

Change this:

productdesc = driver.find_element_by_css_selector('.product-description__text').text.replace(",","")
to this:

productdesc = driver.find_element_by_css_selector('.product-description__text').text.replace("\n",";").replace(",","")
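A minimal sketch of why this works, with no browser needed (the sample string mimics the output shown in the question): Selenium's .text renders each br tag as a newline, so the scraped field arrives as a multi-line string, and chaining replace folds it onto one line. If you literally want the lines joined with spaces, as the question asks, split/join does that too:

```python
# Multi-line text as Selenium's .text would return it for <br>-separated HTML
raw = ("Pro Series Swivel Head LED Work Light\n"
       "280 lumens\n"
       "Beam Distance 54 meters\n"
       "Run time 3 hrs")

# The fix above: fold newlines into ";" and strip commas (CSV-safe)
flattened = raw.replace("\n", ";").replace(",", "")
print(flattened)

# Alternative, if the goal is one line separated by spaces
spaced = " ".join(part.strip() for part in raw.splitlines())
print(spaced)
```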

Please provide sample data from Zoro11.txt so that I can test it. Thanks.

@Mike67 Sample data from zoro11.txt is
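A side note on the question's code: the many .replace(",", "") calls are only needed because the CSV line is assembled by hand with "+". Python's standard csv module quotes fields that contain commas or newlines, so the raw text could be kept as-is. A minimal sketch with made-up field values:

```python
import csv
import io

# Hypothetical scraped values containing a comma and a newline
title = "Pro Series Swivel Head LED Work Light"
smalldesc = "280 lumens, Beam Distance 54 meters\nRun time 3 hrs"

buf = io.StringIO()  # stands in for the open CSV file
writer = csv.writer(buf)
writer.writerow(["Title", "smalldesc"])
writer.writerow([title, smalldesc])  # quoting is handled automatically

print(buf.getvalue())
```

Reading the buffer back with csv.reader recovers the original values unchanged, commas and newlines included.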