Python scraping: scraping into a CSV file produces a disorganized CSV file

python web-scraping scrapy

I implemented the following code in my spider to scrape shoes from an e-commerce website:

    import scrapy


    class HugobossSpider(scrapy.Spider):
        name = 'hugoboss'
        # allowed_domains expects bare domain names, not URLs with paths
        allowed_domains = ['hugoboss.com']
        start_urls = ['http://hugoboss.com/de/boss-herren-neuheiten-schuhe//']

        def parse(self, response):
            # Extract the content using XPath and CSS selectors
            url = response.xpath('//div/@data-mouseoverimage').extract()
            product_title = response.xpath('//*[@class="product-tile__productInfoWrapper product-tile__productInfoWrapper--is-small font__subline"]/text()').extract()
            price = response.css('.product-tile__offer .price-sales::text').getall()
            # Combine the extracted content row-wise
            for item in zip(url, product_title, price):
                # Create a dictionary to store the scraped info
                scraped_info = {
                    'url': item[0],
                    'product_title': item[1],
                    'price': item[2],
                }
                # Hand the scraped info over to Scrapy
                yield scraped_info

In the shell, the output comes back as expected.

However, the output CSV file comes out disorganized.

I don't know where the problem is.

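From the looks of it, your scraper has picked up a bunch of newline characters (\n) along with the product names.

It also seems to have picked up the word "von", which I assume is not needed.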

My suggestion is to do some string manipulation to get rid of them:

    product_title.replace("\n", "").replace("von", "")
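Note that in the spider above, product_title is a list returned by .extract(), so the replacement has to be applied to each element and not to the list itself. A minimal sketch of that, assuming the titles look roughly like the sample strings below; the sample values and the helper name clean_title are hypothetical, and the whitespace collapsing is an extra step not mentioned in the answer:

```python
def clean_title(raw):
    """Remove newline characters and the stray word 'von' from one title."""
    cleaned = raw.replace("\n", "").replace("von", "")
    # Extra step (an assumption, not part of the answer): collapse any
    # leftover runs of whitespace into single spaces and trim the ends.
    return " ".join(cleaned.split())


# Hypothetical sample of what .extract() might return for the titles
product_title = ["\nSneakers von BOSS\n", "\n\nLoafers\n"]

# Apply the cleanup to every element of the list
cleaned_titles = [clean_title(title) for title in product_title]
print(cleaned_titles)
```

Inside parse, the same helper could be called on item[1] before building the scraped_info dictionary.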

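The reason .replace(x, y) is preferable to .strip()/.lstrip()/.rstrip() is that the strip methods treat their argument as a set of characters and remove any of those characters from the ends of the string, so they may remove necessary characters from the product name.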
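The difference can be seen in a small sketch. The product name used here is made up for illustration and does not come from the site:

```python
# Hypothetical title: the name itself starts with the letters n, o, v
title = "novo sneaker von"

# strip("von") removes any of the characters 'v', 'o', 'n' from both ends,
# so it also eats the leading "novo" of the product name.
stripped = title.strip("von")

# replace("von", "") removes only the exact substring "von".
replaced = title.replace("von", "")

print(repr(stripped))
print(repr(replaced))
```

Here strip damages the name even though "novo" does not contain the word "von", while replace leaves it intact. replace would still remove "von" if it occurred inside a longer word, so it is worth checking the cleaned titles against a few real rows.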

Hope this helps.

Thank you very much, it worked. The output file is now organized.