Python Scrapy: getting text that lies outside a span

python html css scrapy

URL:

I am trying to scrape the following from the URL:

I tried:

for i in response.css('span[class = dark_text]') :
    i.xpath('/following-sibling::text()')

Alternatively, these are the XPaths I have now. They are not working, or am I missing something?

aired_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[11]/text()')

producer_xpath = response.xpath("//*[@id='content']/table/tbody/tr/td[1]/div/div[12]/span/a/@href/text()")
licensor_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[13]/a/text()')
studio_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[14]/a/@href/title/text()')
studio_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[17]/text()')
str_rating_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[18]/text()')
ranked_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[20]/span/text()')
japanese_title_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[7]/text()')
source_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[15]/text()')
genre_xpath = [response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[16]/a[{0}]'.format(i)) for i in range(1,4)]
genre_xpath_v2 = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[16]/a/@href/text()')
number_of_users_rated_anime_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[19]/span[3]/text()')
popularity_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[21]/span/text()')
members_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[22]/span/text()')
favorite_xpath =  response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[23]/span/text()')

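However, I found that some of the text is not inside the span element, so I want a CSS/XPath expression that gets the text sitting outside the span.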

If you only want to scrape the information shown in the image, you can use this directly:

response.xpath('//div[@class="space-it"]//text()').extract()

Or perhaps I have not understood your question correctly.

It is simpler to just loop over the divs in the table:

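from scrapy.selector import Selector

# htmlString is assumed to hold the page HTML (e.g. response.text in a spider).
# Note there is no tbody in the path: browsers add it in their dev tools, but it
# is usually absent from the raw HTML that Scrapy receives.
foundH2 = False
nodes = Selector(text=htmlString).xpath('//*[@id="content"]/table/tr/td[1]/div/*')

for resp in nodes:
    # name() returns the tag name of the current element, e.g. 'h2' or 'div'
    tagName = resp.xpath('name()').extract_first()
    if tagName == 'h2':
        foundH2 = True
    if foundH2:
        # start collecting 'info' once <h2>Alternative Titles</h2> has been found
        info = None
        if tagName == 'div':
            for item in resp.xpath('.//text()').extract():
                # stop as soon as the googletag script text is reached
                if 'googletag.' in item:
                    break
                item = item.strip()
                if item and item != ',':
                    info = info + " " + item if info else item
            if info:
                print(info)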

In my opinion, BeautifulSoup is faster and better than Scrapy.

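Hi. Could you write a paragraph or so to explain your problem better, and say which language you want to use? Do you have an agreement with that website to scrape its content?

I am using Python and the Scrapy framework.

The syntax below returns an empty list.

Did you change the class name? The class name is actually spaceit.

For better results you can try response.xpath('//div[@class="js-scrollfix-bottom"]//div[@class="spaceit"]'), except that it will not return your alternative names and type.

Thanks, but what are name and googletag? Could you explain your code?

It loops over the div elements under content; the googletag text comes after Favorites:27, so the loop stops once it is found.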