Python: filtering specific comments on a website


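I want to filter the complaints on a website and keep only those that contain a specific word, but I cannot search for words with non-ASCII characters. I am using Python 2.7 and BeautifulSoup. Any idea why this happens?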

You should change the if statement to test whether your word is in the text of the p tag.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
#import re
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})


for complaint in complaints:
   if complaint.text.find("genç") is not -1:
      print complaint.text
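
A side note on the condition in the loop above: `str.find` returns an index, or -1 when there is no match, and `is not` compares object identity rather than value, so `find(...) is not -1` only appears to work because of how CPython happens to cache small integers. A minimal sketch of the two safer spellings follows; the helper names and the sample sentence are made up for illustration, and the code is written for Python 3.

```python
# Two equivalent ways to test for a substring without relying on `is`.

def contains_with_find(text, word):
    # str.find returns the index of the first match, or -1 if there is none,
    # so compare by value with != instead of by identity with `is not`.
    return text.find(word) != -1


def contains_with_in(text, word):
    # The `in` operator expresses the same test directly.
    return word in text


# Hypothetical sample text, not fetched from the site.
sample = "Çocuk ve gençler için"

print(contains_with_find(sample, "genç"))
print(contains_with_in(sample, "genç"))
```

Both helpers return a plain bool, so either can replace the condition in the loop; `in` is the more idiomatic choice.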

Don't use Python 2. Support for it is being dropped (it reached end of life on 1 January 2020).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request('https://www.sikayetvar.com/onedio', 
None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
soup = BeautifulSoup(html)

complaints = soup.findAll('p', attrs = {'class' : 'complaint-summary'})

for complaint in complaints:
    if b"genç".decode("utf-8") in complaint.text:
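        print(complaint.text)

The output will be the single complaint quoted at the bottom of this page. The same filter in Python 3, using requests and bs4: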

import requests
from bs4 import BeautifulSoup 

response = requests.get('https://www.sikayetvar.com/onedio',headers = {'User-Agent': 'Mozilla/5.0'})

soup = BeautifulSoup(response.content,'lxml')

complaints = soup.select('p.complaint-summary')
for complaint in complaints:
    if "genç" in complaint.text:
        print(complaint.text.strip())
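
To check the filtering logic without hitting the network, the same selector can be run against an inline HTML string. This is only a sketch: the `SAMPLE_HTML` markup and the `filter_complaints` helper are invented for illustration and are not the real page structure beyond the `p.complaint-summary` selector used above. It uses the standard-library `html.parser` backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div>
  <p class="complaint-summary">Çocuk ve gençler için sakıncalı.</p>
  <p class="complaint-summary">Kargo geç geldi.</p>
  <p class="other">genç</p>
</div>
"""


def filter_complaints(html, word):
    """Return the text of every p.complaint-summary element containing word."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        p.get_text(strip=True)
        for p in soup.select("p.complaint-summary")
        if word in p.get_text()
    ]


matches = filter_complaints(SAMPLE_HTML, "genç")
for text in matches:
    print(text)
```

In Python 3 every string literal is already Unicode, so `"genç"` can be compared with the tag text directly and no explicit decode step is needed.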

Can you provide the HTML tag you are looking for?

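1) Wouldn't it be better to search in complaint.text? 2) Try printing the text of every complaint item and check what it contains.

I get an empty list of complaints, so I can't test it. Can you provide the full link?

complaint.find will try to find a tag. You seem to be trying to use it to find text. I suggest you look at the methods of the Python string type and use the appropriate one on complaint.text.

I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128). How do I search for non-ASCII characters?

I have edited the answer. It should return one complaint.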

Ne yazık ki bir sosyal sitede ahlak dışı içerikli haberler durulmuyor. Çocuk ve gençler için sakıncalı olduğunu düşünüyorum. Fotoğraflarda saçma başlıkları görebilirsiniz. Başlıklardan anlaşılacağı üzere cinsel…
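
The UnicodeDecodeError quoted in the comments comes from mixing types in Python 2: a plain string literal such as "genç" is a byte string (UTF-8 bytes, given the coding declaration), while complaint.text is Unicode, so Python 2 implicitly decodes the bytes as ASCII and fails on the first byte above 127. The sketch below reproduces that decode step explicitly in Python 3; the `contains` helper and the sample sentence are assumptions made for illustration.

```python
# What a Python 2 `str` literal "genç" holds in a UTF-8 source file: raw bytes.
needle_bytes = "genç".encode("utf-8")

# Hypothetical sample text, standing in for complaint.text (Unicode).
haystack = "Çocuk ve gençler için"

# Python 2 performs this ASCII decode implicitly when bytes meet Unicode.
try:
    needle_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False


def contains(haystack_text, needle):
    # Decode byte strings with the encoding they were written in before
    # comparing, so both operands are Unicode text.
    if isinstance(needle, bytes):
        needle = needle.decode("utf-8")
    return needle in haystack_text


print(ascii_ok)
print(contains(haystack, needle_bytes))
```

This is the step that `b"genç".decode("utf-8")` performs in the Python 2 answer above; writing the literal as `u"genç"` would be another way to get a Unicode string there.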