Getting the text of a specific HTML <h1> in Python

html python-3.x

I only want to get the <h1> heading, which is "This is the title", in Python.
I tried a few approaches but did not get the result I wanted.
import requests

from bs4 import BeautifulSoup


response = requests.get("https://www.strawpoll.me/20321563/r")

html_content = response.content

soup = BeautifulSoup(html_content, "html.parser")

for i in soup.get_text("p", {"class": "result-list"}):
    print(i)

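Use lxml for this kind of task. You can also use BeautifulSoup.
import lxml.html
import requests

url = "https://www.strawpoll.me/20321563/r"
# Download with requests first: lxml's own loader may not handle https:// URLs
t = lxml.html.fromstring(requests.get(url).content)
print(t.find(".//title").text)

(from Peter Hoffmann)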
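The snippet above prints the page's <title>. If the <h1> itself is wanted, an XPath query is one option. Here is a minimal sketch on inline HTML. The markup (a div with id "result-list" wrapping the h1) is assumed from another answer in this thread and may not match the live page.

```python
import lxml.html

# Assumed stand-in for the downloaded page; the live markup may differ.
SAMPLE_HTML = (
    "<html><head><title>This is the title - Example Site</title></head>"
    '<body><div id="result-list"><h1>This is the title</h1></div></body></html>'
)

doc = lxml.html.fromstring(SAMPLE_HTML)
# xpath() always returns a list, so an empty result is easy to check for.
headings = doc.xpath('//div[@id="result-list"]/h1')
h1_text = headings[0].text_content().strip() if headings else None

print(h1_text)
```

text_content() returns only the text inside the element, without the surrounding tags.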
I added the following to my code:
title = soup.title
print(title.string[:-24])  # The last 24 characters of the title are always the same, so drop them.
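Slicing by a fixed count breaks silently if the constant part of the title ever changes length. One possible alternative is to strip the suffix by value. This is only a sketch: the suffix string and page title below are made up for illustration, and strip_suffix is a hypothetical helper, not something from the page or from BeautifulSoup.

```python
def strip_suffix(title: str, suffix: str) -> str:
    """Return title without suffix if it ends with it, otherwise unchanged."""
    if suffix and title.endswith(suffix):
        return title[: -len(suffix)]
    return title


# Hypothetical values for illustration; the real title and suffix may differ.
SITE_SUFFIX = " - Example Poll Site"
page_title = "This is the title" + SITE_SUFFIX

clean_title = strip_suffix(page_title, SITE_SUFFIX)
print(clean_title)
```

On Python 3.9 or newer, the built-in str.removesuffix does the same job.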

If you still cannot get the desired result, try this approach:
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'https://www.strawpoll.me/20321563/r'
# Download the page; the context manager closes the connection afterwards
with urlopen(my_url) as client:
    page_html = client.read()
page_soup = BeautifulSoup(page_html, "html.parser")
# Find the <div id="result-list"> container, then every <h1> inside it
result_div = page_soup.find("div", id="result-list")
title = result_div.find_all("h1")

print(title)

Output: [<h1>This is the title</h1>]
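The code above prints a list of tag objects rather than plain text. To get just the text, something along these lines might work. The inline HTML is an assumed stand-in for the real page, so the snippet runs without a network connection.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div id="result-list">
  <h1>This is the title</h1>
  <p>Some other content</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", id="result-list")
# find() returns None when nothing matches, so guard before chaining.
heading = container.find("h1") if container is not None else None
heading_text = heading.get_text(strip=True) if heading is not None else None

print(heading_text)
```

get_text(strip=True) drops the tags and the surrounding whitespace, leaving only the heading text.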
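You can use BeautifulSoup as follows:
from bs4 import BeautifulSoup

data = "html as text(Source)"  # placeholder: put the page's HTML source here

soup = BeautifulSoup(data, "html.parser")

# 'titleClass' is an example class name; use the class of the target <h1>
p = soup.find('h1', attrs={'class': 'titleClass'})
p.a.extract()  # remove the nested <a> so only the heading text remains
print(p.text.strip())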
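To see what extract() does here, a small self-contained sketch may help. The HTML string is invented for illustration (an h1 with a nested link), and the None checks are an addition, since find() and .a both return None when there is no match.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration: an <h1> that wraps a link plus the title.
SAMPLE_HTML = '<h1 class="titleClass"><a href="#top">link</a> This is the title </h1>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
h1 = soup.find("h1", class_="titleClass")

if h1 is not None and h1.a is not None:
    h1.a.extract()  # detach the <a> element (and its text) from the tree

title_text = h1.get_text(strip=True) if h1 is not None else None
print(title_text)
```

Without the extract() call, the link text would be included in the result as well.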

Actually, I only need "This is the title".