
Scraping text data with Python


I'm new to Python and I'm trying to use BeautifulSoup to scrape some text reviews from a website. Part of the HTML structure looks like this:

<div style="1st level">
    <div style="2nd level">Here is text 1</div>
    <div style="2nd level">Here is text 2</div>
    <div style="2nd level">Here is text 3</div>
    <div style="2nd level">Here is text 4</div>
    Here is text 5 and this is the part I want to get.
<div>

But then I got everything from text 1 through text 5. Is there a simple way to target the 1st level and get only text 5?

I'm not sure these approaches are the best, but give them a try:

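from collections import deque

from bs4 import BeautifulSoup

# Sample markup copied from the question, including its unclosed final <div>
html_doc = """<div style="1st level">
    <div style="2nd level">Here is text 1</div>
    <div style="2nd level">Here is text 2</div>
    <div style="2nd level">Here is text 3</div>
    <div style="2nd level">Here is text 4</div>
    Here is text 5 and this is the part I want to get.
<div>"""

# Name the parser explicitly so the parse tree does not depend on what is installed
web_soup = BeautifulSoup(html_doc, "html.parser")
reviews = web_soup.find('div', style="1st level")

# The trailing <div> is parsed as an empty child, so the text node sits at index -2
print(reviews.contents[-2].strip())
# Alternatively, keep only the last string that the .strings generator yields
print(deque(reviews.strings, maxlen=1).pop().strip())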
FYI, I used a deque to take the last element from the strings generator.
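
The deque trick mentioned above works for any iterable, not only for BeautifulSoup's strings generator. Here is a minimal sketch of the idea in isolation; the helper name last_item is made up for illustration and is not part of any library:

```python
from collections import deque


def last_item(iterable):
    """Return the final item of an iterable, or None if it is empty.

    last_item is a hypothetical helper name used only for this sketch.
    """
    # With maxlen=1 the deque discards every item except the most recent one,
    # so the whole iterable is consumed while holding a single element.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None


words = (w.upper() for w in ["a", "b", "c"])
print(last_item(words))
```

Unlike list(iterable)[-1], this never holds more than one element at a time, which matters mostly for long generators.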

Also, FYI, lxml + XPath can do this more easily by using text().
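
The answer does not show the lxml version, so the following is only a sketch of what it might look like. It assumes lxml is installed and that the last line of the question's snippet was meant to be a closing </div>; the variable names are illustrative.

```python
from lxml import html

# Assumption: same markup as the question, with the final tag closed properly
snippet = """<div style="1st level">
    <div style="2nd level">Here is text 1</div>
    <div style="2nd level">Here is text 2</div>
    <div style="2nd level">Here is text 3</div>
    <div style="2nd level">Here is text 4</div>
    Here is text 5 and this is the part I want to get.
</div>"""

root = html.fromstring(snippet)

# text() selects only the text nodes that are direct children of the matched
# div, so text inside the nested "2nd level" divs is never returned.
texts = root.xpath('//div[@style="1st level"]/text()')

# The whitespace between the child divs also counts as text nodes; drop it.
direct_text = [t.strip() for t in texts if t.strip()]
print(direct_text)
```

Because the XPath expression addresses direct text children, it does not depend on where the wanted text sits among the child elements.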


Hope this helps.

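We will almost certainly need the HTML source of the site in question to help you with this. I went to fix the formatting of your question and found that it does contain some HTML source (though not enough). Please edit your post to insert the actual HTML source and make sure it is formatted correctly. Are you sure those style= attributes shouldn't be class=?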
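
If the real page does turn out to use class= rather than style=, the lookup changes only slightly. The sketch below is hypothetical: the class names level-1 and level-2 are invented for illustration, since the actual page source is not shown in the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's layout, but with class= attributes
snippet = """<div class="level-1">
    <div class="level-2">Here is text 1</div>
    <div class="level-2">Here is text 2</div>
    Here is text 5 and this is the part I want to get.
</div>"""

page = BeautifulSoup(snippet, "html.parser")

# class is a reserved word in Python, so BeautifulSoup spells the filter class_
outer = page.find("div", class_="level-1")

# recursive=False restricts the search to direct children, which skips the
# text inside the nested divs; whitespace-only strings are filtered out.
own_text = [s.strip() for s in outer.find_all(string=True, recursive=False) if s.strip()]
print(own_text)
```

Searching for direct-child strings also avoids relying on a fixed index such as contents[-2], which changes if the markup around the text changes.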