Python: getting a specific item from a bs4.element

Tags: python, web-scraping, beautifulsoup

I have an element of type bs4.element.Tag:

<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>

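I need to get "1003 : 11400" out of this element. How can I do that?

Thanks, everyone.

Edit:

If I have multiple divs, how do I select the individual values ("1003 : 11400", ...)?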

    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,

This should help:

div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())
Full code:

from bs4 import BeautifulSoup

html = '''
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html,'html5lib')

div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())
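
Note that find_next(text=True) returns the first text node that follows the opening tag, wherever it sits, so it would return the span's text if the div had no leading text of its own. A minimal sketch of an alternative that keeps only the div's direct text nodes, assuming the same markup as above and the built-in html.parser (the variable name own_text is just illustrative):

```python
from bs4 import BeautifulSoup

html = '''
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="table_v_nr")

# recursive=False restricts the search to direct children of the div,
# so the text inside the nested <span> is never considered.
own_text = "".join(div.find_all(string=True, recursive=False)).strip()
print(own_text)
```

The string= argument is the current name for what older Beautiful Soup code passes as text=.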
Edit:

If you want to extract the text from multiple <div> tags, you can try the following:

from bs4 import BeautifulSoup

html = """
    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,
"""
soup = BeautifulSoup(html,'html5lib')

# Print the leading text of every matching div
for div in soup.find_all('div', class_ = "table_v_nr"):
    print(div.find_next(text=True).strip())
Edit: you can also use a CSS selector:

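from bs4 import BeautifulSoup

# html is the markup string from the question above
soup = BeautifulSoup(html, 'html.parser')

# Select elements with class table_v_nr whose text contains '1003'.
# Recent soupsieve releases spell this pseudo-class :-soup-contains;
# older ones only accept the deprecated :contains.
for tag in soup.select(".table_v_nr:-soup-contains('1003')"):
    print(tag.next.strip())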
Output:

1003 : 11400
1003 : 11400
1003 : 11400
1007 : 11550
1003 : 11400
1003 : 11400
1003 : 11400

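Have you tried _element.text? Get the element via xpath, then just call element.text and it will work.

Thank you very much. Can I select the values individually? I need to either (1) add all of these numbers (1000 : 10000, ...) to a matrix (a pandas DataFrame), or (2) select only the numbers of interest (the first, the third, ...).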
1003 : 11400
1003 : 11400
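
For the follow-up in the comments (collecting every value, or picking only some of them), one possible approach is to parse each div's leading text into a pair of integers and keep the pairs in a list. This is only a sketch: it assumes every div holds exactly one "number : number" pair, and the names rows and selected are illustrative, not from the thread.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for div in soup.find_all("div", class_="table_v_nr"):
    # First text node that is a direct child of the div, e.g. "1003 : 11400"
    text = div.find(string=True, recursive=False).strip()
    # Assumption: the text is always "<int> : <int>"
    left, right = (int(part) for part in text.split(":"))
    rows.append((left, right))

print(rows)

# Pick only the entries of interest by position, here the first and third
selected = rows[::2]
print(selected)
```

A list of tuples like rows can be passed directly to pandas.DataFrame together with a columns argument if a DataFrame is needed.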