Python: Getting a specific item from a bs4.element


python web-scraping


I have an element of type bs4.element.Tag:

<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>



I need to get "1003 : 11400" from this element. How can I do that?

Thanks, everyone.

Edit:

If I have multiple divs, how do I select the individual items ("1003 : 11400", …)?

    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,



This should help you:

div = soup.find('div', class_='table_v_nr')
print(div.find_next(string=True).strip())

Full code:

from bs4 import BeautifulSoup

html = '''
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
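soup = BeautifulSoup(html, 'html5lib')  # html5lib must be installed separately

div = soup.find('div', class_='table_v_nr')
# find_next(string=True) returns the first text node after the <div> tag opens,
# i.e. the number text that precedes the <span>
print(div.find_next(string=True).strip())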
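A note on why this works, plus a slightly stricter variant: `get_text()` (or `.text`) concatenates every descendant string, so it would also return the span's `Y 35id`. Restricting the search to the div's direct children with `find(string=True, recursive=False)` returns only the number. This is a minimal sketch that swaps in the built-in `html.parser` instead of `html5lib` so no extra package is needed.

```python
from bs4 import BeautifulSoup

html = '''
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="table_v_nr")

# get_text() joins every descendant string, so the <span> text is included too
print(" ".join(div.get_text().split()))  # 1003 : 11400 Y 35id

# recursive=False limits the search to the div's direct children,
# so the first match is the number text and the <span> content is ignored
number = div.find(string=True, recursive=False).strip()
print(number)  # 1003 : 11400
```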

Edit:

If you want to extract the text from multiple div tags, you can try the following:

from bs4 import BeautifulSoup

html = """
    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,
"""
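soup = BeautifulSoup(html, 'html5lib')  # html5lib must be installed separately

# Print the number text of every matching div (a plain loop instead of
# a list comprehension used only for its side effects)
for div in soup.find_all('div', class_='table_v_nr'):
    print(div.find_next(string=True).strip())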
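To pick individual items rather than print them all, one option is to collect the strings into a list and index it. A minimal sketch, assuming the three-div markup from the question and the built-in `html.parser`; it reads only each div's own text via `find(string=True, recursive=False)`, and the integer split assumes every value has the form `left : right`.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
    1007 : 11550
    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# One string per div, in document order
values = [
    div.find(string=True, recursive=False).strip()
    for div in soup.find_all("div", class_="table_v_nr")
]
print(values)  # ['1003 : 11400', '1003 : 11400', '1007 : 11550']

# Pick individual items by position (0 = first, 2 = third)
first, third = values[0], values[2]
print(first, "|", third)  # 1003 : 11400 | 1007 : 11550

# Split "1003 : 11400" into a pair of integers (int() ignores surrounding spaces)
pairs = [tuple(int(part) for part in v.split(":")) for v in values]
print(pairs)  # [(1003, 11400), (1003, 11400), (1007, 11550)]
```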

Use:

Edit: you can use a CSS selector:

from bs4 import BeautifulSoup

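# `html` is the multi-div markup from the question
soup = BeautifulSoup(html, 'html.parser')

# :-soup-contains() (the current name of the deprecated :contains) matches
# divs whose text contains '1003'
for tag in soup.select(".table_v_nr:-soup-contains('1003')"):
    # tag.next is the first node inside the div: the number text
    print(tag.next.strip())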

Output:

1003 : 11400

1003 : 11400
1003 : 11400
1007 : 11550

1003 : 11400

1003 : 11400
1003 : 11400
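A selector like `:-soup-contains('1003')` checks the whole text of each div, including the span, so both `1003 : 11400` rows match and nothing in the output tells them apart. A minimal sketch, assuming the three-div markup from the question, that keeps each number next to its span label using `select()` and `select_one()`; keying the result by the span text is an illustrative choice, not something from the original answers.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
    1007 : 11550
    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each span's label (e.g. "Y 35id") to the number text of its parent div
rows = {}
for div in soup.select("div.table_v_nr"):
    label = div.select_one("span.table_v_time").get_text(strip=True)
    rows[label] = div.find(string=True, recursive=False).strip()

print(rows)
```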

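Have you tried element.text? Get the element via xpath, then just do element.text and it will work.

Thank you very much. Can I select them individually? I need to either (1) add all these numbers (1000:10000, …) to a matrix (a pandas DataFrame), or (2) select only the ones of interest (first, third, …).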
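For the follow-up (loading all values into a pandas DataFrame, or picking only some of them), one possible sketch is below. It assumes pandas is installed, the three-div markup from the question, and that every value has the form `left : right`; the column names `left`, `right` and `label` are hypothetical choices for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1003 : 11400
    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
    1007 : 11550
    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

records = []
for div in soup.find_all("div", class_="table_v_nr"):
    # Own text of the div only, e.g. "\n    1003 : 11400\n    "
    left, right = div.find(string=True, recursive=False).split(":")
    records.append({
        "left": int(left),    # int() tolerates the surrounding whitespace
        "right": int(right),
        "label": div.span.get_text(strip=True),
    })

df = pd.DataFrame(records)
print(df)

# Keep only the rows of interest by position (first and third)
print(df.iloc[[0, 2]])
```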
