Python: Getting multiple tags and attribute data with BeautifulSoup

python html parsing

I want to use BeautifulSoup to get multiple tags and attributes from the following HTML:

1) div id="home_1039509"

2) div id="guest_1039509"

3) id="odds_3_1039509"

4) id="gs_1039509"

5) id="hs_1039509"

6) id="time_1039509"

HTML:

You can pass a function as the id filter and check whether the id starts with home, guest, and so on:

from bs4 import BeautifulSoup

# True when the id starts with one of the wanted prefixes; x is None for tags without an id
f = lambda x: x and x.startswith(('home_', 'guest_', 'odds_', 'gs_', 'hs_', 'time_'))

soup = BeautifulSoup(open('test.html'), 'html.parser')
print([element.get_text(strip=True) for element in soup.find_all(id=f)])

Prints:

['U18()', 'U18', '2', '42', '1', '', '', '0.942.5/30.86', '']

Note that startswith() accepts a tuple of strings to check against.
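To illustrate the filter on its own, here is a minimal sketch that needs no HTML file. The helper name has_wanted_prefix and the sample ids are made up for illustration. BeautifulSoup calls an attribute filter with the attribute value, which is None for tags that have no id, so the guard before startswith() matters.

```python
def has_wanted_prefix(id_value):
    """Return True when id_value starts with one of the wanted prefixes."""
    prefixes = ('home_', 'guest_', 'odds_', 'gs_', 'hs_', 'time_')
    # id_value is None for tags without an id, so guard before calling startswith
    return bool(id_value) and id_value.startswith(prefixes)


# Hypothetical id values, including a tag with no id at all (None)
ids = ['home_1039509', 'odds_3_1039509', 'header', None]
matches = [i for i in ids if has_wanted_prefix(i)]
print(matches)
```

The same function could be passed as soup.find_all(id=has_wanted_prefix) in place of the lambda.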

You can get the columns like this:

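import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# (?:...) groups whole-word alternatives; square brackets would define a character class
soup.find_all(["div", "span"], id=re.compile(r'^(?:home|guest|odds_3|gs|hs|time)_\d+'))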
The regex above is just an example.
In your case it could be:

cols = tr.find_all(["div", "span"], id=re.compile(r'^(?:home|guest|odds|gs|hs|time)_\d+'))
for tag in cols:
    # find(text=True) returns only the first text node;
    # find_all(text=True) collects every text node under the tag
    t = tag.find_all(text=True)
    if t:
        # find_all returns a list, so join the pieces
        text = ''.join(t).strip() + ';'
        print(text)
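One thing worth checking in patterns like these: square brackets define a character class, not a list of alternative words, and BeautifulSoup applies a compiled pattern with search(), so a bracketed pattern can match far more ids than intended. The sketch below compares the two forms with the standard re module only; the sample ids are invented for illustration.

```python
import re

# Brackets: matches any ONE of the characters h, o, m, e, |, g, u, s, t before "_digits"
char_class = re.compile(r'[home|guest]_\d+')
# Non-capturing group: matches one of the whole words, anchored at the start of the id
alternation = re.compile(r'^(?:home|guest)_\d+')

# Hypothetical id values; the last two should not be selected
sample_ids = ['home_1039509', 'guest_1039509', 'me_42', 'team_7']

loose = [s for s in sample_ids if char_class.search(s)]
strict = [s for s in sample_ids if alternation.search(s)]

print(loose)
print(strict)
```

With the bracketed form every sample id matches, while the grouped form keeps only the ids that really begin with home_ or guest_.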

This question is as bad as when you went by rfvtgb2014. What exactly is not working in your code?

My code isn't working and I'm looking for advice, thx!

"My code [sic] isn't working" is not a useful problem statement. Errors (provide the full traceback)? Unexpected output (provide the input plus the expected and actual output)? What have you done so far to try to fix it?

The question is not completely bad: at least it contains the code, the input, and what the user is trying to achieve, which helps in providing a solution.

@hknothin2014 for more information on how BeautifulSoup works, check the documentation.

Thanks, but I can't get the data inside the tag. Please help, thanks a lot!

That is because find_all is only selecting from div and span elements. ["div", "span", "td"] will select td as well.

Thanks. I actually added ["div", "span", "td"], but it still doesn't work and I don't know why. Thanks!

You need to update the regex. Maybe this helps: re.compile(r'(?:home|guest|odds|gs|hs|time)_\d+')

Even if I write cols = tr.findAll("td", {"id": odds_3_1039509}) it still can't get the data.
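On the last point, a possible cause is that the id value is written as a bare name rather than a quoted string. The sketch below shows one way to select a td by id. The markup is hypothetical, since the original HTML is not shown, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is an assumption
html = """
<table><tr>
  <td id="odds_3_1039509">0.94 2.5/3 0.86</td>
  <td id="gs_1039509">2</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The id value must be a quoted string; a bare name would raise NameError
cell = soup.find("td", id="odds_3_1039509")
odds_text = cell.get_text(strip=True) if cell else None

# A list of tag names widens the search to several element types
cells = soup.find_all(
    ["div", "span", "td"],
    id=lambda v: v and v.startswith(("odds_", "gs_")),
)
cell_ids = [c["id"] for c in cells]

print(odds_text)
print(cell_ids)
```

If the cell is still empty on the real page, the data may be filled in by JavaScript after loading, in which case it is not present in the HTML that BeautifulSoup receives.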