Extracting custom "data-" attributes from tags using BeautifulSoup (Python)
Tags: python, web-scraping, beautifulsoup
I am scraping HTML that looks like this:
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
You can use the built-in __getitem__ method that BeautifulSoup tag objects define, which lets you index a tag's attributes dictionary-style:
from bs4 import BeautifulSoup as soup
s = """
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
<div class="resultRow" data-unix="1528542937" id="resultRow1">
"""
final_results = [i['data-unix'] for i in soup(s, 'html.parser').find_all('div', {'class':'resultRow'})]
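Dictionary-style indexing raises a KeyError when a tag lacks the attribute. As a minimal sketch, assuming some rows may be missing data-unix (the second div below is a made-up example, not from the question), `Tag.get()` and `Tag.attrs` give a more forgiving way to read attributes:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second row deliberately has no data-unix attribute.
html = """
<div class="resultRow" data-unix="1528542937" id="resultRow1"></div>
<div class="resultRow" id="resultRow2"></div>
"""

doc = BeautifulSoup(html, "html.parser")
rows = doc.find_all("div", class_="resultRow")

# Tag.get() returns None instead of raising KeyError when the attribute is absent.
values = [row.get("data-unix") for row in rows]

# Tag.attrs is a plain dict holding every attribute of the tag.
first_attrs = rows[0].attrs

print(values)
print(first_attrs)
```

Using `row["data-unix"]` is still the better choice when a missing attribute should be treated as an error.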
Regarding your question about moving Ajax1234's answer into a loop:
from bs4 import BeautifulSoup
s = """
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
<div class="resultRow" data-unix="1528542937" id="resultRow1">
"""
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(s, 'lxml')
final_results = []
# Visit every resultRow div and collect its data-unix attribute
for tmp in soup.find_all('div', {'class':'resultRow'}):
    final_results.append(tmp['data-unix'])
print(final_results)
['1528542937', '1528542937', '1528542937']
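Once the lookup lives inside a loop, each row can be processed further in the same pass. The sketch below assumes that data-unix holds a Unix timestamp in seconds (the question does not confirm this) and that each row has a unique id; the `timestamps` dictionary is a name introduced here for illustration:

```python
from datetime import datetime, timezone

from bs4 import BeautifulSoup

html = """
<div class="resultRow" data-unix="1528542937" id="resultRow1"></div>
<div class="resultRow" data-unix="1528542937" id="resultRow2"></div>
"""

doc = BeautifulSoup(html, "html.parser")

timestamps = {}
for row in doc.find_all("div", {"class": "resultRow"}):
    raw = row.get("data-unix")
    if raw is None:
        # Skip rows without the attribute rather than raising KeyError.
        continue
    # Assumption: the value is seconds since the Unix epoch.
    timestamps[row["id"]] = datetime.fromtimestamp(int(raw), tz=timezone.utc)

for row_id, moment in timestamps.items():
    print(row_id, moment.isoformat())
```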
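Is it possible to move this into the resultRow loop? If so, what is the syntax?
Take a look at the HTML/XML reader classes (such as HTMLParser and xml.dom) to see what you can do with attributes and their values.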
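For comparison, here is a rough sketch of the same extraction using only the standard library's `html.parser.HTMLParser`, as the comment suggests. The `DataUnixCollector` class name is made up for this example, and the approach is an illustration rather than a replacement for the BeautifulSoup answers:

```python
from html.parser import HTMLParser


class DataUnixCollector(HTMLParser):
    """Collects data-unix values from <div class="resultRow"> start tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "resultRow" in classes and "data-unix" in attr_map:
            self.values.append(attr_map["data-unix"])


html = """
<div class="resultRow" data-unix="1528542937" id="resultRow1"></div>
<div class="resultRow" data-unix="1528542937" id="resultRow2"></div>
"""

collector = DataUnixCollector()
collector.feed(html)
collector.close()
print(collector.values)
```

This avoids a third-party dependency, but BeautifulSoup is usually more convenient once you need to navigate or search the document tree.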