Extracting custom "data-" attributes using BeautifulSoup (Python)


I am scraping HTML that looks like this:


You can use the built-in __getitem__ method, which BeautifulSoup tag objects define for dictionary-style attribute access:

from bs4 import BeautifulSoup as soup
s = """
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
<div class="resultRow" data-unix="1528542937" id="resultRow1"> 
"""
final_results = [i['data-unix'] for i in soup(s, 'html.parser').find_all('div', {'class':'resultRow'})]
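One caveat with the dictionary-style lookup: it raises KeyError if a div has no data-unix attribute. The following is only a minimal sketch of two ways around that, assuming some rows might lack the attribute (the second div here is a made-up example, not taken from the question's page). It uses the 'html.parser' backend so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the second row deliberately has no data-unix attribute.
html = """
<div class="resultRow" data-unix="1528542937" id="resultRow1"></div>
<div class="resultRow" id="resultRow2"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# tag["data-unix"] raises KeyError when the attribute is missing;
# tag.get("data-unix") returns None instead.
values = [div.get("data-unix") for div in soup.find_all("div", class_="resultRow")]
print(values)

# A CSS attribute selector keeps only the divs that actually carry the attribute.
present = [div["data-unix"] for div in soup.select("div.resultRow[data-unix]")]
print(present)
```

Which variant fits depends on whether a missing attribute should be skipped or reported.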

Regarding your question about moving Ajax1234's answer into a loop:

from bs4 import BeautifulSoup

s = """
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
<div class="resultRow" data-unix="1528542937" id="resultRow1"> 
"""

soup = BeautifulSoup(s, 'lxml')

final_results = []

for tmp in soup.find_all('div', {'class':'resultRow'}):

    final_results.append(tmp['data-unix'])

print(final_results)

['1528542937', '1528542937', '1528542937']

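Is it possible to move this into the resultRow loop? If so, what is the syntax?
Take a look at the HTML/XML parsing classes (such as HTMLParser and xml.dom) to see what you can do with attributes and their values.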
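For the HTMLParser suggestion in the comment above, here is a rough sketch using only the standard library. The class name DataUnixCollector is made up for illustration, and the sample markup is the same three divs used in the answers. It is one possible approach, not a replacement for the BeautifulSoup answers.

```python
from html.parser import HTMLParser


class DataUnixCollector(HTMLParser):
    """Collects data-unix values from <div class="resultRow"> start tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "resultRow" in classes and "data-unix" in attr_map:
            self.values.append(attr_map["data-unix"])


s = """
<div class="resultRow" data-unix="1528542937" id="resultRow1">
<div class="resultRow" data-unix="1528542937" id="resultRow2">
<div class="resultRow" data-unix="1528542937" id="resultRow1">
"""

collector = DataUnixCollector()
collector.feed(s)
collector.close()
print(collector.values)
```

The parser calls handle_starttag once per opening tag, so the check on the tag name and class plays the role of the find_all filter in the loop version.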