Pandas: extracting rows and data with Beautiful Soup


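I am using BeautifulSoup to extract some data from an internal site. The code provided at the link works for my 4 columns of data. There is also one piece of data in each row that is tagged as th. How do I get the th in the same row as all the tds?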


A solution using the SimplifiedDoc library

from simplified_scrapy import SimplifiedDoc
html = '''
<table>
<tr>
<td>Manager ID</td>
<th>Process</th>
<td>Defect Count</td>
<td>Transaction</td>
<td>DPMO</td>
</tr>
<tr role = 'row'>
<td>bedfli</td>
<th>Receive</th>
<td>155</td>
<td>2215</td>
<td>898</td>
</tr>
</table>
'''
doc = SimplifiedDoc(html)
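# Three alternative ways to read the rows. Each assignment overwrites
# the previous one, so only the result of the last is printed.
# First way
table = doc.getTable(body='table')
# Second way
table = doc.selects('table>tr').children.text
# Third way: select both the td and th cells of each row
table = doc.selects('table>tr').selects('td|th').text
print(table)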
Manager ID Process Defect Count Transaction DPMO
bedfli       Receive   155          2215       898
Output:
[['Manager ID', 'Process', 'Defect Count', 'Transaction', 'DPMO'], ['bedfli', 'Receive', '155', '2215', '898']]
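
Since the question asks about BeautifulSoup itself, the same result can probably be had without another library. The following is a minimal sketch, not the original poster's code: it assumes the same sample table as above and relies on `find_all` accepting a list of tag names, which returns the matching `td` and `th` cells in document order. The variable names `soup` and `rows` are illustrative.

```python
from bs4 import BeautifulSoup

# Same sample table as in the answer above.
html = """
<table>
<tr>
<td>Manager ID</td>
<th>Process</th>
<td>Defect Count</td>
<td>Transaction</td>
<td>DPMO</td>
</tr>
<tr role="row">
<td>bedfli</td>
<th>Receive</th>
<td>155</td>
<td>2215</td>
<td>898</td>
</tr>
</table>
"""

# The built-in "html.parser" backend avoids an extra dependency such as lxml.
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # Passing a list matches either tag, so the th stays in its
    # original position among the td cells of the same row.
    cells = tr.find_all(["td", "th"])
    rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

If a DataFrame is wanted, `pandas.DataFrame(rows[1:], columns=rows[0])` should turn the first row into column headers and the remaining rows into data.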