Python: getting values from multiple td and th elements with BeautifulSoup when they have no identifying tags
I have a page that looks like this:
<tr>
<th class="fst" scope="col">time(*)</th>
<th scope="col">field</th>
<th scope="col">1 session</th>
<th scope="col">2 session</th>
<th scope="col">3 session</th>
<th scope="col">4 session</th>
<th scope="col">5 session</th>
<th scope="col">6 session</th>
</tr>
<tr>
<th class="num_area" rowspan="11" scope="row">77</th>
<td class="txt_category">bus</td>
<td>58456</td>
<td>62891</td>
<td>63076</td>
<td>53282</td>
<td>54805</td>
<td>55097</td>
</tr>
<tr>
<td class="txt_category">taxi</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>62891</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<th class="fst" scope="col">time(*)</th>
<th scope="col">field</th>
<th scope="col">7 session</th>
<th scope="col">8 session</th>
<th scope="col">9 session</th>
<th scope="col">10 session</th>
<th scope="col">11 session</th>
<th scope="col">12 session</th>
</tr>
<tr>
<th class="num_area" rowspan="11" scope="row">100</th>
<td class="txt_category">bus</td>
<td>1342</td>
<td>138470</td>
<td>878840</td>
<td>7653</td>
<td>4422</td>
<td>87630</td>
</tr>
So far I have tried the following:

def scraping():
    driver = webdriver.PhantomJS()
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    result = []
    for row in soup.findAll('tr'):
        header = row.findAll('th')
        if len(header) < 1:
            continue
        if len(header) == 7:
            for num in range(1, 7):
                date = header[num].find(text=True)
        if len(header) == 8:
            for num in range(1, 8):
                date = header[num].find(text=True)
        body = row.findAll('td')
        if len(body) < 1:
            continue
        field_name = body[0].find(text=True)
        template['field_name'] = field_name
        for num in range(1, 7):
            cost = body[num].find(text=True)
            template['cost'] = cost
        result.append(template)
Sometimes the header length is 7 and sometimes it is 8, so I decided to use range. However, after running it, the result list seems to contain only one dictionary, which is not what I want. Is there a good way to scrape these values?
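The single-dictionary symptom comes from reusing one `template` dict: every `result.append(template)` appends a reference to the same object, and each key assignment (`template['cost'] = ...`) overwrites the previous value. One way around it is to create a fresh dict for every data row and to keep the session labels from the most recent header row so each cell gets its own key. The sketch below runs against a trimmed copy of the HTML from the question and parses a plain string with `html.parser`, so Selenium is not needed to try it; the names `scrape`, `headers`, and `record` are mine, not from the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table fragment shown in the question.
html = """
<table>
<tr>
<th class="fst" scope="col">time(*)</th>
<th scope="col">field</th>
<th scope="col">1 session</th>
<th scope="col">2 session</th>
</tr>
<tr>
<th class="num_area" rowspan="11" scope="row">77</th>
<td class="txt_category">bus</td>
<td>58456</td>
<td>62891</td>
</tr>
<tr>
<td class="txt_category">taxi</td>
<td>-</td>
<td>62891</td>
</tr>
</table>
"""

def scrape(soup):
    result = []
    headers = []
    for row in soup.find_all('tr'):
        ths = row.find_all('th')
        tds = row.find_all('td')
        if not tds:
            # Header row: remember the session labels ("1 session", ...)
            # for the data rows that follow; this absorbs the 7-vs-8 case
            # without hard-coded ranges.
            headers = [th.get_text(strip=True) for th in ths[2:]]
            continue
        # Data row: build a NEW dict on every iteration, so each
        # appended entry is an independent object.
        record = {'field_name': tds[0].get_text(strip=True)}
        for label, cell in zip(headers, tds[1:]):
            record[label] = cell.get_text(strip=True)
        result.append(record)
    return result

rows = scrape(BeautifulSoup(html, 'html.parser'))
print(rows)
```

With the full page this yields one dictionary per bus/taxi row instead of one shared dictionary; `zip` also stops safely at the shorter of the two lists, so header/data length mismatches do not raise IndexError.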