Python: getting values without tags from multiple td and th elements using BeautifulSoup

I have a page that looks like this:

<tr>
    <th class="fst" scope="col">time(*)</th>
    <th scope="col">field</th>
    <th scope="col">1 session</th>
    <th scope="col">2 session</th>
    <th scope="col">3 session</th>
    <th scope="col">4 session</th>
    <th scope="col">5 session</th>
    <th scope="col">6 session</th>
</tr>
<tr>
   <th class="num_area" rowspan="11" scope="row">77</th>
   <td class="txt_category">bus</td>
   <td>58456</td>                                                                   
   <td>62891</td>                                                                    
   <td>63076</td>                                                             
   <td>53282</td>                                                                 
   <td>54805</td>                                                             
   <td>55097</td>
</tr>
<tr>
   <td class="txt_category">taxi</td>
   <td>-</td>
   <td>-</td>
   <td>-</td>
   <td>62891</td>
   <td>-</td>
   <td>-</td>
</tr>
<tr>                         
    <th class="fst" scope="col">time(*)</th>
    <th scope="col">field</th>
    <th scope="col">7 session</th>
    <th scope="col">8 session</th>
    <th scope="col">9 session</th>
    <th scope="col">10 session</th>
    <th scope="col">11 session</th>
    <th scope="col">12 session</th>
</tr>
<tr>
   <th class="num_area" rowspan="11" scope="row">100</th>
   <td class="txt_category">bus</td>
   <td>1342</td>                                                                   
   <td>138470</td>                                                                    
   <td>878840</td>                                                             
   <td>7653</td>                                                                 
   <td>4422</td>                                                             
   <td>87630</td>
</tr>

from bs4 import BeautifulSoup
from selenium import webdriver

# Note: `url` and `template` are not defined in this snippet.
def scraping():
    driver = webdriver.PhantomJS()
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    result = []
    for row in soup.findAll('tr'):
        header = row.findAll('th')
        if len(header) < 1:
            continue
        if len(header) == 7:
            for num in range(1, 7):
                date = header[num].find(text=True)

        if len(header) == 8:
            for num in range(1, 8):
                date = header[num].find(text=True)
        body = row.findAll('td')
        if len(body) < 1:
            continue
        field_name = body[0].find(text=True)
        template['field_name'] = field_name
        for num in range(1, 7):
            cost = body[num].find(text=True)
            template['cost'] = cost
        result.append(template)

The scraping() function above is what I have tried so far.

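Sometimes the number of th elements in a row is 7 and sometimes it is 8, so I decided to use range. However, after using it, the result list seems to contain only one dictionary, which is not what I want. I would like to know if there is a good way to scrape these values.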
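
One possible direction, offered as a minimal sketch rather than a definitive answer: build a new dictionary for every data row instead of reusing a single `template`, and slice the header cells instead of hard-coding 7 or 8. The names `parse_rows` and `HTML`, and the shape of the output dictionaries, are assumptions made for illustration. The sketch parses a shortened copy of the sample markup with Python's built-in `html.parser` in place of PhantomJS and `html5lib`, so it runs without a browser.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the question, wrapped in <table>.
HTML = """
<table>
<tr>
    <th class="fst" scope="col">time(*)</th>
    <th scope="col">field</th>
    <th scope="col">1 session</th>
    <th scope="col">2 session</th>
</tr>
<tr>
   <th class="num_area" rowspan="11" scope="row">77</th>
   <td class="txt_category">bus</td>
   <td>58456</td>
   <td>62891</td>
</tr>
<tr>
   <td class="txt_category">taxi</td>
   <td>-</td>
   <td>62891</td>
</tr>
</table>
"""


def parse_rows(html):
    """Return one dict per data row (hypothetical output shape)."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    sessions = []        # column labels from the most recent header row
    current_time = None  # text of the most recent rowspan <th>

    for row in soup.find_all("tr"):
        headers = row.find_all("th")
        cells = row.find_all("td")

        if not cells:
            # Header-only row: skip the first two labels ("time(*)" and
            # "field") and keep the rest, however many there are.
            sessions = [th.get_text(strip=True) for th in headers[2:]]
            continue

        if headers:
            # The first data row of a group carries the rowspan <th>.
            current_time = headers[0].get_text(strip=True)

        costs = [td.get_text(strip=True) for td in cells[1:]]
        # A fresh dict per row, so earlier entries are not overwritten.
        result.append({
            "time": current_time,
            "field_name": cells[0].get_text(strip=True),
            "costs": dict(zip(sessions, costs)),
        })
    return result


rows = parse_rows(HTML)
for entry in rows:
    print(entry)
```

Rows without a th of their own, such as the taxi row, inherit the time value of the preceding row, which mirrors how `rowspan` works in the rendered table. With the live page, `driver.page_source` could be passed to `parse_rows` in place of `HTML`.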