Python Beautiful Soup: extracting weather information: table --> Excel file


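I would like to extract data from this table

Weather history table

What I need

  • Extract all the content of every <td> in <tbody>
  • Generate an Excel file from Python containing the whole data set

What I don't need

  • The units next to the numbers (e.g. 22 °C); that is, I don't want the <span> in
    <td class="data-cell alt-cell">33.8 <span class="table-unit">°C</span></td>
  • The <tr class="column-heading"> and <tr class="row-subheading"> rows, which should be excluded

Can someone tell me how to extract this data into an Excel file?

HTML code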

<table id="history_table" class="responsive">
    <thead>
        <tr class="column-heading">
            <th class="year-cell">2016</th>
            <th colspan="3">Temperature</th>
            <th colspan="3">Dew Point</th>
            <th colspan="3">Humidity</th>
            <th colspan="3">Speed</th>
            <th colspan="3">Pressure</th>
            <th>Precip. Accum.</th>
        </tr>
        <tr class="row-subheading"><th>Sep</th>
            <th class="alt-cell">High</th>
            <th class="alt-cell">Avg</th>
            <th class="alt-cell">Low</th>
            <th>High</th>
            <th>Avg</th>
            <th>Low</th>
            <th class="alt-cell">High</th>
            <th class="alt-cell">Avg</th>
            <th class="alt-cell">Low</th>
            <th>High</th>
            <th>Avg</th>
            <th>Gust</th>
            <th class="alt-cell">High</th>
            <th class="alt-cell">Avg</th>
            <th class="alt-cell">Low</th>
            <th>Sum</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="data-cell">12</td>
            <td class="data-cell alt-cell">33.8 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">26.1 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">18.4 <span class="table-unit">°C</span></td>
            <td class="data-cell">17.6 <span class="table-unit">°C</span></td>
            <td class="data-cell">16 <span class="table-unit">°C</span></td>
            <td class="data-cell">13.4 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">88 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">55 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">30 <span class="table-unit">%</span></td>
            <td class="data-cell">12 <span class="table-unit">kph</span></td>
            <td class="data-cell">1 <span class="table-unit">kph</span></td>
            <td class="data-cell">16 <span class="table-unit">kph</span></td>
            <td class="data-cell alt-cell">1016 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1014 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1012 <span class="table-unit">hPa</span></td>
            <td class="data-cell">0 <span class="table-unit">mm</span></td>
        </tr>
        <tr>
            <td class="data-cell">13</td>
            <td class="data-cell alt-cell">34.2 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">29 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">23.8 <span class="table-unit">°C</span></td>
            <td class="data-cell">17.4 <span class="table-unit">°C</span></td>
            <td class="data-cell">15.6 <span class="table-unit">°C</span></td>
            <td class="data-cell">12.7 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">61 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">49 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">29 <span class="table-unit">%</span></td>
            <td class="data-cell">12 <span class="table-unit">kph</span></td>
            <td class="data-cell">3 <span class="table-unit">kph</span></td>
            <td class="data-cell">16 <span class="table-unit">kph</span></td>
            <td class="data-cell alt-cell">1013 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1010 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1008 <span class="table-unit">hPa</span></td>
            <td class="data-cell">0 <span class="table-unit">mm</span></td>
        </tr>
        <tr class="column-heading">
            <td class="year-cell">2017</td>
            <td colspan="3">Temperature</td>
            <td colspan="3">Dew Point</td>
            <td colspan="3">Humidity</td>
            <td colspan="3">Speed</td>
            <td colspan="3">Pressure</td>
            <td>Precip. Accum.</td>
        </tr>
        <tr class="row-subheading">
            <td>Apr</td>
            <td class="alt-cell">High</td>
            <td class="alt-cell">Avg</td>
            <td class="alt-cell">Low</td>
            <td>High</td>
            <td>Avg</td>
            <td>Low</td>
            <td class="alt-cell">High</td>
            <td class="alt-cell">Avg</td>
            <td class="alt-cell">Low</td>
            <td>High</td>
            <td>Avg</td>
            <td>Gust</td>
            <td class="alt-cell">High</td>
            <td class="alt-cell">Avg</td>
            <td class="alt-cell">Low</td>
            <td>Sum</td>
       </tr> 
       <tr>
            <td class="data-cell">1</td>
            <td class="data-cell alt-cell">17.4 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">14.1 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">10.7 <span class="table-unit">°C</span></td>
            <td class="data-cell">10.2 <span class="table-unit">°C</span></td>
            <td class="data-cell">7.4 <span class="table-unit">°C</span></td>
            <td class="data-cell">4.7 <span class="table-unit">°C</span></td>
            <td class="data-cell alt-cell">82 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">68 <span class="table-unit">%</span></td>
            <td class="data-cell alt-cell">45 <span class="table-unit">%</span></td>
            <td class="data-cell">11 <span class="table-unit">kph</span></td>
            <td class="data-cell">5 <span class="table-unit">kph</span></td>
            <td class="data-cell">18 <span class="table-unit">kph</span></td>
            <td class="data-cell alt-cell">1016 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1015 <span class="table-unit">hPa</span></td>
            <td class="data-cell alt-cell">1013 <span class="table-unit">hPa</span></td>
            <td class="data-cell">0 <span class="table-unit">mm</span></td>
      </tr>...

Python code

from xlsxwriter import Workbook
from bs4 import BeautifulSoup
def read_file():
    file = open('meteo.html', 'rt', encoding='UTF8')
    data = file.read()
    file.close()
    return data
data_path ='/Users/Xtro/Dropbox/Work/test/data/out/meteo'
def write_data_to_excel_file(datas,data_path):
    #print(datas[8])
    workbook=Workbook(data_path +'/meteo.xlsx')
    worksheet = workbook.add_worksheet()
    row=0
    worksheet.write(row,0,'Date')
    worksheet.write(row,1,'Température haute en °C')
    worksheet.write(row,2,'Température moyenne en °C')
    worksheet.write(row,3,'Température basse en °C')
    worksheet.write(row,4,'Point de rosée haut en °C')
    worksheet.write(row,5,'Point de rosée moyenne en °C')
    worksheet.write(row,6,'Point de rosée bas')
    worksheet.write(row,7,'humidité haute en %')
    worksheet.write(row,8,'humidité moyenne en %')
    worksheet.write(row,9,'humidité basse en %')
    worksheet.write(row,10,'vitesse haute en km/h')
    worksheet.write(row,11,'raffale en km/h')
    worksheet.write(row,12,'Pression haute en hPa')
    worksheet.write(row,13,'Pression moyenne en hPa')
    worksheet.write(row,14,'Pression basse en hPa')
    worksheet.write(row,15,'précipitation/jour en mm')
    row+=1


    for data in datas:
        print(data[1])
        cellule0=data[1]
        worksheet.write(row,0,cellule0)

        cellule0=data[0]
        cellule1=data[1]
        cellule2=data[2]
        cellule3=data[3]
        cellule4=data[4]
        cellule5=data[5]
        cellule6=data[6]
        cellule7=data[7]
        cellule8=data[8]
        cellule9=data[9]
        cellule10=data[10]
        cellule11=data[11]
        cellule12=data[12]
        cellule13=data[13]
        cellule14=data[14]
        cellule15=data[15]

            #cellule[i]=data[i]

        worksheet.write(row,0,cellule0)
        worksheet.write(row,1,cellule1)
        worksheet.write(row,2,cellule2)
        worksheet.write(row,3,cellule3)
        worksheet.write(row,4,cellule4)
        worksheet.write(row,5,cellule5)
        worksheet.write(row,6,cellule6)
        worksheet.write(row,7,cellule7)
        worksheet.write(row,8,cellule8)
        worksheet.write(row,9,cellule9)
        worksheet.write(row,10,cellule10)
        worksheet.write(row,11,cellule11)
        worksheet.write(row,12,cellule12)
        worksheet.write(row,13,cellule13)
        worksheet.write(row,14,cellule14)
        worksheet.write(row,15,cellule15)

        row +=1

    workbook.close()

soup = BeautifulSoup(read_file(),'lxml')

data = []
table = soup.find('table',class_='responsive')
table_body = table.find('tbody')


rows = table_body.find_all('tr')

for tr in rows:
    spans = tr.find_all('span')
    #print(spans)
    if spans:
        continue

#print (rows)
    for row in rows:
        cols = row.find_all('td')
        #print (cols)
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

write_data_to_excel_file(data,data_path)
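
The sixteen separate worksheet.write calls above could probably be collapsed with xlsxwriter's write_row, which writes a whole list into consecutive cells starting at a given position. What follows is only a minimal sketch under the assumption that the rows have already been cleaned into plain lists; write_rows_to_excel and its parameters are hypothetical names, not part of the original code.

```python
from xlsxwriter import Workbook


def write_rows_to_excel(headers, rows, path):
    """Write one header row followed by the data rows to an .xlsx file.

    headers: list of column titles (hypothetical parameter).
    rows:    list of lists, one inner list per table row.
    path:    full path of the .xlsx file to create.
    """
    workbook = Workbook(path)
    worksheet = workbook.add_worksheet()

    # write_row(row, col, data) fills consecutive cells of a single row.
    worksheet.write_row(0, 0, headers)

    # Data rows start at spreadsheet row 1, directly below the headers.
    for row_index, values in enumerate(rows, start=1):
        worksheet.write_row(row_index, 0, values)

    # The file is only written to disk when the workbook is closed.
    workbook.close()
```

Called as write_rows_to_excel(headers, data, data_path + '/meteo.xlsx'), this would take the place of the long function above, with the header list supplied by the caller.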

Here is an alternative approach using Selenium. I had forgotten the requirement to remove the headers.

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.Chrome()
>>> driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=ILEDEFRA210#history/tdata/s20160912/e20170912/mcustom')
>>> rows = driver.find_elements(By.XPATH, './/td[@class="data-cell"]/..')
>>> len(rows)
366
>>> rows[-1].text
'12 19.9 °C 16.3 °C 12.9 °C 14.3 °C 9.6 °C 6.8 °C 82 % 69 % 47 % 18 kph 3 kph 24 kph 1012 hPa 1009 hPa 1005 hPa 2 mm'
>>> rows[0].text
'12 33.8 °C 26.1 °C 18.4 °C 17.6 °C 16 °C 13.4 °C 88 % 55 % 30 % 12 kph 1 kph 16 kph 1016 hPa 1014 hPa 1012 hPa 0 mm'
>>> for r, row in enumerate(rows):
...     [_.text.split()[0] for _ in row.find_elements(By.XPATH, './/td') ]
...     
['12', '33.8', '26.1', '18.4', '17.6', '16', '13.4', '88', '55', '30', '12', '1', '16', '1016', '1014', '1012', '0']
['13', '34.2', '29', '23.8', '17.4', '15.6', '12.7', '61', '49', '29', '12', '3', '16', '1013', '1010', '1008', '0']
['14', '33.9', '26.6', '19.3', '18.4', '14.7', '12.4', '77', '52', '32', '16', '2', '20', '1013', '1010', '1007', '0.3']
['15', '22.1', '19.5', '16.9', '18.3', '16.2', '13.1', '98', '87', '74', '13', '2', '16', '1014', '1011', '1009', '16.8']
['16', '21.4', '18.3', '15.3', '16.5', '13.9', '12.4', '96', '80', '58', '13', '2', '20', '1015', '1013', '1012', '6.9']
['17', '19.6', '16.4', '13.3', '14.1', '12.9', '10.9', '94', '84', '62', '5', '0', '8', '1019', '1017', '1015', '1']
['18', '25.5', '18.9', '12.4', '17.7', '14.4', '11.8', '97', '84', '55', '4', '0', '8', '1020', '1018', '1017', '0.5']
['19', '22.4', '18.8', '15.1', '17.6', '14.8', '13.7', '98', '84', '64', '5', '0', '8', '1021', '1020', '1019', '1.8']
['20', '24.5', '20.1', '15.6', '15.1', '13.1', '10', '90', '71', '48', '2', '0', '4', '1020', '1018', '1017', '0']
['21', '30', '20.5', '11', '14.2', '11.5', '9.7', '97', '63', '32', '2', '0', '4', '1020', '1018', '1017', '0']
['22', '26.4', '19.7', '13', '13.3', '11.7', '9.6', '96', '63', '38', '11', '0', '16', '1021', '1020', '1019', '0']
['23', '31', '21', '14.7', '15.4', '12.8', '10.9', '91', '62', '34', '8', '0', '12', '1026', '1023', '1021', '0']
['24', '26', '20.1', '14.2', '13', '10.7', '8.3', '82', '57', '35', '21', '2', '24', '1024', '1020', '1016', '0']
['25', '24.7', '17.9', '14.2', '15.9', '11.3', '8.9', '88', '66', '41', '12', '1', '16', '1023', '1019', '1014', '1']
['26', '24.2', '16.4', '11.6', '11.5', '9.6', '6.4', '94', '66', '36', '12', '1', '20', '1024', '1023', '1022', '0']
['27', '26.2', '19', '12.4', '11.2', '9.4', '7.7', '86', '56', '34', '10', '1', '16', '1028', '1025', '1023', '0']
['28', '27.4', '20.6', '16.6', '17.1', '13.8', '9.3', '84', '66', '44', '14', '1', '20', '1028', '1027', '1025', '0']
['29', '23.1', '18.4', '15.1', '15.5', '14.2', '12.5', '94', '77', '56', '17', '2', '24', '1025', '1021', '1016', '0']
['30', '21.3', '16.6', '14.5', '13.4', '11.6', '7.8', '93', '73', '50', '10', '0', '12', '1017', '1014', '1012', '1.8']
['1', '19.8', '15.6', '12.9', '13.7', '11.6', '9.4', '97', '78', '57', '18', '2', '24', '1012', '1010', '1009', '1']
['2', '22.7', '14.6', '10.5', '11', '8.4', '6.5', '85', '67', '41', '14', '1', '16', '1022', '1017', '1011', '1']
['3', '27.2', '16.1', '8.3', '11.1', '7.9', '5', '99', '64', '26', '12', '0', '16', '1028', '1025', '1022', '0']
['4', '24.1', '15.9', '9.8', '12', '8.9', '6', '98', '67', '34', '14', '1', '20', '1028', '1027', '1026', '0']
['5', '21.9', '14.6', '9.1', '9.4', '6.4', '2.5', '87', '61', '33', '19', '1', '28', '1026', '1025', '1023', '0']
['6', '20.4', '12.3', '5.8', '6.6', '4.4', '2.4', '86', '61', '36', '14', '1', '20', '1024', '1020', '1017', '0']
['7', '17.6', '12.5', '9.5', '10', '7.8', '6.1', '88', '74', '56', '10', '0', '16', '1019', '1018', '1017', '0']
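
The lists printed above hold strings, so a spreadsheet would likely store them as text unless they are converted first. Below is a small sketch of that conversion using only the standard library; to_number is a hypothetical helper name, and the sample row is copied from the output above. xlsxwriter's Workbook also accepts a strings_to_numbers option that may make this step unnecessary.

```python
def to_number(text):
    """Return an int or float for numeric text, otherwise the text unchanged."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        # Not numeric (for example a missing-value marker): keep it as text.
        return text


# One row taken from the output above (day 14).
sample_row = ['14', '33.9', '26.6', '19.3', '18.4', '14.7', '12.4', '77', '52',
              '32', '16', '2', '20', '1013', '1010', '1007', '0.3']

# Convert every cell so that Excel can treat the values as numbers.
numeric_row = [to_number(value) for value in sample_row]
```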

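Hello Bill, thanks for your reply, but I get the following error: File "", line 2 [.text.split()[0] for _ in row.select('td')] ^ SyntaxError: invalid syntax. Also (I have just updated the HTML code), I don't need the heading rows. To see the result, I added the following code: data = [] for row in test.select('#history_table > tbody > tr'): data.append = [.text.split()[0] for _ in row.select('td')] print(data)
[.text.split()[0] for _ in row.select('td')]: if this is an exact copy of your code, it is missing an underscore ('_') near the beginning. Please refer to the code in my answer.
for row in test.select('#history_table > tbody > tr'): [_.text.split()[0] for _ in row.select('td')] File "", line 2 [_.text.split()[0] for _ in row.select('td')] ^ SyntaxError: invalid syntax
This is the right code. Even when the code works, I don't know how to list all the <td> in <tbody> except those in <tr class="column-heading"> and <tr class="row-subheading">. Take a look at the HTML at the end.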
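
Regarding the last comment, one possible way to list the data cells while skipping the heading rows is a CSS selector with :not(). This is only an illustrative sketch: it assumes bs4 4.7 or later (where select() accepts :not()), uses the built-in 'html.parser' rather than lxml, and SAMPLE_HTML and extract_data_rows are hypothetical names standing in for the page in the question.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the table in the question (hypothetical sample).
SAMPLE_HTML = """
<table id="history_table" class="responsive">
  <tbody>
    <tr>
      <td class="data-cell">13</td>
      <td class="data-cell alt-cell">34.2 <span class="table-unit">°C</span></td>
      <td class="data-cell">0 <span class="table-unit">mm</span></td>
    </tr>
    <tr class="column-heading">
      <td class="year-cell">2017</td>
      <td colspan="3">Temperature</td>
      <td>Precip. Accum.</td>
    </tr>
    <tr class="row-subheading">
      <td>Apr</td>
      <td class="alt-cell">High</td>
      <td>Sum</td>
    </tr>
    <tr>
      <td class="data-cell">1</td>
      <td class="data-cell alt-cell">17.4 <span class="table-unit">°C</span></td>
      <td class="data-cell">0 <span class="table-unit">mm</span></td>
    </tr>
  </tbody>
</table>
"""


def extract_data_rows(html):
    """Return the text of each data row, without units or heading rows."""
    soup = BeautifulSoup(html, 'html.parser')
    # Keep only <tr> elements in <tbody> that carry neither heading class.
    selector = ('#history_table > tbody > '
                'tr:not(.column-heading):not(.row-subheading)')
    data = []
    for tr in soup.select(selector):
        cells = []
        for td in tr.select('td'):
            # The number is the first whitespace-separated token;
            # the unit from the <span> follows it and is dropped.
            parts = td.get_text().split()
            cells.append(parts[0] if parts else '')
        data.append(cells)
    return data


rows = extract_data_rows(SAMPLE_HTML)
```

Each inner list in rows could then be passed to the Excel-writing function from the question in place of the data list built there.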