HTML table to a proper Excel sheet with Python

I am new to Python and am struggling to get web-scraped data into a clean Excel sheet. Below is the table I am trying to scrape and reproduce in Python:

Here is what the HTML of the page looks like:

</div>
    <section id="first" style="display:none" aria-label="Power situation graph section">
        <div class="gridModule-2up">
            <div class="prognos_controls hidden" data-proggraph="1">
                Show data for:
                <button value="1" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Yesterday</button>
                <button value="2" onclick="this.blur();" type="button" class="btn  btn--tertiary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Today</button>
                <button value="3" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Tomorrow</button>
            </div>
            <table summary="Consumption" id="prognos_datatable_total" class="prognos_datatable scrollable">
                <thead>
                    <tr>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'17/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-17</th>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'18/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-18</th>
                                <th data-sheets-numberformat="[null,1]"></th>
                                <th data-sheets-value="[null,2,'19/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-19</th>

                    </tr>
                    <tr>
                        <th caldata-sheets-value="[null,2,'Timme']" data-sheets-numberformat="[null,1]" scope="col">Hour</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>
                                <th data-sheets-value="[null,2,'F\u00f6rbrukning']" data-sheets-numberformat="[null,1]" scope="col">Consumption</th>
                                <th data-sheets-value="[null,2,'Prognos']" data-sheets-numberformat="[null,1]" scope="col">Forecast</th>

                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <th data-sheets-value="[null,2,'00-01']" data-sheets-numberformat="[null,1]" scope="col">
                            00-01
                        </th>

                            <td data-sheets-value="[null,2,'15544']" data-sheets-numberformat="[null,1]">15&#160;544</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15143']" data-sheets-numberformat="[null,1]">15&#160;143</td>
                            <td data-sheets-value="[null,2,'15669']" data-sheets-numberformat="[null,1]">15&#160;669</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15869']" data-sheets-numberformat="[null,1]">15&#160;869</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16422']" data-sheets-numberformat="[null,1]">16&#160;422</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'01-02']" data-sheets-numberformat="[null,1]" scope="col">
                            01-02
                        </th>

                            <td data-sheets-value="[null,2,'15238']" data-sheets-numberformat="[null,1]">15&#160;238</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15052']" data-sheets-numberformat="[null,1]">15&#160;052</td>
                            <td data-sheets-value="[null,2,'15509']" data-sheets-numberformat="[null,1]">15&#160;509</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15366']" data-sheets-numberformat="[null,1]">15&#160;366</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16176']" data-sheets-numberformat="[null,1]">16&#160;176</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'02-03']" data-sheets-numberformat="[null,1]" scope="col">
                            02-03
                        </th>

                            <td data-sheets-value="[null,2,'15250']" data-sheets-numberformat="[null,1]">15&#160;250</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15135']" data-sheets-numberformat="[null,1]">15&#160;135</td>
                            <td data-sheets-value="[null,2,'15576']" data-sheets-numberformat="[null,1]">15&#160;576</td>
                            <td class="alert_1" data-sheets-value="[null,2,'15501']" data-sheets-numberformat="[null,1]">15&#160;501</td>
                            <td data-sheets-value="[null,2,'-']" data-sheets-numberformat="[null,1]">-</td>
                            <td class="alert_1" data-sheets-value="[null,2,'16124']" data-sheets-numberformat="[null,1]">16&#160;124</td>
                    </tr>
                    <tr>
                        <th data-sheets-value="[null,2,'03-04']" data-sheets-numberformat="[null,1]" scope="col">
                            03-04
                        </th>.............
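This is what my output looks like with this code:

How can I build a proper DataFrame from this and then export it to Excel?


I would greatly appreciate any help.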

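Try pandas for this. Its read_html function uses an HTML parser such as lxml or BeautifulSoup under the hood. I couldn't test it on your URL because you haven't provided it.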

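import pandas as pd

url = 'myURLlink'  # placeholder: replace with the URL of the page
# read_html returns a list of DataFrames, one per <table>; [1] picks the second one
df = pd.read_html(url)[1]

df.to_csv('file.csv', index=False)
print(df.to_string())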
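
Since the question asks for an Excel file rather than a CSV, the following is a minimal sketch of how the DataFrame returned by read_html might be cleaned and exported. The helper names (flatten_columns, clean_numbers, export_to_excel) are hypothetical. The sketch assumes that the first column holds the hour label, that every other column holds digit groups separated by spaces or non-breaking spaces, and that an Excel writer engine such as openpyxl is installed.

```python
import pandas as pd


def flatten_columns(df):
    """Join a two-row (MultiIndex) header into single column names."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = [
            " ".join(
                str(part) for part in col if not str(part).startswith("Unnamed")
            ).strip()
            for col in out.columns
        ]
    return out


def clean_numbers(df):
    """Convert every column except the first to numbers.

    Whitespace inside a cell (including the non-breaking space used as a
    thousands separator) is removed; cells that are still not numeric,
    such as "-", become NaN.
    """
    numeric = df.iloc[:, 1:].apply(
        lambda col: pd.to_numeric(
            col.astype(str).str.replace(r"\s", "", regex=True), errors="coerce"
        )
    )
    return pd.concat([df.iloc[:, :1], numeric], axis=1)


def export_to_excel(df, path):
    """Write the cleaned table to an .xlsx file (needs e.g. openpyxl)."""
    clean_numbers(flatten_columns(df)).to_excel(path, index=False)
```

With the df from the snippet above, a call such as export_to_excel(df, 'file.xlsx') would then write the sheet. The header is flattened first because some pandas versions refuse to write MultiIndex columns together with index=False.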

The problem is caused by the escape characters in the cell text.

from bs4 import BeautifulSoup

with open("sample.html", "r") as f:
    contents = f.read()

soup = BeautifulSoup(contents, 'lxml')
extract = soup.find("table")

# strip() removes the leading and trailing whitespace around each cell's text
table = [[item.text.strip() for item in row_data.select("th,td")]
         for row_data in extract.select("tr")]

for item in table:
    print(' '.join(item))

Check the output.
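
The HTML in the question also writes its thousands separators as the entity &#160;, which decodes to a non-breaking space that strip() leaves in place when it sits inside the text. As an illustration only, here is a small standard-library sketch of how such cell text could be normalised. The helpers clean_text and to_number are hypothetical names and are not part of the answer's code.

```python
import html


def clean_text(raw):
    """Decode HTML entities and collapse every whitespace run to one space."""
    return " ".join(html.unescape(raw).split())


def to_number(raw):
    """Return an int for digit groups such as '15&#160;544', else None."""
    digits = "".join(html.unescape(raw).split())
    return int(digits) if digits.isdigit() else None


# The entity decodes to U+00A0, which str.split() treats as whitespace.
print(repr(html.unescape("15&#160;544")))
print(clean_text("\n        00-01\n    "))
print(to_number("15&#160;544"))
print(to_number("-"))
```

BeautifulSoup already decodes entities when it parses the page, so with the table list above only the whitespace removal would be needed.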

You can remove the unused imports (csv, lxml.html, requests, pandas) from your snippet, as well as the comment lines that add no value.

Thanks, Kunal! This was helpful. Is it possible to create a DataFrame from this result?

Katya, yes, it is possible. Just convert the list to a DataFrame with df = pd.DataFrame(table), and then call df.to_csv("demo.csv", index=False).

Hi! Thanks for the reply. I get a CSV file with some strange symbols in it. Could you take a look? The URL is: . Thank you!

@Katya, change the line to:
df.to_csv('file.csv', encoding='utf-8-sig', index=False)
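
The two suggestions in these comments can be combined roughly as follows. This is a sketch under assumptions: the sample table list is a hand-written stand-in shaped like the output of the BeautifulSoup snippet (values copied from the HTML in the question), the first row is assumed to carry the dates and the second row the column labels, and rows_to_frame and write_csv are hypothetical helper names.

```python
import pandas as pd

# Stand-in for the `table` list produced by the BeautifulSoup snippet.
table = [
    ["", "2020-02-17", "", "2020-02-18", "", "2020-02-19"],
    ["Hour", "Consumption", "Forecast", "Consumption", "Forecast",
     "Consumption", "Forecast"],
    ["00-01", "15\xa0544", "15\xa0143", "15\xa0669", "15\xa0869", "-", "16\xa0422"],
    ["01-02", "15\xa0238", "15\xa0052", "15\xa0509", "15\xa0366", "-", "16\xa0176"],
]


def rows_to_frame(rows):
    """Build a DataFrame whose column names combine date and label."""
    dates = [cell for cell in rows[0] if cell]  # drop the empty header cells
    labels = rows[1]
    columns = [labels[0]]
    # Assumption: each date spans two columns (Consumption, Forecast).
    for i, label in enumerate(labels[1:]):
        columns.append(f"{dates[i // 2]} {label}")
    return pd.DataFrame(rows[2:], columns=columns)


def write_csv(df, path):
    """Write UTF-8 with a byte order mark so Excel can detect the encoding."""
    df.to_csv(path, index=False, encoding="utf-8-sig")


df = rows_to_frame(table)
print(df.columns.tolist())
```

The utf-8-sig codec prepends a byte order mark to the file. Without it, Excel may guess a legacy encoding when opening the CSV and show non-ASCII characters such as the non-breaking space as stray symbols.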