Extracting table column data from a local HTML file to a .csv file using Python


python html excel csv


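My task is to extract the column data of a table from a .docx file to an .xls or .csv file using Python. The tables look like this:

Table 4-1. Bite_main.c
Table 4-2. Bite_Engine.c

Initially, I converted the .docx file to an .html file with the "mammoth" library (I went through many websites, and everyone converts .docx files to HTML to make the data easier to process).

Now I only need to extract the "CHECK" column under each table name (such as Table 4-1. Bite_main.c) in the converted HTML file into an .xls or .csv sheet. In the sheet it should look like this: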

1. Bite_main.c      overflow.2,overflow.5,overflow.8,overflow.12
2. Bite_Engine.c    overflow.4,overflow.9,overflow.8,overflow.10
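
The target layout above is one numbered row per source file, with that table's CHECK values joined by commas. A minimal sketch of writing such rows with Python's standard csv module might look like the following. The function name write_check_rows and the in-memory buffer are illustrative assumptions, not part of the original post; the sample values are copied from the desired output.

```python
import csv
import io


def write_check_rows(pairs, out):
    """Write one row per table: "<n>. <name>" and the comma-joined values."""
    writer = csv.writer(out)
    for number, (name, values) in enumerate(pairs, start=1):
        writer.writerow([f"{number}. {name}", ",".join(values)])


# Sample data copied from the desired output in the question.
pairs = [
    ("Bite_main.c", ["overflow.2", "overflow.5", "overflow.8", "overflow.12"]),
    ("Bite_Engine.c", ["overflow.4", "overflow.9", "overflow.8", "overflow.10"]),
]

# StringIO stands in for a real file such as open("out.csv", "w", newline="").
buffer = io.StringIO()
write_check_rows(pairs, buffer)
print(buffer.getvalue())
```

Because the second field contains commas, the csv module wraps it in double quotes, so a spreadsheet program should open it as a single cell rather than splitting it into four.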

---

I used the code below to convert the file to HTML:

import mammoth

# Raw strings keep the backslash literal; the output is opened as UTF-8 text
# so the HTML string can be written directly instead of as encoded bytes.
with open(r"\input.docx", "rb") as docx_file, open(r"\out_file.html", "w", encoding="utf-8") as myfile:
    result = mammoth.convert_to_html(docx_file, include_default_style_map=False)
    html = result.value
    myfile.write(html)  # one issue here: all of the file's data ends up on a single line of the HTML file

After the conversion, I tried to extract the tables, but I am not sure how to proceed:

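from bs4 import BeautifulSoup

with open(r"\out_file.html", "r", encoding="utf-8") as html_file:
    raw_html = html_file.read()
soup = BeautifulSoup(raw_html, "html.parser")
tables = soup.find_all("table")
table_list = []
for table in tables:
    rows = table.find_all("tr")
    for row in rows:
        # Collect the text of every cell in this row
        entries = row.find_all("td")
        value_list = [entry.get_text(strip=True) for entry in entries]
        table_list.append(value_list)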

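I do not know how to pick out the data when I reach "Table 4-1. Bite_main.c" and then extract only its "CHECK" column into a new .xls sheet. I need to repeat the same thing for every "Table 4.x.xxx.x".

I am very new to Python. Please suggest the logic for implementing this, or a better way to approach it. Thanks in advance to anyone who answers.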
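
One possible way to approach the extraction is sketched below. It rests on several assumptions that the post does not confirm: that each table caption is a `<p>` element appearing just before its `<table>`, that the first row of each table holds the column headers, and that one of those headers is exactly "CHECK". The sketch uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages; the same logic should carry over to `soup.find_all`. The names TableCollector and extract_check_columns, the "ID" column, and the sample HTML are hypothetical.

```python
import re
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, paired with the text of the
    last <p> seen before it (assumed here to be the table caption)."""

    def __init__(self):
        super().__init__()
        self.tables = []       # list of (caption, rows)
        self._last_para = ""   # text of the latest <p> outside any table
        self._buf = None       # text pieces of the element being read
        self._rows = None      # rows of the current table, or None
        self._row = None       # cells of the current row, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._buf = []
        elif tag == "p" and self._rows is None:
            self._buf = []     # a paragraph outside a table: possible caption

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._buf is not None:
            self._row.append("".join(self._buf).strip())
            self._buf = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append((self._last_para, self._rows))
            self._rows = None
        elif tag == "p" and self._rows is None and self._buf is not None:
            self._last_para = "".join(self._buf).strip()
            self._buf = None


def extract_check_columns(html, column="CHECK"):
    """Return [(file_name, [values...]), ...] for tables that have `column`."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    result = []
    for caption, rows in parser.tables:
        if not rows or column not in rows[0]:
            continue  # skip tables without the wanted header
        idx = rows[0].index(column)
        values = [row[idx] for row in rows[1:] if len(row) > idx and row[idx]]
        # Assumed caption shape: "Table 4-1. Bite_main.c" -> "Bite_main.c"
        name = re.sub(r"^Table\s+[\d.\-]+\s*", "", caption)
        result.append((name, values))
    return result


# Hypothetical HTML in the shape assumed above (cells wrapped in <p>).
sample_html = (
    "<p>Table 4-1. Bite_main.c</p>"
    "<table>"
    "<tr><td><p>ID</p></td><td><p>CHECK</p></td></tr>"
    "<tr><td><p>1</p></td><td><p>overflow.2</p></td></tr>"
    "<tr><td><p>2</p></td><td><p>overflow.5</p></td></tr>"
    "</table>"
    "<p>Table 4-2. Bite_Engine.c</p>"
    "<table>"
    "<tr><td><p>ID</p></td><td><p>CHECK</p></td></tr>"
    "<tr><td><p>1</p></td><td><p>overflow.4</p></td></tr>"
    "</table>"
)

pairs = extract_check_columns(sample_html)
print(pairs)
```

To run this on the converted file, read its contents into a string and pass that string to extract_check_columns in place of sample_html. If the real captions or headers differ from the assumed shape, the regular expression and the column name would need adjusting.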

Edit your question and add the output of

print(tables)
