Create an HTML table from a list in Python with BeautifulSoup


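I'm using bs4 in Python. I want to take the contents of a Python list, put them into HTML with bs4, and then post the resulting HTML table to a web page with the requests.put() method. In the HTML, every row is made up of the tag: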

<tr></tr>
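So each element of the list corresponds to one row, and the cells within an element are separated by "````" (four backticks). That means 1 goes into the first cell of the first row, and Jam goes into the third cell of the first row. The HTML table string should start with a table header and end with a table footer, like this: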
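For example, splitting the first element of the list on the four-backtick delimiter gives that row's seven cell values in column order (DELIM is just a local name chosen here for the delimiter):

```python
# Cell delimiter used in the question's list: four backticks.
DELIM = "````"

row = ("1" + DELIM + "Mon, 22 Feb 2021 13:44:27 -0800" + DELIM + "Jam" + DELIM
       + "IAP-5998" + DELIM + "10004" + DELIM
       + "Model Observing a ModelIPCException" + DELIM + "1ba4416fdd7")

cells = row.split(DELIM)
print(len(cells))  # 7
print(cells[0])    # 1   -> first cell of the row
print(cells[2])    # Jam -> third cell of the row
```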

html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"

html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
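The header string does not have to be hard-coded. As a sketch, assuming the column names and pixel widths stay exactly as above, it can be generated from a list of (name, width) pairs; COLUMNS and build_header are hypothetical names used only for illustration:

```python
from html import escape

# Hypothetical column spec: (header text, width in px), copied from the header above.
COLUMNS = [
    ("No.", 115.0),
    ("Date and Time", 95.0),
    ("Author", 58.0),
    ("Jira", 105.0),
    ("PR", 110.0),
    ("Title", 215.0),
    ("Commit ID", 215.0),
]


def build_header(columns):
    """Build the <colgroup> and the header row; the caller closes </tbody></table>."""
    cols = "".join(f'<col style="width: {width}px;" />' for _, width in columns)
    ths = "".join(f"<th><p>{escape(name)}</p></th>" for name, _ in columns)
    return f"<p><br /></p><table><colgroup>{cols}</colgroup><tbody><tr>{ths}</tr>"


print(build_header(COLUMNS))
```

Keeping names and widths in one list means adding or removing a column only touches one place.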
So the complete HTML that makes up the table should look like this:

<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr><tr><td><p>1</p></td><td><p>Mon, 22 Feb 2021 13:44:27 -0800</p></td><td><p>Jam</p></td><td><p>IAP-5998</p></td><td><p>10004</p></td><td><p>Model Observing a ModelIPCException</p></td><td><p>1ba4416fdd7</p></td></tr><tr><td><p>2</p></td><td><p>Mon, 30 Feb 2021 13:44:27 -0800</p></td><td><p>Rizwan</p></td><td><p>IAP-6998</p></td><td><p>10014</p></td><td><p>Model Observing</p></td><td><p>1ba4416fdd7</p></td></tr>....................................Other elements in list according to rows go here.............</tbody></table><p class=\"auto-cursor-target\"><br /></p>
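Every row in this target HTML, the header row included, has exactly seven cells. A small check along these lines can confirm that generated HTML has the same shape before it is posted; cells_per_row is a hypothetical helper, and sample is a shortened two-column stand-in for the real table:

```python
from bs4 import BeautifulSoup


def cells_per_row(html):
    """Count the <th>/<td> cells that are direct children of each <tr> in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [len(tr.find_all(["th", "td"], recursive=False)) for tr in table.find_all("tr")]


sample = (
    "<table><tbody>"
    "<tr><th><p>No.</p></th><th><p>Author</p></th></tr>"
    "<tr><td><p>1</p></td><td><p>Jam</p></td></tr>"
    "</tbody></table>"
)
print(cells_per_row(sample))  # [2, 2]
```

For the full seven-column table, every entry of the returned list should be 7.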



Here is the code I used:

import re
import sys
import requests
import json
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup

html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"

html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"

rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]

row_string = ""
for idx in range(0, len(rows)):
    soup = BeautifulSoup("<tr></tr>", 'html.parser')
    for cell_id in range(0, 7):
        original_tag = soup.tr
        new_tag = soup.new_tag("td")
        original_tag.append(new_tag)
        p_tag = soup.new_tag("p")
        original_tag.td.next_sibling.append(p_tag)
        original_tag.p.string = rows[idx].split("````")[cell_id]
        row_string += str(original_tag)

pass_str = html_table_header + row_string + html_table_footer
pass_string = str(pass_str).replace('\"', '\\"')

headers = {
    'Content-Type': 'application/json',
}

data = '{"id":"534756378","type":"page", "title":"GL_Engine Output","space":{"key":"CSSAI"},"body":{"storage":{"value":"' + pass_string + '","representation":"storage"}}, "version":{"number":2}}'

response = requests.put('https://confluence.ai.com/rest/api/content/534756378', headers=headers, data=data,
                        auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1@ai'))
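Separately from the table bug, the request body above is assembled by string concatenation, and .replace('\"', '\\"') escapes only double quotes, not backslashes, newlines or other characters JSON requires to be escaped. A minimal sketch of letting the json module do the escaping instead; the page id, title, space key and version number are copied from the question, and storage_html is a placeholder for the generated table:

```python
import json

# Placeholder for html_table_header + rows + html_table_footer.
storage_html = ('<table><tbody><tr><td><p>1</p></td></tr></tbody></table>'
                '<p class="auto-cursor-target"><br /></p>')

payload = {
    "id": "534756378",
    "type": "page",
    "title": "GL_Engine Output",
    "space": {"key": "CSSAI"},
    "body": {"storage": {"value": storage_html, "representation": "storage"}},
    "version": {"number": 2},
}

body = json.dumps(payload)  # escapes the quotes inside the HTML automatically
print(json.loads(body)["body"]["storage"]["value"] == storage_html)  # True

# With requests, the dict can also be passed directly; requests serializes it and
# sets the Content-Type: application/json header itself:
# requests.put(url, json=payload, auth=HTTPBasicAuth(user, password))
```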

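But with my code, only the first value of each list element (the numbers 1, 2, 3, and so on) ends up in the correct cell; the remaining values are also inserted into the first column. As a result the table looks wrong when it is posted to the website: only the header row is correct, and all the other values are squashed together in the first column.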
I looked at the HTML that was posted to my site through the REST API, but it does not match what I expected.
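A likely cause in the loop above: on a bs4 Tag, original_tag.td and original_tag.p are shorthand for find(), so they always return the first td and the first p in the row, and on the very first pass original_tag.td.next_sibling is None because the row has only one child at that point. In addition, row_string += str(original_tag) sits inside the inner loop, so the half-built row is serialized once per cell. A minimal sketch of the same bs4 approach that builds each cell on its own tag and serializes each row once (build_rows is a hypothetical helper name):

```python
from bs4 import BeautifulSoup

DELIM = "````"  # cell separator used in the question's list


def build_rows(rows):
    """Return one <tr> per list element, each value wrapped as <td><p>value</p></td>."""
    soup = BeautifulSoup("", "html.parser")  # only used as a factory for new tags
    parts = []
    for row in rows:
        tr = soup.new_tag("tr")
        for value in row.split(DELIM):
            td = soup.new_tag("td")
            p = soup.new_tag("p")
            p.string = value  # bs4 escapes &, < and > when the tag is serialized
            td.append(p)      # the <p> goes inside this cell's own <td> ...
            tr.append(td)     # ... and the finished <td> goes into the <tr>
        parts.append(str(tr))  # serialize each row once, after all its cells exist
    return "".join(parts)


rows = [
    "1" + DELIM + "Mon, 22 Feb 2021 13:44:27 -0800" + DELIM + "Jam" + DELIM + "IAP-5998"
    + DELIM + "10004" + DELIM + "Model Observing a ModelIPCException" + DELIM + "1ba4416fdd7",
]
print(build_rows(rows))
```

The result can take the place of row_string, i.e. html_table_header + build_rows(rows) + html_table_footer.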

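I think you can use pandas to look at the resulting table, and build the table HTML by looping over the rows with a list comprehension that splits each one: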

import pandas as pd
from pandas import read_html as rh

pd.set_option('display.expand_frame_repr', False)

html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"

html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"

rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]
body = ''

for row in rows:
    # one <tr> per list element, one <td><p>...</p></td> per "````"-separated value
    body += '<tr>' + ''.join([f'<td><p>{i}</p></td>' for i in row.split('````')]) + '</tr>'

html = html_table_header + body + html_table_footer
print(rh(html)[0])
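The f-string above inserts each value as raw HTML, so a title containing &, < or > would produce broken markup. A sketch of the same list-comprehension approach with each cell passed through the standard library's html.escape first (rows_to_html is a hypothetical helper name):

```python
from html import escape

DELIM = "````"


def rows_to_html(rows):
    """One <tr> per row, each "````"-separated value HTML-escaped inside <td><p>...</p></td>."""
    return "".join(
        "<tr>" + "".join(f"<td><p>{escape(cell)}</p></td>" for cell in row.split(DELIM)) + "</tr>"
        for row in rows
    )


print(rows_to_html(["1" + DELIM + "a < b & c"]))
# <tr><td><p>1</p></td><td><p>a &lt; b &amp; c</p></td></tr>
```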
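from bs4 import BeautifulSoup as bs

# Optionally parse the generated HTML with BeautifulSoup (needs the lxml parser installed)
# and check the parsed result with read_html again
soup = bs(html, 'lxml')
print(html)
print(rh(str(soup))[0])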