Creating an HTML table from a list in Python with BeautifulSoup
I am using bs4 in Python. I want to take the contents of a Python list and insert them into HTML with bs4, so that the resulting HTML table can be published to a web page with requests.put(). In the HTML, each row is made up of the tag:
<tr></tr>
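Each element of the list corresponds to one row, and the cells within it are separated by "````" (four backticks), so 1 goes into the first cell of the first row and Jam goes into the third cell of the first row.
The HTML table string should start with a table header and end with a table footer, like this: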
html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"
html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
The complete HTML that makes up the table should therefore look like this:
<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr><tr><td><p>1</p></td><td><p>Mon, 22 Feb 2021 13:44:27 -0800</p></td><td><p>Jam</p></td><td><p>IAP-5998</p></td><td><p>10004</p></td><td><p>Model Observing a ModelIPCException</p></td><td><p>1ba4416fdd7</p></td></tr><tr><td><p>2</p></td><td><p>Mon, 30 Feb 2021 13:44:27 -0800</p></td><td><p>Rizwan</p></td><td><p>IAP-6998</p></td><td><p>10014</p></td><td><p>Model Observing</p></td><td><p>1ba4416fdd7</p></td></tr>....................................Other elements in list according to rows go here.............</tbody></table><p class=\"auto-cursor-target\"><br /></p>
Here is the code I am using:
import re
import sys
import requests
import json
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup
html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"
html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]
row_string = ""
for idx in range(0, len(rows)):
    soup = BeautifulSoup("<tr></tr>", 'html.parser')
    for cell_id in range(0, 7):
        original_tag = soup.tr
        new_tag = soup.new_tag("td")
        original_tag.append(new_tag)
        p_tag = soup.new_tag("p")
        original_tag.td.next_sibling.append(p_tag)
        original_tag.p.string = rows[idx].split("````")[cell_id]
    row_string += str(original_tag)
pass_str = html_table_header + row_string + html_table_footer
pass_string = str(pass_str).replace('\"', '\\"')
headers = {
'Content-Type': 'application/json',
}
data = '{"id":"534756378","type":"page", "title":"GL_Engine Output","space":{"key":"CSSAI"},"body":{"storage":{"value":"' + pass_string + '","representation":"storage"}}, "version":{"number":2}}'
response = requests.put('https://confluence.ai.com/rest/api/content/534756378', headers=headers, data=data,
auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1@ai'))
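With my code, however, only the first element of each list item (the numbers 1, 2, 3 and so on) ends up in the correct cell. The remaining elements are still inserted into the first column, so the table looks wrong once it is published: the table header is correct, but everything else is squeezed together in the first column.
I also looked at the HTML that was posted to my site through the rest/api endpoint, and it looks incorrect, as shown in this screenshot: [screenshot not available]

I think you can use pandas to view the table, and loop over the rows with a list comprehension and split() to build the table HTML: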
import pandas as pd
from pandas import read_html as rh
pd.set_option('display.expand_frame_repr', False)
html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"
html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]
body = ''
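for row in rows:
    # One <tr> per list item; each "````"-separated field becomes a <td><p>...</p></td> cell
    body += '<tr>' + ''.join([f'<td><p>{i}</p></td>' for i in row.split('````')]) + '</tr>'
html = html_table_header + body + html_table_footer
# Parse the assembled HTML back into a DataFrame to check the layout
# (recent pandas versions may expect a file-like object here, e.g. io.StringIO(html))
print(rh(html)[0])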
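If the rows should still be built with BeautifulSoup, as the question set out to do, the following is a minimal sketch of one way to do it. The likely cause of the original symptom is that `original_tag.td` and `original_tag.p` always resolve to the first matching descendant of the row, so every assignment lands in the first cell. Here each `<p>` is attached to the `<td>` that was just created. The names `build_rows`, `SEPARATOR` and `sample_rows` are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

SEPARATOR = "````"  # four backticks, the cell delimiter used in the question


def build_rows(rows, separator=SEPARATOR):
    """Return the concatenated <tr> markup, one row per list item."""
    soup = BeautifulSoup("", "html.parser")  # used only as a tag factory
    parts = []
    for row in rows:
        tr = soup.new_tag("tr")
        for cell in row.split(separator):
            td = soup.new_tag("td")
            p = soup.new_tag("p")
            p.string = cell  # bs4 escapes &, < and > when the tag is serialized
            td.append(p)     # attach the <p> to this new cell, not to the first <td>
            tr.append(td)
        parts.append(str(tr))
    return "".join(parts)


sample_rows = ["1````Mon, 22 Feb 2021 13:44:27 -0800````Jam"]
print(build_rows(sample_rows))
```

The returned string can be placed between html_table_header and html_table_footer in the same way as body above. Because the number of cells follows from the split, the hard-coded range(0, 7) is no longer needed.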
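# Optional check: re-parse the generated HTML with BeautifulSoup (the 'lxml' parser must be installed)
from bs4 import BeautifulSoup as bs
soup = bs(html, 'lxml')
print(html)
print(rh(str(soup))[0])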
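For the upload step, the question's code escapes double quotes by hand and concatenates the request body as a string. A possibly more robust variant is to build a dictionary and let json.dumps do the escaping. This is only a sketch: the field names are copied from the data string in the question, while the function name `build_payload` and the `demo_html` value are hypothetical.

```python
import json


def build_payload(page_id, title, space_key, storage_html, version):
    """Serialize a page-update body; json.dumps escapes quotes and backslashes."""
    body = {
        "id": page_id,
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": storage_html, "representation": "storage"}},
        "version": {"number": version},
    }
    return json.dumps(body)


demo_html = '<table><tbody><tr><td><p class="x">1</p></td></tr></tbody></table>'
payload = build_payload("534756378", "GL_Engine Output", "CSSAI", demo_html, 2)

# Round-trip check: the HTML survives serialization unchanged.
assert json.loads(payload)["body"]["storage"]["value"] == demo_html
```

The resulting string could then be passed as data=payload to the same requests.put() call as in the question, with the Content-Type header left as application/json.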