Python: add a break tag every N characters in an HTML string
Tags: python, html, string, beautifulsoup

I need to "text wrap" an HTML string so that <br> elements are applied only to the text within the HTML string.
I can apply the style to a plain text string (if only one style is needed).
However, appending styles to this string further confuses the actual text with the style tags (obviously).
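The problem described above can be reproduced with a small sketch. It assumes the standard library `textwrap` module as the naive wrapper (the question does not say which tool was used) and a shortened, hypothetical sample string. Because the wrapper treats markup as ordinary words, tag characters count toward the width and a break can fall on the space inside a tag.

```python
import textwrap

# Shortened, hypothetical sample in the same shape as the question's input.
html = ('<p>Hybrid organic-inorganic polariton laser'
        '<span style="color:red">.. A correction to this article</span></p>')

# textwrap sees markup as ordinary words, so tag characters count toward
# the width and a break can fall on the space inside '<span style=...>'.
lines = textwrap.wrap(html, width=30, break_long_words=False,
                      break_on_hyphens=False)
naive = '<br>'.join(lines)
print(naive)
```

In this sample one break lands between `<span` and `style=`, which corrupts the tag. That is why the wrap has to count visible text only.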
Example:
Sample test data:
String input:
<html><body><p>Author Correction: Hybrid organic-inorganic polariton laser<span style="color:red">.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has </span>not<span style="color:red"> been fixed in the paper.</span> ========= Publisher Correction: Predictors of chronic kidney disease in type 1 diabetes: a longitudinal study from the AMD Annals initiative<span style="color:red">.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has </span><span style="color:red"> been fixed in the paper.</span></p></body></html>
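which renders as:
Author Correction: Hybrid organic-inorganic polariton laser.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper. ========= Publisher Correction: Predictors of chronic kidney disease in type 1 diabetes: a longitudinal study from the AMD Annals initiative.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has  been fixed in the paper.
(The text inside the span elements renders in red.)

Desired output: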
<html><body><p>Author Correction: Hybrid organic-inorganic <br>polariton laser<span style="color:red">.. A correction to this article has <br>been published and is linked from the HTML and PDF <br>versions of this paper. The error has </span>not<span style="color:red"> been <br>fixed in the paper.</span> ========= Publisher <br>Correction: Predictors of chronic kidney disease <br>in type 1 diabetes: a longitudinal study from the <br>AMD Annals initiative<span style="color:red">.. A correction to this <br>article has been published and is linked from the <br>HTML and PDF versions of this paper. The error has </span><span style="color:red"> <br>been fixed in the paper.</span></p></body></html>
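which renders as:
Author Correction: Hybrid organic-inorganic
polariton laser.. A correction to this article has
been published and is linked from the HTML and PDF
versions of this paper. The error has not been
fixed in the paper. ========= Publisher
Correction: Predictors of chronic kidney disease
in type 1 diabetes: a longitudinal study from the
AMD Annals initiative.. A correction to this
article has been published and is linked from the
HTML and PDF versions of this paper. The error has
been fixed in the paper.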
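One way to check a candidate result against the desired output is to measure the visible text between consecutive `<br>` tags. The sketch below uses the standard library `html.parser`. The class name `VisibleLines`, the helper `visible_lines`, and the shortened sample are hypothetical and not part of the original post.

```python
from html.parser import HTMLParser


class VisibleLines(HTMLParser):
    """Collect visible text, starting a new line at every <br>."""

    def __init__(self):
        super().__init__()
        self.lines = ['']

    def handle_starttag(self, tag, attrs):
        # Only <br> starts a new visible line; other tags are ignored.
        if tag == 'br':
            self.lines.append('')

    def handle_data(self, data):
        # Text between tags is appended to the current visible line.
        self.lines[-1] += data


def visible_lines(html):
    parser = VisibleLines()
    parser.feed(html)
    parser.close()
    return parser.lines


# Shortened excerpt of the desired output shown above.
sample = ('<p>Author Correction: Hybrid organic-inorganic <br>polariton laser'
          '<span style="color:red">.. A correction to this article has <br>'
          'been published</span></p>')
for line in visible_lines(sample):
    print(len(line), repr(line))
```

Each printed length counts visible characters only, so it can be compared directly with the target width N.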
A roughly brute-force way, assuming there is no tag encapsulation (let me know otherwise), could be:

def put_tags_every_N(_input, _tag, _n):
    # Insert _tag after every _n visible characters, skipping markup.
    _k = 0            # visible characters counted so far
    _in_tag = False   # True while scanning between '<' and '>'
    _len = len(_input)
    _i = 0
    while _i < _len:
        _c = _input[_i]
        if _c == '<':
            _in_tag = True
        elif _c == '>':
            _in_tag = False
        if not _in_tag and _c != '>':
            _k += 1
            if _k % _n == 0:
                _input = _input[:_i+1]+_tag+_input[_i+1:]
                _len += len(_tag)
        _i += 1
    return _input

def put_tags_every_N_nowordcut(_input, _tag, _n):
    # Same idea, but wait for a word boundary before inserting _tag.
    _k = 0
    _in_tag = False
    _position_past = False  # True once _n visible characters have passed
    _len = len(_input)
    _i = 0
    while _i < _len:
        _c = _input[_i]
        if _c == '<':
            _in_tag = True
        elif _c == '>':
            _in_tag = False
        if not _in_tag and _c != '>':
            _k += 1
            if _k % _n == 0:
                _position_past = True
        # Break after a space, or just before a tag or a period.
        # The bounds check avoids an IndexError on the last character.
        if _position_past and (_input[_i] == ' ' or
                               (_i + 1 < _len and _input[_i+1] in ('<', '.'))):
            _input = _input[:_i+1]+_tag+_input[_i+1:]
            _len += len(_tag)
            _position_past = False
            _k = 0
        _i += 1
    return _input

_tmp_input = '<html><body><p>Author Correction: Hybrid organic-inorganic polariton laser<span style="color:red">.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has </span>not<span style="color:red"> been fixed in the paper.</span> ========= Publisher Correction: Predictors of chronic kidney disease in type 1 diabetes: a longitudinal study from the AMD Annals initiative<span style="color:red">.. A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has </span><span style="color:red"> been fixed in the paper.</span></p></body></html>'
# print(put_tags_every_N(_input=_tmp_input, _tag='<br>', _n=50))
# print(put_tags_every_N(_input=_tmp_input, _tag='<br>', _n=5))
print(put_tags_every_N_nowordcut(_input=_tmp_input, _tag='<br>', _n=50))
# print(put_tags_every_N_nowordcut(_input=_tmp_input, _tag='<br>', _n=5))
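As a possible alternative to the character-by-character scan, the string can first be split into tags and text, with a greedy word wrap applied to the text pieces only. This is a minimal sketch and not the answer's method. The names `TAG_RE` and `wrap_html_text` are hypothetical, and it assumes well-formed markup with no `>` inside attribute values.

```python
import re

# Assumption: tags never contain '>' inside attribute values.
TAG_RE = re.compile(r'(<[^>]*>)')


def wrap_html_text(html, width, br='<br>'):
    """Greedy word wrap that counts only visible characters."""
    out = []
    col = 0  # visible characters on the current line
    for token in TAG_RE.split(html):
        if not token:
            continue
        if token.startswith('<'):
            out.append(token)  # markup is copied through and never counted
            continue
        # Split the text into words and the whitespace between them.
        for piece in re.split(r'(\s+)', token):
            if not piece:
                continue
            # Break before a word that would overflow the current line.
            if not piece.isspace() and col > 0 and col + len(piece) > width:
                out.append(br)
                col = 0
            out.append(piece)
            col += len(piece)
    return ''.join(out)


sample = ('<p>Author Correction: Hybrid <span style="color:red">'
          'organic-inorganic polariton</span> laser</p>')
wrapped = wrap_html_text(sample, width=20)
print(wrapped)
```

A known limitation of this sketch is that a word running across a tag boundary (for example `laser<span>..`) is treated as two words, so a break may be placed at the tag. A single word longer than `width` is left unbroken.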
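
When wrapping text there are many things to consider: font size, non-proportional characters, and inter-character spacing, all of which can change throughout the text with different markup tags. The one thing that knows all of this is the browser that renders the text to the screen, and that is controlled by CSS. Why not add a style instead of adding <br> manually?

This is for an application that does not have *any CSS capabilities. Specifically, it is for displaying small fragments of text in a graphics package. Only a very small subset of HTML tags work with it, but <span> and <br> tags do work. *I know this may not be technically correct, since inline HTML styling is being used, but I do not know much about the internals of the package that processes the input. I had styled these strings nicely before realizing the styles were not handled by the package I am using, so now I have to fall back to these <span> and <br> tags.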