Python: add a period at a newline where there is none
I have a PDF containing text, and I use PyMuPDF (fitz) to extract the data page by page. I want to add a period to the heading lines at the top that do not end with one. An example and my code are shown below.
For example:
MORE PAGE INFO
Name of the company and some info
More info here and here
The data above is correct. We are a registered firm, ("ABC") for this company.
Technology etc, more sentences and a paragraph here. These sentences are much longer etc.
Here is another Pixmap example that creates Sierpinski’s Carpet – a fractal generalizing the Cantor Set to two dimensions. Given a square carpet.
Expected output:
MORE PAGE INFO.
Name of the company and some info.
More info here and here.
The data above is correct. We are a registered firm, ("ABC") for this company.
Technology etc, more sentences and a paragraph here. These sentences are much longer etc.
Here is another Pixmap example that creates Sierpinski’s Carpet – a fractal generalizing the Cantor Set to two dimensions. Given a square carpet.
Current code:
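import fitz  # PyMuPDF

doc = fitz.open(myfile)  # myfile: path to the PDF
for page in doc:
    # get_text is the current method name; older PyMuPDF releases spell it getText
    text = page.get_text("text")
    # replaces every newline with a period, even after lines that already end in one
    text = text.replace("\n", ".")
    print(text)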
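The output does add a period to the short lines, but it also adds one to sentences that are already punctuated correctly. Is there another way to do this?
Thanks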
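One possible direction, as a minimal sketch rather than a definitive answer: instead of replacing every newline, split the extracted text into lines and append a period only to non-empty lines that do not already end in sentence-closing punctuation. The names `add_missing_periods` and `TERMINATORS` are hypothetical, and the set of characters treated as sentence endings is an assumption that may need adjusting for a given PDF.

```python
# Characters assumed to already close a sentence (an assumption; adjust as needed).
TERMINATORS = (".", "!", "?", ":", ";")


def add_missing_periods(text: str) -> str:
    """Append a period to each non-empty line that lacks closing punctuation."""
    fixed_lines = []
    for line in text.split("\n"):
        stripped = line.rstrip()  # drop trailing spaces before checking the last character
        if stripped and not stripped.endswith(TERMINATORS):
            stripped += "."
        fixed_lines.append(stripped)
    # Rejoin with newlines so the original line structure is preserved.
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    sample = (
        "MORE PAGE INFO\n"
        "Name of the company and some info\n"
        "The data above is correct. We are a registered firm.\n"
    )
    print(add_missing_periods(sample))
```

Inside the page loop this could be called as `print(add_missing_periods(page.get_text("text")))`. One caveat: if the PDF wraps a long paragraph across several extracted lines, each wrapped line that ends mid-sentence would also receive a period, so this only fits text where every line is a complete unit, as in the example above.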