How do I merge an enumerated document back into the original document using Python?

First, I want to enumerate a document that contains more than two sentences, as shown below:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
list = []
for i in enumerate(doc1):
     list.append(i)
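For each sentence I compute a sentiment score, and then I want to merge the enumerated document back into its original format, using the average of the scores.

Any answer would be appreciated.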

doc2 = """I like movie. But I don't like the cast. The story is very nice"""

I'm not sure I really understood your question.

Note that your code is equivalent to:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
result = list(enumerate(doc1))
(I used the name result because a variable named list would shadow the built-in list that I call to build the list.)
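To make the naming point concrete, here is a small sketch. The names looped and build_with_shadowed_name are invented for this illustration only. It shows that the loop and list(enumerate(...)) build the same value, and that the built-in can no longer be called once a variable named list hides it.

```python
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')

# Loop version from the question, using a name that does not hide the built-in.
looped = []
for pair in enumerate(doc1):
    looped.append(pair)

# One-line equivalent.
result = list(enumerate(doc1))
print(looped == result)

def build_with_shadowed_name(parts):
    # Inside this function, `list` is a local variable holding an empty list,
    # so calling it like the built-in raises TypeError.
    list = []
    return list(enumerate(parts))

try:
    build_with_shadowed_name(doc1)
    shadowing_error = None
except TypeError as exc:
    shadowing_error = exc

print(type(shadowing_error).__name__)
```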

If you provide

doc = """I like movie. But I don't like the cast. The story is very nice"""
as input, you will get the following result:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice")]
Note the spaces at the beginning of the strings. That may or may not be what you want.

Basic answer

If your question is "how do I recreate the initial string, given result?", here is some sample code:

recreated_doc = ".".join(value for index, value in result)
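As a quick check, and assuming the pieces were produced with doc.split('.') as in the code above, joining with the same separator gives back the input exactly. The helper names enumerate_sentences and recreate are made up for this sketch.

```python
def enumerate_sentences(doc):
    # Same steps as above: split on '.', then number the pieces.
    return list(enumerate(doc.split('.')))

def recreate(result):
    # Join with the same separator that was used for splitting.
    return ".".join(value for index, value in result)

doc = """I like movie. But I don't like the cast. The story is very nice"""
roundtrip_ok = recreate(enumerate_sentences(doc)) == doc
print(roundtrip_ok)

# A final '.' produces a last empty piece, which the join turns back into '.'.
doc_with_period = doc + "."
roundtrip_with_period_ok = recreate(enumerate_sentences(doc_with_period)) == doc_with_period
print(roundtrip_with_period_ok)
```

The round trip is exact because nothing is stripped or filtered between the split and the join.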
Advanced answer

Note that if you provide

doc = """I like movie. But I don't like the cast. The story is very nice."""
with a trailing period, you will get:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice"),(3,"")]
But you would probably prefer to get the following instead:

result = [(0,"I like movie"),(1,"But I don't like the cast"),(2,"The story is very nice")]
(Note that there are no spaces at the beginning of the strings, and no empty string.)

Here is the code:

doc = """I like movie. But I don't like the cast. The story is very nice."""
doc1 = doc.split('.')
doc2 = (part.strip(' ') for part in doc1)
doc3 = (part for part in doc2 if len(part) > 0)
result = list(enumerate(doc3))
# result = [(0, 'I like movie'), (1, "But I don't like the cast"), (2, 'The story is very nice')]
And to recreate the original string:

recreated_doc = " ".join(value+"." for index, value in result)
# recreated_doc = """I like movie. But I don't like the cast. The story is very nice."""
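Warning: the advanced solution does not always recreate exactly the same original document, so it may not be suitable for your case.

For example: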

doc = """This a document with a lot of spaces.   .   Too much spaces here.       And also here     .   ."""
# [...]
# recreated_doc = """This a document with a lot of spaces. Too much spaces here. And also here."""
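Coming back to the goal stated in the question (one sentiment score per sentence, then an average for the whole document), here is a rough sketch built on the advanced version. toy_score and its word rules are placeholders invented for this example, not a real sentiment model; whatever scorer you actually use would go in its place.

```python
def toy_score(sentence):
    # Placeholder scorer, invented for this sketch; swap in a real model.
    words = sentence.lower().split()
    if "don't" in words:
        return -1.0
    if "like" in words or "nice" in words:
        return 1.0
    return 0.0

doc = """I like movie. But I don't like the cast. The story is very nice."""

# Same cleaning as the advanced answer: strip spaces, drop empty pieces.
parts = (part.strip(' ') for part in doc.split('.'))
result = list(enumerate(part for part in parts if part))

# One (index, sentence, score) triple per sentence.
scored = [(index, value, toy_score(value)) for index, value in result]

# Document-level score: plain average of the sentence scores.
doc_score = sum(score for _, _, score in scored) / len(scored) if scored else 0.0

# Rebuild the text from the same enumerated sentences.
recreated_doc = " ".join(value + "." for _, value, _ in scored)

print(recreated_doc)
print(round(doc_score, 3))
```

Keeping the index and the score next to each sentence means the document can be rebuilt and averaged from a single list, subject to the same whitespace caveat as in the warning above.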

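Can you give an example of the output you want for "Second, I want to merge the enumerated document into the original format."?
The output should look like the original document, "doc1".
I'm not sure I follow you on this. If you have access to the original document and its contents, why would you want to rebuild it?
I am finding a score for each sentence, and then I want to merge them back into the original document by taking the average.
Joining from that is simple: '.'.join(j for _, j in list)
Thanks @gissehel. This is what I expected: recreated_doc = ".".join(value for index, value in result)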