Python: given an index position in a string, how do I get the complete sentence?

python string python-3.x

I have the indices of several words or terms extracted from a text file, for example:

position = 156

Text block:

    1 section react

  The following serious adverse reactions are discussed in greater detail in other sections of the prescribing information:



 *  Peripheral Neuropathy [see  Warnings and Precautions (      5.1      )  ]  
 *  Anaphylaxis and Infusion Reactions [see  Warnings and Precautions (      5.2      )  ]  
 *  Hematologic Toxicities [see  Warnings and Precautions (      5.3      )  ]  
 *  Serious Infections and Opportunistic Infections [see  Warnings and Precautions (      5.4      )  ]  
 *  Tumor Lysis Syndrome [see  Warnings and Precautions (      5.5      )  ]  
 *  Increased Toxicity in the Presence of Severe Renal Impairment [see  Warnings and Precautions (      5.6      )  ]  
 *  Increased Toxicity in the Presence of Moderate or Severe Hepatic Impairment [see  Warnings and Precautions (      5.7      )  ]

The word is:

Peripheral Neuropathy

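So my questions are:

A) Given the position, how do I extract the sentence? For example:

position = 156

In:

Output:

And:

However, it returns the full text, because I am using

-1

. Is there another way to accomplish this task?

Note that content is a list containing the text I am working with.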
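One possible approach to part A is sketched below. It assumes the text is held in a single string and that a "sentence" in this bulleted text is simply the line containing the given position; the name line_at is hypothetical. Note that str.find returns -1 when nothing is found, so the sketch checks for that value explicitly instead of passing it on as a slice bound.

```python
def line_at(text: str, pos: int) -> str:
    """Return the line of `text` that contains character index `pos`."""
    if not 0 <= pos < len(text):
        raise IndexError("pos is outside the text")
    # rfind returns -1 when there is no earlier newline, so start becomes 0
    start = text.rfind("\n", 0, pos) + 1
    end = text.find("\n", pos)
    if end == -1:  # no newline after pos: the line runs to the end of the text
        end = len(text)
    return text[start:end].strip()


# Illustrative data only (assumption: shortened from the block in the question)
text = "Intro line:\n *  Peripheral Neuropathy [see 5.1]\n *  Anaphylaxis [see 5.2]\n"
pos = text.index("Peripheral")
result = line_at(text, pos)
print(result)
```

If the sentences are delimited by periods rather than newlines, the same idea applies with a different delimiter, although decimal numbers such as 5.1 would then need special handling.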

Another solution is to use a regular expression:

>>> mystr = """The effort, led by Shoukhrat Mitalipov of Oregon Health and Science University, involved changing the DNA of a large number of one-cell embryos with the gene-editing technique CRISPR, according to people familiar with the scientific results. Until now, American scientists have watched with a combination of awe, envy, and some alarm as scientists elsewhere were first to explore the controversial practice. To date, three previous reports of editing human embryos were all published by scientists in China."""
>>> import re
>>> match = re.search(r'^(?:\S+\s+){5}([^.]*\.)', mystr)
>>> match.group(1)
'Mitalipov of Oregon Health and Science University, involved changing the DNA of a large number of one-cell embryos with the gene-editing technique CRISPR, according to people familiar with the scientific results.'
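The regular expression above skips a fixed number of words. If the starting point is a character position instead, as in the question, a variant along the following lines may help. It is only a sketch: the function name sentence_at and the sentence pattern are assumptions, and the pattern treats every '.', '!' or '?' as a sentence end, so abbreviations and numbers such as 5.1 would be split.

```python
import re

# Assumption: a sentence is a run of characters ending in '.', '!' or '?'
# (or at the end of the text).
SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def sentence_at(text: str, pos: int) -> str:
    """Return the sentence whose span covers character index `pos`."""
    for m in SENTENCE.finditer(text):
        if m.start() <= pos < m.end():
            return m.group().strip()
    raise IndexError("pos is not inside any sentence")


text = "Until now, scientists watched. To date, three reports were published. Done."
pos = text.index("three")
result = sentence_at(text, pos)
print(result)
```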

Assuming that what you have is a list of the words in the string, here is another solution:

newstr = ""
words = mystr.split(' ')
for word in words[5:]:
    newstr += word + ' '
    # stop once the word that closes the sentence has been added
    if word.endswith('.'):
        break

OK, here is yet another solution, now that I understand the text in your post. I used this:

mystr =  """*  Peripheral Neuropathy [see  Warnings and Precautions (      5.1      )  ]  
 *  Anaphylaxis and Infusion Reactions [see  Warnings and Precautions (      5.2      )  ]  
 *  Hematologic Toxicities [see  Warnings and Precautions (      5.3      )  ]  
 *  Serious Infections and Opportunistic Infections [see  Warnings and Precautions (      5.4      )  ]  
 *  Tumor Lysis Syndrome [see  Warnings and Precautions (      5.5      )  ]  
 *  Increased Toxicity in the Presence of Severe Renal Impairment [see  Warnings and Precautions (      5.6      )  ]  
 *  Increased Toxicity in the Presence of Moderate or Severe Hepatic Impairment [see  Warnings and Precautions (      5.7      )  ]
"""

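First, we use a regular expression to get the fifth word in the string:

target_word = re.findall(r'\w+', mystr)[4]

Then we get its index in the string:

word_index = mystr.index(target_word)

Then we create an iterator over the characters from that index onward:

word_iter = iter(mystr[word_index:])

Then we loop until the end of the line: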

newstr = ""
while not newstr.endswith('\n'):
    newstr += next(word_iter)
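The steps above can also be sketched as a single function. This version is an assumption-laden variant rather than the answerer's code: it takes the word's offset from the match object instead of calling mystr.index(), because index() returns the first occurrence of the substring, which may sit inside an earlier word. The name rest_of_line is hypothetical.

```python
import re
from itertools import islice


def rest_of_line(text: str, n: int) -> str:
    """Return the text from the n-th word (0-based) to the end of its line."""
    # finditer yields match objects, which carry the word's true offset
    match = next(islice(re.finditer(r"\w+", text), n, None), None)
    if match is None:
        raise IndexError("text has fewer than n + 1 words")
    start = match.start()
    end = text.find("\n", start)
    return text[start:] if end == -1 else text[start:end]


sample = "*  Peripheral Neuropathy [see  Warnings and Precautions]\n *  Anaphylaxis\n"
result = rest_of_line(sample, 4)
print(result)
```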

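"...how do I extract pos tokens with a slice of 100 tokens": is a "token" the same as a word?

OK, let me fix that @AGNGazer!

"Note that content is a list containing the text I am working on." What do you mean by this? Is content a string containing the whole text, or a Python list? If the latter, what are the elements of the list?

It is a Python list @AGNGazer

A list of what? Characters, words, sentences, etc.?

For part B, I get:

TypeError: 'int' object is not subscriptable

@J.Do It works for me if content is a string (which is what I assumed before reading your note). If content is a Python list of blobs (I don't know what a "blob" is), then you may need to create a string first:

content = ''.join(content)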
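A small sketch of that normalization step follows. It assumes content is either a string or a list of strings that already carry their own line breaks; the helper name as_text is hypothetical. Whether this is what triggered the TypeError is unknown, since the failing code was not posted.

```python
def as_text(content) -> str:
    """Return `content` as one string, joining it first if it is a list of strings."""
    if isinstance(content, str):
        return content
    # Assumption: the pieces already end in '\n' where needed;
    # use "\n".join(content) instead if they do not.
    return "".join(content)


content = ["first line\n", "second line\n"]
text = as_text(content)
pos = text.index("second")
print(text[pos:pos + 6])
```

After this step, character positions refer to the joined string, so they only match the original indices if those were computed on the same joined text.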

@J.Do I changed the code to work under the assumption that content is a list of strings (= "blobs"?).

It is hard to know why it does not work if you don't post it. You may need to add

, re.MULTILINE

at the end of the re.search() call, like this:

re.search(r'^(?:\S+\s+){5}([^.]*\.)', mystr, re.MULTILINE).group(1)
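To illustrate what the flag changes, here is a short sketch on made-up text (the sample string and the word count of 2 are assumptions chosen to keep the example small). Without re.MULTILINE, '^' matches only at the very start of the string; with it, '^' also matches right after each newline.

```python
import re

text = "one two three.\nalpha beta gamma delta."
pattern = r"^(?:\S+\s+){2}([^.]*\.)"

# Without the flag, '^' anchors only at the start of the whole string.
without_flag = re.findall(pattern, text)
# With re.MULTILINE, '^' also anchors at the start of every line.
with_flag = re.findall(pattern, text, re.MULTILINE)

print(without_flag)
print(with_flag)
```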

@J.Do Assuming I have understood what you posted, I have updated my answer with another solution.