Python: translate/replace everything in a string that is not what you want


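Basically, I have a large number of phrases, and I am only interested in the ones that contain certain words. What I want to do is 1) find out whether such a word is present and, if it is, 2) remove all the other words. I could do this with a bunch of ifs and fors, but I was wondering whether there is a short, Pythonic way to do it.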

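A suggested algorithm:

For each phrase:
    Find out whether an interesting word is present
    If it is, remove all the other words
    Otherwise, continue to the next phrase

Yes, implementing this takes "a bunch of ifs and fors", but you would be surprised how easily and cleanly such logic translates to Python.

A more succinct way to achieve the same thing is a list comprehension, which flattens this logic somewhat. Given that phrases is a list of phrases:

phrases = [process(p) if isinteresting(p) else p for p in phrases]

for suitable definitions of the process and isinteresting functions.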
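
One possible pair of definitions is sketched below. The word set INTERESTING and the whitespace-based splitting are assumptions made for illustration and are not part of the original answer; process and isinteresting are simply the names the list comprehension expects.

```python
# Hypothetical set of words we care about (an assumption for this sketch).
INTERESTING = {"interesting", "words"}


def isinteresting(phrase):
    """Return True if the phrase contains at least one interesting word."""
    return any(word in INTERESTING for word in phrase.split())


def process(phrase):
    """Keep only the interesting words, preserving their original order."""
    return " ".join(word for word in phrase.split() if word in INTERESTING)


phrases = ["A lot of interesting and boring words", "nothing to see here"]
phrases = [process(p) if isinteresting(p) else p for p in phrases]
print(phrases)
# ['interesting words', 'nothing to see here']
```

Splitting on whitespace means a word with punctuation attached (for example "words.") will not match. To drop the uninteresting phrases entirely rather than keep them unchanged, the else branch could be replaced by a trailing filter clause such as "if isinteresting(p)".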

A regex-based solution:

>>> import re
>>> phrase = "A lot of interesting and boring words"
>>> regex = re.compile(r"\b(?!(?:interesting|words)\b)\w+\W*")
>>> clean = regex.sub("", phrase)
>>> clean
'interesting words'

The regex works as follows:

\b             # start the match at a word boundary
(?!            # assert that it's not possible to match
 (?:           # one of the following:
  interesting  # "interesting"
  |            # or
  words        # "words"
 )             # add more words if desired...
 \b            # assert that there is a word boundary after our needle matches
)              # end of lookahead
\w+\W*         # match the word plus any non-word characters that follow.
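
If the words to keep are only known at run time, the same pattern can be assembled from a list. This is a sketch, not part of the original answer: the helper name keep_only is hypothetical, and re.escape is used so that entries containing regex metacharacters are matched literally.

```python
import re


def keep_only(phrase, words):
    """Delete every word of `phrase` that is not listed in `words`.

    Assumes `words` is a non-empty list of plain strings.
    """
    # Build the alternation, e.g. "interesting|words", escaping each entry.
    alternation = "|".join(re.escape(w) for w in words)
    # Same shape as the pattern above: match any word that is NOT in the list,
    # together with the non-word characters that follow it.
    regex = re.compile(r"\b(?!(?:%s)\b)\w+\W*" % alternation)
    return regex.sub("", phrase)


print(keep_only("A lot of interesting and boring words", ["interesting", "words"]))
# interesting words
```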

I was hoping to use translate, combined with a regex or something else, to make it more succinct. Your solution is cleaner than mine, so thanks anyway.

@dms: translate is not designed for this purpose at all, and while a regex may be theoretically possible, I don't think it would be any better, or any more Pythonic, than what I proposed. Assuming words is a list of interesting words, isinteresting() becomes any(word in p for word in words).
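
A minimal sketch of that suggestion, assuming words is a list of interesting words. Note that "word in p" is a substring test when p is a string, so "words" would also match inside "crosswords". The second function, which splits the phrase first, is added here for comparison and does not come from the comment.

```python
words = ["interesting", "words"]  # assumed list of interesting words


def isinteresting(p):
    """Substring test, as written in the comment."""
    return any(word in p for word in words)


def isinteresting_whole_words(p):
    """Variant that only matches whole, whitespace-separated words."""
    tokens = set(p.split())
    return any(word in tokens for word in words)


print(isinteresting("two crosswords"))              # True: "words" is a substring
print(isinteresting_whole_words("two crosswords"))  # False: no whole-word match
```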