Python: printing the words from a text

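I wrote this Python program to print the words in a text, but I am stuck at the point where Python reaches the next "tab" index: when it checks the condition, it goes back to the initial index, and I don't know why. Can someone explain why it does not pick up the new "tab" index?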

import re

initial_text = '@ Traditionally, a text is understood to be a piece of written or spoken material in its primary form (as opposed to a paraphrase or summary). A text is any stretch of language that can be understood in context. It may be as simple as 1-2 words (such as a stop sign) or as complex as a novel. Any sequence of sentences that belong together can be considered a text.'
text = re.sub('\W+', ' ', initial_text)
t = -1
for i in text:
    n = text.find(i)
    if i == ' ':
         print(text[t+1:n])
         t = n

Use this approach:

import re

initial_text = "whatever your text is"
text = re.sub(r'[^\w\s]', '', initial_text)

words_list = text.split()
for word in words_list:
    print(word)

An example to illustrate:

import re

initial_text = "Hello : David welcome to Stack ! overflow"
text = re.sub(r'[^\w\s]', '', initial_text)

The part above removes the punctuation.

words_list = text.split()

After this step, words_list will be: ['Hello', 'David', 'welcome', 'to', 'Stack', 'overflow']

for word in words_list:
    print(word)

The code above takes each element from the list and prints it.

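This happens because you are using the find() function, which returns the index of the first occurrence of whatever you search for. Every space in the text therefore yields the index of the first space, which is why the loop keeps going back to the first index.

You can refer to the documentation of the find() function.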
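The behaviour described above, and two possible ways around it, can be sketched as follows. This is only an illustration, not the asker's final code: the helper names words_by_enumerate and words_by_find are made up for this example, and the sketch assumes words are separated by single space characters.

```python
text = "a text is a text"

# str.find(ch) always reports the FIRST occurrence of ch,
# regardless of which character the loop is currently looking at.
print(text.find(" "))  # 1, for every space in the text


def words_by_enumerate(s):
    """Split on spaces using enumerate(), which yields each character's real index."""
    words, start = [], 0
    for idx, ch in enumerate(s):
        if ch == " ":
            if idx > start:  # skip empty pieces caused by leading/double spaces
                words.append(s[start:idx])
            start = idx + 1
    if start < len(s):  # last word has no trailing space
        words.append(s[start:])
    return words


def words_by_find(s):
    """Split on spaces using find() with a start offset, so it searches past the previous hit."""
    words, start = [], 0
    while True:
        end = s.find(" ", start)
        if end == -1:
            break
        if end > start:
            words.append(s[start:end])
        start = end + 1
    if start < len(s):
        words.append(s[start:])
    return words


for word in words_by_enumerate(text):
    print(word)
```

Both helpers return the same list; the second one stays closest to the original idea of calling find(), only with the optional start argument added.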

It looks like you can use:

import re

initial_text = '@ Traditionally, a text is understood to be a piece of written or spoken material in its primary form (as opposed to a paraphrase or summary). A text is any stretch of language that can be understood in context. It may be as simple as 1-2 words (such as a stop sign) or as complex as a novel. Any sequence of sentences that belong together can be considered a text.'
words = re.findall(r'[^\W_]+', initial_text)
for word in words:
    print(word)

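See re.findall, which extracts all non-overlapping matches from the given text.

[^\W_]+ is a regular expression that matches one or more characters that are neither non-word characters nor underscores. In other words, it matches substrings made up only of digits and/or letters (all letters, ASCII as well as other Unicode).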
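To make the difference from plain \w+ concrete, here is a small comparison. The sample string and the helper name letters_and_digits are invented for this illustration; the behaviour shown assumes Python 3 str patterns, where \w is Unicode-aware by default.

```python
import re

# Hypothetical sample: underscores, accented letters (written as escapes), and a hyphen.
sample = "snake_case na\u00efve caf\u00e9 1-2 words_2"


def letters_and_digits(text):
    """Return runs of letters and digits only; underscores act as separators."""
    return re.findall(r"[^\W_]+", text)


# \w+ keeps underscores inside the match.
print(re.findall(r"\w+", sample))

# [^\W_]+ splits on underscores as well, but still keeps accented letters.
print(letters_and_digits(sample))
```

The first call keeps snake_case and words_2 as single tokens, while the second splits them at the underscore.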

Explanation

[^\W_]+    any character except: non-word characters (all but a-z, A-Z, 0-9, _), '_' (1 or more times (matching the most amount possible))

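Can you post the complete code, including "text"? What are text and the tab index? Please explain your problem a little; the tab character is "\t", if that is what you are looking for. Do you want to get the words from the text?

@Prakar Yes, exactly.

Oh yes, I forgot about this method. Thanks for your help. But can I print them without storing them in a list?

This is the simplest and most standard way to extract words from a string in Python. In what sense is "using a list" "extra"?

Could you explain the pattern r'[^\w\s]' and how it removes the punctuation, in other words? How does it differ from '\W+' and '[^A-Za-z0-9]+'?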
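The last two questions in the comments were left unanswered in the thread. The following is one possible sketch, not a reply from the original answerers: it compares the three patterns on a made-up sample and prints words lazily with re.finditer, so no list is built. The name iter_words is hypothetical.

```python
import re

sample = "simple as 1-2 words (such as a stop sign)."

# r'[^\w\s]' replaced by '' deletes punctuation in place, so "1-2" fuses into "12".
stripped = re.sub(r"[^\w\s]", "", sample)

# r'\W+' replaced by ' ' turns every run of non-word characters (spaces included)
# into a single space, so "1-2" becomes "1 2". A trailing space may remain.
spaced = re.sub(r"\W+", " ", sample)

# r'[^A-Za-z0-9]+' is ASCII-only: it also treats "_" and accented letters as separators.
ascii_only = re.sub(r"[^A-Za-z0-9]+", " ", "caf\u00e9 snake_case")


def iter_words(text):
    """Yield words one at a time instead of collecting them in a list."""
    for match in re.finditer(r"[^\W_]+", text):
        yield match.group()


# Printing without storing the words anywhere.
for word in iter_words(sample):
    print(word)
```

Whether the list is actually a problem depends on the size of the text; for short strings, split() or re.findall is usually simpler to read.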