Python: tokenizing a string merges some words
I use the following code to tokenize a string read from stdin:
import sys
d = []
cur = ''
for i in sys.stdin.readline():
    if i in ' .':
        if cur not in d and (cur != ''):
            d.append(cur)
            cur = ''
    else:
        cur = cur + i.lower()
This gives me an array of unique words. However, in my output some words are not split.
My input is
Dan went to the north pole to lead an expedition during summer.
The output array d is
['dan', 'went', 'to', 'the', 'north', 'pole', 'tolead', 'an', 'expedition', 'during', 'summer']
Why are "to" and "lead" read together?
Try this:
d = []
cur = ''
for i in sys.stdin.readline():
    if i in ' .':
        if cur not in d and (cur != ''):
            d.append(cur)
        cur = ''  # note the different indentation
    else:
        cur = cur + i.lower()
Try this:
import sys
for line in sys.stdin:  # iterate over lines, not over the characters of a single line
    res = set(word.lower() for word in line[:-1].split(" "))
    print(res)
For example:
line = "Dan went to the north pole to lead an expedition during summer."
res = set(word.lower() for word in line[:-1].split(" "))
print(res)
set(['north', 'lead', 'expedition', 'dan', 'an', 'to', 'pole', 'during', 'went', 'summer', 'the'])
Edit, following the comments: this solution preserves the input order and filters out delimiters.
import re
from collections import OrderedDict
line = "Dan went to the north pole to lead an expedition during summer."
list(OrderedDict.fromkeys(re.findall(r"[\w']+", line)))
# ['Dan', 'went', 'to', 'the', 'north', 'pole', 'lead', 'an', 'expedition', 'during', 'summer']
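The snippet above keeps the original capitalization, whereas the question lowercases each word. A minimal sketch of a lowercasing variant follows, assuming Python 3.7+ (where a plain `dict` preserves insertion order); the helper name `unique_words` is made up for illustration and is not part of the answer.

```python
import re


def unique_words(text):
    """Return lowercased words in first-seen order, without duplicates."""
    # Lowercase first so that "Dan" and "dan" count as the same word.
    words = re.findall(r"[\w']+", text.lower())
    # dict.fromkeys keeps the first occurrence of each key, in order (Python 3.7+).
    return list(dict.fromkeys(words))


line = "Dan went to the north pole to lead an expedition during summer."
result = unique_words(line)
print(result)
```

On Python 2, `OrderedDict.fromkeys` as used in the answer would play the same role as `dict.fromkeys` here.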
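The second "to" is already in d. Because cur is only reset when a word is appended, your loop passes the space between "to" and "lead" without clearing cur and keeps concatenating; once it reaches the next space, it sees that "tolead" is not in d, so it appends it.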
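To make that explanation concrete, here is a small sketch that wraps the question's loop and the corrected loop in two functions and runs both on the sample sentence. The function names `tokenize_buggy` and `tokenize_fixed` are hypothetical, and the input is passed as a string rather than read from stdin so the example is self-contained.

```python
def tokenize_buggy(line):
    """The question's loop: cur is reset only when a word is appended."""
    d = []
    cur = ''
    for ch in line:
        if ch in ' .':
            if cur not in d and cur != '':
                d.append(cur)
                cur = ''  # never reached when cur is a duplicate
        else:
            cur = cur + ch.lower()
    return d


def tokenize_fixed(line):
    """Same loop, but cur is reset at every delimiter."""
    d = []
    cur = ''
    for ch in line:
        if ch in ' .':
            if cur not in d and cur != '':
                d.append(cur)
            cur = ''  # reset whether or not the word was appended
        else:
            cur = cur + ch.lower()
    return d


line = "Dan went to the north pole to lead an expedition during summer."
buggy = tokenize_buggy(line)
fixed = tokenize_fixed(line)
print(buggy)  # contains 'tolead'
print(fixed)  # contains 'lead', and 'to' only once
```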
An easier solution; it also strips all forms of punctuation:
>>> import string
>>> set("Dan went to the north pole to lead an expedition during summer.".translate(None, string.punctuation).lower().split())
set(['summer', 'north', 'lead', 'expedition', 'dan', 'an', 'to', 'pole', 'during', 'went', 'the'])
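The two-argument `translate(None, ...)` call above is the Python 2 `str` form. As a sketch of the same idea on Python 3 (an assumption about the reader's interpreter, not something the answer states), the deletion table can be built with `str.maketrans`:

```python
import string

line = "Dan went to the north pole to lead an expedition during summer."

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', string.punctuation)

# Strip punctuation, lowercase, split on whitespace, and deduplicate.
words = set(line.translate(table).lower().split())
print(words)
```

As in the original one-liner, the result is a set, so the input order of the words is not preserved.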
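You should probably split on "." as well. Just to make sure the question is handled correctly.
Done with line[:-1] :)
No, not quite, because you may have multiple sentences. Just because it works on the OP's example doesn't mean it works in the wild.
Spaces aren't the only thing that can delimit my words; any punctuation can.
See my version, with arbitrary delimiters and order preserved.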
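A short sketch of the point raised in these comments: slicing off the last character only handles a single trailing delimiter, while a regex over word characters copes with several sentences. The sample text is invented for illustration.

```python
import re

# Hypothetical input with more than one sentence and mixed punctuation.
text = "Dan went north. Dan came back, tired!"

# line[:-1] drops only the final "!", so "north." and "back," keep their punctuation.
naive = set(word.lower() for word in text[:-1].split(" "))

# Matching runs of word characters ignores every delimiter, wherever it appears.
robust = set(re.findall(r"[\w']+", text.lower()))

print(naive)
print(robust)
```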