
Python: tokenizing a string merges some words


I am using the following code to tokenize a string read from stdin:

import sys

d = []
cur = ''
for i in sys.stdin.readline():
    if i in ' .':
        if cur not in d and (cur != ''):
            d.append(cur)
            cur = ''
    else:
        cur = cur + i.lower()
This gives me an array of words with no duplicates. However, some words in my output are not split.

My input is

Dan went to the north pole to lead an expedition during summer.
The output array d is

['dan', 'went', 'to', 'the', 'north', 'pole', 'tolead', 'an', 'expedition', 'during', 'summer']

Why are "to" and "lead" being read together?

Try this:

d=[]
cur = ''
for i in sys.stdin.readline():
    if i in ' .':
        if cur not in d and (cur != ''):
            d.append(cur)
        cur = '' # note the different indentation
    else:
        cur = cur + i.lower()
Try this:

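import sys

for line in sys.stdin:  # iterate over lines, not over the characters of one line
    # line[:-1] drops the last character (the trailing newline here)
    res = set(word.lower() for word in line[:-1].split(" "))
    print res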
For example:

line = "Dan went to the north pole to lead an expedition during summer."
res = set(word.lower() for word in line[:-1].split(" "))
print res

set(['north', 'lead', 'expedition', 'dan', 'an', 'to', 'pole', 'during', 'went', 'summer', 'the'])
Edit, following the comments: this solution preserves the input order and filters out the delimiters.

import re
from collections import OrderedDict
line = "Dan went to the north pole to lead an expedition during summer."
list(OrderedDict.fromkeys(re.findall(r"[\w']+", line)))
# ['Dan', 'went', 'to', 'the', 'north', 'pole', 'lead', 'an', 'expedition', 'during', 'summer']
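The second time the loop sees "to", it is already in d. The `cur = ''` reset sits inside the `if`, so it is skipped for that duplicate: the loop passes over the space between "to" and "lead" but keeps concatenating. When it reaches the next space, it sees that "tolead" is not in d, so it appends it.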
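The explanation above can be checked with a small sketch. The function names `tokenize_buggy` and `tokenize_fixed` are made up for illustration; they wrap the question's loop and the corrected loop so both can be run on the same sentence without reading stdin.

```python
def tokenize_buggy(text):
    """The question's loop: cur is reset only when a word is appended."""
    d = []
    cur = ''
    for ch in text:
        if ch in ' .':
            if cur not in d and cur != '':
                d.append(cur)
                cur = ''  # not reached when cur is already in d
        else:
            cur = cur + ch.lower()
    return d


def tokenize_fixed(text):
    """Same loop, but cur is reset at every delimiter."""
    d = []
    cur = ''
    for ch in text:
        if ch in ' .':
            if cur not in d and cur != '':
                d.append(cur)
            cur = ''  # always start a fresh word after a delimiter
        else:
            cur = cur + ch.lower()
    return d


sentence = "Dan went to the north pole to lead an expedition during summer."
print(tokenize_buggy(sentence))
print(tokenize_fixed(sentence))
```

The first call reproduces the merged 'tolead' entry; the second yields 'lead' on its own and does not repeat 'to'.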

An easier solution, which also strips all forms of punctuation:

>>> import string
>>> set("Dan went to the north pole to lead an expedition during summer.".translate(None, string.punctuation).lower().split())
set(['summer', 'north', 'lead', 'expedition', 'dan', 'an', 'to', 'pole', 'during', 'went', 'the'])
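The `translate(None, string.punctuation)` call above uses the Python 2 signature of `str.translate`. Under Python 3, a roughly equivalent sketch (assuming the same example sentence) builds a deletion table with `str.maketrans` first:

```python
import string

sentence = "Dan went to the north pole to lead an expedition during summer."

# str.maketrans('', '', chars) builds a table that deletes every character in chars.
strip_punct = str.maketrans('', '', string.punctuation)

# Remove punctuation, lowercase, split on whitespace, and deduplicate.
words = set(sentence.translate(strip_punct).lower().split())
print(sorted(words))
```

Note that this deletes punctuation rather than treating it as a separator, so input such as "pole.To" with no space after the period would collapse into a single token.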

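It should probably split on "." as well, just to make sure it handles the question correctly.

Done, using line[:-1] :)

No, not quite, because there may be multiple sentences. Just because it works on the OP's example doesn't mean it works in the wild.

Whitespace is not the only thing that delimits my words; any punctuation can.

See my version, which handles any delimiter and preserves order.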
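The points raised in this thread (lowercasing, multiple sentences, punctuation as a delimiter, preserved order) can be combined in one place. The following is only a sketch: `unique_words` is a hypothetical helper name, and the sample text is invented to include more than one sentence.

```python
import re


def unique_words(text):
    """Return lowercased words in first-seen order.

    A word is any run of word characters or apostrophes; everything
    else (spaces, periods, commas, ...) acts as a delimiter.
    """
    seen = set()
    result = []
    for word in re.findall(r"[\w']+", text.lower()):
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


text = "Dan went to the north pole. He went to lead an expedition, during summer!"
print(unique_words(text))
```

Tracking seen words in a set keeps the membership test cheap, while the list keeps the order in which words first appear.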