Python: stripping punctuation from unique strings in an input file

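A related question deals with stripping punctuation from a single string. However, I want to read text from an input file and print only one copy of each string, without trailing punctuation. I started with this: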

f = open('#file name ...', 'a+')
for x in set(f.read().split()):
    print x

The problem is that if, for example, the input file contains the following line:

This is not is, clearly is: weird

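it treats the three occurrences of "is" as different strings ("is", "is," and "is:"), but I want to ignore any punctuation and print "is" only once rather than three times. How can I remove any kind of trailing punctuation and then put the resulting strings into a set?

Thanks for your help. (I'm really new to Python.)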

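A regular expression should be able to pick out the words more accurately.

This regular expression finds contiguous groups of alphanumeric characters (a-z, A-Z, 0-9, _).

If you only want to match letters (no digits and no underscore), replace \w with [a-zA-Z].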

>>> import re
>>> re.findall(r'\b\w+\b', "This is not is, clearly is: weird")
['This', 'is', 'not', 'is', 'clearly', 'is', 'weird']
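
To connect the regular expression back to the original goal (one copy of each word from a file), here is a minimal sketch in Python 3 syntax. The function names `unique_words` and `unique_words_in_file` are hypothetical, chosen only for illustration, and the file is opened in plain read mode as an assumption.

```python
import re

def unique_words(text):
    # Collect every run of word characters (letters, digits, underscore),
    # then let the set discard the duplicates.
    return set(re.findall(r"\b\w+\b", text))

def unique_words_in_file(path):
    # Hypothetical helper: read the whole file and reuse unique_words().
    with open(path, "r") as f:
        return unique_words(f.read())

sample = "This is not is, clearly is: weird"
for word in sorted(unique_words(sample)):
    print(word)
```

Sets are unordered, so the sketch sorts the words before printing to get a stable order. Matching is case-sensitive, so "This" and "this" would count as two different words.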

If you don't mind replacing the punctuation characters with whitespace, for example, you can use a translation table:

>>> from string import maketrans
>>> punctuation = ",;.:"
>>> replacement = "    "
>>> trans_table = maketrans(punctuation, replacement)
>>> 'This is not is, clearly is: weird'.translate(trans_table)
'This is not is  clearly is  weird'
# And for your case of creating a set of unique words.
>>> set('This is not is  clearly is  weird'.split())
set(['This', 'not', 'is', 'clearly', 'weird'])
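
The session above uses Python 2, where `maketrans` lives in the `string` module. In Python 3 that import no longer exists and the equivalent is the built-in `str.maketrans`. A minimal sketch of the same idea, assuming Python 3 (the function name `unique_words_translate` is hypothetical, and the default punctuation list is simply the one used above):

```python
def unique_words_translate(text, punctuation=",;.:"):
    # Build a table mapping each punctuation character to a space,
    # so that split() afterwards sees clean word boundaries.
    table = str.maketrans(punctuation, " " * len(punctuation))
    return set(text.translate(table).split())

words = unique_words_translate("This is not is, clearly is: weird")
print(sorted(words))
```

Passing `string.punctuation` as the second argument would cover all ASCII punctuation, at the cost of also splitting words such as "don't" at the apostrophe.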

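Are you sure you want to open the file in a+ mode? r should be enough.

You're right that r is enough, but I expect to append to the file later, so I put a+ there for future use. Thanks for the tip about regular expressions; I didn't know about them.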
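
One practical reason to prefer r for reading: in Python 3, a file opened with a+ starts out positioned at the end, so an immediate `read()` returns an empty string unless you `seek(0)` first. The sketch below demonstrates this on a temporary file; the file contents and variable names are made up for the demonstration.

```python
import os
import tempfile

# Create a throwaway file to experiment on.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("This is not is, clearly is: weird\n")
    path = tmp.name

with open(path, "a+") as f:
    at_end = f.read()       # the position starts at the end of the file
    f.seek(0)               # rewind before reading
    from_start = f.read()

with open(path, "r") as f:
    read_only = f.read()    # "r" starts at the beginning

os.remove(path)
print(repr(at_end))
print(from_start == read_only)
```

If appending is only a future plan, opening with r now and switching modes later avoids this surprise.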
