Removing punctuation from a Python list


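I know there are plenty of examples of removing punctuation, but I'd like to know the most efficient way to do it. I have a list of words that I read from a txt file and split:

wordlist = open('Tyger.txt', 'r').read().split()
What is the fastest way to go through each word and remove any punctuation? I could do it with a bunch of code, but I know that isn't the simplest way.


Thanks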

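I think the easiest approach is to extract only the words made up of word characters (letters, digits, and underscores) in the first place: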

import re

with open("Tyger.txt") as f:
    words = re.findall(r"\w+", f.read())  # raw string keeps \w intact

For example:

text = """
Tyger! Tyger! burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry? 
"""
import re
words = re.findall(r'\w+', text)

The second one is much faster.
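The second snippet this answer refers to did not survive in this copy of the thread. Judging from the comments further down, which mention `translate`, it was probably based on string translation. The following is only a sketch of that idea, assuming Python 3 and assuming that everything in `string.punctuation` should be dropped; it is not the answerer's original code, and the timing it prints will vary by machine and Python version.

```python
import re
import string
import timeit

text = "Tyger! Tyger! burning bright\nIn the forests of the night,"

# Approach 1: collect runs of word characters with a regular expression.
words_re = re.findall(r"\w+", text)

# Approach 2 (assumed reconstruction): delete punctuation with a
# translation table, then split on whitespace.
table = str.maketrans("", "", string.punctuation)
words_tr = text.translate(table).split()

# The two approaches differ when punctuation is the only separator:
joined = "fearful,symmetry"
split_by_re = re.findall(r"\w+", joined)   # two words
kept_by_tr = joined.translate(table).split()  # one fused word

# Rough timing comparison; treat the numbers as indicative only.
t_re = timeit.timeit(lambda: re.findall(r"\w+", text), number=1000)
t_tr = timeit.timeit(lambda: text.translate(table).split(), number=1000)
print("re.findall: %.4fs, str.translate: %.4fs" % (t_re, t_tr))
```

Note that `translate` simply deletes characters, so words separated only by punctuation get fused together, which is the difference one of the comments points out.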


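I would go with something like this:

import re
with open("Tyger.txt") as f:
    # split on the listed punctuation marks, then rejoin the pieces with spaces
    print(" ".join(re.split(r"[-,!?.]", f.read())))

It removes only what actually needs to go, without the overhead of over-matching.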
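To make the "no over-matching" point concrete, here is a small sketch comparing an explicit punctuation set with `\w+`. The helper name `strip_punct_split` and the sample strings are made up for illustration.

```python
import re

# Explicit set of characters to remove (the ones named in the question,
# minus the double quote, as in the answer above).
PUNCT = r"[-,!?.]"

def strip_punct_split(text):
    """Split on the listed punctuation and rejoin the pieces with spaces."""
    return " ".join(re.split(PUNCT, text))

sample = "Tyger! Tyger! burning bright"
words = strip_punct_split(sample).split()

# An apostrophe is not in PUNCT, so it survives here...
kept = strip_punct_split("don't stop!").split()
# ...whereas \w+ treats it as a separator and breaks the word apart.
broken = re.findall(r"\w+", "don't stop!")
```

Calling `.split()` afterwards also collapses the doubled spaces that the split-and-rejoin step leaves behind.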

>>> import re

>>> the_tyger
'\n    Tyger! Tyger! burning bright \n    In the forests of the night, \n    What immortal hand or eye \n    Could frame thy fearful symmetry? \n    \n    In what distant deeps or skies \n    Burnt the fire of thine eyes? \n    On what wings dare he aspire? \n    What the hand dare sieze the fire? \n    \n    And what shoulder, & what art. \n    Could twist the sinews of thy heart? \n    And when thy heart began to beat, \n    What dread hand? & what dread feet? \n    \n    What the hammer? what the chain? \n    In what furnace was thy brain? \n    What the anvil? what dread grasp \n    Dare its deadly terrors clasp? \n    \n    When the stars threw down their spears, \n    And watered heaven with their tears, \n    Did he smile his work to see? \n    Did he who made the Lamb make thee? \n    \n    Tyger! Tyger! burning bright \n    In the forests of the night, \n    What immortal hand or eye \n    Dare frame thy fearful symmetry? \n    '

>>> # the hyphen goes first in the class so it is a literal '-', not a range
>>> print(re.sub(r'[-",!?.]', '', the_tyger))
Prints:

Tyger Tyger burning bright 
In the forests of the night 
What immortal hand or eye 
Could frame thy fearful symmetry 

In what distant deeps or skies 
Burnt the fire of thine eyes 
On what wings dare he aspire 
What the hand dare sieze the fire 

And what shoulder & what art 
Could twist the sinews of thy heart 
And when thy heart began to beat 
What dread hand & what dread feet 

What the hammer what the chain 
In what furnace was thy brain 
What the anvil what dread grasp 
Dare its deadly terrors clasp 

When the stars threw down their spears 
And watered heaven with their tears 
Did he smile his work to see 
Did he who made the Lamb make thee 

Tyger Tyger burning bright 
In the forests of the night 
What immortal hand or eye 
Dare frame thy fearful symmetry 
Or, using a file:

>>> with open('tyger.txt', 'r') as WmBlake:
...    print(re.sub(r'[-",!?.]', '', WmBlake.read()))
If you want to create a list of lines:

>>> with open('tyger.txt', 'r') as WmBlake:
...    # clean each line separately so the result is one list entry per line
...    lines = [re.sub(r'[-",!?.]', '', line) for line in WmBlake]
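A side note on the character class used in this answer: inside `[...]`, a hyphen placed between two characters denotes a range, so where the `-` sits matters. The sketch below uses a made-up sample line to show the difference between `"-,` (a range from `"` to `,`) and a leading, literal hyphen.

```python
import re

line = "And what shoulder, & what art - he said."

# '"-,' is a range from '"' (0x22) to ',' (0x2C). It covers
# " # $ % & ' ( ) * + and , but NOT the hyphen itself.
ranged = re.sub(r'["-,!?.]', '', line)

# With the hyphen first it is a literal character, so exactly
# - " , ! ? . are removed and '&' is left alone.
literal = re.sub(r'[-",!?.]', '', line)

print(repr(ranged))
print(repr(literal))
```

With the range form, the `&` silently disappears and the `-` stays; with the literal form it is the other way round.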


Can you provide a sample input and output (or describe what your punctuation set consists of)?

Sure, no problem. The text file is a poem. The first two lines are: Tyger! Tyger! burning bright In the forests of the night, and I'd like them in the list without the commas or exclamation marks. The set of punctuation I need to remove is "-,!?. Thanks!

Looks like a duplicate.

@JoranBeasley: I don't think this is a duplicate. My answer fits this question, but not the other one.

Note that your two solutions do different things. "fearful,symmetry" would become a single word with the second approach.

@SvenMarnach: Yes, correct. Still, translate is 4 times faster than re.

How would this handle special characters that aren't punctuation?

This worked great, thank you. I really appreciate the help. I'm learning a lot from you guys.

@EnglishGrad: Note that Sven uses the with keyword to open the input file. Using a with block is preferable to f=open() ... close(), which in turn is preferable to stuff=open().read()... . In that last form you have no way to explicitly close() the file after reading or writing.

@luke14free: You can get that behavior on demand by supplying the re.LOCALE or re.UNICODE flag and setting the locale. For standard strings without any flags, it will only match the set [a-zA-Z0-9_]. See the documentation for more details.

+1 for posting the whole poem ;) Though it now looks more like Bukowski than Blake.
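On the `\w` question raised in the comments: the remark about `[a-zA-Z0-9_]` describes the byte strings of the Python 2 era. As a sketch, assuming Python 3 and a made-up sample string, `str` patterns are Unicode-aware by default and `re.ASCII` restricts `\w` to the ASCII set:

```python
import re

# "café au lait!" with the accented letter written as an escape
text = "caf\u00e9 au lait!"

# Python 3 str patterns: \w matches Unicode word characters by default.
unicode_words = re.findall(r"\w+", text)

# re.ASCII limits \w to [a-zA-Z0-9_], so the accented letter ends a word.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```

So whether non-ASCII letters survive depends on the Python version and on the flags passed to `re`.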