Python 2.7: Removing elements from a multidimensional list

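Basically, I have a three-dimensional list. It is a list of tokens, where the first dimension is the text, the second is the sentence, and the third is the word.

An element of the list (let's call it mat) can be addressed like this: mat[2][3][4]. That gives us the fifth word of the fourth sentence in the third text.

However, some of the words are just symbols such as "." or "," or "?". I need to remove all of them. I tried to do it with this function: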

    def removePunc(mat):
        newMat = []
        newText = []
        newSentence = []
        for text in mat:
           for sentence in text:
               for word in sentence:
                   if word not in " !@#$%^&*()-_+={}[]|\\:;'<>?,./\"":
                       newSentence.append(word)  
               newText.append(newSentence)
           newMat.append(newText)
        return newMat

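It gives me back the same list (mat is a three-dimensional list). My idea was to iterate over the list and remove only the "words" that are actually punctuation.

I don't know what I'm doing wrong, but there must be a simple logic error.

Edit: I need to keep the structure of the array, so the words of a sentence should stay in that same sentence (just without the "punctuation" words). For example: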

    a = [[['as', '.'], ['w', '?', '?']], [['asas', '23', '!'], ['h', ',', ',']]]

After the change it should be:

    a = [[['as'], ['w']], [['asas', '23'], ['h']]]

Thanks for reading and/or replying.

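The code you wrote looks solid and looks like it "should work", but only if the following is actually true:

"However, some of the words are just symbols such as "." or "," or "?""

In fact, I would expect the symbols not to be separated from the words, so that instead of:

    ["Are", "you", "sure", "?"] #example sentence

you actually have:

    ["Are", "you", "sure?"] #example sentence

If that is the case, you need something along these lines: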

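    def removePunc(mat):
        # Characters to strip out of every word
        symbols = " !@#$%^&*()-_+={}[]|\\:;'<>?,./\""
        newMat = []
        for text in mat:
            newText = []              # fresh list for each text
            for sentence in text:
                newSentence = []      # fresh list for each sentence
                for word in sentence:
                    newWord = ""      # rebuild each word from scratch
                    for char in word:
                        if char not in symbols:
                            newWord += char
                    if newWord:       # skip words that were only punctuation
                        newSentence.append(newWord)
                newText.append(newSentence)
            newMat.append(newText)
        return newMat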
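
If punctuation really is glued to the words, and only leading or trailing marks need to go, a variant built on str.strip may be enough. This is a minimal sketch, not the code from the answer above: the names PUNCT and strip_punc are made up for illustration, and it assumes that inner characters such as the apostrophe in "I've" should be kept.

```python
import string

# Assumption: ASCII punctuation is what counts as a "symbol" here.
PUNCT = string.punctuation

def strip_punc(mat):
    """Strip leading/trailing punctuation from every word, keeping the 3-D shape."""
    new_mat = []
    for text in mat:
        new_text = []
        for sentence in text:
            stripped = [word.strip(PUNCT) for word in sentence]
            # Drop words that consisted of punctuation only
            new_text.append([word for word in stripped if word])
        new_mat.append(new_text)
    return new_mat

example = [[["Are", "you", "sure?"], ["I've", "checked", "!"]]]
result = strip_punc(example)
```

Because str.strip only touches the two ends of a string, "sure?" becomes "sure" while "I've" is left alone, and the stand-alone "!" disappears.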

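I suspect your data is not organized the way you think it is. Although I'm usually not one to suggest regular expressions, I think that in your case they may be one of the best solutions. I would also suggest that, instead of removing non-alphabetic characters from words, you process whole sentences: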

    >>> import re
    >>> non_word = re.compile(r'\W+') # If your sentences may 
    >>> sentence = '''The formatting sucks, but the only change that I've made to your code was shortening the "symbols" string to one character. The only issue that I can identify is either with the "symbols" string (though it looks like all chars in it are properly escaped) that you used, or the punctuation is not actually separate words'''
    >>> words = re.split(non_word, sentence)
    >>> words
    ['The', 'formatting', 'sucks', 'but', 'the', 'only', 'change', 'that', 'I', 've', 'made', 'to', 'your', 'code', 'was', 'shortening', 'the', 'symbols', 'string', 'to', 'one', 'character', 'The', 'only', 'issue', 'that', 'I', 'can', 'identify', 'is', 'either', 'with', 'the', 'symbols', 'string', 'though', 'it', 'looks', 'like', 'all', 'chars', 'in', 'it', 'are', 'properly', 'escaped', 'that', 'you', 'used', 'or', 'the', 'punctuation', 'is', 'not', 'actually', 'separate', 'words']
    >>>
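
One thing to watch for with this approach: re.split produces an empty string at either end when the sentence starts or ends with a non-word character (the sample sentence avoids this only because it ends in a word). A small sketch of filtering those pieces out; split_words is a hypothetical helper name, not part of the answer.

```python
import re

non_word = re.compile(r'\W+')

def split_words(sentence):
    """Split on runs of non-word characters and drop empty pieces."""
    return [piece for piece in non_word.split(sentence) if piece]

raw = non_word.split("Hello, world!")    # trailing '!' leaves an empty piece
clean = split_words("Hello, world!")
```

Here raw ends with an empty string, while clean contains only the two words.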

Finally found it. As expected, it was a very small logic error that was there all along but that I could not see. Here is the working solution:

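    def removePunc(mat):
        newMat = []
        for text in mat:
            newText = []              # reset for every text
            for sentence in text:
                newSentence = []      # reset for every sentence
                for word in sentence:
                    # keep the word unless it is a substring of the symbols
                    # string (for example a single punctuation mark)
                    if word not in " !@#$%^&*()-_+={}[]|\\:;'<>?,./\"":
                        newSentence.append(word)
                newText.append(newSentence)
            newMat.append(newText)
        return newMat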
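
The only difference from the code in the question is where the lists are created. In the original, newSentence and newText were created once, before the loops, so every append handed out the same list object again and the words of all sentences piled up in one place. A toy illustration of that effect (the names shared, fresh, rows_shared and rows_fresh are invented for this sketch):

```python
sentences = [["as", "."], ["w", "?"]]

# Accumulator created once, outside the loop (the pattern from the question)
shared = []
rows_shared = []
for sentence in sentences:
    for word in sentence:
        if word not in ".?":
            shared.append(word)
    rows_shared.append(shared)      # the same list object every time

# Accumulator created inside the loop (the pattern from the fix)
rows_fresh = []
for sentence in sentences:
    fresh = []                      # a new list for each sentence
    for word in sentence:
        if word not in ".?":
            fresh.append(word)
    rows_fresh.append(fresh)
```

With the shared accumulator both rows are the very same list holding every word; with a fresh list per sentence the structure is preserved.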

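No, actually, I need to remove them completely. I also (just a minute ago) added a small example showing the result I expect. Cheers

Running the code with Python 2.7.9: sentence = ["these", "are", "words", "!"] text = [sentence] mat = [text] def removePunc(mat): newMat = [] newText = [] newSentence = [] for text in mat: for sentence in text: for word in sentence: if word not in "!": newSentence.append(word) newText.append(newSentence) newMat.append(newText) return newMat removePunc(mat) > [[["these", "are", "words"]]] sentence2 = ["these", "are", "words!"]

The formatting sucks, but the only change I made to the code was shortening the "symbols" string to one character. The only issue I can identify is either with the "symbols" string that you used (though it looks like all the characters in it are properly escaped), or that the punctuation is not actually separate words.

This is an extremely inefficient solution.

I guess so, but it builds directly on Ismail's suggestion, so that he can fully understand what it does and modify it if needed. Still, I have to say your solution is more elegant and more Pythonic. Cheers!

I need to verify that the data is what I think it is. That could be the problem. On the other hand, I don't really understand your solution (even though I have used regular expressions before). The data is already processed: each word is a member of a three-dimensional matrix, where the first dimension is the text, the second is the sentence, and the third is the word. I need to keep that structure because I have to process it further (natural language processing). I'm still not sure whether I really need to remove the "symbol" words, but it is one of the possible options.

@Ismailezi, the point is: if your data is what you think it is, your solution will work. Essentially, my solution amounts to a regex string split, just using a regular expression that means "one or more non-word characters".

Do you know how I could do this without splitting words at apostrophes? I mean, the output would be the same, with the only change being that words like "I've" stay as "I've" instead of being split into "I" and "ve".

I found out how to do it. Thank you very much, sir.

No, my code is more efficient.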
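
The comments do not say how the apostrophe question was solved. One possible approach, offered only as a sketch, is to match the words themselves rather than split on the separators, allowing an apostrophe between word characters. The pattern and the name word_re are assumptions made for illustration.

```python
import re

# A word is a run of word characters, optionally followed by
# apostrophe-joined runs (as in "I've" or "isn't").
word_re = re.compile(r"\w+(?:'\w+)*")

sentence = "The only change that I've made, so far, isn't big."
words = word_re.findall(sentence)
```

Since the apostrophe is only accepted between word characters, single quotes wrapped around a word are still dropped.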