Python: removing duplicate values from a set


I have a set like this:

my_set  = {
  [
      {
         "sample_id": "read1", 
         "seg_1": None, 
         "lukM-F": "D", 
         "23s_SA": None, 
         "see": None, 
         "sed": "ND"
      }, 
      {
         "sample_id": "read2", 
         "seg_1": None, 
         "lukM-F": "ND", 
         "23s_SA": None, 
         "see": "D", 
         "sed": "ND"
      }, 
      {
         "sample_id": "read3", 
         "seg_1": None, 
         "lukM-F": "D", 
         "23s_SA": None, 
         "see": "ND", 
         "sed": "None"
      }
  ]
}
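I want to remove the keys whose value is None across the whole collection. For example, if None is the value of key seg_1 in every sample_id (read1, read2 and read3), delete that key entirely. If seg_1 is None in only one of them, say read1, and not None in the other two sample_ids, keep seg_1 and its values. So I want to end up with: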

my_set  = {
  [
      {
         "sample_id": "read1",  
         "lukM-F": "D", 
         "see": None, 
         "sed": "ND"
      }, 
      {
         "sample_id": "read2", 
         "lukM-F": "ND", 
         "see": "D", 
         "sed": "ND"
      }, 
      {
         "sample_id": "read3", 
         "lukM-F": "D", 
         "see": "ND", 
         "sed": "None"
      }
  ]
}
Note that seg_1 and 23s_SA have now been removed, because their value is None in every sample ID.
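Two details of the data matter for this rule, and a quick check makes them visible. The sketch below assumes the intended structure is a plain list of dicts (named `samples` here for illustration), because the outer `{ [...] }` wrapper cannot actually be built. It also shows that read3's `"sed": "None"` is the string `"None"`, not the `None` object, so it does not count as missing.

```python
# Assumption: the question's data as a plain list of dicts, since the
# outer { [...] } wrapper cannot be built.
samples = [
    {"sample_id": "read1", "seg_1": None, "lukM-F": "D",
     "23s_SA": None, "see": None, "sed": "ND"},
    {"sample_id": "read2", "seg_1": None, "lukM-F": "ND",
     "23s_SA": None, "see": "D", "sed": "ND"},
    {"sample_id": "read3", "seg_1": None, "lukM-F": "D",
     "23s_SA": None, "see": "ND", "sed": "None"},
]

# A set literal containing a list fails at runtime: lists are unhashable.
try:
    broken = {[1, 2, 3]}
except TypeError as exc:
    print("TypeError:", exc)

# read3's "sed" is the string "None", which is not the None object.
print(samples[2]["sed"] is None)  # False

# The rule from the question: drop a key only if it is None in every sample.
keys = dict.fromkeys(k for d in samples for k in d)  # ordered, no duplicates
all_none_keys = [k for k in keys if all(d.get(k) is None for d in samples)]
print(all_none_keys)  # ['seg_1', '23s_SA']
```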

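I've spent a long time trying to do this without success. In the end I considered converting the set to a dict, then to a list, and then looping over all the lists, deleting every item that is None in all of them: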

number_of_samples = len(my_set)
each_sample_list = [[] for _ in range(number_of_samples)]

# Collect the [key, value] pairs of each sample into its own list
for n, data_in_dict in enumerate(my_set):
    for k, val in data_in_dict.items():
        each_sample_list[n].append([k, val])
    print(each_sample_list[n])
I thought about using itertools.izip to loop over several lists at once, but I'm not sure that would work. Any help would be greatly appreciated.


Thanks
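The `izip` idea from the question can work, with one extra step: turn each dict into a row of values in a shared key order, and `zip(*rows)` then transposes the rows into one column per key. This is a sketch under that assumption; in Python 3 the built-in `zip` is lazy and takes over the role of `itertools.izip`, and `drop_all_none_columns` is a hypothetical helper name. A key missing from a dict is treated as None.

```python
def drop_all_none_columns(samples):
    """Return new dicts without the keys whose value is None in every sample."""
    # Shared key order: every key seen, in first-seen order.
    keys = list(dict.fromkeys(k for d in samples for k in d))
    # One row of values per sample; a missing key is treated as None.
    rows = [[d.get(k) for k in keys] for d in samples]
    # zip(*rows) transposes the rows into columns, one column per key.
    columns = zip(*rows)
    keep = [k for k, col in zip(keys, columns)
            if any(v is not None for v in col)]
    return [{k: d[k] for k in keep if k in d} for d in samples]


samples = [
    {"sample_id": "read1", "seg_1": None, "see": None},
    {"sample_id": "read2", "seg_1": None, "see": "D"},
]
print(drop_all_none_columns(samples))
# [{'sample_id': 'read1', 'see': None}, {'sample_id': 'read2', 'see': 'D'}]
```

Building new dicts rather than deleting in place keeps the input untouched, which makes the function easy to test.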

You can build a counter and then delete all the keys you need:

import collections
import itertools

source = [ 
  {
     "sample_id": "read1", 
     "seg_1": None, 
     "lukM-F": "D", 
     "23s_SA": None, 
     "see": None, 
     "sed": "ND"
  }, 
  {
     "sample_id": "read2", 
     "seg_1": None, 
     "lukM-F": "ND", 
     "23s_SA": None, 
     "see": "D", 
     "sed": "ND"
  }, 
  {
     "sample_id": "read3", 
     "seg_1": None, 
     "lukM-F": "D", 
     "23s_SA": None, 
     "see": "ND", 
     "sed": "None"
  }
]

size = len(source)

# for python2 you should use iteritems() method
iterators_chain = itertools.chain(*[x.items() for x in source])
counter = collections.Counter(iterators_chain)

for (key, val), count in counter.items():
    if count == size and val is None:
        for x in source:
            x.pop(key)
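One caveat with counting `(key, value)` pairs: `Counter` must hash every pair, so a dict holding an unhashable value such as a list would raise `TypeError`. A variant that counts only the keys whose value is `None` sidesteps that. This is a sketch, not part of the answer above; `drop_keys_none_everywhere` is a hypothetical name, and it returns new dicts instead of popping keys in place.

```python
import collections


def drop_keys_none_everywhere(source):
    """Return copies of the dicts without keys that are None in every dict."""
    # Count, per key, how many dicts store None under it.
    none_counts = collections.Counter(
        k for d in source for k, v in d.items() if v is None)
    size = len(source)
    bad = {k for k, count in none_counts.items() if count == size}
    return [{k: v for k, v in d.items() if k not in bad} for d in source]


# Hypothetical rows with a list value that the (key, value) Counter would reject.
rows = [
    {"sample_id": "read1", "tags": ["a"], "seg_1": None},
    {"sample_id": "read2", "tags": [], "seg_1": None},
]
print(drop_keys_none_everywhere(rows))
# [{'sample_id': 'read1', 'tags': ['a']}, {'sample_id': 'read2', 'tags': []}]
```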
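Your my_set isn't a valid set, because set items must be hashable and lists are not. But anyway,

here is a way that doesn't need any imports. It uses sets to determine which keys to keep: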

my_stuff = [
    {
        "sample_id": "read1", 
        "seg_1": None, 
        "lukM-F": "D", 
        "23s_SA": None, 
        "see": None, 
        "sed": "ND"
    }, 
    {
        "sample_id": "read2", 
        "seg_1": None, 
        "lukM-F": "ND", 
        "23s_SA": None, 
        "see": "D", 
        "sed": "ND"
    }, 
    {
        "sample_id": "read3", 
        "seg_1": None, 
        "lukM-F": "D", 
        "23s_SA": None, 
        "see": "ND", 
        "sed": None
    }
]

allkeys = set(k for d in my_stuff for k in d)
goodkeys = set(k for k in allkeys if any(d.get(k) for d in my_stuff))
badkeys = allkeys - goodkeys
for d in my_stuff:
    for k in badkeys:
        del d[k]

for d in my_stuff:
    print(d)

FWIW, in modern versions of Python the allkeys and goodkeys constructions can be replaced with set comprehensions, but I'm using Python 2.6.6 on this ancient machine.
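For reference, here is a Python 3 sketch of the same approach written with set comprehensions, run on a small hypothetical my_stuff that adds a `count` field. It makes two assumptions beyond the answer: it tests `is not None` instead of truthiness, because `any(d.get(k) ...)` would also discard a key whose values are all falsy (here `count`, which is always 0), and it uses `pop(k, None)` so a dict that lacks a bad key does not raise `KeyError` the way `del d[k]` would.

```python
my_stuff = [
    {"sample_id": "read1", "seg_1": None, "see": None, "count": 0},
    {"sample_id": "read2", "seg_1": None, "see": "D", "count": 0},
]

# Set comprehensions (Python 2.7+ / 3) instead of set(generator).
allkeys = {k for d in my_stuff for k in d}
# A key is kept if at least one dict holds a non-None value for it;
# "is not None" keeps keys whose values are merely falsy, like 0 or "".
goodkeys = {k for k in allkeys
            if any(d.get(k) is not None for d in my_stuff)}
badkeys = allkeys - goodkeys
for d in my_stuff:
    for k in badkeys:
        d.pop(k, None)  # unlike del, tolerates a dict that lacks k

for d in my_stuff:
    print(d)
```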

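Another way to build the allkeys set is:

allkeys = set()
for d in my_stuff:
    allkeys.update(d.keys())

Although it's more code, it runs faster, because .update processes each dict's whole collection of keys at C speed, whereas the other way has to loop over the keys at Python speed. Of course, if you can guarantee that every dict in the list always has the same set of keys, this can be optimized further.

Take advantage of the fact that a key has to be None in every dict of the list, so only the None-valued keys of the first dict are candidates:

bkeys = [k for k, v in next(iter(my_stuff), {}).items() if v is None]

bkeys = [k for k in bkeys if all(d[k] is None for d in my_stuff)]

my_stuff = [{k: v for k, v in d.items() if k not in bkeys} for d in my_stuff]

Printout of the new my_stuff:

{'see': None, 'sed': 'ND', 'lukM-F': 'D', 'sample_id': 'read1'}
{'see': 'D', 'sed': 'ND', 'lukM-F': 'ND', 'sample_id': 'read2'}
{'see': 'ND', 'sed': None, 'lukM-F': 'D', 'sample_id': 'read3'}

If you don't have dict comprehensions, just change the last line to:

my_stuff = [dict(((k, v) for k, v in d.items() if k not in bkeys)) for d in my_stuff]

Edited to use only the None keys of the first item (if there is one).

Your my_set is supposedly a set containing a list that contains three dicts, but unlike a real set object, your my_set is not a valid Python expression.
Your my_set still isn't a valid Python set: how could a set contain a mutable list?
Your keys only come from the first element of my_stuff, so why not simply use keys = my_stuff[0].keys()?
Because [0] isn't guaranteed to exist. I had already thought of that and was looking for a combination of expressions that uses the first element only if it exists.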
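The first-item trick can be packaged as a reusable function. This is a sketch under the same assumption the answer makes, that every dict has the same keys (otherwise `d[k]` raises `KeyError`); the name `drop_common_none_keys` is hypothetical. `next(iter(items), {})` yields an empty dict for an empty list, so the function simply returns `[]` there, and storing the bad keys in a set makes each `not in` test constant time.

```python
def drop_common_none_keys(items):
    """Drop keys whose value is None in every dict; assumes all dicts share keys."""
    first = next(iter(items), {})  # {} when items is empty
    # Only keys that are None in the first dict can be None everywhere.
    candidates = [k for k, v in first.items() if v is None]
    bkeys = {k for k in candidates if all(d[k] is None for d in items)}
    return [{k: v for k, v in d.items() if k not in bkeys} for d in items]


print(drop_common_none_keys([]))  # []
print(drop_common_none_keys([
    {"sample_id": "read1", "seg_1": None, "see": None},
    {"sample_id": "read2", "seg_1": None, "see": "D"},
]))
# [{'sample_id': 'read1', 'see': None}, {'sample_id': 'read2', 'see': 'D'}]
```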