Sorting and reorganizing a CSV file into a Python dictionary

I have a CSV file in the following format:

ComponentID subComponent    Measurement
X030        A1111111        784.26
X030        A2222222        784.26
X015        A1111111        997.35
X015        A2222222        997.35
X015        A3333333        997.35
X075        A1111111        673.2
X075        A2222222        673.2
X075        A3333333        673.2
X090        A1111111        1003.2
X090        A2222222        1003.2
X090        A3333333        1003.2
X105        A1111111        34.37
X105        A2222222        34.37
X105        A3333333        34.37
X105        A4444444        34.37

I would like to return the file as a Python dictionary in the following format:

my_dict = {'X030': ['A1111111', 'A2222222', 784.26],
           'X015': ['A1111111', 'A2222222', 'A3333333', 997.35 ],
           'X075': ['A1111111', 'A2222222', 'A3333333', 673.2],
           'X090': ['A1111111', 'A2222222', 'A3333333', 1003.2],
           'X105': ['A1111111', 'A2222222', 'A3333333', 'A4444444', 34.37]
          }

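At first I was using this, but it didn't get me anywhere. My confusion is about how to design it, since I'm not sure how to return the following:

ComponentID: [components, only one measurement]

I'm not sure how to approach this task; any guidance is appreciated.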

You can loop over the csv rows and use the dict.setdefault method to collect each component's rows in a dictionary:

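>>> import csv
>>> d = {}
>>> with open('your_file.csv', newline='') as csvfile:
...     reader = csv.reader(csvfile, delimiter='\t')  # assumes tab-separated columns
...     header = next(reader)  # skip the header row
...     for cid, sub, measurement in reader:
...         entry = d.setdefault(cid, [])
...         if entry:
...             entry.pop()  # remove the measurement stored at the end of the list
...         entry.extend([sub, float(measurement)])
...     print(d)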
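dict.setdefault(key, default) returns the value already stored under key; if the key is missing, it first stores default and then returns that same object. That is why the loop above can append to a per-component list without an explicit membership test. A minimal illustration with a few hard-coded (made-up) pairs standing in for CSV rows:

```python
# Hard-coded (ComponentID, subComponent) pairs standing in for CSV rows
rows = [("X030", "A1111111"), ("X030", "A2222222"), ("X015", "A1111111")]

groups = {}
for cid, sub in rows:
    # The first time a cid is seen, an empty list is stored and returned;
    # afterwards the existing list is returned and appended to in place.
    groups.setdefault(cid, []).append(sub)

print(groups)  # {'X030': ['A1111111', 'A2222222'], 'X015': ['A1111111']}
```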

Here is how I would do it:

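myData = {}
with open('p.csv') as inputfile:
    for line in inputfile:
        # skip the header line and any blank lines
        if 'ComponentID' in line or not line.strip():
            continue
        # split on any run of whitespace rather than a fixed 8-space gap
        cid, sub, msmt = line.split()
        msmt = float(msmt)

        if cid in myData:
            # take the trailing measurement off, add the new subcomponent,
            # then put the measurement back at the end of the list
            msmt = myData[cid].pop()
            myData[cid].append(sub)
            myData[cid].append(msmt)
        else:
            myData[cid] = [sub, msmt]
print(myData)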
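A variant of the same idea that swaps the pop-and-re-append step for collections.defaultdict: collect subcomponents and measurements separately, then build the requested lists once at the end. This is a sketch under the assumptions of the question (whitespace-separated columns, a header on the first line, one measurement per component); the function name group_components is hypothetical.

```python
from collections import defaultdict


def group_components(lines):
    """Hypothetical helper: group whitespace-separated rows into
    {ComponentID: [subComponents..., measurement]}.

    `lines` is any iterable of text lines whose first line is the header.
    """
    subs = defaultdict(list)  # ComponentID -> list of subComponents
    measurements = {}         # ComponentID -> its (single) measurement
    it = iter(lines)
    next(it, None)            # skip the header line
    for line in it:
        if not line.strip():
            continue          # ignore blank lines
        cid, sub, msmt = line.split()
        subs[cid].append(sub)
        measurements[cid] = float(msmt)
    # Append the measurement once, at the end of each list
    return {cid: subs[cid] + [measurements[cid]] for cid in subs}


sample = """ComponentID subComponent    Measurement
X030        A1111111        784.26
X030        A2222222        784.26
X015        A1111111        997.35
""".splitlines()

print(group_components(sample))
# {'X030': ['A1111111', 'A2222222', 784.26], 'X015': ['A1111111', 997.35]}
```

Because an open text file iterates line by line, the same helper can be called as group_components(f) inside a with open('p.csv') as f: block.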

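First, I had some trouble understanding the data structure: is it guaranteed that all subcomponents of a given component have the same measurement? If so, then neither the given TSV format nor the dict you want is a very sensible data structure for storing this information, because the measurement is repeated on every row and then has to be kept at the end of a list of otherwise unrelated subcomponent IDs.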
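The assumption in question can be checked directly on the data before committing to a format: collect the set of distinct measurements seen for each component and report any component with more than one. A minimal sketch; find_inconsistent is a hypothetical name, and the rows are assumed to be already parsed into (ComponentID, subComponent, Measurement) tuples.

```python
def find_inconsistent(rows):
    """Return {ComponentID: sorted distinct measurements} for every component
    whose subcomponents do not all share one measurement."""
    seen = {}  # ComponentID -> set of measurements observed
    for cid, _sub, msmt in rows:
        seen.setdefault(cid, set()).add(msmt)
    return {cid: sorted(vals) for cid, vals in seen.items() if len(vals) > 1}


rows = [
    ("X030", "A1111111", 784.26),
    ("X030", "A2222222", 784.26),
    ("X015", "A1111111", 997.35),
    ("X015", "A2222222", 990.0),  # deliberately different, for illustration
]
print(find_inconsistent(rows))  # {'X015': [990.0, 997.35]}
```

An empty result means the one-measurement-per-component assumption holds for that input.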

That said, here is some simple code that does exactly what you asked for:

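d = {}
with open('yourfile.tsv') as tsvfile:
  next(tsvfile)  # skip the header line
  for line in tsvfile:
    row = line.split()  # split on any whitespace
    if not row:
      continue  # skip blank lines
    componentid, subcomponent, measurement = row[0], row[1], float(row[2])
    if componentid not in d:
      d[componentid] = [subcomponent, measurement]
    else:
      # every subcomponent of a component should share one measurement
      assert measurement == d[componentid][-1]
      # put the new subcomponent before the trailing measurement
      d[componentid] = d[componentid][:-1] + [subcomponent, measurement]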

Here is some code that puts the data into a more logical structure instead:

d = {}
with open('yourfile.tsv') as tsvfile:
  next(tsvfile)  # skip the header line
  for line in tsvfile:
    row = line.split()
    if not row:
      continue  # skip blank lines
    componentid, subcomponent, measurement = row[0], row[1], float(row[2])
    if componentid not in d:
      d[componentid] = {'subcomponents': [subcomponent], 'measurement': measurement}
    else:
      # the measurement is stored once per component, so check that it matches
      assert measurement == d[componentid]['measurement']
      d[componentid]['subcomponents'].append(subcomponent)

This gives you:

{
  'X105': {'measurement': 34.37, 'subcomponents': ['A1111111', 'A2222222', 'A3333333', 'A4444444']},
  'X015': {'measurement': 997.35, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']},
  'X075': {'measurement': 673.2, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']},
  'X030': {'measurement': 784.26, 'subcomponents': ['A1111111', 'A2222222']},
  'X090': {'measurement': 1003.2, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']}
}
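The structured form loses nothing: if the calling code still needs the list layout from the question, it can be derived with one dict comprehension. A short sketch using two entries copied from the output above:

```python
structured = {
    "X030": {"measurement": 784.26, "subcomponents": ["A1111111", "A2222222"]},
    "X105": {"measurement": 34.37,
             "subcomponents": ["A1111111", "A2222222", "A3333333", "A4444444"]},
}

# Rebuild {ComponentID: [subcomponents..., measurement]} as asked in the question
flat = {cid: v["subcomponents"] + [v["measurement"]] for cid, v in structured.items()}

print(flat["X030"])  # ['A1111111', 'A2222222', 784.26]
```

Going the other way, code that reads the structured dict can use d[cid]['measurement'] instead of relying on the measurement being the last list element.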

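I assume you at least know what the basic logic here should be, right? Since you haven't shared any code, could you at least share the algorithm you have in mind?

@fedorqui The dictionary will be passed to an external class that uses it for some calculations and reporting.

I'm not asking "how will you use it" but "how would you design it". Presented like this, it looks like a work assignment, whereas this should be a place where you show what you have tried so far and where you got stuck. Give this a read.

@fedorqui Sorry, at first I was looking at it with this, but it didn't get me anywhere. My confusion is about how to design it: I'm not sure how to return ComponentID: [components, only one measurement].

I modified your code and solved my problem with the following function:

import csv

def data_from_csv(csv_file):
    d = {}
    with open(csv_file, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # skip the header row
        for row in reader:
            componentid, subcomponent, measurement = row[0], row[1], float(row[2])
            if componentid not in d:
                d[componentid] = [subcomponent, measurement]
            else:
                # the measurement is always the last element of the list
                assert measurement == d[componentid][-1]
                d[componentid] = d[componentid][:-1] + [subcomponent, measurement]
    return d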
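If the file is comma-separated and has the header row shown in the question (the assumptions behind delimiter=',' above), csv.DictReader can read each row as a dict keyed by the header names, which removes the manual header skip and the positional row[0]/row[1]/row[2] indexing. A sketch under those assumptions; data_from_csv_dictreader is a hypothetical name, and it takes an already-open file object so it can be tried on an in-memory sample.

```python
import csv
import io


def data_from_csv_dictreader(f):
    """Hypothetical variant of data_from_csv: `f` is an open text file (or any
    file-like object) with a header row and comma-separated columns."""
    d = {}
    for row in csv.DictReader(f):  # the header row supplies the keys
        cid = row["ComponentID"]
        sub = row["subComponent"]
        msmt = float(row["Measurement"])
        if cid not in d:
            d[cid] = [sub, msmt]
        else:
            assert msmt == d[cid][-1], f"measurement mismatch for {cid}"
            d[cid].insert(-1, sub)  # keep the measurement as the last element
    return d


sample = io.StringIO(
    "ComponentID,subComponent,Measurement\n"
    "X030,A1111111,784.26\n"
    "X030,A2222222,784.26\n"
    "X015,A1111111,997.35\n"
)
print(data_from_csv_dictreader(sample))
# {'X030': ['A1111111', 'A2222222', 784.26], 'X015': ['A1111111', 997.35]}
```

With a real file, open it with newline='' as the csv module documentation recommends and pass the file object in.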