Need a more efficient way to parse a CSV file in Python?

python csv

Here is a sample CSV file:

id, serial_no
2, 500
2, 501
2, 502
3, 600
3, 601

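The output I am looking for is each ID together with the list of its serial numbers.

I have already implemented a solution, but it takes too much code and I am sure there is a better one. I am still learning Python and do not know all the tricks yet.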

file = 'test.csv'

data = csv.reader(open(file))
fields = data.next()

for row in data:
    each_row = []
    each_row.append(row[0])
    each_row.append(row[1])
    zipped_data.append(each_row)
for rec in zipped_data:
    if rec[0] not in ids:
        ids.append(rec[0])
for id in ids:
    for rec in zipped_data:
        if rec[0] == id:
            ser_no.append(rec[1])
    tmp.append(id)
    tmp.append(ser_no)
    print tmp
    tmp = []
    ser_no = []

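** I omitted the variable initialization to keep the code short.

print tmp gives me the output mentioned above. I know there is a better, more Pythonic way to do this. It is too messy! Any suggestions would be great.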

import csv
from collections import defaultdict

records = defaultdict(list)

file = 'test.csv'

data = csv.reader(open(file))
fields = data.next()

for row in data:
    records[row[0]].append(row[1])

#sorting by ids since keys don't maintain order
results = sorted(records.items(), key=lambda x: x[0])
print results
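
For readers on Python 3, the same idea can be sketched as follows. This is a rough adaptation rather than the answerer's own code: the function name group_serials and the in-memory SAMPLE string (standing in for test.csv) are assumptions made for illustration, and skipinitialspace=True is added so the blank after each comma does not end up in the values.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('test.csv', newline=''); same rows as the sample file.
SAMPLE = "id, serial_no\n2, 500\n2, 501\n2, 502\n3, 600\n3, 601\n"


def group_serials(lines):
    """Return a sorted list of (id, [serial numbers]) pairs."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row (reader.next() in Python 2)
    records = defaultdict(list)
    for row_id, serial_no in reader:
        records[row_id].append(serial_no)
    # Tuples compare by their first element first, so no key function is needed.
    return sorted(records.items())


results = group_serials(io.StringIO(SAMPLE))
print(results)
# [('2', ['500', '501', '502']), ('3', ['600', '601'])]
```

Note that the IDs are sorted as strings here, so '10' would come before '2'; convert them with int() first if numeric order matters.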

If the list of serial numbers needs to be unique, just replace defaultdict(list) with defaultdict(set), and replace records[row[0]].append(row[1]) with records[row[0]].add(row[1]).
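
A minimal Python 3 sketch of that set-based variation. The rows list is hypothetical data shaped like what csv.reader would yield, with one duplicate added on purpose; the names unique_records and unique_results are also just for illustration.

```python
from collections import defaultdict

# Hypothetical rows as csv.reader would yield them, with one duplicate.
rows = [['2', '500'], ['2', '501'], ['2', '500'], ['3', '600']]

unique_records = defaultdict(set)
for row_id, serial_no in rows:
    unique_records[row_id].add(serial_no)  # add() ignores repeats

# Sets are unordered, so sort each one when a stable order matters.
unique_results = [
    (row_id, sorted(serials))
    for row_id, serials in sorted(unique_records.items())
]
print(unique_results)
# [('2', ['500', '501']), ('3', ['600'])]
```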

Instead of a list, I would make it a collections.defaultdict(list), and then just call the append() method on the value:
result = collections.defaultdict(list)
for row in data:
  result[row[0]].append(row[1])

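Some comments:
0) file is a built-in (a synonym for open), so it is a poor choice of variable name. Also, the variable actually holds a file name, so it should be named accordingly.
1) Once we have finished reading the file, we can close it. The simplest way to do that is a with block.
2) The first loop goes through all the rows, takes the first two elements of each row, and builds a list from those results. However, all of your rows contain only two elements, so this has no net effect. The csv reader is already an iterator over the rows, and the simple way to create a list from an iterator is to pass it to the list constructor.
3) You go on to build a list of unique ID values by checking manually. A collection of unique things is better known as a set, and the Python set ensures uniqueness automatically.
4) You named your data zipped_data. That is telling: applying zip to the list of rows produces a list of columns, and the IDs are simply the first column, converted to a set.
5) We can use a list comprehension to build the list of serial numbers for a given ID; you just tell it what you want.
6) Printing the results as we get them is messy and inflexible; it is better to build the whole chunk of data, so that the code that creates the data stands on its own and we can do something else with the data rather than just print it and forget it.
Applying these ideas, we get: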
import csv

filename = 'test.csv'

with open(filename) as in_file:
    data = csv.reader(in_file)
    data.next() # ignore the field labels
    rows = list(data) # read the rest of the rows from the iterator

print [
    # We want a list of all serial numbers from rows with a matching ID...
    [serial_no for row_id, serial_no in rows if row_id == id]
    # for each of the IDs that there is to match, which come from making
    # a set from the first column of the data.
    for id in set(zip(*rows)[0])
]
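
The zip(*rows)[0] trick in that listing relies on Python 2, where zip returns a list. Below is a hedged Python 3 sketch of the same transposition; the rows list is hypothetical data standing in for list(data), and the names id_column, serial_column and grouped are made up for the example. Unlike the listing above, it also keeps each ID next to its serial numbers and sorts the IDs.

```python
# Hypothetical rows, as list(csv.reader(...)) would return them after the header.
rows = [['2', '500'], ['2', '501'], ['2', '502'], ['3', '600'], ['3', '601']]

# zip(*rows) transposes rows into columns; in Python 3 it is lazy, so
# unpack it (or wrap it in list()) instead of indexing it directly.
id_column, serial_column = zip(*rows)

grouped = [
    [row_id, [serial_no for rid, serial_no in rows if rid == row_id]]
    for row_id in sorted(set(id_column))
]
print(grouped)
# [['2', ['500', '501', '502']], ['3', ['600', '601']]]
```

This rescans every row once per distinct ID, which is fine for a small file but slower than a single pass with a dictionary when there are many IDs.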

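By using the groupby function from the itertools module, we could probably do even better.

Here is a version I wrote; it looks like there are plenty of answers already.
You may like to use csv.DictReader, which gives easy access to each column by field name (taken from the header/first row).
The example using itertools.groupby only works if the rows are already grouped by id.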
from csv import DictReader
from itertools import groupby
from operator import itemgetter

filename = 'test.csv'

# the context manager ensures that infile is closed when it goes out of scope
with open(filename) as infile:

    # group by id - this requires that the rows are already grouped by id
    groups = groupby(DictReader(infile), key=itemgetter('id'))

    # loop through the groups printing a list for each one
    for i,j in groups:
        print [i, map(itemgetter(' serial_no'), list(j))]

Note the space in front of ' serial_no'. It is there because the input file has a space after each comma.
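
Both caveats (the leading space in the key and the need for pre-grouped rows) can be worked around. The following Python 3 sketch is one possible approach, not the answerer's code: it passes skipinitialspace=True so the header key becomes 'serial_no', and it sorts the rows by id before calling groupby. The SAMPLE string is an assumed stand-in for the file, with rows deliberately out of order.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Stand-in for the file; the rows are deliberately not grouped by id.
SAMPLE = "id, serial_no\n3, 600\n2, 500\n3, 601\n2, 501\n"

reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
# groupby only merges adjacent rows, so sort by the grouping key first.
ordered = sorted(reader, key=itemgetter('id'))
by_id = [
    [row_id, [row['serial_no'] for row in group]]
    for row_id, group in groupby(ordered, key=itemgetter('id'))
]
print(by_id)
# [['2', ['500', '501']], ['3', ['600', '601']]]
```

Sorting reads the whole file into memory, so this gives up the streaming behaviour that makes groupby attractive for input that is already grouped.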
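
("Pythonic", not "Pythonian".) You will never be a "Pythonista" with comments like that.
Please clarify: short code does not mean efficient code.
You have 600, 600, but probably meant 600, 601.
Yes I did, that was a typo. Thanks.
This is beautiful! I have a lot to learn about Python. I think this is exactly what I need. One problem, though: it prints the records out of order (the IDs came out as 3, 2, 4). Why does that happen, and what can I do to fix it?
A dictionary does not keep its keys in order. You can use .items() to turn the dictionary into a list of key-value pairs and sort that list.
@tasha I added the sorting for you.
You can use operator.itemgetter(0) instead of the lambda function to sort by key, but when sorting dictionary items it is redundant: sorted will return the correct result without the key argument.
@gnibbler +1, thanks for the suggestion. It seems obvious now, but I never thought of it.
Thanks for the suggestions. I did not know about many of these features.
Actually, file is a deprecated reference to open. It is widely used as a variable name, and it was removed in Python 3.
groupby only works when the rows with the same id are already grouped. They probably are, but that is not specified. I added an answer using groupby for completeness.
Thanks for providing another solution. I wonder, which is the more appropriate way to parse a csv: by column name or by column position?
By column name is easier to read (if you or someone else is trying to figure out what the code is doing). By column index (position) may be slightly more efficient (negligible when reading small csv files).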
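
To make the name-versus-position trade-off concrete, here is a small Python 3 sketch that reads the same column both ways. The SAMPLE string and the variable names are assumptions for illustration only.

```python
import csv
import io

SAMPLE = "id, serial_no\n2, 500\n3, 600\n"

# By position: compact, but the reader has to know what column 1 holds.
positional = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
next(positional)  # skip the header row
by_position = [row[1] for row in positional]

# By name: DictReader takes the keys from the header row.
named = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
by_name = [row['serial_no'] for row in named]

print(by_position == by_name)
# True
```

Access by name keeps working if columns are reordered in the file, whereas access by position keeps working if a header is renamed.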
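#!/usr/bin/python
import csv

myFile = open('sample.csv', 'rb')
# the first row is used for field names (by default);
# assuming sample.csv has the layout shown in the question,
# skipinitialspace drops the blank after each comma, so the key is
# 'serial_no' rather than ' serial_no'
csvFile = csv.DictReader(myFile, skipinitialspace=True)

myData = {}

for myRow in csvFile:
    myId = myRow['id']
    # has_key() is gone in Python 3; the in operator works in both
    if myId not in myData:
        myData[myId] = []
    myData[myId].append(myRow['serial_no'])

# print each id with its list of serial numbers, ordered by id
for myId in sorted(myData):
    print '%s %s' % (myId, myData[myId])

myFile.close()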
