Combining two nearly identical rows into one with csv.DictReader in Python 3

python python-3.x

I have the following data and can't figure out how to merge it in Python.

The data looks like this:

ID    OFFSET    TEXT
1     1         This text is short
2     1         This text is super long and got cut by the database s
2     2000      o it will come out like this
3     1         I'm short too

I've been trying to do this with csv.DictReader and csv.DictWriter.

Use itertools.groupby to group the rows by ID, then join the text:

import itertools
import operator

# dr is the csv.DictReader; rows sharing an ID must be adjacent for groupby to merge them
for dbid, rows in itertools.groupby(dr, key=operator.itemgetter('ID')):
    print(dbid, ''.join(row['TEXT'] for row in rows))
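For a self-contained illustration, here is a minimal sketch. It assumes the data was exported as a real comma-separated file; the in-memory `sample` string and the `merge_rows` helper are made up for this example and are not part of the question.

```python
import csv
import io
import itertools
import operator

# Hypothetical comma-separated version of the data from the question.
sample = """ID,OFFSET,TEXT
1,1,This text is short
2,1,This text is super long and got cut by the database s
2,2000,o it will come out like this
3,1,I'm short too
"""


def merge_rows(dict_reader):
    """Yield (ID, merged TEXT) pairs; rows sharing an ID must be adjacent."""
    by_id = operator.itemgetter("ID")
    for dbid, rows in itertools.groupby(dict_reader, key=by_id):
        yield dbid, "".join(row["TEXT"] for row in rows)


# io.StringIO stands in for an open file here.
merged = list(merge_rows(csv.DictReader(io.StringIO(sample))))
for dbid, text in merged:
    print(dbid, text)
```

Because groupby only looks at consecutive rows, a file in which the two rows for ID 2 are separated by other rows would yield ID 2 twice rather than one merged entry.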

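The classes csv.DictReader and csv.DictWriter are meant for CSV files. You could probably get them to read a file described by fixed columns, like the one you show, but that isn't really necessary and would complicate things.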

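Assuming the records are in order, you only need to do the following:

Read each line (throwing away the first one)
Read the ID, offset, and text (discarding the offset)
If the ID is new, store a mapping from the ID to the text
If the ID is not new, append the text

Python can do all of this without any modules.

Here is a first approach: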

text="""
ID    OFFSET    TEXT
1     1         This text is short
2     1         This text is super long and got cut by the database s
2     2000      o it will come out like this
3     1         I'm short too
""".strip()

lines = text.splitlines()
columns = lines.pop(0)  # don't need the columns
result = dict()

for line in lines:
    # the maxsplit arg is important to keep all the text
    id, offset, text = line.split(maxsplit=2)
    if id in result:
        result[id] += text
    else:
        result[id] = text

print("Result:")
for id, text in result.items():
    print(f"ID {id} -> '{text}'")

This uses f-strings, which require Python 3.6 or later, but you can get the same result without them if you prefer, e.g.:

...
    print("ID %s -> '%s'" % (id, text))

Either way, the result is:

Result:
ID 1 -> 'This text is short'
ID 2 -> 'This text is super long and got cut by the database so it will come out like this'
ID 3 -> 'I'm short too'
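If the records are not guaranteed to arrive in order, one option is to keep the offset and sort on it before merging. The sketch below assumes that ID and OFFSET are always integers and that a larger OFFSET means a later fragment; the `shuffled` input and the variable names are invented for the illustration.

```python
# Same whitespace-separated layout as above, but deliberately shuffled.
shuffled = """
2     2000      o it will come out like this
3     1         I'm short too
2     1         This text is super long and got cut by the database s
1     1         This text is short
""".strip().splitlines()

records = []
for line in shuffled:
    rec_id, offset, fragment = line.split(maxsplit=2)
    # Convert to int so that offset 2000 sorts after offset 1 numerically.
    records.append((int(rec_id), int(offset), fragment))

merged_sorted = {}
# Tuples sort by ID first, then by offset, so fragments are appended in order.
for rec_id, offset, fragment in sorted(records):
    merged_sorted[rec_id] = merged_sorted.get(rec_id, "") + fragment

for rec_id, fragment in merged_sorted.items():
    print(f"ID {rec_id} -> '{fragment}'")
```

Keep in mind that split() drops the whitespace in front of the text column, so a fragment that begins with a space would lose it.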

The conditional check "if id in result" is OK, but you can avoid it by using a defaultdict:
from collections import defaultdict

result = defaultdict(str)
for line in lines:
    id, offset, text = line.split(maxsplit=2)
    result[id] += text  # <-- much better

print("Result:")
for id, text in result.items():
    print(f"ID {id} -> '{text}'")
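Since the question also mentions csv.DictWriter, here is a rough sketch of how the merged mapping might be written back out as CSV. The `result` dictionary is repeated so the example runs on its own, and the output column names ID and TEXT are an assumption about what the export should look like.

```python
import csv
import io

# Merged result as produced above (repeated here so the sketch runs alone).
result = {
    "1": "This text is short",
    "2": "This text is super long and got cut by the database so it will come out like this",
    "3": "I'm short too",
}

# io.StringIO stands in for a real file; for a file on disk, open it in
# text mode with newline="" as the csv module documentation recommends.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ID", "TEXT"])
writer.writeheader()
for rec_id, text in result.items():
    writer.writerow({"ID": rec_id, "TEXT": text})

csv_output = buffer.getvalue()
print(csv_output)
```

The OFFSET column is left out on purpose, because it no longer means anything once the fragments have been joined.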
