Merging two almost identical rows with csv.DictReader in Python 3
I have the following data and just can't come up with a way to merge it in Python. The data looks like this:
ID OFFSET TEXT
1 1 This text is short
2 1 This text is super long and got cut by the database s
2 2000 o it will come out like this
3 1 I'm short too
I have been trying to use csv.DictReader and csv.DictWriter.
Use itertools.groupby to group the rows by ID, then join the text:
import itertools
import operator
# dr is the csv.DictReader
for dbid, rows in itertools.groupby(dr, key=operator.itemgetter('ID')):
    print(dbid, ''.join(row['TEXT'] for row in rows))
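The snippet above leaves dr undefined. As a minimal self-contained sketch, assuming the export is tab-delimited (the question does not say which delimiter the real file uses, and the sample string below is a stand-in for that file), the same groupby idea could look like this:

```python
import csv
import io
import itertools
import operator

# Hypothetical tab-delimited export; the real file's delimiter may differ.
sample = (
    "ID\tOFFSET\tTEXT\n"
    "1\t1\tThis text is short\n"
    "2\t1\tThis text is super long and got cut by the database s\n"
    "2\t2000\to it will come out like this\n"
    "3\t1\tI'm short too\n"
)

dr = csv.DictReader(io.StringIO(sample), delimiter="\t")

# groupby only merges consecutive rows that share a key,
# so all rows for one ID must be adjacent in the input.
merged = {
    dbid: "".join(row["TEXT"] for row in rows)
    for dbid, rows in itertools.groupby(dr, key=operator.itemgetter("ID"))
}

for dbid, text in merged.items():
    print(dbid, text)
```

Because groupby starts a new group whenever the key changes, this only works if the rows for each ID sit next to each other in the file.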
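The csv.DictReader and csv.DictWriter classes are meant for CSV files. Although you could probably get them to read a file with a fixed-column layout like the one you show, that isn't really necessary and would complicate things.
Assuming the records are in order, all you need to do is:
- read each line (throwing away the first)
- read the ID, offset and text (discarding the offset)
- if the ID is new, store a mapping from ID to text
- if the ID is not new, append the text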
text="""
ID OFFSET TEXT
1 1 This text is short
2 1 This text is super long and got cut by the database s
2 2000 o it will come out like this
3 1 I'm short too
""".strip()
lines = text.splitlines()
columns = lines.pop(0) # don't need the columns
result = dict()
for line in lines:
    # the maxsplit arg is important to keep all the text
    id, offset, text = line.split(maxsplit=2)
    if id in result:
        result[id] += text
    else:
        result[id] = text
print("Result:")
for id, text in result.items():
    print(f"ID {id} -> '{text}'")
This uses Python 3.6 f-strings, but you can get the same result without them if you prefer, for example:
...
    print("ID %s -> '%s'" % (id, text))
Either way, the result is:
Result:
ID 1 -> 'This text is short'
ID 2 -> 'This text is super long and got cut by the database so it will come out like this'
ID 3 -> 'I'm short too'
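The approach above relies on the records already being in order. If that cannot be guaranteed, one option is to sort by ID and numeric offset before concatenating. The sketch below assumes that both ID and OFFSET are always integers, which the question does not state, and uses deliberately shuffled sample rows:

```python
# Sample rows, shuffled on purpose to show why ordering matters.
lines = [
    "2 2000 o it will come out like this",
    "1 1 This text is short",
    "3 1 I'm short too",
    "2 1 This text is super long and got cut by the database s",
]

records = []
for line in lines:
    # maxsplit=2 keeps the whole text fragment in one piece
    row_id, offset, fragment = line.split(maxsplit=2)
    # Assumption: ID and OFFSET are integers; compare them as numbers, not strings
    records.append((int(row_id), int(offset), fragment))

# Order by ID first, then by offset within each ID
records.sort(key=lambda record: (record[0], record[1]))

result = {}
for row_id, _offset, fragment in records:
    result[row_id] = result.get(row_id, "") + fragment

for row_id, text in result.items():
    print(f"ID {row_id} -> '{text}'")
```

Converting the offset to int matters here: as strings, "2000" would sort before "300".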
The "if id in result" conditional check is fine, but you can avoid it by using a defaultdict:
from collections import defaultdict
result = defaultdict(str)
for line in lines:
    id, offset, text = line.split(maxsplit=2)
    result[id] += text # <-- much better
print("Result:")
for id, text in result.items():
    print(f"ID {id} -> '{text}'")
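Since the question also mentions csv.DictWriter, here is a rough sketch of how the merged result might be written back out as CSV. The ID and TEXT column names follow the sample data, the result dict is hard-coded so the sketch runs on its own, and the merged.csv file name in the comment is only a placeholder:

```python
import csv
import io

# Merged texts as the loop above would produce them (hard-coded for this sketch).
result = {
    "1": "This text is short",
    "2": "This text is super long and got cut by the database so it will come out like this",
    "3": "I'm short too",
}

# In-memory stand-in for a real file, e.g. open("merged.csv", "w", newline="")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["ID", "TEXT"])
writer.writeheader()
for row_id, text in result.items():
    writer.writerow({"ID": row_id, "TEXT": text})

print(out.getvalue())
```

When writing to a real file, passing newline="" to open() is what the csv module documentation recommends, so that the writer controls the line endings itself.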