Make a two-column CSV file show each user id once, with a space-separated list of meetings - python
The link above explains this well, but my case is a little different. My input looks like this:
user meetings
178787 287750
178787 151515
178787 158478
576585 896352
576585 985639
576585 456988
The expected result is:
user meetings
178787 "[287750,151515,158478]"
576585 "[896352,985639,456988]"
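How can I achieve this with Python and the code above? Thanks in advance.

You can read the file line by line, split each line, and add the meetings to a dictionary whose key is the user. With collections.defaultdict this can be done quite neatly: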
from collections import defaultdict
import csv

inpath = ''   # Path to input CSV file
outpath = ''  # Path to output CSV file

output = defaultdict(list)  # Dictionary like {user_id: [meetings]}

with open(inpath, newline='') as f:
    for row in csv.DictReader(f):
        output[row['user']].append(row['meetings'])

with open(outpath, 'w') as f:
    for user, meetings in output.items():
        row = user + ',' + str(meetings) + '\n'
        f.write(row)
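Note that str(meetings) writes the Python list representation (['287750', '151515', '158478']), which is not quite the format asked for. The following is a minimal sketch of one way to get the quoted "[a,b,c]" form, assuming the input really is comma-delimited with a user,meetings header. The function name group_csv and the in-memory sample are illustrative only. It relies on csv.writer quoting any field that contains the delimiter.

```python
import csv
import io
from collections import defaultdict


def group_csv(src, dst):
    """Read user,meetings rows from src and write one row per user to dst."""
    grouped = defaultdict(list)  # {user_id: [meeting, ...]}
    for row in csv.DictReader(src):
        grouped[row["user"]].append(row["meetings"])

    writer = csv.writer(dst)
    writer.writerow(["user", "meetings"])
    for user, meetings in grouped.items():
        # The field contains commas, so csv.writer wraps it in double quotes.
        writer.writerow([user, "[" + ",".join(meetings) + "]"])


# Hypothetical in-memory sample standing in for the real files.
src = io.StringIO(
    "user,meetings\n"
    "178787,287750\n178787,151515\n178787,158478\n"
    "576585,896352\n576585,985639\n576585,456988\n"
)
dst = io.StringIO()
group_csv(src, dst)
print(dst.getvalue())
```

With real files, pass file objects opened with newline='' instead of the StringIO buffers.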
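We can then write this dictionary back to the same file, using tabs so that everything lines up.
So, assuming your file is named f.csv, the code looks like this: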
d = {}
for l in open('f.csv').read().split('\n')[1:-1]:
    u, m = l.split()
    d.setdefault(u, []).append(m)
with open('f.csv', 'w') as f:
    f.write('user\tmeetings\n')
    for u, m in d.items():
        f.write(u + '\t' + str(m) + '\n')
which produces the following output:
user meetings
178787 ['287750', '151515', '158478']
576585 ['896352', '985639', '456988']
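The dict.setdefault call used here and the defaultdict(list) used in the earlier answer do the same job. As a small sketch on a hypothetical in-memory list of (user, meeting) pairs, both build the same grouping:

```python
from collections import defaultdict

# Hypothetical sample pairs, already split into (user, meeting).
pairs = [
    ("178787", "287750"),
    ("178787", "151515"),
    ("576585", "896352"),
]

# Variant 1: plain dict, creating the list on first access with setdefault.
by_setdefault = {}
for user, meeting in pairs:
    by_setdefault.setdefault(user, []).append(meeting)

# Variant 2: defaultdict creates the empty list automatically.
by_defaultdict = defaultdict(list)
for user, meeting in pairs:
    by_defaultdict[user].append(meeting)

print(by_setdefault == dict(by_defaultdict))
```

Which one to use is mostly a matter of taste. setdefault avoids an import, while defaultdict keeps the loop body slightly shorter.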
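Since the user will be the key, let's build a dictionary. Note: this ends up holding the whole file's contents in memory at once, but it does not require the file to be sorted by user first. Also note that the output is not sorted: dict.items() returns entries in insertion order on Python 3.7+, and in no guaranteed order on older versions.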
output = {}
with open('input.csv') as f:
    for line in f:
        # we strip newlines before splitting on whitespace
        user, meeting = line.strip('\r\n').split()
        if user == 'user':
            continue  # skip the header line
        if user not in output:
            # the user was not found in the dict
            output[user] = [meeting]  # add the user, with the first meeting
        else:  # user already exists in dict
            output[user].append(meeting)  # add meeting to user entry

# print output header
print("user meetings")  # I used a single space, feel free to use '\t' etc.

# let's retrieve all meetings per user
for user, meetings in output.items():  # in Python 2, use .iteritems() instead
    meetings = ','.join(meetings)  # format ["1","2","3"] to "1,2,3"
    print('{} "[{}]"'.format(user, meetings))
Fancier: sorted output. I sort the keys first. Note that this uses more memory, because I am also creating a list of the keys.
# same as before
output = {}
with open('input.csv') as f:
    for line in f:
        # we strip newlines before splitting on whitespace
        user, meeting = line.strip('\r\n').split()
        if user == 'user':
            continue  # skip the header line
        if user not in output:
            # the user was not found in the dict
            output[user] = [meeting]  # add the user, with the first meeting
        else:  # user already exists in dict
            output[user].append(meeting)  # add meeting to user entry

# print output header
print("user meetings")  # I used a single space, feel free to use '\t' etc.

# sort my dict keys before printing them:
for user in sorted(output.keys()):
    meetings = ','.join(output[user])
    print('{} "[{}]"'.format(user, meetings))
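One caveat about the sort: the keys are strings, so sorted() orders them as text. That matches numeric order only when all ids have the same number of digits, as in the sample data. A small sketch, assuming the ids are always plain integers (the short id "99" is made up for illustration):

```python
# Hypothetical grouped data; "99" is an invented id with fewer digits.
output = {
    "576585": ["896352"],
    "99": ["111111"],
    "178787": ["287750", "151515"],
}

text_order = sorted(output)              # compares the ids as strings
numeric_order = sorted(output, key=int)  # compares the ids as integers

print(text_order)
print(numeric_order)
```

If the ids may contain non-digit characters, key=int raises ValueError, so the plain string sort is the safer default.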
pandas offers a neat solution:
import pandas as pd

# read_csv has no `columns` argument; `usecols` selects columns by header name
df = pd.read_csv('myfile.csv', usecols=['user', 'meetings'])
df_grouped = df.groupby('user')['meetings'].apply(list).astype(str).reset_index()
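The snippet above leaves the result in a DataFrame, and astype(str) yields the list's repr (with spaces after the commas). The following is a rough sketch of one way to get the exact "[a,b,c]" form as CSV text, assuming whitespace-separated input as displayed in the question. The in-memory sample and the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for the real file.
raw = io.StringIO(
    "user meetings\n"
    "178787 287750\n178787 151515\n178787 158478\n"
    "576585 896352\n576585 985639\n576585 456988\n"
)

# Read both columns as strings so the ids can be joined directly.
df = pd.read_csv(raw, sep=r"\s+", dtype=str)

grouped = (
    df.groupby("user", sort=False)["meetings"]      # keep first-seen user order
      .agg(lambda s: "[" + ",".join(s) + "]")       # build "[a,b,c]" per user
      .reset_index()
)

# to_csv quotes the meetings field because it contains commas.
csv_text = grouped.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path as the first argument of to_csv.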
Post your current code.
I like the use of dict.setdefault here.
@cowbert Yes, it's a neat usage.
Thanks a lot, jp_data_analysis, it works well.