Python 3.x: Analyze by the unique values of one column, name the files from another column
I am using the code below to loop over the unique values in a dataset, and it works well, but I would like to change the export names to more suitable unique values for use in a dashboard. Below is my code; the x in the path is taken from the unique team names. For this part only, however, I want to assign the names from a list that lives outside the original dataframe.
team = df['RSA'].unique()
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % x
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
I have tried several ways to solve this:
1) Create a list and use the list in the path instead of x:
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
          'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % mylist
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
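This did not work, because it does not iterate over the list and does not line up the new values with the values from the original dataframe. Scratch that one off the list.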
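To see why the first attempt cannot work, it may help to print what the `%` operator produces when it is handed the whole list. This is a minimal sketch with a shortened list and a placeholder folder name (`regions` here stands in for the real path):

```python
# Formatting a whole list with %s inserts the list's text representation,
# not one element per loop iteration.
mylist = ['HR_DASH_0034', 'HR_DASH_0035']  # shortened for illustration

path = r'regions\%s.csv' % mylist
print(path)  # the same string would be produced on every pass of the loop
```

The resulting name contains brackets, quotes, and commas, and it never changes between iterations, so no per-team file can come out of it.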
2) I thought a loop inside the loop might do it:
team = df['RSA'].unique()
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
          'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
for x in team:
    for name in mylist:
        path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % name
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
That did not work either. It only gave me the last value in mylist, and it did not line up properly with the unique values in the team list.
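The symptom described here (only the last name is ever used) is what one would expect if just the path assignment sits inside the inner loop. That placement is an assumption, since the original indentation is not preserved. A small sketch with made-up team and file names:

```python
team = ['Region A', 'Region B', 'Region C']   # hypothetical team names
mylist = ['ID_1', 'ID_2', 'ID_3']             # hypothetical file names

used_paths = []
for x in team:
    for name in mylist:
        path = '%s.csv' % name   # reassigned on every inner pass
    # The inner loop has finished here, so path holds the last name.
    used_paths.append((x, path))

print(used_paths)
```

Every team ends up paired with `ID_3.csv`, so each export would overwrite the previous one. A nested loop visits every combination of team and name, which is not the one-to-one pairing the task needs.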
3) Next, I created a dataframe containing the unique values from team and the new list:
team = df['RSA'].unique()
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
          'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
# named data rather than dict to avoid shadowing the built-in
data = {'RSA': team, 'DASH_ID': mylist}
newdf = pd.DataFrame(data)
print(newdf)
                                RSA       DASH_ID
0          Intermountain Region, R4  HR_DASH_0034
1      Pacific Southwest Region, R5  HR_DASH_0035
2                Alaska Region, R10  HR_DASH_0036
3      Pacific Northwest Region, R6  HR_DASH_0037
4               Northern Region, R1  HR_DASH_0038
5                Eastern Region, R9  HR_DASH_0039
6   Albuquerque Service Center(ASC)  HR_DASH_0040
7         Rocky Mountain Region, R2  HR_DASH_0041
8       Research & Development(RES)  HR_DASH_0042
9             Washington Office(WO)  HR_DASH_0043
10          Southwestern Region, R3  HR_DASH_0044
11              Southern Region, R8  HR_DASH_0045
12            L2 Desc Not Available         empty
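However, I still cannot figure out how to use the elements of the DASH_ID column as the export names in the path shown above.
So in the end, when the files are written out, the name HR_DASH_0034 should line up with Intermountain Region, R4.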
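One way to use a frame like newdf for the file names is to turn its two columns into a lookup keyed by team name. The sketch below is only an illustration: it uses a two-row stand-in for newdf, the variable `name_for_team` is a made-up name, and `regions` is a placeholder folder rather than the real path.

```python
import pandas as pd

# Two-row stand-in for newdf; the real frame has 13 rows.
newdf = pd.DataFrame({
    'RSA': ['Intermountain Region, R4', 'Pacific Southwest Region, R5'],
    'DASH_ID': ['HR_DASH_0034', 'HR_DASH_0035'],
})

# Build a lookup from team name to dashboard file name.
name_for_team = dict(zip(newdf['RSA'], newdf['DASH_ID']))

for x in newdf['RSA']:
    # 'regions' is a placeholder folder, not the original path.
    path2 = r'regions\%s.csv' % name_for_team[x]
    print(x, '->', path2)
```

Because the lookup is keyed by the team name, the pairing stays correct even if the teams are later iterated in a different order.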
Thanks for your help.
In your first approach, just use:
mylist = [...] # your list definition
ml_iter = iter(mylist)
Inside the loop, replace mylist with:
path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %str(next(ml_iter))
Let me know if this helps.
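For reference, this is roughly how the iterator approach behaves when the iterator is created once, before the loop. The team and file names below are made up for illustration:

```python
team = ['Region A', 'Region B', 'Region C']   # hypothetical team names
mylist = ['ID_1', 'ID_2', 'ID_3']             # hypothetical file names

ml_iter = iter(mylist)   # create the iterator once, outside the loop
pairs = []
for x in team:
    name = next(ml_iter)  # advances by one position per team
    pairs.append((x, name))

print(pairs)
```

The pairing is purely positional, so it is only correct when mylist is in the same order as team. If mylist is shorter than team, `next` raises `StopIteration` once the names run out.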
Update: second solution
for x, m in zip(team, mylist):
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % m
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
Let me know if this works.
Ramsha, thanks for trying. I get an error: OSError: [Errno 22] Invalid argument: 'C:\\Users\\davidlopez\\Desktop\\regions\\.csv'. When I call the variable ml_iter, I get the following result:
Probably a type conversion error. I have updated my answer.
Sorry, in the end this did not work. The names from ml_iter get assigned to any of the teams in the team list.
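The zip-based loop from the second solution can be tried end to end on made-up data. In this sketch the contents of HROs, the names in mylist, and the temporary output folder are all invented for illustration; only the column names follow the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for HROs; only the column names follow the question.
HROs = pd.DataFrame({
    'RSA': ['Region A', 'Region A', 'Region B', 'Region B', 'Region B'],
    'Current Team Simple': ['Completed', 'Completed', 'Completed', 'Open', 'Completed'],
    'To Position Title': ['Clerk', 'Clerk', 'Analyst', 'Analyst', 'Clerk'],
    'RequestNumber': [1, 2, 3, 4, 5],
})

team = HROs['RSA'].unique()   # order of first appearance
mylist = ['ID_1', 'ID_2']     # hypothetical names, in the same order as team

# zip pairs by position and stops at the shorter input, so check lengths first.
if len(team) != len(mylist):
    raise ValueError('team and mylist must have the same length')

out_dir = Path(tempfile.mkdtemp())   # temporary folder instead of the real path
for x, m in zip(team, mylist):
    rows = HROs[(HROs['RSA'] == x) & (HROs['Current Team Simple'] == 'Completed')]
    top20 = rows.groupby('To Position Title')['RequestNumber'].count().nlargest(20)
    top20.to_csv(out_dir / f'{m}.csv', index=True, header=True)

print(sorted(p.name for p in out_dir.iterdir()))
```

Deriving team and mylist from the same source, or checking their order once by eye, matters here because zip has no way to know which name belongs to which team.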