Python 3.x: run an analysis per unique value of one column, name the files from another column


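The code below loops over the unique values in my dataset and works well, but I want to change the export names to more suitable unique values for use in a dashboard. Here is my code; the x in the path is taken from the unique team names. For this part only, however, I would like to assign the names from a list that lives outside the original dataframe.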

team = df['RSA'].unique()

for x in team:

    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %x
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
I tried several approaches to solve this:

1) Create a list and put the list in the path instead of x

mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
         'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']

for x in team:

    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %mylist
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
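This does not work, because it never iterates over the list and does not align the new values with the original dataframe values. Scratch that one off the list.

2) I thought maybe a loop inside the loop would do it: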
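To see why approach 1 fails, it helps to look at what `%s` does with a whole list. A minimal sketch, assuming a shortened two-item list and a bare file name instead of the full Windows path:

```python
# Shortened, illustrative version of mylist from the question.
mylist = ['HR_DASH_0034', 'HR_DASH_0035']

# A list on the right-hand side of % is treated as ONE value,
# so %s inserts the string form of the entire list.
path2 = '%s.csv' % mylist
print(path2)
```

Every pass through the loop therefore builds the same path, and nothing ties a name to a particular team.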

team = df['RSA'].unique()

mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
         'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']

for x in team:

    for name in mylist:
        path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %name
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
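That did not work either. It only gave me the last value in mylist, and it also did not line up properly with the unique values in the team list.

3) Next, I created a dataframe containing the unique values from team and the new list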
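The behaviour in approach 2 can be reproduced without pandas. In this sketch (shortened lists, bare file names, and a hypothetical `used` list for bookkeeping) the inner loop runs to completion before the rest of the outer loop body, so `path2` always holds the last name:

```python
# Shortened, illustrative stand-ins for the question's team and mylist.
team = ['Intermountain Region, R4', 'Pacific Southwest Region, R5', 'Alaska Region, R10']
mylist = ['HR_DASH_0034', 'HR_DASH_0035', 'HR_DASH_0036']

used = []  # hypothetical: records which path each team would be written to
for x in team:
    for name in mylist:
        path2 = '%s.csv' % name  # overwritten on every inner iteration
    # By the time we get here, path2 is built from mylist[-1].
    used.append((x, path2))

print(used)
```

Each team ends up paired with `HR_DASH_0036.csv`, which matches the symptom described above.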

team = df['RSA'].unique()

mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
         'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']

dict = {'RSA': team, 'DASH_ID': mylist}  

newdf = pd.DataFrame(dict) 

print (newdf)
                                RSA       DASH_ID
0          Intermountain Region, R4  HR_DASH_0034
1      Pacific Southwest Region, R5  HR_DASH_0035
2                Alaska Region, R10  HR_DASH_0036
3      Pacific Northwest Region, R6  HR_DASH_0037
4               Northern Region, R1  HR_DASH_0038
5                Eastern Region, R9  HR_DASH_0039
6   Albuquerque Service Center(ASC)  HR_DASH_0040
7         Rocky Mountain Region, R2  HR_DASH_0041
8       Research & Development(RES)  HR_DASH_0042
9             Washington Office(WO)  HR_DASH_0043
10          Southwestern Region, R3  HR_DASH_0044
11              Southern Region, R8  HR_DASH_0045
12            L2 Desc Not Available         empty
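However, I still cannot figure out how to use the values of the DASH_ID column as the export names in the path mentioned above.

So in the end, when the files are written out, the name HR_DASH_0034 should line up with Intermountain Region, R4.


Thanks for your help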
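One way to use the pairing from approach 3 is to turn it into a lookup from RSA value to DASH_ID. The sketch below uses plain lists and a hypothetical `name_for` dictionary; with the dataframe from the question, `dict(zip(newdf['RSA'], newdf['DASH_ID']))` should build the same kind of mapping.

```python
# Shortened, illustrative stand-ins for the question's team and mylist.
team = ['Intermountain Region, R4', 'Pacific Southwest Region, R5', 'Alaska Region, R10']
mylist = ['HR_DASH_0034', 'HR_DASH_0035', 'HR_DASH_0036']

# Hypothetical lookup table: RSA value -> export name.
name_for = dict(zip(team, mylist))

paths = {}
for x in team:
    # The file name is looked up by team, so order no longer matters here.
    paths[x] = '%s.csv' % name_for[x]

print(paths['Intermountain Region, R4'])
```

Because the name is looked up by key, the loop can visit the teams in any order and still pick the right file name.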

In your first approach, just use:

mylist = [...]  # your list definition
ml_iter = iter(mylist)
Inside the loop, replace mylist with:

path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %str(next(ml_iter))
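As a rough illustration of how the iterator behaves, here is a sketch with shortened lists and bare file names (the `paths` list is only there to show the result). Each call to `next` hands out the next name in list order, so the pairing depends entirely on `mylist` being in the same order as `team`:

```python
# Shortened, illustrative stand-ins for the question's team and mylist.
team = ['Intermountain Region, R4', 'Pacific Southwest Region, R5']
mylist = ['HR_DASH_0034', 'HR_DASH_0035']

ml_iter = iter(mylist)

paths = []
for x in team:
    # next() consumes one name per team, in list order.
    paths.append('%s.csv' % next(ml_iter))

print(paths)

# The iterator is now exhausted; a bare next(ml_iter) would raise
# StopIteration, so a default is passed here to show that safely.
leftover = next(ml_iter, None)
```

If `team` had more entries than `mylist`, the call to `next` inside the loop would raise `StopIteration`.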
Let me know if this helps.

Update: second solution


Let me know if this works.

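Ramsha, thanks for trying. I get an error: OSError: [Errno 22] Invalid argument: 'C:\\Users\\davidlopez\\Desktop\\regions\\.csv'. When I call the variable ml_iter, I get the following result:

Possibly a type conversion error. I updated my answer.

Sorry, in the end this did not work. The values from ml_iter end up assigned to just any entry of the team list.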
for x, m in zip(team, mylist):

    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' %m
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
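For anyone who wants to try the `zip` version without the original data, the following is a self-contained sketch. The small `HROs` frame, the position titles, and the temporary output directory are all made up for illustration; only the column names and the overall loop come from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the HROs dataframe from the question.
HROs = pd.DataFrame({
    'RSA': ['Intermountain Region, R4', 'Intermountain Region, R4',
            'Alaska Region, R10', 'Alaska Region, R10'],
    'Current Team Simple': ['Completed', 'Completed', 'Completed', 'Open'],
    'To Position Title': ['Forester', 'Forester', 'Biologist', 'Forester'],
    'RequestNumber': [1, 2, 3, 4],
})

team = HROs['RSA'].unique()               # order of first appearance
mylist = ['HR_DASH_0034', 'HR_DASH_0036'] # must be in the same order as team

out_dir = tempfile.mkdtemp()  # temporary folder instead of the Desktop path
written = {}

for x, m in zip(team, mylist):
    path2 = os.path.join(out_dir, '%s.csv' % m)
    # Keep only completed requests for the current RSA value.
    mask = (HROs['RSA'] == x) & (HROs['Current Team Simple'] == 'Completed')
    top20 = (HROs[mask]
             .groupby('To Position Title')['RequestNumber']
             .count()
             .nlargest(20))
    top20.to_csv(path2, index=True, header=True)
    written[x] = path2

print(sorted(os.listdir(out_dir)))
```

`zip` pairs the two sequences purely by position and stops at the shorter one, so it is worth checking that `mylist` follows the order returned by `unique()` before relying on the file names.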