Concat strings of DataFrame columns in a loop (Python 3.8)

python pandas string loops

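Suppose I have a DataFrame "DS_df" that contains strings and numbers. The three columns "LAultimateparentcountry", "borrowerultimateparentcountry" and "tot" form a relationship.

How can I create a dictionary out of those three columns (for the whole dataset, where order matters)? I need to access the two countries as one variable and tot as another. So far my attempt only yields a list of separate items. For some reason I also cannot get .join to work, as the df is quite large (900k+ rows).

The ideal result would be a dictionary where I can look up "Germany and Switzerland" and get, for example, 56708. Any help or advice is much appreciated.

Cheers

You can use a dict like this: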

countries_map = {}

for index, row in DS_df.iterrows():
    curr_rel = f'{row["LAultimateparentcountry"]}_{row["borrowerultimateparentcountry"]}'
    countries_map[curr_rel] = row["tot"]

If you don't want to overwrite the value of an existing key (and want to keep its first occurrence instead):
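A minimal sketch of that first-occurrence variant, assuming the same column names as in the question. The three-row DS_df is sample data for illustration only, and the membership check on the key is one plausible way to do it:

```python
import pandas as pd

# Sample data for illustration only (same column names as in the question).
DS_df = pd.DataFrame({
    "LAultimateparentcountry": ["India", "Germany", "India"],
    "borrowerultimateparentcountry": ["France", "Ireland", "France"],
    "tot": [56708, 87902, 91211],
})

countries_map = {}

for index, row in DS_df.iterrows():
    curr_rel = f'{row["LAultimateparentcountry"]}_{row["borrowerultimateparentcountry"]}'
    # Store the value only the first time this country pair is seen;
    # later rows with the same pair are skipped.
    if curr_rel not in countries_map:
        countries_map[curr_rel] = row["tot"]

print(countries_map)
```

Here "India_France" keeps the tot from row 0 rather than the one from row 2.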

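When operating on a DataFrame, it is usually better to think of a solution in terms of columns rather than rows.

If your DataFrame has 900k+ rows, applying vectorized operations to it may be a good choice.

Here are two solutions: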

Using pd.Series + to_dict():

pd.Series(DS_df.tot.values, index=DS_df.LAultimateparentcountry.str.cat(DS_df.borrowerultimateparentcountry, sep="_")).to_dict()

Using zip() + dict():

dict(zip(DS_df.LAultimateparentcountry.str.cat(DS_df.borrowerultimateparentcountry, sep="_"), DS_df.tot))

Test DataFrame:

import pandas as pd

DS_df = pd.DataFrame({
    'LAultimateparentcountry': ['India', 'Germany', 'India'],
    'borrowerultimateparentcountry': ['France', 'Ireland', 'France'],
    'tot': [56708, 87902, 91211]
})
DS_df


  LAultimateparentcountry borrowerultimateparentcountry    tot
0                   India                        France  56708
1                 Germany                       Ireland  87902
2                   India                        France  91211

Output of both solutions:

{'India_France': 91211, 'Germany_Ireland': 87902}

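If the keys that are formed contain duplicates, the value is updated, so the last occurrence wins (here India_France ends up as 91211, not 56708).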
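If the first occurrence should win instead, one column-wise option is to drop the repeated keys before converting to a dict. This is only a sketch under the same column-name assumptions. It uses Index.duplicated(keep="first") on the combined key, and the names tot_by_key, first_map and last_map are made up for the example:

```python
import pandas as pd

# Sample data for illustration only (same column names as in the question).
DS_df = pd.DataFrame({
    "LAultimateparentcountry": ["India", "Germany", "India"],
    "borrowerultimateparentcountry": ["France", "Ireland", "France"],
    "tot": [56708, 87902, 91211],
})

# Build the combined key column-wise, as in the two solutions above.
keys = DS_df["LAultimateparentcountry"].str.cat(
    DS_df["borrowerultimateparentcountry"], sep="_"
)
tot_by_key = pd.Series(DS_df["tot"].to_numpy(), index=keys)

# duplicated(keep="first") flags every repeat of a key after its first
# occurrence; negating the mask keeps only the first row per key.
first_only = tot_by_key[~tot_by_key.index.duplicated(keep="first")]

first_map = first_only.to_dict()   # first occurrence wins
last_map = tot_by_key.to_dict()    # default behaviour: last occurrence wins

print(first_map)
print(last_map)
```

Only the keys are deduplicated this way, so two different country pairs may still share the same tot.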

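Which solution is more efficient? Short answer:
zip() + dict()  # if the number of rows is roughly below 1,000,000
pd.Series + to_dict()  # if the number of rows is roughly above 1,000,000

Long answer: here are the tests.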

Test with 30 rows and 3 columns

zip() + dict()

pd.Series + to_dict()

Test with 6291456 rows and 3 columns

pd.Series + to_dict()

zip() + dict()
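To check the comparison on your own machine, something along these lines could be used. It is a rough sketch rather than the benchmark from the answer: make_df, series_to_dict and zip_dict are hypothetical helper names, the data is randomly generated, and the larger size is kept at 100,000 rows so that it runs quickly (raise it to test the million-row range):

```python
import timeit

import numpy as np
import pandas as pd


def make_df(n_rows, seed=0):
    """Build a synthetic frame with the question's column names (made-up data)."""
    rng = np.random.default_rng(seed)
    countries = np.array(["India", "Germany", "France", "Ireland", "Spain"])
    return pd.DataFrame({
        "LAultimateparentcountry": rng.choice(countries, size=n_rows),
        "borrowerultimateparentcountry": rng.choice(countries, size=n_rows),
        "tot": rng.integers(0, 100_000, size=n_rows),
    })


def series_to_dict(df):
    # Solution 1: pd.Series + to_dict()
    keys = df.LAultimateparentcountry.str.cat(df.borrowerultimateparentcountry, sep="_")
    return pd.Series(df.tot.values, index=keys).to_dict()


def zip_dict(df):
    # Solution 2: zip() + dict()
    keys = df.LAultimateparentcountry.str.cat(df.borrowerultimateparentcountry, sep="_")
    return dict(zip(keys, df.tot))


for n_rows in (30, 100_000):
    df = make_df(n_rows)
    # Total wall-clock time for 3 runs of each solution.
    t_series = timeit.timeit(lambda: series_to_dict(df), number=3)
    t_zip = timeit.timeit(lambda: zip_dict(df), number=3)
    print(f"{n_rows} rows: pd.Series + to_dict() {t_series:.4f}s, zip() + dict() {t_zip:.4f}s")
```

Timings depend on the pandas version and hardware, so the crossover point may not land at exactly one million rows.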

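The last one did the trick for me, thanks a lot. Is there any chance I can now get rid of the duplicates in this?

@MaximilianBach There are no duplicates in the keys. If there are more matches in the df, it will overwrite the current key in the dict with the matching value. If you mean the values: do you want to ignore a value if it already exists in the dict?

Yes. Sorry for not being precise, I'm fairly new to this field. It is indeed the values I meant! Is there a way to drop/ignore them right from the start? Thanks for your help.

@MaximilianBach You can add a condition that checks whether the value is not already among the dict's values. Re-edited the main answer.

Thx. What if I only want to ignore the rows where the country relationship is repeated? It seems to me that I am now also ignoring duplicates in the "tot" column. So, for example, Albania_UAE = 1 and Albania_Spain = 1 are both ignored and not added, because Albania_Argentina = 1 already exists with the value 1. How would I keep the country relationships unique while still allowing duplicates in the values? Thanks again for your patience!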
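The difference raised in the last comment can be illustrated with a small sketch. The rows are toy data built from the Albania example, and by_value and by_key are hypothetical names. Checking the dict's values drops any row whose tot has been seen before, while checking the key drops only repeated country pairs:

```python
# Toy rows based on the example in the comment: (lender, borrower, tot).
pairs = [
    ("Albania", "Argentina", 1),
    ("Albania", "UAE", 1),
    ("Albania", "Spain", 1),
]

by_value = {}  # skips a row when its tot is already stored as a value
by_key = {}    # skips a row only when its country pair is already stored

for lender, borrower, tot in pairs:
    key = f"{lender}_{borrower}"
    if tot not in by_value.values():
        by_value[key] = tot
    if key not in by_key:
        by_key[key] = tot

print(by_value)
print(by_key)
```

With the key check, all three relationships are kept even though they share the same tot, which seems to be the behaviour asked for.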