Python: replace values in a dataframe using another dataframe with strings as keys

python csv pandas dictionary replace

I've been trying this for a while and I'm stuck. Here is the problem:

I'm working with some metadata about texts in a CSV file. It looks like this:

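The real table is longer and more complex, but it follows the same logic: each row is a text, and each column is a different aspect of the text. Some columns contain a lot of variation, and I want to remodel them with a simpler scheme. For example, in narrative-perspective I want to turn the values homodiegetic and autodiegetic into non-heterodiegetic. I defined this new model in another CSV file called keywords, which looks like this: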

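As you can see, each column of the metadata becomes rows in keywords (the column name is in term_type), with the old value in the term_value column and the new value in the new_model column.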

So I need to map or replace these values using pandas. This is what I have so far:

import re
import pandas as pd

df_metadata = pd.read_csv("/metadata.csv", encoding="utf-8", sep=",")
df_keywords = pd.read_csv("/keywords.csv", encoding="utf-8", sep="\t")

for column_metadata,value_metadata in df_metadata.iteritems():

    if str(column_metadata) in list(df_keywords.loc[:,"term_type"]):
        
        df_metadata.loc[df_metadata[column_metadata] == value_metadata, column_metadata] = df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]

Python always returns this error:

"ValueError: Series lengths must match to compare"

I think the problem is value_metadata in the second part of the loc replacement, I mean:

df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]

What I don't understand is why value_metadata works in the first part of this command but not in the second.

Please, I would appreciate your help. Maybe there is a simpler way than iterating over the dataframe... I'm open to any suggestion. Best regards,

José
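Why value_metadata works in the first comparison but not in the second can be reproduced with a small sketch. It assumes hypothetical 3-row and 4-row tables, and uses items(), the current name of iteritems() (which was removed in pandas 2.0). The loop variable is the whole column as a Series, not a single cell, so the first comparison is a column against itself (always all True, which would also select every row), while the second compares Series of different lengths and raises a ValueError. The exact error message depends on the pandas version.

```python
import pandas as pd

# Hypothetical stand-ins: a 3-row metadata table and a 4-row keywords table.
df_metadata = pd.DataFrame(
    {"narrative-perspective": ["heterodiegetic", "homodiegetic", "autodiegetic"]}
)
df_keywords = pd.DataFrame({
    "term_value": ["heterodiegetic", "homodiegetic", "autodiegetic", "other/mixed"],
    "new_model": ["heterodiegetic", "n-heterodiegetic", "n-heterodiegetic", "n-heterodiegetic"],
})

raised = False
# items() yields (column name, whole column as a Series),
# so value_metadata is the entire 3-row column, not a single cell value.
for column_metadata, value_metadata in df_metadata.items():
    # First part: the column compared with itself -> same length and labels, all True.
    same_column = df_metadata[column_metadata] == value_metadata
    print(same_column.all())  # True

    # Second part: a 4-row Series compared with a 3-row Series -> ValueError.
    try:
        df_keywords["term_value"] == value_metadata
    except ValueError as exc:
        raised = True
        print("ValueError:", exc)
```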

You can first create a MultiIndex in df_keywords, so that the new values can be selected quickly by column name and old keyword inside a loop:

df_keywords.set_index(['term_type','term_value'], inplace=True)

idx = pd.IndexSlice
# first mapping, for the column narrative-perspective
print (df_keywords.loc[idx['narrative-perspective',:]].to_dict()['new_model'])
{'heterodiegetic': 'heterodiegetic', 'other/mixed': 'n-heterodiegetic', 
 'homodiegetic': 'n-heterodiegetic', 'autodiegetic': 'n-heterodiegetic'}

# names of the columns whose values are replaced
L = ['narrative-perspective','narrator','protagonist-gender']
for col in L:
    df_metadata[col] = df_metadata[col].map(df_keywords.loc[idx[col,:]].to_dict()['new_model'])

print (df_metadata)
     idno author-name narrative-perspective        narrator protagonist-gender
0  ne0001      Baroja      n-heterodiegetic    third-person               male
1  ne0002      Galdos        heterodiegetic    third-person             n-male
2  ne0003      Galdos      n-heterodiegetic    third-person               male
3  ne0004      Galdos      n-heterodiegetic    third-person             n-male
4  ne0005      Galdos        heterodiegetic    third-person             n-male
5  ne0006      Galdos        heterodiegetic    third-person               male
6  ne0007        Sawa        heterodiegetic    third-person             n-male
7  ne0008    Zamacois        heterodiegetic    third-person             n-male
8  ne0009      Galdos        heterodiegetic    third-person             n-male
9  ne0011      Galdos      n-heterodiegetic  n-third-person               male
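In the output above every cell received a new value, presumably because keywords lists every value that occurs. Series.map with a dict returns NaN for any value that has no key, so if some values are missing from keywords it may be safer to fall back to the original value with fillna. A minimal sketch with a hypothetical "unknown" value; in the loop above this would correspond to appending .fillna(df_metadata[col]) to the map call.

```python
import pandas as pd

# Hypothetical column with one value ("unknown") that the mapping does not cover.
s = pd.Series(["homodiegetic", "heterodiegetic", "unknown"])
mapping = {"heterodiegetic": "heterodiegetic", "homodiegetic": "n-heterodiegetic"}

mapped = s.map(mapping)            # values without a key become NaN
kept = s.map(mapping).fillna(s)    # fall back to the original value instead

print(mapped.tolist())  # ['n-heterodiegetic', 'heterodiegetic', nan]
print(kept.tolist())    # ['n-heterodiegetic', 'heterodiegetic', 'unknown']
```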

It is also possible to omit to_dict() and map by the Series directly:

df_keywords.set_index(['term_type','term_value'], inplace=True)
idx = pd.IndexSlice

# first mapping, for the column narrative-perspective
print (df_keywords.loc[idx['narrative-perspective',:]]['new_model'])
term_value
autodiegetic      n-heterodiegetic
heterodiegetic      heterodiegetic
homodiegetic      n-heterodiegetic
other/mixed       n-heterodiegetic
Name: new_model, dtype: object

L = ['narrative-perspective','narrator','protagonist-gender']
for col in L:
    df_metadata[col] = df_metadata[col].map(df_keywords.loc[idx[col,:]]['new_model'])

print (df_metadata)
     idno author-name narrative-perspective        narrator protagonist-gender
0  ne0001      Baroja      n-heterodiegetic    third-person               male
1  ne0002      Galdos        heterodiegetic    third-person             n-male
2  ne0003      Galdos      n-heterodiegetic    third-person               male
3  ne0004      Galdos      n-heterodiegetic    third-person             n-male
4  ne0005      Galdos        heterodiegetic    third-person             n-male
5  ne0006      Galdos        heterodiegetic    third-person               male
6  ne0007        Sawa        heterodiegetic    third-person             n-male
7  ne0008    Zamacois        heterodiegetic    third-person             n-male
8  ne0009      Galdos        heterodiegetic    third-person             n-male
9  ne0011      Galdos      n-heterodiegetic  n-third-person               male
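As an alternative that avoids the explicit loop over columns (a different technique from the answer above, not the answerer's method), DataFrame.replace accepts a nested dict {column: {old value: new value}} and replaces only in the listed columns. The sketch below builds that dict from the long keywords table with groupby; the miniature data is hypothetical (the female → n-male pair is invented). Unlike map, values that are not listed are left unchanged instead of becoming NaN, and only whole cell values are matched unless regex=True is passed.

```python
import pandas as pd

# Hypothetical miniature versions of the two CSV files.
df_metadata = pd.DataFrame({
    "idno": ["ne0001", "ne0002"],
    "narrative-perspective": ["homodiegetic", "heterodiegetic"],
    "protagonist-gender": ["male", "female"],
})
df_keywords = pd.DataFrame({
    "term_type": ["narrative-perspective", "narrative-perspective",
                  "protagonist-gender", "protagonist-gender"],
    "term_value": ["heterodiegetic", "homodiegetic", "male", "female"],
    "new_model": ["heterodiegetic", "n-heterodiegetic", "male", "n-male"],
})

# Nested dict {term_type: {term_value: new_model}}, one inner dict per metadata column.
nested = {
    term_type: dict(zip(group["term_value"], group["new_model"]))
    for term_type, group in df_keywords.groupby("term_type")
}

# replace() reads it as "in column X, replace old value with new value";
# columns not in the dict (here idno) are untouched.
df_new = df_metadata.replace(nested)
print(df_new)
```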

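Wow, thank you very much! :) It works! What do you think is the best way to build the list of metadata columns to remodel from the input of these two files? Right now I have three columns, but tomorrow I might have 20... I did this and it works, but surely you have a better way: `list_ = []; for column_name in df_metadata.columns.values: if str(column_name) in list(df_keywords.loc[:, "term_type"]): list_.append(column_name); print(list_)` I can't format code nicely in comments :( Sorry!

I think the simplest way is print(df_keywords.term_type.drop_duplicates().tolist())

Thanks! By the way, this is my first good day on Stack Overflow ;)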
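If keywords might also contain term_type values that are not (yet) columns of the metadata, one option is to keep only the metadata columns that appear in term_type, preserving their original order. A minimal sketch with hypothetical frames that reuse this thread's column names; if set_index has already been applied, the values would come from df_keywords.index.get_level_values("term_type") instead of the column.

```python
import pandas as pd

# Hypothetical frames with the column names and term_type values from this thread.
df_metadata = pd.DataFrame(
    columns=["idno", "author-name", "narrative-perspective", "narrator", "protagonist-gender"]
)
df_keywords = pd.DataFrame({
    "term_type": ["narrative-perspective", "narrative-perspective", "narrator",
                  "protagonist-gender", "protagonist-gender"],
})

# Build the lookup set once, then keep metadata columns in their original order.
term_types = set(df_keywords["term_type"])
cols = [c for c in df_metadata.columns if c in term_types]
print(cols)  # ['narrative-perspective', 'narrator', 'protagonist-gender']
```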