Python pandas: Replace column values based on a match with another column


python python-2.7 pandas dataframe


I have a column in the first DataFrame, df1["ItemType"], as shown below:

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace it based on a dictionary created from another DataFrame:

Dataframe2

ItemType2          newType
whitePotato        Potato
yellowPotato       Potato
redTomato          Tomato
yellowCabbage   
GreenCabbage    
yellowCauliflower   yellowCauliflower
greenCauliflower    greenCauliflower
YellowOnions        Onions
WhiteOnions         Onions
yelloSquash         Squash
redOnions           Onions
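To experiment with the answers below, the two tables can be rebuilt directly in code. This is a sketch that assumes the blank newType cells for the two cabbages are missing values (NaN); the variable names Dataframe1 and Dataframe2 follow the question.

```python
import numpy as np
import pandas as pd

# Dataframe1: the column whose values should be replaced.
Dataframe1 = pd.DataFrame({'ItemType1': [
    'redTomato', 'whitePotato', 'yellowPotato', 'greenCauliflower',
    'yellowCauliflower', 'yelloSquash', 'redOnions', 'YellowOnions',
    'WhiteOnions', 'yellowCabbage', 'GreenCabbage']})

# Dataframe2: the mapping table; the blank cells are assumed to be NaN.
Dataframe2 = pd.DataFrame({
    'ItemType2': ['whitePotato', 'yellowPotato', 'redTomato', 'yellowCabbage',
                  'GreenCabbage', 'yellowCauliflower', 'greenCauliflower',
                  'YellowOnions', 'WhiteOnions', 'yelloSquash', 'redOnions'],
    'newType': ['Potato', 'Potato', 'Tomato', np.nan, np.nan,
                'yellowCauliflower', 'greenCauliflower', 'Onions',
                'Onions', 'Squash', 'Onions']})

# Two mappings (the cabbages) are missing.
print(Dataframe2['newType'].isnull().sum())  # 2
```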

Note:

- In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
- Some of the ItemType values in dataframe2 have a null newType, such as yellowCabbage.
- The ItemType values in dataframe2 are not in the same order as the ItemType values in dataframe1.


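I need to replace the values in the Dataframe1 column with newType when the value in ItemType matches the corresponding ItemType in Dataframe2, keeping in mind the exceptions listed in the bullet points above.

If there is no match, the value needs to stay as it is [no change].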
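Here is what I have so far:
import pandas as pd

# Read the second CSV file (the mapping table)
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# Convert to a dict; this gives {ItemType: [newType]}, with each value wrapped in a list
df2 = df2.set_index('ItemType').T.to_dict('list')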
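Part of the trouble may be the shape of that dict: the transpose-and-to_dict('list') step wraps every newType in a one-element list. The sketch below uses an in-memory CSV as a hypothetical stand-in for mappings.csv, compares that nested dict with a flat {ItemType: newType} dict that skips the null rows, and then uses the flat dict with Series.replace.

```python
import io
import pandas as pd

# Hypothetical stand-in for mappings.csv: no header row, blank newType for the cabbages.
csv_text = u"""whitePotato,Potato
yellowPotato,Potato
redTomato,Tomato
yellowCabbage,
GreenCabbage,
"""
df2 = pd.read_csv(io.StringIO(csv_text), names=["ItemType", "newType"])

# The question's conversion: each value ends up wrapped in a list.
nested = df2.set_index('ItemType').T.to_dict('list')
print(nested['whitePotato'])  # ['Potato']

# A flat {old: new} dict, skipping rows whose newType is missing.
flat = df2.dropna(subset=['newType']).set_index('ItemType')['newType'].to_dict()

# Keys with a mapping are replaced; everything else is left as-is.
items = pd.Series(['redTomato', 'yellowCabbage', 'GreenCabbage'])
print(items.replace(flat).tolist())  # ['Tomato', 'yellowCabbage', 'GreenCabbage']
```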

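The "replace on match" approaches I tried do not work: they insert NaN values instead of the actual values. They are based on discussions such as this one or this one.

Thanks in advance.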
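Edit

The column headers in the two DataFrames have different names: the column in dataframe1 is ItemType1, and the first column in the second DataFrame is ItemType2. I missed this in the first version of the question.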
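This method requires the key column to be named "type" in both DataFrames; you can then use merge and np.where:
import numpy as np
# A left merge keeps the df1 rows that have no match in df2.
df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]
# Keep the original value where there was no match or newType is null.
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])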
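As the comments below point out, the question's frames have ItemType1 and ItemType2 rather than a "type" column, so joining on "type" raises KeyError: 'type'. A sketch of the full merge approach with the renaming step included, on a small subset of the data; purpleKale is a hypothetical item with no entry in df2, added to show the no-match case.

```python
import numpy as np
import pandas as pd

# Small subset of the question's data; 'purpleKale' is a hypothetical unmatched item.
df1 = pd.DataFrame({'ItemType1': ['redTomato', 'yellowCabbage', 'purpleKale']})
df2 = pd.DataFrame({'ItemType2': ['redTomato', 'yellowCabbage'],
                    'newType': ['Tomato', np.nan]})

# Give both key columns the common name 'type' so merge can join on it.
left = df1.rename(columns={'ItemType1': 'type'})
right = df2.rename(columns={'ItemType2': 'type'})

# how='left' keeps every row of df1, in df1's order, even without a match.
df3 = left.merge(right, how='left', on='type')

# Fall back to the original value when newType is missing (no match or null mapping).
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])
print(df3['newType'].tolist())  # ['Tomato', 'yellowCabbage', 'purpleKale']
```

With how='inner', the purpleKale row would disappear from df3 instead of being kept unchanged.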

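You can convert df2 into a Series indexed by 'ItemType2' and then use replace on df1:
# Make df2 a Series indexed by 'ItemType2', dropping null mappings.
df2 = df2.set_index('ItemType2')['newType'].dropna()

# Replace values in df1; values with no match are left unchanged.
df1['ItemType1'] = df1['ItemType1'].replace(df2)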

Or, in one line, if you don't want to modify df2:
df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
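The dropna() call matters here: a mapping whose new value is NaN replaces the old value with NaN, which is one plausible source of the NaN values described in the question. A small sketch comparing the two, where purpleKale is a hypothetical item with no entry in df2:

```python
import numpy as np
import pandas as pd

# Small subset of the question's data; 'purpleKale' is a hypothetical unmatched item.
df1 = pd.DataFrame({'ItemType1': ['redTomato', 'yellowCabbage', 'purpleKale']})
df2 = pd.DataFrame({'ItemType2': ['redTomato', 'yellowCabbage'],
                    'newType': ['Tomato', np.nan]})

mapping = df2.set_index('ItemType2')['newType']

# Without dropna(), the null mapping turns yellowCabbage into NaN.
with_nulls = df1['ItemType1'].replace(mapping)
# With dropna(), only non-null mappings are applied; purpleKale is untouched either way.
without_nulls = df1['ItemType1'].replace(mapping.dropna())

print(with_nulls.isnull().tolist())  # [False, True, False]
print(without_nulls.tolist())        # ['Tomato', 'yellowCabbage', 'purpleKale']
```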

Use map.

All the logic you need:
def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Use 'ItemType2' as the index of Dataframe2:

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object
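The fillna step is needed because Series.map returns NaN both for values missing from the index of t2 and for values mapped to a null newType; fillna(t1) then puts the original value back in both cases. A small sketch, where purpleKale is a hypothetical item with no entry in the mapping:

```python
import numpy as np
import pandas as pd

t1 = pd.Series(['redTomato', 'yellowCabbage', 'purpleKale'], name='ItemType1')
# Mapping Series indexed by the old value, like set_index('ItemType2').newType.
t2 = pd.Series(['Tomato', np.nan], index=['redTomato', 'yellowCabbage'])

mapped = t1.map(t2)
# Both "mapped to null" and "not in the mapping" show up as NaN.
print(mapped.isnull().tolist())  # [False, True, True]

# fillna(t1) aligns on the index and restores the original value wherever map gave NaN.
print(mapped.fillna(t1).tolist())  # ['Tomato', 'yellowCabbage', 'purpleKale']
```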


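Hi, thanks for the quick reply. I made some minor changes to the question: the column headers in the two DataFrames have different names, so the column in dataframe1 is ItemType1 and the first column in the second DataFrame is ItemType2. Also, the solution above gives the error KeyError: 'type'.

The 'type' error and the ItemType1/ItemType2 issue are the same problem. Specifically, I was joining on 'type', when in fact the DataFrames have no 'type' column, only 'ItemType1' and 'ItemType2'. Personally, I would rename the column to ItemType in both DataFrames and proceed, but the other solutions provided may fit your specific needs better.

+1 for the timings.

Hi, thanks for the response, still checking. The solution also needs to omit the ItemType1 items whose corresponding value in the second DataFrame's newType column is empty or null. So here it should drop yellowCabbage and GreenCabbage, so that they don't appear in the final table.

@Anil_M That's easy; I had deliberately put them back. I updated the post with an optional dropna parameter.

@piRSquared I get NameError: global name 'dropna' is not defined for return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

@Anil_M Did you add dropna=False to the signature of the function definition?

First, when I try to run this on one column now, I get a MemoryError. What can be done about that? Second, I'm trying to use this in my current work, but I need something more complex: I want to apply it to a large DataFrame with many columns (about 100) and many rows. How would I modify the code to do that?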
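For the follow-up about many columns, one option is to run the same map/fillna step column by column. This is a sketch only: update_columns, the column names, and purpleKale are hypothetical, and it is not a diagnosis of the MemoryError.

```python
import numpy as np
import pandas as pd


def update_columns(df, mapping, columns):
    """Hypothetical helper: apply map + fillna to each listed column of a copy of df."""
    out = df.copy()
    for col in columns:
        # Same logic as update_type: the mapped value, or the original when map gives NaN.
        out[col] = out[col].map(mapping).fillna(out[col])
    return out


mapping = pd.Series(['Tomato', 'Potato', np.nan],
                    index=['redTomato', 'whitePotato', 'yellowCabbage'])

# Hypothetical wide frame; 'count' is a non-item column that is left alone.
df = pd.DataFrame({'item_a': ['redTomato', 'yellowCabbage'],
                   'item_b': ['whitePotato', 'purpleKale'],
                   'count': [3, 5]})

result = update_columns(df, mapping, ['item_a', 'item_b'])
print(result)
```

Passing an explicit list of columns keeps non-item columns untouched; with about 100 columns, that list could be built from df.columns.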

With dropna=True, the items that end up without a value (here the two cabbages) are dropped:

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType,
            dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object
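Note that dropna=True drops every item that map turned into NaN, which would also include items with no entry in Dataframe2 at all; in the question's data every item has an entry, so only the cabbages disappear. If unmatched items should instead stay unchanged, as the question asks, one option is to drop only the items whose mapping exists but is null. A sketch, with a hypothetical function name and a hypothetical unmatched item purpleKale:

```python
import numpy as np
import pandas as pd


def update_type_drop_null_mappings(t1, t2):
    """Hypothetical variant of update_type: drop items whose mapping is null,
    keep items that are missing from t2 unchanged."""
    null_keys = t2[t2.isnull()].index
    kept = t1[~t1.isin(null_keys)]
    return kept.map(t2).fillna(kept)


t1 = pd.Series(['redTomato', 'yellowCabbage', 'purpleKale'], name='ItemType1')
t2 = pd.Series(['Tomato', np.nan], index=['redTomato', 'yellowCabbage'])

result = update_type_drop_null_mappings(t1, t2)
print(result.tolist())  # ['Tomato', 'purpleKale']
```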

Validation, putting the old and new values side by side:

updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])

Timing, comparing root's replace-based answer with the map-based update_type:

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
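A sketch of how the two functions could be checked for agreement and then timed with the standard timeit module, on a small subset of the question's data; absolute numbers depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)


def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(
        Dataframe2.set_index('ItemType2')['newType'].dropna())


def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)


Dataframe1 = pd.DataFrame({'ItemType1': ['redTomato', 'whitePotato', 'yellowCabbage']})
Dataframe2 = pd.DataFrame({'ItemType2': ['whitePotato', 'redTomato', 'yellowCabbage'],
                           'newType': ['Potato', 'Tomato', np.nan]})

# Both approaches should give the same values before their speed is compared.
assert root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

for func in (root, piRSquared):
    seconds = timeit.timeit(lambda: func(Dataframe1, Dataframe2), number=100)
    print('%s: %.4f s for 100 runs' % (func.__name__, seconds))
```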