Python: return DataFrame columns in a new DataFrame with changed values
Tags: python, pandas, dataframe

This function returns a nice list of the pieces of information I'm looking for. How can I return the 'State' column, but with its values replaced using the states dictionary?
import pandas as pd

def get_list_of_university_towns():
    states = {'CA': 'California', 'SC': 'South Carolina'}
    # filename.csv has many columns; 'State' and 'RegionName' are among them
    df = pd.read_csv(filename)
    df_res = df[['State', 'RegionName']]
    return df_res
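
For experimenting without the original file, the function above can be sketched against an in-memory CSV. This is only an illustration: the CSV contents and the `Other` column are made up, and the `source` parameter is an assumption added here so the function can accept a file-like object in place of `filename`.

```python
import io
import pandas as pd

# Hypothetical CSV contents; only the 'State' and 'RegionName' headers come from the question.
csv_text = """State,RegionName,Other
CA,Town A,1
SC,Town B,2
"""

def get_list_of_university_towns(source):
    # source may be a path or any file-like object accepted by pd.read_csv
    df = pd.read_csv(source)
    return df[['State', 'RegionName']]

towns = get_list_of_university_towns(io.StringIO(csv_text))
print(towns)
```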
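I tried return [df_res.loc[:, 'State'].replace(states), df['RegionName']], but it returns two separate objects (a list of two Series) rather than one DataFrame. I know I could do the replacement in the original df, but can I leave df as it is?

The first solution is to replace the column separately: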
def get_list_of_university_towns():
    states = {'CA': 'California', 'SC': 'South Carolina'}
    df = pd.read_csv(filename)
    # .copy() makes df_res independent of df and avoids SettingWithCopyWarning
    df_res = df[['State', 'RegionName']].copy()
    df_res['State'] = df_res['State'].replace(states)
    return df_res

Another solution is to name the column in the dict passed to replace, so that only 'State' is affected:

def get_list_of_university_towns():
    states = {'CA': 'California', 'SC': 'South Carolina'}
    df = pd.read_csv(filename)
    df_res = df[['State', 'RegionName']].replace({'State': states})
    return df_res
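
As a side note on the attempt from the question: a Python list holding two columns is just a list of two Series, not a DataFrame. The following is a minimal sketch of a non-mutating alternative using `DataFrame.assign`, which returns a new frame. The sample rows and the `population` column are invented for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents; only the column names
# 'State' and 'RegionName' come from the question.
df = pd.DataFrame({
    'State': ['CA', 'SC'],
    'RegionName': ['Town A', 'Town B'],
    'population': [1, 2],
})
states = {'CA': 'California', 'SC': 'South Carolina'}

# The attempt from the question: a plain list containing two Series.
attempt = [df.loc[:, 'State'].replace(states), df['RegionName']]

# assign() builds a new DataFrame, so df itself is left untouched.
df_res = df[['State', 'RegionName']].assign(
    State=lambda d: d['State'].replace(states)
)

print(type(attempt).__name__, len(attempt))
print(df_res)
print(df['State'].tolist())
```

Because `assign` overwrites an existing column in place within the new frame, 'State' keeps its position as the first column.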
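Sample:

df = pd.DataFrame({'State': ['SC', 'CA'], 'RegionName': ['CA', 'SC'], 'col': [5, 8]})
states = {'CA': 'California', 'SC': 'South Carolina'}
# only the 'State' column is replaced; 'RegionName' keeps its 'CA' / 'SC' values
df_res = df[['State', 'RegionName']].replace({'State': states})

print(df_res)
            State RegionName
0  South Carolina         CA
1      California         SC

# the original df is unchanged (columns print alphabetically here, as older pandas versions did)
print(df)
  RegionName State  col
0         CA    SC    5
1         SC    CA    8

I think the key here is to copy the original df and then modify the column, either by reassignment or with the inplace parameter. Here is the df definition I used to test the example: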
import pandas as pd

df = pd.DataFrame({'State': ['CA', 'SC', 'CA', 'SC', 'CA', 'SC', 'CA', 'SC'],
                   'RegionName': ['SW', 'NE', 'SW', 'NE', 'SW', 'NE', 'SW', 'NE'],
                   'College': ['College1', 'College2', 'College1', 'College2',
                               'College1', 'College2', 'College1', 'College2']})
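Printing df gives (older pandas versions list the columns alphabetically):

    College RegionName State
0  College1         SW    CA
1  College2         NE    SC
2  College1         SW    CA
3  College2         NE    SC
4  College1         SW    CA
5  College2         NE    SC
6  College1         SW    CA
7  College2         NE    SC

From there, I copied df and used your dictionary, states = {'CA': 'California', 'SC': 'South Carolina'}, to replace the column in the new df:

df_res = df.loc[:, ['State', 'RegionName']]
df_res['State'] = df_res.State.replace(states)

but it could also look like:

df_res = df.loc[:, ['State', 'RegionName']]
df_res.State.replace(states, inplace=True)

The inplace form relies on df_res.State being a view of df_res. Recent pandas versions may warn about this pattern, and with copy-on-write enabled it leaves df_res unchanged, so the reassignment form is the safer of the two.

This results in:

df =

    College RegionName State
0  College1         SW    CA
1  College2         NE    SC
2  College1         SW    CA
3  College2         NE    SC
4  College1         SW    CA
5  College2         NE    SC
6  College1         SW    CA
7  College2         NE    SC

df_res =

            State RegionName
0      California         SW
1  South Carolina         NE
2      California         SW
3  South Carolina         NE
4      California         SW
5  South Carolina         NE
6      California         SW
7  South Carolina         NE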
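
The copy-then-reassign approach can be wrapped in a small helper. This is a sketch under stated assumptions: the name `with_state_names` and the extra 'NY' row are hypothetical and added only to show what happens to a code that is missing from the dictionary. With `replace` such a value is kept as it is, whereas `Series.map` would turn it into NaN.

```python
import pandas as pd

def with_state_names(df, states):
    """Return a new two-column frame with state codes expanded; df is not modified."""
    out = df.loc[:, ['State', 'RegionName']].copy()  # explicit copy of the selection
    out['State'] = out['State'].replace(states)      # reassignment, no inplace
    return out

states = {'CA': 'California', 'SC': 'South Carolina'}

# 'NY' is deliberately absent from the dictionary (hypothetical extra row).
df = pd.DataFrame({'State': ['CA', 'SC', 'NY'],
                   'RegionName': ['SW', 'NE', 'NE'],
                   'College': ['College1', 'College2', 'College3']})

df_res = with_state_names(df, states)
mapped = df['State'].map(states)  # for comparison: unmapped codes become NaN

print(df_res)
print(df['State'].tolist())
print(mapped.isna().tolist())
```

If unmapped codes should be flagged rather than passed through, `map` followed by a check for missing values is one option; which behavior is preferable depends on the data.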