Python pandas: replace column values in a DataFrame based on a JSON object
I have a JSON object named "countries", shown below, containing a list of ISO codes for all countries:
countries = [{"name":"Afghanistan","alpha-2":"AF","country-code":"004"},{"name":"Åland Islands","alpha-2":"AX","country-code":"248"},{"name":"Albania","alpha-2":"AL","country-code":"008"},{"name":"Algeria","alpha-2":"DZ","country-code":"012"}]
I have a pandas DataFrame with a "Country" column:
Country
--------
Albania
Algeria
Algeria
I want to replace each "name" in the Country column with the matching "alpha-2" value from the JSON object. The result should be:
Country
---------
AL
DZ
DZ
I am trying the following; it raises no error, but it does not change the values either:
df['Country'] = df['Country'].replace(lambda y: (x['alpha-2'] for x in countries) if y in (x['name'] for x in countries) else y)
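You are accessing the column Country via df['Country'], so if your DataFrame has other fields, including the alpha-2 in question, why not simply use df['Country'] = df['alpha-2']? It is faster than a lambda anyway.
Row-wise lambdas are discouraged in pandas, for the same reason as pd.Series.apply. A better approach is to construct a mapping dictionary and then use a vectorised method:
If you have already converted the JSON into a pandas DataFrame and you have a DataFrame with the Country column as shown, you can simply use the map function or the replace method; either works here: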
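A minimal sketch of the mapping-dictionary approach described above, assuming `countries` is already a Python list of dicts as in the question. The name `name_to_alpha2` is chosen here for illustration and does not come from the original posts.

```python
import pandas as pd

# Shortened copy of the list from the question.
countries = [
    {"name": "Afghanistan", "alpha-2": "AF", "country-code": "004"},
    {"name": "Albania", "alpha-2": "AL", "country-code": "008"},
    {"name": "Algeria", "alpha-2": "DZ", "country-code": "012"},
]
df = pd.DataFrame({"Country": ["Albania", "Algeria", "Algeria"]})

# Build one {name: alpha-2} lookup from the list of records.
# `name_to_alpha2` is an illustrative name, not part of pandas.
name_to_alpha2 = {c["name"]: c["alpha-2"] for c in countries}

# Map the whole column at once instead of calling a lambda per row.
df["Country"] = df["Country"].map(name_to_alpha2)
print(df["Country"].tolist())
```

Building the dictionary once keeps the per-row work down to a single lookup.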
df['Country'] = df['Country'].map({'Albania': 'AL', 'Algeria': 'DZ'})
Alternatively, you can create a dictionary to make several replacements at once, as follows:
new_vals = {
'Albania': 'AL',
'Algeria': 'DZ',
}
# Series.replace returns a new Series, so assign the result back to the column
df['Country'] = df['Country'].replace(new_vals)
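The two methods behave differently when a value has no entry in the dictionary, which may matter with real data. The sketch below illustrates this with a made-up country name ("Narnia", an assumption added purely for the example).

```python
import pandas as pd

new_vals = {"Albania": "AL", "Algeria": "DZ"}
# "Narnia" is a made-up value with no entry in new_vals.
s = pd.Series(["Albania", "Algeria", "Narnia"], name="Country")

# map: values missing from the dictionary become NaN.
mapped = s.map(new_vals)

# replace: values missing from the dictionary are left unchanged.
replaced = s.replace(new_vals)

# map plus fillna: keep the original value where no mapping exists.
mapped_keep = s.map(new_vals).fillna(s)

print(mapped.isna().tolist())
print(replaced.tolist())
print(mapped_keep.tolist())
```

Which behaviour is preferable depends on whether unmatched names should be flagged (NaN) or passed through as they are.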
You can build a new {country: country_code} dictionary this way, using country_to_country_code = {v['name']: v['alpha-2'] for v in countries}, and then simply map your Country column with this country_to_country_code dictionary:
import pandas as pd
df = pd.DataFrame({"Country":["Albania", "Algeria", "Algeria"]})
countries = [{"name":"Afghanistan","alpha-2":"AF","country-code":"004"},{"name":"Åland Islands","alpha-2":"AX","country-code":"248"},{"name":"Albania","alpha-2":"AL","country-code":"008"},{"name":"Algeria","alpha-2":"DZ","country-code":"012"}]
country_to_country_code= {v['name']:v['alpha-2'] for v in countries}
df.loc[:, 'Country'] = df['Country'].map(country_to_country_code)
print(df)
Output:
Country
0 AL
1 DZ
2 DZ
You can convert the list into a DataFrame, say df2, and then do the replacement:
import pandas as pd
countries = [{"name":"Afghanistan","alpha-2":"AF","country-code":"004"},{"name":"Åland Islands","alpha-2":"AX","country-code":"248"},{"name":"Albania","alpha-2":"AL","country-code":"008"},{"name":"Algeria","alpha-2":"DZ","country-code":"012"}]
df2 = pd.DataFrame(countries)
# your original DataFrame with the Country column (DataFrame.from_items was removed in pandas 1.0)
df1 = pd.DataFrame({'Country': ['Afghanistan', 'Algeria', 'Albania']})
df1['Country'] = df1['Country'].replace(df2.set_index('name')['alpha-2'])
df1 should then look like:
Country
0 AF
1 DZ
2 AL
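I think the issue is that the OP starts from a JSON object, so they are looking for a way to convert the JSON into a single mapping if pd.Series.map is to be used.
@jpp, that may be true :-), otherwise map would be appropriate.
This is good in the sense that the OP wants to convert the JSON into a single mapping. +1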
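If the data arrives as JSON text rather than an already-parsed Python list, the single mapping discussed in these comments could be built roughly as follows. This is a sketch under that assumption; `build_name_to_alpha2` and `raw` are hypothetical names introduced here, not part of pandas or the original posts.

```python
import json

import pandas as pd

# Assumed input: the country list as a JSON string (shortened here).
raw = (
    '[{"name":"Albania","alpha-2":"AL","country-code":"008"},'
    '{"name":"Algeria","alpha-2":"DZ","country-code":"012"}]'
)


def build_name_to_alpha2(json_text):
    """Parse the JSON text and return a {name: alpha-2} dictionary (hypothetical helper)."""
    records = json.loads(json_text)
    return {record["name"]: record["alpha-2"] for record in records}


mapping = build_name_to_alpha2(raw)

df = pd.DataFrame({"Country": ["Albania", "Algeria", "Algeria"]})
df["Country"] = df["Country"].map(mapping)
print(df)
```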