Python: remapping values to other values while also providing a default value (python, pandas)


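I have a table in which I need to map the values of the Territory column: NY and CA should map to Domestic, WT should map to OUTSIDE, and every other value should map to OVERSEAS.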

di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}

df.replace({'Territory': di})
How can I supply OVERSEAS in the code above?

Nothing in the dictionary maps to OVERSEAS, so it has to be supplied as a default. Use map, which returns missing values for unmatched entries, and then fillna to replace those missing values with the default:

import pandas as pd

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
print(df)
  Territory
0        NY
1        CA
2        WT
3        SK
4        DE

# map() yields NaN for values that are not keys of di; fillna() swaps in the default
df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
print(df)
  Territory
0  Domestic
1  Domestic
2   OUTSIDE
3  OVERSEAS
4  OVERSEAS
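
To make the difference between the question's replace call and the map/fillna answer concrete, here is a minimal sketch using the same sample data. The variable names replaced, mapped and with_default are illustrative only and do not appear in the original thread.

```python
import pandas as pd

df = pd.DataFrame({'Territory': ['NY', 'CA', 'WT', 'SK', 'DE']})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# replace() only rewrites values that appear as keys in di;
# unmatched values such as 'SK' and 'DE' pass through unchanged.
replaced = df['Territory'].replace(di)

# map() turns unmatched values into NaN, which fillna() can then overwrite.
mapped = df['Territory'].map(di)
with_default = mapped.fillna('OVERSEAS')

print(replaced.tolist())      # ['Domestic', 'Domestic', 'OUTSIDE', 'SK', 'DE']
print(mapped.isna().tolist()) # [False, False, False, True, True]
print(with_default.tolist())  # ['Domestic', 'Domestic', 'OUTSIDE', 'OVERSEAS', 'OVERSEAS']
```

This is why replace alone cannot express a default: it has no notion of "everything else", whereas map leaves a marker (NaN) that a later step can fill.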
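The map/fillna approach above works, but it is slower than necessary because it must first perform the mapping and then go back over the result to fill in the missing elements. Taking advantage of Python's built-in dictionary can improve performance significantly.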

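There are two ways to use the flexibility of Python dictionary objects to create a default value. One is to use the get method on the mapping dictionary; the other is to use a defaultdict. As noted above, the advantage of both is that they avoid a second pass over the whole Series to replace NAs after mapping, and instead handle the default within the mapping step itself.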
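
As a rough side-by-side sketch of the two options just described, assuming the same sample data as the question (the names via_get and via_defaultdict are illustrative only):

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'Territory': ['NY', 'CA', 'WT', 'SK', 'DE']})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# Option 1: look each value up with dict.get and an explicit fallback.
via_get = df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))

# Option 2: a defaultdict seeded with di supplies the fallback for unknown keys.
dd = defaultdict(lambda: 'OVERSEAS', di)
via_defaultdict = df['Territory'].map(dd)

print(via_get.tolist())
print(via_defaultdict.tolist())
```

Both calls should produce the same labels. The defaultdict variant relies on pandas using the default of a dict subclass that defines __missing__ instead of NaN, which is the behavior described in the Series.map documentation.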

So, in short, I would suggest:

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
df['Territory'] = df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
Some timings that support the performance of this approach:

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}

%timeit df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
>>> 138 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

from collections import defaultdict
dd = defaultdict(lambda:'OVERSEAS')
dd.update(di)
# time the defaultdict (dd), not the plain dict (di)
%timeit df['Territory'].map(dd)
>>> 143 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
>>> 657 µs ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The difference in performance becomes more pronounced for larger dictionaries.

It is also worth noting that mapping a plain dict with missing keys, without any default, appears to be slow in pandas:

%timeit df['Territory'].map(di)
>>> 372 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
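
The %timeit lines above only run inside IPython or Jupyter. A rough equivalent with the standard-library timeit module might look like the sketch below. The Series size (10,000 rows), the repeat count, and the names s, candidates and timings are assumptions chosen for illustration, and the resulting numbers will vary by machine and pandas version.

```python
import timeit
from collections import defaultdict

import pandas as pd

di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}
dd = defaultdict(lambda: 'OVERSEAS', di)

# A larger Series than the 5-row example, so per-element cost dominates.
s = pd.Series(['NY', 'CA', 'WT', 'SK', 'DE'] * 2_000)

candidates = {
    'get': lambda: s.map(lambda x: di.get(x, 'OVERSEAS')),
    'defaultdict': lambda: s.map(dd),
    'map + fillna': lambda: s.map(di).fillna('OVERSEAS'),
}

runs = 20
timings = {}
for name, fn in candidates.items():
    # Average wall-clock seconds per call over `runs` executions.
    timings[name] = timeit.timeit(fn, number=runs) / runs
    print(f"{name:>12}: {timings[name] * 1e6:.0f} µs per call")
```

None of the candidates assigns back to the Series, so every run sees the same input.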

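It did not change; the column values remain the same. df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS') is not changing the values.

Can you provide a df that is simple enough for us to run, and elaborate on the expected input/output?

If jezrael's solution doesn't work, check whether df is part of another DataFrame.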
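
One possible reading of the last comment, sketched under assumptions: if df was taken from a larger DataFrame, an explicit copy makes it independent, and the mapped result only takes effect once it is assigned back. The frame full and its Sales column are hypothetical and not part of the original thread.

```python
import pandas as pd

# Hypothetical parent frame; df below is derived from it.
full = pd.DataFrame({'Territory': ['NY', 'CA', 'WT', 'SK', 'DE'],
                     'Sales': [10, 20, 30, 40, 50]})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# Take an explicit copy so the subset is an independent DataFrame.
df = full[full['Sales'] > 15].copy()

# map() returns a new Series; df is untouched until the result is assigned.
df['Territory'].map(di).fillna('OVERSEAS')
before = df['Territory'].tolist()

df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
after = df['Territory'].tolist()

print(before)                      # ['CA', 'WT', 'SK', 'DE']
print(after)                       # ['Domestic', 'OUTSIDE', 'OVERSEAS', 'OVERSEAS']
print(full['Territory'].tolist())  # ['NY', 'CA', 'WT', 'SK', 'DE']
```

Whether this matches the commenter's actual situation cannot be determined from the thread; it only illustrates two common reasons a column can appear unchanged.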