Python pandas: select columns, with a default value if a column does not exist
Suppose I have the following DataFrame:
>>> df
     val1  val2  val3
key
1       1     1     1
2       2     2     2
3       3     3     3
Now I want to select the columns val1 and val2 and, here's the kicker, val4, which does not exist in the DataFrame.
What I want is:
>>> df.something(something)
     val1  val2  val4
key
1       1     1   NaN
2       2     2   NaN
3       3     3   NaN
IIUC, use reindex.
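The call itself is not shown in this answer, so here is a minimal sketch of what a reindex-based selection could look like. The frame is rebuilt from the question. The names result and result_zero and the fill_value=0 variant are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question, with "key" as the index.
df = pd.DataFrame(
    {"val1": [1, 2, 3], "val2": [1, 2, 3], "val3": [1, 2, 3]},
    index=pd.Index([1, 2, 3], name="key"),
)

# Columns missing from df (here val4) are created and filled with NaN.
result = df.reindex(columns=["val1", "val2", "val4"])
print(result)

# fill_value replaces NaN as the default in the newly created columns.
result_zero = df.reindex(columns=["val1", "val2", "val4"], fill_value=0)
print(result_zero)
```

reindex returns a new DataFrame and leaves df unchanged, so val3 is still available on the original.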
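Also, .loc could do this as well, but it raises a warning: "Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative." Since pandas 1.0 this is an actual KeyError rather than a warning.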
df.loc[:,["val1", "val2", "val4"]]
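Because the behaviour of .loc with missing labels depends on the pandas version, one option is to try .loc and fall back to reindex when it fails. This is only a sketch under that assumption. The names wanted and selected are hypothetical, and calling reindex directly is simpler if you already know a column may be missing.

```python
import pandas as pd

# Rebuild the example frame from the question, with "key" as the index.
df = pd.DataFrame(
    {"val1": [1, 2, 3], "val2": [1, 2, 3], "val3": [1, 2, 3]},
    index=pd.Index([1, 2, 3], name="key"),
)
wanted = ["val1", "val2", "val4"]

try:
    # Recent pandas versions raise KeyError here because val4 is not a column;
    # older versions only warned and returned val4 as NaN.
    selected = df.loc[:, wanted]
except KeyError:
    # Fall back to reindex, which adds val4 as an all-NaN column.
    selected = df.reindex(columns=wanted)

print(selected)
```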
Something like this should get you started:
import pandas as pd
import numpy as np

df = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], columns=['val1', 'val2', 'val3'])

def check_columns(df, values):
    # Start from df's index so a missing first column still gets one NaN per row.
    temp = pd.DataFrame(index=df.index)
    for i in values:
        try:
            temp[i] = df[i]
        except KeyError:
            # Column i is not in df: fill it with the default value.
            temp[i] = np.nan
    return temp

print(check_columns(df, ['val1', 'val2', 'val3']))
print(check_columns(df, ['val1', 'val2', 'val4']))
This gives:
   val1  val2  val3
0     1     1     1
1     2     2     2
2     3     3     3
   val1  val2  val4
0     1     1   NaN
1     2     2   NaN
2     3     3   NaN