Conditionally changing cell values in a Python DataFrame
I want to replace the NaN values in 'open', 'high' and 'low' so that they equal 'close', but only when 'change' is 0.00.
Here is my code:
try:
    url = 'https://api.iextrading.com/1.0/stock/AAME/chart/1y'
    q_data = pd.read_json(url)
    if q_data.change == 0.00:
        q_data.open = q_data.close
        q_data.high = q_data.close
        q_data.low = q_data.close
except Exception:
    print("No data")
    continue
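The problem is that the try block is bypassed and execution goes straight to the except block.
How do I change the data correctly?

Here is one way to structure the logic. There is no built-in fillna variant that depends on other columns, but you can get the same result by combining Boolean arrays: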
import pandas as pd

df = pd.read_json('file.json')
mask = (df['change'] == 0).values  # rows where change is 0
for col in ['open', 'high', 'low']:
    # rows where change is 0 and this column is NaN
    col_mask = mask & df[col].isnull().values
    df.loc[col_mask, col] = df.loc[col_mask, 'close']
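As background on why the code in the question ends up in the except block: comparing a column with a scalar gives a Boolean Series, and using a Series as an `if` condition raises a `ValueError` because its truth value is ambiguous. The broad `except Exception` then catches that error. A minimal sketch, using a small made-up frame rather than the API data:

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the downloaded quotes (an assumption).
q_data = pd.DataFrame({'close': [100.0, 100.0],
                       'open': [np.nan, 5.0],
                       'change': [0.0, 3.0]})

raised = False
try:
    # q_data.change == 0.00 is a Boolean Series, not a single True/False.
    if q_data.change == 0.00:
        q_data.open = q_data.close
except ValueError as exc:
    # pandas refuses to reduce a multi-element Series to one Boolean.
    raised = True
    print(type(exc).__name__, exc)
```

Catching only the exceptions you expect, and printing them, would have made this error visible instead of reporting "No data".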
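Performance benchmark
A loop can be faster than the obvious vectorized approach. A pandas expert may be able to explain the performance difference. Data from @jezrael.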
import numpy as np
import pandas as pd

df = pd.DataFrame({'close': [100] * 6,
                   'open': [4, 5, 4, 5, np.nan, 4],
                   'high': [np.nan, 8, 9, 4, 2, 3],
                   'low': [1, 3, 5, 7, np.nan, np.nan],
                   'change': [0, 3, 6, 9, 0, 4]})
df = pd.concat([df] * 10000)

def jp(df):
    # loop over the three columns
    mask = (df['change'] == 0).values
    for col in ['open', 'high', 'low']:
        col_mask = mask & df[col].isnull().values
        df.loc[col_mask, col] = df.loc[col_mask, 'close']
    return df

def jez(df):
    # single 2-D mask built by broadcasting
    cols = ['open', 'high', 'low']
    m = df[cols].isnull().values & (df['change'] == 0).values[:, None]
    df[cols] = df[cols].mask(m, df['close'], axis=0)
    return df

%timeit jp(df)   # 9.09 ms
%timeit jez(df)  # 13.4 ms
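`%timeit` only works in IPython or Jupyter. Outside those, a comparable measurement could be sketched with the standard `timeit` module. The harness below is an assumption, not part of the original benchmark: it uses `ignore_index=True` to give the enlarged frame a unique index, and it passes a fresh copy on every call so each run starts from data that still contains NaN values. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'close': [100] * 6,
                     'open': [4, 5, 4, 5, np.nan, 4],
                     'high': [np.nan, 8, 9, 4, 2, 3],
                     'low': [1, 3, 5, 7, np.nan, np.nan],
                     'change': [0, 3, 6, 9, 0, 4]})
# ignore_index=True is a choice made for this sketch (unique row labels).
big = pd.concat([base] * 10000, ignore_index=True)
cols = ['open', 'high', 'low']

def jp(df):
    # Column-by-column loop with a 1-D mask.
    mask = (df['change'] == 0).values
    for col in cols:
        col_mask = mask & df[col].isnull().values
        df.loc[col_mask, col] = df.loc[col_mask, 'close']
    return df

def jez(df):
    # One 2-D mask built by broadcasting, applied with DataFrame.mask.
    m = df[cols].isnull().values & (df['change'] == 0).values[:, None]
    df[cols] = df[cols].mask(m, df['close'], axis=0)
    return df

runs = 10
# The copy is included in both timings, so the comparison stays fair.
t_jp = timeit.timeit(lambda: jp(big.copy()), number=runs)
t_jez = timeit.timeit(lambda: jez(big.copy()), number=runs)
print(f"jp:  {t_jp / runs * 1e3:.2f} ms per call")
print(f"jez: {t_jez / runs * 1e3:.2f} ms per call")
```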
I suggest a non-loop solution that uses mask and chains the Boolean masks through NumPy broadcasting:
df = pd.DataFrame({'close': [100] * 6,
                   'open': [4, 5, 4, 5, np.nan, 4],
                   'high': [np.nan, 8, 9, 4, 2, 3],
                   'low': [1, 3, 5, 7, np.nan, np.nan],
                   'change': [0, 3, 6, 9, 0, 4],
                   'col': [np.nan] * 6})
print(df)
   change  close  col  high  low  open
0       0    100  NaN   NaN  1.0   4.0
1       3    100  NaN   8.0  3.0   5.0
2       6    100  NaN   9.0  5.0   4.0
3       9    100  NaN   4.0  7.0   5.0
4       0    100  NaN   2.0  NaN   NaN
5       4    100  NaN   3.0  NaN   4.0

cols = ['open', 'high', 'low']
m = df[cols].isnull().values & (df['change'] == 0).values[:, None]
df[cols] = df[cols].mask(m, df['close'], axis=0)
# numpy alternative
# df[cols] = np.where(m, df['close'].values[:, None], df[cols])
print(df)
   change  close  col   high    low   open
0       0    100  NaN  100.0    1.0    4.0
1       3    100  NaN    8.0    3.0    5.0
2       6    100  NaN    9.0    5.0    4.0
3       9    100  NaN    4.0    7.0    5.0
4       0    100  NaN    2.0  100.0  100.0
5       4    100  NaN    3.0    NaN    4.0
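The commented-out NumPy alternative can be written out as a runnable function. This is only a sketch: the function name `fill_where` is made up for illustration, and it works on a copy so the input frame stays unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'close': [100] * 6,
                   'open': [4, 5, 4, 5, np.nan, 4],
                   'high': [np.nan, 8, 9, 4, 2, 3],
                   'low': [1, 3, 5, 7, np.nan, np.nan],
                   'change': [0, 3, 6, 9, 0, 4]})
cols = ['open', 'high', 'low']

def fill_where(frame):
    # Hypothetical helper name; same mask as in the answer above.
    out = frame.copy()
    m = out[cols].isnull().values & (out['change'] == 0).values[:, None]
    # Where m is True take close (broadcast across the 3 columns),
    # otherwise keep the existing value.
    out[cols] = np.where(m, out['close'].values[:, None], out[cols].values)
    return out

result = fill_where(df)
print(result[cols])
```

Only NaN cells in rows where change is 0 are replaced; a NaN in a row with a non-zero change is left alone.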
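Explanation:
Chaining the Boolean DataFrame with the Boolean Series directly is a problem and raises an error:
m = df[cols].isnull() & (df['change'] == 0)
ValueError: operands could not be broadcast together with shapes (18,) (3,)
The solution is to create an N x 1 array from the Series: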
print((df['change'] == 0).values[:, None])
[[ True]
 [False]
 [False]
 [False]
 [ True]
 [False]]

m = df[cols].isnull().values & (df['change'] == 0).values[:, None]
print(m)
[[False  True False]
 [False False False]
 [False False False]
 [False False False]
 [ True False  True]
 [False False False]]
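To make the role of `.values[:, None]` more concrete, the sketch below prints the shape at each step. Indexing with `None` adds an axis of length 1, so the (6,) row condition becomes (6, 1), and NumPy broadcasting then stretches it across the three columns of the (6, 3) NaN mask. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'close': [100] * 6,
                   'open': [4, 5, 4, 5, np.nan, 4],
                   'high': [np.nan, 8, 9, 4, 2, 3],
                   'low': [1, 3, 5, 7, np.nan, np.nan],
                   'change': [0, 3, 6, 9, 0, 4]})
cols = ['open', 'high', 'low']

row_cond = (df['change'] == 0).values    # 1-D, one flag per row
col_vec = row_cond[:, None]              # 2-D column vector
nan_mask = df[cols].isnull().values      # 2-D, one flag per cell
m = nan_mask & col_vec                   # col_vec is broadcast over columns

print(row_cond.shape, col_vec.shape, nan_mask.shape, m.shape)
```

A cell of `m` is True only if that cell is NaN and its row has change equal to 0.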
Can you explain m = df[cols].isnull().values & (df['change'] == 0).values[:, None] in more detail? Especially (df['change'] == 0).values[:, None].
Yes, give me some time.
@jpp - there is always the np.where solution; can you add timings, or shall I? Hmm, the loop should be faster here because there are only 3 columns, if I'm not mistaken.
@jezrael, please do add a numpy.where benchmark (to my answer or yours). I'd like to know why the loop approach is faster; it is surprising.
Did one of the solutions below help? Feel free to accept one (tick on the left) or ask for clarification.