Python: setting a value for a particular cell in a pandas DataFrame using the index
I have created a DataFrame:
df = DataFrame(index=['A','B','C'], columns=['x','y'])
and got this:
     x    y
A  NaN  NaN
B  NaN  NaN
C  NaN  NaN
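But the contents of df have not changed. The DataFrame again contains only NaNs.
Any suggestions?
df.set_value('C', 'x', 10) is far faster than the options I suggest below. However, it has been slated for deprecation.
Going forward, the recommended method is .iat/.at.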
Why df.xs('C')['x'] = 10 does not work:
By default, df.xs('C') returns a new DataFrame, so
df.xs('C')['x'] = 10
modifies this new DataFrame only.
df['x']
returns a view of the df DataFrame, so
df['x']['C'] = 10
modifies df itself.
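Warning: it is sometimes difficult to predict whether an operation returns a copy or a view. For this reason, assignments with chained indexing are best avoided.
So the recommended alternative is
df.at['C', 'x'] = 10
which does modify df.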
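As a minimal sketch of that recommendation, assuming a recent pandas imported as pd, the snippet below rebuilds the DataFrame from the question and sets the single cell with .at. The variable names simply mirror the question.

```python
import pandas as pd

# Same empty frame as in the question (every cell starts as NaN).
df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Label-based scalar assignment: row 'C', column 'x'.
df.at['C', 'x'] = 10

print(df)
```

Because the row and column are named in a single indexing call, there is no intermediate object that could turn out to be a copy.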
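The recommended way (according to the maintainers) to set a value was:
df.ix['C', 'x'] = 10
Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0; df.loc['C', 'x'] = 10 is the equivalent today.
Using "chained indexing" (df['x']['C']) may lead to problems.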
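The .set_value method is going to be deprecated (it was later removed in pandas 1.0). .iat/.at are good replacements; unfortunately, pandas provides little documentation for them.
The fastest way to do this is using set_value. This method is about 100 times faster than the .ix method. For example:
df.set_value('C', 'x', 10)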
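Try using df.loc[row_index, col_indexer] = value.
This is the only thing that worked for me:
df.loc['C', 'x'] = 10
Learn more about .loc in the pandas documentation.
I was also searching for this topic, and I put together a way to iterate through a DataFrame and update it with lookup values from a second DataFrame. Here is my code: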
import pandas as pd
# src_sql, src_connection, vertical_df and etl_load_key are defined elsewhere.
src_df = pd.read_sql_query(src_sql, src_connection)
for index1, row1 in src_df.iterrows():
    for index, row in vertical_df.iterrows():
        # .at replaces set_value, which was removed in pandas 1.0
        src_df.at[index1, 'etl_load_key'] = etl_load_key
        if row1['src_id'] == row['SRC_ID']:
            src_df.at[index1, 'vertical'] = row['VERTICAL']
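For comparison, the same lookup can be sketched without nested loops. This swaps in a different technique (Series.map against an indexed lookup) and is not the answer author's method. The sample frames and the scalar etl_load_key are hypothetical stand-ins, and the sketch assumes SRC_ID values are unique in vertical_df.

```python
import pandas as pd

# Hypothetical stand-ins for the frames used in the answer above.
src_df = pd.DataFrame({'src_id': [1, 2, 3]})
vertical_df = pd.DataFrame({'SRC_ID': [1, 3], 'VERTICAL': ['retail', 'energy']})
etl_load_key = 42  # assumed to be one scalar for the whole load

# A scalar assigned to a column is broadcast to every row.
src_df['etl_load_key'] = etl_load_key

# Look up VERTICAL by SRC_ID; ids without a match become NaN.
# Assumes SRC_ID is unique in vertical_df.
lookup = vertical_df.set_index('SRC_ID')['VERTICAL']
src_df['vertical'] = src_df['src_id'].map(lookup)

print(src_df)
```

One difference from the loop: rows without a match end up as NaN in a freshly assigned column, rather than keeping whatever value they had before.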
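You can also do a conditional lookup using .loc, as seen here:
df.loc[df[<some_column_name>] == <condition>, [<another_column_name>]] = <value_to_add>
where <some_column_name> is the column to check against <condition>, <another_column_name> is the column to assign to, and <value_to_add> is the value to set in the matching rows.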
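A concrete instance of that pattern might look as follows. The frame and its column names are made up for illustration only.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({'city': ['Oslo', 'Lima', 'Oslo'], 'score': [1, 2, 3]})

# The boolean mask selects the rows; the second argument names the column.
df.loc[df['city'] == 'Oslo', 'score'] = 0

# Assigning to a column that does not exist yet creates it;
# rows outside the mask are left as NaN.
df.loc[df['city'] == 'Lima', 'bonus'] = 1.5

print(df)
```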
You can use .iloc:
df.iloc[[2],[0]]=10
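Since .iloc works purely by position, here is a small sketch on the frame from the question, including one way to translate labels into positions with Index.get_loc when only the labels are known.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Positions are zero-based: row 2 is 'C', column 0 is 'x'.
df.iloc[2, 0] = 10

# Translate labels to positions when only the labels are known.
row_pos = df.index.get_loc('B')
col_pos = df.columns.get_loc('y')
df.iloc[row_pos, col_pos] = 20

print(df)
```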
If you want to change values not for the whole row, but only for some columns:
x = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x.iloc[1] = dict(A=10, B=-10)
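As of version 0.21.1 you can also use the .at method. There are some differences compared with the .loc approach mentioned above, but it is faster for single-value replacement.
In my example, I just change it in the selected cell:
for index, row in result.iterrows():
    if np.isnan(row['weight']):
        result.at[index, 'weight'] = 0.0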
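For comparison, the same replacement can be sketched without an explicit loop by combining a boolean mask with .loc. This is an alternative technique rather than the answer author's code, and the small result frame below is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for `result`.
result = pd.DataFrame({'weight': [1.5, np.nan, 3.0, np.nan]})

# Select every row whose weight is NaN and set those cells in one step.
result.loc[result['weight'].isna(), 'weight'] = 0.0

print(result)
```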
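Here, result is a DataFrame with a column 'weight'.
df.loc['C', 'x'] = 10
This changes the value at row C, column x.
In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an existing DataFrame. It shows that, for large DataFrames (at least under these test conditions), using at or set_value is the most efficient way.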
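- Create a new DataFrame for each row and...
  - ... append it (13.0 s)
  - ... concatenate it (13.1 s)
- Store all new rows in another container first, convert to a new DataFrame once, then append...
  - container = list of lists (2.0 s)
  - container = dictionary of lists (1.9 s)
- Preallocate the whole DataFrame, iterate over the new rows and all columns, and fill using
  - ... at (0.6 s)
  - ... set_value (0.4 s)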
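For the test, an existing DataFrame comprising 100,000 rows and 1,000 columns of random numpy values was used. To this DataFrame, 100 new rows were added.
The code is shown below: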
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 16:38:46 2018
@author: gebbissimo
"""
import pandas as pd
import numpy as np
import time
NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)
NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)
DATA_NEW = np.random.rand(1,NUM_COLS)
#%% FUNCTIONS
# NOTE: DataFrame.append was removed in pandas 2.0 and DataFrame.set_value in
# pandas 1.0, so the functions that call them need an older pandas release.

# create a one-row frame per new row and append it
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df

# create a one-row frame per new row and concatenate it
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df

# store new rows as a list of lists, then append once
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot

# store new rows as a dict of lists (one list per column), then append once
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot

# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
    return df

# preallocate and fill using set_value
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
    return df
#%% TESTS
t0 = time.time()
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
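On pandas 2.0 and later, where DataFrame.append no longer exists, the "collect first, convert once" strategy from the benchmark can be sketched with a single pd.concat call. The sizes below are deliberately much smaller than in the benchmark and are assumptions made only to keep the sketch quick; no timings are claimed for it.

```python
import numpy as np
import pandas as pd

# Hypothetical, much smaller sizes than in the benchmark above.
num_rows, num_cols, num_rows_new = 1000, 10, 100
df = pd.DataFrame(np.random.rand(num_rows, num_cols))

# Collect the new rows in a plain list first...
new_rows = [np.random.rand(num_cols) for _ in range(num_rows_new)]

# ...then build one frame from them and concatenate once.
df_new = pd.DataFrame(new_rows)
df_tot = pd.concat([df, df_new], ignore_index=True)

print(df_tot.shape)
```

Passing ignore_index=True renumbers the rows 0..N-1, which avoids the duplicate index labels that repeated appends of one-row frames would otherwise produce.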
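set_value() is deprecated.
Starting from release 0.23.4, pandas "announces the future": calling set_value emits a FutureWarning that points to the .at[] and .iat[] accessors instead.
Considering this advice, here is how to use them:
- by row/column integer positions (.iat)
- by row/column labels (.at)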
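The two access styles listed above can be sketched as follows. The frame and its column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({'car': ['Audi', 'Ford', 'Fiat'],
                   'price': [250.0, 180.0, 90.0]})

# By integer position: row 1, column 1.
df.iat[1, 1] = 260.0

# By label: row label 2, column 'car'.
df.at[2, 'car'] = 'BMW'

print(df)
```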
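Here is a summary of the valid solutions provided by all users, for DataFrames indexed by integer and by string.
df.iloc, df.loc and df.at work for both types of DataFrame. df.iloc only works with row/column integer positions, while df.loc and df.at support setting values using column names and/or integer indices.
When the specified index does not exist, both df.loc and df.at append the newly inserted rows/columns to the existing DataFrame, but df.iloc raises "IndexError: positional indexers are out-of-bounds". A working example, tested in Python 2.7 and 3.7, is as follows: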
import numpy as np, pandas as pd
df1 = pd.DataFrame(index=np.arange(3), columns=['x','y','z'])
df1['x'] = ['A','B','C']
df1.at[2,'y'] = 400
# rows/columns specified do not exist: appends new rows/columns to the existing data frame
df1.at['D','w'] = 9000
df1.loc['E','q'] = 499
# using df[<some_column_name>] == <condition> to retrieve target rows
# (.at is documented for single scalar cells; passing masks, lists or slices to it
# may not work in every pandas version, and .loc is the documented accessor for those)
df1.at[df1['x']=='B', 'y'] = 10000
df1.loc[df1['x']=='B', ['z','w']] = 10000
# using a list of index labels/positions to set values
df1.iloc[[1,2,4], 2] = 9999
df1.loc[[0,'D','E'],'w'] = 7500
df1.at[[0,2,"D"],'x'] = 10
df1.at[:, ['y', 'w']] = 8000
>>> df1
     x     y     z     w      q
0   10  8000   NaN  8000    NaN
1    B  8000  9999  8000    NaN
2   10  8000  9999  8000    NaN
D   10  8000   NaN  8000    NaN
E  NaN  8000  9999  8000  499.0
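The difference in how .loc and .iloc treat a missing row can be isolated in a short sketch. The frame below is a made-up example, and only the row axis is enlarged to keep the behavior easy to follow.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}, index=['A', 'B', 'C'])

# .loc with a row label that does not exist yet appends a new row.
df.loc['D', 'x'] = 4

# .iloc is purely positional and refuses to grow the frame.
try:
    df.iloc[10, 0] = 5
    grew = True
except IndexError:
    grew = False

print(df)
print('iloc enlarged the frame:', grew)
```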
.iat/.at is a good solution.
Suppose you have this simple DataFrame:
    A   B   C
0   1   8   4
1   3   9   6
2  22  33  52
If we want to modify the value of the cell [0, "A"], you can use one of these solutions:
df.iat[0, 0] = 2
df.at[0, 'A'] = 2
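And here is a complete example of how to use iat to get and set the value of a cell:
def prepossessing(df):
    for index in range(0, len(df)):
        df.iat[index, 0] = df.iat[index, 0] * 2
    return df
y_train before:
    0
0  54
1  15
2  15
3   8
4  31
5  63
6  11
y_train after calling the prepossessing function, which uses iat to multiply each cell's value by 2:
     0
0  108
1   30
2   30
3   16
4   62
5  126
6   22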
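The loop above is a good illustration of reading and writing single cells with iat. When every cell of a column gets the same treatment, a vectorized assignment is a common alternative; the sketch below is not the answer author's code and simply reuses the y_train values shown above.

```python
import pandas as pd

# Same values as y_train in the example above.
y_train = pd.DataFrame([54, 15, 15, 8, 31, 63, 11])

# Double the first column in one vectorized step instead of a Python loop.
y_train.iloc[:, 0] = y_train.iloc[:, 0] * 2

print(y_train)
```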
To set values, use:
df.at[0, 'clm1'] = 0
- The fastest recommended method for setting variables.
- set_value and ix have been deprecated.
- No warning, unlike iloc and loc.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 100))
%timeit df.iat[50,50]=50 # ✓
%timeit df.at[50,50]=50 # ✔
%timeit df.set_value(50,50,50) # will deprecate
%timeit df.iloc[50,50]=50
%timeit df.loc[50,50]=50
Timings, in the same order as the statements above:
7.06 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
5.52 µs ± 64.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.68 µs ± 80.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
98.7 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
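%timeit is an IPython magic, so the comparison above only runs in IPython or Jupyter. A rough equivalent for a plain Python script can be sketched with the standard-library timeit module. The helper function names are made up for this sketch, set_value is left out because it no longer exists in current pandas, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 100))

# One small function per accessor so timeit can call each repeatedly.
def set_iat():
    df.iat[50, 50] = 50

def set_at():
    df.at[50, 50] = 50

def set_iloc():
    df.iloc[50, 50] = 50

def set_loc():
    df.loc[50, 50] = 50

runs = 1000
timings = {}
for func in (set_iat, set_at, set_iloc, set_loc):
    # Average seconds per call over `runs` calls.
    timings[func.__name__] = timeit.timeit(func, number=runs) / runs

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```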
conditional_index = df.loc[ df['col name'] <condition> ].index
Example conditions: ==5, >10, =="Any string", >= DateTime
df.loc[conditional_index , [col name]]= <new value>
df.loc[conditional_index, [col1,col2]]= <new value>
df.loc[conditional_index, [col1,col2]]= df.loc[conditional_index,'col name']
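To make the placeholders concrete, here is a small sketch with made-up column names. It sets a scalar in several columns of the matching rows and then copies another column's values into a single column; copying into a single column keeps the alignment between the selected rows obvious.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({'qty': [5, 12, 20], 'a': [0, 0, 0], 'b': [0, 0, 0]})

# Index labels of the rows that satisfy the condition.
conditional_index = df.loc[df['qty'] > 10].index

# Same scalar into several columns of the selected rows.
df.loc[conditional_index, ['a', 'b']] = 1

# Copy another column's values into one column of the selected rows.
df.loc[conditional_index, 'b'] = df.loc[conditional_index, 'qty']

print(df)
```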
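# Chained indexing: may act on a temporary copy and trigger SettingWithCopyWarning (avoid)
df['x'].loc['C':] = 10
df
# Single .loc call: assigns directly on df
df.loc['C', 'x'] = 10
df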
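If the intent of the slice 'C': is to set every row from C onward, that can also be written as a single .loc call, which avoids the chained form. A minimal sketch, assuming a frame like the one in the question with one extra row added for illustration:

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], columns=['x', 'y'])

# Label slices are inclusive: rows 'C' and 'D', column 'x', in one indexing call.
df.loc['C':, 'x'] = 10

print(df)
```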
df.loc[index_position, "column_name"] = some_value
df[0][0] = '"236"76"'
# %timeit df[0][0] = '"236"76"'
# 938 µs ± 83.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df.at[0, 0] = '"236"76"'
# %timeit df.at[0, 0] = '"236"76"'
#15 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
df.iat[0, 0] = '"236"76"'
# %timeit df.iat[0, 0] = '"236"76"'
# 41.1 µs ± 3.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
df.loc[0, 0] = '"236"76"'
# %timeit df.loc[0, 0] = '"236"76"'
# 5.21 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df.iloc[0, 0] = '"236"76"'
# %timeit df.iloc[0, 0] = '"236"76"'
# 5.12 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)