Python Pandas: diff() with strings

python pandas

How can I flag a row in a dataframe every time a column changes its string value?

Example:

Input

Use shift and compare:

# != (rather than ==) so that True marks a row whose value differs from the previous row
dataframe['changed'] = dataframe['ColumnB'] != dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])
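
To make the shift-and-compare idea concrete, here is a minimal sketch. The sample frame is an assumption, rebuilt from the output printed further down in the thread, and `previous` is just an illustrative name. Filling the first row of the shifted column with the row's own value means the first row can never count as a change.

```python
import pandas as pd

# Sample data assumed from the output printed later in the thread.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Value of the previous row; the first row falls back to its own value.
previous = df["ColumnB"].shift(1).fillna(df["ColumnB"])

# True where the value differs from the previous row.
df["changed"] = df["ColumnB"] != previous

print(df["changed"].tolist())
```

Using `==` here instead would mark the rows that did not change.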

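What works for me is comparing with shift and then setting the first row to 0, because it has no previous value (it is compared against NaN):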

df['diff'] = (df.ColumnB != df.ColumnB.shift()).astype(int)
# .ix was removed in pandas 1.0; .loc is the label-based replacement
df.loc[df.index[0], 'diff'] = 0
print(df)
   ColumnA ColumnB  diff
0        1    Blue     0
1        2    Blue     0
2        3     Red     1
3        4     Red     0
4        5  Yellow     1

Edit, following another answer: the fastest is to use ne:

df['diff'] = (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
# .ix was removed in pandas 1.0; .loc is the label-based replacement
df.loc[df.index[0], 'diff'] = 0
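
The ne-and-shift approach can be wrapped so that the first row is reset by position rather than by label, which also works when the index does not start at 0. This is only a sketch: `flag_changes` is a hypothetical helper name, and the sample frame (with a deliberately non-default index) is assumed from the printed output above.

```python
import pandas as pd


def flag_changes(series: pd.Series) -> pd.Series:
    """Return 1 where a value differs from the previous row, else 0.

    Hypothetical helper, not part of pandas.
    """
    flags = series.ne(series.shift()).astype(int)
    if len(flags) > 0:
        # The first row has no predecessor, so it is never a change.
        flags.iloc[0] = 0
    return flags


# Sample data assumed from the printed output; the letter index is on purpose.
df = pd.DataFrame(
    {
        "ColumnA": [1, 2, 3, 4, 5],
        "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
    },
    index=list("abcde"),
)

df["diff"] = flag_changes(df["ColumnB"])
print(df)
```

Note that NaN never compares equal to NaN, so consecutive missing values would each be flagged as a change.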

I use ne instead of the actual != comparison to get better performance:
df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
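
As a quick check, the bfill variant and the reset-the-first-row variant should give the same flags on the sample data. The frame below is an assumption taken from the printed output earlier in the thread, and the variable names are illustrative.

```python
import pandas as pd

# Sample data assumed from the printed output earlier in the thread.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})
s = df["ColumnB"]

# Variant 1: back-fill the NaN that shift() leaves in the first row.
via_bfill = s.ne(s.shift().bfill()).astype(int)

# Variant 2: compare against the raw shift, then reset the first row.
via_reset = s.ne(s.shift()).astype(int)
via_reset.iloc[0] = 0

print(via_bfill.equals(via_reset))
```

The two variants can diverge if the column itself contains missing values, because bfill would also fill those gaps in the shifted copy.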

Timings
Using the following setup to produce a larger dataframe:
df = pd.concat([df]*10**5, ignore_index=True)

I get the following timings:
%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
10 loops, best of 3: 38.1 ms per loop

%timeit (df.ColumnB != df.ColumnB.shift()).astype(int)
10 loops, best of 3: 77.7 ms per loop

%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB'])
10 loops, best of 3: 99.6 ms per loop

%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
10 loops, best of 3: 19.3 ms per loop
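
Outside IPython, a comparison like the one above could be reproduced with the standard timeit module. This is a rough sketch under stated assumptions: the base frame is rebuilt from the printed output, the repeat factor is reduced to 10**3 to keep the run short, and the labels in `candidates` are illustrative. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Base frame assumed from the printed output earlier in the thread.
base = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Smaller than the 10**5 used in the thread, to keep the run short.
df = pd.concat([base] * 10**3, ignore_index=True)

candidates = {
    "ne + bfill": lambda: df["ColumnB"].ne(df["ColumnB"].shift().bfill()).astype(int),
    "!= shift": lambda: (df["ColumnB"] != df["ColumnB"].shift()).astype(int),
    "ne + shift": lambda: df["ColumnB"].ne(df["ColumnB"].shift()).astype(int),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    results[name] = best * 1000
    print(f"{name}: {results[name]:.3f} ms per call")
```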

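Very clear answer. I wonder whether there is any performance difference between this method and simply using != ?

@jezrael How can I do the same thing based on two columns?

@Navroop - do you think df[['ColumnA','ColumnB']].ne(df[['ColumnA','ColumnB']].shift()).any(axis=1).astype(int) ?

Performance note: it may be better to use the np.bool type rather than an integer. np.bool takes up one byte. I suppose you could use np.int8, but by default np.int32 or np.int64 is used (whatever a C long is on your system), I believe...

Could you please add timings for (df.ColumnB.ne(df.ColumnB.shift())).astype(int) ?

@jezrael: Added the timings. Using ix to set the first row to 0 adds ~1 ms to the timing, so it still looks like the fastest.

Hi, I used this answer in my script, but it raises a "SettingWithCopyWarning". Have you seen this? dff['changed']=dff.col1.ne(dff.col1.shift(1))

@root How can I get counts of the state transitions? That is, Blue->Red, Red->Yellow, in the same order in which they were detected.

@root With Red in between, can I find out directly that the state changed from Blue to Yellow?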
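
Three of the follow-up questions above (two columns, transition counts, and the SettingWithCopyWarning) can be sketched as follows. Everything here is illustrative: the `Size` column and all variable names are hypothetical, and the explicit `.copy()` addresses only one common cause of the warning, namely assigning into a frame that was sliced from another frame.

```python
import pandas as pd

df = pd.DataFrame({
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
    "Size": ["S", "M", "M", "M", "M"],  # hypothetical second column
})

# Two columns: flag a row when either column differs from the previous row.
cols = ["ColumnB", "Size"]
changed_any = df[cols].ne(df[cols].shift()).any(axis=1).astype(int)
changed_any.iloc[0] = 0  # the first row has no predecessor

# Transitions: pair each changed value with the value before it.
current = df["ColumnB"]
previous = current.shift()
is_change = current.ne(previous) & previous.notna()
transitions = previous[is_change] + "->" + current[is_change]
counts = transitions.value_counts()

# Working on a filtered frame: take an explicit copy before adding a column.
subset = df[df["Size"] == "M"].copy()
subset["changed"] = subset["ColumnB"].ne(subset["ColumnB"].shift())

print(changed_any.tolist())
print(transitions.tolist())
print(counts)
```

`transitions` keeps the changes in the order they occur, while `counts` aggregates them per distinct transition.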