Python pandas: How to select rows where the sign changes, keeping the one with the minimum value?
I am trying to find where a function crosses the x-axis. I am using the fact that the function's sign changes when it crosses the axis.

Now I have a dataframe like this, and at each sign change I want the row whose value is closest to zero, assuming the function crosses the x-axis at two points:
A value
0 105 0.662932
1 105 0.662932
2 107 0.052653 # sign changes here when A is 107
3 108 -0.228060 # among these two A 107 is closer to zero
4 110 -0.740819
5 112 -1.188906
6 142 -0.228060 # sign changes here when A is 142
7 143 0.052654 # among these two, A 143 is closer to zero
8 144 0.349638
Desired output:
A value
2 107 0.052653
7 143 0.052654
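import numpy as np
import pandas as pd

data = [
    [105, 0.662932],
    [105, 0.662932],
    [107, 0.052653],   # sign changes between here
    [108, -0.228060],  # and here; the first row's `value` is closer to 0
    [110, -0.740819],
    [112, -1.188906],
    [142, -0.228060],  # sign changes between here
    [143, 0.052654],   # and here; the second row's `value` is closer to 0
    [144, 0.349638],
]
df = pd.DataFrame(data, columns=["A", "value"])

# If two neighbouring elements have the same sign, the difference is 0;
# otherwise it is 2 or -2, and which of the two does not matter here.
# Take the difference forwards and backwards using periods=1 and periods=-1.
# fillna(0) keeps the NaN at the open end from counting as a sign change.
sign = df.value.map(np.sign)
diff1 = sign.diff(periods=1).fillna(0)
diff2 = sign.diff(periods=-1).fillna(0)

# Now we have the positions where the sign changes. We only need to extract
# `value` at those positions to decide which of the two candidates has the
# `value` closer to 0 for each sign change.
df1 = df.loc[diff1[diff1 != 0].index]
df2 = df.loc[diff2[diff2 != 0].index]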
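# NOTE: the source cuts this line off after `df1.value.values`; the comparison
# and the two index arrays are a reconstruction, not the original text.
idx = np.where(abs(df1.value.values) < abs(df2.value.values), df1.index.values, df2.index.values)

You can generalize the approach using numpy. This solution will also be more efficient, especially as the size scales up: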
In [122]: df = pd.concat([df]*100)
In [123]: %timeit chris(df)
870 µs ± 10 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [124]: %timeit nathan(df)
2.03 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [125]: %timeit df.loc[find_closest_to_zero_idx(df.value.values)]
1.81 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
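The numpy code that belongs to this answer is not preserved on this page. As a rough sketch only, a fully vectorized version could look like the following. The function name `closest_to_zero_at_sign_changes` is hypothetical, and this is not necessarily the code that was timed as `chris` above. It assumes a strict sign flip between two neighbouring values; an exact 0 in the data is not treated as a crossing.

```python
import numpy as np
import pandas as pd


def closest_to_zero_at_sign_changes(values):
    """Return the position of the element nearer to zero at each sign change."""
    values = np.asarray(values, dtype=float)
    signs = np.sign(values)
    # Position i is flagged when values[i] and values[i + 1] have opposite signs.
    left = np.flatnonzero(signs[:-1] * signs[1:] < 0)
    right = left + 1
    # For each flagged pair, keep whichever neighbour has the smaller magnitude.
    return np.where(np.abs(values[left]) <= np.abs(values[right]), left, right)


df = pd.DataFrame(
    [
        [105, 0.662932], [105, 0.662932], [107, 0.052653],
        [108, -0.228060], [110, -0.740819], [112, -1.188906],
        [142, -0.228060], [143, 0.052654], [144, 0.349638],
    ],
    columns=["A", "value"],
)

positions = closest_to_zero_at_sign_changes(df["value"].to_numpy())
result = df.iloc[positions]
print(result)
```

Because the function returns positions rather than labels, `df.iloc` is used for the lookup; with a default integer index the two coincide.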
I managed to find a simple solution:
import numpy as np
import pandas as pd
data = [
[105, 0.662932],
[105, 0.662932],
[107, 0.052653], # sign changes between here
[108, -0.228060], # and here; first row has `value` closer to 0
[110, -0.740819],
[112, -1.188906],
[142, -0.228060], # sign changes between here
[143, 0.052654], # and here; second row has `value` closer to 0
[144, 0.349638],
]
df = pd.DataFrame(data, columns=["A", "value"])
Solution
Slow, pure-pandas method
Take a look at np.sign, np.diff and np.where. With these you can isolate the sign of each number, use diff to check when the sign changes, and use np.where to get the indices where the sign actually changes. Putting them together shouldn't be too difficult.

Can it cross the x-axis on consecutive values, i.e. -1, 1, -1?

It gives the wrong answer. The second row should be 7 143 0.052654 (index 7, A is 143, not 142).

Just updated. I thought you wanted the row closest to 0, but you meant the value closest to 0.

To find the first index of each sign change, I found another way: idx = np.argwhere(np.diff(np.sign(df.A.values*0 - df.value.values)) != 0) gives 2 and 6.

This will only find the first two occurrences.

@user3483203 Fixed the bug; it now works for any number of occurrences. Thanks for finding it.

I added your timings. Your solution is always faster than Nathan's, but the explicit iteration means it will be slower than my pure numpy solution.

So my code has one drawback: for every occurrence there must be two values, one before and one after the x-axis crossing.
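The np.sign / np.diff / np.where recipe suggested in the comments can be sketched roughly as follows. The helper name `sign_change_positions` is made up for illustration. The second call covers the question about consecutive crossings (-1, 1, -1): each flip is reported separately.

```python
import numpy as np


def sign_change_positions(values):
    """Return the lower position of every pair of neighbours whose signs differ."""
    signs = np.sign(np.asarray(values, dtype=float))
    changes = np.diff(signs)            # non-zero wherever neighbouring signs differ
    return np.where(changes != 0)[0]    # np.where returns a tuple; take the first axis


values = [0.662932, 0.662932, 0.052653, -0.228060, -0.740819,
          -1.188906, -0.228060, 0.052654, 0.349638]

print(sign_change_positions(values))             # [2 6]
print(sign_change_positions([-1.0, 1.0, -1.0]))  # [0 1]
```

Each returned position `i` marks a crossing between `i` and `i + 1`, so comparing the absolute values at those two positions decides which row is nearer to zero.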
def find_closest_to_zero_idx(arr):
    fx = np.zeros(len(arr))
    fy = np.array(arr)
    # lower index of each sign change in the array
    idx = np.argwhere(np.diff(np.sign(fx - fy)) != 0)
    nearest_to_zero = []
    # of the two values on either side of the crossing, keep the one nearer to zero
    for i in range(len(idx)):
        if abs(arr[idx[i][0]]) < abs(arr[idx[i][0] + 1]):
            nearer = idx[i][0]
            nearest_to_zero.append(nearer)
        else:
            nearer = idx[i][0] + 1
            nearest_to_zero.append(nearer)
    return nearest_to_zero

idx = find_closest_to_zero_idx(df.value.values)
df.loc[idx]
A value
2 107 0.052653
7 143 0.052654

# Pure-pandas variant: compare each value with the one in the next row
df['value_shifted'] = df.value.shift(-1)
df['sign_changed'] = np.sign(df.value.values) * np.sign(df.value_shifted.values)
# lower index where sign changes
idx = df[df.sign_changed == -1.0].index.values
# mark both the lower and the upper row around each x-axis crossing with the
# same negative label so that we can group by it later.
for i in range(len(idx)):
    df.loc[[idx[i], idx[i] + 1], 'sign_changed'] = -1.0 * (i + 1)
df1 = df[ np.sign(df.sign_changed) == -1.0]
df2 = df1.groupby('sign_changed')['value'].apply(lambda x: min(abs(x)))
df3 = df2.reset_index()
answer = df.merge(df3,on=['sign_changed','value'])
answer
A value value_shifted sign_changed
0 107 0.052653 -0.228060 -1.0
1 143 0.052654 0.349638 -2.0
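As an alternative sketch of the same group-then-pick idea, the version below labels the two rows around each crossing with a group id and lets `idxmin` choose the row with the smaller absolute value. It is only one possible arrangement, not the author's code: the column names `crossing` and `abs_value` are made up for illustration. It leaves `df` unmodified and avoids merging on floating-point values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [105, 0.662932], [105, 0.662932], [107, 0.052653],
        [108, -0.228060], [110, -0.740819], [112, -1.188906],
        [142, -0.228060], [143, 0.052654], [144, 0.349638],
    ],
    columns=["A", "value"],
)

sign = np.sign(df["value"])
# Lower position of each crossing; the trailing NaN from shift(-1) compares as False.
lower = np.flatnonzero((sign * sign.shift(-1)).to_numpy() < 0)

# Positions of both rows around every crossing, plus a group id per crossing.
pairs = np.column_stack([lower, lower + 1]).ravel()
group = np.repeat(np.arange(len(lower)), 2)

candidates = df.iloc[pairs].assign(
    crossing=group,
    abs_value=lambda d: d["value"].abs(),
)

# Within each crossing, take the index label of the row with the smallest |value|.
best_labels = candidates.groupby("crossing")["abs_value"].idxmin()
answer = df.loc[best_labels.to_numpy()]
print(answer)
```

The selection here goes through index labels, so it assumes the labels of `df` are unique.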