How to get the nearest number divisible by 100 in Python

python pandas

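I want to add a new column to a dataframe based on an input column. The new column must be filled as follows:

The first row must be filled with the nearest multiple of 100.

From the next row on, that output is repeated until its difference from the input value is greater than or equal to 100.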

input       output
11700.15    11700
11695.20    11700
11661.00    11700
11630.40    11700
11666.10    11700
11600.30    11700
11600.00    11600
11555.40    11600
11655.20    11600
11699.00    11600
11701.55    11700
11799.44    11700
11604.65    11700
11600.33    11700
11599.65    11600

What is the most elegant way to do this in pandas?

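As far as I know, there is no intuitive way to do this without explicit iteration, which is not ideal for numpy and pandas. However, the time complexity of this problem is O(n), which makes it a good target for the numba library. That lets us come up with a very efficient solution.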
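The body of cumsum_with_threshold is not reproduced in this copy of the answer. The following is a minimal sketch of what such a function could look like, assuming the variable names (a, t, s, o, r, d) used in the answer's loop fragment. The njit decorator is numba's shorthand for jit(nopython=True); the plain-Python fallback when numba is missing, and the use of >= (taken from the question's wording), are assumptions of this sketch rather than part of the answer.

```python
import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption of this sketch: run as plain Python if numba is not installed.
    def njit(func):
        return func


@njit
def cumsum_with_threshold(a, t):
    """Round to multiples of t, repeating the last output until the raw
    input has moved at least t away from the input that produced it."""
    s = a.shape[0]
    o = np.empty(s)
    r = (a + t // 2) // t * t   # rounded candidate for every row
    d = a[0]                    # raw input at the last change
    o[0] = r[0]
    for i in range(1, s):
        if np.abs(a[i] - d) >= t:
            o[i] = r[i]
            d = a[i]
        else:
            o[i] = o[i - 1]
    return o


sample = np.array([11700.15, 11695.20, 11661.00, 11630.40, 11666.10,
                   11600.30, 11600.00, 11555.40, 11655.20, 11699.00,
                   11701.55, 11799.44, 11604.65, 11600.33, 11599.65])
result = cumsum_with_threshold(sample, 100)
print(result)
```

Because this variant measures the distance from the last raw input, row 13 (11600.33) switches to 11600, which differs from the output column in the question.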

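Regarding my solution, I use (a + threshold // 2) // threshold * threshold, which looks verbose compared to np.round(a, decimals=-2). This is because of the nature of numba's nopython=True flag, which is not compatible with the np.round function.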
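To make the comparison concrete, here is a small sketch that evaluates both expressions on a few sample values. The sample values are chosen for illustration and are not from the answer. The two forms agree except on exact ties: the floor-based expression rounds a value ending in 50 upward, while np.round rounds halves to the nearest even multiple.

```python
import numpy as np

# Illustrative values; 11650.00 is an exact tie between 11600 and 11700.
a = np.array([11700.15, 11661.00, 11649.99, 11650.00])
threshold = 100

# Expression used inside the numba function.
floor_based = (a + threshold // 2) // threshold * threshold

# Plain NumPy equivalent outside of numba.
np_round = np.round(a, decimals=-2)

print(floor_based)  # ties go up:      11650.00 -> 11700.0
print(np_round)     # ties go to even: 11650.00 -> 11600.0
```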

Let's test it:

a = df['input'].values
pd.Series(cumsum_with_threshold(a, 100))

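If you would rather measure the distance from the last rounded value than from the last raw input value, make the following change to the loop in the function above; this gives the output shown in the question:

for i in range(1, s):
    # the question asks for a difference of at least 100, hence >=
    if np.abs(a[i] - d) >= t:
        o[i] = r[i]
        # OLD d = a[i]
        d = r[i]
    else:
        o[i] = o[i - 1]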

To test efficiency, let's run it on a much larger dataset:

l = np.random.choice(df['input'].values, 10_000_000)

%timeit cumsum_with_threshold(l, 100)
1.54 µs ± 7.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not elegant by any means, but I don't think there is a way around a loop here (I could be wrong!):

Output:

       input  output  out_check
0   11700.15   11700    11700.0
1   11695.20   11700    11700.0
2   11661.00   11700    11700.0
3   11630.40   11700    11700.0
4   11666.10   11700    11700.0
5   11600.30   11700    11700.0
6   11600.00   11600    11600.0
7   11555.40   11600    11600.0
8   11655.20   11600    11600.0
9   11699.00   11600    11600.0
10  11701.55   11700    11700.0
11  11799.44   11700    11700.0
12  11604.65   11700    11700.0
13  11600.33   11700    11600.0
14  11599.65   11600    11600.0

I'm fairly sure the last two values in output should be 11600.

The solution I came up with:
last = df.loc[0, 'input'].round(-2)
for ix in range(len(df)):
    inp = df.loc[ix, 'input']
    last = inp.round(-2) if abs(inp - last) >= 100 else last
    df.loc[ix, 'output'] = last

This produces exactly the output given by the OP.
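The same rule can also be applied to a plain NumPy array, which avoids a df.loc lookup on every row. This is only a sketch: the function name round_with_hysteresis and the step parameter are made up for illustration, and it uses np.round(v / step) * step in place of the scalar .round(-2) call.

```python
import numpy as np


def round_with_hysteresis(values, step=100):
    """Repeat the last rounded output until the current input is at
    least `step` away from it, then round the current input again."""
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    if len(values) == 0:
        return out
    last = np.round(values[0] / step) * step
    for i, v in enumerate(values):
        # Compare the current input against the previous OUTPUT.
        if abs(v - last) >= step:
            last = np.round(v / step) * step
        out[i] = last
    return out


inputs = [11700.15, 11695.20, 11661.00, 11630.40, 11666.10,
          11600.30, 11600.00, 11555.40, 11655.20, 11699.00,
          11701.55, 11799.44, 11604.65, 11600.33, 11599.65]
print(round_with_hysteresis(inputs))
```

With a dataframe, the result could be assigned in one step, for example df['output'] = round_with_hysteresis(df['input'].to_numpy()).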
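I don't think pandas can do this on its own. Also, your output may be incorrect: the last two rows should be 11600. 11600.33 is more than 100 away from 11701.55, which is where the run of 11700 starts.

@user3483203 Let's wait for the OP, but as I understand the question, the difference that matters is the one between the previous output and the current input. So for the last two rows, (11700 - 11600.33) < 100 and (11700 - 11599.65) > 100.

@filippo Ah, I believe you are right. It is only a one-character change in my answer, so I will include both as alternatives.

@user3483203 Yes, filippo is right. The difference is between the previous output and the current input. Thanks.

For the last line, use df['out_check'] = (df['input'] * ch).round(-2).ffill(). Much cleaner.

@VishnuKunchur Thanks, man. The last 2 outputs are also correct. What I am interested in is the difference between the previous output and the current input. Your solution helped me.

Thanks, user3483203. It works as expected. I am interested in the difference between the previous output and the current input. Thank you very much for your time and effort.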
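# Loop-and-ffill approach (produces the out_check column shown above)
import numpy as np
import pandas as pd

vals = df['input'].values
anchor = vals[0]                 # raw input at the last change point
ch = np.full(len(vals), np.nan)  # NaN marks rows that repeat the previous output
ch[0] = 1                        # the first row always gets its own rounded value
for i in range(1, len(vals)):
    if abs(vals[i] - anchor) >= 100:
        anchor = vals[i]
        ch[i] = 1                # change point: this row gets a new rounded value

# Round only the change points, then forward-fill across the NaN rows.
df['out_check'] = pd.Series(100 * np.round((df['input'] * ch) / 100)).ffill()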
