Python: how to create a sublist that includes the last element while using one general formula for all other sublists of the same size?


python loops for-loop


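I have a long list, let's call it y, with len(y) = 500. I am deliberately not including y in the code.

For each item in y, I want to find the average of that item and the 5 values before it. I run into a problem when I reach the last item in the list, because one of the lines below uses "a+1".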

a = 0
SMAlist = []
for each_item in y:
    if a > 4 and a < ((len(y))-1): # finding my averages begin at 6th item
        b = (y[a-5:a+1]) # this line doesn't work for the last item in y
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    if a > ((len(y))-2): # this line seems unnecessary. How can I avoid it?
        b = (y[-6:-1]+[y[a]]) # Should I just use negative values in general?
        SMAsix = round((sum(b)/6),2)
        SMAlist.append(SMAsix)
    a = a+1
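A minimal sketch, not part of the original question: Python slices accept an end index equal to len(y), so y[a-5:a+1] is also valid for the last item and the special case may not be needed. The helper name sma6 is hypothetical, and the sample values are the ones used in the answers.

```python
def sma6(y):
    """Average each item with its 5 predecessors, starting at the 6th item."""
    result = []
    for a in range(5, len(y)):
        # For the last item a + 1 == len(y), which is a valid slice end.
        window = y[a - 5:a + 1]
        result.append(round(sum(window) / 6, 2))
    return result

y = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25,
     13876.47, 18491.18, 16908, 15266.43]
print(sma6(y))
```

Iterating over range(5, len(y)) also removes the manual counter and the a > 4 check.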


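You can chunk your list and build the averages over the chunks. The linked answer uses full chunks; I adapted it to build incremental chunks:

Sliding averages via a list comprehension: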

# Inspiration for a "full" chunk I adapted: https://stackoverflow.com/a/312464/7505395
def overlappingChunks(l, n):
    """Yield overlapping n-sized chunks from l."""
    for i in range(0, len(l)):
        yield l[i:i + n]

somenums = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,
            18491.18,16908,15266.43]

# avg over sublist-lengths
slideAvg5 = [ round(sum(part)/(len(part)*1.0),2) for part in overlappingChunks(somenums,6)]

print (slideAvg5)

Output:

[11271.73, 11850.11, 13099.36, 14056.93, 14725.22, 15343.27, 16135.52, 
 16888.54, 16087.22, 15266.43]

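I partition the list by stepping through range(len(yourlist)) before averaging the partitions. Full partitioning was already solved in the linked answer, so I adapted it to yield incremental chunks and applied it to your problem.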
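To illustrate what "incremental chunks" means here, the sketch below compares the chunk sizes produced by the answer's generator with a variant that yields only complete windows. The name full_windows is hypothetical and not part of the original answer.

```python
def overlapping_chunks(l, n):
    """Incremental chunks as in the answer: trailing chunks are shorter than n."""
    for i in range(len(l)):
        yield l[i:i + n]

def full_windows(l, n):
    """Hypothetical variant: yield only the complete n-sized windows."""
    for i in range(len(l) - n + 1):
        yield l[i:i + n]

data = list(range(10))
incremental_sizes = [len(c) for c in overlapping_chunks(data, 6)]
full_sizes = [len(c) for c in full_windows(data, 6)]
print(incremental_sizes)  # [6, 6, 6, 6, 6, 5, 4, 3, 2, 1]
print(full_sizes)         # [6, 6, 6, 6, 6]
```

The shorter trailing chunks are why this answer returns ten averages for ten inputs, while a full-window approach returns five.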

Which partitions are averaged?

explained = {(idx,tuple(part)): round(sum(part)/(len(part)*1.0),2) for idx,part in
             enumerate(overlappingChunks(somenums,6))}
import pprint
pprint.pprint(explained)

Output (reformatted):

Option 1: Pandas

import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)
print(series.rolling(window=6, center=True).mean().dropna().tolist())

Option 2: NumPy

import numpy as np
window=6
s=np.insert(np.cumsum(np.array(y)), 0, [0])
output = (s[window :] - s[:-window]) * (1. / window)
print(list(output))

Output

[11271.731666666667, 11850.111666666666, 13099.355, 14056.930000000002, 14725.218333333332]
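As a cross-check on the values above, here is a sketch of a third way to get the same full-window averages, using np.convolve with a uniform kernel. This is not from the original answer; the names kernel and rolling_mean are chosen for illustration.

```python
import numpy as np

y = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25,
     13876.47, 18491.18, 16908, 15266.43]
window = 6

# A uniform kernel of length `window` turns the convolution into a moving mean.
kernel = np.ones(window) / window
# mode="valid" keeps only positions where the kernel fully overlaps the data.
rolling_mean = np.convolve(y, kernel, mode="valid")
print(rolling_mean.tolist())
```

Each output is computed from its own window, so this avoids a long running sum, at the cost of more arithmetic per element than the cumulative-sum approach.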

Timings (depend on data size)

Update

Timing code (for a Jupyter notebook)

A small caveat about @Vivek Kalyanarangan's "zipper" solution: for longer sequences it is vulnerable to loss of significance. For clarity, let's use float32:

>>> y = (1000 + np.sin(np.arange(1000000))).astype(np.float32)
>>> window=6
>>> 
# naive zipper solution
>>> s=np.insert(np.cumsum(np.array(y)), 0, [0])
>>> output = (s[window :] - s[:-window]) * (1. / window)
# towards the end the result is clearly wrong
>>> print(output[-10:])
[1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024. 1024.]
>>> 
# this can be alleviated by first taking the difference and then summing
>>> np.cumsum(np.r_[y[:window].sum(), y[window:]-y[:-window]])/window
array([1000.02936,  999.98285,  999.9521 , ..., 1000.0247 , 1000.05304,
       1000.0367 ], dtype=float32)
>>> 
# compare to last value calculated directly for reference
>>> np.mean(y[-6:])
1000.03217

To reduce the error further, you can split the input into chunks and anchor the cumulative sum every so many terms, without losing much speed.
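One possible reading of the chunk-and-anchor idea is sketched below. The function name anchored_rolling_mean and the default block size of 1024 are assumptions made for illustration, not part of the original answer. At the start of each block the window sum is recomputed directly (the anchor), and within the block the sums are advanced by the differences y[k+window] - y[k], so rounding error can only accumulate over one block.

```python
import numpy as np

def anchored_rolling_mean(y, window, block=1024):
    """Rolling mean over full windows, re-anchored every `block` outputs."""
    y = np.asarray(y)
    if not np.issubdtype(y.dtype, np.floating):
        y = y.astype(np.float64)
    n_out = len(y) - window + 1
    out = np.empty(n_out, dtype=y.dtype)
    for start in range(0, n_out, block):
        stop = min(start + block, n_out)
        # Anchor: exact window sum at the first position of this block.
        anchor = y[start:start + window].sum()
        # Step from one window sum to the next: add the entering value,
        # subtract the leaving one.
        diffs = y[start + window:stop - 1 + window] - y[start:stop - 1]
        out[start:stop] = np.cumsum(np.r_[anchor, diffs]) / window
    return out

y32 = (1000 + np.sin(np.arange(100000))).astype(np.float32)
means = anchored_rolling_mean(y32, 6)
print(means[-1], np.mean(y32[-6:]))
```

A smaller block bounds the error more tightly but runs more Python-level loop iterations, so the block size trades accuracy against speed.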

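Can you show us some elements of y so we know what it looks like?

Sure. y = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25, 13876.47, 18491.18, 16908, 15266.43 …]

The window should be 6?

It doesn't matter much, the idea is what counts :) +1

Re your NumPy solution: since … does not suggest a zero mean to me (and even if it did), you should take the difference first and then sum, to avoid loss of significance. With only 500 numbers it is no big deal, but if it costs nothing, why not do it properly?

@PatrickArtner Should y = [10406.19, 10995.72, 11162.55, 11256.7, 11634.98, 12174.25, 13876.47, 18491.18, 16908, 15266.43, 15266.43] yield [11271.731666666667, 11850.111666666666, 13099.355, 14056.930000000002, 14725.218333333332, 15330.4599999999]? If so, then I'm all set.

I copied your numbers into mine. Mine looks different because I also average the "partial" partitions that you seem to drop, which is why I get more values than you, but the first ones match (rounded).

@PatrickArtner I used a Jupyter notebook (edited the answer)! I included your stats as well.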

# Pandas
59.5 µs ± 8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Numpy
19 µs ± 4.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# @PatrickArtner's solution
16.1 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
import pandas as pd

y = [10406.19,10995.72,11162.55,11256.7,11634.98,12174.25,13876.47,18491.18,16908,15266.43]
series = pd.Series(y)
