Python 3.x: Python/numpy.where loop code takes a long time (need help speeding up the code)

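The code below takes about 5 minutes to group roughly 110,000 items on the hardware it runs on. The for loop seems to be the cause. I am hoping someone can suggest a way to speed this up.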

import datetime as dt
import numpy as np

def group_data(x_epochs, y_data, grouping):
  x_texts = np.array([dt.datetime.fromtimestamp(i).strftime(grouping) for i in x_epochs], dtype='str')
  unique_x_texts = np.array(sorted(set(x_texts)), dtype='str')
  returned_y_data = np.zeros(np.shape(unique_x_texts))

  for ut in unique_x_texts:
    indices = np.where(x_texts == ut)[0]
    y = y_data[indices[-1]] - y_data[indices[0]]
    if y > 0:
      returned_y_data[np.where(unique_x_texts == ut)[0]] = y

  return unique_x_texts, returned_y_data
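
About the data: x_epochs is a linearly (monotonically) increasing list of Unix epoch time values. y_data holds accumulator meter readings, which generally rise over time but at a varying rate. The grouping parameter specifies how the data should be grouped so that the delta per group can be calculated. For example, to get a list of hourly deltas I would specify %Y%j%H.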
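
To make the role of grouping concrete, here is a minimal sketch of how a format string turns epoch values into group keys. The helper name group_keys is hypothetical, and UTC is assumed so the result is reproducible; the code in the question uses local time via dt.datetime.fromtimestamp(i), so its keys shift with the machine's time zone.

```python
import datetime as dt

def group_keys(x_epochs, grouping):
    """Return one strftime key per epoch value (hypothetical helper, UTC assumed)."""
    return [
        dt.datetime.fromtimestamp(i, tz=dt.timezone.utc).strftime(grouping)
        for i in x_epochs
    ]

# Three readings 10 minutes apart in one hour, then one reading in the next hour.
epochs = [1584199800, 1584200400, 1584201000, 1584201600]
keys = group_keys(epochs, "%d %Hh")
print(keys)  # ['14 15h', '14 15h', '14 15h', '14 16h']
```

Readings that share a key belong to the same group, and the delta for a group is its last reading minus its first.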

Are there any options to improve and speed up this code?

EDIT:
Updated the code thanks to @goodvibration's comment. It now uses enumerate and drops one call to np.where.

def group_data(x_epochs, y_data, grouping):
  x_texts = np.array([dt.datetime.fromtimestamp(i).strftime(grouping) for i in x_epochs], dtype='str')
  unique_x_texts = np.array(sorted(set(x_texts)), dtype='str')
  returned_y_data = np.zeros(np.shape(unique_x_texts))

  for idx, ut in enumerate(unique_x_texts):
    indices = np.where(x_texts == ut)[0]
    y = y_data[indices[-1]] - y_data[indices[0]]
    if y > 0:
      returned_y_data[idx] = y

  return unique_x_texts, returned_y_data

Unfortunately, the gain in throughput time is not very significant.

EDIT:
Example of x_epochs:

[1584199800 1584200400 1584201000 1584201600 1584202200 1584202800
 1584203400 1584204000 1584204600 1584205200 1584205800 1584206400
 1584207000 1584207600 1584208200 1584208800 1584209400 1584210000
 1584210600 1584211200 1584211800 1584212400 1584213000 1584213600
 1584214200 1584214800 1584215400 1584216000 1584216600 1584217200
 1584217800 1584218400 1584219000 1584219600 1584220200 1584220800
 1584221400 1584222000 1584222600 1584223200 1584223800 1584224400
 1584225000 1584225600 1584226200 1584226800 1584227400 1584228000
 1584228600 1584229200 1584229800 1584230400 1584231000 1584231600
 1584232200 1584232800 1584233400 1584234000 1584234600 1584235200
 1584235800 1584236400 1584237000 1584237600 1584238200 1584238800
 1584239400 1584240000 1584240600 1584241200 1584241800 1584242400
 1584243000 1584243600 1584244200 1584244800 1584245400 1584246000
 1584246600 1584247200 1584247800 1584248400 1584249000 1584249600
 1584250200 1584250800 1584251400 1584252000 1584252600 1584253200
 1584253800 1584254400 1584255000 1584255600 1584256200 1584256800
 1584257400 1584258000 1584258600 1584259200 1584259800 1584260400
 1584261000 1584261600 1584262200 1584262800 1584263400 1584264000
 1584264600 1584265200 1584265800 1584266400 1584267000 1584267600
 1584268200 1584268800 1584269400 1584270000 1584270600 1584271200
 1584271800 1584272400 1584273000 1584273600 1584274200 1584274800
 1584275400 1584276000 1584276600 1584277200 1584277800 1584278400
 1584279000 1584279600 1584280200 1584280800 1584281400 1584282000
 1584282600 1584283200 1584283800 1584284400 1584285000 1584285600
 1584286200 1584286800 1584287400 1584288000 1584288600 1584289200
 1584289800 1584290400 1584291000 1584291600 1584292200 1584292800
 1584293400 1584294000 1584294600 1584295200 1584295800 1584296400
 1584297000 1584297600 1584298200 1584298800 1584299400 1584300000
 1584300600 1584301200 1584301800 1584302400 1584303000 1584303600
 1584304200 1584304800 1584305400 1584306000 1584306600 1584307200
 1584307800 1584308400 1584309000 1584309600 1584310200 1584310800
 1584311400 1584312000 1584312600 1584313200 1584313800 1584314400
 1584315000 1584315600 1584316200 1584316800 1584317400 1584318000
 1584318600 1584319200 1584319800 1584320400 1584321000 1584321600
 1584322200 1584322800 1584323400 1584324000 1584324600 1584325200
 1584325800 1584326400 1584327000 1584327600 1584328200 1584328800
 1584329400 1584330000 1584330600 1584331200 1584331800 1584332400
 1584333000 1584333600 1584334200 1584334800 1584335400 1584336000
 1584336600 1584337200 1584337800 1584338400 1584339000 1584339600
 1584340200 1584340800 1584341400 1584342000 1584342600 1584343200
 1584343800 1584344400 1584345000 1584345600 1584346200 1584346800
 1584347400 1584348000 1584348600 1584349200 1584349800 1584350400
 1584351000 1584351600 1584352200 1584352800 1584353400 1584354000
 1584354600 1584355200 1584355800 1584356400 1584357000 1584357600
 1584358200 1584358800 1584359400 1584360000 1584360600 1584361200
 1584361800 1584362400 1584363000 1584363600 1584364200 1584364800
 1584365400 1584366000 1584366600 1584367200 1584367800 1584368400
 1584369000 1584369600 1584370200 1584370800 1584371400 1584372000
 1584372600 1584373200 1584373800 1584374400 1584375000 1584375600
 1584376200 1584376800 1584377400 1584378000 1584378600 1584379200
 1584379800]

Example of y_data:

[54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.         54990.
 54990.         54990.         54990.9718543  55013.61214165
 55046.23509934 55092.59933775 55120.49915683 55232.16887417
 55396.74668874 55587.17537943 55794.46357616 56039.78807947
 56228.94435076 56341.84768212 56392.19055649 56484.82119205
 56505.43377483 56509.92580101 56544.1307947  56624.4205298
 56788.66104553 56901.03311258 56986.18543046 57053.55311973
 57106.50827815 57228.92580101 57307.19205298 57373.13930348
 57426.03703704 57505.9884106  57587.55223881 57766.36363636
 57932.57615894 58124.88723051 58207.29292929 58294.66500829
 58392.36700337 58498.51986755 58501.         58653.12962963
 58951.96517413 59136.54635762 59255.65656566 59324.30845771
 59326.         59346.57504216 59400.54304636 59470.82154882
 59530.70646766 59575.73344371 59609.38888889 59645.1641791
 59678.98675497 59705.51770658 59733.30463576 59747.55960265
 59775.43338954 59783.78807947 59784.         59784.
 59784.         59784.         59784.         59784.97019868
 59785.         59785.         59785.         59785.
 59785.        ]

Using grouping = '%d %Hh' results in unique_x_texts:

['14 16h' '14 17h' '14 18h' '14 19h' '14 20h' '14 21h' '14 22h' '14 23h'
 '15 00h' '15 01h' '15 02h' '15 03h' '15 04h' '15 05h' '15 06h' '15 07h'
 '15 08h' '15 09h' '15 10h' '15 11h' '15 12h' '15 13h' '15 14h' '15 15h'
 '15 16h' '15 17h' '15 18h' '15 19h' '15 20h' '15 21h' '15 22h' '15 23h'
 '16 00h' '16 01h' '16 02h' '16 03h' '16 04h' '16 05h' '16 06h' '16 07h'
 '16 08h' '16 09h' '16 10h' '16 11h' '16 12h' '16 13h' '16 14h' '16 15h'
 '16 16h' '16 17h' '16 18h']

and returned_y_data:

[  0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.           0.
   0.           0.           0.           0.          56.23509934
 701.86423841 465.64569536 476.25962945 372.48391731 701.3045187
 657.30016584 263.99668874 208.16520615  78.48229342   1.
   0.        ]

Tested on your data. Try this:

def fast_groupie(x_epochs, y_data, grouping):
  y_data = np.array(y_data)
  x_texts = np.array([dt.datetime.fromtimestamp(i).strftime(grouping) for i in x_epochs], dtype='str')

  unique_x_texts, loc1 = np.unique(x_texts, return_index = True)
  loc2 =  len(x_texts)-1-np.unique(np.flip(x_texts), return_index=True)[1]

  y = y_data[loc2]-y_data[loc1]
  returned_y_data = np.where(y>0, y, 0)

  return unique_x_texts, returned_y_data
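
The function you provided first finds unique_x_texts, which is one full pass over x_texts. Then, if unique_x_texts has M items, it calls np.where M times, which is another M passes over x_texts. The more unique items there are, the longer it takes.

The function above goes through x_texts only twice (the two np.unique calls), regardless of M, so it should be somewhat faster.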
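
As a small illustration of the first/last index trick used above, the sketch below runs the same two np.unique calls on a six-element array. The sample values are made up for the example and are not taken from the question's data.

```python
import numpy as np

# Made-up sample: three groups with 2, 3 and 1 readings.
x_texts = np.array(["14 16h", "14 16h", "14 17h", "14 17h", "14 17h", "14 18h"])
y_data = np.array([10.0, 12.0, 12.0, 15.0, 19.0, 19.0])

# return_index=True gives the index of the FIRST occurrence of each unique key.
unique_x_texts, first = np.unique(x_texts, return_index=True)

# The first occurrence in the reversed array, mapped back to original
# positions, is the LAST occurrence in the original array.
last = len(x_texts) - 1 - np.unique(np.flip(x_texts), return_index=True)[1]

deltas = y_data[last] - y_data[first]
print(first)   # [0 2 5]
print(last)    # [1 4 5]
print(deltas)  # [2. 7. 0.]
```

Both index arrays come out in the sorted order of the keys, so they line up with unique_x_texts element by element.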

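Most importantly, try to avoid for loops and replace them with map-like functions. In other words, perform the operations on vectors and matrices (arrays). @Mercury provided a good example. NumPy, SciPy and other Python ML libraries have functions written and optimized for large datasets. Use them.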
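
One way to apply this advice to the remaining Python-level loop (the per-item strftime call) is sketched below. It is not the method from the question or the answer above: it swaps the format string for integer hour buckets, so it only covers fixed hourly grouping, it assumes UTC hours, a non-empty input, and a monotonically increasing x_epochs, and the function name hourly_deltas is hypothetical.

```python
import numpy as np

def hourly_deltas(x_epochs, y_data):
    """Per-hour deltas without a Python-level loop (sketch; UTC hours assumed)."""
    x_epochs = np.asarray(x_epochs, dtype="int64")
    y_data = np.asarray(y_data, dtype=float)

    # Integer hour bucket for every sample: seconds since epoch // 3600.
    hours = x_epochs // 3600

    # x_epochs is assumed monotonic, so each bucket is one contiguous run.
    # A run starts at index 0 and wherever the bucket value changes.
    starts = np.flatnonzero(np.r_[True, hours[1:] != hours[:-1]])
    ends = np.r_[starts[1:] - 1, len(hours) - 1]

    # Last reading minus first reading per run; non-positive deltas become 0.
    deltas = y_data[ends] - y_data[starts]
    deltas = np.where(deltas > 0, deltas, 0.0)

    # Label each bucket with the start of its hour as a datetime64 value.
    labels = (hours[starts] * 3600).astype("datetime64[s]")
    return labels, deltas

# Made-up readings: three samples in one hour, two in the next.
epochs = [1584199800, 1584200400, 1584201000, 1584201600, 1584202200]
readings = [100.0, 101.0, 103.0, 103.0, 107.0]
labels, deltas = hourly_deltas(epochs, readings)
print(labels)  # ['2020-03-14T15:00:00' '2020-03-14T16:00:00']
print(deltas)  # [3. 4.]
```

Unlike np.unique on strings, this keeps the groups in chronological order and never merges two different hours that happen to format to the same text.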

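How does unique_x_texts == ut make sense inside for ut in unique_x_texts??
I prepare an array with the shape of unique_x_texts. The last np.where then determines where in that array the delta value should go.
@goodvibration Wait, I think I see where you are going. I can determine the index in the for statement.
I meant that inside the for loop, ut is an element of the unique_x_texts array, so how could they be equal?
No, I use np.where to find the index of ut in unique_x_texts. But thanks to your comment I found that I can use enumerate in the for statement to determine the index. That eliminates an extra call to np.where.
Awarded you the bounty because you were the only one who bothered to help. Unfortunately, your code is only slightly faster in the wild ;-) Ah, well. I have learned some new tricks, though, so thanks for that too. Thumbs up.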