How to speed up Python nested loops over millions of elements

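I am trying to pair up objects from two datasets (one with about 0.5 million elements, the other with about 2 million) that satisfy a certain condition, and then save the information of both objects to a file. Many of the variables are not involved in the pairing calculation, but they matter for my follow-up analysis, so I need to keep track of them and save them. If there is a way to vectorize the whole analysis, it would be much faster. Below I use random numbers as an example: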

import numpy as np
from astropy import units as u
from astropy.coordinates import SkyCoord
from PyAstronomy import pyasl


RA1 = np.random.uniform(0,360,500000)
DEC1 = np.random.uniform(-90,90,500000)
d = np.random.uniform(55,2000,500000)
z = np.random.uniform(0.05,0.2,500000)
e = np.random.uniform(0.05,1.0,500000)
s = np.random.uniform(0.05,5.0,500000)
RA2 = np.random.uniform(0,360,2000000)
DEC2 = np.random.uniform(-90,90,2000000)
n = np.random.randint(10,10000,2000000)
m = np.random.randint(10,10000,2000000)

f = open('results.txt','a')
for i in range(len(RA1)):
    if i % 50000 == 0:
        print i
    ra1 = RA1[i]
    dec1 = DEC1[i]
    c1 = SkyCoord(ra=ra1*u.degree, dec=dec1*u.degree)
    for j in range(len(RA2)):
        ra2 = RA2[j]
        dec2 = DEC2[j]
        c2 = SkyCoord(ra=ra2*u.degree, dec=dec2*u.degree)

        ang = c1.separation(c2)
        sep = d[i] * ang.radian
        pa = pyasl.positionAngle(ra1, dec1, ra2, dec2)

        if sep < 1.5:
            np.savetxt(f,np.c_[ra1,dec1,sep,z[i],e[i],s[i],n[j],m[j]], fmt = '%1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %i   %i')

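Here is an implementation that uses an in-memory buffer to reduce I/O. Note: I prefer to use the io module for file input/output, for better compatibility with Python 3. I think it is best practice, and it will not cost you any performance.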

import io

with io.open('results.txt', 'ab') as f:  # binary mode, because buf holds bytes
    buf = io.BytesIO()
    for i in xrange(len(RA1)):
        if i % 50000 == 0:
            print(i)
            f.write(buf.getvalue())
            buf.seek(0)  # rewind first: truncate() does not move the position
            buf.truncate(0)
        ra1 = RA1[i]
        dec1 = DEC1[i]
        c1 = SkyCoord(ra=ra1 * u.degree, dec=dec1 * u.degree)
        for j in xrange(len(RA2)):
            ra2 = RA2[j]
            dec2 = DEC2[j]
            c2 = SkyCoord(ra=ra2 * u.degree, dec=dec2 * u.degree)

            ang = c1.separation(c2)
            sep = d[i] * ang.radian
            pa = pyasl.positionAngle(ra1, dec1, ra2, dec2)

            if sep < 1.5:
                np.savetxt(buf, np.c_[ra1, dec1, sep, z[i], e[i], s[i], n[j], m[j]],
                           fmt='%1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %i   %i')
    f.write(buf.getvalue())
Note: on Python 2 I use xrange instead of range to reduce memory usage.


buf.truncate(0) could be replaced by a new instance, like this: buf = io.BytesIO(). That may be more efficient…
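
To make the two ways of resetting the buffer concrete, here is a minimal sketch using only the standard library on Python 3. It shows that truncate(0) on its own leaves the write position where it was, so the next write is padded with zero bytes, while rewinding first (or simply creating a fresh BytesIO) gives an empty buffer. The variable names are only for illustration.

```python
import io

# truncate(0) alone: the position stays at 3, so the next write is zero-padded
buf = io.BytesIO()
buf.write(b"abc")
buf.truncate(0)
buf.write(b"x")
padded = buf.getvalue()   # b'\x00\x00\x00x'

# seek(0) followed by truncate(0): the buffer is really empty again
buf = io.BytesIO()
buf.write(b"abc")
buf.seek(0)
buf.truncate(0)
buf.write(b"x")
rewound = buf.getvalue()  # b'x'

# a fresh instance avoids the question entirely
buf = io.BytesIO()
buf.write(b"x")
fresh = buf.getvalue()    # b'x'
```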

Used this way, savetxt is essentially:

astr = fmt % (ra1,dec1,sep,z[i],e[i],s[i],n[j],m[j])
astr += '\n'  # or include in fmt
f.write(astr)

In other words, it just writes a formatted line to the file.
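
As a small self-contained illustration of that equivalence, the sketch below formats one row with the same format string as in the question and writes it to an in-memory text buffer. The helper name format_row and the sample values are made up for the example.

```python
import io

# same format string as in the question
fmt = '%1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %i   %i'

def format_row(values):
    """Return one output line for a tuple of 8 values (hypothetical helper)."""
    return fmt % tuple(values) + '\n'

out = io.StringIO()  # stands in for the open results file
out.write(format_row((10.5, -3.25, 1.2, 0.1, 0.5, 2.0, 42, 7)))
line = out.getvalue()
```

Formatting the line directly like this skips the array construction that np.c_ and np.savetxt perform for every single match.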

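A first way to speed this up: c2 = SkyCoord(...) is recomputed len(RA1) times for every (ra2, dec2) pair. You can avoid that by building the SkyCoord objects once and keeping them in a list: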

f = open('results.txt','a')
C1 = [SkyCoord(ra=ra1*u.degree, dec=DEC1[i]*u.degree)
      for i, ra1 in enumerate(RA1)]
C2 = [SkyCoord(ra=ra2*u.degree, dec=DEC2[i]*u.degree)
      for i, ra2 in enumerate(RA2)]  # buffer coords

for i, c1 in enumerate(C1):  # we only need enumerate() to get i
    for j, c2 in enumerate(C2):
        ang = c1.separation(c2)  # note we don't have to calculate c2
        if d[i] < 1.5 / ang.radian:
            # now we don't have to multiply every iteration. 
            # The right part is a constant

            # the next line is only executed if objects are close enough
            pa = pyasl.positionAngle(RA1[i], DEC1[i], RA2[j], DEC2[j])
            np.savetxt('...whatever')

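The basic question you need to ask yourself is: can you reduce your dataset?

If not, I have some bad news: 500000 * 2000000 is 1e12. That means you have to do a trillion operations.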
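The angular separation involves some trigonometric functions (I think cos, sin and sqrt are involved here), so each operation takes somewhere between a few hundred nanoseconds and a microsecond. Assuming each operation takes 1 us, you would still need about 12 days to complete this. That assumes you have no Python loop or I/O overhead at all, and I think 1 us is reasonable for this kind of operation.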
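
The estimate is plain arithmetic; a quick check, assuming exactly 1 us per pair and no other overhead:

```python
n_pairs = 500_000 * 2_000_000   # every object in set 1 against every object in set 2
seconds = n_pairs * 1e-6        # assumed cost: 1 microsecond per pair
days = seconds / 86400          # 86400 seconds per day
print(n_pairs, round(days, 1))
```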

But there are certainly ways to optimize this: SkyCoord allows vectorization, but only in 1D:

# Create the SkyCoord for the longer array once
c2 = SkyCoord(ra=RA2*u.degree, dec=DEC2*u.degree)
# and calculate the separation from each coordinate of the shorter list
for idx, (ra, dec) in enumerate(zip(RA1, DEC1)):
    c1 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
    # x will be the angular separation with a length of your RA2 and DEC2 arrays
    x = c1.separation(c2)
This yields a large speedup:

# note that I made these MUCH shorter
RA1 = np.random.uniform(0,360,5)
DEC1 = np.random.uniform(-90,90,5)
RA2 = np.random.uniform(0,360,10)
DEC2 = np.random.uniform(-90,90,10)

def test(RA1, DEC1, RA2, DEC2):
    """Version with vectorized inner loop."""
    c2 = SkyCoord(ra=RA2*u.degree, dec=DEC2*u.degree)
    for idx, (ra, dec) in enumerate(zip(RA1, DEC1)):
        c1 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
        x = c1.separation(c2)

def test2(RA1, DEC1, RA2, DEC2):
    """Double loop."""
    for ra, dec in zip(RA1, DEC1):
        c1 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
        for ra, dec in zip(RA2, DEC2):
            c2 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
            x = c1.separation(c2)

%timeit test(RA1, DEC1, RA2, DEC2)  # 1 loop, best of 3: 225 ms per loop
%timeit test2(RA1, DEC1, RA2, DEC2) # 1 loop, best of 3: 2.71 s per loop
That is already about 10 times faster, and it also scales much better:

RA1 = np.random.uniform(0,360,5)
DEC1 = np.random.uniform(-90,90,5)
RA2 = np.random.uniform(0,360,2000000)
DEC2 = np.random.uniform(-90,90,2000000)

%timeit test(RA1, DEC1, RA2, DEC2)  # 1 loop, best of 3: 2.8 s per loop

# test2 scales so badly that I only use 50 elements here
RA2 = np.random.uniform(0,360,50)
DEC2 = np.random.uniform(-90,90,50)
%timeit test2(RA1, DEC1, RA2, DEC2)  # 1 loop, best of 3: 11.4 s per loop
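Note that by vectorizing the inner loop I was able to process 40000 times more elements in 1/4 of the time. So vectorizing the inner loop should make this roughly 200,000 times faster.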

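Here we computed 5 x 2 million separations in about 3 seconds, so each operation takes roughly 300 ns. At that rate you would need about 3 days to complete the task.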
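
These figures can be re-derived from the timings quoted above (2.8 s for 5 x 2000000 pairs with test, 11.4 s for 5 x 50 pairs with test2). The sketch below only redoes that arithmetic; it measures nothing itself.

```python
# timings quoted above
t_vectorized = 2.8                  # seconds for test on 5 x 2,000,000 pairs
pairs_vectorized = 5 * 2_000_000
t_double_loop = 11.4                # seconds for test2 on 5 x 50 pairs
pairs_double_loop = 5 * 50

per_pair_vectorized = t_vectorized / pairs_vectorized     # about 280 ns
per_pair_double_loop = t_double_loop / pairs_double_loop  # about 46 ms
speedup = per_pair_double_loop / per_pair_vectorized

total_pairs = 500_000 * 2_000_000
days_vectorized = total_pairs * per_pair_vectorized / 86400
print(round(speedup), round(days_vectorized, 1))
```

With the exact timings the ratio comes out nearer 160,000 than 200,000, which is consistent with the rough figure above.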

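Even if you could also vectorize the remaining loop, I don't think it would give a big speedup, because at this level the loop overhead is much smaller than the computation time of each iteration. The output of line profiler supports this statement: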

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    11                                           def test(RA1, DEC1, RA2, DEC2):
    12         1       216723 216723.0      2.6      c2 = SkyCoord(ra=RA2*u.degree, dec=DEC2*u.degree)
    13         6          222     37.0      0.0      for idx, (ra, dec) in enumerate(zip(RA1, DEC1)):
    14         5       206796  41359.2      2.5          c1 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
    15         5      7847321 1569464.2     94.9          x = c1.separation(c2)
In case it is not obvious from the Hits column, this comes from the 5 x 2000000 run. For comparison, here is what I got from test2 on a 5 x 20 run:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    17                                           def test2(RA1, DEC1, RA2, DEC2):
    18         6           80     13.3      0.0      for ra, dec in zip(RA1, DEC1):
    19         5       195030  39006.0      0.6          c1 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
    20       105         1737     16.5      0.0          for ra, dec in zip(RA2, DEC2):
    21       100      3871427  38714.3     11.8              c2 = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)
    22       100     28870724 288707.2     87.6              x = c1.separation(c2)
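The reason test2 scales worse is that the c2 = SkyCoord part takes 12% of the total time instead of 2.5%, and that every call to separation carries some significant overhead. So it is not really the Python loop overhead that makes it slow, but the SkyCoord constructor and the static parts of separation.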

Obviously you would also need to vectorize the pa calculation and the saving to file (I have not worked with PyAstronomy and numpy.savetxt, so I cannot advise on those).
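
A rough NumPy-only sketch of what vectorizing those two steps could look like, under stated assumptions. It swaps SkyCoord.separation for the haversine formula and pyasl.positionAngle for the standard spherical position-angle formula (measured east of north); whether those match the conventions of the two libraries in every detail is an assumption. The function names separation_rad, position_angle_deg and save_matches are invented for this example.

```python
import io
import numpy as np

FMT = '%1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %1.4f   %i   %i'

def separation_rad(ra1, dec1, ra2, dec2):
    """Haversine great-circle separation in radians; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return 2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def position_angle_deg(ra1, dec1, ra2, dec2):
    """Position angle of point 2 as seen from point 1, in [0, 360) degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    pa = np.arctan2(np.sin(dra),
                    np.cos(dec1) * np.tan(dec2) - np.sin(dec1) * np.cos(dra))
    return np.degrees(pa) % 360.0

def save_matches(out, i, RA1, DEC1, d, z, e, s, RA2, DEC2, n, m, max_sep=1.5):
    """Write all partners of object i with sep < max_sep; return how many."""
    sep = d[i] * separation_rad(RA1[i], DEC1[i], RA2, DEC2)  # whole RA2 array at once
    hit = np.flatnonzero(sep < max_sep)
    if hit.size:
        ones = np.ones(hit.size)
        rows = np.column_stack([RA1[i] * ones, DEC1[i] * ones, sep[hit],
                                z[i] * ones, e[i] * ones, s[i] * ones,
                                n[hit], m[hit]])
        np.savetxt(out, rows, fmt=FMT)  # one call per object, not per pair
    return hit.size

# tiny made-up data: one object in set 1, two in set 2
RA1, DEC1 = np.array([10.0]), np.array([0.0])
d, z, e, s = np.array([55.0]), np.array([0.1]), np.array([0.5]), np.array([2.0])
RA2, DEC2 = np.array([10.0, 200.0]), np.array([1.0, -40.0])
n, m = np.array([12, 34]), np.array([56, 78])

out = io.StringIO()  # stands in for the open results file
count = save_matches(out, 0, RA1, DEC1, d, z, e, s, RA2, DEC2, n, m)
pa = position_angle_deg(RA1[0], DEC1[0], RA2, DEC2)  # one angle per RA2 entry
```

The outer loop over RA1 stays, so this removes the per-pair Python and I/O overhead but not the trillion pair evaluations themselves.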

But the problem remains that doing a trillion trigonometric operations on an ordinary computer is not feasible.

Some other ideas for reducing the time:

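  • Use multiprocessing so that every core of your computer works in parallel; in theory this speeds things up by the number of cores. In practice that factor is not reached, and I would suggest …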
    aMask=(abs(RA1[:,None]-RA2[None,:])<2)&(abs(DEC1[:,None]-DEC2[None,:])<2)
    
    locs=np.where(aMask)
    
    (array([   0,    2,    4, ..., 4998, 4999, 4999], dtype=int32),
     array([3575, 1523, 1698, ..., 4869, 1801, 2792], dtype=int32))
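
The broadcast mask above compares every element of the first set with every element of the second, which at the sizes in the question would mean a 500000 x 2000000 boolean array. As a rough sketch of a similar coarse cut that avoids building that array, the code below sorts DEC2 once and uses np.searchsorted to pick only the candidates inside a declination band. This is safe as a prefilter because the angular separation between two points is never smaller than their difference in declination. The function name and the sample numbers are made up, and a matching cut in RA (which would need care near the poles and at the 0/360 wrap) is left out.

```python
import numpy as np

def dec_band_candidates(dec1_deg, theta_max_deg, dec2_sorted, order):
    """Indices into the original DEC2 with |DEC2 - dec1| <= theta_max."""
    lo = np.searchsorted(dec2_sorted, dec1_deg - theta_max_deg, side="left")
    hi = np.searchsorted(dec2_sorted, dec1_deg + theta_max_deg, side="right")
    return order[lo:hi]

# made-up declinations for the second set
DEC2 = np.array([-80.0, 10.2, 9.5, 45.0, 10.9, -10.0])
order = np.argsort(DEC2)     # sort once, reuse for every object in set 1
dec2_sorted = DEC2[order]

# largest angle that can still pass sep < 1.5 for an object at distance d_i
d_i = 55.0
theta_max_deg = np.degrees(1.5 / d_i)   # about 1.56 degrees

cand = dec_band_candidates(10.0, 1.0, dec2_sorted, order)
```

Only the candidates returned for each object would then need the exact separation, which is one way of reducing the dataset.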