Python: creating a list of strings from a numpy array (non-loop solution)


I want to build a new list of strings from two columns of a numpy array. However, I can't seem to do this without looping over every element:

import numpy as np
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1))
print(test_list[:,0])
print(test_list[:,1])

def dumbstring(points):
    # Loop through and append a list
    string_pnts = []
    for x in points:
        string_pnts.append("X co-ordinate is %g and y is %g" % (x[0], x[1]))
    return string_pnts

def dumbstring2(points):
    # Prefill a list
    string_pnts = [""] * len(points)
    i = 0
    for x in points:
        string_pnts[i] = ("X co-ordinate is %g and y is %g" % (x[0], x[1]))
        i += 1
    return string_pnts

def numpystring(points):
    return ("X co-ordinate is %g and y is %g" % (points[:,0], points[:,1]))    

def numpystring2(point_x, point_y):
    return ("X co-ordinate is %g and y is %g" % (point_x, point_y))
The first two work (I expected prefilling to be faster than appending, but they seem to take about the same time):

The last two, however, raise an error, and I'm wondering whether there really is no way to vectorize this function:

tnumpystring = numpystring(test_list) # Error
tnumpystring2 = numpystring2(test_list[:,0],test_list[:,1]) # Error
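For reference, a minimal sketch of why both calls fail: the %g conversion needs a single number, so Python tries to convert each whole column to one float, which NumPy refuses for arrays with more than one element. The helper name try_vectorized_format is hypothetical and only wraps the call from the question so the error can be inspected.

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6]])

def try_vectorized_format(pts):
    """Attempt the '%' call from the question; return the error text if it fails."""
    try:
        # '%g' expects one real number per placeholder, not a whole column.
        return "X co-ordinate is %g and y is %g" % (pts[:, 0], pts[:, 1])
    except TypeError as exc:
        return "TypeError: %s" % exc

result = try_vectorized_format(points)
print(result)
```

The % operator formats one string at a time, so some form of per-row work is needed, whether in a Python loop or inside a NumPy string routine.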
Edit:

I tried pandas, since I don't actually need numpy, but it was somewhat slower:

import pandas as pd
df = pd.DataFrame(test_list)
df.columns = ['x','y']
%time pdtest = ("X co-ordinate is " + df.x.map(str) + " and y is " + df.y.map(str)).tolist()
print(pdtest[:5])
I also tried map, but that was also slower than looping over the numpy array:

def mappy(pt_x,pt_y):
    return("X co-ordinate is %g and y is %g" % (pt_x, pt_y))
%time mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
print(mtest1[:5])
Timings:

Here is a solution using numpy.core.defchararray.add (also exposed as np.char.add), after first casting your array to str:

from numpy.core.defchararray import add    
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1)).astype(str)

def stringy_arr(points):
    return add(add('X coordinate is ', points[:,0]),add(' and y coordinate is ', points[:,1]))
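As a small check of what this approach produces, here is a sketch that uses the public np.char.add alias instead of the numpy.core import path. The function name stringy_arr_public and the three-row sample array are assumptions made for illustration; the string template is the one from the answer.

```python
import numpy as np

def stringy_arr_public(points):
    """Element-wise string concatenation over two columns using np.char.add."""
    pts = np.asarray(points).astype(str)  # cast numbers to strings first
    left = np.char.add("X coordinate is ", pts[:, 0])
    right = np.char.add(" and y coordinate is ", pts[:, 1])
    return np.char.add(left, right)

small = np.array([[1, 2], [3, 4], [5, 6]])
labels = stringy_arr_public(small).tolist()
print(labels)
```

The result is a NumPy string array, so .tolist() is needed at the end if a plain Python list is required.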
The timing is slightly faster:

%timeit stringy_arr(test_list)
1 loops, best of 3: 216 ms per loop

array(['X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6', ...,
       'X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6'], 
      dtype='|S85')

# Previously tried functions
%timeit dumbstring(test_list)
1 loops, best of 3: 340 ms per loop

%timeit tdumbstring2 = dumbstring2(test_list)
1 loops, best of 3: 320 ms per loop

%timeit mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
1 loops, best of 3: 340 ms per loop
Edit

You can also use pure Python with a list comprehension, which is much faster than the first solution I proposed:

test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(10000000,1)).astype(str)  #10M
test_list = test_list.tolist()

def comp(points):
    return ['X coordinate is %s Y coordinate is %s' % (x,y) for x,y in points]

%timeit comp(test_list)
1 loops, best of 3: 6.53 s per loop

['X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',
 'X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',...

%timeit dumbstring(test_list)
1 loops, best of 3: 30.7 s per loop
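The same idea can be sketched for a numeric array without casting it to str first, assuming the question's original %g template is wanted. The name comp_from_array is hypothetical; the key step, as in the answer, is calling .tolist() so the comprehension iterates over plain Python lists rather than NumPy rows.

```python
import numpy as np

def comp_from_array(points):
    """Format each (x, y) row after converting the array to nested Python lists."""
    rows = np.asarray(points).tolist()  # plain Python numbers, cheap to iterate
    return ["X co-ordinate is %g and y is %g" % (x, y) for x, y in rows]

sample = np.array([[1, 2], [3, 4], [5, 6]])
formatted = comp_from_array(sample)
print(formatted)
```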

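I tried using calls to map instead of a for loop, but it made little difference. From what I can see, the string formatting of the two points takes the most time. I also played around with numpy.savetxt and a dummy StringIO "file", but that only slowed everything down. See also the related discussion.

Thanks Greg, I also tried map and found it a bit slower. Oddly, I tried pandas and that was slower still!

Strange, but I checked with 10000000 rows and for some reason got: loop appending to a list: 25.1 s, prefilled list: 24.7 s, map with lambda: 28 s, pandas_df: 72 s, stringy including the time for the str conversion: 71 s, stringy with a ready-made string array: 77 s.

I had to run it at 10000000 rows: %timeit dumbstring(test_list) gives 1 loops, best of 3: 31.3 s per loop, and %timeit stringy_arr(test_list) gives 1 loops, best of 3: 21.5 s per loop. I don't know whether there is a truly ideal approach, which isn't surprising since the solution I gave is still "element-wise"...

Kevin, sorry, but I added a screenshot to my original post because I feel like I'm going crazy. The basic for loop seems to be the fastest for me...

In your image, you missed a step for the list comprehension function: convert the array to a list and then test it, with test_list = test_list.tolist(). See if that helps.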
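To check the point about the missing .tolist() step outside IPython, a small harness along these lines could be used. It is only a sketch: the array size (3000 rows) and the repeat count are arbitrary assumptions chosen to keep the run short, and the measured times will vary by machine and NumPy version, so no timings are claimed here.

```python
import timeit
import numpy as np

def comp(points):
    """List comprehension from the answer; works on an array or a nested list."""
    return ["X coordinate is %s Y coordinate is %s" % (x, y) for x, y in points]

# Much smaller than the thread's 10M rows so the comparison runs quickly.
arr = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (1000, 1)).astype(str)
as_list = arr.tolist()

# Time the same function on the ndarray and on the converted list.
t_array = timeit.timeit(lambda: comp(arr), number=5)
t_list = timeit.timeit(lambda: comp(as_list), number=5)
print("ndarray input: %.4f s, list input: %.4f s" % (t_array, t_list))

# Both inputs should yield identical strings; only the iteration cost differs.
same_output = comp(arr) == comp(as_list)
```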