Python: joining all the subarrays inside a NumPy ndarray


I generated an ndarray containing every possible combination of 3 arrays, as follows:

import numpy as np

countries = ["AF", ..., "ZW"]
names = ["name1", ..., "nameN"]
var_type = ['var1', 'var2', 'var3']
combinations = np.array(np.meshgrid(names, var_type, countries)).T.reshape(-1, 3)
This gives an array like the following:

array([
   ['name1', 'var1', 'AF'],
   ['name1', 'var2', 'AF'],
   ['name1', 'var3', 'AF'],
   ...,
   ['nameN', 'var1', 'ZW'],
   ['nameN', 'var2', 'ZW'],
   ['nameN', 'var3', 'ZW']
])
I want to join each individual subarray to get a new array of merged values, like this:

array([
   "name1-var1-AF",
   "name1-var2-AF",
   "name1-var3-AF",
   ...,
   "nameN-var1-ZW",
   "nameN-var2-ZW",
   "nameN-var3-ZW"
])
But so far the only approach I have found by googling is a for loop like this:

columns = []

for column in combinations:
    columns.append(str('-'.join(column)))

Is there a more vectorized way to do this?

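numpy does not have fast compiled code for handling strings, apart from the basic array operations that work for any dtype. Even the np.char functions use the basic Python string methods.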
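For reference, here is a minimal sketch of what the np.char route could look like, using np.char.add for element-wise concatenation. The short input lists are illustrative stand-ins for the asker's elided lists, and the intermediate names with_name and with_var are made up for this example. It is not timed here; given the point above about np.char relying on Python string methods, there is no reason to expect it to beat a plain list comprehension.

```python
import numpy as np

# Small illustrative inputs (stand-ins for the elided lists in the question).
countries = ["AF", "ZW"]
names = ["name1", "name2"]
var_type = ["var1", "var2", "var3"]

combinations = np.array(np.meshgrid(names, var_type, countries)).T.reshape(-1, 3)

# Build "name-var-country" column by column with element-wise concatenation.
with_name = np.char.add(combinations[:, 0], "-")
with_var = np.char.add(np.char.add(with_name, combinations[:, 1]), "-")
joined = np.char.add(with_var, combinations[:, 2])

print(joined[:3])
```

The result is a 1-D string array with one joined entry per row of combinations.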

In [12]: countries = ["AF","Zw"] 
    ...: names = ["name1","name2", "nameN"] 
    ...: var_type = ['var1', 'var2', 'var3'] 
    ...: combinations = np.array(np.meshgrid(names, var_type,countries)).T.reshape(-1, 3)   

In [14]: ['-'.join(row) for row in _]                                                                
Out[14]: 
['name1-var1-AF',
 'name1-var2-AF',
 'name1-var3-AF',
 'name2-var1-AF',
 ...
 'nameN-var3-Zw']
This is basically a list operation, and iterating over a list is faster than iterating over an array:

In [18]: timeit ['-'.join(row) for row in combinations]                                              
62.3 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [19]: timeit ['-'.join(row) for row in combinations.tolist()]                                     
6.55 µs ± 31.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [20]: %%timeit alist = combinations.tolist() 
    ...: ['-'.join(row) for row in alist] 
    ...:  
    ...:                                                                                             
2.88 µs ± 3.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If I include the time spent creating the combinations:

In [29]: %%timeit 
    ...: combinations = np.array(np.meshgrid(names, var_type,countries)).T.reshape(-1, 3)    
    ...: ['-'.join(row) for row in combinations] 
    ...:  
    ...:                                                                                             
164 µs ± 925 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
On the other hand, using itertools.product:

In [30]: timeit ['-'.join(tup) for tup in product(names, var_type, countries)]                       
4.17 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In this case, numpy does not help.
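One detail worth noting: product(names, var_type, countries) varies the country fastest, so its output order differs from the meshgrid-based array in the question, where var_type varies fastest and the country slowest. The sketch below, again with short illustrative lists, reorders the arguments so the joined strings come out in the same order as the question's expected result.

```python
from itertools import product

# Small illustrative inputs (stand-ins for the elided lists in the question).
countries = ["AF", "ZW"]
names = ["name1", "name2"]
var_type = ["var1", "var2", "var3"]

# product() varies its rightmost argument fastest, so list the slowest-changing
# field (country) first and the fastest-changing field (var_type) last.
columns = [
    "-".join((name, var, country))
    for country, name, var in product(countries, names, var_type)
]

print(columns[:4])
```

If the order of the combinations does not matter, the simpler call from the timing above works just as well.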

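What exactly is the dtype? Can you show exactly how the original array is constructed, e.g. with an MCVE?
I have edited the question to show how the array is constructed.
Do the columns in combinations always have a consistent number of characters, or is it arbitrary? In the latter case, use the loop.
It can vary, because the names have different lengths.
Then there is not much you can do. You could use linear indexing and map the positions that way, but it would be much slower than just running a Python loop. NumPy works best with elements of uniform size, and these are not.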
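To illustrate the remark about uniform element size, the following small sketch shows that a NumPy string array reserves the same fixed width for every element, sized for the longest string. The array name and its values are made up for the example.

```python
import numpy as np

# Two strings of different lengths (2 and 6 characters).
short_and_long = np.array(["AF", "name10"])

# The dtype is a fixed-width unicode type wide enough for the longest string,
# so every element occupies the same number of bytes regardless of its length.
print(short_and_long.dtype)
print(short_and_long.itemsize)
```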