Python: Numpy ufunc.at with character arrays

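Is there a way to use numpy ufunc.at (specifically add.at) to concatenate string arrays? Neither add.at nor char.add.at works with string/character arrays.

The approach needs to handle n-dimensional arrays, so splitting based on the indices and then concatenating isn't ideal.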

import numpy as np

a = np.array(['a', 'b'])
ixs = np.array([0, 1, 1])
vals = np.array(['e', 'f', 'g'])

# Neither of these options works

np.add.at(a, ixs, vals)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-fb8e3bd48930> in <module>()
      2 ixs =  np.array([0, 1])
      3 vals = np.array(['e', 'e'])
----> 4 np.add.at(a, ixs, vals)

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')



np.char.add.at(a, ixs, vals)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-e1bb1f7868dd> in <module>()
      2 ixs =  np.array([0, 1])
      3 vals = np.array(['e', 'e'])
----> 4 np.char.add.at(a, ixs, vals)

AttributeError: 'function' object has no attribute 'at'


Your errors:

In [280]: np.char.add.at(a, ixs, vals)                                       
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-280-f06ad4d86cfb> in <module>
----> 1 np.char.add.at(a, ixs, vals)

AttributeError: 'function' object has no attribute 'at'

In [281]: np.add.at(a, ixs, vals)                                            
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-281-683423808141> in <module>
----> 1 np.add.at(a, ixs, vals)

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

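numpy ufuncs tend (always?) to operate on object-dtype arrays by delegating the operation to the corresponding method of each element. Addition (+) is defined for Python strings, so add.at on an object-dtype array works as desired.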
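A minimal sketch of that object-dtype route on the question's data. Converting back with astype(str) at the end is an assumption about the output type you want; drop it if an object array is fine.

```python
import numpy as np

a = np.array(['a', 'b'])
ixs = np.array([0, 1, 1])
vals = np.array(['e', 'f', 'g'])

# Object arrays hold ordinary Python str objects, so np.add uses str + str
ao = a.astype(object)
vo = vals.astype(object)

# Unbuffered in-place add: the repeated index 1 is concatenated twice, in order
np.add.at(ao, ixs, vo)
# ao is now array(['ae', 'bfg'], dtype=object)

# Back to a fixed-width unicode array
result = ao.astype(str)
```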

Well, you can always lean on numbers! So here's one approach using array manipulation that should be good for large datasets -

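import numpy as np

def char_add_at(a, ixs, vals):
    # Works on 1-D byte-string ('S') arrays; ASCII '<U' input is converted
    a = np.asarray(a, dtype='S')
    vals = np.asarray(vals, dtype='S')
    ixs = np.asarray(ixs)

    # The fill step writes characters row by row, so group vals by target
    # index; a stable sort keeps repeated indices in their original order
    order = np.argsort(ixs, kind='stable')
    ixs, vals = ixs[order], vals[order]

    # View each fixed-width string as a row of byte codes (0 = padding)
    an = a.view('u1').reshape(len(a), -1)
    vn = vals.view('u1').reshape(len(vals), -1)

    s = (vn != 0).sum(1)                                      # length of each value
    vnc = np.bincount(ixs, s, minlength=len(a)).astype(int)   # chars added per row
    anc = (an != 0).sum(1)                                    # original length per row
    tnc = anc + vnc                                           # final length per row

    r = len(anc)
    c = tnc.max() + 1
    out_ar = np.zeros((r, c), dtype=np.uint8)

    k = anc.max()
    out_ar[:, :k] = an[:, :k]                                 # copy the original strings

    # Free slots: zero cells inside each row's final length
    fill_mask = tnc[:, None] > np.arange(c)
    fill_mask &= out_ar == 0
    out_ar[fill_mask] = vn[vn != 0]                           # row-major fill

    out = out_ar.view('S' + str(c)).ravel()
    return out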
Sample run -

In [671]: a = np.array(['a', 'bz', 'cer'])
     ...: ixs =  np.array([0, 1, 1, 2])
     ...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])

In [672]: char_add_at(a, ixs, vals)
Out[672]: array(['aez', 'bzfieabcdefgwop', 'cerH'], dtype='|S16')
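The function above rests on one trick: a fixed-width 'S' array can be reinterpreted as rows of uint8 byte codes in which trailing zeros are padding. (Under Python 3, np.array(['a', 'bz']) is a '<U' array that stores 4 bytes per character, which is why byte-string literals are used here.) A small sketch of the length bookkeeping on the sample data:

```python
import numpy as np

# Fixed-width byte strings are left-aligned and padded with NUL (0) bytes
a = np.array([b'a', b'bz', b'cer'])          # dtype 'S3'
an = a.view('u1').reshape(len(a), -1)        # one row of byte codes per string
# an is [[97, 0, 0], [98, 122, 0], [99, 101, 114]]

lengths = (an != 0).sum(1)                   # non-padding bytes: [1, 2, 3]

# Characters appended to each row: sum the value lengths per target index
ixs = np.array([0, 1, 1, 2])
vals = np.array([b'ez', b'fieabcdef', b'gwop', b'H'])
vn = vals.view('u1').reshape(len(vals), -1)
per_row = np.bincount(ixs, weights=(vn != 0).sum(1), minlength=len(a)).astype(int)
# per_row is [2, 13, 1]
```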
Timings -

Case #1: Scaling up the sample dataset by 100x

In [675]: # Sample setup
     ...: a = np.array(['a', 'bz', 'cer'])
     ...: ixs =  np.array([0, 1, 1, 2])
     ...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])
     ...: 
     ...: # Scale up sample dataset
     ...: N = 100 # scale up factor
     ...: a = np.hstack(([a]*N))
     ...: ixs = (ixs + (ixs.max()+1)*np.arange(N)[:,None]).ravel()
     ...: vals = np.hstack(([vals]*N))

# @hpaulj's soln
In [676]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
10000 loops, best of 3: 56.3 µs per loop

In [677]: %timeit char_add_at(a, ixs, vals)
10000 loops, best of 3: 72.6 µs per loop
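In the Case #1 setup, the ixs line shifts copy k of the indices by k * (ixs.max() + 1), so each copy targets its own block of the tiled a. A tiny check with N = 3 (chosen only to keep the output short):

```python
import numpy as np

ixs = np.array([0, 1, 1, 2])
N = 3  # small scale-up factor, just to inspect the result

# Row k of the broadcast sum is ixs shifted by k * (ixs.max() + 1)
tiled = (ixs + (ixs.max() + 1) * np.arange(N)[:, None]).ravel()
# tiled is [0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8]
```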
Case #2: Scaling up the sample dataset by 1000x

# @hpaulj's soln
In [679]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
1000 loops, best of 3: 483 µs per loop

In [680]: %timeit char_add_at(a, ixs, vals)
1000 loops, best of 3: 364 µs per loop
Case #3: Scaling up the sample dataset by 10000x

# @hpaulj's soln
In [682]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
100 loops, best of 3: 5.28 ms per loop

In [683]: %timeit char_add_at(a, ixs, vals)
100 loops, best of 3: 3.34 ms per loop

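Are there only single-character strings in a and vals?
No - unfortunately, they are mixed-length words. Thanks for your help!
String concatenation is defined for Python strings, but not for the numpy string dtypes. Sometimes you can get string-concatenation behavior via object-dtype arrays, but you have to test it case by case.
Thanks a lot - using the object dtype worked in my case!
Wouldn't we lose performance with object-dtype arrays?
@Divakar, yes, but some performance is better than none :) numpy doesn't have any fast string-specific operations; the np.char functions all use Python string methods.
Simplicity isn't bad either!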
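Since the question needs n-dimensional arrays, here is a minimal sketch of the same object-dtype workaround on a 2-D array (the data is made up for illustration). ufunc.at takes a tuple of index arrays, one per axis, just like fancy indexing:

```python
import numpy as np

a = np.array([['a', 'b'],
              ['c', 'd']])
rows = np.array([0, 1, 1])
cols = np.array([1, 0, 0])
vals = np.array(['x', 'y', 'z'])

ao = a.astype(object)
# One index array per axis; the repeated cell (1, 0) gets 'y' then 'z'
np.add.at(ao, (rows, cols), vals.astype(object))
# ao is now [['a', 'bx'], ['cyz', 'd']] with object dtype

out = ao.astype(str)  # back to a fixed-width unicode array
```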