Python equivalent of R's sub and paste (concatenating strings and numbers)

python pandas

Previously, in R, I used sub and paste to concatenate strings and numbers. I am finding this a bit harder in Python. Here is sample code in Python:

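import numpy as np
import pandas as pd

np.random.seed(1)  # seed NumPy's global generator so the numbers are reproducible
testtt = pd.DataFrame(np.random.rand(5, 4)).round(3)
testtt.iloc[1, 1]  # single-cell lookup (row 1, column 1)
print(testtt)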

#        0      1      2      3
# 0  0.417  0.720  0.000  0.302
# 1  0.147  0.092  0.186  0.346
# 2  0.397  0.539  0.419  0.685
# 3  0.204  0.878  0.027  0.670
# 4  0.417  0.559  0.140  0.198

for i in range(testtt.shape[1]):
    for j in range(testtt.shape[0]):
        testtt.iloc[j, i] = str(i) + '_' + str(testtt.iloc[j, i])


print(testtt)
#          0        1        2        3
# 0  0_0.417   1_0.72    2_0.0  3_0.302
# 1  0_0.147  1_0.092  2_0.186  3_0.346
# 2  0_0.397  1_0.539  2_0.419  3_0.685
# 3  0_0.204  1_0.878  2_0.027   3_0.67
# 4  0_0.417  1_0.559   2_0.14  3_0.198

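What I actually want is to prepend the column index to the numbers beneath it. As you can see, "0" is added to every element in the first column, "1" to every element in the second column, and so on.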

I don't think for loops are the best approach, because my real data is a 90000*20 matrix and the nested loops take far too long to run.

This is the code I used to write in R. It is much faster because there are only 20 columns and it uses just one short loop over the columns:

for (i in 1:(ncol(testtt))){
  testtt[,i] <- sub("^", paste(i,"_",sep = ""), testtt[,i] )
}

Your R code snippet translates to pandas as follows:
for i in range(len(testtt.columns)):
    # select every row of column i with [:, i]
    testtt.iloc[:, i] = str(i) + '_' + testtt.iloc[:, i].round(3).astype(str)

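However, a more efficient solution is to use the name attribute of each Series in the DataFrame, which gives us the required prefix from the numeric column names, and to perform the concatenation by applying a lambda (i.e. anonymous) function: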
testtt = testtt.apply(lambda x: str(x.name) + '_' + x.round(3).astype(str))

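The pd.DataFrame.apply method acts on one column of the DataFrame at a time (with the default argument axis=0; if axis=1 is supplied, it works row by row), so no explicit for loop is needed in this case.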
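To make the axis behaviour concrete, here is a minimal sketch. The small frame `df` and the helper `prefix_with_name` are made up for illustration and are not part of the answer above; the point is only that `s.name` is the column label under axis=0 and the row label under axis=1.

```python
import pandas as pd

# Hypothetical two-by-two frame with integer row and column labels.
df = pd.DataFrame({0: [0.417, 0.147], 1: [0.72, 0.092]})

def prefix_with_name(s: pd.Series) -> pd.Series:
    # s.name is the column label when axis=0 and the row label when axis=1.
    return str(s.name) + "_" + s.astype(str)

by_column = df.apply(prefix_with_name)        # default axis=0: one column at a time
by_row = df.apply(prefix_with_name, axis=1)   # axis=1: one row at a time

print(by_column)
print(by_row)
```

With axis=0 the prefix follows the column, which is what the question asks for; with axis=1 the same function would prefix each value with its row label instead.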
In Python, string concatenation is done with addition. Using broadcasting, you can do something like this:
df.astype(str).radd(df.add_suffix('_').columns)

Out: 
         0        1        2        3
0  0_0.972  1_0.661  2_0.872  3_0.876
1  0_0.751  1_0.097  2_0.673  3_0.978
2  0_0.662  1_0.645  2_0.498  3_0.769
3  0_0.587  1_0.538  2_0.032  3_0.279
4  0_0.739  1_0.663  2_0.769  3_0.475

Here is how it works: the add_suffix method appends an underscore to the end of each column name:

df.add_suffix('_').columns
Out: Index(['0_', '1_', '2_', '3_'], dtype='object') 

Now all that remains is the addition to get the desired output. However, if you add df to df.columns, you get the following:
df.add_suffix('_').columns + df.astype('str')
Out: 
Index([('0_0.972', '1_0.661', '2_0.872', '3_0.876'),
       ('0_0.751', '1_0.097', '2_0.673', '3_0.978'),
       ('0_0.662', '1_0.645', '2_0.498', '3_0.769'),
       ('0_0.587', '1_0.538', '2_0.032', '3_0.279'),
       ('0_0.739', '1_0.663', '2_0.769', '3_0.475')],
      dtype='object')

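Because df.add_suffix('_').columns is an Index object, the returned object is also an Index. We want the result to be a DataFrame, so we perform the operation on the DataFrame instead. The radd method adds df to the right-hand side of df.columns.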
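The difference between add and radd is only the side on which the DataFrame ends up. The following is a small sketch under assumed data (the frame `df` and the variable names are illustrative, not from the answer):

```python
import pandas as pd

# Hypothetical two-by-two frame; column labels are the integers 0 and 1.
df = pd.DataFrame({0: [0.417, 0.147], 1: [0.72, 0.092]})

prefixes = df.add_suffix("_").columns   # one label-plus-underscore string per column
as_text = df.astype(str)                # string concatenation needs string values

suffixed = as_text.add(prefixes)    # value + label, label lands on the right
prefixed = as_text.radd(prefixes)   # label + value, label lands on the left

print(suffixed)
print(prefixed)
```

Both calls match the labels to the columns by position and return a DataFrame, so radd is the one that produces the prefixed values.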
The same result can be achieved with a for loop:
df = df.astype('str')
for col in df:
    df[col] = '{}_'.format(col) + df[col]
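As a sanity check, the three approaches discussed here (the column loop, apply, and radd) can be run side by side on the same data. This is only a sketch: the random frame and the names `looped`, `applied` and `broadcast` are assumptions made for the comparison.

```python
import numpy as np
import pandas as pd

# Assumed test data: 5 rows, 4 columns, rounded to three decimals.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((5, 4))).round(3)

# 1) Loop over columns only (one iteration per column, not per cell).
looped = df.astype(str)
for col in looped:
    looped[col] = "{}_".format(col) + looped[col]

# 2) apply with the Series name as the prefix.
applied = df.apply(lambda s: str(s.name) + "_" + s.astype(str))

# 3) radd with the suffixed column labels.
broadcast = df.astype(str).radd(df.add_suffix("_").columns)

# Compare the raw values cell by cell.
same = (
    looped.to_numpy().tolist()
    == applied.to_numpy().tolist()
    == broadcast.to_numpy().tolist()
)
print(same)
```

All three iterate at most once per column, which is why they scale far better than the nested per-cell loop in the question.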