Python Pandas: set row values to the letter of the alphabet corresponding to the index number?

python pandas

I have a DataFrame:

   a    b   c    country
0  5    7   11   Morocco
1  5    9   9    Nigeria
2  6    2   13   Spain

I want to add a column e that holds the letter of the alphabet corresponding to the index number, for example:

   a    b   c    country    e
0  5    7   11   Morocco    A
1  5    9   9    Nigeria    B
2  6    2   13   Spain      C

How can I do this? I tried:

 df['e'] = chr(ord('a') + df.index.astype(int))

But I get:

TypeError: int() argument must be a string or a number, not 'Int64Index'

One approach is to convert the index to a Series, then call apply and pass a lambda:

In[271]:
df['e'] = df.index.to_series().apply(lambda x: chr(ord('a') + x)).str.upper()
df

Out[271]: 
   a  b   c  country  e
0  5  7  11  Morocco  A
1  5  9   9  Nigeria  B
2  6  2  13    Spain  C

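Basically, the error here is that df.index is of type Int64Index, and the chr function does not know how to handle it. By calling apply on the Series, we convert the values one row at a time.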
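To make the failure concrete, here is a minimal sketch. It rebuilds the DataFrame from the question (the construction code is my assumption, the values are copied from the post) and shows that chr accepts one integer at a time, so the conversion has to happen per element:

```python
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
df = pd.DataFrame({
    "a": [5, 5, 6],
    "b": [7, 9, 2],
    "c": [11, 9, 13],
    "country": ["Morocco", "Nigeria", "Spain"],
})

# chr() takes a single integer code point, so passing the whole index fails.
try:
    chr(ord("a") + df.index)
    failed = False
except TypeError:
    failed = True

# Applying chr() to one index value at a time works.
letters = [chr(ord("A") + i) for i in df.index]
```

Starting from ord("A") instead of ord("a") also makes the later .upper() call unnecessary.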

Performance-wise, I think a list comprehension would be faster:

In[273]:
df['e'] = [chr(ord('a') + x).upper() for x in df.index]
df

Out[273]: 
   a  b   c  country  e
0  5  7  11  Morocco  A
1  5  9   9  Nigeria  B
2  6  2  13    Spain  C

Timings

%timeit df.index.to_series().apply(lambda x: chr(ord('a') + x)).str.upper()
%timeit [chr(ord('a') + x).upper() for x in df.index]
1000 loops, best of 3: 491 µs per loop
100000 loops, best of 3: 19.2 µs per loop

The list comprehension approach is much faster here.

Here is an alternative functional solution. It assumes you have fewer countries than there are letters in the alphabet:

from string import ascii_uppercase
from operator import itemgetter

df['e'] = itemgetter(*df.index)(ascii_uppercase)

print(df)

   a  b   c  country  e
0  5  7  11  Morocco  A
1  5  9   9  Nigeria  B
2  6  2  13    Spain  C
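Two edge cases of the itemgetter approach may be worth keeping in mind. With a single index value, itemgetter returns a bare string rather than a tuple, and any index value of 26 or more raises IndexError. The sketch below uses literal positions in place of df.index purely for illustration:

```python
from operator import itemgetter
from string import ascii_uppercase

# Several positions: itemgetter returns a tuple of letters.
several = itemgetter(0, 1, 2)(ascii_uppercase)

# One position: itemgetter returns a single string, not a 1-tuple.
single = itemgetter(0)(ascii_uppercase)

# Position 26 is past "Z", so the lookup raises IndexError.
try:
    itemgetter(26)(ascii_uppercase)
    out_of_range = False
except IndexError:
    out_of_range = True
```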

You can use map and take the values from df.index:

# list() is needed on Python 3, where map returns a lazy iterator
df['e'] = list(map(chr, ord('A') + df.index.values))

A speed comparison:

# Edchum
%timeit df.index.to_series().apply(lambda x: chr(ord('A') + x))
10000 loops, best of 3: 135 µs per loop
%timeit [chr(ord('A') + x) for x in df.index]
100000 loops, best of 3: 7.38 µs per loop
# jpp
%timeit itemgetter(*df.index)(ascii_uppercase)
100000 loops, best of 3: 7.23 µs per loop
# Me
%timeit map(chr,ord('A') + df.index.values)
100000 loops, best of 3: 3.12 µs per loop

So map appears to be faster, but that may be down to the length of the sample data.
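Since a three-row frame is too small to say much, one way to check is to repeat the comparison at several index lengths. This is only a sketch: the helper names with_comprehension and with_map are hypothetical, the % 26 wrap is my addition so that large indexes still map to A-Z, and the numbers will vary by machine and Python version.

```python
import timeit

import pandas as pd


def with_comprehension(index):
    # Wrap with % 26 so large indexes still map to A-Z.
    return [chr(ord("A") + x % 26) for x in index]


def with_map(index):
    # list() forces evaluation; on Python 3 a bare map() is lazy.
    return list(map(chr, ord("A") + index.values % 26))


timings = {}
for n in (3, 1_000, 10_000):
    index = pd.RangeIndex(n)
    timings[n] = {
        "comprehension": timeit.timeit(lambda: with_comprehension(index), number=10),
        "map": timeit.timeit(lambda: with_map(index), number=10),
    }

# Print total seconds for 10 runs of each approach at each length.
for n, result in timings.items():
    print(n, result)
```

Note that on Python 3, timing map(...) without list() only measures creating the iterator, not building the letters.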

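After reaching the letter Z, how can we make the alphabet start over?
I tried this approach, but after Z it runs into some foreign characters (such as "|") and even some Chinese characters (I think). After reaching Z, how can we make the letters restart as AB, AC, AD?
@Rich see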
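For the questions about going past Z, here are two possible readings, sketched under my own assumptions rather than taken from any of the answers: either wrap back to A with a modulo, or continue spreadsheet-style (Z, AA, AB, ...). The function name index_to_label is hypothetical.

```python
from string import ascii_uppercase


def index_to_label(n):
    """Convert a 0-based position to a spreadsheet-style label (A..Z, AA, AB, ...)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    label = ""
    n += 1  # switch to 1-based so that 26 -> "Z" and 27 -> "AA"
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


positions = range(24, 29)

# Option 1: wrap around to "A" again after "Z".
wrapped = [ascii_uppercase[i % 26] for i in positions]

# Option 2: keep going with two-letter labels after "Z".
extended = [index_to_label(i) for i in positions]
```

With a DataFrame, either list comprehension can be run over df.index and assigned to the new column, in the same way as the list comprehension answer.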