Python 2.7 KeyError on a DataFrame (encoded index)

python-2.7 indexing pandas encoding

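I am running the code below. It creates a pair of DataFrames whose index is taken from a column of another DataFrame that holds a list of conference names.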

    df_conf = pd.read_sql("select distinct Conference from publications where year>=1991 and length(conference)>1 order by conference", db)

    for index, row in df_conf.iterrows():
        row[0]=row[0].encode("utf-8")

    df2= pd.DataFrame(index=df_conf['Conference'], columns=['Citation1991','Citation1992'])

    df2 = df2.fillna(0)
    df_if= pd.DataFrame(index=df_conf['Conference'], columns=['IF1994','IF1995'])

    df_if = df_if.fillna(0)

    df_pubs=pd.read_sql("select Conference, Year, count(*) as totalPubs from publications where year>=1991 group by conference, year", db)

    for index, row in df_pubs.iterrows():
        row[0]=row[0].encode("utf-8")

    df_pubs= df_pubs.pivot(index='Conference', columns='Year', values='totalPubs')
    df_pubs.fillna(0)

    for index, row in df2.iterrows():
        df_if.ix[index,'IF1994'] = df2.ix[index,'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

The last line keeps raising the following error:

KeyError: 'Analyse dynamischer Systeme in Medizin, Biologie und \xc3\x96kologie'

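I'm not sure what I'm doing wrong. I tried encoding the index, but that didn't help. I even tried .at, and that still doesn't work.

I know it is related to encoding, because it always stops at an index label that contains non-ASCII characters.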

I'm using Python 2.7.

I think the problem is here:

    for index, row in df_conf.iterrows():
        row[0]=row[0].encode("utf-8")

I'm surprised it doesn't raise a warning.
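A small sketch of why such a loop can leave the frame untouched: for a frame with mixed column types, iterrows builds a new Series for each row, so assigning to row does not reach the DataFrame. The frame below is made-up sample data (not the asker's database), with an object-dtype string column to mimic Python 2 era string columns, and it is written so that it also runs on Python 3.

```python
import pandas as pd

# Hypothetical sample data standing in for the query result.
df = pd.DataFrame({
    'Conference': pd.Series([u'\xd6kologie', u'ICML'], dtype=object),
    'Year': [1992, 1993],
})

# Assigning to `row` only changes the temporary Series built for this row.
for index, row in df.iterrows():
    row['Conference'] = row['Conference'].encode('utf-8')

still_unicode = isinstance(df.loc[0, 'Conference'], type(u''))

# Writing through .loc targets the frame itself.
for index, row in df.iterrows():
    df.loc[index, 'Conference'] = row['Conference'].encode('utf-8')

now_bytes = isinstance(df.loc[0, 'Conference'], bytes)
```

After the first loop still_unicode is True; after the second, now_bytes is True. With a single-dtype frame the first loop may behave differently, which is one more reason not to rely on it.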

Besides that, applying the vectorized string method to the Series is much faster:

    df_conf['col_name'] = df_conf['col_name'].str.encode('utf-8')

If needed, you can encode the index in a similar way:

    df.index = df.index.str.encode('utf-8')
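As a rough illustration of the two calls above, the sketch below encodes a column in one frame and the index of another, then looks up a row with the matching byte-string key. The frames and numbers are invented sample data; only the conference name is taken from the error message in the question.

```python
import pandas as pd

name = u'Analyse dynamischer Systeme in Medizin, Biologie und \xd6kologie'

# Hypothetical stand-in for the list of conference names.
df_conf = pd.DataFrame({'Conference': [name, u'ICML']})
df_conf['Conference'] = df_conf['Conference'].str.encode('utf-8')

# Hypothetical stand-in for the pivoted publication counts.
df_pubs = pd.DataFrame({1992: [3, 5], 1993: [4, 6]}, index=[name, u'ICML'])
df_pubs.index = df_pubs.index.str.encode('utf-8')

# Both frames now use the same UTF-8 byte string as the label.
key = name.encode('utf-8')
total = df_pubs.loc[key, 1992] + df_pubs.loc[key, 1993]
```

The point is only that the labels on both sides of a lookup have to be the same kind of string; whether you settle on byte strings or unicode is a separate choice.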

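Is the error raised on this line in the last part of your code?

    df_if.ix[index,'IF1994'] = df2.ix[index,'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

If so, try

    df_if.ix[index,u'IF1994'] = df2.ix[index,u'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

That should work. DataFrame indexing with UTF-8 strings behaves oddly, even when the script declares "# -*- coding: UTF8 -*-". The declaration only tells Python 2 how to decode the source file; a plain '...' literal is still a byte string, while u'...' is a unicode string. When you index DataFrame columns with UTF-8 strings, just put "u" in front of the string literal.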
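To see why a byte string and a unicode string do not match as labels, here is a minimal sketch that uses a plain dict in place of a DataFrame index (an assumption made to keep it free of dependencies). The variable names are made up; the conference name comes from the error message in the question. It behaves the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

# The unicode form of the label and its UTF-8 encoded byte form.
name_unicode = u'Analyse dynamischer Systeme in Medizin, Biologie und \xd6kologie'
name_bytes = name_unicode.encode('utf-8')  # ends in '\xc3\x96kologie'

# A container keyed by the encoded (byte) form only.
lookup = {name_bytes: 1}

found_with_bytes = name_bytes in lookup      # same kind of string: found
found_with_unicode = name_unicode in lookup  # different kind: not found

# Decoding brings the two forms back together.
round_trip_ok = name_bytes.decode('utf-8') == name_unicode
```

If one frame ends up with encoded labels and another with unicode labels, a lookup by label fails in the same way, which would be consistent with the KeyError showing '\xc3\x96kologie'.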

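Are you sure the values are actually being encoded? Your for loop looks wrong. You could try

    df_conf[0] = df_conf[0].apply(lambda s: s.encode('utf-8'))

Actually, if you are running pandas 0.15.0 or later, this should be faster:

    df_conf[0] = df_conf[0].str.encode('utf-8')

because it is vectorized.

Can you try

    df_conf[col_name] = df_conf[col_name].str.encode('utf-8')

replacing col_name with the column you want to encode?

Can you encode your index? Is your index of type str? Can you try

    df_conf.index = df_conf.index.str.encode('utf-8')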

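That worked! Thanks! Why didn't the loop work? Was it the index bit or the df_conf[col_name] bit?

The loop didn't work because you are iterating over the rows and values but modifying a copy of the data, so the original df never changes. To modify the df, you should assign to the column directly or use .loc, .iloc, or .ix. Please tell me exactly which one worked, and I'll post an answer that explains it with sample data.

I tried the last one: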

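    df.index = df.index.str.encode('utf-8')

and got

    AttributeError: 'Index' object has no attribute 'str'

Hmm. Is your index really of str dtype? If not, that won't work. I assumed your index was str; if it isn't, ignore that snippet.

It is str; it holds the conference names. If you look at the KeyError in the question, it shows one of the index labels.

Which version are you using? It works fine in 0.16.1.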
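One possible reading of the AttributeError is an older pandas release in which Index has no .str accessor; that is a guess, since the thread never confirms the version. Under that assumption, a fallback that avoids the accessor is to map a function over the index. The frame below is invented sample data.

```python
import pandas as pd

# Print the installed version, since the thread hinges on it.
print(pd.__version__)

# Hypothetical frame with unicode labels in the index.
df = pd.DataFrame({'totalPubs': [3, 5]}, index=[u'\xd6kologie', u'ICML'])

# Index.map applies the function to each label without using .str.
df.index = df.index.map(lambda s: s.encode('utf-8'))

labels = list(df.index)
```

This is slower than the vectorized accessor on large indexes, but it does not depend on .str being present.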