Python 3.x DataFrame concatenation error

I have the following DataFrame:

columnName |            columnText              |       columnTextContents
----------------------------------------------------------------------------
Linda      | [{age:45, category:technical},     | [{city:Mexico,type:member}]
           |   {age:55, category:nontechnical}] |  
----------------------------------------------------------------------------
Richeal    | [{age:65, category:technical}]     | [{city:Mexico,type:member}]
----------------------------------------------------------------------------
In the DataFrame above, the second and third columns each hold a list of dicts. I want to reshape it into the DataFrame below:

columnName | age  |  category    |  city    | type
--------------------------------------------------------
Linda      | 45   | technical    | Mexico   | member
--------------------------------------------------------
Linda      | 55   | nontechnical | NaN      | NaN
--------------------------------------------------------
Richeal    | 65   | technical    | Mexico   | member
--------------------------------------------------------
I wrote the following code, but it does not produce the expected output:

for k, v in zip(columnDataDF["columnText"].iteritems(), columnDataDF["columnTextContents"].iteritems()):
    tempDF = tempDF.append(pd.concat([pd.DataFrame.from_dict(k[1]), pd.DataFrame.from_dict(v[1])], axis=1))

columnDataDF = columnDataDF.drop('columnText', 1)
columnDataDF = columnDataDF.drop('columnTextContents', 1).join(tempDF)
Here is the output produced by the code above:

columnName | age  |  category    |  city    | type
--------------------------------------------------------
Linda      | 45   | technical    | Mexico   | member
--------------------------------------------------------
Linda      | 65   | technical    | Mexico   | member
--------------------------------------------------------
Richeal    |  55  | nontechnical | NaN      | NaN
--------------------------------------------------------

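Creating and modifying tempDF loses the original index: every frame built by pd.DataFrame.from_dict gets its own index starting at 0. The join then pairs the wrong rows, because join aligns on the index and the index of tempDF no longer matches that of columnDataDF.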
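
To see the index problem concretely, here is a minimal sketch. It assumes a reduced version of the question's data (only the age and city keys) and uses pd.concat in place of the repeated append calls, which stacks the frames the same way.

```python
import pandas as pd

# Each per-row frame starts its own index at 0
linda = pd.concat([pd.DataFrame([{"age": 45}, {"age": 55}]),
                   pd.DataFrame([{"city": "Mexico"}])], axis=1)
richeal = pd.concat([pd.DataFrame([{"age": 65}]),
                     pd.DataFrame([{"city": "Mexico"}])], axis=1)

# Stacking the frames keeps those per-frame labels
tempDF = pd.concat([linda, richeal])
print(tempDF.index.tolist())  # [0, 1, 0]
```

With these labels, join pairs original row 0 (Linda) with both rows labelled 0 (ages 45 and 65) and original row 1 (Richeal) with the row labelled 1 (age 55), which is exactly the wrong output shown in the question.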

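One way to fix this is to keep track of the index manually and assign it to the final tempDF. The index label of each row is available as k[0] (v[0] holds the same label). Something like the following may work: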

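index = []
tempDF = pd.DataFrame()
# Note: Series.iteritems() and DataFrame.append() were removed in pandas 2.0
for k, v in zip(columnDataDF["columnText"].iteritems(),
                columnDataDF["columnTextContents"].iteritems()):
    # k[0] is the original row label; repeat it once per dict in columnText
    index.extend([k[0]] * len(k[1]))
    tempDF = tempDF.append(pd.concat([pd.DataFrame.from_dict(k[1]),
                                      pd.DataFrame.from_dict(v[1])],
                                     axis=1))
# Restore the original labels so that join() aligns rows correctly
tempDF.index = index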
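
On pandas 2.0 and later, Series.iteritems and DataFrame.append no longer exist. The following is a self-contained sketch of the same idea using Series.items and a single pd.concat. The sample data is reconstructed from the tables in the question; the names frames, part, label and result are illustrative and not from the original post.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question
columnDataDF = pd.DataFrame({
    "columnName": ["Linda", "Richeal"],
    "columnText": [
        [{"age": 45, "category": "technical"},
         {"age": 55, "category": "nontechnical"}],
        [{"age": 65, "category": "technical"}],
    ],
    "columnTextContents": [
        [{"city": "Mexico", "type": "member"}],
        [{"city": "Mexico", "type": "member"}],
    ],
})

frames = []
index = []
for (label, text), (_, contents) in zip(
        columnDataDF["columnText"].items(),
        columnDataDF["columnTextContents"].items()):
    # Place the two lists side by side; the shorter one is padded with NaN
    part = pd.concat([pd.DataFrame(text), pd.DataFrame(contents)], axis=1)
    # Remember the original row label once per generated row
    index.extend([label] * len(part))
    frames.append(part)

tempDF = pd.concat(frames)
tempDF.index = index  # restore the original labels

# join() now aligns on the restored labels
result = (columnDataDF
          .drop(columns=["columnText", "columnTextContents"])
          .join(tempDF)
          .reset_index(drop=True))
print(result)
```

Using len(part) rather than len(text) keeps the index list in step with tempDF even when columnTextContents holds more dicts than columnText for some row.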
