Python 3.x DataFrame concatenation error
I have the following DataFrame:
columnName | columnText | columnTextContents
----------------------------------------------------------------------------
Linda | [{age:45, category:technical}, | [{city:Mexico,type:member}]
| {age:55, category:nontechnical}] |
----------------------------------------------------------------------------
Richeal | [{age:65, category:technical}] | [{city:Mexico,type:member}]
----------------------------------------------------------------------------
In the DataFrame above, the second and third columns each hold a list of dicts. I want to reshape it into the following DataFrame:
columnName | age | category | city | type
--------------------------------------------------------
Linda | 45 | technical | Mexico | member
--------------------------------------------------------
Linda | 55 | nontechnical | NaN | NaN
--------------------------------------------------------
Richeal | 65 | technical | Mexico | member
--------------------------------------------------------
I have written the following code, but it does not produce the expected output:
for k, v in zip(columnDataDF["columnText"].iteritems(), columnDataDF["columnTextContents"].iteritems()):
    tempDF = tempDF.append(pd.concat([pd.DataFrame.from_dict(k[1]), pd.DataFrame.from_dict(v[1])], axis=1))
columnDataDF = columnDataDF.drop('columnText', 1)
columnDataDF = columnDataDF.drop('columnTextContents', 1).join(tempDF)
This is the output the code above produces:
columnName | age | category | city | type
--------------------------------------------------------
Linda | 45 | technical | Mexico | member
--------------------------------------------------------
Linda | 65 | technical | Mexico | member
--------------------------------------------------------
Richeal | 55 | nontechnical | NaN | NaN
--------------------------------------------------------
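When tempDF is created and modified, the original index is lost. The join then fails, because it matches rows on the index and the index of tempDF no longer corresponds to the original rows.
One way to solve this is to keep track of the index manually and assign it to the final tempDF. The index is available as k[0] or v[0].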
Something like the following may work:
index = []
for k, v in zip(columnDataDF["columnText"].iteritems(),
                columnDataDF["columnTextContents"].iteritems()):
    # k[0] is the original row label; repeat it once per dict in the list
    index.extend([k[0]] * len(k[1]))
    tempDF = tempDF.append(pd.concat([pd.DataFrame.from_dict(k[1]),
                                      pd.DataFrame.from_dict(v[1])],
                                     axis=1))
# restore the original labels so the later join lines up
tempDF.index = index
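For anyone on a recent pandas: Series.iteritems and DataFrame.append were removed in pandas 2.0, so the loop above will not run there as written. Below is a minimal self-contained sketch of the same index-tracking idea. The sample data is reconstructed from the tables in the question, and the names frames, part and result are only illustrative. It uses len(part) rather than len(k[1]), on the assumption that the second list could in principle be the longer one.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
columnDataDF = pd.DataFrame({
    "columnName": ["Linda", "Richeal"],
    "columnText": [
        [{"age": 45, "category": "technical"},
         {"age": 55, "category": "nontechnical"}],
        [{"age": 65, "category": "technical"}],
    ],
    "columnTextContents": [
        [{"city": "Mexico", "type": "member"}],
        [{"city": "Mexico", "type": "member"}],
    ],
})

frames = []  # one expanded frame per original row
index = []   # original row label, repeated once per expanded row
for idx, text, contents in zip(columnDataDF.index,
                               columnDataDF["columnText"],
                               columnDataDF["columnTextContents"]):
    # Place the two lists of dicts side by side; the shorter one is padded with NaN.
    part = pd.concat([pd.DataFrame(text), pd.DataFrame(contents)], axis=1)
    frames.append(part)
    index.extend([idx] * len(part))

# Stack all pieces once, then restore the original row labels.
tempDF = pd.concat(frames)
tempDF.index = index

# join matches on the index, so each expanded row finds its columnName again.
result = columnDataDF.drop(columns=["columnText", "columnTextContents"]).join(tempDF)
print(result)
```

With the sample data this should give the three rows from the desired output, with Linda appearing twice and NaN for city and type on her second row. Collecting the pieces in a list and calling pd.concat once also avoids growing tempDF inside the loop.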