How do I combine/merge columns within the same DataFrame in pandas?


I have a DataFrame similar to this:

       0    1   2   3           4   5
0   1001    1   176 REMAINING   US  SOUTH
1   1002    1   176 REMAINING   US  SOUTH

What I want to do is merge columns 3, 4 and 5 into a single column that contains all the data from columns 3, 4 and 5.

Expected output:

       0    1   2   3           
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

I have already tried

hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)

but that did not work.

Here is the stack trace I get when I run

 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']

Stack trace:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']
      2 # hbadef.drop(['4', '5'], axis=1)
      3 # hbadef.columns = ['MKTcode', 'Region']
      4 
      5 # pd.concat(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'


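I have tried dropping the NaN values, but I get a similar result. I am confused about why such a simple operation does not work.

I am going to accept an answer so that we can more or less "close" this question. Both answers are acceptable and solve the problem; the issue I am running into is probably an application error that I will have to resolve independently of this question.

You can simply concatenate the columns in place: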

hbadef['3'] += ' ' +  hbadef['4'] + ' ' + hbadef['5']

Then drop the columns you no longer need:

hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
    0   1   2   3
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

Note: if the column labels are integers rather than strings, use

hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)
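The note about integer column labels matches the `KeyError: '3'` in the traceback: the lookup goes through `Int64HashTable`, which suggests the labels are the integers 0 to 5, so the string `'3'` is not found. A minimal sketch of that diagnosis follows. The frame is a hypothetical reconstruction of the sample data in the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame. Built without explicit
# column names, so the labels are the integers 0..5, not the strings '0'..'5'.
hbadef = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Inspect the labels first: integers here, so string keys cannot match.
print(list(hbadef.columns))

# Looking up the string '3' fails because only the integer 3 exists.
try:
    hbadef["3"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Using the integer labels works.
hbadef[3] = hbadef[3] + " " + hbadef[4] + " " + hbadef[5]
hbadef = hbadef.drop(columns=[4, 5])
print(hbadef)
```

Printing `list(hbadef.columns)` before indexing is usually the quickest way to tell whether the labels are integers or strings.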

Use concat together with agg:

pd.concat(
    [df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)], 
    axis=1, 
    ignore_index=True
)

      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
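Because this approach selects columns by position with `iloc`, it should not matter whether the labels are integers or strings. Below is a self-contained sketch of the same idea. The helper name `merge_tail` and the sample rows are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def merge_tail(df, start=3, sep=" "):
    """Join every column from position `start` onward into one string column.

    Hypothetical helper: selection is positional, so label types do not matter.
    Assumes the joined columns hold strings with no missing values.
    """
    merged = df.iloc[:, start:].agg(sep.join, axis=1)
    # ignore_index=True renumbers the resulting columns 0..start.
    return pd.concat([df.iloc[:, :start], merged], axis=1, ignore_index=True)

rows = [
    [1001, 1, 176, "REMAINING", "US", "SOUTH"],
    [1002, 1, 176, "REMAINING", "US", "SOUTH"],
]

# Works with integer labels ...
result_int = merge_tail(pd.DataFrame(rows))
# ... and with string labels.
result_str = merge_tail(pd.DataFrame(rows, columns=list("012345")))
print(result_int)
```

Note that `ignore_index=True` discards the original labels, so the output columns are always renumbered from 0.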

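Comments:

I really like the use of ' '.join as a function.

@AmiTavory I value good sportsmanship above all, since your answer certainly solves the problem too. Good luck :)

Thanks for the quick reply! When I try to implement this solution I get multiple type and key errors. It says an integer is required, so I assume the header is a character?

Can you paste hbadef.columns?

This is my output when using list(df): [0,1,2,3,4,5]

@CharlesD Updated the answer. LMK if that doesn't help.

Maybe it is also because there are some NaN values in columns 4 and 5.

Thanks! I tried both answers, but unfortunately I still get errors. I am currently troubleshooting, but progress is slow because I am new to pandas and I do not really understand what the error messages are trying to tell me. Man, I wish I were as experienced as you and Ami. So I think both of your answers probably work and the error is on my side. If that is the case, should I mark both as correct?

You can only accept one, so accept the one that performs best / is cleanest / smells best... If you still can't decide, flip a coin ;)

@CharlesD Maybe try df.columns = [str(c) for c in df.columns] and then continue? In the meantime, consider accepting coldspeed's answer.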
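The two suggestions in the comments, casting the column labels to strings and allowing for NaN values in columns 4 and 5, can be combined as in the sketch below. The sample frame, including the missing value in the second row, is an assumption made for illustration. Dropping missing parts before joining is one possible policy; replacing them with an empty string via `fillna('')` would be another.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with integer labels and one missing value in column 4.
df = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", np.nan, "SOUTH"],
    ]
)

# Cast the labels to strings, as suggested in the comment,
# so that keys such as '3' exist.
df.columns = [str(c) for c in df.columns]

# Plain ' '.join would raise TypeError on a float NaN, so drop the
# missing parts of each row before joining.
df["3"] = df[["3", "4", "5"]].apply(
    lambda row: " ".join(row.dropna().astype(str)), axis=1
)
df = df.drop(columns=["4", "5"])
print(df)
```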