Python: converting a set to a list with the groupby agg function causes 'ValueError: Function does not reduce'

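Sometimes it seems that the more I use Python (and Pandas), the less I understand it. So I apologise if I just can't see the wood for the trees here, but I have been going round in circles and cannot see what I am doing wrong.

Basically, I have an example script (which I would like to apply to a much larger dataframe), but I can't get it to work to my satisfaction.

The dataframe consists of columns of various data types. I would like to group the dataframe on 2 columns and then produce a new dataframe that contains, for each group, a list of all the unique values of each variable. (Ultimately, I would like to concatenate the list items into a single string, but that is a different question.)

The initial script I used was: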

import numpy as np
import pandas as pd

def tempFuncAgg(tempVar):
    tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values
    print(tempList)
    return tempList

# Define dataframe
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                        'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
                        'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
                        'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

# Groupby based on 2 categorical variables
tempGroupby = tempDF.groupby(['gender','age'])

# Aggregate for each variable in each group using function defined above
dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))
print(dfAgg)
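The output from this script is as expected: a series of printed lines containing the sets of values, followed by a dataframe containing the returned sets: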

{'09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34'}
{'01/06/2015 11:09', '12/05/2015 14:19', '27/05/2015 22:31', '19/06/2015 05:37'}
{'15/04/2015 07:12', '19/05/2015 19:22', '06/05/2015 11:12', '04/06/2015 12:57', '15/06/2015 03:23', '12/04/2015 01:00'}
{'02/04/2015 02:34', '10/05/2015 08:52'}
{2, 3, 6}
{18, 11, 13, 14}
{4, 5, 9, 12, 15, 17}
{1, 10}
                                                           date  \
gender age                                                        
female old    set([09/04/2015 23:03, 21/04/2015 12:59, 06/04...   
       young  set([01/06/2015 11:09, 12/05/2015 14:19, 27/05...   
male   old    set([15/04/2015 07:12, 19/05/2015 19:22, 06/05...   
       young          set([02/04/2015 02:34, 10/05/2015 08:52])   

                                      id  
gender age                                
female old                set([2, 3, 6])  
       young       set([18, 11, 13, 14])  
male   old    set([4, 5, 9, 12, 15, 17])  
       young                set([1, 10])  
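The problem comes when I try to convert the sets to lists. Bizarrely, it prints 2 duplicated lines containing identical lists, but then fails with a 'ValueError: Function does not reduce' error: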

def tempFuncAgg(tempVar):
    tempList = list(set(tempVar.dropna()))   # This is the only difference
    print(tempList)
    return tempList


tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                        'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
                        'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
                        'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

tempGroupby = tempDF.groupby(['gender','age'])

dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))
print(dfAgg)
But now the output is:

['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34']
['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: Function does not reduce
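Any help troubleshooting this problem would be appreciated, and I apologise in advance if it is something obvious that I am just not seeing.

EDIT
Incidentally, converting the set to a tuple rather than a list works with no problem.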
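
To illustrate the tuple variant mentioned in the edit, here is a minimal sketch on a cut-down dataframe. The names `unique_as_tuple`, `small_df` and `tuple_agg` are made up for this illustration, and the data is a shortened stand-in for the dataframe above; exact behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

def unique_as_tuple(values):
    """Return the sorted unique non-NaN values of a Series as a tuple."""
    return tuple(sorted(set(values.dropna())))

# Shortened, illustrative stand-in for the dataframe in the question
small_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "gender": ["male", "female", "female", "male", np.nan, "male"],
    "age": ["young", "old", "old", "old", "old", "young"],
})

# Rows whose group key is NaN are left out of the groups by default
tuple_agg = small_df.groupby(["gender", "age"]).agg(unique_as_tuple)
print(tuple_agg)
```

Each cell of `tuple_agg` then holds one tuple per group, which is the shape the list version was meant to produce.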

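Lists can sometimes cause odd problems in pandas. You can either:

  • Use tuples (as you have already noticed)

  • If you really need lists, convert in a second operation, like this:

    dfAgg = dfAgg.applymap(lambda x: list(x))

  • Full example: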

    import numpy as np
    import pandas as pd
    
    def tempFuncAgg(tempVar):
        tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values
        print(tempList)
        return tempList
    
    # Define dataframe
    tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                            'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
                            'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
                            'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})
    
    # Groupby based on 2 categorical variables
    tempGroupby = tempDF.groupby(['gender','age'])
    
    # Aggregate for each variable in each group using function defined above
    dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x))
    
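    # Convert each set to a list. applymap returns a new DataFrame, so assign the result
    # (in pandas 2.1+, applymap is deprecated in favour of DataFrame.map)
    dfAgg = dfAgg.applymap(lambda x: list(x))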
    
    print(dfAgg)
    

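    There is a lot of odd behaviour like this in pandas. In general, it is better to move on with a workaround (like this one) than to hunt for a perfect solution.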
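
Since the question's end goal is a single string per cell, one possible way around the list problem is to have the aggregation function return the joined string directly, so that each group reduces to a plain scalar. This is only a minimal sketch on a shortened stand-in dataframe; `join_unique`, `small_df` and `joined` are hypothetical names that do not appear in the original post.

```python
import numpy as np
import pandas as pd

def join_unique(values):
    """Return the sorted unique non-NaN values of a Series as one string."""
    unique_values = sorted(set(values.dropna()))
    return ", ".join(str(v) for v in unique_values)

# Shortened, illustrative stand-in for the dataframe in the question
small_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "gender": ["male", "female", "female", "male", np.nan, "male"],
    "age": ["young", "old", "old", "old", "old", "young"],
})

# Each (gender, age) group is reduced to one string per column
joined = small_df.groupby(["gender", "age"]).agg(join_unique)
print(joined)
```

Because a string is a scalar, no second conversion step is needed; the trade-off is that the individual values have to be split out again if they are needed later.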

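    Many thanks for your answer, and for confirming that I am not going completely mad. I guess this is a pitfall you only learn about through experience.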