Python: adding a new column to a DataFrame when the values have a different length


I have a question about adding results to an existing DataFrame.

if relevant_item != 'None' and relevant_item != 'Not in dict':
    items = relevant_item
    len_item = len(items)

    if len_item == 1:
        item_result = items

    if len_item == 2:
        two = items
        item_result = some_method(two)

    if len_item == 3:
        threes = items
        item_result = some_method(threes)

hash_in_dict_shopping.append(item_result)  # new list of lists

shops = pd.Series(hash_in_dict_shopping)
df_final['hash_in_shop'] = shops.values
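When I add the new list to the existing DataFrame, I get the error "ValueError: Length of values does not match length of index". How can I add the new list as a new column, keeping the original row order and filling every row that has no value with 'None'?

Original data before filtering (about 700 rows):

Data after filtering for relevant items (about 40 rows):

After applying some_method (which returns values from a dict):

The new column in the DataFrame, containing all 700 rows: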

'None'
'None'
['fruit','green groceries']
'None'
'None'
'None'
['dry food', 'staples', 'legumes']
'None'
'None'
['dairy']

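There are two points to note:

  • When iterating over the series, do not ignore or skip the 'None' / 'Not in dict' rows. The new series must have the same length as the original one.
  • Use built-in functionality to apply a function row by row. Vectorized operations are not available here because the column holds list objects, so use pd.Series.apply with a custom function.

Here is a simple example: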

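    import pandas as pd

    df = pd.DataFrame({'col': ['None', 'Not in dict', ['apple', 'banana', 'grapes'],
                               'None', ['mile'], 'Not in dict']})

    def calculated(x):
        try:
            # Strings are hashable, so the set lookup works; sentinel rows map to None
            if x in {'Not in dict', 'None'}:
                return None
        except TypeError:
            # Lists are unhashable, so the lookup raises TypeError; handle them here
            if len(x) == 1:
                return 2
            elif len(x) == 2:
                return 4
            else:
                return 6

    df['calc'] = df['col'].apply(calculated)

    print(df)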
    
                           col  calc
    0                     None   NaN
    1              Not in dict   NaN
    2  [apple, banana, grapes]   6.0
    3                     None   NaN
    4                   [mile]   2.0
    5              Not in dict   NaN
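
Applied to the situation in the question, the same pattern might look like the sketch below. The lookup table, the body of `some_method`, and the sample rows are all hypothetical stand-ins, since the original dict and data are not shown. It uses an `isinstance` check instead of the `try`/`except` above, which is only one possible way to separate sentinel strings from lists.

```python
import pandas as pd

# Hypothetical lookup table standing in for the dict used in the question
CATEGORY_LOOKUP = {
    ("apple", "banana"): ["fruit", "green groceries"],
    ("rice", "flour", "lentils"): ["dry food", "staples", "legumes"],
}

def some_method(items):
    """Hypothetical stand-in: map a list of items to a list of categories."""
    return CATEGORY_LOOKUP.get(tuple(items), "Not in dict")

def hash_in_shop(relevant_item):
    # Sentinel strings become 'None' so the row is kept rather than skipped
    if isinstance(relevant_item, str):
        return "None"
    # A single-item list is passed through unchanged, as in the question
    if len(relevant_item) == 1:
        return relevant_item
    return some_method(relevant_item)

# Hypothetical sample data: one entry per row, sentinels included
df_final = pd.DataFrame({"relevant_item": [
    "None",
    ["apple", "banana"],
    "Not in dict",
    ["rice", "flour", "lentils"],
    ["dairy"],
]})

# apply returns exactly one result per row, so the lengths always match
df_final["hash_in_shop"] = df_final["relevant_item"].apply(hash_in_shop)
print(df_final["hash_in_shop"].tolist())
```

Because every row produces a result, the new column has the same length as the DataFrame and the original order is preserved.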
    

Have you tried creating an empty array first and then filling in the values where there are any?

    import numpy as np

    # One slot per DataFrame row; dtype=object so a slot can hold a list
    items = np.empty(len(df_final), dtype=object)
    items[:] = np.nan

    # Assumption: relevant_items holds one entry per row of df_final
    for i, relevant_item in enumerate(relevant_items):
        if relevant_item != 'None' and relevant_item != 'Not in dict':
            len_item = len(relevant_item)

            if len_item == 1:
                items[i] = relevant_item
            elif len_item in (2, 3):
                items[i] = some_method(relevant_item)

    df_final['hash_in_shop'] = items
    
This way the items array has the same length as the DataFrame, so you will not get that error. If an array of NaN is not suitable, why not try numpy.zeros?
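
A related option, not part of the answer above, is to let pandas align the results on the row index instead of pre-allocating an array. The sketch below assumes the filtered rows keep their original index labels. The sample data and the upper-casing step, which stands in for `some_method`, are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: sentinel strings mixed with lists
df_final = pd.DataFrame({"relevant_item": [
    "None",
    ["apple", "banana"],
    "None",
    "Not in dict",
    ["dairy"],
]})

# Keep only rows holding a list; the filtered Series keeps labels 1 and 4
is_list = df_final["relevant_item"].apply(lambda x: isinstance(x, list))
filtered = df_final.loc[is_list, "relevant_item"]

# Hypothetical processing step standing in for some_method
results = filtered.apply(lambda items: [s.upper() for s in items])

# Assigning the bare values reproduces the error: 2 values for 5 rows
try:
    df_final["hash_in_shop"] = results.values
except ValueError as err:
    print("ValueError:", err)

# reindex restores one entry per original row and fills the gaps with 'None'
df_final["hash_in_shop"] = results.reindex(df_final.index, fill_value="None")
print(df_final["hash_in_shop"].tolist())
```

Dropping `.values` is the key difference from the code in the question: the Series carries its index, so pandas can place each result on the row it came from.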

Hope this helps.
