Python: adding a new column to a DataFrame when the values have a different length
Tags: python, python-3.x, pandas, dataframe, series
I have a question about adding results to an existing DataFrame.
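import pandas as pd

# Runs once per row; the enclosing loop is not shown in the original post.
if relevant_item != 'None' and relevant_item != 'Not in dict':
    items = relevant_item
    len_item = len(items)
    if len_item == 1:
        item_result = items
    if len_item == 2:
        two = items
        item_result = some_method(two)
    if len_item == 3:
        threes = items
        item_result = some_method(threes)
    hash_in_dict_shopping.append(item_result)  # new list of lists

# Only the filtered rows produced a result, so this is shorter than df_final.
shops = pd.Series(hash_in_dict_shopping)
df_final['hash_in_shop'] = shops.values  # this line raises the ValueError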
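When I add the new list to the existing DataFrame, I get the error "ValueError: Length of values does not match length of index". How can I add the new list as a new column, keeping the original row order and filling every row that has no value with 'None'?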
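The mismatch described above can be reproduced on a small scale. This is only a sketch: the 5-row frame and the two results are hypothetical stand-ins for the roughly 700 rows and 40 results in the question.

```python
import pandas as pd

# Hypothetical 5-row frame standing in for df_final.
df_final = pd.DataFrame({"item": range(5)})

# Hypothetical results: only 2 of the 5 rows produced a value.
shops = pd.Series([["dairy"], ["fruit"]])

try:
    # .values drops the index, so pandas sees 2 values for 5 rows.
    df_final["hash_in_shop"] = shops.values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Because `.values` is a bare array, pandas has no labels to line the results up with and can only compare lengths.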
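Original data before filtering (about 700 rows):
After filtering the data down to the relevant items (about 40 rows):
After applying some_method (which returns values from a dict):
The new column in the DataFrame, covering all 700 rows: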
'None'
'None'
['fruit','green groceries']
'None'
'None'
'None'
['dry food', 'staples', 'legumes']
'None'
'None'
['dairy']
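There are a couple of points to note: the column mixes strings and lists (so it is stored as Python objects), so you can use pd.Series.apply with a custom function: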
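import pandas as pd

df = pd.DataFrame({'col': ['None', 'Not in dict', ['apple', 'banana', 'grapes'],
                           'None', ['mile'], 'Not in dict']})

def calculated(x):
    try:
        # Strings are hashable, so the set lookup works for the marker values.
        if x in {'Not in dict', 'None'}:
            return None
    except TypeError:
        # Lists are unhashable: the lookup raises TypeError and we end up here.
        if len(x) == 1:
            return 2
        elif len(x) == 2:
            return 4
        else:
            return 6

df['calc'] = df['col'].apply(calculated)
print(df)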
                       col  calc
0                     None   NaN
1              Not in dict   NaN
2  [apple, banana, grapes]   6.0
3                     None   NaN
4                   [mile]   2.0
5              Not in dict   NaN
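To tie this back to the question's shape (a few results for many rows), one option is to let pandas align on the index instead of assigning a bare array. The following is a minimal sketch, assuming the results can be keyed by the original row labels; the frame contents and the `results` dict are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the ~700-row frame in the question.
df_final = pd.DataFrame({"item": ["a", "b", "c", "d", "e"]})

# Hypothetical results for the rows that survived filtering,
# keyed by their original index labels.
results = {2: ["fruit", "green groceries"], 4: ["dairy"]}

# A Series keeps those labels, so assignment aligns by label
# rather than by position; unmatched rows become NaN.
shops = pd.Series(results)
df_final["hash_in_shop"] = shops

# Replace the NaN placeholders with the string 'None' used in the question.
df_final["hash_in_shop"] = df_final["hash_in_shop"].fillna("None")
print(df_final)
```

The row order of `df_final` is untouched, because alignment only decides which row each result lands in.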
Have you tried creating an empty array first and then filling in the values where there are any?
import numpy as np

# One slot per DataFrame row, pre-filled with NaN.
# dtype=object lets a slot hold a list as well as NaN.
items = np.empty(len(df_final), dtype=object)
items[:] = np.nan

# Inside the loop over the rows, where i is the position of the current row:
if relevant_item != 'None' and relevant_item != 'Not in dict':
    len_item = len(relevant_item)
    if len_item == 1:
        item_result = relevant_item
    if len_item == 2:
        item_result = some_method(relevant_item)
    if len_item == 3:
        item_result = some_method(relevant_item)
    items[i] = item_result  # supposing you have some index i for the row
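This way your items array has the same length as the DataFrame, and you will not get that error. If an array of NaN is not suitable, you could try numpy.zeros instead.
Hope this helps.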
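The same pre-allocation idea also works with a plain Python list, which avoids NumPy dtype concerns. This is a sketch under assumptions: the column name `relevant_item`, the sample rows, and the body of `some_method` are hypothetical placeholders, since the original post does not show them.

```python
import pandas as pd

def some_method(items):
    # Hypothetical placeholder for the question's some_method.
    return [s.upper() for s in items]

# Hypothetical frame: marker strings for skipped rows, lists for relevant ones.
df_final = pd.DataFrame(
    {"relevant_item": ["None", ["apple"], "Not in dict", ["milk", "rice"]]}
)

# One slot per row, pre-filled with the 'None' marker.
results = ["None"] * len(df_final)

for pos, relevant_item in enumerate(df_final["relevant_item"]):
    if isinstance(relevant_item, list):
        # Single-item lists are kept as they are; longer ones go through some_method.
        if len(relevant_item) == 1:
            results[pos] = relevant_item
        else:
            results[pos] = some_method(relevant_item)

# Same length as the frame, so the assignment cannot hit the length error.
df_final["hash_in_shop"] = results
print(df_final)
```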