Python: running a function on DataFrame chunks
Question:
I have a function, feat, that runs fine on the full data. I would like to run it on chunks of the data to use less memory.
Script: I split the DataFrame into 10 chunks, cross-join each chunk with the original DataFrame, and then apply feat to the result:
appended_data = []
chunk_size = int(df.shape[0] / 10)
for start in list(range(0, df.shape[0], chunk_size)):
    df_subset = df.iloc[start:start + chunk_size]
    dfCart = cartesian_product(df_subset, df)
    dfCartResult = feat(dfCart)
    appended_data.append(dfCartResult)
dff = pd.concat(appended_data, axis=1)
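The chunk-and-cross-join step described above can be sketched as follows. This is only an illustration under stated assumptions: the post does not show cartesian_product, so cross_join below is a hypothetical stand-in built on a constant merge key, and iter_chunks and the people frame are likewise made up for the example. Two details differ from the script on purpose. The step is computed with ceiling division, because int(df.shape[0] / 10) is 0 for a frame with fewer than 10 rows and range() rejects a zero step. The per-chunk results are stacked with axis=0, on the assumption that each chunk contributes rows rather than columns.

```python
import math

import pandas as pd


def cross_join(left, right):
    """Hypothetical stand-in for the post's cartesian_product helper.

    Pairs every row of `left` with every row of `right`; columns present in
    both frames get the `_x` / `_y` suffixes.
    """
    return (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key", suffixes=("_x", "_y"))
        .drop(columns="_key")
    )


def iter_chunks(df, n_chunks=10):
    """Yield consecutive row slices; the step is never smaller than 1."""
    step = max(1, math.ceil(len(df) / n_chunks))
    for start in range(0, len(df), step):
        yield df.iloc[start:start + step]


# Made-up data, only to exercise the helpers.
people = pd.DataFrame(
    {"FirstName": ["Ann", "Bob", "Cy"], "MiddleName": ["J", None, "K"]}
)

# Each chunk is cross-joined with the full frame; results are stacked row-wise.
pairs = pd.concat(
    [cross_join(chunk, people) for chunk in iter_chunks(people, n_chunks=2)],
    axis=0,
    ignore_index=True,
)
```

In the real script, feat would be applied to each cross-joined chunk before it is collected, so that only one chunk's worth of pairs is held in memory at a time.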
Error (full traceback below):
How can I fix this?
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'MiddleName_hamming_distance'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in set(self, item, value, check)
4242 try:
-> 4243 loc = self.items.get_loc(item)
4244 except KeyError:
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'MiddleName_hamming_distance'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-110-b43ec93e6e45> in <module>
4 df_subset = df.iloc[start:start + chunk_size]
5 dfCart=cartesian_product(df_subset, df)
----> 6 dfCartResult=feat(dfCart)
7 appended_data.append(dfCartResult)
8 dff = pd.concat(appended_data, axis=1)
<ipython-input-72-cf2eb45ab3c6> in feat(df)
8 for i in features_names:
9 for col in cols:
---> 10 df[col+'_'+i]=df[[col+'_x',col+'_y']].dropna().apply(lambda row: j(row[col+'_x'],row[col+'_y']),axis=1)
11 df['Mean']=df.mean(axis=1)
12 dft=df[df['Mean']>=38]
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
3117 else:
3118 # set column
-> 3119 self._set_item(key, value)
3120
3121 def _setitem_slice(self, key, value):
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/frame.py in _set_item(self, key, value)
3193 self._ensure_valid_index(value)
3194 value = self._sanitize_column(key, value)
-> 3195 NDFrame._set_item(self, key, value)
3196
3197 # check if we are modifying a copy
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/generic.py in _set_item(self, key, value)
2598
2599 def _set_item(self, key, value):
-> 2600 self._data.set(key, value)
2601 self._clear_item_cache()
2602
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in set(self, item, value, check)
4244 except KeyError:
4245 # This item wasn't present, just insert at end
-> 4246 self.insert(len(self.items), item, value)
4247 return
4248
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in insert(self, loc, item, value, allow_duplicates)
4345
4346 block = make_block(values=value, ndim=self.ndim,
-> 4347 placement=slice(loc, loc + 1))
4348
4349 for blkno, count in _fast_count_smallints(self._blknos[loc:]):
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
3203 placement=placement, dtype=dtype)
3204
-> 3205 return klass(values, ndim=ndim, placement=placement)
3206
3207 # TODO: flexible with index=None and/or items=None
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim)
2301
2302 super(ObjectBlock, self).__init__(values, ndim=ndim,
-> 2303 placement=placement)
2304
2305 @property
~/.conda/envs/test_py3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim)
123 raise ValueError(
124 'Wrong number of items passed {val}, placement implies '
--> 125 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
126
127 def _check_ndim(self, values, ndim):
ValueError: Wrong number of items passed 2, placement implies 1
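One possible explanation for the final ValueError, which the post itself does not confirm: for some chunk, dropna() leaves no rows in the col_x / col_y pair. In some pandas versions, apply(..., axis=1) on a frame with no rows returns the empty two-column frame instead of a Series, and assigning two columns to the single column MiddleName_hamming_distance would then fail with "Wrong number of items passed 2, placement implies 1". The sketch below sidesteps that case by never calling apply on an empty frame. The names pairwise_feature and hamming and the demo data are hypothetical; the post does not show the real feature functions.

```python
import numpy as np
import pandas as pd


def pairwise_feature(df, col, func):
    """Compute func(col_x, col_y) for rows where both values are present.

    Always returns a single Series aligned to df.index, so it can be assigned
    to one column even when no row has both values.
    """
    left = df[col + "_x"].values
    right = df[col + "_y"].values
    mask = pd.notna(left) & pd.notna(right)

    out = np.full(len(df), np.nan)  # rows with a missing value stay NaN
    out[mask] = [func(a, b) for a, b in zip(left[mask], right[mask])]
    return pd.Series(out, index=df.index)


def hamming(a, b):
    """Hypothetical feature: count of differing positions (equal-length strings)."""
    return sum(x != y for x, y in zip(a, b))


# Made-up data, only to exercise the helper.
demo = pd.DataFrame(
    {"MiddleName_x": ["abc", None, "abd"], "MiddleName_y": ["abd", "xyz", None]}
)
features = pairwise_feature(demo, "MiddleName", hamming)
demo["MiddleName_hamming_distance"] = features

# A chunk in which every pair has a missing value still yields one column.
all_missing = pairwise_feature(demo.iloc[[1, 2]], "MiddleName", hamming)
```

Inside feat, the assignment on line 10 of the traceback could be replaced with a call to a helper like this. If the cause is something else, printing the type and shape of the right-hand side just before the assignment would show what is actually being assigned.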