Python: How to one-hot encode a variable with 3k categories without a MemoryError
Tags: python, pandas, deep-learning, one-hot-encoding


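I am one-hot encoding a variable with more than 3k categories and running into a MemoryError. I have other variables that I am also one-hot encoding, but they have fewer categories. The largest number of categories for a variable that I could one-hot encode successfully is 935.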

I am using the following code:

from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

def onehot(featurename):
    onehot_encoder = OneHotEncoder(sparse=False)
    onehot_encoded = onehot_encoder.fit_transform(df[featurename].reshape(-1, 1))
    trn_onehot_encoded = onehot_encoded[msk]
    val_onehot_encoded = onehot_encoded[~msk]
    return trn_onehot_encoded, val_onehot_encoded

trn_onehot_encoded_mt, val_onehot_encoded_mt = onehot('modality_type')
trn_onehot_encoded_mr, val_onehot_encoded_mr = onehot('roleid')
trn_onehot_encoded_sub, val_onehot_encoded_sub = onehot('subject')
trn_onehot_encoded_quartile, val_onehot_encoded_quartile = onehot('quartile')
trn_onehot_encoded_country, val_onehot_encoded_country = onehot('country_short')
trn_onehot_encoded_region, val_onehot_encoded_region = onehot('region')
trn_onehot_encoded_groupmemberornot, val_onehot_encoded_groupmemberornot = onehot('groupmemberornot')
trn_onehot_encoded_highlight, val_onehot_encoded_highlight = onehot('highlight_bin_new')
trn_onehot_encoded_note, val_onehot_encoded_note = onehot('note_bin_new')
trn_onehot_encoded_eid, val_onehot_encoded_eid = onehot('new_eid')
The last line of code, where I encode the variable new_eid, is where I get a MemoryError or a dead kernel.
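
As a rough sanity check on why the dense encoding may run out of memory, the arithmetic below estimates the size of the array that OneHotEncoder(sparse=False) would have to allocate. It assumes float64 output (8 bytes per value, the encoder's default dtype), a row count equal to the training plus validation sample counts reported in the Keras log further down (2116850 + 234276), and exactly 3000 categories for new_eid; the source only says "more than 3k", so the real figure would be somewhat larger. The function name dense_onehot_bytes is hypothetical.

```python
def dense_onehot_bytes(n_rows, n_categories, bytes_per_value=8):
    """Bytes needed for a dense (n_rows x n_categories) one-hot array."""
    return n_rows * n_categories * bytes_per_value


# Assumption: the DataFrame has as many rows as the train + validation
# sample counts shown in the Keras log.
n_rows = 2116850 + 234276

# 935 is the largest category count reported to work; 3000 is an assumed
# stand-in for "more than 3k".
bytes_935 = dense_onehot_bytes(n_rows, 935)
bytes_3000 = dense_onehot_bytes(n_rows, 3000)

print("935 categories:  %.1f GB" % (bytes_935 / 1e9))
print("3000 categories: %.1f GB" % (bytes_3000 / 1e9))
```

Under these assumptions the 935-category variable already needs on the order of 17.6 GB as a dense array and a 3000-category variable on the order of 56 GB, before counting the extra copies made by the msk slicing.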

To try to resolve this error, I set the sparse argument of OneHotEncoder to True inside the onehot() function.

The code adapted for sparse=True is as follows:

<All the code above, with sparse=True>
mt = Input(shape=(trn_onehot_encoded_mt.shape[1],))
mr = Input(shape=(trn_onehot_encoded_mr.shape[1],))
sub = Input(shape=(trn_onehot_encoded_sub.shape[1],))
gmon = Input(shape=(trn_onehot_encoded_groupmemberornot.shape[1],))
region = Input(shape=(trn_onehot_encoded_region.shape[1],))
country = Input(shape=(trn_onehot_encoded_country.shape[1],))
highlight = Input(shape=(trn_onehot_encoded_highlight.shape[1],))
note = Input(shape=(trn_onehot_encoded_note.shape[1],))

#Model definition
x = merge([u, a], mode='concat')
x = Flatten()(x)
x = merge([x, mt], mode='concat')
x = merge([x, mr], mode='concat')
x = merge([x, sub], mode='concat')
x = merge([x, gmon], mode='concat')
x = merge([x, region], mode='concat')
x = merge([x, country], mode='concat')
x = merge([x, highlight], mode='concat')
x = merge([x, note], mode='concat')
x = Dense(1000, activation='relu')(x)
BatchNormalization()
Dropout(0.5)
x = Dense(200, activation='relu')(x)
BatchNormalization()
Dropout(0.5)
x = Dense(50, activation='relu')(x)
BatchNormalization()
x = Dense(2, activation='softmax')(x)
nn = Model([user_in, artifact_in, mt, mr, sub, gmon, region, country, highlight, note], x)
nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

def fit_nn(lr, bs):
    nn.optimizer.lr = lr        
    nn.fit([trn.member_id, 
        trn.artifact_id, 
        trn_onehot_encoded_mt, 
        trn_onehot_encoded_mr, 
        trn_onehot_encoded_sub, 
        trn_onehot_encoded_groupmemberornot, 
        trn_onehot_encoded_region, 
        trn_onehot_encoded_country,
        trn_onehot_encoded_highlight,
        trn_onehot_encoded_note], trn_onehot_encoded_quartile, 
       batch_size=bs, 
       epochs=1, 
       validation_data=([val.member_id, 
                         val.artifact_id, 
                         val_onehot_encoded_mt, 
                         val_onehot_encoded_mr, 
                         val_onehot_encoded_sub, 
                         val_onehot_encoded_groupmemberornot, 
                         val_onehot_encoded_region, 
                         val_onehot_encoded_country,
                         val_onehot_encoded_highlight,
                         val_onehot_encoded_note], val_onehot_encoded_quartile)
           )


bs = 10000
fit_nn(0.001, bs)

However, when I try to fit the model, I get the following error:

Train on 2116850 samples, validate on 234276 samples
Epoch 1/1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-8ce1b684763f> in <module>()
----> 1 fit_nn(0.001, bs)

<ipython-input-30-3e1be8cadb04> in fit_nn(lr, bs)
     23                          val_onehot_encoded_country,
     24                          val_onehot_encoded_highlight,
---> 25                          val_onehot_encoded_note], val_onehot_encoded_quartile)
     26            )

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1596                               initial_epoch=initial_epoch,
   1597                               steps_per_epoch=steps_per_epoch,
-> 1598                               validation_steps=validation_steps)
   1599 
   1600     def evaluate(self, x, y,

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/engine/training.py in _fit_loop(self, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
   1181                     batch_logs['size'] = len(batch_ids)
   1182                     callbacks.on_batch_begin(batch_index, batch_logs)
-> 1183                     outs = f(ins_batch)
   1184                     if not isinstance(outs, list):
   1185                         outs = [outs]

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2271         updated = session.run(self.outputs + [self.updates_op],
   2272                               feed_dict=feed_dict,
-> 2273                               **self.session_kwargs)
   2274         return updated[:len(self.outputs)]
   2275 

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1091             feed_handles[subfeed_t] = subfeed_val
   1092           else:
-> 1093             np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
   1094 
   1095           if (not is_tensor_handle_feed and

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    480 
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483 
    484 def asanyarray(a, dtype=None, order=None):

ValueError: setting an array element with a sequence.
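
The traceback ends in np.asarray inside TensorFlow's feed path, which suggests (this is a reading of the traceback, not something the source confirms) that the SciPy sparse matrices returned with sparse=True are being passed where this Keras version expects dense arrays. One possible workaround, sketched below with NumPy only, is to store each high-cardinality column as integer codes and expand just one batch at a time into a dense one-hot block. The helper names integer_codes and one_hot_batches and the toy values are hypothetical, not part of the original code.

```python
import numpy as np


def integer_codes(values):
    """Map each category label to an integer code in [0, n_categories)."""
    categories, codes = np.unique(values, return_inverse=True)
    return categories, codes


def one_hot_batches(codes, n_categories, batch_size):
    """Yield dense float32 one-hot blocks, one batch of rows at a time."""
    for start in range(0, len(codes), batch_size):
        batch = codes[start:start + batch_size]
        block = np.zeros((len(batch), n_categories), dtype=np.float32)
        # Set exactly one 1.0 per row, in the column given by the code.
        block[np.arange(len(batch)), batch] = 1.0
        yield block


# Toy stand-in for a high-cardinality column such as new_eid
# (the values are made up for illustration).
toy_values = np.array(["e7", "e2", "e7", "e9", "e2"])
categories, codes = integer_codes(toy_values)
batches = list(one_hot_batches(codes, len(categories), batch_size=2))

for block in batches:
    print(block.shape)
```

A generator of this kind could be combined with the other model inputs and consumed through Keras 2's fit_generator, so that the full dense matrix never has to exist in memory at once; wiring it into the ten-input model above is left out of this sketch.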