Python OneHotEncoder TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

I am completely new to scikit-learn and pandas. I have the following code:

import pandas as pd
from pandas import read_csv, concat
from sklearn.preprocessing import OneHotEncoder

cols = ['delay', 'month', 'day', 'dow', 'hour', 'distance', 'carrier', 'dest', 'year', 'origin', 'arr']

# Read the training file in chunks and concatenate them into one DataFrame.
tp = read_csv('D:/CCP DS Nov 2014/smartfly/smartfly_historic_train.csv', iterator=True, chunksize=1000)
data_2007 = concat(tp, ignore_index=True)  # df is DataFrame. If error do list(tp)
data_2007.columns = ['delay', 'month', 'day', 'dow', 'hour', 'distance', 'carrier', 'dest', 'year', 'origin', 'arr']
data_2007 = data_2007.dropna(subset=['delay'])
tp = read_csv('D:/CCP DS Nov 2014/smartfly/smartfly_historic_test.csv', iterator=True, chunksize=1000)
# Column positions are looked up in cols, which still contains 'delay'.
categ = [cols.index(x) for x in ['month', 'day', 'dow', 'hour', 'distance', 'carrier', 'dest']]
enc = OneHotEncoder(categorical_features=categ, sparse=True)
df = data_2007.drop('delay', axis=1)
# Replace the carrier and dest strings with integer codes.
df['carrier'] = pd.factorize(df['carrier'])[0]
df['dest'] = pd.factorize(df['dest'])[0]
train_x = enc.fit_transform(df)
A sample record in smartfly_historic_train.csv looks like this:

-5,8,11,7,10,361,US,CLT,2013,BWI,1132
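I am trying to convert categorical variables such as US and CLT to integers so that I can feed them to a RandomForest, but I get the following error:

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
Stack trace: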

  File "D:\Meerkat\meerkat\ml_new1.py", line 58, in run_from_command_line
    train_x = enc.fit_transform(df)
  File "C:\Python33\lib\site-packages\sklearn\preprocessing\data.py", line 1054, in fit_transform
    self.categorical_features, copy=True)
  File "C:\Python33\lib\site-packages\sklearn\preprocessing\data.py", line 897, in _transform_selected
    return sparse.hstack((X_sel, X_not_sel))
  File "C:\Python33\lib\site-packages\scipy\sparse\construct.py", line 453, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Python33\lib\site-packages\scipy\sparse\construct.py", line 583, in bmat
    dtype = upcast(*tuple([A.dtype for A in blocks[block_mask]]))
  File "C:\Python33\lib\site-packages\scipy\sparse\sputils.py", line 62, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
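
The last frames of the trace show scipy failing to find a common dtype for the two blocks passed to hstack: the one-hot encoded columns (float64) and the columns left untouched (object). The sketch below is one plausible reading of where the object dtype comes from, not a confirmed diagnosis. It assumes the untouched block still holds a string column such as origin, and it builds a single-row frame from the sample record purely for illustration.

```python
import numpy as np
import pandas as pd

cols = ['delay', 'month', 'day', 'dow', 'hour', 'distance',
        'carrier', 'dest', 'year', 'origin', 'arr']

# One-row frame built from the sample record in the question (illustration only).
sample = pd.DataFrame(
    [[-5, 8, 11, 7, 10, 361, 'US', 'CLT', 2013, 'BWI', 1132]],
    columns=cols,
)

df = sample.drop('delay', axis=1)
df['carrier'] = pd.factorize(df['carrier'])[0]
df['dest'] = pd.factorize(df['dest'])[0]

# 'origin' is still a string column, so the frame converts to an object array.
as_array = np.asarray(df)
print(as_array.dtype)  # object

# The indices were computed on `cols`, which still includes 'delay'.
# After the drop, each index points one column to the right.
categ = [cols.index(x) for x in ['month', 'day', 'dow', 'hour',
                                 'distance', 'carrier', 'dest']]
selected = [df.columns[i] for i in categ]
print(selected)
```

If this reading is right, two things are worth checking in the original code: every string column (including origin) needs to be converted or dropped before encoding, and the column indices should be computed after delay has been removed.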

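Please consider editing the stack trace of the error into your question.

I am trying to run the code from here, the piece of code marked [13].

This is a bit of a guess, since I have no experience with OneHotEncoder, but I think it does not like that you are trying to convert distance (most likely a continuous variable) into a categorical variable.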
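
A minimal sketch of one way to follow that suggestion, assuming a scikit-learn release that ships ColumnTransformer (newer releases no longer accept the categorical_features argument used in the question). Columns are selected by name rather than by position, distance is left as a continuous feature, and origin is encoded along with the other string columns. The first data row is the sample record from the question; the other two rows are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

feature_cols = ['month', 'day', 'dow', 'hour', 'distance',
                'carrier', 'dest', 'year', 'origin', 'arr']

# Row 1 is the sample record without 'delay'; rows 2 and 3 are invented.
df = pd.DataFrame(
    [
        [8, 11, 7, 10, 361, 'US', 'CLT', 2013, 'BWI', 1132],
        [8, 12, 1, 9, 2475, 'AA', 'LAX', 2013, 'JFK', 1210],
        [9, 3, 2, 17, 361, 'US', 'CLT', 2013, 'BWI', 1845],
    ],
    columns=feature_cols,
)

# Columns treated as categorical; distance is deliberately left out.
categorical = ['month', 'day', 'dow', 'hour', 'carrier', 'dest', 'origin']

encoder = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), categorical)],
    remainder='passthrough',  # distance, year and arr are passed through as-is
)

train_x = encoder.fit_transform(df)
print(train_x.shape, train_x.dtype)
```

Selecting by name avoids the index shift that dropping delay can introduce, and because no string column reaches the passthrough block, the stacked result has a numeric dtype that a RandomForest can consume.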