
Python: I used sklearn's FeatureHasher with a DecisionTreeRegressor. How do I decode the regressor's predictions?

Tags: python, scikit-learn, random-forest, feature-extraction

I think you need a small workaround here.

One possible suggestion is to manually hash all of the strings in your documents and check which column (feature) each of them maps to.
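
As a quick illustration of that idea (not from the original post), hashing a single string with the public FeatureHasher API shows which column it lands in; the hasher settings (n_features=10, string input) and the string 'Illustrator' from the question's input are used here purely for illustration:

from sklearn.feature_extraction import FeatureHasher

# a throwaway hasher just to inspect where one string ends up
h_demo = FeatureHasher(n_features=10, input_type='string')
col = h_demo.transform([['Illustrator']]).nonzero()[1][0]
print(col)  # the column index that 'Illustrator' maps to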

The pipeline from the question, for reference (a simple example of the decoding idea follows further down):

import copy

import pandas as pd
from sklearn.feature_extraction import FeatureHasher
from sklearn.tree import DecisionTreeRegressor

# NOTE: the question does not show how the hasher `h` was configured; a
# string-input hasher with a small hash space is assumed here so the snippet runs.
h = FeatureHasher(n_features=10, input_type='string')

dataset = pd.read_csv('ll.csv', index_col=0)
dataset = dataset.dropna(axis=0)

# features or independent variables
x = pd.DataFrame()
x['Skills'] = dataset['Skills']
x['Location'] = dataset['Location']
x['Industry'] = dataset['Industry']
x['Experience'] = dataset['Experience']

# applying hashing
x_hash = copy.copy(x)

for i in range(x_hash.shape[1]):
    x_hash.iloc[:,i] = x_hash.iloc[:,i].astype('str')

x_hash = h.transform(x_hash.values)

# dependent variable
y = pd.DataFrame()

y['Functional Area'] = dataset['Functional Area']
y_hash = copy.copy(y)

for i in range(y_hash.shape[1]):
    y_hash.iloc[:,i] = y_hash.iloc[:,i].astype('str')

y_hash = h.transform(y_hash.values)

# Regressor
regressor = DecisionTreeRegressor(random_state=0)

# fit on hashed features AND a hashed target, so predictions also live in the hashed space
ll = regressor.fit(x_hash.toarray(), y_hash.toarray())

# For predicting input features
input_df = pd.DataFrame()
input_df['Skills'] = ['Illustrator']
input_df['Experience'] = ['1-6']
input_df['Industry'] = ['IT - Software Services']
input_df['Location'] = ['Cairo-Egypt']

input_df_hash = copy.copy(input_df)

for i in range(input_df_hash.shape[1]):
    input_df_hash.iloc[:,i] = input_df_hash.iloc[:,i].astype('str')


input_df_hash = h.transform(input_df_hash.values)

sss = regressor.predict(input_df_hash.toarray())
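
Note that `sss` is itself a vector in the hashed target space rather than a readable label, which is exactly why it needs decoding; for instance (illustrative, and dependent on the hasher assumed above):

# one dense hashed vector per input row, e.g. shape (1, 10) with the hasher above
print(sss.shape)
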
A simple example:

# _hashing is a private scikit-learn module (renamed in newer releases), so this
# import may need adjusting depending on the installed scikit-learn version
from sklearn.feature_extraction import FeatureHasher, _hashing
import numpy as np

def hasher(string, n_features):
    """Hash a single string"""

    res = _hashing.transform([[(string, 1)]], dtype=int, n_features=n_features)
    return res[0][0]


n_features = 10

h = FeatureHasher(n_features=n_features)
D = [{'dog': 1, 'cat': 2, 'elephant': 4}, {'dog': 2, 'run': 5}, {'human': 4, 'desk': 10}]

# map every distinct word in D to the hashed column it lands in
word_to_ix = {word: hasher(word, n_features) for word in set().union(*(d.keys() for d in D))}
# invert that into per-column labels (words that collide are joined with '-')
columns = ['-'.join([i for i in word_to_ix.keys() if word_to_ix[i] == ix]) for ix in range(n_features)]

f = h.transform(D)

print('Transformed features:\n', f.toarray())
print('Word to ix dictionary:\n', word_to_ix)
print('Columns:\n', columns)
Note that this approach also handles possible collisions: in the output below, 'cat' and 'desk' hash to the same column, which therefore gets the combined label 'desk-cat'.

Transformed features:
 [[ 0.  0. -4. -1.  0.  0.  0.  0.  0.  2.]
 [ 0.  0.  0. -2. -5.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  4.  0.  0. 10.]]
Word to ix dictionary:
 {'dog': 3, 'desk': 9, 'elephant': 2, 'human': 6, 'run': 4, 'cat': 9}
Columns:
 ['', '', 'elephant', 'dog', 'run', '', 'human', '', '', 'desk-cat']
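
This reverse mapping is what makes a prediction readable again. A minimal sketch, assuming the target was hashed with the same 10-dimensional hasher: the `columns` list printed above turns the non-zero entries of a predicted row back into strings (the `decode_row` helper is only for illustration and not part of the original answer):

import numpy as np

# the per-column labels printed above
columns = ['', '', 'elephant', 'dog', 'run', '', 'human', '', '', 'desk-cat']

def decode_row(pred, columns, threshold=0.5):
    """Return (column index, label, value) for entries whose magnitude exceeds threshold."""
    pred = np.asarray(pred).ravel()
    hits = np.nonzero(np.abs(pred) > threshold)[0]
    return [(int(ix), columns[ix], float(pred[ix])) for ix in hits]

# using the third transformed row from the output above as a stand-in prediction
row = [0., 0., 0., 0., 0., 0., 4., 0., 0., 10.]
print(decode_row(row, columns))
# [(6, 'human', 4.0), (9, 'desk-cat', 10.0)]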