Python: outputting a decision tree from a Pipeline


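Hello. I am not familiar with machine learning in the sklearn library. I tried to put a decision tree into a Pipeline, then predict with the model and output the tree, but when I run the code below I get this error:

'Pipeline' object has no attribute 'tree_'

So I would like to know whether Pipeline does not support tree output, and how to solve this. I also tried using the decision tree class directly, but then I got a different error: "setting an array element with a sequence". I understand this seems to happen because my vectors have different dimensions, but I still do not know how to handle it.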

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import Pipeline

from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text  # import from the public sklearn.tree namespace
from sklearn import tree


# a function that reads the corpus, tokenizes it and returns the documents
# and their labels
def read_corpus(corpus_file, use_sentiment):
    documents = []
    labels = []
    with open(corpus_file, encoding='utf-8') as f:
        for line in f:
            tokens = line.strip().split()

            documents.append(tokens[3:])

            if use_sentiment:
                # 2-class problem: positive vs negative
                labels.append( tokens[1] )
            else:
                # 6-class problem: books, camera, dvd, health, music, software
                labels.append( tokens[0] )

    return documents, labels

# a dummy function that just returns its input
def identity(x):
    return x

# read the data and split it into train and test
X, Y = read_corpus('/Users/dengchenglong/Downloads/trainset', use_sentiment=False)
split_point = int(0.75*len(X))
Xtrain = X[:split_point]
Ytrain = Y[:split_point]
Xtest = X[split_point:]
Ytest = Y[split_point:]

# set tfidf to True to use the TF-IDF vectorizer; False uses raw counts
tfidf = False

# we use a dummy function as tokenizer and preprocessor,
# since the texts are already preprocessed and tokenized.
if tfidf:
    vec = TfidfVectorizer(preprocessor = identity,
                          tokenizer = identity)
else:
    vec = CountVectorizer(preprocessor = identity,
                          tokenizer = identity)


# combine the vectorizer with a decision tree classifier
classifier = Pipeline( [('vec', vec),
                        ('cls', tree.DecisionTreeClassifier())])


# train the classifier on the train dataset
decision_tree = classifier.fit(Xtrain, Ytrain)


# predict the labels of the test data 
Yguess = classifier.predict(Xtest)
tree.plot_tree(classifier.fit(Xtest, Ytest)) 
# report performance of the classifier
print(accuracy_score(Ytest, Yguess))
print(classification_report(Ytest, Yguess))
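
A note on the second error from the question ("setting an array element with a sequence"): a decision tree expects a numeric matrix with the same number of columns in every row, while the documents here are token lists of different lengths. The sketch below reproduces the failure and then vectorizes first. The four toy documents and their labels are made up for illustration, and the exact exception text may differ between NumPy and scikit-learn versions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier


def identity(x):
    # Same dummy tokenizer/preprocessor as in the question.
    return x


# Hypothetical toy corpus: already tokenized, rows of different lengths.
docs = [
    ["good", "book"],
    ["bad", "camera", "lens"],
    ["great", "book", "read"],
    ["poor", "camera"],
]
labels = ["books", "camera", "books", "camera"]

# Feeding the raw token lists to the tree fails: they cannot be turned
# into a rectangular numeric array.
try:
    DecisionTreeClassifier().fit(docs, labels)
    raw_fit_failed = False
except (ValueError, TypeError) as err:
    raw_fit_failed = True
    print("fit on raw tokens failed:", type(err).__name__)

# Vectorizing first gives every document the same number of columns
# (one per vocabulary entry), which the tree can consume.
vec = CountVectorizer(preprocessor=identity, tokenizer=identity)
X = vec.fit_transform(docs)
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(X.shape)
```

This is the step the Pipeline performs automatically, which is why the error only shows up when the tree is used on its own.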

What happens if you try the following?

from sklearn.pipeline import make_pipeline

# combine the vectorizer with a decision tree classifier
clf = DecisionTreeClassifier()
classifier = make_pipeline(vec,clf)

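It looks like you have to instantiate the model you want to apply before putting it into the pipeline. Let me know whether this works and, if it does not, which error it returns. Taken from:

The following example: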

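It does not work... the same error is still there. It seems the pipeline does not support the functions in tree?

I have edited the answer... could you check it? In any case, I believe that if you follow the second link I shared, you should be able to get the tree working in your pipeline. Let me know if there is any problem with this new answer.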
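
For what it is worth, the error appears because `plot_tree` was handed the whole Pipeline, whereas the fitted tree lives inside the pipeline's `cls` step. A minimal sketch of pulling that step out through `named_steps` follows. The toy corpus is invented for illustration, and the step names `vec` and `cls` match the question's Pipeline (with `make_pipeline` the names are generated from the class names instead).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier, export_text


def identity(x):
    # Same dummy tokenizer/preprocessor as in the question.
    return x


# Hypothetical toy corpus standing in for the question's data set.
docs = [
    ["good", "book"],
    ["bad", "camera", "lens"],
    ["great", "book", "read"],
    ["poor", "camera"],
]
labels = ["books", "camera", "books", "camera"]

pipe = Pipeline([
    ("vec", CountVectorizer(preprocessor=identity, tokenizer=identity)),
    ("cls", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(docs, labels)

# The Pipeline itself has no tree_ attribute; the fitted estimator does.
fitted_vec = pipe.named_steps["vec"]
fitted_tree = pipe.named_steps["cls"]

# Recover the column order from the fitted vocabulary (token -> column index).
feature_names = sorted(fitted_vec.vocabulary_, key=fitted_vec.vocabulary_.get)

# Text rendering of the tree; with matplotlib installed,
# sklearn.tree.plot_tree(fitted_tree, feature_names=feature_names)
# draws the same structure.
report = export_text(fitted_tree, feature_names=feature_names)
print(report)
```

Note also that the question's `tree.plot_tree(classifier.fit(Xtest, Ytest))` refits the model on the test data; fitting once on the training split and passing the extracted step avoids that.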