What is the input list structure for Python NaiveBayesClassifier?
I made a CSV file of tweets with bigrams and binary labels, as shown below. I want to run NaiveBayesClassifier on it:
bigram,label
I love,0
love you,0
I hate,1
hate you,1
...
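I have read many pages without finding the right answer. The code below is what I wrote based on some examples that seem to work, but I don't know how to modify it to classify my CSV rows (as list or dictionary input).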
Please give me a hand.

You need to load your dataset into a DataFrame:
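import pandas as pd
from sklearn import metrics  # for accuracy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv('yourcsvfilename.csv')
df.columns = df.columns.str.strip()  # guard against stray spaces in the header, e.g. " label"

X_text = df['bigram']  # raw strings such as "I love"
y = df['label']        # binary labels 0/1

# split your dataset into training set and test set: 70% training, 30% test
split_test_size = 0.30
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X_text, y, test_size=split_test_size, random_state=42)

# Naive Bayes cannot be fitted on raw strings ("could not convert string to float"),
# so turn each bigram into word counts, learning the vocabulary from the training set only.
# This token_pattern keeps one-letter words such as "I" (the default pattern drops them).
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X_train = vectorizer.fit_transform(X_train_text).toarray()  # GaussianNB needs a dense array
X_test = vectorizer.transform(X_test_text).toarray()

# Training on Naive Bayes Classifier
# (sklearn.naive_bayes.MultinomialNB is a common alternative for count features)
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# accuracy on the training set
gnb_train_predict = gnb.predict(X_train)
print("Training accuracy =", metrics.accuracy_score(y_train, gnb_train_predict))
# accuracy on the test set
gnb_test_predict = gnb.predict(X_test)
print("Test accuracy =", metrics.accuracy_score(y_test, gnb_test_predict))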
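GaussianNB, like other scikit-learn estimators, expects a numeric feature array, which is why fitting it directly on the bigram column raises `ValueError: could not convert string to float`. As a rough illustration of the bag-of-words mapping a count vectorizer performs, here is a pure-Python sketch; the helpers `build_vocabulary` and `to_count_vector` are hypothetical names, not a library API, and splitting on whitespace is a simplifying assumption.

```python
def build_vocabulary(texts):
    """Assign a column index to every lowercase word seen in the training texts."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_count_vector(text, vocab):
    """Count how often each vocabulary word occurs in text; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vector[vocab[word]] += 1
    return vector


train_bigrams = ["I love", "love you", "I hate", "hate you"]
vocab = build_vocabulary(train_bigrams)
matrix = [to_count_vector(b, vocab) for b in train_bigrams]
print(vocab)   # {'i': 0, 'love': 1, 'you': 2, 'hate': 3}
print(matrix)  # [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1]]
```

Rows like these are plain numbers, so a list of them can be passed to an estimator's `fit`. scikit-learn's `CountVectorizer` does the same job with sparse output; note that its default `token_pattern` drops one-letter tokens such as "I".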
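Thanks for the informative answer, but it raises an error: KeyError: "None of [Index(['label'], dtype='object')] are in the [columns]"

Check whether the column name "label" contains spaces by running df.info(). I forgot to mention that the accuracy printed by the code is for your training set. For your test set: gnb_test_predict = gnb.predict(X_test); print("Accuracy=", metrics.accuracy_score(y_test, gnb_test_predict)). Any other suggestions for editing the CSV file structure are welcome :)

There are no spaces; I removed all spaces after the commas: "bigram, label" --> "bigram,label". But still: ValueError: could not convert string to float: 'I love'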
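If the NaiveBayesClassifier in the title is NLTK's `nltk.classify.NaiveBayesClassifier`, its input structure differs from scikit-learn's and avoids the float conversion entirely: `train` takes a list of `(featureset, label)` tuples, where each featureset is a dict mapping feature names to values, and `classify` takes a single featureset dict. The sketch below builds that list from the question's CSV with the standard `csv` module. It assumes the CSV layout shown above (inlined via `io.StringIO` in place of a real file), uses a hypothetical word-presence extractor `bigram_features`, and only runs the NLTK calls if `nltk` is installed.

```python
import csv
import io

# The question's CSV, inlined here in place of 'yourcsvfilename.csv'.
CSV_TEXT = """bigram,label
I love,0
love you,0
I hate,1
hate you,1
"""


def bigram_features(bigram):
    """Hypothetical feature extractor: one boolean feature per lowercase word."""
    return {f"has({word})": True for word in bigram.lower().split()}


def load_labeled_featuresets(handle):
    """Turn CSV rows into NLTK's input structure: a list of (feature dict, label) tuples."""
    reader = csv.DictReader(handle)
    return [(bigram_features(row["bigram"]), int(row["label"])) for row in reader]


train_set = load_labeled_featuresets(io.StringIO(CSV_TEXT))
print(train_set[0])  # ({'has(i)': True, 'has(love)': True}, 0)

try:
    import nltk
except ImportError:  # nltk not installed: the structure above is still what train() expects
    nltk = None

if nltk is not None:
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print(classifier.classify(bigram_features("I love")))
    print(nltk.classify.accuracy(classifier, train_set))
```

Word-presence features are one design choice among many; using the whole bigram as a single feature (`{"bigram": row["bigram"]}`) also fits the structure but cannot generalize to bigrams unseen during training.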