What is the input list structure for Python NaiveBayesClassifier?


I made a CSV file of tweets containing bigrams and binary labels, as shown below. I want to run NaiveBayesClassifier on it.

bigram,label
I love,0
love you,0
I hate,1
hate you,1
...
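I have read many pages without finding the right answer. The code below is what I wrote based on some examples that seemed to work, but I don't know how to modify it to classify my CSV rows (as list or dictionary input).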


Please give me a hand.
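For reference, NLTK's `NaiveBayesClassifier.train` takes a list of `(featureset, label)` tuples, where each featureset is a dict mapping feature names to values. The following is a minimal sketch of building that list from the CSV layout above. The inline `SAMPLE_CSV` string stands in for the real file, and the helper names `bigram_features` and `load_labeled_featuresets` are hypothetical, chosen only for illustration.

```python
import csv
import io

import nltk

# Stand-in for the CSV file in the question; a real script would open the file instead.
SAMPLE_CSV = """bigram,label
I love,0
love you,0
I hate,1
hate you,1
"""


def bigram_features(bigram):
    """Turn one bigram string into a feature dict (hypothetical helper)."""
    return {"bigram": bigram.strip().lower()}


def load_labeled_featuresets(handle):
    """Read bigram,label rows into the [(featureset, label), ...] list NLTK expects."""
    reader = csv.DictReader(handle)
    return [(bigram_features(row["bigram"]), row["label"]) for row in reader]


train_set = load_labeled_featuresets(io.StringIO(SAMPLE_CSV))
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a single featureset built the same way as the training rows.
prediction = classifier.classify(bigram_features("I hate"))
print(prediction)
```

Labels read through `csv.DictReader` are strings (`"0"` and `"1"`); convert them with `int(...)` inside the list comprehension if numeric labels are preferred.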

You need to load the dataset into a DataFrame:

import pandas as pd
import numpy as np

df = pd.read_csv('yourcsvfilename.csv')

from sklearn.model_selection import train_test_split
feature_col = ['bigram']
predicted_class = ['label']
X = df[feature_col].values
y = df[predicted_class].values
split_test_size = 0.30  # split the dataset into 70% training set and 30% test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_test_size, random_state=42)

# Train the Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train.ravel())

from sklearn import metrics  # for accuracy
# Accuracy on the training set
gnb_train_predict = gnb.predict(X_train)
print("Training accuracy = ", metrics.accuracy_score(y_train, gnb_train_predict))

# Accuracy on the test set
gnb_test_predict = gnb.predict(X_test)
print("Test accuracy = ", metrics.accuracy_score(y_test, gnb_test_predict))

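Thanks for your helpful answer, but it raises an error: KeyError: "None of [Index(['label'], dtype='object')] are in the [columns]"

Check whether the column name "label" contains a space by running df.info().

I forgot to mention that the accuracy in the code is for your training set. For your test set: ``gnb_test_predict = gnb.predict(X_test); print("Accuracy= ", metrics.accuracy_score(y_test, gnb_test_predict))``

Any other suggestions for editing the CSV file structure are welcome :)

There were no spaces in the name itself; I removed all the spaces after the commas: "bigram, label" --> "bigram,label". But I still get: ValueError: could not convert string to float: 'I love'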
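The ValueError above arises because `GaussianNB` expects numeric features, while the `bigram` column holds raw strings. One possible way around it, sketched below, is to convert the text into token counts with `CountVectorizer` and to swap `GaussianNB` for `MultinomialNB`, which is designed for count features. This is an assumption about what would suit the data, not something the answer above prescribes. The inline DataFrame stands in for the CSV file.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Stand-in for pd.read_csv('yourcsvfilename.csv'), using the rows shown in the question.
df = pd.DataFrame({
    "bigram": ["I love", "love you", "I hate", "hate you"],
    "label": [0, 0, 1, 1],
})

# CountVectorizer turns each string into a vector of token counts.
# Its default token pattern ignores one-character tokens such as "I".
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Pass the text as a 1-D column of strings, not the 2-D array from df[['bigram']].values.
model.fit(df["bigram"], df["label"])

predictions = model.predict(["I love", "hate you"])
print(predictions)
```

With a real dataset, `train_test_split` can be applied to `df["bigram"]` and `df["label"]` before fitting, as in the answer above, so that accuracy is measured on held-out rows.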