Unpacking a dictionary for logistic regression in Python


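I'm trying to do some sentiment analysis on product reviews, but I'm getting tripped up when I try to get my model to read the word-count dictionaries.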

import pandas as pd  
import numpy as np   
from sklearn import linear_model, model_selection, metrics

products = pd.read_csv('data.csv')

def count_words(s):
   d = {}
   wl = str(s).split()
   for w in wl:
       d[w] = wl.count(w)
   return d

products['word_count'] = products['review'].apply(count_words)

products = products[products['rating'] != 3]
products['sentiment'] = (products['rating'] >= 4) * 1 

train_data, test_data = model_selection.train_test_split(products, test_size = 0.2, random_state=0)

sentiment_model = linear_model.LogisticRegression()
sentiment_model.fit(X = train_data['word_count'], y =train_data['sentiment'])

When I run the last line, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-0c3f47af3a6e> in <module>()
----> 1 sentiment_model.fit(X = train_data['word_count'], y = train_data['sentiment'])

C:\ProgramData\anaconda_3\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
   1171 
   1172         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
-> 1173                          order="C")
   1174         check_classification_targets(y)
   1175         self.classes_ = np.unique(y)

C:\ProgramData\anaconda_3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    520                     ensure_2d, allow_nd, ensure_min_samples,
--> 521                     ensure_min_features, warn_on_dtype, estimator)
    522     if multi_output:
    523         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

C:\ProgramData\anaconda_3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                       force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383 
    384         if ensure_2d:

TypeError: float() argument must be a string or a number, not 'dict'


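It seems the model is taking each dictionary itself as the X variable rather than the entries inside it. I think I need to unpack the dictionaries into an array (?), but I haven't had any luck doing so.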
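The last frame of the traceback can be reproduced without scikit-learn, which may make the cause easier to see: the estimator asks NumPy for a float64 array, and NumPy has no way to turn a dict into a single float. The sketch below uses two made-up word-count dictionaries (not taken from data.csv) as a stand-in for the word_count column.

```python
import numpy as np

# Hypothetical stand-in for a few rows of products['word_count'].
rows = [{"great": 1, "product": 1}, {"bad": 2}]

try:
    # This mirrors the np.array(..., dtype=np.float64) call in check_array.
    np.array(rows, dtype=np.float64)
    converted = True
except TypeError as exc:
    converted = False
    message = str(exc)

print(converted)
```

Each row has to become a fixed-length vector of numbers (one column per word) before a float array can be built from it.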

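Update: here is what products looks like after running word_count and defining sentiment [sample output not preserved].

Try

X = train_data['word_count'].values()

This should return a list of the word counts (numbers) for each item in train_data['word_count'], if that is what you are looking for.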

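If you just want to fix the error, first run a dictionary vectorizer on train_data['word_count'] to convert it to an acceptable format, i.e. an array of shape [n_samples, n_features]. Add the following to your code before model.fit():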
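The code that belonged at this point is missing from this copy of the answer. A minimal sketch of the step, assuming scikit-learn's DictVectorizer is the tool meant (the name train_data_dict matches the fit() call in the answer; the toy DataFrame is invented for illustration and should be replaced by your own train_data):

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Toy stand-in for train_data; the rows are invented for illustration.
train_data = pd.DataFrame({
    "word_count": [
        {"great": 1, "product": 1},
        {"bad": 1, "product": 1},
        {"great": 2},
        {"bad": 2},
    ],
    "sentiment": [1, 0, 1, 0],
})

# One column per distinct word; words absent from a review count as 0.
dict_vectorizer = DictVectorizer()
train_data_dict = dict_vectorizer.fit_transform(train_data["word_count"])

print(train_data_dict.shape)           # (n_samples, n_features)
print(dict_vectorizer.feature_names_)  # the word behind each column
```

The result is a sparse matrix, which LogisticRegression accepts directly. Test data should go through dict_vectorizer.transform() rather than fit_transform(), so that it gets the same columns.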

Then call sentiment_model.fit() as follows:

sentiment_model.fit(X = train_data_dict, y =train_data['sentiment'])

Note: I suggest using CountVectorizer rather than implementing your own word-counting method.

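Can you provide a minimal sample of data.csv?

I got: TypeError: 'numpy.ndarray' object is not callable

Use .values without calling it.

Tried that. .values goes back to: TypeError: float() argument must be a string or a number, not 'dict'

OK, then before training the model try print train_data['word_count'] and type(train_data['word_count']). What does it give you, and can you copy a sample here? Edit: in Py 3, you need parentheses around the print stmt.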

The CountVectorizer version of the vectorizing and fitting steps:

from sklearn.feature_extraction.text import CountVectorizer

countVec = CountVectorizer()

train_data_vectorizer = countVec.fit_transform(train_data['review'])
sentiment_model.fit(X = train_data_vectorizer, y =train_data['sentiment'])
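For completeness, here is a hedged end-to-end sketch of the CountVectorizer route that also covers the held-out split, which the snippet above leaves out. The two small DataFrames are invented stand-ins for train_data and test_data; the astype(str) call is an assumption that mirrors the str(s) in the question's count_words and guards against missing reviews.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented stand-ins for train_data and test_data.
train_data = pd.DataFrame({
    "review": ["great product", "bad product", "really great", "really bad"],
    "sentiment": [1, 0, 1, 0],
})
test_data = pd.DataFrame({
    "review": ["great", "bad quality"],
    "sentiment": [1, 0],
})

count_vec = CountVectorizer()
# Learn the vocabulary on the training reviews only.
X_train = count_vec.fit_transform(train_data["review"].astype(str))
# Reuse that vocabulary for the test reviews; unseen words are ignored.
X_test = count_vec.transform(test_data["review"].astype(str))

sentiment_model = LogisticRegression()
sentiment_model.fit(X_train, train_data["sentiment"])
predictions = sentiment_model.predict(X_test)
print(predictions)
```

Calling transform() rather than fit_transform() on the test reviews keeps the feature columns aligned with the ones the model was trained on.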