Scikit-learn: tf-idf transformation with linear regression


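I have two data frames. The first has more than 700 predictors as columns, and the second has a single column. The first is used as the predictors (all values are 0 or 1, and because the data is sparse, most of them are 0). The second is used as the response for training and testing the model. The first is named ser and the second is named star.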

I applied the tf-idf transformation as follows:

from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer()

A = transformer.fit_transform(ser)
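I then inspected the result with print(A).

Did I apply the tf-idf transformation correctly? I ask because running the following code produces the error shown at the end of this post.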

star = pd.DataFrame({"star": star})
data = pd.concat([ser, star], axis = 1)

from sklearn.linear_model import LinearRegression

D = LinearRegression()

Dfit = D.fit(ser, star, sample_weight = A)
Dpred = D.predict(ser)
Dscore = D.score(ser,star)
print(Dscore)
Error:

Traceback (most recent call last):
  File "categories_model.py", line 67, in <module>
    Dfit = D.fit(ser, star, sample_weight = A)
  File "/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 434, in fit
    sample_weight=sample_weight)
  File "/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 127, in center_data
    X_mean = np.average(X, axis=0, weights=sample_weight)
  File "/opt/conda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 937, in average
    "1D weights expected when shapes of a and weights differ.")
TypeError: 1D weights expected when shapes of a and weights differ.

Can anyone help me understand all this and how to improve the code? Thanks.

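The error comes from passing the transformed matrix in the wrong place. A is the tf-idf feature matrix (one row per sample, one column per predictor), whereas sample_weight expects a 1-D array with one weight per sample. Passing A as the features instead of as sample_weight solves the problem: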

Dfit = D.fit(A, star)
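
For reference, here is a minimal end-to-end sketch of that fix. The data is synthetic: the array sizes, the random seed, and the 1 to 5 rating range are assumptions made for illustration, and plain NumPy arrays stand in for the ser and star data frames from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the question's data: a sparse 0/1 predictor
# matrix (ser) and a single response column (star).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
ser = (rng.random((n_samples, n_features)) < 0.1).astype(int)
star = rng.integers(1, 6, size=n_samples)

# tf-idf reweights the 0/1 indicators. The result is a sparse matrix
# with the same shape as the input.
transformer = TfidfTransformer()
A = transformer.fit_transform(ser)

# The transformed matrix is the feature matrix X. It is not a set of
# sample weights: sample_weight, if used, must be 1-D with one value per row.
model = LinearRegression()
model.fit(A, star)

pred = model.predict(A)
score = model.score(A, star)  # R^2 on the training data
print(A.shape, pred.shape, score)
```

Note that predict and score are also called with A rather than ser, so the model sees the same transformed features it was trained on. For unseen data, one would call transformer.transform (not fit_transform) before predicting.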
