Python: How to extract features from a pandas column containing text (descriptive) data and combine them with other features?
Tags: python, machine-learning, scikit-learn, feature-extraction, tf-idf

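I have a dataset like the one below (only one row and some of the columns are shown).

I want to extract features from the text columns. Below I am using the tf-idf method.

Here is what I have tried: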

from sklearn.feature_extraction.text import TfidfVectorizer

# raw_data is the pandas DataFrame described above
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')

# Calculate tf-idf for the summary column (first text entry only)
tfidf_matrix = tf.fit_transform(raw_data['summary'][:1])
# Terms (1- to 3-grams) that make up the columns of tfidf_matrix
feature_names = tf.get_feature_names_out()

print(len(feature_names))

print(feature_names[50:70])

# Dense array view of the sparse tf-idf matrix
dense = tfidf_matrix.toarray()
Now I have a dense matrix representation of my first text column, summary (computed for the first text entry only).

My question is how to combine this with the other features in the dataset so that it can be used in a model.

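Do I need to merge all the text columns into a single column and then compute the tf-idf values, or do I need to compute the tf-idf values separately for each text column?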
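Either choice can work. Vectorizing each column separately keeps a distinct vocabulary and set of weights per column, while merging the columns first produces one shared vocabulary. A sketch of the per-column variant, assuming scikit-learn's `ColumnTransformer` is acceptable here; the `description` and `price` columns and all sample values are hypothetical, and only `summary` comes from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for raw_data; 'description' and 'price' are assumed columns.
raw_data = pd.DataFrame({
    "summary": ["great phone", "poor battery", "fast delivery"],
    "description": [
        "the phone arrived quickly and works well",
        "battery drains within hours of charging",
        "courier delivered the package early",
    ],
    "price": [299.0, 149.0, 199.0],
})

preprocess = ColumnTransformer([
    # A text column is selected with a plain string so the vectorizer gets a 1-D Series.
    ("summary_tfidf", TfidfVectorizer(stop_words="english"), "summary"),
    ("description_tfidf", TfidfVectorizer(stop_words="english"), "description"),
    # Numeric columns are selected with a list so the scaler gets a 2-D frame.
    ("numeric", StandardScaler(), ["price"]),
])

# Each transformer is fitted on its own column(s); the outputs are concatenated column-wise.
features = preprocess.fit_transform(raw_data)
print(features.shape)
```

Because `preprocess` is a regular transformer, it can be placed as the first step of a `Pipeline` in front of a model, which keeps the same vectorizers in use at training and prediction time.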

Refer to the following link:
