Python 3.x: error related to 'list' object has no attribute 'lower'
I am working on several clustering projects. In my current clustering project, I am getting an error related to my tfidf_vectorizer code. Here are the documents I imported:
description_1 = open('description1.txt', encoding="utf8").read().lower().split('\n')
description_2 = open('description2.txt', encoding="utf8").read().lower().split('\n')
description_3 = open('description3.txt', encoding="utf8").read().lower().split('\n')
description_4 = open('description4.txt', encoding="utf8").read().lower().split('\n')
description_5 = open('description5.txt', encoding="utf8").read().lower().split('\n')
description_6 = open('description6.txt', encoding="utf8").read().lower().split('\n')
description_7 = open('description7.txt', encoding="utf8").read().lower().split('\n')
Then I combine these documents:
descriptions_on = (description_1, description_2, description_3,
                   description_4, description_5, description_6, description_7)

descriptions = []
for i in range(len(descriptions_on)):
    item = descriptions_on[i]
    descriptions.append(item)
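Note that each description_N is already a list of lines, so the loop above appends whole lists: descriptions ends up as a list of lists, not a list of strings. A minimal sketch with made-up two-line "files" (the contents are placeholders) showing the resulting structure:

```python
# Stand-ins for the file contents: each "document" is already a list of
# lines, exactly what open(...).read().lower().split('\n') produces.
description_1 = ["line one of doc 1", "line two of doc 1"]
description_2 = ["line one of doc 2", "line two of doc 2"]

descriptions_on = (description_1, description_2)

# Same loop as above: it appends each *list*, not each line or string.
descriptions = []
for i in range(len(descriptions_on)):
    item = descriptions_on[i]
    descriptions.append(item)

print(type(descriptions[0]))   # <class 'list'> -- each element is a list of lines
```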
The problem occurs somewhere in these lines of code:
from sklearn.feature_extraction.text import TfidfVectorizer
from spacy.lang.fr.stop_words import STOP_WORDS as fr_stop
from spacy.lang.en.stop_words import STOP_WORDS as en_stop

#from warnings import filterwarnings
#filterwarnings('ignore')

final_stopwords_list = list(fr_stop) + list(en_stop)

tfidf_vectorizer = TfidfVectorizer(max_df=0.90, max_features=200000,
                                   min_df=0.10, stop_words=final_stopwords_list,
                                   use_idf=True, tokenizer=tokenize_and_stem,
                                   ngram_range=(1,3))

%time tfidf_matrix = tfidf_vectorizer.fit_transform(descriptions)
tokenizer=tokenize_and_stem refers to a function, tokenize_and_stem, which was defined earlier but is not included in this question's code listing because it is not relevant to the error.
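The traceback below bottoms out in scikit-learn's lowercasing step, which with the default lowercase=True is essentially lambda x: x.lower() applied to each document. A minimal sketch (no sklearn needed) of why a list of lists fails there:

```python
# scikit-learn's default preprocessing boils down to calling .lower() on
# each document it is handed (see the lambda at line 257 in the traceback).
preprocess = lambda doc: doc.lower()

docs_ok  = ["first document", "second document"]   # list of strings
docs_bad = [["line 1", "line 2"], ["line 3"]]      # list of lists

print([preprocess(d) for d in docs_ok])   # works: strings have .lower()

try:
    [preprocess(d) for d in docs_bad]     # each "doc" is a list -> no .lower()
except AttributeError as e:
    print(e)                              # 'list' object has no attribute 'lower'
```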
This is the error message I get from the code above:
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1611         """
   1612         self._check_params()
-> 1613         X = super(TfidfVectorizer, self).fit_transform(raw_documents)
   1614         self._tfidf.fit(X)
   1615         # X is already a transformed view of raw_documents so

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1029
   1030         vocabulary, X = self._count_vocab(raw_documents,
-> 1031                                           self.fixed_vocabulary_)
   1032
   1033         if self.binary:

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    941         for doc in raw_documents:
    942             feature_counter = {}
--> 943             for feature in analyze(doc):
    944                 try:
    945                     feature_idx = vocabulary[feature]

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(doc)
    327                                                tokenize)
    328             return lambda doc: self._word_ngrams(
--> 329                 tokenize(preprocess(self.decode(doc))), stop_words)
    330
    331         else:

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(x)
    255
    256         if self.lowercase:
--> 257             return lambda x: strip_accents(x.lower())
    258         else:
    259             return strip_accents

AttributeError: 'list' object has no attribute 'lower'
print(descriptions)
Result:
[['\ufeff成立于1991年蒙特利尔。dex是平价奢侈品领域的真正先驱,”,加勒比媒体与传播学院(carimac)新闻专业二年级学生萨萨娜·桑德森的客座帖子坐落在西印度群岛大学……“DaMeNETS提供了一种简单而可靠的个性化医学解决方案,更精确和DaMeNETS正在开发一种心理健康护理的第一诊断工具,这将使临床医生获得更多的……”DIDACTE授权培训供应商。在几分钟内,“,”di-o-matic:发现你最喜欢的cg角色背后的技术“,”proship facilite le Recrument des talents and la prét du preparation du come dués qui blouira vos invités,tout en assurant la gestion la total de la problitémes administratifis and contracuels.”,“,”dk spec spec participe au congress de montés de montéal sur le bois,du 20 au 22 mars,au fairmont le reineélisabeth.venez Neus rencontrer!”,“do networks limited致力于为具有丰富光通信线路专业知识的企业提供一站式服务,do networks limited整个团队致力于……”,“道格拉斯顾问公司(douglas consultants inc.)于1999年在《建筑设计》杂志上发表了一篇关于道格拉斯顾问公司的文章。“cooper目前担任dream、dream office reit、dream global reit、dream industrial reit和e-l financial corporation limited的董事会成员。p.jane gavan女士是dream的资产管理总裁,在房地产行业拥有30多年的经验。”,“dromadaire géo-innovations是一家公司,环境服务、企业利用géo-Localization pour faire l…”“电子产品专家委员会,魁北克省自动化和控制委员会,电子产品专家委员会”,邓迪可持续技术(dst)从事环保技术的开发和商业化,用于处理中的材料。

Your documents are lists, not strings, so they are not in the right format for the tokenizer. Can you try printing descriptions and showing us a sample?

@UpasanaMittal Absolutely, I have edited the post. The result above is part of print(descriptions); the full output is too long to post.

It looks like this is a list of lists rather than a list of texts, and tfidf needs a list of texts. I basically have seven documents, posted at the beginning of the question, each with 100 lines of text. Would it work to convert each document's lines into a single paragraph?
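Following the suggestion in the comments, one way to get a list of strings is to join each file's lines back into one text per document before vectorizing. A minimal sketch, using placeholder contents in place of the real description_N lists:

```python
# Each description_N is a list of lines; join the lines into one string per
# document so the vectorizer receives an iterable of texts, not of lists.
description_1 = ["founded in 1991 in montreal.", "a pioneer in affordable luxury."]
description_2 = ["didacte, an authorized training provider.", "up and running in minutes."]

descriptions_on = (description_1, description_2)

# One paragraph per document: this is the shape fit_transform() expects.
descriptions = [" ".join(lines) for lines in descriptions_on]

print(descriptions[0])
# founded in 1991 in montreal. a pioneer in affordable luxury.
```

With this shape, tfidf_vectorizer.fit_transform(descriptions) receives strings, so the lowercasing step in the traceback no longer fails.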