Python, sklearn, tf-idf: how to split on "####" instead of the default whitespace

python scikit-learn

When using sklearn's tf-idf, the text is split on whitespace by default:

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?'
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

However, I want to use this format instead:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = [
    'This####is####the####first####document.',
    'This####is####the####second####second####document.'
]
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
X = vectorizer.fit_transform(corpus)
tfidf = transformer.fit_transform(X)
# get_feature_names() was removed in scikit-learn 1.2
word = vectorizer.get_feature_names_out()
weight = tfidf.toarray()

How can I do this?

Use a custom tokenizer:

def four_pounds_tokenizer(s):
    return s.split('####')

vectorizer = CountVectorizer(tokenizer=four_pounds_tokenizer)
X = vectorizer.fit_transform(corpus)

Pass in your own tokenizer.
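
Any callable that takes a string and returns a list of tokens can serve as the tokenizer. As one possible variation, the sketch below also strips punctuation from each piece and drops empty tokens. The name `hash_delimited_tokenizer` and the chosen punctuation set are assumptions made for illustration, not part of the original answer.

```python
def hash_delimited_tokenizer(text):
    """Split on '####', strip surrounding punctuation, drop empty tokens."""
    tokens = []
    for piece in text.split('####'):
        cleaned = piece.strip(' .,!?;:')  # assumed punctuation set
        if cleaned:
            tokens.append(cleaned)
    return tokens

print(hash_delimited_tokenizer('This####is####the####first####document.'))
# ['This', 'is', 'the', 'first', 'document']
```

A function like this could be passed as `tokenizer=hash_delimited_tokenizer` in the same way as `four_pounds_tokenizer` above.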