How to create a dictionary of all words, not strings, in Python?

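I need to create a dictionary in Python made up of all the words in a text, containing letters only, with no spaces or symbols. Later I will need to build an M x N matrix, where M is the number of strings (lines) in the original text and N is the number of words in the dictionary. My code is below: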

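Possible duplicate

Maybe this is what you need (if I understood the question correctly):

import re
from collections import Counter

import pandas as pd

text = 'Hello this is text - yes it is'
# Lowercase the text and keep only runs of letters as words
text_list = re.findall('[a-z]+', text.lower())
# Count how many times each word occurs
count = Counter(text_list)
# One-row DataFrame: columns are the words (sorted alphabetically), values are the counts
df = pd.DataFrame(count, index=[0]).sort_index(axis=1)

In this case you will get the following DataFrame:

   hello  is  it  text  this  yes
0      1   2   1     1     1    1

Or maybe you need the following vectorization (but which values do you need?). In this case you will get a DataFrame whose rows are the lines of the text, whose column names are the words, and whose values are TF-IDF-weighted frequencies (you can read about it here):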

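import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

text_list = []
with open('cat_sentences.txt') as f:
    for line in f:
        # Lowercase each line and replace punctuation with spaces
        # (str.replace does not interpret regular expressions, so use re.sub)
        text_list.append(re.sub(r'[^\w\s]', ' ', line.lower()).strip())

tfidf_v = TfidfVectorizer(min_df=1, stop_words=None)
X = tfidf_v.fit_transform(text_list)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
data = pd.DataFrame(data=X.toarray(), columns=tfidf_v.get_feature_names_out(), index=text_list)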
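
If plain counts are enough for the M x N matrix mentioned in the question, it can also be built without pandas or scikit-learn. The following is only a minimal sketch under some assumptions: each input line counts as one "string", a word is a run of ASCII letters, and the helper names `tokenize`, `build_vocabulary` and `count_matrix` are made up for illustration.

```python
import re


def tokenize(line):
    """Return the lowercase ASCII-letter words found in one line."""
    return re.findall(r'[a-z]+', line.lower())


def build_vocabulary(lines):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for line in lines for word in tokenize(line)})
    return {word: index for index, word in enumerate(words)}


def count_matrix(lines, vocabulary):
    """Build an M x N list of lists: one row per line, one column per word."""
    matrix = [[0] * len(vocabulary) for _ in lines]
    for row, line in enumerate(lines):
        for word in tokenize(line):
            matrix[row][vocabulary[word]] += 1
    return matrix


# Hypothetical sample input: two lines of text
lines = ['Hello this is text - yes it is', 'This text is short!']
vocabulary = build_vocabulary(lines)
matrix = count_matrix(lines, vocabulary)
```

Here `vocabulary` is the dictionary of words (word to column index) and `matrix[i][j]` is how often word `j` occurs in line `i`. If a DataFrame is wanted afterwards, something like `pd.DataFrame(matrix, columns=sorted(vocabulary))` should work, since the column indices follow alphabetical order.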