What does tensorflow.keras.preprocessing.text.Tokenizer.texts_to_matrix do?


Please explain what this code does and what the results mean:

from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(oov_token="<OOV>")

text = "The fool doth think he is wise, but the wise man knows himself to be a fool."
sentences = [text]
print(sentences)
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(sentences)
matrix = tokenizer.texts_to_matrix(sentences)
print(word_index)
print(sequences)
print(matrix)
---
['The fool doth think he is wise, but the wise man knows himself to be a fool.']

# word_index
{'<OOV>': 1, 'the': 2, 'fool': 3, 'wise': 4, 'doth': 5, 'think': 6, 'he': 7, 'is': 8, 'but': 9, 'man': 10, 'knows': 11, 'himself': 12, 'to': 13, 'be': 14, 'a': 15}

# sequences
[[2, 3, 5, 6, 7, 8, 4, 9, 2, 4, 10, 11, 12, 13, 14, 15, 3]]

# matrix
[[0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
In binary mode (the default mode), the matrix indicates which words from the learned vocabulary occur in the input text. You trained your tokenizer on

['The fool doth think he is wise, but the wise man knows himself to be a fool.']

so when you convert that same text to a matrix, it contains every word of the vocabulary (indicated by 1) except OOV, because every word is known; hence position 1 of the result vector is 0 (see word_index). And since words are enumerated starting from 1, position 0 is always 0.

Some examples:

tokenizer.texts_to_matrix(['foo'])
# only OOV in this text
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

tokenizer.texts_to_matrix(['he he'])
# a known word, here twice - in binary mode the indicator is still just 1, no matter how often it occurs
array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]])

tokenizer.texts_to_matrix(['the fool'])
array([[0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Other modes

The other modes are more self-explanatory (a by-hand reproduction follows after the examples below):

  • count - how many times each word from the vocabulary occurs in the text
tokenizer.texts_to_matrix(['He, he the fool'], mode='count')
array([[0., 0., 1., 1., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.]])
  • freq - the counts, normalized so that each row sums to 1.0
tokenizer.texts_to_matrix(['He, he the fool'], mode='freq')
array([[0.  , 0.  , 0.25, 0.25, 0.  , 0.  , 0.  , 0.5 , 0.  , 0.  , 0.  ,
        0.  , 0.  , 0.  , 0.  , 0.  ]])
  • tfidf - term frequency weighted by inverse document frequency (how rare a word is across the fitted texts)
tokenizer.texts_to_matrix(['He, he the fool'], mode='tfidf')
array([[0.        , 0.        , 0.84729786, 0.84729786, 0.        ,
        0.        , 0.        , 1.43459998, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]])
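
As a rough cross-check, here is a sketch (not from the original answer; the helper name matrix_row is made up) that rebuilds the count and freq rows with plain Python, plus the tfidf weighting as I understand Keras applies it (1 + log of the in-text count, multiplied by a smoothed inverse document frequency taken from the fitting stage). Treat the exact tfidf constants as an assumption - they can differ between versions, and the concrete tfidf numbers above also depend on how many texts the tokenizer was fitted on.

import numpy as np
from collections import Counter

def matrix_row(tokenizer, text, mode='count'):
    # approximate one row of texts_to_matrix for mode in {'count', 'freq', 'tfidf'}
    num_columns = len(tokenizer.word_index) + 1
    row = np.zeros(num_columns)
    seq = tokenizer.texts_to_sequences([text])[0]
    counts = Counter(seq)                          # occurrences of each word index
    for j, c in counts.items():
        if mode == 'count':
            row[j] = c
        elif mode == 'freq':
            row[j] = c / len(seq)                  # counts normalized to sum to 1.0
        elif mode == 'tfidf':
            # assumed scheme: (1 + log tf) * smoothed idf, where the idf uses
            # how many fitted documents contained word j (tokenizer.index_docs)
            tf = 1 + np.log(c)
            idf = np.log(1 + tokenizer.document_count / (1 + tokenizer.index_docs.get(j, 0)))
            row[j] = tf * idf
    return row

print(matrix_row(tokenizer, 'He, he the fool', mode='count'))
print(matrix_row(tokenizer, 'He, he the fool', mode='freq'))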