R dictionary not affected by input?

Tags: r, text-classification, fasttext

There is a get_dictionary function in the fastrtext package which, I assumed, returns all the words in the dictionary. However, when I set wordNgrams to 2 or 3, the word list it returns is exactly the same as the one I get when wordNgrams is set to 1. Can anyone tell me what is going on here? Thanks.
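
A minimal sketch of the call pattern in question (the model file name here is hypothetical):

library(fastrtext)
model <- load_model("model_ngrams2.bin")  # hypothetical model trained with -wordNgrams 2
head(get_dictionary(model))               # same unigram list as with -wordNgrams 1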

When you increase the n in your word n-grams, your fastText classification algorithm is still working on the same dictionary in all cases. However, instead of training on the separate words I, love, NY, it trains on their concatenations I love and love NY, i.e. on bigrams. To demonstrate, I trained 5-grams below; of course, the larger the n-grams you index, the longer the computation takes, but the better the syntactic structure is captured.
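
To make the concatenation concrete, here is a small illustration of the word bigrams built for one sentence (my own sketch, not fastText internals; fastText hashes these n-grams into a fixed number of buckets, see the -bucket option, instead of storing them in the dictionary, which is why get_dictionary never changes):

# adjacent-word pairs, as combined when wordNgrams = 2
tokens <- c("i", "love", "ny")
paste(head(tokens, -1), tail(tokens, -1))
# [1] "i love"  "love ny"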

library(fastrtext)

data("train_sentences")
data("test_sentences")

# prepare data: fastText supervised format is one "__label__<class> <text>" line per example
tmp_file_model <- tempfile()

train_labels <- paste0("__label__", train_sentences[,"class.text"])
train_texts <- tolower(train_sentences[,"text"])
train_to_write <- paste(train_labels, train_texts)
train_tmp_file_txt <- tempfile()
writeLines(text = train_to_write, con = train_tmp_file_txt)

test_labels <- paste0("__label__", test_sentences[,"class.text"])
test_texts <- tolower(test_sentences[,"text"])
test_to_write <- paste(test_labels, test_texts)

# learn model 1: unigrams (wordNgrams = 1)
library(microbenchmark)
microbenchmark(execute(commands = c("supervised", "-input", train_tmp_file_txt,
                     "-output", tmp_file_model, "-dim", 20, "-lr", 1,
                     "-epoch", 20, "-wordNgrams", 1, "-verbose", 1)), times = 5)

# mean time: 1.229228 seconds

model1 <- load_model(tmp_file_model)

# learn model 2: 5-grams (wordNgrams = 5)
microbenchmark(execute(commands = c("supervised", "-input", train_tmp_file_txt,
                     "-output", tmp_file_model, "-dim", 20, "-lr", 1,
                     "-epoch", 20, "-wordNgrams", 5, "-verbose", 1)), times = 5)

# mean time: 2.659191 seconds

model2 <- load_model(tmp_file_model)
str(get_dictionary(model1))
# chr [1:5060] "the" "</s>" "of" "to" "and" "in" "a" "that" "is" "for" ...
str(get_dictionary(model2))
# chr [1:5060] "the" "</s>" "of" "to" "and" "in" "a" "that" "is" "for" ...
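
The two dictionaries can also be compared directly; given the output above they are expected to be identical, since word n-grams are hashed rather than added as dictionary entries:

identical(get_dictionary(model1), get_dictionary(model2))
# expected: [1] TRUE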

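As a side note, the test split prepared earlier is never used in the timing runs. One possible way to score it with fastrtext's predict (my own sketch, following the package vignette's pattern, not part of the original answer):

# predict the most probable label for each lowercased test sentence
preds <- predict(model1, sentences = test_texts)
# names() holds the predicted labels; strip the __label__ prefix
# defensively in case the installed version keeps it
pred_labels <- sub("^__label__", "", names(unlist(preds)))
mean(pred_labels == test_sentences[, "class.text"])  # accuracy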