R: merging a data frame back together from columns (r, dataframe, calculated-columns, quanteda)


I have a data.frame with two variables: ID and Text. I am using the text analysis command below, which returns a data.frame with 48 columns as output:

analysis <- textstat_readability(mydata$text,  measure = c("all"), remove_hyphens = TRUE)

But this takes forever to complete.

You have 100,000 text records. Depending on your system and the size of each record, this can take a while. You can try to speed up the process by using more cores; most of quanteda's operations run in parallel, so it is worth a try.

Try the following to see whether it speeds things up:

library(quanteda)
# use all available cores - 1
quanteda_options(threads = parallel::detectCores() - 1)

analyses <- textstat_readability(mydata$text[1:100000],  measure = c("all"), remove_hyphens = TRUE)

analyses <- cbind(mydata$text[1:100000], analyses)
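Another option (an assumption on my part, not something the answer above requires) is to run the computation in chunks, so you can watch progress and keep partial results instead of waiting on one monolithic call. A minimal sketch, with a tiny stand-in data.frame and a chunk size of 2 purely for illustration (something like 10000 would be more realistic for 100k rows):

```r
library(quanteda)

# Small stand-in for the real data; the question's data.frame is assumed
# to have an ID column and a text column:
mydata <- data.frame(ID = 1:6,
                     text = c("One short sentence.", "Another one here.",
                              "Reading ease varies a lot.", "Texts can be long.",
                              "Or rather short.", "Like this one."),
                     stringsAsFactors = FALSE)

# Split the row indices into chunks (size 2 here; use e.g. 10000 for 100k rows):
chunks <- split(seq_len(nrow(mydata)), ceiling(seq_len(nrow(mydata)) / 2))

# Run textstat_readability on each chunk and report progress:
results <- lapply(chunks, function(idx) {
  message("Rows ", min(idx), "-", max(idx))
  textstat_readability(mydata$text[idx], measure = "Flesch",
                       remove_hyphens = TRUE)
})

# Stack the chunk results back into one data.frame. Note that the "document"
# column restarts within each chunk, so carry the IDs along yourself;
# row order is preserved, so this stays aligned:
analyses <- do.call(rbind, results)
analyses$ID <- mydata$ID
```

This does not make the underlying computation faster, but it makes a long run observable and restartable.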

Honestly, I am not sure why your approach takes so long to complete, but I think the correct way to do this is:

# (0.) Load the package and make a random sample dataset (usually this should be
# provided in the question, just saying):

library(quanteda)
mydata <- data.frame(ID = 1:100,
                     text = stringi::stri_rand_strings(
                       n = 100, 
                       length = runif(100, min=1, max=100), 
                       pattern = "[A-Za-z0-9]"),
                     stringsAsFactors = FALSE)

# 1. Make a quanteda corpus, where the ID is stored alongside the text column:

mydata_corpus <- corpus(mydata, docid_field = "ID", text_field = "text")

# 2. Then run the readability command:

analysis <- textstat_readability(mydata_corpus,  measure = c("all"), remove_hyphens = TRUE)

# 3. Now you can either keep this, or merge it with your original set based on
# IDs:

mydata_analysis <- merge(mydata, analysis, by.x = "ID", by.y = "document")
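A further thought of my own (not part of the answer above): if you only need a handful of readability scores, requesting just those measures avoids computing all ~48 columns that measure = c("all") produces, which may help with the run time. A self-contained sketch using the same corpus-and-merge pattern:

```r
library(quanteda)

# Tiny sample data, same shape as in the answer:
mydata <- data.frame(ID = 1:3,
                     text = c("A first short text.", "A second text here.",
                              "And a third one."),
                     stringsAsFactors = FALSE)
mydata_corpus <- corpus(mydata, docid_field = "ID", text_field = "text")

# "Flesch" and "Flesch.Kincaid" are two of the measures that
# textstat_readability supports; request only what you need:
analysis_subset <- textstat_readability(mydata_corpus,
                                        measure = c("Flesch", "Flesch.Kincaid"),
                                        remove_hyphens = TRUE)

# Merge back onto the original data by ID, as above:
mydata_analysis <- merge(mydata, analysis_subset,
                         by.x = "ID", by.y = "document")
```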