R: combining a data frame with columns computed from it

Tags: r, dataframe, calculated-columns, quanteda

I have a data.frame with two variables: ID and Text. I am using the text analysis command below, which returns a data.frame with 48 columns:

analysis <- textstat_readability(mydata$text,  measure = c("all"), remove_hyphens = TRUE)

But this takes forever to complete.

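You have 100,000 text records. Depending on your system and the size of each record, this can take a while. You can try to speed things up by using more cores. Most quanteda processes run in parallel, so it is worth a try.

Try the following to see whether it speeds things up: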

library(quanteda)
# use all available cores - 1
quanteda_options(threads = parallel::detectCores() - 1)

analyses <- textstat_readability(mydata$text[1:100000],  measure = c("all"), remove_hyphens = TRUE)

analyses <- cbind(mydata$text[1:100000], analyses)
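Before committing to all 100,000 records, it may help to time the call on a small sample and extrapolate. The sketch below uses base R only. `score_texts` is a hypothetical stand-in for `textstat_readability()` so that the snippet runs without quanteda, the toy `mydata` is invented for illustration, and the linear extrapolation is an assumption (runtime may not scale linearly with the number of texts).

```r
# Hypothetical stand-in for textstat_readability(); swap in the real call.
score_texts <- function(texts) {
  data.frame(document = seq_along(texts),
             n_chars  = nchar(texts),
             n_words  = lengths(strsplit(texts, "\\s+")),
             stringsAsFactors = FALSE)
}

# Toy data with the same shape as the question's data frame (ID + text).
set.seed(1)
mydata <- data.frame(
  ID   = 1:1000,
  text = replicate(1000, paste(sample(letters, 50, replace = TRUE),
                               collapse = " ")),
  stringsAsFactors = FALSE
)

# Time the call on the first `sample_size` records only.
sample_size <- 100
elapsed <- system.time(
  sample_result <- score_texts(mydata$text[seq_len(sample_size)])
)[["elapsed"]]

# Rough estimate for the full data set, assuming linear scaling.
estimated_total <- elapsed * nrow(mydata) / sample_size
```

Running this once with the default thread count and once after changing `quanteda_options(threads = ...)` gives a quick check of whether extra cores actually help for this function.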

Honestly, I am not sure why your approach takes so long to complete, but I think the right way to do it is:
# (0.) Load the package and make a random sample dataset (usually this should be
# provided in the question, just saying):

library(quanteda)
mydata <- data.frame(ID = 1:100,
                     text = stringi::stri_rand_strings(
                       n = 100, 
                       length = runif(100, min=1, max=100), 
                       pattern = "[A-Za-z0-9]"),
                     stringsAsFactors = FALSE)

# 1. Make a quanteda corpus, where the ID is stored alongside the text column:

mydata_corpus <- corpus(mydata, docid_field = "ID", text_field = "text")

# 2. Then run the readability command:

analysis <- textstat_readability(mydata_corpus, measure = c("all"), remove_hyphens = TRUE)

# 3. Now you can either keep this, or merge it with your original set based on
# IDs:

mydata_analysis <- merge(mydata, analysis, by.x = "ID", by.y = "document")
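To see what the merge in step 3 does, here is a minimal base R sketch. The `analysis` data frame is a hand-made stand-in for the readability output, and its `Flesch` values are made up for illustration. It assumes, as in the code above, that the output identifies each text in a `document` column holding the document names as character strings.

```r
# Original data: integer IDs alongside the text.
mydata <- data.frame(ID = 1:3,
                     text = c("First text.", "Second text.", "Third text."),
                     stringsAsFactors = FALSE)

# Stand-in for the readability output: document names are character strings,
# deliberately listed in a different order than mydata.
analysis <- data.frame(document = c("3", "1", "2"),
                       Flesch   = c(30.5, 10.5, 20.5),
                       stringsAsFactors = FALSE)

# Rows are matched on the key columns, not on their position.
mydata_analysis <- merge(mydata, analysis, by.x = "ID", by.y = "document")
print(mydata_analysis)
```

Unlike `cbind()`, which pairs rows purely by position, `merge()` still lines up each score with the right ID if the two data frames end up in a different order or one of them has rows dropped.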
