Text summarization for multiple rows in R

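I have a set of short text files that I can combine into a single dataset, so that each file is one row.

I am trying to summarize the content with the LSAfun package, using the function
genericSummary(text, k, split = c(".", "!", "?"), min = 5, breakdown = FALSE, ...)

This works very well for a single text input, but not in my case. The package documentation says the text input should be "a character vector of length(text) = 1 specifying the text to be summarized".

Please see this example: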

# Generate a dataset example (text examples were copied from wikipedia): 
 
dd = structure(list(text = structure(1:2, .Label = c("Forest gardening, a forest-based food production system, is the world's oldest form of gardening.[1] Forest gardens originated in prehistoric times along jungle-clad river banks and in the wet foothills of monsoon regions. In the gradual process of families improving their immediate environment, useful tree and vine species were identified, protected and improved while undesirable species were eliminated. Eventually foreign species were also selected and incorporated into the gardens.[2]\n\nAfter the emergence of the first civilizations, wealthy individuals began to create gardens for aesthetic purposes. Ancient Egyptian tomb paintings from the New Kingdom (around 1500 BC) provide some of the earliest physical evidence of ornamental horticulture and landscape design; they depict lotus ponds surrounded by symmetrical rows of acacias and palms. A notable example of ancient ornamental gardens were the Hanging Gardens of Babylon—one of the Seven Wonders of the Ancient World —while ancient Rome had dozens of gardens.\n\nWealthy ancient Egyptians used gardens for providing shade. Egyptians associated trees and gardens with gods, believing that their deities were pleased by gardens. Gardens in ancient Egypt were often surrounded by walls with trees planted in rows. Among the most popular species planted were date palms, sycamores, fir trees, nut trees, and willows. These gardens were a sign of higher socioeconomic status. In addition, wealthy ancient Egyptians grew vineyards, as wine was a sign of the higher social classes. Roses, poppies, daisies and irises could all also be found in the gardens of the Egyptians.\n\nAssyria was also renowned for its beautiful gardens. These tended to be wide and large, some of them used for hunting game—rather like a game reserve today—and others as leisure gardens. Cypresses and palms were some of the most frequently planted types of trees.\n\nGardens were also available in Kush. 
In Musawwarat es-Sufra, the Great Enclosure dated to the 3rd century BC included splendid gardens. [3]\n\nAncient Roman gardens were laid out with hedges and vines and contained a wide variety of flowers—acanthus, cornflowers, crocus, cyclamen, hyacinth, iris, ivy, lavender, lilies, myrtle, narcissus, poppy, rosemary and violets[4]—as well as statues and sculptures. Flower beds were popular in the courtyards of rich Romans.", 
"The Middle Ages represent a period of decline in gardens for aesthetic purposes. After the fall of Rome, gardening was done for the purpose of growing medicinal herbs and/or decorating church altars. Monasteries carried on a tradition of garden design and intense horticultural techniques during the medieval period in Europe. Generally, monastic garden types consisted of kitchen gardens, infirmary gardens, cemetery orchards, cloister garths and vineyards. Individual monasteries might also have had a \"green court\", a plot of grass and trees where horses could graze, as well as a cellarer's garden or private gardens for obedientiaries, monks who held specific posts within the monastery.\n\nIslamic gardens were built after the model of Persian gardens and they were usually enclosed by walls and divided in four by watercourses. Commonly, the centre of the garden would have a reflecting pool or pavilion. Specific to the Islamic gardens are the mosaics and glazed tiles used to decorate the rills and fountains that were built in these gardens.\n\nBy the late 13th century, rich Europeans began to grow gardens for leisure and for medicinal herbs and vegetables.[4] They surrounded the gardens by walls to protect them from animals and to provide seclusion. During the next two centuries, Europeans started planting lawns and raising flowerbeds and trellises of roses. Fruit trees were common in these gardens and also in some, there were turf seats. At the same time, the gardens in the monasteries were a place to grow flowers and medicinal herbs but they were also a space where the monks could enjoy nature and relax.\n\nThe gardens in the 16th and 17th century were symmetric, proportioned and balanced with a more classical appearance. Most of these gardens were built around a central axis and they were divided into different parts by hedges. 
Commonly, gardens had flowerbeds laid out in squares and separated by gravel paths.\n\nGardens in Renaissance were adorned with sculptures, topiary and fountains. In the 17th century, knot gardens became popular along with the hedge mazes. By this time, Europeans started planting new flowers such as tulips, marigolds and sunflowers."
), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))


# This code is trying to generate the summary into another column:

library(LSAfun)  # provides genericSummary()
dd$sum = genericSummary(dd$text, k = 1)


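This gives the error: Error in strsplit(text, split = split, fixed = T) : non-character argument

I believe this is because I am passing a variable rather than a single text.

My expected output is the summary generated for each row, placed in the corresponding second column, dd$sum.

I tried using as.vector(dd$text), but that did not work. (I think it still merges the output into a single row.)

I tried reading about the mapping functions in purrr, but could not apply them to this case. I was wondering whether someone with experience in R programming could help.

Also, if you know a way to do this part with another text summarization package (for example lexRankr), that would work too. I tried their code here, but it still did not work.

Thanks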

Check class(dd$text). It is a factor, not a character vector.

The following works:

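library(dplyr)
library(purrr)
library(LSAfun)  # provides genericSummary()

dd %>% 
  mutate(text = as.character(text)) %>%           # factor -> character
  mutate(sum = map(text, genericSummary, k = 1))  # one call per row; sum is a list column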
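
The diagnosis above can be reproduced without LSAfun. The following is a minimal base-R sketch, not the package's own code: `dd_demo` is made-up data, and `first_sentence()` is a hypothetical stand-in for `genericSummary()` that, like it, expects a single character string. It shows that `strsplit()` rejects a factor, and that converting to character and calling the function once per element fills the new column row by row.

```r
# Made-up data with a factor column, as in the dput() output of the question
dd_demo <- data.frame(
  text = factor(c("First doc. It has two sentences.",
                  "Second doc! Also short."))
)

class(dd_demo$text)  # a factor, not a character vector

# strsplit() refuses factors; capture the error message instead of stopping
res <- tryCatch(
  strsplit(dd_demo$text, split = ".", fixed = TRUE),
  error = function(e) conditionMessage(e)
)

# Hypothetical stand-in for genericSummary(): returns the first sentence
# of ONE text and insists on a character vector of length 1
first_sentence <- function(text) {
  stopifnot(is.character(text), length(text) == 1)
  strsplit(text, split = ".", fixed = TRUE)[[1]][1]
}

# Convert once, then call the summarizer one element at a time
dd_demo$text <- as.character(dd_demo$text)
dd_demo$sum <- vapply(dd_demo$text, first_sentence, character(1),
                      USE.NAMES = FALSE)
```

`vapply()` returns a plain character column here. In the purrr version, `map()` returns a list column; if every summary is a single string, `map_chr()` would give a character column instead.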

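Thank you very much. If you have time, could you help check this code using lexRankr and find out why it does not work?
dd %>% mutate(text = as.character(text)) %>% mutate(top_3 = map(lexRankr::lexRank(text, docId = rep(1, length(text)), n = 3, continuous = TRUE))) %>% mutate(order_of_appearance = map(order(as.integer(gsub("_", "", top_3$sentenceId))))) %>% mutate(ordered_top_3 = map(top_3[order_of_appearance, "sentence"]))

@Bahi8482 The correct usage of purrr::map is map(data, function, arguments to function), or map(data, anonymous function). The latter is better here, since it lets you refer to the argument more than once or to other things in the data. So for the first one, you need:
mutate(top_3 = map(text, function(x) lexRankr::lexRank(x, docId = rep(1, length(x)), n = 3, continuous = TRUE)))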
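@Ben Thank you for the quick reply. I believe that, following your comment, the first step is now correct. The second and third still do not work, even after I tried changing them. Is the order of the arguments incorrect?
dd %>% mutate(text = as.character(text)) %>% mutate(top_3 = map(text, function(x) lexRankr::lexRank(x, docId = rep(1, length(x)), n = 3, continuous = TRUE))) %>% mutate(order_of_appearance = map(top_3$sentenceId, function(x) order(as.integer(gsub("_", "", x))))) %>% mutate(ordered_top_3 = map(order_of_appearance, function(x) top_3[x, "sentence"]))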
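
One possible reading of the remaining problem, offered as a sketch rather than a confirmed answer: after the first step, top_3 is a list with one data frame per row, so top_3$sentenceId is NULL and top_3[x, "sentence"] indexes the list rather than a data frame. Each later step would then also have to run per element. The example below illustrates the two map() forms described in the comments and the per-element chaining. `fake_rank()` is a hypothetical stand-in for lexRankr::lexRank() (it only mimics the sentenceId and sentence columns), and `texts` is made-up data.

```r
library(purrr)

# Hypothetical stand-in for lexRankr::lexRank(): splits ONE text into
# sentences and returns up to n of them in reverse order, with ids
# shaped like "1_1", "1_2", ...
fake_rank <- function(text, n = 3) {
  sentences <- trimws(strsplit(text, ".", fixed = TRUE)[[1]])
  ranked <- data.frame(
    sentenceId = paste0("1_", seq_along(sentences)),
    sentence = sentences,
    stringsAsFactors = FALSE
  )
  head(ranked[rev(seq_len(nrow(ranked))), ], n)
}

texts <- c("A one. A two. A three. A four.", "B one. B two.")

# Form 1: extra arguments are passed through map() to the function
top_a <- map(texts, fake_rank, n = 3)

# Form 2: an anonymous function, handy when the element is used repeatedly
top_b <- map(texts, function(x) fake_rank(x, n = 3))

# Later steps run per element too: each d is one data frame from the list
ord <- map(top_b, function(d) order(as.integer(gsub("_", "", d$sentenceId))))

# map2() walks two lists in parallel: a data frame and its row order
ordered_top <- map2(top_b, ord, function(d, o) d[o, "sentence"])
```

Inside a dplyr pipeline the same pattern would presumably read mutate(order_of_appearance = map(top_3, ...)) followed by mutate(ordered_top_3 = map2(top_3, order_of_appearance, ...)), with top_3 itself, not top_3$sentenceId, as the data argument.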