Json R: Convert specific rows into columns

I imported some very messy data from a JSON file. It looks like this:

raw_df <- data.frame(text = c(paste0('text', 1:3), '---------- OUTCOME LINE ----------', paste0('text', 4:6), '---------- OUTCOME LINE ----------'),
                              demand = c('cat1', rep('', 2), 'info', 'cat2', rep('', 2), 'info2')
                     )



raw_df
                                text demand
1                              text1   cat1
2                              text2       
3                              text3       
4 ---------- OUTCOME LINE ----------   info
5                              text4   cat2
6                              text5       
7                              text6       
8 ---------- OUTCOME LINE ----------  info2

What is the fastest and most efficient way to turn these specific rows into columns? Thanks for any tips.

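Here, we use grepl to create a logical index ('indx') of the rows whose 'text' column contains "-", and subset 'raw_df' to drop those rows. After replacing the '' entries in 'demand' with NA and filling each NA with the previous non-NA value using zoo::na.locf, we aggregate the 'text' column, pasted together with toString, grouped by 'demand'. Finally, we create the 'outcome' column from 'demand' by subsetting it with 'indx'.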
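indx <- grepl("-", raw_df$text)  # TRUE on rows containing "-" (the separators here)
# drop separators, fill blank 'demand' downwards, then paste 'text' per 'demand'
transform(aggregate(text ~ demand, transform(raw_df[!indx, ],
  demand = zoo::na.locf(replace(demand, demand == "", NA))), toString),
  outcome = raw_df$demand[indx])  # relies on sorted 'demand' matching block order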
#  demand                text outcome
#1   cat1 text1, text2, text3    info
#2   cat2 text4, text5, text6   info2
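A minimal base R sketch of the same reshaping that avoids the zoo dependency, assuming the separator rows always contain the literal string "OUTCOME LINE" and always close their block. Instead of filling 'demand' forward, it numbers the blocks with a cumulative sum over the separator flag; the names is_outcome, block and groups are illustrative only. It also joins the fields with paste(collapse = ". ") rather than toString's commas.

```r
# Sample data from the question (stringsAsFactors = FALSE so that older
# R versions keep the columns as character)
raw_df <- data.frame(
  text = c(paste0("text", 1:3), "---------- OUTCOME LINE ----------",
           paste0("text", 4:6), "---------- OUTCOME LINE ----------"),
  demand = c("cat1", "", "", "info", "cat2", "", "", "info2"),
  stringsAsFactors = FALSE
)

# TRUE on separator rows; fixed = TRUE matches the literal text, not a regex
is_outcome <- grepl("OUTCOME LINE", raw_df$text, fixed = TRUE)

# Block id: lag the flag by one row so that each separator still belongs to
# the block it closes, then count the separators seen so far
block <- cumsum(c(FALSE, head(is_outcome, -1))) + 1

# One data frame per block, without the separator rows
groups <- split(raw_df[!is_outcome, ], block[!is_outcome])

result <- data.frame(
  demand  = vapply(groups, function(g) g$demand[1], character(1)),
  text    = vapply(groups, function(g) paste(g$text, collapse = ". "),
                   character(1)),
  outcome = raw_df$demand[is_outcome],   # labels from the separator rows
  stringsAsFactors = FALSE
)
result
```

Because the outcome labels are taken in row order and split() orders the blocks by their numeric id, the two sides line up without relying on the alphabetical order of 'demand'.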

A dplyr and tidyr solution:
library(dplyr)
library(tidyr)

raw_df %>% 
    mutate(outcome = demand,
           demand = replace(demand, demand == '', NA),
           outcome = replace(outcome, outcome == '', NA),
           # gsub() with an NA replacement sets the matching elements to NA
           outcome = gsub("^cat\\d+", NA, outcome)) %>% 
    fill(demand) %>%                      # fill down (default direction)
    fill(outcome, .direction = "up") %>%  # fill up from the outcome rows
    filter(!grepl("-----", text)) %>%     # drop the separator rows
    group_by(demand, outcome) %>% 
    summarize(text = gsub(",", "\\.", toString(text))) %>% 
    select(text, everything())


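How it works, step by step:
1. mutate: replace the blanks with NA and prepare the outcome column (gsub with an NA replacement turns the cat* labels into NA, so only the outcome labels remain).
2. fill the demand column in the default downward direction, and the outcome column upwards.
3. filter out the ---------- OUTCOME LINE ---------- rows based on their hyphens.
4. Group by demand and outcome and summarize to build the text column, replacing the commas that toString inserts by default with periods.
5. select the columns in the desired order.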


# A tibble: 2 x 3
# Groups:   demand [2]
                 text demand outcome
                <chr> <fctr>   <chr>
1 text1. text2. text3   cat1    info
2 text4. text5. text6   cat2   info2
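The key step in this pipeline is tidyr's fill with two different directions. A toy sketch, using a made-up data frame that mimics 'demand' and 'outcome' after the blanks have been replaced with NA, shows what each direction does: "down" (the default) copies the last non-missing value forward, "up" copies the next non-missing value backward.

```r
library(tidyr)

# Hypothetical toy data: two blocks, labels only on some rows
df <- data.frame(
  demand  = c("cat1", NA, NA, "cat2", NA),
  outcome = c(NA, NA, "info", NA, "info2"),
  stringsAsFactors = FALSE
)

# Default .direction = "down": each NA takes the value above it
down <- fill(df, demand)

# .direction = "up": each NA takes the value below it
both <- fill(down, outcome, .direction = "up")
both
```

Filling 'demand' down and 'outcome' up is what lets every text row carry both the category that opens its block and the outcome that closes it.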

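Both solutions work really well, thanks! However, toString uses a comma as the separator between the text fields; is there a way to change it to a full stop?
@KasiaKulma Sorry, never mind: you can use text = paste(text, collapse = ".") to solve this!
Very readable, and it also taught me fill() in dplyr, many thanks!
Awesome! I blurred the tidyverse lines a bit: fill is actually a tidyr function. Thanks for the nudge!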
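Following the paste suggestion in the comments, the summarize step can build the string with paste(collapse = ". ") directly instead of running gsub over toString's output; that way, commas that are part of the original text are left alone. This is a sketch that assumes the separator rows contain the literal text "OUTCOME LINE"; is_outcome is an illustrative helper column, and .groups = "drop" needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

raw_df <- data.frame(
  text = c(paste0("text", 1:3), "---------- OUTCOME LINE ----------",
           paste0("text", 4:6), "---------- OUTCOME LINE ----------"),
  demand = c("cat1", "", "", "info", "cat2", "", "", "info2"),
  stringsAsFactors = FALSE
)

result <- raw_df %>%
  mutate(
    is_outcome = grepl("OUTCOME LINE", text, fixed = TRUE),
    # outcome labels live only on the separator rows
    outcome = if_else(is_outcome, demand, NA_character_),
    # category labels live only on non-empty, non-separator rows
    demand  = if_else(!is_outcome & demand != "", demand, NA_character_)
  ) %>%
  fill(demand, .direction = "down") %>%
  fill(outcome, .direction = "up") %>%
  filter(!is_outcome) %>%
  group_by(demand, outcome) %>%
  # join with ". " directly, so no comma substitution is needed
  summarise(text = paste(text, collapse = ". "), .groups = "drop") %>%
  select(text, demand, outcome)
result
```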
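A data.table alternative (it reuses indx from the base R answer above):
library(data.table)
# setDT() converts raw_df by reference; blank demands become NA
setDT(raw_df)[demand == "", demand := NA][!indx, .(text = paste(text, collapse = '. ')),
          .(demand = zoo::na.locf(demand))][, outcome := raw_df$demand[indx]][]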
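The one-liner above depends on indx from the base R answer and on zoo::na.locf, and setDT() modifies raw_df in place. A self-contained data.table sketch that groups by a block id instead, assuming the separator rows contain the literal text "OUTCOME LINE" (is_outcome and block are illustrative column names):

```r
library(data.table)

dt <- data.table(
  text = c(paste0("text", 1:3), "---------- OUTCOME LINE ----------",
           paste0("text", 4:6), "---------- OUTCOME LINE ----------"),
  demand = c("cat1", "", "", "info", "cat2", "", "", "info2")
)

# Flag the separator rows (literal match, not a regex)
dt[, is_outcome := grepl("OUTCOME LINE", text, fixed = TRUE)]
# Block id: lag the flag one row so each separator closes its own block
dt[, block := cumsum(shift(is_outcome, fill = FALSE)) + 1L]

# One row per block: pasted text, first demand, label from the separator
result <- dt[, .(text    = paste(text[!is_outcome], collapse = ". "),
                 demand  = demand[1L],
                 outcome = demand[is_outcome]),
             by = block][, block := NULL][]
result
```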
