R: How to parse a string of key:value fields into columns that can be added to a data frame
I have a data frame. In each row of the data frame, the last column is a string named data_listing. The data in each string is a series of key:value pairs separated by commas. Here is an example of these strings:
> data_listing[1:2]
[1] "id:4006422,memberId:2932850,price:999,make:Chevrolet,model:Cobalt,makeYear:2009,trim:LT,mileage:142000,sellerType:For Sale By Owner,dealerOptions:null,index:2"
[2] "id:3987513,memberId:67473,price:26799,make:Audi,model:S5,makeYear:2013,trim:Prestige,mileage:44673,sellerType:Dealership,dealerOptions:{options:{VDPcarousel:true,allowUsed:true,calculator:true,carFaxIntegration:true,featuredCarousel:true,feed:true,homepageSpotlight:0,inlineSpotlight:11,limit:-1,map:true,monsterAds:true,pop:2,priceReduced:true,refresh:7,wrap:true,chat:false,inventoryComparison:true,standardFeatured:3}},index:3"
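I would like to create a column in the data frame for each value in the data_listing string. Each column would use the key as its name.
If I run strsplit(data_listing, ","), I get a list of strings. Each list element contains a character vector of key:value pairs.
I'm reluctant to write a for loop that greps each sub-list element and adds the values to individual columns in the original data frame, but that is the only way I can figure out how to do this.
I have looked at transform and tidyr::separate, but they seem suited to grepping out a single item from a string rather than 28 values.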
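A minimal sketch of what that call returns, using two shortened, made-up strings in the same format (the name `listing` and its values are illustrative assumptions, not taken from the real data):

```r
# Two shortened strings in the same key:value format (illustrative values)
listing <- c("id:1,make:Audi,model:S5",
             "id:2,make:Chevrolet,model:Cobalt")

# fixed = TRUE treats "," as a literal separator rather than a regex
pieces <- strsplit(listing, ",", fixed = TRUE)

# pieces is a list with one character vector of "key:value" items per input string
length(pieces)
pieces[[1]]
```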
How would you solve this?

I would do it like this:
data_listing <- c("id:4006422,memberId:2932850,price:999,make:Chevrolet,model:Cobalt,makeYear:2009,trim:LT,mileage:142000,sellerType:For Sale By Owner,dealerOptions:null,index:2",
"id:3987513,memberId:67473,price:26799,make:Audi,model:S5,makeYear:2013,trim:Prestige,mileage:44673,sellerType:Dealership,dealerOptions:{options:{VDPcarousel:true,allowUsed:true,calculator:true,carFaxIntegration:true,featuredCarousel:true,feed:true,homepageSpotlight:0,inlineSpotlight:11,limit:-1,map:true,monsterAds:true,pop:2,priceReduced:true,refresh:7,wrap:true,chat:false,inventoryComparison:true,standardFeatured:3}},index:3")
library(tidyverse)
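# custom function for use on a single element of data_listing
parser <- function(x) {
  strsplit(x, ",") %>%               # split into "key:value" pieces
    unlist() %>%
    enframe(name = NULL) %>%         # one-column tibble; the column is named "value"
    # The default sep splits on any run of non-alphanumeric characters, so values
    # containing spaces or nested braces are truncated (with a warning).
    separate(value, c("colnames", "values")) %>%
    spread(colnames, values)         # one column per key
}
map_dfr(data_listing, parser) # apply to each element, then row-bind the results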
# console ...
# A tibble: 2 x 28
dealerOptions id index make makeYear memberId mileage model price
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 null 4006422 2 Chevrolet 2009 2932850 142000 Cobalt 999
2 options 3987513 3 Audi 2013 67473 44673 S5 26799
# ... with 19 more variables: sellerType <chr>, trim <chr>, allowUsed <chr>,
# calculator <chr>, carFaxIntegration <chr>, chat <chr>, featuredCarousel <chr>,
# feed <chr>, homepageSpotlight <chr>, inlineSpotlight <chr>,
# inventoryComparison <chr>, limit <chr>, map <chr>, monsterAds <chr>, pop <chr>,
# priceReduced <chr>, refresh <chr>, standardFeatured <chr>, wrap <chr>
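Note that the default separator truncates values such as "For Sale By Owner" to their first word. If the full value matters, one option is to split each piece at the first colon only. The following is a minimal base R sketch under that assumption; `parse_flat` is a hypothetical helper name, and it only handles flat strings (no nested dealerOptions block):

```r
# Hypothetical helper: split each "key:value" piece at the first colon only
parse_flat <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)[[1]]
  keys   <- sub(":.*$", "", pieces)      # text before the first colon
  values <- sub("^[^:]*:", "", pieces)   # text after the first colon
  # one-row data frame with one column per key
  as.data.frame(as.list(setNames(values, keys)), stringsAsFactors = FALSE)
}

flat <- "id:4006422,price:999,sellerType:For Sale By Owner,dealerOptions:null"
parsed <- parse_flat(flat)
parsed$sellerType
```

Within the tidyverse pipeline, passing sep = ":" and extra = "merge" to separate() should have a similar effect, since it keeps everything after the first colon in the values column.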
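Perfect. The second-to-last element of the string, dealerOptions, is a complex element that contains sub-elements. I will have to follow your logic and replicate what you did in order to spread dealerOptions into new data columns when dealerOptions is present.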
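One possible way to approach that, sketched in base R: pull the nested block out of the string before splitting on commas, so that the outer string and the inner options are each a flat key:value list. `split_listing` is a hypothetical helper, the "present" placeholder is an arbitrary choice, and the pattern assumes the block always has the exact shape {options:{...}} with no further nesting:

```r
# Hypothetical helper: separate the nested dealerOptions block from the outer string
split_listing <- function(x) {
  pattern <- "\\{options:\\{[^}]*\\}\\}"
  m <- regmatches(x, regexpr(pattern, x))   # character(0) when there is no block
  nested <- if (length(m) == 1) {
    gsub("^\\{options:\\{|\\}\\}$", "", m)  # strip the surrounding braces
  } else {
    NA_character_
  }
  # Replace the block with a placeholder so the outer string is flat again
  outer <- sub(pattern, "present", x)
  list(outer = outer, nested = nested)
}

with_opts    <- split_listing("id:3,dealerOptions:{options:{map:true,pop:2}},index:3")
without_opts <- split_listing("id:4,dealerOptions:null,index:2")
with_opts$outer
with_opts$nested
```

Both parts can then go through the same key:value parsing step. Prefixing the nested column names (for example with "opt_") would avoid clashes with top-level keys, though that naming is only a suggestion.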