R: Concatenating multiple strings in a call transcript
I have a dataset with about a thousand rows that look like the following:
dat = c("Speaker 1: ONE TWO THREE | Speaker 2: FOUR FIVE SIX SEVEN | Speaker 1: EIGHT NINE TEN | Speaker 2: ELEVEN* TWELVE THIRTEEN | Speaker 1: FOURTEEN FIFTEEN","Speaker 1: ONE TWO")
dat = tolower(dat)  # convert to lowercase
dat = gsub("\\*", "", dat)  # remove the asterisks
I am trying to make it look like this:
dat[1]:
Four five six seven. Eleven twelve thirteen.
dat[2]:
NA #(or blank)
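That is, I want to remove everything said by Speaker 1, remove the asterisks, convert the remaining content to sentence case, and add a period at the end of each statement.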
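Any help is greatly appreciated, especially if a solution already exists here and I failed to find it.

Because you need to apply several operations to the same object, and you need the str_trim function, it is best to use the tidyverse: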
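A minimal sketch of what such a pipeline might look like, assuming the stringr package is installed, that turns are separated by "|", and that each turn starts with a "Speaker N:" label. The helper name `speaker2_sentences` is hypothetical and not taken from the original answer.

```r
library(stringr)

dat <- c("Speaker 1: ONE TWO THREE | Speaker 2: FOUR FIVE SIX SEVEN | Speaker 1: EIGHT NINE TEN | Speaker 2: ELEVEN* TWELVE THIRTEEN | Speaker 1: FOURTEEN FIFTEEN",
         "Speaker 1: ONE TWO")

# Hypothetical helper: reduce one transcript string to Speaker 2's sentences
speaker2_sentences <- function(x) {
  # Split on the turn separator and trim the surrounding spaces
  turns <- str_trim(str_split(x, fixed("|"))[[1]])
  # Keep only the turns spoken by Speaker 2
  turns <- str_subset(turns, "^Speaker 2:")
  if (length(turns) == 0) return(NA_character_)
  # Drop the speaker label and any asterisks
  turns <- str_remove(turns, "^Speaker 2:")
  turns <- str_remove_all(turns, fixed("*"))
  # Sentence case, then a period after each statement
  turns <- str_to_sentence(str_trim(turns))
  str_c(turns, ".", collapse = " ")
}

res <- vapply(dat, speaker2_sentences, character(1), USE.NAMES = FALSE)
res
# [1] "Four five six seven. Eleven twelve thirteen."
# [2] NA
```

Splitting into turns first keeps each step readable, and each can be checked on its own. The trade-off is one function call per row rather than a single vectorised regex.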
With base R, you could do the following:
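# Keep the text after each "Speaker 2:" label, lowercased via \\L and followed by ". ";
# all other text matches the second alternative, where group 1 is empty
a = gsub(".*?2:\\s*([^|]*)\\b|(?:(?!Speaker 2).)*","\\L\\1. ", dat, perl = TRUE)
# Trim the leftover trailing punctuation and spaces, then delete the asterisks
b = gsub("\\*", "", sub("(?|(?<=^)|(?<=\\W))\\W*$", '', a, perl = TRUE))
`is.na<-`(b,nchar(b)==0)  # empty strings (no Speaker 2 turns) become NA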
[1] "four five six seven. eleven twelve thirteen."
[2] NA
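The result above is all lowercase, while the question asked for sentence case. One possible follow-up step, sketched here under the assumption that every sentence ends with a period followed by whitespace, is to upper-case the first letter of each sentence with `\\U` in a Perl-style replacement. The vector `b` below simply repeats the printed output so that the snippet runs on its own, and the name `sentence_case` is illustrative only.

```r
# Output of the base R steps above, repeated here for a self-contained example
b <- c("four five six seven. eleven twelve thirteen.", NA)

# Upper-case a lowercase letter at the start of the string or after ". ";
# \\U upper-cases what follows it in the replacement, which is only group 2
sentence_case <- gsub("(^|\\.\\s+)([a-z])", "\\1\\U\\2", b, perl = TRUE)
sentence_case
# [1] "Four five six seven. Eleven twelve thirteen."
# [2] NA
```

`gsub` leaves `NA` elements as `NA`, so rows without any Speaker 2 turns are unaffected.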