Replacing a regular-expression pattern in R


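I have a text column containing speech-to-text transcripts of phone calls between customers and agents. After some text manipulation on the raw values, suppose I have a vector like the following as an example: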


text

We can use str_extract_all:

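library(stringr)
# Extract each "customer <word> <digits>" or "agent <word> <digits>" turn;
# [[1]] takes the matches for the single input string.
v1 <- str_extract_all(text, "(?<=:)(customer\\s+\\w+\\s*\\d*)|(agent\\s+\\w+\\s*\\d*)")[[1]]
v1[c(TRUE, FALSE)]  # odd positions: turns of whoever speaks first
v1[c(FALSE, TRUE)]  # even positions: turns of the other speaker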

For now, I can solve it as shown below. I suspect someone more experienced with regular expressions could shorten it.

# Replace everything from "agent:" up to the next "customer:" with a comma, leaving the customer text.
df$conversationCustomer <- gsub("agent:.*?customer:", ",", df$conversation)
# Drop an agent turn at the end of the conversation, which the first regex does not remove.
df$conversationCustomer <- gsub("agent:.*", "", df$conversationCustomer)
# Remove the "customer:" label left over when the conversation starts with customer text.
df$conversationCustomer <- gsub("customer:", "", df$conversationCustomer)
# Same three steps with the roles swapped, for the agent text.
df$conversationAgent <- gsub("customer:.*?agent:", ",", df$conversation)
df$conversationAgent <- gsub("customer:.*", "", df$conversationAgent)
df$conversationAgent <- gsub("agent:", "", df$conversationAgent)

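R for text parsing? God bless you.

In the question above I gave the vector "text" as an example, and your solution works very well for it. Thanks. However, when I tried the strsplit method on my actual data in a data frame, it produced the following error.
> df$conversation_customer

@kzmlbyrk If it is a data.frame, then you don't need to subset the first element; use sapply
lst

Am I missing something?
df$conversationCustomer

Another thing I just realized: the "df$conversation" column sometimes starts with customer text and sometimes with agent text. So the [c(TRUE, FALSE)] index may not always select the desired text. Sorry for noticing this so late.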
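The sapply suggestion and the ordering concern can be illustrated together. The commenter's own code did not survive in this thread, so the following is only a guess at the idea: strsplit is vectorised over the column, so the [[1]] subset is dropped and each row is processed inside sapply. The vector conversation is copied from the question's sample data, and first_speaker is a name introduced here for illustration.

```r
# Sample conversations copied from the question; row 2 starts with the agent.
conversation <- c(
  " customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
  " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9"
)

# strsplit() returns a list with one character vector per row, so no [[1]] is needed.
lst <- strsplit(trimws(conversation), "(customer|agent):\\s*")

# For each row, drop empty pieces, trim, and keep the odd positions.
first_speaker <- sapply(lst, function(x) {
  x <- trimws(x[nzchar(x)])
  toString(x[c(TRUE, FALSE)])  # odd positions: whoever spoke first in that row
})

first_speaker
```

For the second row the odd positions hold the agent's turns, not the customer's, which is exactly the situation described in the last comment: positional indexing only works when every conversation starts with the same speaker.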
> callid <- c("1","2")
> conversation <- c(" customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
+                   " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9")
> conversationCustomer <- c("customer text 1, customer text 2", "customer text 8, customer text 9")
> conversationAgent <- c("agent text 1, agent text 2", "agent text 8, agent text 9")
> df <- data.frame(callid, conversation)
> dfDesired <- data.frame(callid, conversation, conversationCustomer, conversationAgent)
> rm(callid, conversation, conversationCustomer, conversationAgent)
> 
> df
  callid                                                                             conversation
1      1  customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2
2      2  agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9
> dfDesired
  callid                                                                             conversation             conversationCustomer          conversationAgent
1      1  customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2 customer text 1, customer text 2 agent text 1, agent text 2
2      2  agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9 customer text 8, customer text 9 agent text 8, agent text 9
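One way to get from df to dfDesired regardless of who speaks first is to extract by label instead of by position. The sketch below uses only base R (gregexpr and regmatches with perl = TRUE); extract_speaker is a hypothetical helper written for this example, not a function from any package, and it assumes the labels "customer:" and "agent:" never occur inside the spoken text itself.

```r
# Sample data copied from the question.
df <- data.frame(
  callid = c("1", "2"),
  conversation = c(
    " customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
    " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect all turns of one speaker, joined with ", ".
extract_speaker <- function(x, speaker) {
  # Lookbehind anchors the match right after "<speaker>:"; the lazy .*? stops
  # at the next label (or the end of the string), checked by the lookahead.
  pattern <- paste0("(?<=", speaker, ":).*?(?=\\s*(?:customer:|agent:|$))")
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  vapply(matches, function(m) toString(trimws(m)), character(1))
}

df$conversationCustomer <- extract_speaker(df$conversation, "customer")
df$conversationAgent <- extract_speaker(df$conversation, "agent")
```

On the two sample rows this reproduces the conversationCustomer and conversationAgent columns of dfDesired. Because each turn is selected by its label, the result does not depend on whether a conversation opens with the customer or the agent.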
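Alternatively, with strsplit from base R:
# Split on the "customer:" / "agent:" labels; [[1]] takes the pieces for the single input string.
v1 <- strsplit(trimws(text), "(customer|agent):\\s*")[[1]]
# Drop the empty piece before the first label and trim whitespace.
v2 <- trimws(v1[nzchar(v1)])
toString(v2[c(TRUE, FALSE)])  # odd positions: turns of whoever speaks first
toString(v2[c(FALSE, TRUE)])  # even positions: turns of the other speaker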
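The strsplit approach can be made independent of the starting speaker by also capturing the labels and grouping the pieces by them. This is a minimal sketch, not the answerer's code: text here is row 2 of the question's sample data (the one that starts with the agent), labels, parts and by_speaker are names introduced for illustration, and it assumes every label is followed by non-empty text so that labels and pieces stay aligned.

```r
# Row 2 of the question's sample data: the agent speaks first.
text <- " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9"

# Speaker labels in order of appearance ("agent" or "customer" directly before a colon).
labels <- regmatches(text, gregexpr("(customer|agent)(?=:)", text, perl = TRUE))[[1]]

# The text pieces between labels, with the leading empty piece removed.
parts <- strsplit(trimws(text), "(customer|agent):\\s*")[[1]]
parts <- trimws(parts[nzchar(parts)])

# Group each piece under its own label instead of relying on odd/even positions.
by_speaker <- split(parts, labels)

toString(by_speaker$customer)
toString(by_speaker$agent)
```

Since split groups by label, the same code gives the customer turns under by_speaker$customer whether the conversation starts with "customer:" or "agent:".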