R: Iterating over data and creating a new data frame
I am working with a data frame that comes from a database, in the following form:
username elements
username1 """interfaces"".""dual()"""
username1 """interfaces"".""f_capitalaccrualcurrentyear"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username4 """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
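So there are two columns, username and elements. A user can use one or more elements in a single transaction. When a transaction has multiple elements, they are separated by commas. I need to split the elements so that there is one per row, still labeled with the username. In the end, I would like this: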
username elements
username1 """interfaces"".""dual()"""
username1 """interfaces"".""f_capitalaccrualcurrentyear"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username4 """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3 """interfaces"".""dnow_completion""
username3 ""interfaces"".""dnow_s_daily_prod_ta"""
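I have been trying to iterate over the data frame, split the elements that contain commas, and then put them back together with the corresponding username.
I have been trying the code below, but it is very inefficient. I am new to this, so I assume there is a more efficient way to do it.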
interface.data <- data.frame(
  username = c(),
  elements = c()
)
for (row in 1:nrow(input)) { ## input is the frame that comes from the database
  myrowbrk <- input[row, "elements"]
  myrowelements <- chartr(",", "\n", myrowbrk)
  user <- input[row, "username"]
  interface.newdata <- data.frame(
    username = user,
    elements = c(myrowelements)
  )
  interface.final <- rbind(interface.data, interface.newdata)
}
output <- interface.final
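Two things keep the loop above from producing the desired result: `chartr()` only swaps each comma for a newline inside the same string rather than splitting it, and `rbind()` always binds to the still-empty `interface.data`, so only the last row survives. A minimal base R sketch of the intended split follows. The `input` sample here is a hypothetical stand-in for the database result, with the CSV-style quote escaping removed, and it assumes no element contains a comma of its own.

```r
# Hypothetical sample standing in for the data frame from the database
input <- data.frame(
  username = c("username1", "username2", "username4"),
  elements = c(
    '"interfaces"."dual()"',
    '"interfaces"."dnow_completion","interfaces"."dnow_s_daily_prod_ta"',
    '"interfaces"."dnow_s_downtime_stat_with_lat_long"'
  ),
  stringsAsFactors = FALSE
)

# Split every elements string on literal commas.
# The result is a list holding one character vector per input row.
parts <- strsplit(input$elements, ",", fixed = TRUE)

# Repeat each username once per element it owns, then flatten the list
output <- data.frame(
  username = rep(input$username, times = lengths(parts)),
  elements = unlist(parts),
  stringsAsFactors = FALSE
)

print(output)
```

Because `strsplit()` and `rep()` are vectorized, this avoids growing a data frame row by row inside a loop, which is what makes the original approach slow.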
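You can use the tidyr package to do this. My solution gets the data into the desired format in two steps: (1) split the elements column on the comma character, and (2) change the format from wide to long.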
library(tidyr)
library(dplyr)  # provides select()

# Separate the 'elements' column of the 'df' data frame on the comma character.
# Name the new columns "1" up to the maximum number of expected elements (2 here).
df2 <- separate(data = df,
                col = elements,
                into = as.character(seq(1, 2, 1)),
                sep = ",")
# This call gives a warning because not every row has a string with a comma.
# Missing pieces are filled with NA.

# Then change from wide to long format, dropping NA entries.
# Drop the 'name' column, which records which column (1 or 2) each entry came from.
df2 <- df2 %>%
  pivot_longer(cols = c("1", "2"),
               values_to = "elements",
               values_drop_na = TRUE) %>%
  select(-name)
Try tidyr::separate_rows(input, elements, sep = ",")?
Awesome! That solved the problem. Thank you very much.
Thanks for the reply! Much appreciated!
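The `tidyr::separate_rows()` suggestion from the comments can be sketched as follows. The `input` sample is again a hypothetical stand-in for the database result, with the quote escaping simplified. The sketch assumes commas appear only as separators between elements.

```r
library(tidyr)

# Hypothetical sample standing in for the data frame from the database
input <- data.frame(
  username = c("username1", "username2", "username3"),
  elements = c(
    '"interfaces"."f_capitalaccrualcurrentyear"',
    '"interfaces"."dnow_completion","interfaces"."dnow_s_daily_prod_ta"',
    '"interfaces"."dnow_completion","interfaces"."dnow_s_daily_prod_ta"'
  ),
  stringsAsFactors = FALSE
)

# Split 'elements' on commas and emit one row per piece,
# repeating the username for every piece from the same row
output <- separate_rows(input, elements, sep = ",")

print(output)
```

This does the split and the reshaping in one call, so no intermediate wide format or NA handling is needed. In recent tidyr releases `separate_rows()` is marked as superseded in favor of `separate_longer_delim()`, so it may be worth checking which one your installed version recommends.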