Need help cleaning data using R
I need some help cleaning data using R. My CSV file looks like this:
"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,
I need to reformat it as follows:
id gender age category rank
1 Male 22 movies 1
1 Male 22 music 2
1 Male 22 travel 3
1 Male 22 cloths 4
1 Male 22 grocery 5
1 Male 22 books NA
1 Male 22 rent NA
1 Male 22 fuel NA
1 Male 22 utility NA
1 Male 22 online-shopping NA
...................................
5 Female 22 movies NA
5 Female 22 music NA
5 Female 22 travel NA
5 Female 22 cloths NA
5 Female 22 grocery NA
5 Female 22 books NA
5 Female 22 rent 1
5 Female 22 fuel NA
5 Female 22 utility NA
5 Female 22 online-shopping 2
My efforts so far are as follows:
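library(reshape2)  # provides melt()
library(sqldf)
# header=FALSE reads the header row as data, so the columns are named V1..V13
mini <- read.csv("~/MS/coding/mini.csv", header=FALSE)
mini_clean <- mini[-1,]  # drop the header row that was read as data
df_mini <- melt(mini_clean, id.vars=c("V1","V2","V3"))
sqldf('select * from df_mini order by "V1"')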
# Sample data from the question, read into d1
d1 <- read.csv(text='"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,')
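We can use gather from tidyr to reshape the data into long format, then extract to pull the rank number out of the column name: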
library(tidyr)
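# Match the full "category" prefix so two-digit suffixes are kept whole;
# the greedy '.*(\\d+)' would turn "category10" into "0".
d2 <- gather(d1, rank, category, -(1:3)) %>%
  extract(rank, into='rank', regex='category(\\d+)', convert=TRUE)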
head(d2)
# id gender age rank category
#1 1 Male 22 1 movies
#2 2 Male 28 1 travel
#3 3 Female 27 1 rent
#4 4 Female 22 1 rent
#5 5 Female 22 1 rent
#6 1 Male 22 2 music
Thanks, rank works as expected. But I need to fill in all the missing categories for each user. How can I do that? Please explain the logic. Thx.
For each user, I need to fill in 10 categories. For example, the user with id 1 shared preferences only for the categories "movies", "music", "travel", "cloths" and "grocery". But the other possible categories include "books", "rent", "fuel", "online-shopping" and "utility".
Thanks. I used the "complete" function to fill in the missing categories. But what is the best way to list all categories for each user? Please refer to my expected output. The rank should be NA, not the category.
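One possible way to list every category for every user is sketched below, under a few assumptions that are not in the thread. The data is read with na.strings = "" so blank cells become NA. The "utiliy" typo is corrected to "utility" so the categories match the expected output. The names csv_text, long and filled are illustrative only. The idea is to drop the empty cells first, so that each remaining row is a real (user, category, rank) triple. Then complete() with nesting() keeps only the id/gender/age combinations that actually occur, crosses them with every distinct category, and leaves rank as NA wherever a user did not list that category.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; "utiliy" is corrected to "utility" here
# (an assumption, made so the category names match the expected output).
csv_text <- '"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utility",,,,,,,'

# Treat blank cells as NA so they can be dropped while reshaping.
d1 <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Long format: one row per category a user actually listed, with its rank.
long <- d1 %>%
  gather(rank, category, -c(id, gender, age), na.rm = TRUE) %>%
  tidyr::extract(rank, into = "rank", regex = "category(\\d+)", convert = TRUE)

# Cross each existing user with every distinct category; rank is NA
# for the categories that the user did not list.
filled <- long %>%
  complete(nesting(id, gender, age), category) %>%
  arrange(id, rank)

print(head(filled, 10))
```

Passing category to complete() (rather than rank) is what keeps the category names intact and puts the NA values in the rank column.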