Need help cleaning data using R - Fatal编程技术网

I need some help cleaning data using R. My CSV file looks like this:

"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,
I need to reformat it as follows:

id gender age category            rank
1 Male    22  movies               1
1 Male    22  music                2
1 Male    22  travel               3
1 Male    22  cloths               4
1 Male    22  grocery              5
1 Male    22  books                NA
1 Male    22  rent                 NA
1 Male    22  fuel                 NA
1 Male    22  utility              NA
1 Male    22  online-shopping      NA
...................................
5 Female    22  movies             NA
5 Female    22  music              NA
5 Female    22  travel             NA
5 Female    22  cloths             NA
5 Female    22  grocery            NA
5 Female    22  books              NA
5 Female    22  rent               1
5 Female    22  fuel               NA
5 Female    22  utility            NA
5 Female    22  online-shopping    2
My effort so far is as follows:

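library(reshape2)  # provides melt()
library(sqldf)     # provides sqldf()
mini <- read.csv("~/MS/coding/mini.csv", header=FALSE)
mini_clean <- mini[-1,]  # drop the header row, which was read in as data
df_mini <- melt(mini_clean, id.vars=c("V1","V2","V3"))
sqldf('select * from df_mini order by "V1"')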
text1='"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,'
d1 <- ...
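The definition of d1 is cut off in the text above. A minimal sketch of how it could be built, assuming the inline CSV is parsed with base R's read.csv(text = ...); the stringsAsFactors and na.strings arguments are assumptions, not taken from the original:

```r
# Inline copy of the CSV from the question
text1 <- '"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,'

# Assumption: empty fields are read as NA and strings stay character
d1 <- read.csv(text = text1, stringsAsFactors = FALSE, na.strings = "")
str(d1)
```

Reading with header = TRUE (the read.csv default) keeps the real column names, so there is no need to drop the first row afterwards.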

We can use tidyr:

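library(tidyr)
# Reshape the category columns to long format, then keep only the digits of the column name as the rank
d2 <- gather(d1,  rank, category, -(1:3))  %>% 
                extract(rank, into='rank', 'category(\\d+)')  # capture the whole suffix, so category10 gives 10
head(d2)
#  id gender age rank category
#1  1   Male  22    1   movies
#2  2   Male  28    1   travel
#3  3 Female  27    1     rent
#4  4 Female  22    1     rent
#5  5 Female  22    1     rent
#6  1   Male  22    2    music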
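In current tidyr releases gather() is superseded by pivot_longer(). A rough equivalent of the reshape step, assuming tidyr 1.1.0 or later; d1_small is a hypothetical miniature of d1 that includes a two-digit column suffix:

```r
library(tidyr)

# Hypothetical miniature of d1: two users and three category columns
d1_small <- data.frame(
  id = c(1, 2),
  gender = c("Male", "Female"),
  age = c(22, 27),
  category1 = c("movies", "rent"),
  category2 = c("music", NA_character_),
  category10 = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

d2_small <- pivot_longer(
  d1_small,
  cols = starts_with("category"),
  names_to = "rank",
  names_prefix = "category",                  # strip the literal prefix, keep the digits
  names_transform = list(rank = as.integer),  # store the rank as an integer
  values_to = "category"
)
d2_small
```

Stripping the prefix with names_prefix leaves a two-digit suffix such as the one in category10 intact. Note that pivot_longer() orders the rows by user first, whereas gather() orders them by column.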

Thanks, rank works as expected. But I need to fill in all the missing categories for each user. How can I do that?

How do you want to fill in these categories? Please explain the logic.

Thx. For each user, I need to fill in 10 categories. For example, the user with id 1 has shared preferences only for the categories "movies", "music", "travel", "cloths", and "grocery". But the other possible categories are "books", "rent", "fuel", "online-shopping", and "utility".

Thanks. I used the "complete" function to fill in the missing categories. But what is the best way to list all categories for each user? Please refer to my expected output. The rank should be NA, not the category.
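One possible way to get every category listed for every user, sketched with tidyr's complete() and nesting(). The data frame d2_small is a hypothetical stand-in for the long-format result, and the sketch assumes rows with an empty category have already been dropped:

```r
library(tidyr)

# Hypothetical long-format data: one row per user and ranked category
d2_small <- data.frame(
  id = c(1, 1, 2),
  gender = c("Male", "Male", "Female"),
  age = c(22, 22, 27),
  rank = c(1L, 2L, 1L),
  category = c("movies", "music", "rent"),
  stringsAsFactors = FALSE
)

# complete() adds every user x category combination that is missing.
# nesting() restricts users to the id/gender/age triples that occur in the data.
# Rows created this way get rank = NA, because rank is not part of the expansion.
d3_small <- complete(d2_small, nesting(id, gender, age), category)
d3_small
```

Each user then has one row per category seen anywhere in the data, with NA in rank for the categories that user did not list.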