Using R to extract data by a row identifier and place it into new columns
I have some data acquired with ArcGIS, and I want to build a database keyed by watershed identifier, e.g. HUC_8 = 14040106. The data contain the watershed identifier (HUC_8), the watershed area, the soil type, and the soil area. Each watershed identifier is listed once for every soil type it contains. I want to create a watershed-based database in which each identifier appears only once in its column and the soil area is pulled out by type into separate columns. I've attached a subset of the data; hopefully it is clear. I'm somewhat new to R, but I feel this could be done with a for loop. Knowing how to do this would be very helpful, since I use GIS a lot but would like to do more of my analysis in R.
HUC_8 WatershedArea Soil SoilArea A_Area B_Area C_Area D_Area Null_Area
14040106 461104.4883 B 96590.33424
14040106 461104.4883 C 86282.93487
14040106 461104.4883 D 24945.9992
14050007 921494.3621 Null 2.861388
14050007 921494.3621 A 87214.28385
14050007 921494.3621 B 131417.8659
14050007 921494.3621 C 268324.5125
14050007 921494.3621 D 314131.5806
14060001 627348.8316 Null 8119.375083
14060001 627348.8316 A 5315.511117
14060001 627348.8316 B 286915.9001
14060001 627348.8316 C 114357.5251
14060001 627348.8316 D 163671.7545
You could try
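# for each soil type, keep SoilArea on the rows of that type and NA elsewhere
lst <- Map(function(x, y) ifelse(df$Soil == x, y, NA),
           sort(unique(df$Soil)), list(df$SoilArea))
# name the new columns A_Area, B_Area, ...
names(lst) <- paste(names(lst), "Area", sep = "_")
df[names(lst)] <- lst
head(df, 3)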
# HUC_8 WatershedArea Soil SoilArea A_Area B_Area C_Area D_Area
#1 14040106 461104.5 B 96590.33 NA 96590.33 NA NA
#2 14040106 461104.5 C 86282.93 NA NA 86282.93 NA
#3 14040106 461104.5 D 24946.00 NA NA NA 24946
# Null_Area
#1 NA
#2 NA
#3 NA
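The columns above still hold one row per soil record, whereas the question asks for each HUC_8 to appear only once. A minimal base R sketch of one way to collapse them, assuming each HUC_8/Soil pair occurs at most once (as in the sample): `tapply()` builds a watershed-by-soil matrix, which is then bound to the unique watershed rows. The names `areas`, `sheds`, and `wide` are illustrative only.

```r
# Sample data copied from the question (long format: one row per watershed/soil pair)
df <- data.frame(
  HUC_8 = rep(c(14040106L, 14050007L, 14060001L), times = c(3, 5, 5)),
  WatershedArea = rep(c(461104.4883, 921494.3621, 627348.8316), times = c(3, 5, 5)),
  Soil = c("B", "C", "D", "Null", "A", "B", "C", "D", "Null", "A", "B", "C", "D"),
  SoilArea = c(96590.33424, 86282.93487, 24945.9992, 2.861388, 87214.28385,
               131417.8659, 268324.5125, 314131.5806, 8119.375083, 5315.511117,
               286915.9001, 114357.5251, 163671.7545),
  stringsAsFactors = FALSE
)

# Matrix with one row per HUC_8 and one column per soil type;
# watershed/soil combinations that do not occur are left as NA
areas <- tapply(df$SoilArea, list(df$HUC_8, df$Soil), sum)
colnames(areas) <- paste(colnames(areas), "Area", sep = "_")

# Watershed-level columns, one row per HUC_8 (order of first appearance)
sheds <- unique(df[c("HUC_8", "WatershedArea")])

# Align the matrix rows to the watershed rows by HUC_8, then bind
wide <- cbind(sheds, areas[as.character(sheds$HUC_8), , drop = FALSE])
rownames(wide) <- NULL
wide
```

If a HUC_8/Soil pair were repeated, `sum` would add the areas together, which may or may not be what is wanted for a given data set.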
Data (the df used above is defined by the structure(...) call further down)
Essentially, it sounds like you want to reshape your data from long format to wide format. The reshape2 library can come in handy here.
#sample data
dd<-read.table(text="HUC_8 WatershedArea Soil SoilArea
14040106 461104.4883 B 96590.33424
14040106 461104.4883 C 86282.93487
14040106 461104.4883 D 24945.9992
14050007 921494.3621 Null 2.861388
14050007 921494.3621 A 87214.28385
14050007 921494.3621 B 131417.8659
14050007 921494.3621 C 268324.5125
14050007 921494.3621 D 314131.5806
14060001 627348.8316 Null 8119.375083
14060001 627348.8316 A 5315.511117
14060001 627348.8316 B 286915.9001
14060001 627348.8316 C 114357.5251
14060001 627348.8316 D 163671.7545", header = TRUE)
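Thank you, Mr. Flick. I tried your approach but got an error that reads: Using SoilArea as value column: use value.var to override. Not sure what that means.

That's not an error. It's a warning. If you like, you can set the value.var= parameter explicitly in the function call. When learning new functions, I encourage you to read the ?dcast help page.

I tried again on a larger data set but got the error: Aggregation function missing: defaulting to length. Here is my code: > Huc8Soil > wide2 #change default column names > soils > names(wide2)[match(soils, names(wide2))] > wide2 HUC_8 WatershedArea A_Area B_Area C_Area D_Area Null_Area 1 14010005 31458.13 0 1 0 0 2 14040106 461104.49 0 1 1 1 1 1 1 0 3 14040107 116521.39 0 0 0 1 0 4 14050007 921494.36 1 1 2 1 1 1 1 1 1 5 14060001627348.83 11 16 14060002 412849.74 0 1 1 17 14060003 1693374.17 1 1 1 1 1 1 1…… Thanks for your help. I'm sure it's something I overlooked in my CSV file.

Thank you akrun, I appreciate it. That is helpful, but I still need only one row for each HUC_8 value.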
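The "Aggregation function missing: defaulting to length" message in the comment above appears when some combination of the formula variables occurs more than once, so dcast() counts rows instead of returning areas. Here is a hedged sketch of how one might locate such repeats and sum them. The data frame `dd2` is made up for illustration and is not from the original post. The dcast() call is guarded so the base R part runs even without reshape2 installed.

```r
# Hypothetical data with a repeated HUC_8/Soil pair (NOT from the original post)
dd2 <- data.frame(
  HUC_8 = c(14040106L, 14040106L, 14040106L),
  WatershedArea = 461104.4883,
  Soil = c("B", "B", "C"),
  SoilArea = c(100, 50, 25),
  stringsAsFactors = FALSE
)

# Count rows per watershed/soil pair; counts above 1 are the repeats
n_per_cell <- aggregate(SoilArea ~ HUC_8 + Soil, data = dd2, FUN = length)
dups <- n_per_cell[n_per_cell$SoilArea > 1, ]
dups

# Option 1 (base R): sum the repeated records so each pair occurs once
dd2_sum <- aggregate(SoilArea ~ HUC_8 + WatershedArea + Soil, data = dd2, FUN = sum)

# Option 2: let dcast() do the summing while reshaping
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide2 <- reshape2::dcast(dd2, HUC_8 + WatershedArea ~ Soil,
                           value.var = "SoilArea", fun.aggregate = sum)
}
```

Whether summing is the right aggregation depends on why the repeats exist (for example, several polygons of the same soil type in one watershed), so it is worth inspecting `dups` before choosing.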
df <- structure(list(HUC_8 = c(14040106L, 14040106L, 14040106L, 14050007L,
14050007L, 14050007L, 14050007L, 14050007L, 14060001L, 14060001L,
14060001L, 14060001L, 14060001L), WatershedArea = c(461104.4883,
461104.4883, 461104.4883, 921494.3621, 921494.3621, 921494.3621,
921494.3621, 921494.3621, 627348.8316, 627348.8316, 627348.8316,
627348.8316, 627348.8316), Soil = c("B", "C", "D", "Null", "A",
"B", "C", "D", "Null", "A", "B", "C", "D"), SoilArea = c(96590.33424,
86282.93487, 24945.9992, 2.861388, 87214.28385, 131417.8659,
268324.5125, 314131.5806, 8119.375083, 5315.511117, 286915.9001,
114357.5251, 163671.7545)), .Names = c("HUC_8", "WatershedArea",
"Soil", "SoilArea"), class = "data.frame", row.names = c(NA,
-13L))
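library(reshape2)
# dd is the sample data read with read.table() above
# reshape long to wide: one row per watershed, one column per soil type
wide <- dcast(dd, HUC_8 + WatershedArea ~ Soil, value.var = "SoilArea")
#change default column names
# (unique() instead of levels(): Soil is a character column, not a factor, in R >= 4.0)
soils <- sort(unique(as.character(dd$Soil)))
names(wide)[match(soils, names(wide))] <- paste(soils, "Area", sep = "_")
wide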
HUC_8 WatershedArea A_Area B_Area C_Area D_Area Null_Area
1 14040106 461104.5 NA 96590.33 86282.93 24946.0 NA
2 14050007 921494.4 87214.284 131417.87 268324.51 314131.6 2.861388
3 14060001 627348.8 5315.511 286915.90 114357.53 163671.8 8119.375083