R: looping over subsets of a data frame based on two conditions
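I have the following problem: I need to go through each subset of a data frame (defined by the value of one variable) and create a new entry in another variable according to two conditions.
The data frame (dt3) has 4 variables: birth year (birth_year), surname (Name), household role (role) and household (hh). The whole set is split, or subset, by the hh variable, which groups all individuals living in the same household. For example, in the sample below the first 4 rows belong to household "1". In the role variable only the head of household is stated. The remaining roles are empty and have to be derived, which is what I am trying to do.
My first step is to assign the role "children". I want to do this by looping over the whole data set and over each subset (each value of hh). If a row contains a person with the same surname as the head of household and a birth year at least 15 years later than the head's, that person is inferred to be "children".
The original data frame is: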
birth_year Name role hh
1877 Snijders Head of household 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers Head of household 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe Head of household 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3
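So I need to go through the whole set and each hh subset, and check the following two conditions:
A. the person's Name == the head's Name, and
B. the person's birth year is 15 or more years later than the head's.
If both hold, the person is "children".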
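So far I have tried a few things. Since I have placed the head of household in the first row of each household, I tried:
(a) A nested loop, in which I go through the data set and then through each hh. For each hh I check the conditions by comparing the Name and birth year of each row with those of the first row of the hh, i.e. the head.
(b) The same approach, but using lists. I first split dt3 by the hh variable: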
dt3 <- split(dt3, f = dt3$hh)
Neither of the two approaches worked. The result I expect is:
birth_year Name role hh
1877 Snijders Head of household 1
1885 Marteen NA 1
1897 Snijders children 1
1892 Zelstra NA 1
1878 Kuipers Head of household 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe Head of household 3
1905 Flachs NA 3
1920 Lippe children 3
1922 Lippe children 3
Any hints are welcome.
Thanks in advance.
You can first extract all the "HeadOfHousehold" rows and merge them into your dt3, then compare the names and birth years:
dt3 <- read.table(header=T, text="birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = T)
tt <- with(dt3[!is.na(dt3$role) & dt3$role=="HeadOfHousehold",], data.frame(a=birth_year, b=Name, hh))
me <- merge(dt3, tt, all.x=T)
me$role[me$Name==me$b & me$birth_year > me$a+14] <- "children"
me[names(dt3)]
birth_year Name role hh
1 1877 Snijders HeadOfHousehold 1
2 1885 Marteen <NA> 1
3 1897 Snijders children 1
4 1892 Zelstra <NA> 1
5 1878 Kuipers HeadOfHousehold 2
6 1870 Marteen <NA> 2
7 1897 Wals <NA> 2
8 1900 Venstra <NA> 2
9 1900 Lippe HeadOfHousehold 3
10 1905 Flachs <NA> 3
11 1920 Lippe children 3
12 1922 Lippe children 3
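The following may be faster:
You can first order by hh and role != "HeadOfHousehold" so that the head of household is in the first row of each household (which you have already done, though perhaps in a different way), and then use ave per hh to test whether the name is equal and whether the birth year difference is more than 14: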
dt3 <- read.table(header=T, text="birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = T)
dt3 <- dt3[with(dt3, order(hh,role!="HeadOfHousehold")),]
dt3$role[with(dt3, as.logical(ave(Name, hh, FUN = function(x) x==x[1])) & ave(birth_year, hh, FUN = function(x) x>(x[1]+14)))] <- "children"
dt3
birth_year Name role hh
1 1877 Snijders HeadOfHousehold 1
2 1885 Marteen <NA> 1
3 1897 Snijders children 1
4 1892 Zelstra <NA> 1
5 1878 Kuipers HeadOfHousehold 2
6 1870 Marteen <NA> 2
7 1897 Wals <NA> 2
8 1900 Venstra <NA> 2
9 1900 Lippe HeadOfHousehold 3
10 1905 Flachs <NA> 3
11 1920 Lippe children 3
12 1922 Lippe children 3
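For the list-based attempt (b) in the question, which starts from split(dt3, f = dt3$hh), a minimal sketch of how it could be completed follows. The helper name label_children is hypothetical, and skipping households that do not have exactly one head is an assumption, not something the question specifies.

```r
# Sample data from the question (role is NA where it is unknown)
dt3 <- read.table(header = TRUE, text = "birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = TRUE)

# Hypothetical helper: label the children within a single household
label_children <- function(d) {
  is_head <- !is.na(d$role) & d$role == "HeadOfHousehold"
  # Assumption: leave households without exactly one head untouched
  if (sum(is_head) != 1) return(d)
  head_name <- d$Name[is_head]
  head_year <- d$birth_year[is_head]
  # Same surname as the head and born at least 15 years later
  is_child <- !is_head & d$Name == head_name & d$birth_year >= head_year + 15
  d$role[is_child] <- "children"
  d
}

# Split by household, apply the helper, then bind the pieces back together
res <- do.call(rbind, lapply(split(dt3, dt3$hh), label_children))
rownames(res) <- NULL
res
```

Unlike the ave version, this sketch does not rely on the head being the first row of each household, at the cost of one function call per household.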
You can also simply use a for loop, like:
dt3 <- read.table(header=T, text="birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = T)
dt3 <- dt3[with(dt3, order(hh,role!="HeadOfHousehold")),]
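for (i in seq_len(nrow(dt3))) {
  if (!is.na(dt3$role[i]) && dt3$role[i] == "HeadOfHousehold") {
    # Remember the household, name and birth year of the current head
    hh <- dt3$hh[i]
    Name <- dt3$Name[i]
    birth_year <- dt3$birth_year[i]
  } else {
    # Same household, same surname, born at least 15 years after the head
    if (hh == dt3$hh[i] && Name == dt3$Name[i] && dt3$birth_year[i] > birth_year + 14) {
      dt3$role[i] <- "children"
    }
  }
}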
dt3
birth_year Name role hh
1 1877 Snijders HeadOfHousehold 1
2 1885 Marteen <NA> 1
3 1897 Snijders children 1
4 1892 Zelstra <NA> 1
5 1878 Kuipers HeadOfHousehold 2
6 1870 Marteen <NA> 2
7 1897 Wals <NA> 2
8 1900 Venstra <NA> 2
9 1900 Lippe HeadOfHousehold 3
10 1905 Flachs <NA> 3
11 1920 Lippe children 3
12 1922 Lippe children 3
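Both the ave version and the for loop rely on every household having a head that sorts to its first row. The sample data satisfies this, but whether the full table does is not stated, so a quick check along these lines may be worth running first. The names n_heads and bad_hh are illustrative only.

```r
# Sample data from the question
dt3 <- read.table(header = TRUE, text = "birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = TRUE)

# Count head rows per household; %in% treats NA roles as FALSE
n_heads <- tapply(dt3$role %in% "HeadOfHousehold", dt3$hh, sum)

# Households that break the one-head assumption (none in the sample data)
bad_hh <- names(n_heads)[n_heads != 1]
bad_hh
```

If bad_hh is not empty, those households would need to be handled separately before applying either approach.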
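Thank you very much for your time, @user10488504. The only thing to note is that my table has many rows (dt3: 113000; tt: 12400), so the merge took a very long time.
Thank you very much, @user10488504, for your generosity and for the second approach. Also, what can actually be done is to merge by hh, since I am only interested in the two conditions within each household; that way the merge is instant.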
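As a possible alternative to merge for looking up each person's head by hh, base R's match can be used to index the head rows directly. This is a sketch under the assumption that each household has at most one head; it has not been timed on the 113000-row table, and the names heads, idx and is_child are illustrative.

```r
# Sample data from the question
dt3 <- read.table(header = TRUE, text = "birth_year Name role hh
1877 Snijders HeadOfHousehold 1
1885 Marteen NA 1
1897 Snijders NA 1
1892 Zelstra NA 1
1878 Kuipers HeadOfHousehold 2
1870 Marteen NA 2
1897 Wals NA 2
1900 Venstra NA 2
1900 Lippe HeadOfHousehold 3
1905 Flachs NA 3
1920 Lippe NA 3
1922 Lippe NA 3", as.is = TRUE)

# One row per head of household
heads <- dt3[!is.na(dt3$role) & dt3$role == "HeadOfHousehold", ]

# For every person, the position of their household's head in `heads`
# (NA if the household has no head; the first one if it has several)
idx <- match(dt3$hh, heads$hh)

# Same surname as the head and born more than 14 years later
is_child <- !is.na(idx) &
  dt3$Name == heads$Name[idx] &
  dt3$birth_year > heads$birth_year[idx] + 14

# which() drops any NA that missing names or years would introduce
dt3$role[which(is_child)] <- "children"
dt3
```

This keeps the original row order and column layout of dt3, so no reordering step is needed afterwards.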