Column operations in R: matching the correct names
I have a data.frame with several columns and thousands of rows. Its head is shown below:
|year |state_name|idealPoint| vote_no| vote_yes|
|:--------------|---------:|---------:|---------:|---------:|
|1971 | China | -25.0000| 31.0000| 45.4209|
|1972 | China | -26.2550| 38.2974| 45.4209|
|1973 | China | 28.2550| 35.2974| 45.4209|
|1994 | Czech | 27.2550| 34.2974| 45.4209|
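As you can see, not all of the countries (there are 196 of them) took part in the UN votes in the same years.
What I want to do is create a new column in my data.frame votes that holds the absolute difference between China's ideal point and the Czech ideal point for a given year. I know how to create new columns with dplyr, but how do I select the right countries out of a list of 196? I think the years that do not line up between the two countries can be dropped manually.
The final output should be either a new data.frame or a new column in votes, for example 2.2550 for China in 1994.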
Maybe this solves your problem?
library(tibble)
library(dplyr)
a <- tribble(
~year, ~ctry, ~vote,
1994, "China", 5,
1995, "China", 100,
1996, "China", 600,
1997, "China", 45,
1998, "China", 9,
1994, "Czech_Republic", 1,
1995, "Czech_Republic", 5,
1996, "Czech_Republic", 100,
1997, "Czech_Republic", 40,
1998, "Czech_Republic", 6,
)
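# Within each year, lead(vote) - vote puts the Czech-minus-China difference on
# the China row; lag() then shifts it onto the Czech_Republic row (China gets NA).
# This relies on exactly two rows per year, with China listed first.
a %>%
  group_by(year) %>%
  mutate(foo = abs(lag(lead(vote) - vote)))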
Output:
# A tibble: 10 x 4
# Groups: year [5]
year ctry vote foo
<dbl> <chr> <dbl> <dbl>
1 1994 China 5 NA
2 1995 China 100 NA
3 1996 China 600 NA
4 1997 China 45 NA
5 1998 China 9 NA
6 1994 Czech_Republic 1 4
7 1995 Czech_Republic 5 95
8 1996 Czech_Republic 100 500
9 1997 Czech_Republic 40 5
10 1998 Czech_Republic 6 3
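When the data holds more than two countries, the pair can be selected by name rather than by row position. The following is only a sketch under assumptions: the toy `votes` tibble, the third country, and the result name `pair_diff` are made up for illustration, and it assumes at most one row per country per year.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: a third country and an unmatched year (1996) are
# included to mimic the real table with 196 countries.
votes <- tribble(
  ~year, ~state_name,      ~idealPoint,
  1994,  "China",          -25.0,
  1994,  "Czech_Republic", -27.0,
  1994,  "France",          10.0,
  1995,  "China",          -26.255,
  1995,  "Czech_Republic", -28.255,
  1996,  "China",           28.255
)

pair_diff <- votes %>%
  # keep only the two countries of interest
  filter(state_name %in% c("China", "Czech_Republic")) %>%
  group_by(year) %>%
  # keep only years in which both countries have a row
  filter(n() == 2) %>%
  summarise(
    abs_diff = abs(idealPoint[state_name == "China"] -
                     idealPoint[state_name == "Czech_Republic"]),
    .groups = "drop"
  )

print(pair_diff)
```

Indexing by `state_name` inside `summarise()` means the result does not depend on the order of the rows, and 1996 is dropped because only China has a value for that year.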
year state_name.x idealpoints.x vote_no.x vote_yes.x state_name.y idealpoints.y vote_no.y vote_yes.y Abs_diff
1 1994 China -25.000 31.0000 45.4209 Czech_Republic -27.000 33.0000 45.4209 2
2 1995 China -26.255 38.2974 45.4209 Czech_Republic -28.255 36.2974 45.4209 2
3 1996 China 28.255 35.2974 45.4209 Czech_Republic 29.255 37.2974 45.4209 1
4 1997 China 27.255 34.2974 45.4209 Czech_Republic 22.255 38.2974 45.4209 5
You have to filter the data to suit your needs, for example by country. Code:
df1 <- data.frame(year = c(1994,1995,1996,1997,1994,1995,1996,1997),
state_name = c("China","China","China","China","Czech_Republic","Czech_Republic","Czech_Republic","Czech_Republic"),
idealpoints = c(-25.0000,-26.2550,28.2550,27.2550,-27.0000,-28.2550,29.2550,22.2550),
vote_no = c(31.0000,38.2974,35.2974,34.2974,33.0000,36.2974,37.2974,38.2974),
vote_yes = c(45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209))
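# Split the data into one data.frame per country
china_df <- df1[df1$state_name == "China",]
czech_df <- df1[df1$state_name == "Czech_Republic",]
# merge() does an inner join by default, so only years present for both countries are kept
china_czech_merge <- merge(china_df,czech_df,by = "year")
# .x columns come from china_df, .y columns from czech_df
china_czech_merge$Abs_diff <- abs(china_czech_merge$idealpoints.x - china_czech_merge$idealpoints.y)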
I think this will work for you.
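The same merge idea can be wrapped in a small helper so that any two of the 196 countries can be compared. This is a sketch, not part of the answer above: the function name `ideal_point_gap`, the `_a`/`_b` suffixes, and the toy data (including the third country and the extra 1998 row) are assumptions made for illustration.

```r
# Hypothetical helper: absolute ideal-point gap between two named countries.
ideal_point_gap <- function(df, a, b) {
  cols <- c("year", "idealpoints")
  merged <- merge(
    df[df$state_name == a, cols],
    df[df$state_name == b, cols],
    by = "year",              # inner join: years missing for either country drop out
    suffixes = c("_a", "_b")
  )
  merged$Abs_diff <- abs(merged$idealpoints_a - merged$idealpoints_b)
  merged
}

# Toy data: China has an extra year (1998) that Czech_Republic lacks.
toy <- data.frame(
  year = c(1994, 1995, 1998, 1994, 1995, 1994),
  state_name = c("China", "China", "China",
                 "Czech_Republic", "Czech_Republic", "France"),
  idealpoints = c(-25, -26.255, 9, -27, -28.255, 10)
)

gap <- ideal_point_gap(toy, "China", "Czech_Republic")
print(gap)
```

Because `merge()` keeps only matching keys by default, the 1998 row disappears without any manual clean-up.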
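Thanks.
Hi, and welcome to StackOverflow! For questions about R, please see the general guidance. 1. Are China and Czech the only countries in the state_name column? 2. If not, where are the other countries from the list of 196 you mention? 3. What do the variables vote_no and vote_yes have to do with all this?
My bad, the variables vote_no and vote_yes are not needed here. The column state_name lists every country going down, each with its own data [year, ideal point value, etc.].
I assumed it could be solved somehow; I am just not familiar enough with R to know how to select the data from the column. I went ahead and built the new data.table by hand in an Excel sheet, which definitely took too long.
I then tried tinkering with it, and it works out of the box: it automatically keeps only the years for which both countries have ideal point estimates. Amazing. All I did was wrap the abs expression in *-1 to make the values negative, but that is a cosmetic thing for my work. Thanks a lot.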