R: how to group_by, then summarise the rows that have NA in all columns - R, Dplyr - Fatal编程技术网

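I'm not sure whether this is possible. I want to use summarise to count all rows that have NA in every column except the group_by column. I can write out all 5 conditions where I have NO_OL_Percent =, but then I have to join each column with &. If you can do it in SQL, I'd think you could do it with dplyr or purrr, but nobody on the internet seems to have tried it.

The data must be downloaded.

The code is below. It works, but is there really no way to use an all function for the last line of code? I need to be able to group first, so I can't just filter on all columns in dplyr.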

library(dplyr)

farmers_market = read.csv("Export.csv", stringsAsFactors = FALSE, na.strings = c("NA", "NaN", ""))

farmers_market %>% 
        select(c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State")) %>%
        group_by(State) %>%
        summarise(Num_Markets = n(),
                  FB_Percent = 100 - 100*sum(is.na(Facebook))/n(), 
                  TW_Percent = 100 - 100*sum(is.na(Twitter))/n(),
                  OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter))/n(),
                  # every media column has to be listed and joined with &
                  NO_OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter) & is.na(Website) & is.na(Youtube) & is.na(OtherMedia))/n()
                  )
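Since the real data has to be downloaded, here is a small hypothetical stand-in for Export.csv so the question's summarise() can be run and checked by hand. The State values A and B and every cell value are made up for illustration; they are not taken from the dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for Export.csv; all values are invented.
farmers_market <- data.frame(
  State      = c("A", "A", "A", "B", "B"),
  Website    = c("w1", NA, NA, NA, "w5"),
  Facebook   = c("f1", NA, NA, NA, NA),
  Twitter    = c(NA, "t2", NA, NA, NA),
  Youtube    = NA_character_,  # recycled to all-NA
  OtherMedia = NA_character_,  # recycled to all-NA
  stringsAsFactors = FALSE
)

res <- farmers_market %>%
  group_by(State) %>%
  summarise(Num_Markets = n(),
            FB_Percent  = 100 - 100 * sum(is.na(Facebook)) / n(),
            TW_Percent  = 100 - 100 * sum(is.na(Twitter)) / n(),
            OL_Percent  = 100 - 100 * sum(is.na(Facebook) & is.na(Twitter)) / n(),
            # the explicit & chain from the question
            NO_OL_Percent = 100 - 100 * sum(is.na(Facebook) & is.na(Twitter) &
                                             is.na(Website) & is.na(Youtube) &
                                             is.na(OtherMedia)) / n())
print(res)
```

State A has one all-NA row out of three, so its NO_OL_Percent is about 66.7; state B has one out of two, so it is 50.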

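I removed the select statement, because summarise keeps only the grouping column and the summaries anyway. I created a cols vector holding the columns in which we want to count NAs.

We first check, for each row, whether all of the cols columns are NA, and store the TRUE/FALSE result in a new column all_NA. Then we group_by State and compute the other columns as before, but for NO_OL_Percent we sum all_NA, which gives the number of all-NA rows in each group, and divide it by the total number of rows in the group.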

library(dplyr)

cols <- c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia")

farmers_market %>% 
   mutate(all_NA = rowSums(is.na(.[cols])) == length(cols)) %>%
   group_by(State) %>%
   summarise(Num_Markets = n(),
             FB_Percent = 100 - 100*sum(is.na(Facebook))/n(), 
             TW_Percent = 100 - 100*sum(is.na(Twitter))/n(),
             OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter))/n(),
             NO_OL_Percent = 100 - 100*sum(all_NA)/n())


#    State                Num_Markets FB_Percent TW_Percent OL_Percent NO_OL_Percent
#    <chr>                      <int>      <dbl>      <dbl>      <dbl>         <dbl>
# 1 Alabama                      139       25.9       5.76       25.9          37.4
# 2 Alaska                        38       42.1      10.5        42.1          65.8
# 3 Arizona                       92       57.6      27.2        57.6          80.4
# 4 Arkansas                     111       52.3       4.50       52.3          61.3
# 5 California                   759       41.5      14.5        43.2          70.1
# 6 Colorado                     161       44.1       9.94       44.1          82.6
# 7 Connecticut                  157       33.8      12.1        33.8          53.5
# 8 Delaware                      36       61.1      11.1        61.1          83.3
# 9 District of Columbia          57       50.9      43.9        50.9          87.7
#10 Florida                      262       43.1       8.78       43.1          83.2
# … with 43 more rows

This gives the same output as your current approach, but without having to write out all the column names by hand.
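If your installed dplyr is 1.0.4 or newer (an assumption; if_all() does not exist in older versions), if_all() is close to the "all function" the question asks for: it is TRUE for a row only when the predicate holds in every selected column. A minimal sketch on hypothetical toy data (States A and B and all values are invented), reusing the cols vector from the answer:

```r
library(dplyr)  # assumes dplyr >= 1.0.4 for if_all()

# Hypothetical toy data; not the real farmers market export.
farmers_market <- data.frame(
  State      = c("A", "A", "A", "B", "B"),
  Website    = c("w1", NA, NA, NA, "w5"),
  Facebook   = c("f1", NA, NA, NA, NA),
  Twitter    = c(NA, "t2", NA, NA, NA),
  Youtube    = NA_character_,
  OtherMedia = NA_character_,
  stringsAsFactors = FALSE
)

cols <- c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia")

res <- farmers_market %>%
  # TRUE only when every column listed in cols is NA in that row
  mutate(all_NA = if_all(all_of(cols), is.na)) %>%
  group_by(State) %>%
  summarise(Num_Markets   = n(),
            # mean() of a logical vector is the share of TRUE values
            NO_OL_Percent = 100 - 100 * mean(all_NA))
print(res)
```

This replaces rowSums(is.na(.[cols])) == length(cols) with a single predicate call; both give the same all_NA column.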

A direct way to get the share of NA values in each column is:

farmers_market %>% 
    select("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State") %>%
    group_by(State) %>% 
    # funs() is deprecated; a named list of lambdas keeps the "_Percent" suffix
    summarise_all(list(Percent = ~ mean(is.na(.))))

# A tibble: 53 x 6
#  State   Website_Percent Facebook_Percent Twitter_Percent Youtube_Percent OtherMedia_Percent
#  <chr>             <dbl>            <dbl>           <dbl>           <dbl>              <dbl>
#1 Alabama           0.727            0.741           0.942           0.993              0.964
#2 Alaska            0.447            0.579           0.895           1                  0.974
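
To also keep the number of markets per state:

farmers_market %>% 
    select("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State") %>%
    group_by(State) %>% 
    mutate(num_markets = n()) %>% 
    group_by(State, num_markets) %>% 
    # funs() is deprecated; a named list of lambdas keeps the "_Percent" suffix
    summarise_all(list(Percent = ~ mean(is.na(.))))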

# A tibble: 53 x 7
# Groups:   State [2]
#  State   num_markets Website_Percent Facebook_Percent Twitter_Percent Youtube_Percent OtherMedia_Percent
#  <chr>         <int>           <dbl>            <dbl>           <dbl>           <dbl>              <dbl>
#1 Alabama         139           0.727            0.741           0.942           0.993              0.964
#2 Alaska           38           0.447            0.579           0.895           1                  0.974
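With dplyr 1.0.0 or newer (an assumption about the installed version), across() can compute the market count and the per-column NA shares in one summarise(), so the extra group_by(State, num_markets) step is not needed. A sketch on hypothetical toy data (States A and B and all values are invented); as in the answer above, the _Percent columns are shares of missing values between 0 and 1, not percentages:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Hypothetical toy data; not the real farmers market export.
farmers_market <- data.frame(
  State      = c("A", "A", "A", "B", "B"),
  Website    = c("w1", NA, NA, NA, "w5"),
  Facebook   = c("f1", NA, NA, NA, NA),
  Twitter    = c(NA, "t2", NA, NA, NA),
  Youtube    = NA_character_,
  OtherMedia = NA_character_,
  stringsAsFactors = FALSE
)

cols <- c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia")

res <- farmers_market %>%
  group_by(State) %>%
  summarise(num_markets = n(),
            # share of NA values per column, named e.g. Website_Percent;
            # mean(is.na(.x)) equals sum(is.na(.x)) / n()
            across(all_of(cols), ~ mean(is.na(.x)), .names = "{.col}_Percent"))
print(res)
```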