Loop over multiple Excel files to create separate data frames, run a group by, and save the result as a single df in R

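I'm new to R and have a question, if you can help.

I have several Excel files in a folder. They belong to different subsidiaries but share the same structure.
I want to loop over them, load each one into R as a data frame, run a group by, collect everything in a single data frame, and export that as a single file. Is that possible?

Going through a few answers here, this is what I have so far: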

# Load the data as different dataframes

library(tidyverse)
library(readxl)

f <- list.files(pattern="xlsx")

myfiles = lapply(f, read_excel)


for (i in 1:length(f)) assign(f[i], read_excel(f[i], sheet = "Deutsch", skip=7), data.frame(f[i]))
Then I created a group by to perform some calculations:

for (i in 1:length(list_df))
{
  list_df[i] %>% 
    group_by(ABC) %>% 
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),
              `Weight in KG` = sum(`Weight in KG`),
              `Number of Materials` = length(`Materials`),
              `Avg of deliveries` = mean(`Deliveries`))
}
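If I do this for each data frame individually, it works, but not inside this loop. Can you help me loop over all the data frames, apply the group by, and collect everything in one file? Is that possible?

Thank you very much for your attention.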

Edit: to include a sample of dummy data:

> dput(df1)

structure(list(Materials = c("11575358", "75378378", "21333333", 
"02469984", "05465478", "05645648"), Deliveries = c(8, 1, 12, 
5, 1, 1), ABC = c("C", "A", "C", "B", "C", "C"), `Revenue in EUR` = c(6179, 
1804802.46, 3768.04, 9e+05, 1597.5, 1544.55), `Weight in KG` = c(16.6, 
4.695625, 19, 9.14625, 2.74041666666667, 1.44208333333333)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

> dput(df2)

structure(list(Materials = c("48654798", "05465489", "04598496", 
"08789453", "01589494", "06459849", "54694985", "65498848"), 
    Deliveries = c(24, 6, 32, 3, 11, 30, 45, 2), ABC = c("C", 
    "B", "C", "B", "C", "A", "A", "C"), `Revenue in EUR` = c(5509, 
    506978, 3978.04, 7e+05, 1597.5, 1200258, 2406975, 4059), 
    `Weight in KG` = c(29.6, 19, 24, 9.14625, 2.74041666666667, 
    50, 60, 10)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

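The original Excel files are in xlsx format, with 5,000 to 15,000 rows, about 20 columns, and 7 sheets. There are 22 Excel files to loop over.

OK, there may be some mistakes since I don't have your files, but try the following: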

# First, write the sample data out as xlsx files. I use the xlsx package because I prefer it,
# but you should already have your files.
xlsx::write.xlsx2(df1, "df1.xlsx")
xlsx::write.xlsx2(df2, "df2.xlsx")

library(tidyverse)
library(readxl)

# list all the xlsx files in the working directory
f <- list.files(pattern = "\\.xlsx$")
f
[1] "df1.xlsx" "df2.xlsx"

# an empty list
listed <- list()
# loop that populates the empty list with your files
for (i in f) {
  listed[[i]] <- read_excel(i, sheet = "Sheet1" # , skip = 7
                            )
  print(paste0("read the ", i, " file")) # report which file was just read
}

# each element of the list is named after its file
listed[["df1.xlsx"]]
# A tibble: 6 x 6
  ...1  Materials Deliveries ABC   `Revenue in EUR` `Weight in KG`
  <chr> <chr>          <dbl> <chr>            <dbl>          <dbl>
1 1     11575358           8 C                6179           16.6 
2 2     75378378           1 A             1804802.           4.70
3 3     21333333          12 C                3768.          19   
4 4     02469984           5 B              900000            9.15
5 5     05465478           1 C                1598.           2.74
6 6     05645648           1 C                1545.           1.44

# now lapply the summary to each element of the list, creating a new list
list_result <- lapply(listed, function(x) x %>%
                                          group_by(ABC) %>%
                                          summarise(
                          `Revenue in EUR` = sum(`Revenue in EUR`),
                          `Weight in KG` = sum(`Weight in KG`),
                          `Number of Materials` = length(`Materials`),
                          `Avg of deliveries` = mean(`Deliveries`)))

# put the results in a single data frame
do.call(rbind, list_result)
# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `Number of Materials` `Avg of deliveries`
* <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1  
2 B              900000            9.15                     1                 5  
3 C               13089.          39.8                      4                 5.5
4 A             3607233          110                        2                37.5
5 B             1206978           28.1                      2                 4.5
6 C               15144.          66.3                      4                17.2
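
As for why the loop in the question produces nothing useful: list_df[i] (single brackets) returns a one-element list rather than the data frame itself, and the summarised result is never stored anywhere. Below is a minimal sketch of a corrected loop. It assumes list_df is a plain list of data frames; the two small tibbles are trimmed-down stand-ins for the sample data, not the real files.

```r
library(dplyr)

# Trimmed-down stand-ins for the sample data (assumption: same column names)
df1 <- tibble(
  Materials = c("11575358", "75378378", "21333333"),
  Deliveries = c(8, 1, 12),
  ABC = c("C", "A", "C"),
  `Revenue in EUR` = c(6179, 1804802.46, 3768.04),
  `Weight in KG` = c(16.6, 4.695625, 19)
)
df2 <- tibble(
  Materials = c("48654798", "05465489"),
  Deliveries = c(24, 6),
  ABC = c("C", "B"),
  `Revenue in EUR` = c(5509, 506978),
  `Weight in KG` = c(29.6, 19)
)
list_df <- list(df1 = df1, df2 = df2)

is.data.frame(list_df[1])    # single brackets: still a list
is.data.frame(list_df[[1]])  # double brackets: the data frame itself

# Pre-allocate a result list and store each summary in it
results <- vector("list", length(list_df))
for (i in seq_along(list_df)) {
  results[[i]] <- list_df[[i]] %>%
    group_by(ABC) %>%
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),
              `Weight in KG` = sum(`Weight in KG`),
              `Number of Materials` = length(Materials),
              `Avg of deliveries` = mean(Deliveries))
}

# Stack the per-file summaries into one data frame
combined <- bind_rows(results)
```

The lapply version above does the same thing with less bookkeeping; the explicit loop is only shown because it is closest to the original attempt.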
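I like writing functions, so this is how I would do it (it also gives you a more stable setup to modify or debug later if needed):

# Main function
# NOTE: the function signatures and the helper bodies marked "..." did not survive
# in the source; the argument names below are inferred from how they are used.
main_function <- function(path, name, import) {
  main_function.create_path(path) -> path
  main_function.create_output() -> output
  for (file in list.files(path)) {
    # skip anything that is not an xlsx workbook
    if (!str_detect(file, 'xlsx')) {
      next
    }
    read_excel(file.path(path, file), sheet = "Deutsch", skip = 7) -> data
    main_function.calculate_values(data) -> data.values
    main_function.append_values(file, data, data.values, output) -> output
  }
  # main_function.export is called here but its definition is not shown
  main_function.export(path, output, name)
  if (import) {
    assign('values', output, envir = .GlobalEnv)
  }
}

# Helper functions
main_function.append_values <- function(file, data, data.values, output) {
  # ... (body lost; the surviving fragment refers to output[nrow(output), col])
  return(output)
}
main_function.calculate_values <- function(data) {
  data %>% group_by(ABC) %>%
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`, na.rm = TRUE)
              # , ... remaining summary columns
              ) -> data
  return(data)
}
main_function.create_path <- function(path) {
  # ...
  return(path)
}
main_function.create_output <- function() {
  # ...
  return(output)
}

This creates main_function. When called, it goes through all the files listed in the given path, reads and processes each one, and collects the results in output, which is then saved in the same path under the given name.
If import is set to TRUE, it also saves the output as values in the global environment.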
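You can also use purrr::map_dfr for this:

# use .x (not a bare .) on the left of %>% so the pipeline is applied
# to each data frame rather than turned into a function
map_dfr(list_df, ~ .x %>%
    group_by(ABC) %>%
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),
              `Weight in KG` = sum(`Weight in KG`),
              `Number of Materials` = length(`Materials`),
              `Avg of deliveries` = mean(`Deliveries`)))

It row-binds (rbind) the results in the same step.

Even if the files are stored in myfiles, the following syntax works: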


library(janitor)
map_dfr(myfiles, ~(.[-c(1:5),] %>% row_to_names(1) %>% 
                     group_by(ABC) %>% 
                     summarise(`Revenue in EUR` = sum(as.numeric(`Revenue in EUR`)),
                               `Weight in KG` = sum(as.numeric(`Weight in KG`)),
                               `Number of Materials` = length(`Materials`),
                               `Avg of deliveries` = mean(as.numeric(`Deliveries`)))
                   %>% ungroup()))

Result with the given files:

# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `Number of Materials` `Avg of deliveries`
  <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1  
2 B              900000            9.15                     1                 5  
3 C               13089.          39.8                      4                 5.5
4 A             3607233          110                        2                37.5
5 B             1206978           28.1                      2                 4.5
6 C               15144.          66.3                      4                17.2
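
The combined table above no longer shows which file each row came from. One way to keep that information is sketched below. It assumes purrr 1.0.0 or later, where map() followed by list_rbind() is the suggested replacement for the superseded map_dfr(). The list names and the tiny tibbles are made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical named list: one small data frame per subsidiary file
list_df <- list(
  sub1.xlsx = tibble(ABC = c("A", "C", "C"), Deliveries = c(1, 8, 12)),
  sub2.xlsx = tibble(ABC = c("B", "C"), Deliveries = c(6, 24))
)

result <- list_df %>%
  map(~ .x %>%
        group_by(ABC) %>%
        summarise(`Avg of deliveries` = mean(Deliveries), .groups = "drop")) %>%
  # names_to turns the list names into a column identifying the source file
  list_rbind(names_to = "file")
```

The result can then be exported with, for example, write.csv(result, "summary.csv", row.names = FALSE).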