Loop over multiple Excel files to create separate data frames, group by, and save as a single data frame in R

r excel dataframe

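I'm new to R and I have a question, if you could help.

I have several Excel files in one folder. They belong to different subsidiaries but share the same structure.
I want to loop over them, load them into R as data frames, run a group by, collect everything in a single data frame, and export it as a single file. Is this possible?

Looking at a few answers here, I got this far: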

# Load the data as different dataframes

library(tidyverse)
library(readxl)

f <- list.files(pattern="xlsx")

myfiles = lapply(f, read_excel)


for (i in 1:length(f)) assign(f[i], read_excel(f[i], sheet = "Deutsch", skip=7), data.frame(f[i]))

Then I wrote a group by to do some calculations:

for (i in 1:length(list_df))
{
  list_df[i] %>% 
    group_by(ABC) %>% 
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),
              `Weight in KG` = sum(`Weight in KG`),
              `Number of Materials` = length(`Materials`),
              `Avg of deliveries` = mean(`Deliveries`))
}

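If I do this for each data frame individually, it works. Inside this loop it does not. Could you help me loop over all the data frames, run the group by, and collect the results in one file? Is that possible?

Thank you very much for your attention.

Edit: to include a dummy data sample: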

> dput(df1)

structure(list(Materials = c("11575358", "75378378", "21333333", 
"02469984", "05465478", "05645648"), Deliveries = c(8, 1, 12, 
5, 1, 1), ABC = c("C", "A", "C", "B", "C", "C"), `Revenue in EUR` = c(6179, 
1804802.46, 3768.04, 9e+05, 1597.5, 1544.55), `Weight in KG` = c(16.6, 
4.695625, 19, 9.14625, 2.74041666666667, 1.44208333333333)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

> dput(df2)

structure(list(Materials = c("48654798", "05465489", "04598496", 
"08789453", "01589494", "06459849", "54694985", "65498848"), 
    Deliveries = c(24, 6, 32, 3, 11, 30, 45, 2), ABC = c("C", 
    "B", "C", "B", "C", "A", "A", "C"), `Revenue in EUR` = c(5509, 
    506978, 3978.04, 7e+05, 1597.5, 1200258, 2406975, 4059), 
    `Weight in KG` = c(29.6, 19, 24, 9.14625, 2.74041666666667, 
    50, 60, 10)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

The original Excel files are in xlsx format, with 5,000 to 15,000 rows, about 20 columns, and 7 sheets. There are 22 Excel files to loop over.

OK, there may be some errors since I don't have your files, but try this:

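# first of all, write the data to xlsx files. I use the xlsx package because I prefer it,
# but you should already have the files
# (note: df1 is written to both files here, so both results below are identical)
xlsx::write.xlsx2(df1,"df1.xlsx")
xlsx::write.xlsx2(df1,"df2.xlsx")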

library(tidyverse)
library(readxl)

# here you get all the xlsx files
f <- list.files(pattern="xlsx")  
f
[1] "df1.xlsx" "df2.xlsx"

# an empty list
listed <- list()
# loop that populates the empty list with your files
for (i in f) { 
  listed[[i]] <- read_excel(i, sheet = "Sheet1" # , skip = 7  
                            )
  print(paste0("read the ", i, " file")) # here it says what it's doing
}

 listed
$df1.xlsx
# A tibble: 6 x 6
  ...1  Materials Deliveries ABC   `Revenue in EUR` `Weight in KG`
  <chr> <chr>          <dbl> <chr>            <dbl>          <dbl>
1 1     11575358           8 C                6179           16.6 
2 2     75378378           1 A             1804802.           4.70
3 3     21333333          12 C                3768.          19   
4 4     02469984           5 B              900000            9.15
5 5     05465478           1 C                1598.           2.74
6 6     05645648           1 C                1545.           1.44

$df2.xlsx
# A tibble: 6 x 6
  ...1  Materials Deliveries ABC   `Revenue in EUR` `Weight in KG`
  <chr> <chr>          <dbl> <chr>            <dbl>          <dbl>
1 1     11575358           8 C                6179           16.6 
2 2     75378378           1 A             1804802.           4.70
3 3     21333333          12 C                3768.          19   
4 4     02469984           5 B              900000            9.15
5 5     05465478           1 C                1598.           2.74
6 6     05645648           1 C                1545.           1.44

# now lapply the summary to each element of the list, creating a new list
list_result <- lapply(listed, function(x) x %>% 
                                          group_by(ABC) %>% 
                                          summarise(
                          `Revenue in EUR` = sum(`Revenue in EUR`),
                          `Weight in KG` = sum(`Weight in KG`),
                          `Number of Materials` = length(`Materials`),
                          `Avg of deliveries` = mean(`Deliveries`)))

# put the result in a data.frame  
do.call(rbind,list_result)
# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `Number of Materials` `Avg of deliveries`
* <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1  
2 B              900000            9.15                     1                 5  
3 C               13089.          39.8                      4                 5.5
4 A             1804802.           4.70                     1                 1  
5 B              900000            9.15                     1                 5  
6 C               13089.          39.8                      4                 5.5
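
The combined table above no longer shows which file each row came from, and the question also asks for a single exported file. The following is a minimal sketch of both steps, not part of the original answer. It assumes dplyr is installed; the helper name summarise_abc, the column name file, and the output file name are made up for illustration, and the toy data frames reuse a few rows from the question's sample. It writes a CSV with base R's write.csv to avoid extra packages; an Excel writer could be substituted.

```r
library(dplyr)

# Stand-ins for the per-file data frames (a few rows from the question's sample).
listed <- list(
  "df1.xlsx" = data.frame(
    Materials = c("11575358", "75378378", "21333333"),
    Deliveries = c(8, 1, 12),
    ABC = c("C", "A", "C"),
    `Revenue in EUR` = c(6179, 1804802.46, 3768.04),
    `Weight in KG` = c(16.6, 4.695625, 19),
    check.names = FALSE
  ),
  "df2.xlsx" = data.frame(
    Materials = c("48654798", "05465489", "06459849"),
    Deliveries = c(24, 6, 30),
    ABC = c("C", "B", "A"),
    `Revenue in EUR` = c(5509, 506978, 1200258),
    `Weight in KG` = c(29.6, 19, 50),
    check.names = FALSE
  )
)

# Hypothetical helper: the same per-class summary used in the answer.
summarise_abc <- function(df) {
  df %>%
    group_by(ABC) %>%
    summarise(
      `Revenue in EUR` = sum(`Revenue in EUR`),
      `Weight in KG` = sum(`Weight in KG`),
      `Number of Materials` = n(),
      `Avg of deliveries` = mean(Deliveries),
      .groups = "drop"
    )
}

# .id turns the list names (here the file names) into a column.
combined <- bind_rows(lapply(listed, summarise_abc), .id = "file")

# Export everything as one file.
out_file <- file.path(tempdir(), "summary_all_files.csv")
write.csv(combined, out_file, row.names = FALSE)
```

Because listed is a named list, bind_rows can label every row with its source, which is usually what you want when the files belong to different subsidiaries.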

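I like writing functions, so this is how I would do it (it also gives you a more stable setup to modify or debug when needed):

# main function
main_function <- function(path, name, import) {
  main_function.create_path(path) -> path
  main_function.create_output() -> output
  for (file in list.files(path)) {
    if (!str_detect(file, 'xlsx')) {
      next
    }
    read_excel(file, sheet = "Deutsch", skip = 7) -> data
    main_function.calculate_values(data) -> data.values
    main_function.append_values(file, data, data.values, output) -> output
  }
  main_function.export(path, output, name)
  if (import) {
    assign('values', output, envir = .GlobalEnv)
  }
}
# functions
main_function.append_values <- function(file, data, data.values, output) {
  # ... e.g. write the new values into output[nrow(output), col] ...
  return(output)
}
main_function.calculate_values <- function(data) {
  data %>% group_by(ABC) %>%
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`, na.rm = TRUE),
              ..) -> data
  return(data)
}
main_function.create_path <- function(path) {
  # ...
  return(path)
}
main_function.create_output <- function() {
  # ...
  return(output)
}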

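This creates main_function. When called, it goes through all the files listed in the given path, reads and processes each one, and collects the results in output, which is then saved in the same path under the given name.
If import is set to TRUE, it also assigns the output to values in the global environment.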
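
The skeleton above leaves several helpers unspecified, so here is a smaller, runnable sketch of the same function-based idea. It is an illustration under assumptions, not the answerer's code: the names calculate_values and summarise_folder, the reader argument, and the demo folder and files are all hypothetical. The reader is passed in so the sketch can be tried on CSV stand-ins; for the real workbooks one would presumably pass something like function(f) readxl::read_excel(f, sheet = "Deutsch", skip = 7).

```r
library(dplyr)

# Summarise one data frame by ABC class.
calculate_values <- function(data) {
  data %>%
    group_by(ABC) %>%
    summarise(
      `Revenue in EUR` = sum(`Revenue in EUR`, na.rm = TRUE),
      `Weight in KG` = sum(`Weight in KG`, na.rm = TRUE),
      .groups = "drop"
    )
}

# Read every matching file in `path`, summarise it, and stack the results.
# `reader` is any function that takes a file path and returns a data frame.
summarise_folder <- function(path, pattern, reader) {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  results <- lapply(files, function(f) {
    calculate_values(reader(f)) %>%
      mutate(file = basename(f), .before = 1)
  })
  bind_rows(results)
}

# Demo with two small CSV stand-ins written to a temporary folder.
demo_dir <- file.path(tempdir(), "abc_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(
  ABC = c("A", "C", "C"),
  `Revenue in EUR` = c(100, 10, NA),
  `Weight in KG` = c(1, 2, 3),
  check.names = FALSE
)
write.csv(demo, file.path(demo_dir, "sub1.csv"), row.names = FALSE)
write.csv(demo, file.path(demo_dir, "sub2.csv"), row.names = FALSE)

output <- summarise_folder(
  demo_dir,
  pattern = "\\.csv$",
  reader = function(f) read.csv(f, check.names = FALSE)
)
```

Keeping the per-file summary in its own function means it can be tested on one data frame before running it over all 22 files.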
You can also use purrr::map for this:

map_dfr(list_df, ~ .x %>% 
    group_by(ABC) %>% 
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),
              `Weight in KG` = sum(`Weight in KG`),
              `Number of Materials` = length(`Materials`),
              `Avg of deliveries` = mean(`Deliveries`)))

It will rbind the results at the same time.
Even if you have stored the files in myfiles, you can use the following syntax:

library(janitor)
map_dfr(myfiles, ~(.[-c(1:5),] %>% row_to_names(1) %>% 
                     group_by(ABC) %>% 
                     summarise(`Revenue in EUR` = sum(as.numeric(`Revenue in EUR`)),
                               `Weight in KG` = sum(as.numeric(`Weight in KG`)),
                               `Number of Materials` = length(`Materials`),
                               `Avg of deliveries` = mean(as.numeric(`Deliveries`)))
                   %>% ungroup()))


Result with the given files:
# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `Number of Materials` `Avg of deliveries`
  <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1  
2 B              900000            9.15                     1                 5  
3 C               13089.          39.8                      4                 5.5
4 A             3607233          110                        2                37.5
5 B             1206978           28.1                      2                 4.5
6 C               15144.          66.3                      4                17.2
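
For completeness, the for loop from the question can be made to work as well. Two things appear to stop it: list_df[i] with single brackets returns a one-element list rather than a data frame, and the summarised result is never stored. A minimal sketch of the corrected loop, assuming list_df is a list of data frames (the toy data below is invented and uses only two of the columns):

```r
library(dplyr)

# Invented stand-in for the list of data frames.
list_df <- list(
  data.frame(ABC = c("A", "B", "B"), Deliveries = c(1, 5, 7)),
  data.frame(ABC = c("A", "A", "C"), Deliveries = c(2, 4, 6))
)

# Single brackets keep the list wrapper; double brackets extract the element.
class(list_df[1])
class(list_df[[1]])

# Pre-allocate a result list and store each summary in it.
results <- vector("list", length(list_df))
for (i in seq_along(list_df)) {
  results[[i]] <- list_df[[i]] %>%
    group_by(ABC) %>%
    summarise(`Avg of deliveries` = mean(Deliveries), .groups = "drop")
}

# Stack the per-file summaries into one data frame.
combined <- bind_rows(results)
```

seq_along(list_df) is used instead of 1:length(list_df) so the loop does nothing when the list is empty.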
