R for loop to extract info from files and add it to a tibble?
I'm not great with the tidyverse, so forgive me if this is a simple question. I have a bunch of files containing data that I need to extract and add to distinct columns of a tibble I created. I want the rows to start with the file IDs, which I did create:
filelist <- list.files(pattern=".txt") # Gives me the filenames in current directory.
# The filenames are something like AA1230.report.txt for example
file_ID <- trimws(filelist, whitespace="\\..*") # Gives me the ID which is before the "report.txt"
metadata <- as_tibble(file_ID[1:181]) # create a tibble with the IDs in a column named `value`, one row per file (181 files)
You can also do the following:
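library(tidyverse)

filelist <- list.files(pattern = "\\.txt$")
nms <- c("Percentage", "Num_reads_root", "Num_reads_taxon", "Rank", "NCBI_ID", "Name")

set_names(filelist, filelist) %>%
  # read every file and tag each row with the name of the file it came from
  map_dfr(read_table, col_names = nms, .id = "file_ID") %>%
  # keep only the domain-level rows
  filter(Rank == "D") %>%
  select(file_ID, Name, Num_reads_root) %>%
  # one row per file, one column per domain
  pivot_wider(id_cols = file_ID, names_from = Name, values_from = Num_reads_root) %>%
  # drop everything from the first dot on: "AA1230.report.txt" -> "AA1230"
  mutate(file_ID = str_remove(file_ID, "\\..*$"))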
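I find it is sometimes nice to use a for loop, because it keeps the progress made so far in case you hit an error. You can then find the problem file and debug it, or wrap the read in try() and throw a warning() instead of stopping.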
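A minimal sketch of that pattern, using base R only. The temporary demo files, the `results` and `failed` names, and the deliberately truncated third file are assumptions made for illustration; the column names follow the six columns shown in the question. It uses `tryCatch()` rather than `try()` so the error message can be turned into a warning.

```r
# Hypothetical demo data: two well-formed reports and one truncated file.
demo_dir <- tempfile("reports_")
dir.create(demo_dir)
good <- c("75.9\t60533\t28\tD\t2\tBacteria",
          "0.48\t386\t0\tD\t2759\tEukaryota")
writeLines(good, file.path(demo_dir, "AA1230.report.txt"))
writeLines(good, file.path(demo_dir, "AB0566.report.txt"))
writeLines("75.9\t60533", file.path(demo_dir, "BC1148.report.txt"))  # too few fields

nms <- c("Percentage", "Num_reads_root", "Num_reads_taxon", "Rank", "NCBI_ID", "Name")
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

results <- list()        # tables read so far survive a later failure
failed <- character(0)   # names of files that could not be read

for (f in files) {
  tab <- tryCatch(
    read.table(f, sep = "\t", header = FALSE, col.names = nms,
               stringsAsFactors = FALSE),
    error = function(e) {
      # Report the problem as a warning and return NULL instead of stopping
      warning("Skipping ", basename(f), ": ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
  if (is.null(tab)) {
    failed <- c(failed, basename(f))
    next
  }
  id <- sub("\\..*$", "", basename(f))     # "AA1230.report.txt" -> "AA1230"
  results[[id]] <- tab[tab$Rank == "D", ]  # keep the domain-level rows
}
```

After the loop, `failed` lists the files to inspect by hand, while `results` still holds everything that was read successfully.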
I tried this but got the following error:
-- Column specification --------------------------------------------------------
cols(Percentage = col_character(), Num_reads_root = col_character(), Num_reads_taxon = col_character(), Rank = col_character(), NCBI_ID = col_character(), Name = col_character())
Error: Can't combine `Num_reads_taxon` and `Num_reads_taxon`. Run `rlang::last_error()` to see where the error occurred.
@CuriousDude, so it seems you have different column types. Will edit it in a minute.
@CuriousDude change read_table to read.table and col_names to col.names.
I made the change but got the following error: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 7 elements
@CuriousDude can you try running map_dfr(filelist, read.table, col.names = nms, sep = '\t', header = FALSE, .id = 'file_ID')? Does that give an error?
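One way to rule out type mismatches between files is to state the column types up front rather than letting each file be guessed separately. The sketch below is only an illustration under assumptions: it uses base R's `read.table()` with `colClasses`, the demo files and their values are made up, and the types are taken from the tibble header shown in the question (`<dbl>`, `<int>`, `<int>`, `<fct>`, `<int>`, `<fct>`, with the factors read as character).

```r
# Hypothetical demo data: tab-separated, no header row.
demo_dir <- tempfile("reports_")
dir.create(demo_dir)
writeLines(c("75.9\t60533\t28\tD\t2\tBacteria",
             "0.48\t386\t0\tD\t2759\tEukaryota"),
           file.path(demo_dir, "AA1230.report.txt"))
# Percentage holds only whole numbers here, so type guessing would pick integer
writeLines(c("80\t500\t3\tD\t2\tBacteria",
             "20\t125\t0\tD\t10239\tViruses"),
           file.path(demo_dir, "AB0566.report.txt"))

nms <- c("Percentage", "Num_reads_root", "Num_reads_taxon", "Rank", "NCBI_ID", "Name")
col_classes <- c("numeric", "integer", "integer", "character", "integer", "character")

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
names(files) <- sub("\\..*$", "", basename(files))  # name each path by its file ID

# Every file is read with the same, explicit column types
tables <- lapply(files, read.table, sep = "\t", header = FALSE,
                 col.names = nms, colClasses = col_classes)

# One column of class names per file; the columns should be identical
col_types <- sapply(tables, function(x) sapply(x, class))

# Add the file ID to each table and stack them
combined <- do.call(rbind, Map(function(tab, id) cbind(file_ID = id, tab),
                               tables, names(tables)))
rownames(combined) <- NULL
```

If a file does not fit the declared types, `read.table()` stops with an error that points at that file, which tends to be easier to trace than a failure when the tables are combined.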
Example rows from one file, printed as a tibble:
  Percentage Num_Reads_Root Num_Reads_Taxon Rank  NCBI_ID Name
       <dbl>          <int>           <int> <fct>   <int> <fct>
1      75.9           60533              28 D           2 Bacteria
2       0.48            386               0 D        2759 Eukaryota
3       0.01              4               0 D        2157 Archaea
4       0.02             19               0 D       10239 Viruses
> metadata
   value  Bacteria_Counts Eukaryota_Counts Viruses_Counts Archaea_Counts
   <chr>            <int>            <int>          <int>          <int>
 1 AA1230           60533              386             19              4
 2 AB0566
 3 AA1231
 4 AB0567
 5 BC1148
 6 AW0001
 7 AW0002
 8 BB1121
 9 BC0001
10 BC0002
# … with 171 more rows
for (file in filelist) {
  # >> get_domains <<  (placeholder: extract the Rank "D" counts from `file` and add them to `metadata`)
}
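# For-loop version: read each file, keep the domain rows, and collect the results
library(tidyverse)

filelist <- list.files(pattern = "\\.txt$")  # list files
nms <- c("Percentage", "Num_reads_root", "Num_reads_taxon", "Rank", "NCBI_ID", "Name")

tmp_list <- list()
for (i in seq_along(filelist)) {
  # It looks like your files are all .tsv's without a header row, so supply the column names
  my_table <- read_tsv(filelist[i], col_names = nms) %>%
    filter(Rank == "D") %>%
    # the ID is the part of the file name before the first "."
    mutate(file_ID = trimws(filelist[i], whitespace = "\\..*")) %>%
    select(file_ID, everything())
  tmp_list[[i]] <- my_table
}

# Stack the per-file tables into one long tibble
out <- bind_rows(tmp_list)
out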
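The loop leaves `out` in long form, with one row per file and domain. To get the one-row-per-file layout of the `metadata` tibble in the question, `out` can be pivoted wider. This is a sketch under assumptions: the small `out` below is a hypothetical stand-in (the AA1230 counts come from the question, the AB0566 counts are made up), and the `_Counts` suffix is added with the `names_glue` argument of `pivot_wider()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the `out` tibble produced by the loop
out <- tibble(
  file_ID = c("AA1230", "AA1230", "AA1230", "AA1230", "AB0566", "AB0566"),
  Name = c("Bacteria", "Eukaryota", "Archaea", "Viruses", "Bacteria", "Viruses"),
  Num_reads_root = c(60533L, 386L, 4L, 19L, 500L, 125L)
)

metadata <- out %>%
  select(file_ID, Name, Num_reads_root) %>%
  pivot_wider(
    id_cols = file_ID,
    names_from = Name,
    values_from = Num_reads_root,
    names_glue = "{Name}_Counts"  # Bacteria -> Bacteria_Counts, and so on
  )

metadata
```

Domains that do not appear in a file come out as `NA` in that file's row; `pivot_wider()` also accepts `values_fill = 0` if zeros are preferred.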