How to load GEO methylation (450k) datasets in R without a sample sheet?


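I downloaded some Illumina 450k methylation datasets from the Gene Expression Omnibus (GEO).

The R Bioconductor packages minfi and ChAMP seem to require something called a "sample sheet".

Most of the TAR files on GEO do not seem to contain such a sample sheet; they only contain .idat files.

Could some kind soul offer advice? I would like to know how to run the ChAMP/minfi pipeline without a sample sheet; failing that, is there any way to generate a sample sheet from the .idat files?

Thanks.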

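If you want to read all of the idat files in a directory, you can use the following (read.450k.exp is the older minfi name for what current releases call read.metharray.exp):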

my_450k <- read.450k.exp(base = "path/to/directory", recursive = TRUE)

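my_450k

I ran into a similar problem with a GEO project. What I did was download all of the .idat files and put them in a folder of their own. Then I used the script below to parse the .idat file names and create a sample sheet. It parses file names such as GSM1855609_9020331147_R02C02_Grn.idat and stores everything in a .csv file. You can then read the .csv file into R and add the standardized column names (c("Sample_Name", "Sentrix_ID", "Sentrix_Position")), which is what the functions that read the sample sheet expect to see, and you are good to go.

Hope this helps.
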
#!/usr/bin/env python
# Import the OS library
import os

# Get your Current Working Directory
cwd = os.getcwd()

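# Get a list of all of the files in your directory, keeping only the IDAT files
# (GEO often ships them gzipped). This will be a list of strings.
filenames = [name for name in os.listdir(cwd) if name.endswith((".idat", ".idat.gz"))]

# Split each one into the chunks that were separated by underscores ("_") and then keep the first three for each name.
# Every sample has both a _Grn and a _Red file, so a set is used to drop the duplicate rows.
# This will be a sorted list of tuples.
chunked_names = sorted({tuple(filename.split("_")[0:3]) for filename in filenames})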

# For each name, rejoin the three chunks with commas
# We're back to having a list of strings.
csv_lines = [",".join(chunks) for chunks in chunked_names]
# Join all of those strings with the newline character to get just a long string.
contents = "\n".join(csv_lines)

# Print this string to standard output so that it can be redirected to a file.

print(contents)
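
As a variation on the script above, the header row can be written at the same time, so that nothing has to be renamed in R afterwards. This is only a sketch: the function name build_sample_sheet and the output file name are made up for illustration, and it assumes file names of the form <Sample_Name>_<Sentrix_ID>_<Sentrix_Position>_Grn.idat (optionally gzipped), as in the example file name given earlier.

```python
import csv
import os


def build_sample_sheet(idat_dir, out_csv):
    """Write a minimal sample sheet for the IDAT files in idat_dir (hypothetical helper)."""
    rows = set()
    for name in os.listdir(idat_dir):
        # Only look at IDAT files; skip anything else in the folder.
        if not name.endswith((".idat", ".idat.gz")):
            continue
        chunks = name.split("_")
        # Assumed layout: <Sample_Name>_<Sentrix_ID>_<Sentrix_Position>_<Grn|Red>.idat
        if len(chunks) < 4:
            continue
        # The Grn and Red files of one sample collapse into a single row.
        rows.add(tuple(chunks[0:3]))

    ordered = sorted(rows)
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample_Name", "Sentrix_ID", "Sentrix_Position"])
        writer.writerows(ordered)
    return ordered
```

Calling build_sample_sheet(".", "sample_sheet.csv") from the folder that holds the .idat files would produce one row per sample. Whether the downstream loader needs further columns depends on the package and version, so the result should be checked against its documentation.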

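The newer methylprep Python package has functionality for downloading GEO datasets. It works for most series, although many of them do not have the same kinds of files in their archives. methylprep also has a command-line option to create a sample sheet, in case you need to feed it into minfi. Like this: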
 python -m methylprep -v sample_sheet -d ~/GSE133062/GSE133062 --create

(where -d specifies the path to the unpacked .idat files)
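
If the archive downloaded from GEO is still packed, the unpacking step can be sketched with the Python standard library alone. The helper name unpack_geo_raw is hypothetical, and the sketch assumes the archive is a tar file whose members are .idat or gzipped .idat.gz files; other members are ignored.

```python
import gzip
import os
import shutil
import tarfile


def unpack_geo_raw(tar_path, out_dir):
    """Extract IDAT files from a tar archive into out_dir, gunzipping them (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            # Flatten any directory structure and skip non-IDAT members.
            name = os.path.basename(member.name)
            if not member.isfile() or not name.endswith((".idat", ".idat.gz")):
                continue
            source = archive.extractfile(member)
            if name.endswith(".gz"):
                # Decompress on the fly and drop the ".gz" suffix from the target name.
                source = gzip.GzipFile(fileobj=source)
                name = name[:-3]
            with open(os.path.join(out_dir, name), "wb") as handle:
                shutil.copyfileobj(source, handle)
            written.append(name)
    return sorted(written)
```

The returned list of file names makes it easy to confirm that every sample has both a _Grn and a _Red file before pointing -d at the output folder.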
More methylprep examples are here:

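This is how I get the sample sheet and read the IDATs into an RGSet object:

# Install and load packages using pacman
if (!require("pacman")) install.packages("pacman")
pacman::p_load("GEOquery", "minfi")
# Increase the file download timeout
options(timeout = 600)
# Download the GEO object
gse

Full disclosure: I am the maintainer of methylprep. A major focus of the package is simplifying the use of the NIH GEO data repository.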