Populate a column in R with the results of a dynamic query looped over a data frame
I have a data frame, df:
Chrom Position Gene.Sym Ref Variant Lbase Rbase
1 chr1 888639 NOC2L T C 888638 888640
2 chr1 889158 NOC2L G C 889157 889159
3 chr1 889159 NOC2L A C 889158 889160
4 chr1 982941 AGRN T C 982940 982942
5 chr1 1888193 KIAA1751 C A 1888192 1888194
6 chr1 3319632 PRDM16 G A 3319631 3319633
I want to populate a new column, df$triplet, with element [6] of the readLines result for a query. For example:
> readLines('http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr20:1888192,1888194')
[1] "<?xml version=\"1.0\" standalone=\"no\"?>"
[2] "<!DOCTYPE DASDNA SYSTEM \"http://www.biodas.org/dtd/dasdna.dtd\">"
[3] "<DASDNA>"
[4] "<SEQUENCE id=\"chr20\" start=\"1888192\" stop=\"1888194\" version=\"1.00\">"
[5] "<DNA length=\"3\">"
[6] "cct"
[7] "</DNA>"
[8] "</SEQUENCE>"
[9] "</DASDNA>"
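Except that I want to loop over the values in df$Chrom, df$Lbase, and df$Rbase to populate the whole column. I know it will be something like this, but I'm too confused to work it out exactly: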
baseurl = 'http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment='
myurl = paste(baseurl, trip$Chrom, ":", trip$Lbase, ",", trip$Rbase, sep='')
x = readLines(myurl)
You can use sapply to apply readLines to the vector of URLs assembled in myurl and, for example, add the output back to the data frame:
df$dna <- sapply(myurl, function(url) readLines(url)[6])
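The one-liner above can be made a little more defensive. The following is only a sketch: build_das_urls and fetch_triplet are hypothetical helper names, not part of any package, and the reader argument exists only so the fetch step can be tried without network access. It assumes the sequence is always element 6 of the response, as in the question, and formats the coordinates explicitly so that a numeric position such as 1000000 is not pasted as "1e+06".

```r
# Hypothetical helper: build one DAS URL per row of a data frame with
# Chrom, Lbase and Rbase columns (column names taken from the question).
build_das_urls <- function(d, baseurl = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=") {
  # format(..., scientific = FALSE) avoids coordinates like "1e+06"
  lbase <- format(d$Lbase, scientific = FALSE, trim = TRUE)
  rbase <- format(d$Rbase, scientific = FALSE, trim = TRUE)
  paste0(baseurl, d$Chrom, ":", lbase, ",", rbase)
}

# Hypothetical helper: return element 6 of the response, or NA if the
# request fails. `reader` defaults to readLines; it is a parameter only
# so that a stand-in function can be supplied when testing offline.
fetch_triplet <- function(url, reader = readLines) {
  tryCatch(reader(url)[6], error = function(e) NA_character_)
}

df <- data.frame(
  Chrom = c("chr1", "chr1"),
  Lbase = c(888638, 889157),
  Rbase = c(888640, 889159),
  stringsAsFactors = FALSE
)

myurl <- build_das_urls(df)

# To query the live server (requires network access), something like:
# df$triplet <- vapply(myurl, fetch_triplet, character(1), USE.NAMES = FALSE)
```

Using vapply with USE.NAMES = FALSE keeps the new column a plain character vector; sapply would name each element after its URL.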
The idiomatic way is to parse the XML:
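library(XML)
library(stringr)

# Fetch the three-base sequence for row i of trip from the UCSC DAS server
f <- function(i) {
  x <- trip[i, ]
  segment <- paste0(x$Chrom, ":", x$Lbase, ",", x$Rbase)
  url <- paste0("http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=", segment)
  doc <- xmlInternalTreeParse(url)
  # The text of the <DNA> node includes surrounding whitespace; keep only the bases
  str_extract(xmlValue(doc["//DNA"][[1]]), "[a-z]+")
}
trip$triplet <- sapply(seq_len(nrow(trip)), f)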
trip
# Chrom Position Gene.Sym Ref Variant Lbase Rbase triplet
# 1 chr1 888639 NOC2L T C 888638 888640 ctt
# 2 chr1 889158 NOC2L G C 889157 889159 cga
# 3 chr1 889159 NOC2L A C 889158 889160 gaa
# 4 chr1 982941 AGRN T C 982940 982942 ctc
# 5 chr1 1888193 KIAA1751 C A 1888192 1888194 ccg
# 6 chr1 3319632 PRDM16 G A 3319631 3319633 tgc
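If installing extra packages is not an option, the sequence can also be pulled out of the readLines output by its surrounding tags rather than by its position. This is a base R sketch, not a real XML parser: extract_dna is a hypothetical helper name, and the approach assumes the response contains a single DNA element shaped like the one shown in the question.

```r
# Hypothetical helper: extract the sequence between <DNA ...> and </DNA>
# from the lines of a DAS response, wherever it falls in the output.
extract_dna <- function(lines) {
  txt <- paste(lines, collapse = "")
  # Capture everything between the opening <DNA ...> tag and </DNA>
  m <- regmatches(txt, regexec("<DNA[^>]*>([^<]*)</DNA>", txt))[[1]]
  if (length(m) < 2) return(NA_character_)
  # Drop any whitespace left over from line breaks or indentation
  gsub("\\s+", "", m[2])
}

# The sample response from the question, as returned by readLines
resp <- c(
  "<?xml version=\"1.0\" standalone=\"no\"?>",
  "<!DOCTYPE DASDNA SYSTEM \"http://www.biodas.org/dtd/dasdna.dtd\">",
  "<DASDNA>",
  "<SEQUENCE id=\"chr20\" start=\"1888192\" stop=\"1888194\" version=\"1.00\">",
  "<DNA length=\"3\">",
  "cct",
  "</DNA>",
  "</SEQUENCE>",
  "</DASDNA>"
)

extract_dna(resp)
# [1] "cct"
```

Because the lines are collapsed before matching, this also copes with a sequence that is split across several lines of the response.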
Thanks! This works great and is exactly what I wanted.