
Populating a column in R with dynamic query results by looping over a data frame

I have a data frame, df:

  Chrom Position Gene.Sym Ref Variant   Lbase   Rbase
1  chr1   888639    NOC2L     T         C  888638  888640
2  chr1   889158    NOC2L     G         C  889157  889159
3  chr1   889159    NOC2L     A         C  889158  889160
4  chr1   982941     AGRN     T         C  982940  982942
5  chr1  1888193 KIAA1751     C         A 1888192 1888194
6  chr1  3319632   PRDM16     G         A 3319631 3319633
I want to populate a new column, df$triplet, with element [6] of the result of readLines applied to a query. Example:

> readLines('http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr20:1888192,1888194')
[1] "<?xml version=\"1.0\" standalone=\"no\"?>"                                  
[2] "<!DOCTYPE DASDNA SYSTEM \"http://www.biodas.org/dtd/dasdna.dtd\">"          
[3] "<DASDNA>"                                                                   
[4] "<SEQUENCE id=\"chr20\" start=\"1888192\" stop=\"1888194\" version=\"1.00\">"
[5] "<DNA length=\"3\">"                                                         
[6] "cct"                                                                        
[7] "</DNA>"                                                                     
[8] "</SEQUENCE>"                                                                
[9] "</DASDNA>"
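Except that I want to loop over the values in df$Chrom, df$Lbase, and df$Rbase to populate the whole column. I know it will be something like this, but I'm too confused to work it out exactly: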

baseurl = 'http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment='
myurl = paste(baseurl, trip$Chrom, ":", trip$Lbase, ",", trip$Rbase, sep='')
x = readLines(myurl)

You can use sapply to apply readLines to each element of the vector of URLs assembled in myurl, for example, adding the output back to the data frame:

df$dna <- sapply(myurl, function(url) readLines(url)[6])
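
The one-liner above depends on the sequence always being on line 6 and on every request succeeding. As a minimal sketch of a more defensive variant, the helpers below (das_url, extract_dna, and fetch_dna are hypothetical names, not part of the original answer) locate the sequence between the DNA tags and return NA when a request fails. The demonstration runs on a canned copy of the response shown in the question, so it needs no network access.

```r
# Hypothetical helpers, shown as a sketch; names are not from the original answer.

# Build one DAS query URL per row (paste0 is vectorised).
das_url <- function(chrom, lbase, rbase,
                    baseurl = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=") {
  paste0(baseurl, chrom, ":", lbase, ",", rbase)
}

# Extract the sequence from the lines of a DAS response:
# everything between the opening <DNA ...> line and the closing </DNA> line.
extract_dna <- function(lines) {
  open  <- grep("^<DNA", lines)
  close <- grep("^</DNA>", lines)
  if (length(open) != 1 || length(close) != 1 || close <= open + 1) {
    return(NA_character_)
  }
  paste(trimws(lines[(open + 1):(close - 1)]), collapse = "")
}

# Fetch one URL and return its sequence, or NA if the request errors.
# `reader` defaults to readLines and can be swapped out for offline testing.
fetch_dna <- function(url, reader = readLines) {
  tryCatch(extract_dna(reader(url)), error = function(e) NA_character_)
}

# Offline demonstration using the response shown in the question.
sample_lines <- c(
  '<?xml version="1.0" standalone="no"?>',
  '<!DOCTYPE DASDNA SYSTEM "http://www.biodas.org/dtd/dasdna.dtd">',
  '<DASDNA>',
  '<SEQUENCE id="chr20" start="1888192" stop="1888194" version="1.00">',
  '<DNA length="3">',
  'cct',
  '</DNA>',
  '</SEQUENCE>',
  '</DASDNA>'
)
print(extract_dna(sample_lines))  # "cct"
```

Against the live service, the column could then be filled with df$triplet <- vapply(das_url(df$Chrom, df$Lbase, df$Rbase), fetch_dna, character(1), USE.NAMES = FALSE). vapply is used here instead of sapply so that the result is guaranteed to be a character vector with one element per row.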

The idiomatic approach is to parse the XML:

library(XML)      # xmlInternalTreeParse, xmlValue
library(stringr)  # str_extract

# trip is the data frame shown in the question.
# f() fetches the DAS response for row i and returns its DNA sequence.
f <- function(i) {
  x       <- trip[i, ]
  segment <- paste0(x$Chrom, ":", x$Lbase, ",", x$Rbase)
  url     <- paste0("http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=", segment)
  doc     <- xmlInternalTreeParse(url)
  # Take the text of the first <DNA> node and keep the first run of lowercase letters.
  str_extract(xmlValue(doc["//DNA"][[1]]), "[a-z]+")
}
trip$triplet <- sapply(seq_len(nrow(trip)), f)
trip
#   Chrom Position Gene.Sym Ref Variant   Lbase   Rbase triplet
# 1  chr1   888639    NOC2L   T       C  888638  888640     ctt
# 2  chr1   889158    NOC2L   G       C  889157  889159     cga
# 3  chr1   889159    NOC2L   A       C  889158  889160     gaa
# 4  chr1   982941     AGRN   T       C  982940  982942     ctc
# 5  chr1  1888193 KIAA1751   C       A 1888192 1888194     ccg
# 6  chr1  3319632   PRDM16   G       A 3319631 3319633     tgc
Thanks! This works well and is exactly what I wanted.
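
For anyone who wants to test the XML-parsing step without hitting the server, here is a rough sketch that parses a response held in a string. parse_das_dna is a hypothetical helper name, and the sample text is an abbreviated copy of the response from the question (the DOCTYPE line is left out). It assumes the XML package is installed and strips all non-letter characters, on the assumption that longer sequences may be wrapped across several lines.

```r
# Hypothetical helper: parse a DAS response held in a string and return
# the sequence in its first <DNA> node, with whitespace removed.
parse_das_dna <- function(xml_text) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  dna <- XML::xpathSApply(doc, "//DNA", XML::xmlValue)
  if (length(dna) == 0) return(NA_character_)
  gsub("[^A-Za-z]", "", dna[[1]])
}

# Abbreviated copy of the response shown in the question.
das_response <- paste(c(
  '<?xml version="1.0" standalone="no"?>',
  '<DASDNA>',
  '<SEQUENCE id="chr20" start="1888192" stop="1888194" version="1.00">',
  '<DNA length="3">',
  'cct',
  '</DNA>',
  '</SEQUENCE>',
  '</DASDNA>'
), collapse = "\n")

# Only run the demonstration when the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  print(parse_das_dna(das_response))  # "cct"
}
```

Keeping the parsing separate from the download makes it possible to check the extraction logic on saved responses before looping over a large data frame.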