Regex: Extracting data from vector elements in R
Tags: regex, r, csv, vector


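I'm trying to write an R script that parses ordered pairs of numbers stored in the cells of a CSV file. Here are the first few lines of the CSV file: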

Test1,   Test2,                     Test3
Label1,  [(1, 2), (5, 6), (9, 10)], High
Label2,  [(5, 9), (6, 10)],         Low
Label3,  [(0, 5)],                  High
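Note that the second column is a list of tuples produced by a Python script. I wrote an R script that uses read.csv to read the CSV file in as a table and then creates a vector from each column. Next, I want it to read each ordered pair (tuple) from each vector element/cell of column 2 and use the pairs as the start and end x values for drawing rectangles. But I can't parse the individual ordered pairs (tuples) out of a vector element. No matter what I do, R still treats the vector element as a single object rather than as an array or list.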

Here is the R code:

table1 <- read.csv("data.csv",header=TRUE,sep=",")
val1 <- paste(table1[,1])
val2 <- paste(table1[,2]) # First data row is [(1, 2), (5, 6), (9, 10)]
val3 <- paste(table1[,3])
nrows = length(val1)
for (i in 1:nrows) {
    rects <- val2[i]  # rects <- [(1, 2), (5, 6), (9, 10)]
    nval <- length(rects)  # Want nval to be 3
    if (nval > 0) {
        for (j in 1:nval) {
            bounds <- rects[j]  # Want bounds to be (1, 2), then (5, 6), then (9, 10)
            start <- bounds[1]  # Want start to be 1, 5, and then 9
            stop <- bounds[2]  # Want stop to be 2, 6, and then 10
            w <- stop - start # w should be 1
            vpp <- start + w/2 # vpp will be 1.5, 5.5, and then 9.5
            pushViewport(vp)
            grid.rect(x=0.5, y=0.5, width=w, height=0.5, gp=gpar(fill="violet"))
            upViewport()
         }
    }
}
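
For context, here is a minimal sketch of why `nval` comes out as 1 in the loop above. It assumes the cell arrives as a plain string, which is what `paste()` on a column returns. The variable names `cell`, `n_elements` and `second` are illustrative only.

```r
# One element of val2, as it looks after read.csv + paste(): a single string.
cell <- "[(1, 2), (5, 6), (9, 10)]"

# length() counts vector elements, not the tuples written inside the string.
n_elements <- length(cell)

# Indexing past the only element yields NA rather than the second tuple.
second <- cell[2]

print(n_elements)
print(second)
```

So the string has to be split or pattern-matched into numbers before `start` and `stop` can be computed.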

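I'm not sure I understand 100% what final output you want, but here is one way to end up with a data frame like this, with the start and end x values in separate columns: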

   Test1                 Test3 X1 X2
1 Label1                  High  1  2
2 Label1                  High  5  6
3 Label1                  High  9 10
4 Label2                   Low  5  9
5 Label2                   Low  6 10
6 Label3                  High  0  5
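I created the data frame like this, but had to manually substitute semicolons for the column separators in the pasted text: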

df <- read.table(text = "Test1;   Test2;                     Test3
Label1;  [(1, 2), (5, 6), (9, 10)]; High
Label2;  [(5, 9), (6, 10)];         Low
Label3;  [(0, 5)];                 High", sep = ";", header = TRUE, stringsAsFactors = FALSE)

# split each cell into its individual tuples
splits <- strsplit(as.character(df$Test2), "), ", fixed = TRUE)

# repeat the labels once per tuple and flatten the list of tuples
df2 <- data.frame(Test1 = rep(df$Test1, lengths(splits)),
                  Test3 = rep(df$Test3, lengths(splits)),
                  Test2 = unlist(splits), stringsAsFactors = FALSE)

# split tuples into two columns
df3 <- cbind(df2[, c("Test1", "Test3")],
             data.frame(do.call("rbind", strsplit(df2$Test2, ",", fixed = TRUE))))

# remove parens etc. and convert to numeric
df3$X1 <- as.numeric(gsub("[^[:digit:]]", "", df3$X1))
df3$X2 <- as.numeric(gsub("[^[:digit:]]", "", df3$X2))
This creates the data frame shown above and lets you draw the rectangles as follows (random y values are added to the data frame):

library('dplyr')
library('ggplot2')
set.seed(10)
df4 <- df3 %>%
  do(mutate(., ymin = sample(10, nrow(.)))) %>% # random y values for plotting
  mutate(ymax = ymin + 1)

ggplot(df4, aes(xmin = X1, xmax = X2, ymin = ymin, ymax = ymax)) +
  geom_rect()
This draws one rectangle per tuple, each spanning X1 to X2 along the x-axis at a random height.

Thanks for the answer. I'm going to try it today.
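
One caveat with stripping every non-digit character: `gsub("[^[:digit:]]", "", ...)` also removes minus signs and decimal points, so a value such as -1.5 would turn into 15. If the tuples might contain such values, a possible alternative is to pull the numbers out directly with a regular expression. The sketch below uses base R's `gregexpr()` and `regmatches()`. The helper name `parse_pairs` and the sample vector `cells` are hypothetical and not part of the original answer, and the pattern assumes plain decimal notation (no exponents).

```r
# Hypothetical helper: extract every numeric token from one cell and
# arrange the tokens as rows of (start, stop).
parse_pairs <- function(cell) {
  # optional minus sign, digits, optional decimal part
  tokens <- regmatches(cell, gregexpr("-?[0-9]+(\\.[0-9]+)?", cell))[[1]]
  # tokens appear in tuple order, so fill the two-column matrix row by row
  matrix(as.numeric(tokens), ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("start", "stop")))
}

# Sample cells copied from the question's CSV
cells <- c("[(1, 2), (5, 6), (9, 10)]", "[(5, 9), (6, 10)]", "[(0, 5)]")

# One matrix per cell, one row per tuple
pairs <- lapply(cells, parse_pairs)
print(pairs[[1]])
```

Each element of `pairs` can then be looped over row by row to get the start and stop values for one label.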