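How to look up by row and column names in R?
Tags: r, dplyr
I am thinking about how to look up time data by university name (first row: A, …, F), field name (first column: Acute, …, En) and/or graduation time (time) in the file DS.csv below.
I am considering a dplyr approach, but I could not extend a numeric ID lookup (thread answer) to a lookup by three variables.
Challenges
- How to look up by the first row? Perhaps something like $1 == "A".
- How to extend the university lookup to two columns? Pseudocode: $1 == "A" is about the second and third columns, …, $1 == "F" is about the last two columns.
- How to look up by three criteria: first row (no header), first column (header Field) and header time? Pseudocode:
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
The data, DS.csv, in crosstab format, followed by the same data in straight-table format: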
,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3
Field,time,T,Experiment
Acute,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F
Pseudocode
library('dplyr')
ow <- options("warn")
DF <- read.csv("/home/masi/CSV/DS.csv", header = T)
# Lookup by first row, Lookup by Field, lookup by Field's first column?
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
R: 3.3.3 (2017-03-06)
OS: Debian 8.7
Hardware: Asus Zenbook UX303UA

Change the crosstab
,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3
into a straight-table data format
Field,time,T,Experiment
Acute,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F
Here I used the Vim.csv plugin and visual block mode.
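The same conversion can also be scripted instead of done by hand in an editor. The following is a minimal base R sketch, not the answerer's method: the names `crosstab`, `body` and `straight` are made up for illustration, and the crosstab is inlined so that the example is self-contained. Note that the field labels come straight from the crosstab, so the second field is "Ane" here rather than "An".

```r
# Crosstab from the question, inlined so the sketch is self-contained
crosstab <- ",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3"

lines <- strsplit(crosstab, "\n")[[1]]

# The university names are the non-empty cells of the first line
top <- strsplit(lines[1], ",")[[1]]
universities <- top[nzchar(top)]

# Read only the data rows; the two header lines are handled separately
body <- read.csv(text = paste(lines[-(1:2)], collapse = "\n"),
                 header = FALSE, stringsAsFactors = FALSE)

# University i owns columns 2*i (time) and 2*i + 1 (T); stack them row-wise
straight <- do.call(rbind, lapply(seq_along(universities), function(i) {
  data.frame(Field = body[[1]],
             time = body[[2 * i]],
             T = body[[2 * i + 1]],
             Experiment = universities[i],
             stringsAsFactors = FALSE)
}))

# Look up one cell of the former crosstab
subset(straight, Experiment == "A" & Field == "Ane")
```

The result has one row per Field and Experiment pair, which is the layout the lookups below rely on.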
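Multiple ways to select
Once the data is tidied into a straight table (rather than a crosstab), which is easy to work with, the lookup can be done in many ways; I prefer SQL. Below I demonstrate the sqldf package, which is very inefficient for big data, but this data set is small, so it works.
Also, I would point to the very efficient fread from the data.table package for reading the file, instead of the much slower built-in functions such as read.csv.
sqldf
> library(data.table); library(sqldf);
> a<-fread("~/DS_straight_table.csv");
> sqldf("select time from a where Experiment='A' and Field='An'")
  time
1    9
Other, without sqldf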
> library(data.table);
> a<-fread("~/DS_straight_table.csv");
> a[Experiment=='A' & Field=='An']
Field time T Experiment
1: An 9 120 A
Using the "tall" (straight table) format and the dplyr library. Your data has only one value per Field and Experiment combination.
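library(dplyr)
# df is assumed to hold the straight table shown above, for example:
# df <- read.csv("~/DS_straight_table.csv", stringsAsFactors = FALSE)
## this is the more general result:
## the row with the smallest time in each Field/Experiment group
df %>%
  group_by(Field, Experiment) %>%
  top_n(1, wt = -time)
## example function
getTimes <- function(data, field, experiment) {
  data %>%
    filter(Field == field, Experiment == experiment) %>%
    top_n(1, wt = -time)
}
getTimes(df, 'An', 'A')
#   Field time   T Experiment
# 1    An    9 120          A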
Starting from the initial raw data:
# read the data & skip 1st & 2nd line which contain only header information
DF <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, skip=2)
# read the first two lines which contain the header information;
# colClasses = "character" keeps every cell as text, since read.csv may
# otherwise parse a column that holds only "T" as logical TRUE
headers <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, colClasses="character", nrows=2)
# extract the university names for the 'headers' data.frame
universities <- unlist(headers[1,])
universities <- universities[universities != '']
# create column names from the 'headers' data.frame
vec <- headers[2,][headers[2,] == 'T']
headers[2,][headers[2,] == 'T'] <- paste0(vec, seq_along(vec))
names(DF) <- paste0(headers[2,],headers[1,])
Since it is best to convert the data to long format:
library(data.table)
DT <- melt(setDT(DF), id = 1,
measure.vars = patterns('^time','^T'),
variable.name = 'university',
value.name = c('time','t')
)[, university := universities[university]][]
Now you can select the required information:
DT[university == 'A' & Field == 'Ane']
which gives:
Field university time t
1: Ane A 9 120
A few dplyr examples for filtering the data:
library(dplyr)
DT %>%
filter(Field=="En" & t > 1)
gives:
  Field university time t
1    En          A 15.6 2
2    En          F 14.6 4
Or, keeping the rows with time < 14 and t > 3 and sorting them by descending time:
  Field university time   t
1   Ane          A  9.0 120
2 Acute          F  8.3   4
3   Ane          F  8.2 326
4   Ane          C  7.9  43
5   Ane          E  7.9  60
6   Ane          D  7.8  77
7   Ane          B  7.7  26
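The second output above is sorted by descending time and contains exactly the rows with time < 14 and t > 3. A pipeline along the following lines would produce it; this is a reconstruction from the output, not necessarily the answerer's exact code, and `long` is a hypothetical stand-in for DT, rebuilt inline so that the sketch runs on its own.

```r
library(dplyr)

# Long-format data equivalent to DT above (values copied from the crosstab)
long <- data.frame(
  Field = rep(c("Acute", "Ane", "En", "Fo"), times = 6),
  university = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  time = c(0, 9, 15.6, 9.2,   8.3, 7.7, 12.9, 0,   7.5, 7.9, 0, 5.4,
           8.6, 7.8, 0, 0,    0, 7.9, 14.3, 0,     8.3, 8.2, 14.6, 7.9),
  t = c(0, 120, 2, 2,   1, 26, 1, 0,   1, 43, 0, 1,
        2, 77, 0, 0,    0, 60, 1, 0,   4, 326, 4, 3),
  stringsAsFactors = FALSE
)

# Keep rows matching both conditions, then sort with the largest time first
res <- long %>%
  filter(time < 14, t > 3) %>%
  arrange(desc(time))

print(res)
```

Because the filter and the sort are separate steps, either one can be changed without touching the other, for example by swapping arrange(desc(time)) for arrange(time).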
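Comments:
@hh I have rolled back your edit. You should use the data as posted by the OP instead of inserting your own representation of the data.
But do you know how to convert the original data into the edited data? If not, you should not include it in your question, imo. It is then up to the people answering to get the desired format.
... As given by @hh, I think descending order by time should be applied, like DF[time<14 & t>3 & by="desc(time)"]. I could not get it to work with the data.table package, so I think dplyr should work. What do you think?
In the above, top_n(1, wt = -time) keeps the row with the largest value of -time, which is the row with the smallest time. If you only want to sort, you can use arrange(time) or arrange(-time). More correct is arrange(desc(time)), because desc works for both numbers and characters.
@epi99 Yes. Can you integrate the ordering into the parameters of the function getTimes? It should be possible, e.g. getTimes(df, Field='An', T='A', desc(time)) -- you can also pass the arguments strictly by name, as in my example. I think that is better than saying the second argument is fo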
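The comment thread asks whether the ordering could become a parameter of getTimes. A minimal sketch of one way to do that follows. The `descending` flag, the optional `field` and `experiment` arguments, and the inline sample rows are assumptions made for illustration; they are not part of the original answer.

```r
library(dplyr)

# A few rows of the straight table, inlined so the sketch is self-contained
df <- data.frame(
  Field = c("An", "An", "En", "En"),
  time = c(9, 7.7, 15.6, 12.9),
  T = c(120, 26, 2, 1),
  Experiment = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of getTimes(): both lookup keys are optional,
# and `descending` controls the sort order on time
getTimes <- function(data, field = NULL, experiment = NULL, descending = TRUE) {
  if (!is.null(field)) data <- filter(data, Field == field)
  if (!is.null(experiment)) data <- filter(data, Experiment == experiment)
  if (descending) arrange(data, desc(time)) else arrange(data, time)
}

# All "An" rows, largest time first
res <- getTimes(df, field = "An")
print(res)

# The single row for field "En" in experiment "B"
getTimes(df, field = "En", experiment = "B")
```

A logical flag is used here rather than passing desc(time) as an argument, because it avoids non-standard evaluation inside the function; passing the sort expression itself is also possible but needs more care.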