R: read the first column, then the rest
I have a file containing codes and descriptions. Each code is a short (3-6 character) alphabetic string, separated from the description that follows by a space. The description is usually several words long, so it contains spaces too. Here is an example:
LIISS License Issued
LIMOD License Modified
LIPASS License Assigned (Partial Assignment)
LIPND License Assigned (Partition/Disaggregation)
LIPPND License Issued from a Partial/P&D Assignment
LIPUR License Purged
LIREIN License Reinstated
LIREN License Renewed
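I want to read this into a two-column data frame, with the code in the first column and the description in the second. How can I do this in R?
We can use readLines to read the file, and then use sub to create a two-column data.frame: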
#read the lines with readLines
lines <- readLines('pavel.txt')
#match one or more whitespace characters followed by the rest of the line
#replace with `''` to keep only the non-space characters at the beginning.
str1 <- sub('\\s+.*', '', lines)
#match non space characters from the beginning (`^[^ ]+`) followed by space
#replace with `''` to extract the characters that follow after the space.
str2 <- sub('^[^ ]+\\s+', '', lines)
out <- data.frame(v1= str1, v2=str2, stringsAsFactors=FALSE)
head(out,3)
# v1 v2
#1 LIISS License Issued
#2 LIMOD License Modified
#3 LIPASS License Assigned (Partial Assignment)
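To try this without the original file, here is a minimal self-contained sketch. It assumes a temporary file as a stand-in for 'pavel.txt', and the column names code and description are illustrative choices, not part of the question.

```r
# Stand-in for 'pavel.txt' (assumption): write a few lines from the question
# to a temporary file so the example runs anywhere.
sample_lines <- c(
  "LIISS License Issued",
  "LIPASS License Assigned (Partial Assignment)",
  "LIPPND License Issued from a Partial/P&D Assignment"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

lines <- readLines(tmp)

# Drop everything from the first run of whitespace onward: leaves the code.
code <- sub("\\s+.*", "", lines)

# Drop the leading non-space characters plus the whitespace after them:
# leaves the description, including its internal spaces.
description <- sub("^[^ ]+\\s+", "", lines)

out <- data.frame(code = code, description = description,
                  stringsAsFactors = FALSE)
str(out)
```

Because sub replaces only the first match, spaces inside the description are left untouched.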
Alternatively, we can replace the first space with a comma (,) and then read the result with read.csv or read.table using sep=',':
read.table(text=sub(' ', ',', readLines('pavel.txt')), sep=',')
# V1 V2
#1 LIISS License Issued
#2 LIMOD License Modified
#3 LIPASS License Assigned (Partial Assignment)
#4 LIPND License Assigned (Partition/Disaggregation)
#5 LIPPND License Issued from a Partial/P&D Assignment
#6 LIPUR License Purged
#7 LIREIN License Reinstated
#8 LIREN License Renewed
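One caveat with the comma approach: if a description itself contains a comma, sep=',' would split it into extra fields. A possible variation, assuming tab characters never occur in the file, is to substitute a tab instead and switch off quote and comment handling. The sample lines below are made up for illustration (LIXYZ is a hypothetical code), as are the column names.

```r
# Made-up sample lines (assumption): the second description contains
# a comma and an apostrophe, which would confuse sep=',' with default quoting.
lines <- c(
  "LIISS License Issued",
  "LIXYZ License Issued, then Modified at owner's request"
)

# sub() changes only the first match, so only the first space becomes a tab.
tabbed <- sub(" ", "\t", lines, fixed = TRUE)

# quote = "" and comment.char = "" make read.table treat quotes and '#'
# as ordinary characters.
out <- read.table(text = tabbed, sep = "\t", quote = "", comment.char = "",
                  col.names = c("code", "description"),
                  stringsAsFactors = FALSE)
out
```

The idea is the same as in the answer; only the delimiter changes to one that is less likely to appear in free text.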
If we are on Linux, the same substitution can be done with awk, and its output piped into fread from data.table (or into read.csv/read.table):
library(data.table)
# cmd= tells fread explicitly that the string is a shell command
fread(cmd = "awk '{sub(\" \", \",\", $0)}1' pavel.txt", header=FALSE)
# V1 V2
#1: LIISS License Issued
#2: LIMOD License Modified
#3: LIPASS License Assigned (Partial Assignment)
#4: LIPND License Assigned (Partition/Disaggregation)
#5: LIPPND License Issued from a Partial/P&D Assignment
#6: LIPUR License Purged
#7: LIREIN License Reinstated
#8: LIREN License Renewed
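You can use stri_split_fixed() from the stringi package. Here we use readLines() to read the file (shown as "x.txt"). Then stri_split_fixed() says that we want to split on a single space and want n=2 columns in return (so it splits on the first space only). simplify=TRUE is used to return a matrix rather than a list.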
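The code for this answer did not survive on this page, so the following is a reconstruction sketch from the description above, not necessarily the answerer's exact code. It assumes the stringi package is installed, uses a temporary file as a stand-in for "x.txt", and picks illustrative column names.

```r
library(stringi)

# Stand-in for "x.txt" (assumption): a temp file with lines from the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c("LIISS License Issued",
             "LIPND License Assigned (Partition/Disaggregation)"), tmp)

# Split each line on a literal space, returning at most n = 2 pieces, so
# only the first space is used; simplify = TRUE returns a character matrix.
parts <- stri_split_fixed(readLines(tmp), " ", n = 2, simplify = TRUE)

# Convert the two-column matrix to a data frame and name the columns.
out <- as.data.frame(parts, stringsAsFactors = FALSE)
names(out) <- c("code", "description")
out
```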
Data: x.txt
Post a reproducible example.
That worked! Thank you, Richard!