R: Read the first column, then the rest

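I have a file that contains codes and descriptions. The code is always a short (3-6 character) alphabetic string, separated from the description that follows by a space. The description is usually several words (which also contain spaces). Here is an example: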

LIISS License Issued
LIMOD License Modified
LIPASS License Assigned (Partial Assignment)
LIPND License Assigned (Partition/Disaggregation)
LIPPND License Issued from a Partial/P&D Assignment
LIPUR License Purged
LIREIN License Reinstated
LIREN License Renewed

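I would like to read this in as a two-column data frame, with the code in the first column and the description in the second. How can I do this in R?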

We can use readLines to read the file, then use sub to build the two columns of a data.frame.

#read the lines with readLines
lines <- readLines('pavel.txt')
#match one or more whitespace characters followed by whatever remains on the line
#replace with `''` to extract the non-space characters at the beginning.
str1 <- sub('\\s+.*', '', lines)
#match non space characters from the beginning (`^[^ ]+`) followed by space
#replace with `''` to extract the characters that follow after the space.
str2 <- sub('^[^ ]+\\s+', '', lines)
out <- data.frame(v1= str1, v2=str2, stringsAsFactors=FALSE)
head(out,3)
#      v1                                    v2
#1  LIISS                        License Issued
#2  LIMOD                      License Modified
#3 LIPASS License Assigned (Partial Assignment)

Or we can replace the first space with a comma (,) and then use read.csv/read.table with sep=','.

read.table(text=sub(' ', ',', readLines('pavel.txt')), sep=',')
#      V1                                           V2
#1  LIISS                               License Issued
#2  LIMOD                             License Modified
#3 LIPASS        License Assigned (Partial Assignment)
#4  LIPND  License Assigned (Partition/Disaggregation)
#5 LIPPND License Issued from a Partial/P&D Assignment
#6  LIPUR                               License Purged
#7 LIREIN                           License Reinstated
#8  LIREN                              License Renewed
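
The comma substitution above works for the sample because no description contains a comma. As a minimal sketch of one way to guard against that, assuming tabs never occur in the file, the first space can be replaced with a tab instead. The two sample lines below, including the comma in the second one, are hypothetical and only stand in for the contents of pavel.txt.

```r
# Hypothetical sample lines; the second description contains a comma on purpose
lines <- c("LIISS License Issued",
           "LIPND License Assigned, Partition")

# Replace only the first space with a tab (sub() changes the first match only)
tabbed <- sub(" ", "\t", lines)

# Read with a tab separator; quote = "" keeps quotes and apostrophes literal
out <- read.table(text = tabbed, sep = "\t", quote = "",
                  stringsAsFactors = FALSE)
```

Here out has the default column names V1 and V2, and the comma stays inside the description column.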

If we are on Linux, the output of awk can be piped into fread from data.table, or into read.csv/read.table.

library(data.table)
fread(cmd="awk '{sub(\" \", \",\", $0)}1' pavel.txt", header=FALSE)
#      V1                                           V2
#1:  LIISS                               License Issued
#2:  LIMOD                             License Modified
#3: LIPASS        License Assigned (Partial Assignment)
#4:  LIPND  License Assigned (Partition/Disaggregation)
#5: LIPPND License Issued from a Partial/P&D Assignment
#6:  LIPUR                               License Purged
#7: LIREIN                           License Reinstated
#8:  LIREN                              License Renewed

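You can use stri_split_fixed() from stringi.

Here we use readLines() to read the file (the data, shown as "x.txt"). Then stri_split_fixed() says we want to split on a single space and want n = 2 columns in return (so it only splits on the first space). simplify = TRUE is used to return a matrix rather than a list.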
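
The code for this answer did not survive on this page, so the following is only a minimal sketch of the call described above, not necessarily the answer's exact code. The temporary file stands in for "x.txt", and the column names code and description are assumptions chosen for illustration.

```r
library(stringi)

# Sample lines from the question, written to a temporary file standing in for "x.txt"
path <- tempfile(fileext = ".txt")
writeLines(c("LIISS License Issued",
             "LIMOD License Modified",
             "LIPASS License Assigned (Partial Assignment)"), path)

# Split each line on a single space; n = 2 caps the result at two pieces,
# so only the first space is used. simplify = TRUE returns a character matrix.
parts <- stri_split_fixed(readLines(path), " ", n = 2, simplify = TRUE)

# Convert the matrix to a two-column data frame (column names are illustrative)
out <- as.data.frame(parts, stringsAsFactors = FALSE)
names(out) <- c("code", "description")
```

Because the split stops after the first space, the spaces inside each description are left intact.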

Please post a reproducible example.
That worked! Thank you, Richard!