Splitting strings in R
I am working with the following strings:
Col1
--------------------------
554 - partial-completion_3
4011 - structure painted
5459 - 1 int mam-corrosion issue
996 - cast iron countershock
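To make the answers below runnable, here is a minimal sketch that rebuilds the example data. It assumes a data frame named `df1` (the name the answers use; the question calls it `df`) with a single character column `Col1`.

```r
# Rebuild the example data. stringsAsFactors = FALSE keeps Col1 as character,
# which strsplit() requires (it errors on factors).
df1 <- data.frame(
  Col1 = c("554 - partial-completion_3",
           "4011 - structure painted",
           "5459 - 1 int mam-corrosion issue",
           "996 - cast iron countershock"),
  stringsAsFactors = FALSE
)
str(df1)
```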
My goal is to split these strings into two parts, like this:
Col1_Id Col2_Desc
--------------------------
554 partial-completion_3
4011 structure painted
5459 1 int mam-corrosion issue
996 cast iron countershock
I tried using the `separate` function:
library(tidyr)  # provides separate() and re-exports the %>% pipe
df_sep <- df %>%
  separate(Col1, c("Col1_Id", "Col2_Desc"), "-")
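This only works when the string contains a single `-`. If there are two, as in `5459 - 1 int mam-corrosion issue`, then `separate` drops the part of the description after the second `-`, and the output looks like this: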
Col1_Id Col2_Desc
--------------------------
554 partial-completion_3
4011 structure painted
5459 1 int mam
996 cast iron countershock
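To see why, count the pieces produced by splitting on every `-`, which is what `sep = "-"` does. This base R sketch uses `strsplit` with `fixed = TRUE` as a stand-in for the splitting step inside `separate`; by default (`extra = "warn"`), `separate` keeps only as many pieces as there are names in `into` and warns about the rest. Note that `partial-completion_3` also contains a hyphen, so the first row splits into three pieces as well.

```r
x <- c("554 - partial-completion_3",
       "4011 - structure painted",
       "5459 - 1 int mam-corrosion issue",
       "996 - cast iron countershock")

# Split on every "-" (treated literally, not as a regex)
pieces <- strsplit(x, "-", fixed = TRUE)

# Pieces per row: anything above 2 means text would be dropped
n_pieces <- lengths(pieces)
print(n_pieces)

# Third row: "5459 ", " 1 int mam", "corrosion issue"
print(pieces[[3]])
```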
This is not what I expected. I expected the following output:
Col1_Id Col2_Desc
--------------------------
554 partial-completion_3
4011 structure painted
5459 1 int mam-corrosion issue
996 cast iron countershock
Thanks a lot for any hints or suggestions.

We can use `sub` to replace the first `-` with `,` and then use `read.csv`:
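# sub() replaces only the first "-", so each row becomes "id , description";
# strip.white = TRUE trims the spaces left around the new comma.
# Note: a comma already inside a description would be read as an extra field.
read.csv(text = sub("-", ",", df1$Col1), header = FALSE,
         col.names = c("Col1_Id", "Col2_Desc"), strip.white = TRUE,
         stringsAsFactors = FALSE)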
# Col1_Id Col2_Desc
#1 554 partial-completion_3
#2 4011 structure painted
#3 5459 1 int mam-corrosion issue
#4 996 cast iron countershock
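The key point in this answer is that `sub` replaces only the first match, while `gsub` replaces all of them. A quick check on the problematic string:

```r
s <- "5459 - 1 int mam-corrosion issue"

first_only <- sub("-", ",", s)   # only the first "-" becomes ","
every_one  <- gsub("-", ",", s)  # every "-" becomes ","

print(first_only)  # "5459 , 1 int mam-corrosion issue"
print(every_one)   # "5459 , 1 int mam,corrosion issue"
```

As long as the descriptions contain no commas of their own, the `sub` version leaves exactly one comma per line, so `read.csv` sees two fields.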
In the case of `separate`, there is an `extra` argument that can be used to solve this:
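library(tidyr)
# The default sep, "[^[:alnum:]]+", first matches the " - " after the id;
# extra = "merge" splits at most length(into) times, so the rest of the
# string (including any later "-") stays together in Col2_Desc
separate(df1, Col1, into = c("Col1_Id", "Col2_Desc"), extra = "merge")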
# Col1_Id Col2_Desc
#1 554 partial-completion_3
#2 4011 structure painted
#3 5459 1 int mam-corrosion issue
#4 996 cast iron countershock
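`extra = "merge"` makes `separate` split at most `length(into)` times, so the remainder of each string is kept whole. The same "split only at the first separator" idea can be sketched in base R with `regexpr` and `substr`. The helper name `split_first` and the `" - "` separator are assumptions for illustration, not part of any package.

```r
# Hypothetical helper: split each string at the FIRST occurrence of `sep` only
split_first <- function(x, sep = " - ") {
  # Start of the first match, or -1 when sep does not occur.
  # as.integer() drops the match.length attributes that regexpr() attaches.
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  found <- pos > 0
  data.frame(
    Col1_Id   = ifelse(found, substr(x, 1, pos - 1), x),
    Col2_Desc = ifelse(found, substr(x, pos + nchar(sep), nchar(x)), NA_character_),
    stringsAsFactors = FALSE
  )
}

x <- c("554 - partial-completion_3",
       "4011 - structure painted",
       "5459 - 1 int mam-corrosion issue",
       "996 - cast iron countershock")
res <- split_first(x)
print(res)
```

Rows without the separator keep the whole string as the id and get `NA` as the description, which is a design choice of this sketch.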
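Data: `df1`

A base R option is to use `strsplit` to split the column into a list, then use `rbind.data.frame` to build a `data.frame`; `setNames` conveniently sets the column names in the same line: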
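# strsplit() cuts each string at " - " (Col1 must be character, not factor);
# do.call(rbind.data.frame, ...) stacks the pieces into rows, and
# setNames() assigns the column names
setNames(do.call(rbind.data.frame, strsplit(df1$Col1, split = " - ")),
         c("Col1_Id", "Col2_Desc"))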
Col1_Id Col2_Desc
1 554 partial-completion_3
2 4011 structure painted
3 5459 1 int mam-corrosion issue
4 996 cast iron countershock
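One caveat: `strsplit(..., split = " - ")` cuts at every `" - "`, so a description that itself contains `" - "` would yield three pieces for that row and break the two-column layout. Below is a sketch of a guarded variation (not the answer's exact code). It checks the piece counts first, then binds the pieces into a character matrix with `do.call(rbind, ...)` before converting to a data frame.

```r
x <- c("554 - partial-completion_3",
       "4011 - structure painted",
       "5459 - 1 int mam-corrosion issue",
       "996 - cast iron countershock")

pieces <- strsplit(x, split = " - ", fixed = TRUE)

# Stop early if any row did not split into exactly two pieces
if (any(lengths(pieces) != 2L)) {
  stop("some rows do not contain exactly one ' - ' separator")
}

# rbind() stacks the two-element vectors into an n x 2 character matrix
m <- do.call(rbind, pieces)
res <- setNames(as.data.frame(m, stringsAsFactors = FALSE),
                c("Col1_Id", "Col2_Desc"))
print(res)
```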
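It looks like akrun has already solved this, but in the future it is better to share your data in an easily reproducible way, such as with `dput()`, or by creating the data in the code you provide.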
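As a sketch of what that looks like: `dput()` prints R code that recreates an object, which others can paste straight into their session. Here the round trip is done programmatically with `capture.output` and `parse` to show that the printed code rebuilds the same data. The `df1` below is the question's data, recreated by hand.

```r
df1 <- data.frame(
  Col1 = c("554 - partial-completion_3",
           "4011 - structure painted",
           "5459 - 1 int mam-corrosion issue",
           "996 - cast iron countershock"),
  stringsAsFactors = FALSE
)

# This printed structure(...) call is what you would paste into a question
dput(df1)

# Capture the printed code as text and evaluate it again
code <- paste(capture.output(dput(df1)), collapse = "\n")
df_copy <- eval(parse(text = code))
```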