R - Reading a CSV file that has commas within a column
I have a CSV file with 6 columns. One of the columns contains text with embedded commas, e.g. BOLT, RD HD SQ SHORT NECK, METRIC. When I read this file in R, that column overflows and the remaining data shifts into new rows. Here are a few lines:

014003051906,ETN5080,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F
014003051906,ETN5967,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F
014003051906,ETN64267,0470,TILT UNIT SENSOR,1.000,F
014003065376,03M7184,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G
014003065376,03M7386,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G
014003065376,14M7296,0090,NUT, METRIC, HEX FLANGE,14.000,G

The last two lines are the problem: "NUT, METRIC, HEX FLANGE" should belong to a single variable.
How can I fix this?
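Before fixing anything, it can help to confirm which lines are affected. A minimal sketch using base R's count.fields(), which reports how many comma-separated fields each line has; the sample lines are typed in as an inline vector, and the names lines, n_fields and bad_lines are illustrative only:

```r
# Sample lines from the question; the description is the 4th of 6 fields.
lines <- c(
  "014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F",
  "014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F",
  "014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F",
  "014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G",
  "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G",
  "014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"
)

# Count the comma-separated fields on each line.
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",")
close(con)
print(n_fields)

# Lines with more than the expected 6 fields have commas inside the description.
bad_lines <- which(n_fields > 6)
print(bad_lines)
```

On the real file, count.fields("yourfile.csv", sep = ",") works the same way (the file name here is a placeholder).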
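How did you get this data? (Saved as CSV from Excel?) The best solution is to ask for the data in a format that quotes the fields or uses a different delimiter.
@Benjamin I had the same thought, but unfortunately this is our only source.
You could use regular expressions.
@Benjamin Excel is obnoxious in many ways, but it is at least polite enough to put quotes around strings that contain commas.
@HongOoi My mistake; my apologies to Excel. Extending your idea, you could put quotes around the fourth field and then call read.csv:
read.csv(text = gsub("^([^,]*,[^,]*,[^,]*,)(.*)(,[^,]*,[^,]*)$", "\\1\"\\2\"\\3", data), header = FALSE)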
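A self-contained sketch of the quoting idea from the comment above, with each capture group annotated. It assumes the description is always the 4th of 6 fields and never contains a double quote itself; colClasses = "character" and strip.white = TRUE are added choices (not from the comment) to keep leading zeros such as 014003051906 and 0450 and to trim the stray space in the second column.

```r
lines <- c(
  "014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F",
  "014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F",
  "014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F",
  "014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G",
  "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G",
  "014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"
)

# Capture groups:
#   \\1  ^([^,]*,[^,]*,[^,]*,)  first three fields, each followed by its comma
#   \\2  (.*)                   the description; greedy, so it may contain commas
#   \\3  (,[^,]*,[^,]*)$        the last two comma-free fields, anchored at the end
# The replacement wraps group 2 in double quotes so read.csv() sees one field.
quoted <- gsub("^([^,]*,[^,]*,[^,]*,)(.*)(,[^,]*,[^,]*)$", "\\1\"\\2\"\\3", lines)

df <- read.csv(text = quoted, header = FALSE,
               colClasses = "character", strip.white = TRUE)
print(dim(df))
print(df$V4)
```

Because (.*) is greedy but the third group must match exactly two trailing comma-free fields, the regex engine backtracks until only the last two fields remain outside the description.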
@Apom I'm not familiar with regex. Could you please explain the regex part?
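library(stringr)

# Read the raw lines (here from an inline string) without splitting on commas.
data <- readLines(con = textConnection("014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F
014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F
014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F
014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G
014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G
014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"))

# Three comma-free fields, then a greedy description (which may contain commas),
# then two comma-free fields anchored to the end of the line.
pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"

# str_match() returns the whole match in column 1, so drop it.
str_match(data, pattern)[, -1]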
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "014003051906" "ETN5080 " "0450" "BOLT KIT UPPER SHAFT WITH 5 SPEED" "1.000" "F"
# [2,] "014003051906" "ETN5967 " "0460" "SENSOR SENSOR FH BACKSHAFT SPEED" "1.000" "F"
# [3,] "014003051906" "ETN64267 " "0470" "TILT UNIT SENSOR" "1.000" "F"
# [4,] "014003065376" "03M7184 " "0020" "BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc" "4.000" "G"
# [5,] "014003065376" "03M7386 " "0090" "BOLT, RD HD SQ SHORT NECK, METRIC" "18.000" "G"
# [6,] "014003065376" "14M7296 " "0090" "NUT, METRIC, HEX FLANGE" "14.000" "G"
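If you would rather avoid regular expressions, a different technique is to split every line on all commas and paste the middle pieces back together. This is a sketch under the same assumption (exactly three fields before the description and two after it); the helper split_fixed() and the column names are hypothetical, since the question does not name the columns.

```r
lines <- c(
  "014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F",
  "014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F",
  "014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F",
  "014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G",
  "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G",
  "014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"
)

# Hypothetical helper: split on every comma, keep the first n_before and the
# last n_after pieces as-is, and re-join everything in between with commas.
split_fixed <- function(line, n_before = 3, n_after = 2) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  n <- length(parts)
  middle <- parts[(n_before + 1):(n - n_after)]
  c(parts[seq_len(n_before)],
    paste(middle, collapse = ","),
    parts[(n - n_after + 1):n])
}

rows <- lapply(lines, split_fixed)
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Hypothetical column names; the question does not name the columns.
names(df) <- c("order", "part", "line", "description", "qty", "code")
df$part <- trimws(df$part)      # drop the stray trailing space
df$qty <- as.numeric(df$qty)    # "18.000" -> 18
print(df)
```

One caveat: strsplit() drops a trailing empty string, so a line whose last field is empty would come out one piece short; the regex approaches above do not have that problem.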