SQL Server: reading a CSV with paired and unpaired quotation marks

sql-server regex r csv

I have a CSV file generated from MS SQL Server, and I am trying to read it into R. It contains data like the following:

# reproduce file
possibilities <- c('this is good','"this has, a comma"','here is a " quotation','')
newstrings <- expand.grid(possibilities,possibilities,possibilities,stringsAsFactors = F)
xwrite <- apply(newstrings,1,paste,collapse = ",")
xwrite <- c('v1,v2,v3',xwrite)
writeLines(xwrite,con = 'test.csv')

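This comes very close and may be good enough. It fails if a lone quotation mark sits next to a comma, because I assume a quotation mark next to a comma is the start or end of a string that really needs to be quoted.

rl <- readLines('test.csv')
rl <- gsub('([^,])(\")([^,])','\\1\\3',rl,perl = T)
writeLines(rl,'testfixed.csv')
read.csv('testfixed.csv')

# works if only comma OR unpaired quotation but not both
rl <- readLines('test.csv')
rl[grep('^[^\"]*\"[^\"]*$',rl)] <- sub('^([^\"]*)(\")([^\"]*)$','\\1\\3',rl[grep('^[^\"]*\"[^\"]*$',rl)])
writeLines(rl,'testfixed.csv')
read.csv('testfixed.csv')

I don't think there is a direct way to do this. Here I basically use strsplit with a comma as the separator, but first I handle the special delimiters ," and ", (a comma directly followed or preceded by a quotation mark).
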
lines <- readLines('test.csv')
## separate the quotation case
lines_spe <- strsplit(lines,',\"|\",')
nn <- sapply(lines_spe,length)==1
## the normal case
lines[nn] <- strsplit(lines[nn],',',perl=TRUE)
## aggregate the results
lines[!nn] <- lines_spe[!nn]
## bind to create a data.frame
dat <-
setNames(as.data.frame(do.call(rbind,lines[-1]),stringsAsFactors =F),
         lines[[1]])
## treat the special case of strsplit('some text without second part,',',')
dat[dat$v1==dat$v2,"v2"] <- ""
dat
# (the output below comes from a two-column sample with columns v1 and v2)
#                         v1                      v2
# 1             this is good            this is fine
# 2       this has no commas      this has, a comma"
# 3   this has no quotations  this has a " quotation
# 4 this field has something                        
# 5                          now the other side does
# 6       "this has, a comma  this has a " quotation
# 7         and a final line     that should be fine

> strsplit('aaa,',',')
[[1]]
[1] "aaa"

> strsplit(',aaa',',')
[[1]]
[1] ""    "aaa"
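
The two calls above show why the special case is needed: strsplit keeps a leading empty field but drops a trailing one. As a minimal sketch of one way around this, assuming it is acceptable to append an extra separator before splitting (the helper name split_keep_trailing is hypothetical and not part of the code above):

```r
# Hypothetical helper: split on a separator but keep a trailing empty field.
split_keep_trailing <- function(x, sep = ",") {
  # With one extra separator appended, the real last field is no longer
  # at the very end of the string, so strsplit() does not drop it.
  strsplit(paste0(x, sep), sep, fixed = TRUE)
}

split_keep_trailing("aaa,")[[1]]   # "aaa" ""
split_keep_trailing(",aaa")[[1]]   # ""    "aaa"
split_keep_trailing("a,b")[[1]]    # "a"   "b"
```

With every row split to the same number of fields this way, rbind would no longer recycle the first field into the second, so the dat$v1==dat$v2 patch might not be needed. This only addresses the trailing-field issue, not the quoting itself.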
