
Replace rogue double quotes in a vector in R


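I have a broken CSV file in which long text fields contain double quotes and commas. I have been able to clean it up to some extent, and I now have the tab-separated fields as a vector of whole lines (each value is one line).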

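I then write temp out to a file and read it back in (I found this to be much faster than textConnection). However,
read.table("temp", sep = "\t", quote = "\"", encoding = "UTF-8", colClasses = "character")
chokes on some lines and gives me this message: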

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : line 66951 did not have 29 elements
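Before touching the quotes, it can help to confirm which lines really have the wrong number of tab-separated fields. A minimal sketch, assuming temp is the character vector of lines; the helper name count_tab_fields and the example lines are hypothetical. It counts tabs directly because strsplit() silently drops trailing empty fields, and lines like these often end in empty fields.

```r
# Hypothetical helper: number of tab-separated fields per line,
# computed as (number of tabs + 1). Quotes are ignored entirely.
count_tab_fields <- function(lines) {
  tabs <- gregexpr("\t", lines, fixed = TRUE)
  # regmatches() returns character(0) for a line with no tab,
  # so lengths() reports 0 tabs for it
  lengths(regmatches(lines, tabs)) + 1L
}

# Made-up lines with 3 expected fields; the second one is short
temp <- c("1\ta\tb", "2\tc", "3\t\"d\"\t")
n_fields <- count_tab_fields(temp)
print(n_fields)
print(which(n_fields != 3))
```

If every line reports the expected count (29 in the question) but read.table() still fails, the tabs are fine and the problem lies in how the reader interprets the quotes.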

I think this is caused by rogue double quotes, such as the one that appears immediately after "TripAdvisor de la sant?" in the offending line.

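I would like to replace the rogue double quotes with single quotes, but I have to keep the intended ones. An intended quote sits immediately before or after a delimiter (tab), at the start of a line (which only applies to the first field) or at the end of a line. I wrote the following regex attempt, with lookarounds for the tab and for the start and end of the line, but it does not work: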

temp <- gsub("(?<![^\t])\"(?![\t$])", "'", temp, perl = T)
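Your (?<![^\t]) lookbehind matches a " that is not preceded by any character other than a tab (so the " must come right after a tab or at the start of the string), and your (?![\t$]) lookahead fails the match if the " is followed by a tab or by a literal $ symbol.

Inside a character class, ^ and $ lose their anchoring meaning: a leading ^ negates the class, and $ is just a dollar sign. As a result, your pattern replaces the intended quotes that open a field and leaves the rogue ones in the middle of a field untouched.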
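A quick check in an R session makes both points concrete. The strings below are made up for illustration; x stands in for one line whose second field is quoted and contains a single rogue quote after "b".

```r
# Inside a character class, "$" is a literal dollar sign, not an anchor
grepl("a[$]", "a", perl = TRUE)    # FALSE: no "$" character follows "a"
grepl("a[$]", "a$", perl = TRUE)   # TRUE: matches the literal "$"
grepl("a$", "a", perl = TRUE)      # TRUE: outside a class, "$" anchors the end

# Made-up line: field 2 is "b"c" with one rogue quote after b
x <- "a\t\"b\"c\"\td"
# The original pattern replaces the intended opening quote (it follows a tab)
# and leaves the rogue quote inside the field alone
gsub("(?<![^\t])\"(?![\t$])", "'", x, perl = TRUE)
```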

Replace the character classes with alternation groups:

gsub("(?<!\t|^)\"(?!\t|$)", "'", temp, perl=TRUE)
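A minimal sketch of the same pattern wrapped in a function and applied to a couple of made-up lines; the name fix_rogue_quotes is hypothetical. (?<!\t|^) rejects a quote that follows a tab or sits at the start of the string, and (?!\t|$) rejects one that precedes a tab or sits at the end, so only quotes inside a field are turned into single quotes.

```r
# Hypothetical wrapper around the answer's pattern: replace every double quote
# that is neither preceded by a tab/start of string nor followed by a tab/end
fix_rogue_quotes <- function(lines) {
  gsub("(?<!\t|^)\"(?!\t|$)", "'", lines, perl = TRUE)
}

x <- c("a\t\"b\"c\"\td",                  # rogue quote after b
       "\"He said \"hi\" there\"\t1")     # two rogue quotes around hi
fix_rogue_quotes(x)
```

One limitation worth keeping in mind: a rogue quote that happens to sit right next to a tab, or at the very start or end of a line, looks exactly like a legitimate delimiter quote to this pattern and is left in place.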
temp <- gsub("(?<![^\t])\"(?![\t$])", "'", temp, perl = T)
temp[181]
[1] "198\torganizations/playfusion\tplayfusion\torganizations/playfusion\torganization/playfusion\tPlayFusion\t\tPlayFusion is a developer of computer games.\tPlayFusion is pioneering the next generation of connected interactive entertainment. PlayFusion's proprietary technology platform fuses video games, robotics, toys, and trans-media entertainment. The company is currently working on its own original IP to trail-blaze its vision ahead of opening its platform to others.    PlayFusion is an independent, employee-owned company with offices in Cambridge and Derby in the UK, Douglas in the Isle of Man, and New York and San Francisco in the USA.\thttp://public.crunchbase.com/t_api_images/v1475688372/xnhrd4t254pxj6yxegzt.png\tcompany\t\t\t\t\t2015-01-01\t4\tFALSE\t\t0\t11\t50\t\t\t0\t0\thttp://playfusion.com/#intro\t1475688521\t1475899292"
gsub("(?<!\t|^)\"(?!\t|$)", "'", temp, perl=TRUE)
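A minimal sketch of the read-back step once the quotes are fixed, under a few assumptions: the made-up lines stand in for the cleaned temp vector, the path comes from tempfile() instead of the literal "temp" file, and comment.char = "" is an addition not present in the question. read.table() treats "#" as a comment character by default, so an unquoted field such as http://playfusion.com/#intro in the line above would otherwise be cut off at the "#", which could also leave that line short of its 29 fields.

```r
# Made-up cleaned lines: 3 fields each, one field containing "#"
clean_lines <- c("1\t\"He said 'hi' there\"\tx#y",
                 "2\t\"plain\"\tz")
path <- tempfile(fileext = ".tsv")
writeLines(clean_lines, path)

# comment.char = "" stops "#" from truncating a line (assumption: the
# real data has no genuine comment lines)
df <- read.table(path, sep = "\t", quote = "\"", comment.char = "",
                 encoding = "UTF-8", colClasses = "character")
dim(df)       # 2 rows, 3 columns
df$V2[1]      # single quotes survive because only " is a quote character
df$V3[1]      # "x#y" is kept intact
```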