Convert "" in CSV to \"

csv sed awk amazon-redshift


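From an external source I receive huge CSV files (about 16 GB) in which fields are optionally enclosed in double quotes ("). Fields are separated by a semicolon (;). When a field contains a double quote in its content, it is escaped as two double quotes ("").

Currently I import these files into a MySQL database, which understands the semantics of "".

I am considering migrating to Amazon Redshift, but they (or perhaps PostgreSQL in general) require quotes to be escaped with a backslash, as \".

Now I am looking for the fastest command-line tool (probably awk or sed?) and the exact syntax to convert the files.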
Example input:

"""start of line";"""beginning "" middle and end """;"end of line"""
12345;"Tell me an ""intelligent"" joke; I tell you one in return"
54321;"Your mom is ""nice"""
"";"";""
"However, if;""Quotes""; are present"

Example output:

"\"start of line";"\"beginning \" middle and end \"";"end of line\""
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""
"";"";""
"However, if;\"Quotes\"; are present"

Edit: added more tests.

I would use sed, as you suggested in your post:

$ sed 's@""@\\"@g' input
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""

I would go with sed:

$ sed 's:"":\\":g' your_csv.csv

When testing it on the following case:

new """ test "" "hows "" this "" "

I get:

new \"" test \" "hows \" this \" "

This one-liner should work:

sed 's/""/\\"/g' file
Using sed:

sed 's/""/\\"/g' input_file
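Test:

There are a few edge cases to watch out for:

What if a string begins with a double quote?
What if that string is the first field?

Fields containing the empty string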
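
To make these edge cases concrete, here is a small sketch that feeds two such lines through the plain substitution. `naive_convert` is a hypothetical wrapper name used only for illustration, not something from the answers.

```shell
# naive_convert: hypothetical wrapper around the plain substitution
naive_convert() {
    sed 's/""/\\"/g'
}

# An empty first field followed by a field with inner quotes
printf '%s\n' '"";"a ""b"" c"' | naive_convert
# prints: \";"a \"b\" c"    (the empty field "" was turned into \")

# A field whose content starts with a double quote
printf '%s\n' '"""x"' | naive_convert
# prints: \""x"    (wanted: "\"x")
```

Both results differ from what a backslash-escaping loader would expect, which is why the plain one-liner is only safe when the data contains neither empty quoted fields nor fields that start with a quote.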

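sed is an efficient tool, but processing a 16 GB file will take a while. You should have at least 16 GB of free disk space to write the updated file (even sed's -i in-place edit uses a temporary file behind the scenes).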
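
Given the disk-space point above, one option is to write the converted copy to a separate file and rename it only once sed has finished successfully. A minimal sketch, assuming the plain substitution is sufficient for your data: `convert_file`, `src`, and `dst` are hypothetical names, and `LC_ALL=C` is an optional setting that often (not always) speeds up sed on large files by treating the input as plain bytes.

```shell
# convert_file SRC DST: hypothetical helper, names are illustrative only.
# Writes to DST.tmp first so a failed or interrupted run never leaves a
# half-written DST behind.
convert_file() {
    src=$1
    dst=$2
    LC_ALL=C sed 's/""/\\"/g' "$src" > "$dst.tmp" && mv "$dst.tmp" "$dst"
}

# Usage (paths are placeholders):
# convert_file input.csv output.csv
```

This still needs room for a second full-size file, just like sed -i does, but the original stays untouched until you decide to delete it.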

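Thanks, I added your special cases to the test input. It works. Copying the 16 GB file takes 3m21s and writing the file takes 4m34s, which is very fast!

Empty fields in the middle, yes. For an empty first or last field, no. I think that without lookbehind, sed may not be able to handle it.

As mentioned above, on August 9 AWS announced that they are "adding support for standard CSV double-quote escaping". See the announcement post. I believe the problem you ran into can now be handled natively.

On August 9, AWS announced that they are "adding support for standard CSV double-quote escaping". See the announcement post. If you are able to test and verify this new feature, I would be interested in your results.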
$ cat n.txt
12345;"Tell me an ""intelligent"" joke; I tell you one in return"
54321;"Your mom is ""nice"""
$ sed 's/""/\\"/g' n.txt
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""

sed -r '
# at the start of a line or the start of a field,
# replace """ with "\"
s/(^|;)"""/\1"\\"/g
# replace any doubled double-quote with an escaped double-quote.
# this affects any "inner" quote pair as well as end of field or end of line.
# if there is an escaped quote from the previous command, do not be fooled by
# a quote that follows it.
s/([^\\])""/\1\\"/g
# the above step will destroy empty strings. fix them here. this uses a
# conditional loop: if there are 2 consecutive empty fields, they will
# share a delimiter, so we have to process the line more than once
:fix_empty_fields
s/(^|;)\\"($|;)/\1""\2/g
tfix_empty_fields
' <<'END'
"""start of line";"""beginning "" middle and end """;"end of line"""
"";"";"";"""";"""""";"";""
END

"\"start of line";"\"beginning \" middle and end \"";"end of line\""
"";"";"";"\"";"\"\"";"";""