Regex: How to remove the extra double quotes, other than the opening and closing double quotes, in a text line using a bash script

I have a text file that I want to copy into a CSV file and then load into a PostgreSQL table.

My input text file is (old_sample.txt):

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

I have used the code below:

cat old_sample.txt
printf "\n"
echo "____________________________________"
printf "\n"
cat old_sample.txt | sed ': again
s/\("[^",]*\)"\([^",]*"\)/\1\2/g
t again
s/""/"/g' 
The output is:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

____________________________________

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00",","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

The problem is LUBE, OIL & FILTER - DEXOS "1"

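In "1", the double quote is not removed because of the comma in the field, but "2019"0627" works fine. So I want to remove every double quote that appears inside a string enclosed in opening and closing double quotes. Otherwise the import fails with a database error.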

This is my code:

nl -ba -nln -s, < old_sample.txt | sed ': again
                                      s/\("[^",]*\)"\([^",]*"\)/\1\2/g
                                      t again' | grep 'SVCPTS' > old_sample.csv
psql_local <<SQL || die "Failed to import parts data"
        \copy sample_table from 'old_sample.csv' with (format csv, header false)
SQL

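Personally, if I were doing this, I would reach for a utility. I think you could make it work by finding the right regex, but it would likely get very complicated.

Using a dedicated CSV utility looks much easier. It will also be more reliable if you need to reuse this script with other data in the future (some fields may contain newlines, or there may be other cases you need to account for).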

Would you please try the following:

while IFS= read -r str; do          # assign a variable "str" to a line
    while true; do                  # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                    # if there is no change, exit the loop
        str="$str2"                 # update "str" for next iteration
    done
    echo "$str"
done < "old_sample.txt"
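  • The regex
    \([^,]\)"\([^,]\)
    matches a double quote that has a non-comma character on both sides
  • It loops until all the extra double quotes are removed
  • The script above works for the provided example, but it may not be robust enough for arbitrary input. As chrisputnam9 suggests, a tool that can parse CSV files is recommended for reliable results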
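
The same loop can also run inside a single sed process, using a label and the t command, which avoids starting a new sed for every pass over every line. This is only a minimal sketch that assumes LF line endings; the function name strip_inner_quotes and the sample record are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every double quote that has a non-comma
# character on both sides, repeating until no substitution succeeds.
strip_inner_quotes() {
    sed -e ':again' \
        -e 's/\([^,]\)"\([^,]\)/\1\2/g' \
        -e 't again'
}

# Made-up sample record with a stray pair of quotes around the 1.
printf '%s\n' 'SVCOP,"1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","","N"' |
    strip_inner_quotes
# prints: SVCOP,"1DEX","LUBE, OIL & FILTER - DEXOS 1","I","","N"
```

Like the shell loop, this leaves a quote in place whenever it sits directly next to a comma inside a field, so it is not a general CSV repair.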
[Edit] If your file has CR+LF line endings, please try this instead:

while IFS= read -r str; do      # assign a variable "str" to a line
    while true; do              # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                # if there is no change, exit the loop
        str="$str2"             # update "str" for next iteration
    done
#   echo "$str"                 # add LF at the end of the output line
    printf '%s\r\n' "$str"      # add CR+LF; printf leaves backslashes in the data untouched
done < <(tr -d "\r" < "VehSer_NEWM11_test.txt")
                                # remove CR code
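
Starting a new sed for every pass over every line gets slow on large files. One possible alternative is to strip the CR once, run the loop inside a single sed, and put CR+LF back at the end. This is a sketch under the assumptions that every line ends in CR+LF and that awk is available; the function name and the output file name cleaned.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CR+LF text on stdin, writes cleaned CR+LF text.
strip_inner_quotes_crlf() {
    tr -d '\r' |                        # drop CR so it cannot count as a non-comma character
        sed -e ':again' \
            -e 's/\([^,]\)"\([^,]\)/\1\2/g' \
            -e 't again' |              # repeat the substitution until nothing changes
        awk '{ printf "%s\r\n", $0 }'   # restore CR+LF line endings
}

# Possible usage with the file name from the script above
# (cleaned.txt is a made-up output name):
# strip_inner_quotes_crlf < VehSer_NEWM11_test.txt > cleaned.txt
```

Because the whole file flows through one pipeline, each tool starts only once regardless of the number of lines.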

I couldn't do it in one command, so I did this:

 $ sed "s/['\"]//g; s/,/\",\"/g; s/\",\" /, /g; s/,,/,\"\",/g; s/$/\"/; s/\"//" file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I,0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS 1","91","LANE","LANE","L,LA MERE","125.00","125.00,"",0.00","0.00","0,0","0,||||||||||||||||||||||||","N"
If you need to keep the "1":

$ sed 's/"//g; s/,/","/g; s/"," /, /g; s/,,/,"",/g; s/$/"/; s/"//' file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

The problem statement and the sample show the field "LUBE, OIL & FILTER - DEXOS "1" differently. Which one is it?

"LUBE, OIL & FILTER - DEXOS "1" is the one. Actually, my text file does not contain just one line; it has many lines. With this code bash hangs when there is more than one line, but it works fine with a single line.

@SherinGreen Thank you for the feedback. I have updated the code to process the file line by line so that it supports multiple lines. Could you try it? If it still doesn't work, I would appreciate a multi-line example (not the full lines, just a minimal one) that breaks my code. Thanks in advance. BR.

Hi, I have tried this code with the input file, but the process hangs. I will share the text file. Share link -->

@SherinGreen Thank you for sharing your file. It helped me a lot. I noticed that your file has CR+LF line endings, which I had not assumed. I have added a new version that supports CR+LF line endings. It seems to work in my environment, although it takes a few minutes to complete. Could you test it again? Sorry for the inconvenience, and thank you for your support.

Hi, I have tested the code above, but the process never ends and the lines are printed to the terminal multiple times.

perl -pe '1 while s/([^,])"([^,\r])/$1$2/g' VehSer_NEWM11_test.txt