Cleaning up different phone number formats in CSV files


Suppose there are CSV files holding phone numbers in several different formats, as shown below.

Here is the first CSV file, with lines like this:

"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","4061234567" // Need to separate area code from the rest

And here is another CSV file, with lines like this:

"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567" // Need to remove the dash in the seven-digit phone number
Is there some sed one-liner that converts these to the following common format?

"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
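I would prefer a one-liner, but if it has to be more than one line, so be it. Also, it does not have to be sed. I just thought sed might be easier, although I have not come up with a sed solution yet.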

$ cat tst.awk
BEGIN { FS=OFS="\",\"" }
{
    if (NR==1) {
        $3 = "NPA"
        $4 = "TELNO\""
    }
    else {
        gsub(/-/,"",$NF)
        if (NF==3) {
            sub(/.{3}/,"&"OFS,$NF)
        }
    }
    print
}

$ cat file1
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","4061234567"

$ awk -f tst.awk file1
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"

$ cat file2
"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567"

$ awk -f tst.awk file2
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
And here is some input you did not ask about but which may occur and, if it does, will be handled correctly:

$ cat file3
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","406-1234-567"

$ awk -f tst.awk file3
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
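The script above tests NR==1, so the header is rewritten only on the first line of the whole input, which suits running it on one file at a time. As a minimal sketch, assuming you wanted to pass both files in a single call, the same logic could test FNR==1 instead. The file names full.csv and split.csv and the scratch directory are made up for illustration, and /.../ is used in place of /.{3}/ only so the sketch also runs in awks without interval expressions:

```shell
# Hypothetical demo files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '"Name","Address","FullPhone"' \
              '"Mike Wise","101 Abc Drive","4061234567"' > "$tmp/full.csv"
printf '%s\n' '"Name","Address","Areacode","Phone"' \
              '"Mike Wise","101 Abc Drive","406","123-4567"' > "$tmp/split.csv"

# Same logic as tst.awk, but FNR==1 matches the first line of every file.
out=$(awk 'BEGIN { FS=OFS="\",\"" }
FNR==1 { $3 = "NPA"; $4 = "TELNO\""; print; next }
{
    gsub(/-/,"",$NF)                    # remove dashes from the last field
    if (NF==3) sub(/.../,"&"OFS,$NF)    # split off the three-digit area code
    print
}' "$tmp/full.csv" "$tmp/split.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Each file's header is rewritten, so the combined output contains the common header once per input file.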
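If you need to remove spaces from the input phone numbers rather than just the -s, then simply change
gsub(/-/,"",$NF)
to
gsub(/[-[:space:]]/,"",$NF)
or
gsub(/[^0-9"]/,"",$NF)
or similar. The " stays in the bracket expression because, with this FS, the record's closing double quote is part of $NF and must not be stripped.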
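To illustrate why the closing double quote matters: with FS set to "," the last field still carries the record's final ", so a pattern that deletes every non-digit would delete that quote as well. A small sketch with a made-up input line, assuming phone numbers written with parentheses, spaces, and dashes (/.../ stands in for /.{3}/ so it also runs in awks without interval expressions):

```shell
# Made-up sample line; the phone field mixes parentheses, a space and a dash.
line='"Mike Wise","101 Abc Drive","(406) 123-4567"'

out=$(printf '%s\n' "$line" | awk 'BEGIN { FS=OFS="\",\"" }
{
    gsub(/[^0-9"]/,"",$NF)              # keep only digits and the closing quote
    if (NF==3) sub(/.../,"&"OFS,$NF)    # split off the three-digit area code
    print
}')

printf '%s\n' "$out"
```

This prints "Mike Wise","101 Abc Drive","406","1234567" with the closing quote intact.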

sed '1 c\
"Name","Address","NPA","TELNO"
     s/"\([0-9]\{3\}\)\([0-9]\{7\}\)"[[:space:]]*$/"\1","\2"/
     s/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/
     ' YourFile
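This will work for both CSV file formats (and, following @EdMorton's remark, for the header as well).

  • 1 c\ : changes the first line to the line that follows (forcing this header in place of the original one)
  • the first s/// uses the group feature of s/// to change any line that ends with 3 digits followed by 7 digits (so 10 digits in one block) enclosed in double quotes into two separate quoted field values of 3 and 7 digits
  • the second s/// uses the group feature (referenced by \1) to change a trailing - followed by 1 to 6 digits and the closing double quote into the same digits and quote without the -

The first s/// leaves lines of the second sample alone (nothing matches the pattern), the second s/// leaves lines of the first sample alone (same reason), and it also leaves lines already changed by the first s/// alone (again the same reason).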
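To see what each substitution does on its own, here is a small sketch that runs them separately on the two sample data lines from the question. The variable names full and split are only for illustration:

```shell
# First substitution alone: a quoted 10-digit value at the end of the line
# is captured as 3 digits (\1) and 7 digits (\2) and rewritten as two fields.
full=$(printf '%s\n' '"Mike Wise","101 Abc Drive","4061234567"' |
    sed 's/"\([0-9]\{3\}\)\([0-9]\{7\}\)"[[:space:]]*$/"\1","\2"/')

# Second substitution alone: a dash followed by 1 to 6 digits and the closing
# quote at the end of the line is replaced by the digits (\1) and the quote.
split=$(printf '%s\n' '"Mike Wise","101 Abc Drive","406","123-4567"' |
    sed 's/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/')

printf '%s\n' "$full" "$split"
```

Both lines come out as "Mike Wise","101 Abc Drive","406","1234567". In the second command the match is -4567", which \1" turns into 4567". The 123 in front of the dash is never part of the match, so it stays as it is.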

@EdMorton Yes, they are in two different files, and each file has one of the two formats. Thanks for reminding me of that.

The lists are in two different CSV files. I would also like to change the respective headers to a common header with "NPA" and "TELNO".

@EdMorton Very good remark, I will change the code for this. Sorry, I forgot the digit capture in the first substitution; the correction was a bit tricky, but now it works fine.

Can you elaborate on the second sed regex? I do not quite understand it yet. Thanks.