Cleaning up different phone number formats in CSV files
Suppose I have phone numbers in CSV files in a couple of different formats, as shown below.
The first CSV file has rows like this:
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","4061234567" // Need to separate area code from the rest
The other CSV file has rows like this:
"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567" // Need to remove the dash in the seven-digit phone number
Is there a sed one-liner that would convert both of them to the following common format?
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
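I would prefer a one-liner, but if it has to be more than one line, so be it. It also does not have to be sed. I just thought sed might be easier, although I have not come up with a sed solution yet.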
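$ cat tst.awk
BEGIN { FS=OFS="\",\"" }            # split and rejoin fields on the 3 characters ","
{
    if (NR==1) {
        # header line: replace the phone column(s) with the common names
        $3 = "NPA"
        $4 = "TELNO\""
    }
    else {
        gsub(/-/,"",$NF)            # remove every dash from the last field
        if (NF==3) {
            # 10-digit number in one field: split off the 3-digit area code
            sub(/.{3}/,"&"OFS,$NF)
        }
    }
    print
}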
$ cat file1
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","4061234567"
$ awk -f tst.awk file1
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
$ cat file2
"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567"
$ awk -f tst.awk file2
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
Here is some input you did not ask about but that could occur and, if it does, is handled correctly:
$ cat file3
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","406-1234-567"
$ awk -f tst.awk file3
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
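If you need to remove spaces from the input phone numbers rather than just the dashes, change gsub(/-/,"",$NF) to gsub(/[-[:space:]]/,"",$NF), or to gsub(/[^0-9"]/,"",$NF) or similar. Note that $NF still carries the closing double quote of the line, so a "digits only" bracket expression has to keep the " as well.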
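As a quick check of the broader character class, the sketch below runs the same logic inline on a row whose number contains both a space and a dash. The `clean_phone` function name and the sample row are illustrative assumptions, not part of the original answer, and the area-code split is written as `[0-9][0-9][0-9]` instead of `.{3}` only so that it also runs on awk builds without interval expressions.

```shell
# Hypothetical helper: same logic as tst.awk, but strips dashes AND whitespace.
clean_phone() {
    awk '
    BEGIN { FS = OFS = "\",\"" }
    NR == 1 { $3 = "NPA"; $4 = "TELNO\""; print; next }   # common header
    {
        gsub(/[-[:space:]]/, "", $NF)                      # drop dashes and whitespace
        if (NF == 3) sub(/^[0-9][0-9][0-9]/, "&" OFS, $NF) # split off the area code
        print
    }'
}

# Made-up sample input in the FullPhone layout.
printf '%s\n' \
    '"Name","Address","FullPhone"' \
    '"Mike Wise","101 Abc Drive","406 123-4567"' | clean_phone
# prints:
# "Name","Address","NPA","TELNO"
# "Mike Wise","101 Abc Drive","406","1234567"
```

The substitution only touches the last field, so spaces inside the name and address columns are left alone.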
sed '1 c\
"Name","Address","NPA","TELNO"
s/"\([0-9]\{3\}\)\([0-9]\{7\}\)"[[:space:]]*$/"\1","\2"/
s/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/
' YourFile
This works for both CSV file formats (and, following @EdMorton's remark, for the header as well).
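- 1c\ changes the first line to the line that follows it in the script (forcing this line in place of the original header).
- The first s/// matches any line that ends in 10 digits in a single double-quoted field and, using the group feature of s///, rewrites it as two double-quoted fields of 3 and 7 digits.
- The second s/// matches a trailing - followed by 1 to 6 digits and the closing double quote and, using the group feature (referenced by \1), rewrites it as the same digits and closing quote without the -.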
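To make the second substitution easier to follow, here is a small sketch that applies it on its own to two sample values taken from this page. The `strip_last_dash` wrapper name is hypothetical; the sed expression inside it is copied unchanged from the script above.

```shell
# Hypothetical wrapper around the second substitution only.
# -                 a literal dash
# \([0-9]\{1,6\}\)  1 to 6 digits, captured as group \1
# "[[:space:]]*$    the closing quote, optional trailing whitespace, end of line
# Replacement \1"   puts back the digits and the quote, dropping the dash.
strip_last_dash() {
    sed 's/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/'
}

printf '%s\n' '"Mike Wise","101 Abc Drive","406","123-4567"' | strip_last_dash
# prints: "Mike Wise","101 Abc Drive","406","1234567"

printf '%s\n' '"Mike Wise","101 Abc Drive","406-1234-567"' | strip_last_dash
# prints: "Mike Wise","101 Abc Drive","406-1234567"
```

Because the pattern is anchored to the end of the line and has no g flag, it removes only the last dash. A value with several dashes, like the second sample, would need the substitution repeated or a different approach.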
@EdMorton Yes, they are in two different files, and each file has one of the two formats. Thanks for reminding me of that.
The lists are in two different CSV files. I would also like to change the respective headers to a common header with "NPA" and "TELNO".
@EdMorton Very good remark, I will change the code for that. Sorry, I forgot the digit capture in the first substitution. Correcting it was a bit tricky, but it works fine now.
Could you elaborate on the second sed regex? I do not quite understand it yet. Thanks.