How to use a shell script/awk/gawk/sed to remove everything after a specific word (.com) in the first column and move the port number into the next column


shell awk sed scripting


I have the sample CSV file below.

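I need to remove everything after .com in the first column (Host). If the Host in the first column contains a port number, that number should be printed in the second column (Port). Also remove http:// and https://. The full file is almost 12 KB; here is a sample. Sample CSV file: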

    Host                                                             Port
https://abcd03.face.op.api.example.com/v1/authent/token?grant_type,443 
https://defghu04.core.op.api.example.com/hello1/v4/tokens,443
https://abcdo3.xyz.def.tata.com/v1/xyz/accesstoken?grant_type,443
https://abcdef.clever.api.sell.com/samsung/v1/managements/autoPayments,443
https://abcdefe.orsd.api.ssample.com/auth/v1/customer-management/interacting,443
http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80
http://century.test.ext.extra.com:6102/ABC2/proxy/sales,80
http://century.test.ext.sell.com:6550/commerce/1.x/transactionProcessor,80
https://century.test.ext.basic.com:6446/tokenize,443
https://sell.test.ext.state.com:6446/transfer,443
https://century.test.ext.sell.com:6446/delete,443

Expected result:

abcd03.face.op.api.example.com,443
defghu04.core.op.api.example.com,443
abcdo3.xyz.def.tata.com,443
abcdef.clever.api.sell.com,443
abcdefe.orsd.api.ssample.com,443
century.test.ext.sample.com,6102
century.test.ext.extra.com,6102
century.test.ext.sell.com,6550
century.test.ext.basic.com,6446
sell.test.ext.state.com,6446
century.test.ext.sell.com,6446

Thanks in advance for your help.

Assuming your actual input file looks like the sample shown, please try the following:

awk '
BEGIN{
  FS=OFS=","
}
match($0,/\/\/.*\.com:[0-9]+/){
  val=substr($0,RSTART+2,RLENGTH-2)
  sub(/:/,",",val)
  print val
  next
}
match($0,/\/\/.*\.com[^/]*/){
  val=substr($0,RSTART+2,RLENGTH-2)
  print val,$NF
}
'  Input_file

Explanation: here is a detailed explanation of the above.

awk '                                      ##Start the awk program.
BEGIN{                                     ##BEGIN block runs before any input is read.
  FS=OFS=","                               ##Set the input and output field separators to a comma.
}
match($0,/\/\/.*\.com:[0-9]+/){            ##Match from // through .com:PORT; if found, do the following.
  val=substr($0,RSTART+2,RLENGTH-2)        ##Take the matched text without the leading //.
  sub(/:/,",",val)                         ##Replace the colon with a comma in val (host:port -> host,port).
  print val                                ##Print val.
  next                                     ##Skip the remaining rules for this line.
}
match($0,/\/\/.*\.com[^/]*/){              ##Match from // through .com plus any following non-/ characters.
  val=substr($0,RSTART+2,RLENGTH-2)        ##Take the matched text without the leading //.
  print val,$NF                            ##Print val and the last field (the Port column).
}
' Input_file                               ##Name of the input file.
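In the comments the asker confirms that only .com and .org hosts need handling. Below is a minimal sketch of the same match()/substr() approach with the TLD widened to (com|org). It also trims trailing blanks/CR from each line, since the first sample row ends in a space that would otherwise be copied into the output. The function name host_port_com_org is hypothetical, and the sketch assumes a POSIX awk such as gawk or mawk.

```shell
#!/usr/bin/env bash
# Sketch: host_port_com_org is a hypothetical helper; it reads the CSV on stdin.
host_port_com_org() {
  awk '
  BEGIN { FS = OFS = "," }
  { sub(/[ \t\r]+$/, "") }                            # trim trailing blanks/CR (fields are re-split)
  match($0, /\/\/[^\/]*\.(com|org):[0-9]+/) {         # //host.com:PORT or //host.org:PORT
    val = substr($0, RSTART + 2, RLENGTH - 2)         # drop the leading //
    sub(/:/, ",", val)                                # host:port -> host,port
    print val
    next
  }
  match($0, /\/\/[^\/,]*\.(com|org)/) {               # no port in the URL
    print substr($0, RSTART + 2, RLENGTH - 2), $NF    # host plus the existing Port column
  }
  '
}
```

Usage: `host_port_com_org < Input_file`. Lines with no `//host.com` or `//host.org` (such as the header) produce no output.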

With bash, please try:

while IFS=, read -r url port; do
    if [[ $url =~ https?://([^/:]+)(:([0-9]+))? ]]; then
        [[ -n ${BASH_REMATCH[3]} ]] && port="${BASH_REMATCH[3]}"
        # if the port number is included in the url, replace the 2nd field with it
        echo "${BASH_REMATCH[1]},$port"
    fi
done < file.csv
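The same loop can be wrapped in a function so it can be fed any stream and checked against the sample rows. As a small hedge against the sample's trailing space (or Windows CRLF line endings), this version also trims whitespace from the port field. host_port_bash is a hypothetical name; bash is required for `[[ =~ ]]` and `BASH_REMATCH`.

```shell
#!/usr/bin/env bash
# Sketch: host_port_bash is a hypothetical helper; it reads the CSV on stdin.
host_port_bash() {
  local url port
  while IFS=, read -r url port; do
    port=${port%%[[:space:]]*}                 # trim trailing space/CR, e.g. "443 " -> "443"
    if [[ $url =~ https?://([^/:]+)(:([0-9]+))? ]]; then
      # BASH_REMATCH[1] = host, BASH_REMATCH[3] = port from the URL (empty if absent)
      [[ -n ${BASH_REMATCH[3]} ]] && port=${BASH_REMATCH[3]}
      printf '%s,%s\n' "${BASH_REMATCH[1]}" "$port"
    fi
  done
}
```

Usage: `host_port_bash < file.csv`. The header row contains no `http(s)://`, so it is skipped.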

This might work for you (GNU sed):

sed -E 's#^https?://##;s#/[^,]*##;s/:([^,]*).*/,\1/' file

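Remove the leading http:// or https://.

Remove the path (everything from the first / up to the comma).

If the URL already contains a port, replace the second column with it.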
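To see what each of the three substitutions does, the sketch below applies them one at a time to a single row from the sample. The comments show the intermediate result after each step.

```shell
# Apply the three sed expressions separately to one sample row.
line='http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80'

step1=$(printf '%s\n' "$line"  | sed -E 's#^https?://##')     # century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80
step2=$(printf '%s\n' "$step1" | sed -E 's#/[^,]*##')          # century.test.ext.sample.com:6102,80
step3=$(printf '%s\n' "$step2" | sed -E 's/:([^,]*).*/,\1/')   # century.test.ext.sample.com,6102

printf '%s\n' "$step1" "$step2" "$step3"
```

For a row without a port, the third expression finds no colon and leaves the line unchanged.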

Alternative:

sed -E 's#^https?://(([^:]*):([^/]*).*(,).*|([^/]*)/.*(,.*))#\2\4\3\5\6#' file
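The alternative uses one regex with two branches, and the replacement concatenates the groups of both; groups from the branch that did not match expand to nothing in GNU sed. A short demo on one row with a port and one without follows, assuming GNU sed as the answer does. The wrapper name alt is hypothetical.

```shell
# alt: hypothetical wrapper around the alternative sed expression (GNU sed assumed).
# Branch 1, URL has :PORT -> \2 = host, \4 = ",", \3 = port         => host,PORT
# Branch 2, no port       -> \5 = host, \6 = "," + the Port column  => host,Port
alt() { sed -E 's#^https?://(([^:]*):([^/]*).*(,).*|([^/]*)/.*(,.*))#\2\4\3\5\6#'; }

with_port=$(printf '%s\n' 'http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80' | alt)
no_port=$(printf '%s\n' 'https://abcdef.clever.api.sell.com/samsung/v1/managements/autoPayments,443' | alt)
printf '%s\n' "$with_port" "$no_port"
```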

Please show us your attempts at a solution. Is the domain always .com? Couldn't it sometimes be .org, .edu, etc.? Always consider these things, and how they should be handled.

@mathguy Yes, only .com and .org need to be handled. @Beta For the first condition I have tried sed 's/com.*/com/' | sed 's/org.*/org/', but it deletes the other column; for the third condition I am using sed -E 's_^https://, which I added in the comments. The second condition has been exhausting to search for; I spent almost a week on it with no luck.

Thanks @RavinderSingh13. Sure, I will keep at it. I have taken your advice; I am now using the awk and sed commands, and I got the explanation too.

@KalpanaPinninty, you're welcome, cheers. Happy learning on this great forum; keep learning and keep posting good questions and answers.

A general question that just occurred to me, in case someone refers to this answer: sometimes the host will be an IP address, e.g. 10.20.34.459034. In that case we can add a third match() rule to the awk program, after the two .com rules, that matches the IP-address host and prints val,$NF using the same substr($0,RSTART+2,RLENGTH-2) extraction.

@KalpanaPinninty, that would be a completely different question: your sample would change too, and future users may be confused since there are other answers. Could you please post a new question for it, including your efforts? Cheers, let me know once it's done and we'll discuss it there.

I thought the same, thanks for your input. Cheers.
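For the IP-address case raised in the comments, one option (a sketch, not the approach used in the answers above) is to stop keying on .com altogether and treat everything between :// and the next / as host[:port]. That works for any TLD and for IPv4 hosts. The function name host_port_any and the 10.0.0.1 test row are hypothetical, and IPv6 literals are not handled.

```shell
#!/usr/bin/env bash
# Sketch: host_port_any is a hypothetical helper; it reads the CSV on stdin.
host_port_any() {
  awk '
  BEGIN { FS = OFS = "," }
  {
    sub(/[ \t\r]+$/, "")                   # trim trailing blanks/CR
    url = $1
    if (!sub(/^https?:\/\//, "", url))     # skip rows without a scheme (e.g. the header)
      next
    sub(/\/.*/, "", url)                   # drop the path, keeping host[:port]
    port = $NF
    if (split(url, hp, ":") == 2) {        # an explicit port in the URL wins
      url = hp[1]
      port = hp[2]
    }
    print url, port
  }
  '
}
```

Usage: `host_port_any < Input_file`. Because the host is taken positionally, the .com/.org question from the comments no longer matters for this variant.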