Match a pattern, save it in a variable, and append it to the end of the line using sed/awk/grep


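This is a problem I have been struggling with for the past 4 days. I've searched Google and read through tutorials, but none of them helped me. I'm posting it here as a question so that others can have a go and help me solve it. I have already solved it in a crude way, but I'm wondering whether there is a smarter approach. There is a file containing a list of ball bearings and their attributes. It looks like this: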

<li class="odd  first">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003030&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=1">33030</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003030&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=1&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 59 mm
        |<strong>Bore diameter: </strong> 150 mm
        |<strong>Outside diameter: </strong> 225 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 2600 r/min
        |<strong>Reference speed: </strong> 2000 r/min


</li>
<li class="even ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310000230&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=2">30230</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310000230&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=2&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 49 mm
        |<strong>Bore diameter: </strong> 150 mm
        |<strong>Outside diameter: </strong> 270 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 2400 r/min
        |<strong>Reference speed: </strong> 1800 r/min


</li>
<li class="odd  ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003024&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=3">33024</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003024&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=3&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 48 mm
        |<strong>Bore diameter: </strong> 120 mm
        |<strong>Outside diameter: </strong> 180 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3400 r/min
        |<strong>Reference speed: </strong> 2600 r/min


</li>
<li class="even ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003022&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=4">33022</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003022&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=4&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 47 mm
        |<strong>Bore diameter: </strong> 110 mm
        |<strong>Outside diameter: </strong> 170 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3600 r/min
        |<strong>Reference speed: </strong> 2600 r/min


</li>
<li class="odd  ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003220&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=5">33220</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003220&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=5&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 63 mm
        |<strong>Bore diameter: </strong> 100 mm
        |<strong>Outside diameter: </strong> 180 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3600 r/min
        |<strong>Reference speed: </strong> 2400 r/min              
</li>

Here is a sed version. I have to admit that rearranging words that sit on different lines is not that easy with sed:


The UNIX tool for general-purpose text manipulation is awk:


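Trying to wrangle HTML into machine-readable output is futile. See whether you can hook into whatever produces the HTML in the first place.
I am trying to scrape some data from a website for my own analysis, so I have no control over the source.
Use the right tool and save yourself some time: try xmllint for this job.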
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030 
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230 
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024 
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
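sed -nre '
# Act only on the link line that opens each product entry
/^ *<a/{
    # Copy the line to the hold space, reduce the pattern space to " |prodid",
    # swap the two, then reduce the original line to the product number
    h;s/^.*prodid=([0-9]+).*$/ |\1/;x;s_^.*>([0-9]+)</a.*$_\1_
    :back
    # Append the next input line and flatten it to " |Name: value"
    N
    s/\n.*(Product category:).*\">(.*)<.*$/ |\1 \2/
    s_\n.*strong>(.*)</strong>(.*)$_ |\1 \2_
    # Keep reading lines until the closing </li> arrives
    /<\/li>$/ !bback
    /<\/li>$/ {
        # Drop </li>, append the held prodid, squeeze whitespace, print
        s/<\/li>$//;G;s/\n//g;s/  */ /g;p
    }
}
' file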
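In the sed script above, the hold space plays the role of the variable. Here is a minimal sketch of that one idea on its own, assuming a sed that accepts -E (GNU or BSD). The two-line input is a hypothetical cut-down of the sample, and the shell names input and result are made up for illustration:

```shell
# Hypothetical two-line input, trimmed from the sample in the question.
input='<a href="/productcatalogue/prodlink.html?prodid=1310003030&amp;pubid=21">33030</a>
|<strong>Width: </strong> 59 mm'

result=$(printf '%s\n' "$input" | sed -nE '
# On the link line: keep only the prodid digits and store them in the hold space
/prodid=/{
    s/^.*prodid=([0-9]+).*$/\1/
    h
    d
}
# On the Width line: append the held prodid after a newline, then join with " | "
/Width/{
    G
    s/\n/ | /
    p
}
')

printf '%s\n' "$result"
```

This should print the Width line with the saved id at its end: `|<strong>Width: </strong> 59 mm | 1310003030`. The full script does the same thing, only with N to pull in all the lines of an entry first.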
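$ cat tst.awk
BEGIN {
    # Treat every HTML tag, plus any whitespace around it, as a field separator
    FS = "[[:space:]]*<[^>]+>[[:space:]]*"
    OFS = " |"
}

# Link line: remember the prodid from the URL and the product number from the link text
/^[[:space:]]*<a href/{
    split($0,a,/.*prodid=|&.*/)
    prodid = a[2]
    prodnr = $(NF-1)
}

# Attribute line: store the value under its name, keeping the first-seen order of names
/<strong>/ {
    name  = $2
    value = ($NF == "" ? $(NF-1) : $NF)
    sub(/[[:space:]]+$/,"",value)
    n2v[name] = value
    if (!seen[name]++) {
        names[++numNames] = name
    }
}

# End of entry: print the product number, each attribute, then the saved prodid last
/<\/li>/ {
    printf "%s%s", prodnr, OFS
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name  = names[nameNr]
        value = n2v[name]
        printf "%s %s%s", name, value, OFS
    }
    print " " prodid
}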
$ awk -f tst.awk file
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
33220 |Product category: Tapered roller bearings single row |Width: 63 mm |Bore diameter: 100 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2400 r/min | 1310003220
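
The core move in the title (match a pattern, keep it in a variable, append it to the end of the line) can also be shown on a single line with awk's match() and substr(). This is a minimal sketch, assuming a POSIX awk. The shortened sample URL, the function name append_prodid, and the shell names line and result are made up for illustration:

```shell
# Hypothetical one-line sample modelled on the link lines in the question.
line='<a href="/productcatalogue/prodlink.html?lang=en&amp;prodid=1310003030&amp;pubid=21">33030</a>'

append_prodid() {
    awk '{
        # match() sets RSTART and RLENGTH when the pattern is found
        if (match($0, /prodid=[0-9]+/)) {
            # Skip the 7 characters of "prodid=" and keep only the digits
            id = substr($0, RSTART + 7, RLENGTH - 7)
            # Append the saved value to the end of the current line
            $0 = $0 " | " id
        }
        print
    }'
}

result=$(printf '%s\n' "$line" | append_prodid)
printf '%s\n' "$result"
```

Lines without a prodid pass through unchanged, so the function can be fed the whole file. The tst.awk script above does the same thing across several lines, using split() instead of match().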