Match a pattern, save it in a variable, and append it to the end of the line using sed/awk/grep
This is a problem I have been struggling with for the past 4 days. I have read tutorials found through Google and other sites, but none of them helped. I am posting it here as a question so that others can try it and help me solve it. I have already solved it in a crude way, but I wonder whether there is a smarter approach. There is a file containing a list of ball bearings and their properties. It looks like this:
<li class="odd first">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1">33030</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 59 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 225 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2600 r/min
|<strong>Reference speed: </strong> 2000 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2">30230</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 49 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 270 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2400 r/min
|<strong>Reference speed: </strong> 1800 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3">33024</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 48 mm
|<strong>Bore diameter: </strong> 120 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3400 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4">33022</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 47 mm
|<strong>Bore diameter: </strong> 110 mm
|<strong>Outside diameter: </strong> 170 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5">33220</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 63 mm
|<strong>Bore diameter: </strong> 100 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2400 r/min
</li>
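For orientation, the operation named in the title (capture a match, then append it to the end of the same line) can be sketched for the single-line case. This is only an illustration, assuming a sed that accepts `-E`; the function name `append_prodid` and the shortened URL are made up, and the real file spreads each record over several lines, which is what makes the problem harder.

```shell
#!/bin/sh
# Sketch only: append the digits that follow "prodid=" to the end of the same line.
# append_prodid and the sample URL are hypothetical, not taken from the question.
append_prodid() {
  # Group 1 is the whole line, group 2 the captured digits.
  # Lines without "prodid=" pass through unchanged.
  sed -E 's/^(.*prodid=([0-9]+).*)$/\1 | \2/'
}

printf '%s\n' '<a href="/x.html?prodid=123&pubid=21">33030</a>' | append_prodid
```

The whole line is kept as group 1 so the replacement can re-emit it before the captured value.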
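Wrangling HTML into machine-readable output with line-oriented tools is a losing battle. See whether you can get at whatever generates the HTML in the first place.
I am trying to scrape some data from a website for my own analysis, so I have no control over the source.
Please use the right tool and save yourself time: try xmllint for this job.
The desired output has one line per bearing, with the prodid from the link appended at the end:
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
awk answer (the tst.awk script further down): The UNIX tool for general-purpose text manipulation is awk.
sed answer (the script immediately below): This is a sed version. I have to admit that using sed to swap the order of words that sit on different lines is not that easy.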
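sed -nre '
# Start of a record: the <a ...> line carries both the prodid and the product number.
/^ *<a/{
  # Keep " |prodid" in the hold space and the product number in the pattern space.
  h;s/^.*prodid=([0-9]+).*$/ |\1/;x;s_^.*>([0-9]+)</a.*$_\1_
  :back
  N
  # Fold each appended attribute line into " |Name: value".
  s/\n.*(Product category:).*\">(.*)<.*$/ |\1 \2/
  s_\n.*strong>(.*)</strong>(.*)$_ |\1 \2_
  /<\/li>$/ !bback
  /<\/li>$/ {
    # End of record: drop </li>, append the held prodid, squeeze runs of spaces, print.
    s/<\/li>$//;G;s/\n//g;s/ +/ /g;p
  }
}
' file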
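The sed script above relies on the hold space: a value is stashed with `h` when it is first seen and brought back with `G` once the rest of the record is assembled. The sketch below shows that pattern in isolation. It assumes a sed that accepts `-E`, and the function name `append_id` and the two-line `id=` / `name=` input are made up for illustration; it is not the bearing file.

```shell
#!/bin/sh
# Sketch of the h / G hold-space pattern on made-up input (not the real file).
append_id() {
  sed -n -E '
    /^id=/{
      # Strip the key and save the bare id in the hold space; print nothing yet.
      s/^id=//;h
    }
    /^name=/{
      # Strip the key, append the held id after a newline, then join the two parts.
      s/^name=//;G;s/\n/ | /;p
    }
  '
}

printf 'id=42\nname=bearing\n' | append_id
```

The id is read first but printed last, which is the same reordering the full script performs with the prodid.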
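$ cat tst.awk
BEGIN {
    # Treat every tag, with any surrounding whitespace, as a field separator.
    FS = "[[:space:]]*<[^>]+>[[:space:]]*"
    OFS = " |"
}
# Link line: remember the prodid from the URL and the product number from the link text.
/^[[:space:]]*<a href/{
    split($0,a,/.*prodid=|&.*/)
    prodid = a[2]
    prodnr = $(NF-1)
}
# Attribute line: record name and value, keeping the order in which names first appear.
/<strong>/ {
    name = $2
    value = ($NF == "" ? $(NF-1) : $NF)
    sub(/[[:space:]]+$/,"",value)
    n2v[name] = value
    if (!seen[name]++) {
        names[++numNames] = name
    }
}
# End of record: print the product number, the attributes, then the prodid.
/<\/li>/ {
    printf "%s%s", prodnr, OFS
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        value = n2v[name]
        printf "%s %s%s", name, value, OFS
    }
    print " " prodid
}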
$ awk -f tst.awk file
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
33220 |Product category: Tapered roller bearings single row |Width: 63 mm |Bore diameter: 100 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2400 r/min | 1310003220
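The central trick in tst.awk is the field separator: a regular expression that matches any tag plus the blanks around it, so the tags vanish and the text between them becomes ordinary fields. The sketch below shows the split on one sample line. It is an illustration only: the function name `show_fields` is made up, and `[ \t]` is used in place of `[[:space:]]` on the assumption that this is accepted by more awk variants.

```shell
#!/bin/sh
# Sketch: treat every HTML tag (plus surrounding blanks) as a field separator.
# show_fields is a hypothetical helper, not part of the original answer.
show_fields() {
  awk '
    BEGIN { FS = "[ \t]*<[^>]+>[ \t]*" }
    {
      # Print each non-empty field on its own line, prefixed with its index.
      for (i = 1; i <= NF; i++)
        if ($i != "") printf "%d=%s\n", i, $i
    }
  '
}

printf '%s\n' '|<strong>Width: </strong> 59 mm' | show_fields
```

Field 2 is the attribute name and the last non-empty field is its value, which is why tst.awk reads `$2` and falls back to `$(NF-1)` when `$NF` is empty.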