Bash: check the size text in a CSV field and convert it to bytes


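I receive a small CSV on a Linux host, usually 20-50 lines with 4 fields. The third field should be a size in bytes, but I find it is often in a human-readable format and has extra spaces before the value. If the third field contains human-readable text, is there a simple way to remove the leading spaces and convert it to the nearest byte value (rounding is fine)? Example below: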

"3","3","  5815","User 1"
"6","12"," 788MB","User 2"
"2","4"," 983KB","User 3"
"25","4","1600MB","User 4"
"647","201","  19GB","User 5"
Using awk:

$ awk '
BEGIN {
    FS=OFS=","                                   # field delimiter
    a[""]=1                                      # no unit: multiplier is 1
    a["KB"]=kilo=1024                            # KB defined
    a["GB"]=kilo*(a["MB"]=(kilo*(kilo)))         # defining MB and GB
}
{
    gsub(/^" *| *"$/,"",$3)                      # remove quotes and space
    match($3,/[KMG]B/)                           # extract the term
    $3="\"" $3*a[substr($3,RSTART,RLENGTH)] "\"" # lookup and multiply
}1' file                                         # output
Output:

"3","3","5815","User 1"
"6","12","826277888","User 2"
"2","4","1006592","User 3"
"25","4","1677721600","User 4"
"647","201","20401094656","User 5"
It only handles KB, MB and GB; define more units if you need them. Also, my KB is the old-school 1024 B, so change that when you leave the 20th century ;D
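A minimal sketch of one way to extend the awk answer, assuming you want TB as well and might later switch to decimal (1000-based) units. The function name to_bytes and its base argument are illustrative, not part of the original answer. It also rounds with sprintf("%.0f"), so a fractional value such as 1.5GB still gives a whole byte count.

```shell
#!/usr/bin/env bash
# Hypothetical helper "to_bytes": the awk approach from the answer above, with TB added
# and the unit base passed as an argument (1024 by default, 1000 for decimal units).
to_bytes() {
  awk -v base="${1:-1024}" '
  BEGIN {
      FS = OFS = ","
      a[""]   = 1             # no unit: the value is already in bytes
      a["KB"] = base
      a["MB"] = base ^ 2
      a["GB"] = base ^ 3
      a["TB"] = base ^ 4      # for PB and beyond, extend this table and the regex below
  }
  {
      gsub(/^" *| *"$/, "", $3)                              # strip quotes and padding
      unit = match($3, /[KMGT]B/) ? substr($3, RSTART, RLENGTH) : ""
      # awk reads the leading number of "788MB" as 788; %.0f rounds to a whole
      # number and avoids scientific notation for large results
      $3 = "\"" sprintf("%.0f", $3 * a[unit]) "\""
  }
  1'
}
```

Usage: to_bytes < file for 1024-based units, or to_bytes 1000 < file if the sizes are meant as decimal units.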

This might work for you (GNU sed and numfmt):

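Remove the padding and the letter B (if present) from the third field, and replace the field with a shell-interpolated numfmt command that uses its value. Then echo the whole line, with the command's output inserted.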
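The core conversion in this approach can be tried on its own. A small sketch, assuming GNU coreutils numfmt is installed; iec_to_bytes is a hypothetical helper name used only for illustration. With --from=iec, numfmt treats the suffixes K, M, G, T, ... as powers of 1024, but it expects the single-letter form, which is why a value like 788MB has to become 788M first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalise one value from the third field and convert it.
iec_to_bytes() {
  local v=${1// /}        # drop the padding spaces
  v=${v%B}                # 788MB -> 788M, the form numfmt --from=iec accepts
  numfmt --from=iec "$v"  # 788M -> 826277888 (1024-based)
}

# Sample values from the question
for v in '  5815' ' 788MB' ' 983KB' '1600MB' '  19GB'; do
  iec_to_bytes "$v"
done
```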

Alternative:

sed 's/\([KMG]\)B/\1/;' file | numfmt -d \" --from=iec --field 6
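Why --field 6: with " as the delimiter, the quote characters themselves separate the fields, so the third CSV value lands in the sixth field. A quick check with cut, using a sample line from the question:

```shell
#!/usr/bin/env bash
# Splitting on '"' gives: 1:(empty) 2:6 3:, 4:12 5:, 6: 788MB 7:, 8:User 2 9:(empty)
# so the third CSV value is field 6, the field that numfmt --field 6 converts.
line='"6","12"," 788MB","User 2"'
printf '%s\n' "$line" | cut -d'"' -f6
```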

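This works perfectly. With your comments I'll be able to handle larger sizes by inserting a similar definition and also changing the match line to [KMGT]. I'm old-school too and use 1024, so all is well there.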