Convert a string of ASCII characters to a string of the corresponding decimals

string bash awk

Allow me to introduce the problem that has ruined my weekend. I have four columns of biological data:

@ID:::12345/1 ACGACTACGA text !"#$%vwxyz  
@ID:::12345/2 TATGACGACTA text :;<=>?VWXYZ

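I want to replace the : characters in the first column with - and turn the last column into a comma-separated list of the decimal ASCII codes of its characters.

The first part is easy, but I am stuck on the second. I have tried the awk ordinal functions and sprintf; I can only get the former to work on the first character of the string, and I can only get the latter to convert hexadecimal to decimal, and not with spaces. I also tried this shell command: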

$ od -t d1 test3 | awk 'BEGIN{OFS=","}{i = $1; $1 = ""; print $0}'

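but I cannot figure out how to call it from within awk. I would prefer to use awk, since I have some downstream manipulations that can also be done in awk.

Many thanks for your help.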

Perl solution:

perl -lnae '$F[0] =~ s%[:/]%-%g; $F[-1] =~ s/(.)/ord($1) . ","/ge; chop $F[-1]; print "@F";' < input

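The first substitution replaces : and / in the first field with dashes, the second substitution replaces each character in the last field with its ord value followed by a comma, and chop removes the trailing comma.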

Using the awk ordinal functions from ord.awk (gawk is assumed here, since --source is a gawk option):

awk -f ord.awk  --source '{
    # replace : with - in the first field
    gsub(/:/,"-",$1)

    # calculate the ordinal by looping over the characters in the fourth field
    res=ord($4)
    for(i=2;i<=length($4);i++) {
        res=res","ord(substr($4,i))
    }
    $4=res
}1' file

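Output:

@ID---12345/1 ACGACTACGA text 33,34,35,36,37,118,119,120,121,122
@ID---12345/2 TATGACGACTA text 58,59,60,61,62,63,86,87,88,89,90

Comments:

Many thanks @dogbane. Is there a way to define only the relevant parts of ord.awk in the first script, rather than having to include the full contents of the second script via -f ord.awk?

Even better. Since I only need ASCII characters 33-126, presumably I only need the first if clause.

I would leave it as is, since that makes it more portable. Thank you.

In fact, I have found this perl one-liner very useful.

Here is ord.awk, taken as is from its original source: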

# ord.awk --- do ord and chr

# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_



BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

function ord(str, c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

With the lookup table defined inline, no separate ord.awk file is needed:

awk 'BEGIN{ _ord_init()}
function _ord_init(low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
{
    # replace : with - in the first field
    gsub(/:/,"-",$1)

    # calculate the ordinal by looping over the characters in the fourth field
    res=_ord_[substr($4,1,1)]
    for(i=2;i<=length($4);i++) {
        res=res","_ord_[substr($4,i,1)]
    }
    $4=res
}1' file
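
The comments raise the idea of keeping only the plain-ASCII branch, since the last column only ever contains characters 33-126. A minimal sketch of that idea follows. It assumes whitespace-separated input with four fields and an ASCII system; the function name ascii_to_decimal and the array name code are made up for this example and do not come from ord.awk.

```shell
# Hypothetical helper: ascii_to_decimal is a name chosen for this sketch.
# Assumption: field 4 holds only printable ASCII characters (codes 33-126).
ascii_to_decimal() {
    awk '
    BEGIN {
        # Build the character -> decimal code table for 33..126 only.
        for (i = 33; i <= 126; i++)
            code[sprintf("%c", i)] = i
    }
    {
        # Replace ":" with "-" in the first field, as in the awk answer.
        gsub(/:/, "-", $1)

        # Collect the code of every character in field 4, comma-separated.
        n = length($4)
        res = ""
        for (i = 1; i <= n; i++) {
            if (i > 1)
                res = res ","
            res = res code[substr($4, i, 1)]
        }
        $4 = res
        print
    }' "$@"
}
```

It can be called as ascii_to_decimal file, or fed from a pipe. A character outside 33-126 has no entry in the table and would come out as an empty item between commas, so the full _ord_init from ord.awk remains the safer choice for input that is not known to be clean.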