Linux Awk: loop & save different lines to different files?

linux bash shell awk

I'm using a shell script to loop over a series of large files:

i=0
while read line
do

    # get first char of line
    first=`echo "$line" | head -c 1`

    # make output filename
    name="$first"
    if [ "$first" = "," ]; then
        name='comma'
    fi
    if [ "$first" = "." ]; then
        name='period'
    fi

    # save line to new file
    echo "$line" >> "$2/$name.txt"

    # show live counter and inc
    echo -en "\rLines:\t$i"
    ((i++))

done <$file

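This can certainly be done more efficiently in bash.

To give an example: echo foo | head performs a fork() call, creates a subshell, sets up a pipeline, and starts the external head program... and there is no reason for any of that.

If you want the first character of a line without spawning any subprocesses, it is as simple as:
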
c=${line:0:1}
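
As a small illustration, the question's filename logic can be written with parameter expansion alone. This is only a sketch: the helper name first_char_name and the sample lines are made up for the example, and only the ${line:0:1} expansion comes from the answer itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): map a line to the
# output name used in the question, without echo | head, so no extra
# processes are started per line.
first_char_name() {
    local line=$1
    local c=${line:0:1}      # first character; empty if the line is empty
    case $c in
        ,) printf 'comma\n' ;;
        .) printf 'period\n' ;;
        *) printf '%s\n' "$c" ;;
    esac
}

first_char_name ',leading comma'    # prints: comma
first_char_name '.leading period'   # prints: period
first_char_name 'plain text'        # prints: p
```

Note that ${line:0:1} is a bash feature; a plain POSIX sh may not support substring expansion.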

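I'd also seriously consider sorting your input, so that you only need to re-open the output file when you see a new first character, rather than on every pass through the loop.

That is, preprocess with sort (for example, by replacing …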
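
One way the sort-then-reopen idea might look is sketched below. The function name split_sorted, its two arguments, and the sample data are assumptions for illustration; file descriptor 4 follows the printf ... >&4 snippet quoted in the comments, but the rest is not the answer's own code.

```shell
#!/usr/bin/env bash
# Sketch: split a file by first character, reopening the output file only
# when the first character changes. split_sorted is an illustrative name.
split_sorted() {
    local file=$1 outdir=$2
    local line first name
    local current=$'\n'    # sentinel: no line read by 'read' contains a newline
    while IFS= read -r line; do
        first=${line:0:1}
        case $first in
            ,) name=comma ;;
            .) name=period ;;
            *) name=$first ;;
        esac
        if [[ $name != "$current" ]]; then
            current=$name
            exec 4>>"$outdir/$name.txt"   # (re)open the output file on FD 4
        fi
        printf '%s\n' "$line" >&4          # write through the already-open FD
    # LC_ALL=C gives a byte-wise sort, so lines with the same first
    # character end up next to each other.
    done < <(LC_ALL=C sort "$file")
    exec 4>&-                              # close FD 4 when done
}

# Example run on made-up data
tmp=$(mktemp -d)
printf '%s\n' ',b' 'apple' '.x' ',a' 'avocado' > "$tmp/in.txt"
split_sorted "$tmp/in.txt" "$tmp"
cat "$tmp/comma.txt"    # ,a then ,b
```

Because the file is opened with >>, the result stays correct even if the input were not perfectly grouped; sorting only reduces how often the file is reopened.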
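#!/usr/bin/awk -f
BEGIN {
    punctlist = ", . ? ! - '"
    pnamelist = "comma period question_mark exclamation_mark hyphen apostrophe"
    pcount = split(punctlist, puncts)
    ncount = split(pnamelist, pnames)
    if (pcount != ncount) {print "error: counts don't match, pcount:", pcount, "ncount:", ncount; exit}
    for (i = 1; i <= pcount; i++) {
        punct_lookup[puncts[i]] = pnames[i]
    }
}
{
    # parentheses keep the concatenation after ">" unambiguous across awk implementations
    print > (punct_lookup[substr($0, 1, 1)] ".txt")
    printf "\r%6d", i++
}
END {
    printf "\n"
}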

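The BEGIN block builds an associative array so that you can do punct_lookup[","] and get "comma".

The main block simply looks up the file name and writes the line to that file. In AWK, > truncates the file the first time it is opened and appends on subsequent writes. If you have existing files that you don't want truncated, change it to >> (but don't use >> otherwise).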
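
The difference between > and >> inside awk can be checked with a throwaway experiment like the one below. The temporary directory and file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Demonstrates awk's output redirection on files that already exist.
tmp=$(mktemp -d)

# Case 1: '>' truncates the file when awk first opens it, then keeps
# writing to the same open file for later records.
printf 'old\n' > "$tmp/trunc.txt"
printf 'x\ny\n' | awk -v f="$tmp/trunc.txt" '{ print > f }'

# Case 2: '>>' opens the file in append mode, so existing content survives.
printf 'old\n' > "$tmp/append.txt"
printf 'x\ny\n' | awk -v f="$tmp/append.txt" '{ print >> f }'

cat "$tmp/trunc.txt"    # x, y
cat "$tmp/append.txt"   # old, x, y
```

This is unlike the shell, where a > redirection inside a loop would truncate the file on every iteration.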
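A few things to speed it up:

- Don't use echo/head to get the first character. You're spawning at least two additional processes per line. Instead, use bash's parameter expansion facilities to get the first character.
- Use if-elif to avoid checking $first against all the possibilities each time.
- Even better, if you're using Bash 4.0 or later, use an associative array to store the output file names rather than checking $first in a big if statement for each line.

If you don't have a version of bash that supports associative arrays, replace your if statements with the following:
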
if [[ "$first" = "," ]]; then
    name='comma'
elif [[ "$first" = "." ]]; then
    name='period'
else
    name="$first"
fi 


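But the following is suggested. Note the use of $REPLY, which is the default variable that read uses if no name is given (just FYI).

declare -A output
output[","]=comma
output["."]=period
output["?"]=question_mark
output["!"]=exclamation_mark
output["-"]=hyphen
output["'"]=apostrophe
i=0
while read
do

    # get first char of line
    first=${REPLY:0:1}

    # make output filename
    name=${output[$first]:-$first}

    # save line to new file
    echo "$REPLY" >> "$name.txt"

    # show live counter and inc
    echo -en "\r$i"
    ((i++))

done <$file

Another take:
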
declare -i i=0
declare -A names
while read line; do
    first=${line:0:1}
    if [[ -z ${names[$first]} ]]; then
        case $first in
            ,) names[$first]="$2/comma.txt" ;;
            .) names[$first]="$2/period.txt" ;;
            *) names[$first]="$2/$first.txt" ;;
        esac
    fi
    printf "%s\n" "$line" >> "${names[$first]}"
    printf "\rLine $((++i))"
done < "$file"

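$[] is deprecated; use ((i++)) or ((i+=1)). Also, when echoing variables (and in most other cases where variables are used), they should be quoted: echo "$LINE". It's best to use lowercase or mixed-case variable names to avoid clashes with the names of shell or environment variables.

@DennisWilliamson, thanks. Updated.

A point of convention: all-caps names are for environment variables or builtins only, not regular internal shell variables. Also, you're opening the output file every time through the loop, which is much more expensive than computing its name.

@CharlesDuffy That's what I had in my original code, until Dennis pointed it out and it was changed.

@CharlesDuffy: Good points. I'll change the variable names; as for sorting and opening the files only when necessary, I'll defer to your code.

Does awk cache file handles, or is this doing a new pair of open() and close() calls on every line?

I just straced it, and it opens (and keeps open) each file with a separate file descriptor. In fact, it appears to buffer the writes in 4K buffers.

Very well, then. I hate to see pure-bash solutions ruled out unnecessarily, but this should have decent performance as well.

Since punct_lookup is so small, it would be more maintainable to just write it out: punct_lookup[","] = "comma"; punct_lookup["."] = "period" …

Want a newline? printf '%s\n' "$line" >&4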

awk -v dir="$2" '
    {
        first = substr($0,1,1)
        if (! (first in names)) {
            if (first == ",")      names[first] = dir "/comma.txt"
            else if (first == ".") names[first] = dir "/period.txt"
            else                   names[first] = dir "/" first ".txt"
        }
        print > names[first]
        printf("\rLine %d", NR)
    }
'
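
To try the awk version on a small sample, something like the following should work. The sample lines, the temporary directories, the END block that prints a final newline, and passing the input file as an operand are all assumptions added for the demonstration.

```shell
#!/usr/bin/env bash
# Throwaway test data (made up for this example)
tmp=$(mktemp -d)
printf '%s\n' ',one' '.two' 'three' ',four' > "$tmp/input.txt"
mkdir "$tmp/out"

awk -v dir="$tmp/out" '
    {
        first = substr($0, 1, 1)
        if (!(first in names)) {
            # compute each output file name only once per first character
            if (first == ",")      names[first] = dir "/comma.txt"
            else if (first == ".") names[first] = dir "/period.txt"
            else                   names[first] = dir "/" first ".txt"
        }
        print > names[first]          # awk keeps the file open across records
        printf("\rLine %d", NR)
    }
    END { printf("\n") }              # finish the progress line
' "$tmp/input.txt"

ls "$tmp/out"    # lists comma.txt, period.txt and t.txt
```

Here the two comma lines end up together in comma.txt, in their original order.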