How do I count the digits/letters in a file on Linux? (Linux, Bash, Shell, wc)


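I am trying to count the number of digits and letters in a file in Bash.

I know I can use wc -c file to count the characters, but how can I restrict the count to letters only and, secondly, to digits only?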

You can use tr to keep only the alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there, it is just a matter of piping:

$ tr -cd '[:alnum:]' < myfile.txt | wc -c
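
To make the effect of the two flags concrete, here is a minimal sketch on a made-up string (the variable names and sample text are assumptions for illustration only). It shows what tr leaves behind with -d alone and with -cd:

```shell
#!/bin/bash
text='abc 123, xy-9!'

# -d alone deletes the listed set: only the non-alphanumerics survive.
removed=$(printf '%s' "$text" | tr -d '[:alnum:]')
# Adding -c complements the set first, so everything else is deleted instead.
kept=$(printf '%s' "$text" | tr -cd '[:alnum:]')

printf 'without -c: "%s"\n' "$removed"
printf 'with -c:    "%s"\n' "$kept"
# Counting what is left gives the number of alphanumeric characters.
printf '%s' "$kept" | wc -c
```

The set is quoted so that the shell passes it to tr untouched instead of treating it as a file name pattern.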

To count the letters and the digits separately, you can combine grep with wc:

 # grep -o prints one match per line, so count lines rather than bytes
 grep -o '[a-zA-Z]' myfile | wc -l
 grep -o '[0-9]' myfile | wc -l

With a little tweaking, you can modify it to count numbers, alphabetic words, or alphanumeric words:

# -E enables the + quantifier; each run of matching characters is one match
grep -oE '[a-zA-Z]+' myfile | wc -l
grep -oE '[0-9]+' myfile | wc -l
grep -oE '[[:alnum:]]+' myfile | wc -l
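
One detail worth illustrating: grep -o prints every match on its own line, so the number of matches equals the number of output lines. The sketch below, on made-up sample text (file contents and variable names are assumptions), compares wc -l with wc -c on that output:

```shell
#!/bin/bash
sample=$(mktemp)
printf 'ab 12\ncd3\n' > "$sample"

# Each single-character match becomes one output line ("a\n", "b\n", ...).
letters=$(grep -o '[a-zA-Z]' "$sample" | wc -l)
# wc -c also counts the newline after every match, doubling the figure.
letter_bytes=$(grep -o '[a-zA-Z]' "$sample" | wc -c)
# With -E, + groups consecutive digits, so this counts numbers, not digits.
numbers=$(grep -oE '[0-9]+' "$sample" | wc -l)

echo "letters: $((letters)), bytes from wc -c: $((letter_bytes)), numbers: $((numbers))"
rm -f "$sample"
```

The arithmetic expansion around each variable only strips the padding that some wc implementations add.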

You can use sed to delete every character that is not of the type you are looking for, and then count the characters in the result:

# 1h;1!H collects all lines in the hold buffer so that the newline
# characters can be replaced too. sed still prints one trailing newline,
# so tr removes it before wc counts.
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | tr -d '\n' | wc -c

It's easy enough to count just the digits as well.
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | tr -d '\n' | wc -c

Or why not both?
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | tr -d '\n' | wc -c
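
As a hedged alternative, and not the method used above: the hold buffer is only there to get rid of newlines, so a similar count can be obtained by letting sed strip the unwanted characters line by line and deleting the newlines with tr afterwards. The sample file contents below are made up for illustration:

```shell
#!/bin/bash
sample=$(mktemp)
printf 'ab 12\ncd3\n' > "$sample"

# Strip every unwanted character on each line, then drop the newlines
# that sed leaves at the end of every line before counting.
letters=$(sed 's/[^a-zA-Z]//g' "$sample" | tr -d '\n' | wc -c)
digits=$(sed 's/[^0-9]//g' "$sample" | tr -d '\n' | wc -c)

echo "letters: $((letters)), digits: $((digits))"
rm -f "$sample"
```

This variant never holds the whole file in memory, which may matter for large inputs.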

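In bash there are many ways to analyze the line, word, and character frequencies of a text file. Using the bash built-in character class filters (e.g. [:upper:], etc.), you can drill down to how often each type of character occurs in a text file. Below is a simple script that reads from stdin, prints the normal wc output as its first line, and then prints the counts of upper, lower, digit, punct, and whitespace characters.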

#!/bin/bash

declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0

oifs="$IFS"

# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do

    # parse line into words with original IFS; disable globbing so that
    # words such as * are not expanded into file names
    IFS=$oifs
    set -f
    set -- $line
    set +f
    IFS=$'\n'

    # Add up lines, words, chars, upper, lower, digit
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    for ((i = 0; i < ${#line}; i++)); do
        [[ ${line:$((i)):1} =~ [[:upper:]] ]] && ((upper++))
        [[ ${line:$((i)):1} =~ [[:lower:]] ]] && ((lower++))
        [[ ${line:$((i)):1} =~ [[:digit:]] ]] && ((digit++))
        [[ ${line:$((i)):1} =~ [[:punct:]] ]] && ((punct++))
    done
done

echo " $lines $words $chars"
echo " upper: $upper,  lower: $lower,  digit: $digit,  punct: $punct,  \
whitespace: $((chars-upper-lower-digit-punct))"

Example use/output
$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)

$ bash wcount3.sh <dat/captnjackn.txt
 5 21 108
 upper: 12,  lower: 68,  digit: 4,  punct: 3,  whitespace: 21
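
The character-by-character loop can be slow on large inputs. As a sketch of one possible alternative, not part of the script above, bash pattern substitution can delete everything outside a class in one step, leaving a string whose length is the count. The helper name count_in_line and the sample line are hypothetical:

```shell
#!/bin/bash
# count_in_line CLASS STRING: number of characters in STRING that belong to
# the POSIX class CLASS (hypothetical helper name, for illustration only).
count_in_line() {
    local class=$1 str=$2 pat only
    # Pattern matching any single character outside the requested class.
    pat="[^[:${class}:]]"
    # Delete every such character; what remains is exactly the matches.
    only=${str//$pat/}
    echo "${#only}"
}

line='Of Captain Jack Sparrow (2357)'
for c in upper lower digit punct space; do
    printf '%s: %s\n' "$c" "$(count_in_line "$c" "$line")"
done
```

Inside the read loop of a script like the one above, each of the four regex tests per character could then be replaced by one call per line.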

Here is a way that avoids pipes entirely, using only tr and the shell's ${#variable} syntax, which gives the length of a variable:
$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
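
The same technique can be wrapped in a small function so the class name becomes a parameter. This is only a sketch: the function name count_kept is an assumption, and the temporary file simply recreates the sample file shown above.

```shell
#!/bin/bash
# count_kept CLASS FILE: count the characters of FILE that belong to CLASS
# (hypothetical wrapper around the tr + ${#variable} technique).
count_kept() {
    local class=$1 file=$2 kept
    # Keep only the characters in the class; no pipe is involved.
    kept=$(tr -dc "[:${class}:]" < "$file")
    # ${#kept} is the length of what survived.
    echo "${#kept}"
}

sample=$(mktemp)
printf '123 sdf\n231 (3)\nhuh? 564\n242 wr =!\n' > "$sample"
echo "$(count_kept digit "$sample") $(count_kept alpha "$sample") $(count_kept alnum "$sample")"
rm -f "$sample"
```

For plain ASCII input such as this, the character count from ${#kept} and the byte count from wc -c agree.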

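cat myfile.txt | tr -cd [123456789] | wc -c
Is that example correct?

Useless use of cat. Unquoted, [:alnum:] is a glob, so the command fails if a file named m exists.

The terminal shows incorrect output for the first and second examples, huh?

This counts all the characters of any line that has at least one letter or digit.

Counting lines is being mixed up with grep -o.

The first grep mentioned fails when there is a file named a or 7 in the current directory. Always quote shell metacharacters!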
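
To see the quoting problem from the comments in action, here is a small sketch that runs in a throwaway directory (the directory, file names, and variable names are made up for the demonstration). It compares an unquoted and a quoted bracket expression when a file named a is present:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abc\n' > myfile
touch a    # a file whose name matches the glob [a-z]

# Unquoted, the shell replaces [a-z] with the matching file name "a"
# before grep ever runs, so grep searches for the literal letter a.
unquoted=$(grep -o [a-z] myfile | wc -l)
# Quoted, grep receives the bracket expression itself.
quoted=$(grep -o '[a-z]' myfile | wc -l)

echo "unquoted: $((unquoted)), quoted: $((quoted))"
cd / && rm -rf "$workdir"
```

With the file a present, the unquoted command silently counts only the letter a, which is why the quoted form is the safer habit.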