Extracting a substring in Bash


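Given a filename of the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

To emphasize the point: I have a filename with x characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x characters. I want to take the 5-digit number and put it into a variable.

I am very interested in the different ways this can be accomplished.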

Using cut, in a more generic form:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"

A generic solution where the number can be anywhere in the filename, taking the first such sequence:

number=$(echo "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)
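
One caveat worth illustrating: [[:digit:]]{5} also matches the first five digits of a longer run. The sketch below assumes the wanted digits are always delimited by underscores, as in the question; it anchors the match on the underscores and then strips them. The sample filename is made up for illustration.

```shell
filename='report_1234567_v2_54321_final.ext'   # hypothetical name with a longer digit run

# Unanchored: picks the first five digits of "1234567".
loose=$(printf '%s\n' "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)

# Anchored on the underscores: only an exact five-digit field qualifies.
strict=$(printf '%s\n' "$filename" | grep -Eo '_[[:digit:]]{5}_' | head -n1 | tr -d '_')

echo "loose=$loose strict=$strict"
```

Whether the loose or the strict form is appropriate depends on how much the filename format can be trusted.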

Another solution, extracting an exact part of a variable:

number=${filename:offset:length}

If your filename always has the format stuff_digits_…, you can use awk:

number=$(echo "$filename" | awk -F _ '{ print $2 }')

Yet another solution is to delete everything except the digits:

number=$(echo "$filename" | tr -cd '[:digit:]')
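
Note that tr keeps every digit in the input, so the result is only the wanted number when the name contains no other digits. A small illustration, using a made-up filename:

```shell
filename='some2letters_12345_more7leters.ext'   # hypothetical name with stray digits

# tr -cd deletes the complement of the set, i.e. everything that is not a digit.
all_digits=$(printf '%s' "$filename" | tr -cd '[:digit:]')

echo "$all_digits"   # the stray 2 and 7 are kept as well
```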

There is also the expr command (an external utility rather than a Bash builtin; the match form is a GNU extension):

INPUT="someletters_12345_moreleters.ext"
SUBSTRING=$(expr match "$INPUT" '.*_\([[:digit:]]*\)_.*')
echo "$SUBSTRING"

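Building on Joe's answer (which did not work for me):

If x is constant, the following parameter expansion performs the substring extraction:

b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length.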
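
When x is not constant, the offset can be computed rather than hard-coded. A minimal sketch, assuming the digits start right after the first underscore (the variable names prefix and offset are illustrative):

```shell
a='someletters_12345_moreleters.ext'

prefix=${a%%_*}               # everything before the first underscore
offset=$(( ${#prefix} + 1 ))  # skip the prefix and the underscore itself
b=${a:offset:5}

echo "$b"
```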

If the underscores around the digits are the only ones in the input, you can strip the prefix and the suffix in two steps:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

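If there are other underscores, it is probably still feasible, although trickier. If anyone knows how to perform both expansions in a single expression, I would like to know too.

Both solutions presented are pure Bash, with no process spawning involved, hence very fast.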
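
If the name may contain further underscores, one possible pure-Bash approach is to split on underscores and pick the first field that consists of exactly five digits. This is only a sketch, not something taken from the answers here; the sample name and the helper name first_five_digit_field are hypothetical.

```shell
first_five_digit_field() {
    local IFS='_' field
    local -a fields
    read -r -a fields <<< "$1"        # split the argument on underscores
    for field in "${fields[@]}"; do
        if [[ $field =~ ^[0-9]{5}$ ]]; then
            printf '%s\n' "$field"
            return 0
        fi
    done
    return 1                          # no five-digit field found
}

b=$(first_five_digit_field 'some_letters_12345_more_leters.ext')
echo "$b"
```

Because IFS is declared local, the change does not leak out of the function.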

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this also works in ksh93.

Here is how I would do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

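Explanation:

Bash-specific:

`[[ ]]` denotes a conditional expression

`=~` tests the left-hand string against a regular expression

`&&` runs the next command only if the previous command succeeded

Regular expression (RE): `_([[:digit:]]{5})_`

`_` is a literal that demarcates/anchors the boundaries of the string being matched

`()` creates a capture group

`[[:digit:]]` is a character class, which I think speaks for itself

`{5}` means that exactly five of the preceding character, class (as in this example), or group must match

In plain English, you can think of it as behaving like this: the FN string is iterated character by character until we see a `_`, at which point the capture group is opened and we attempt to match five digits. If that succeeds, the capture group saves the five digits traversed. If the next character is a `_`, the condition succeeds, the capture group is made available in BASH_REMATCH, and the following NUM= statement can execute. If any part of the match fails, the saved details are discarded and character-by-character processing continues after the `_`. For example, if FN were _1_12_123_1234_12345_, there would be four false starts before a match is found.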
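
The false-start behaviour described above can be checked directly. A small sketch using the example string from the explanation:

```shell
FN=_1_12_123_1234_12345_

if [[ $FN =~ _([[:digit:]]{5})_ ]]; then
    NUM=${BASH_REMATCH[1]}    # first capture group
    echo "matched: $NUM"
else
    echo "no match"
fi
```

The shorter digit runs are skipped because none of them has five digits followed by an underscore.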

Just try using

cut -c startIndx-stopIndx
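
cut counts characters from 1 and its range is inclusive, whereas Bash substring offsets start at 0. For the sample name from the question, a sketch of both forms side by side:

```shell
a='someletters_12345_moreleters.ext'

with_cut=$(printf '%s\n' "$a" | cut -c 13-17)   # characters 13 through 17, 1-based
with_bash=${a:12:5}                              # offset 12, length 5, 0-based

echo "$with_cut $with_bash"
```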
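
Here is a prefix-suffix solution (similar to the ones given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:
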
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

If someone wants more rigorous information, you can also search for it in man bash like this:
$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:
${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.
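
A few of the cases from the manual excerpt, as a sketch. Negative lengths need Bash 4.2 or later; the variable name s is arbitrary.

```shell
s='someletters_12345_moreleters.ext'

echo "${s:12:5}"     # offset 12, length 5
echo "${s:12}"       # length omitted: from offset 12 to the end
echo "${s: -3}"      # negative offset; the space keeps it apart from ${s:-3}
echo "${s:12:-4}"    # negative length: stop 4 characters before the end

set -- a b c d       # positional parameters are indexed from 1
echo "${@:2:2}"
```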
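
I'm surprised this pure Bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set -- $a   # unquoted on purpose so that $a is split on IFS; -- guards against a leading "-"
echo "$2"
# prints 12345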

You may want to restore IFS to its previous value, or unset IFS, afterwards.
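
One way to avoid the reset altogether is to confine the IFS change to a subshell. A minimal sketch: the function name second_field is hypothetical, and the parenthesized body makes the whole function run in a subshell (at the cost of a fork).

```shell
second_field() (
    IFS='_'       # only affects this subshell
    set -f        # disable globbing so the unquoted expansion is split only
    set -- $1     # re-split the argument into positional parameters
    printf '%s\n' "$2"
)

digits=$(second_field 'someletters_12345_moreleters.ext')
echo "$digits"
```

The caller's IFS, globbing setting, and positional parameters are left untouched.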

Similar to substr('abcdefg', 2-1, 3) in PHP:

echo 'abcdefg' | tail -c +2 | head -c 3

Following the requirement

I have a filename with x characters, then a five-digit sequence surrounded by a single underscore on either side

these grep invocations may be useful:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

$ asd=someletters_12345_moreleters.ext
$ expr "$asd" : '.*_\(.*\)_'
12345

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

shopt -s extglob   # required for the @(...) and +(...) patterns
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo "$substring"
# prints 12345

IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"

str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

str="someletters_123-45-24a&13b-1_moreleters.ext"

cut -b19-20 test.txt > test1.txt   # extracts bytes 19 and 20, here "ST"
while read -r; do
    x=$REPLY
done < test1.txt
echo "$x"
# prints ST

var="someletters_12345_moreletters.ext"
digits=$(echo "$var" | sed -n 's/.*_\([0-9]\+\).*/\1/p')   # \+ is a GNU sed extension
echo "$digits"
# prints 12345

$ man sed | grep -A 2 s/regexp/replacement
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

# substring STR [START [END]]: print STR from index START through END
# (both zero-based, END inclusive). START defaults to 0, END to the length of STR.
substring() {
    local str="$1" start="${2:-0}" end="${3:-${#1}}"
    local length=$(( end - start + 1 ))

    echo "${str:start:length}"
}

    substring 01234 0
    01234
    substring 012345 0
    012345
    substring 012345 0 0
    0
    substring 012345 1 1
    1
    substring 012345 1 2
    12
    substring 012345 0 1
    01
    substring 012345 0 2
    012
    substring 012345 0 3
    0123
    substring 012345 0 4
    01234
    substring 012345 0 5
    012345

    substring 012345 0
    012345
    substring 012345 1
    12345
    substring 012345 2
    2345
    substring 012345 3
    345
    substring 012345 4
    45
    substring 012345 5
    5
    substring 012345 6
    
    substring 012345 3 5
    345
    substring 012345 3 4
    34
    substring 012345 2 4
    234
    substring 012345 1 3
    123

 str=2020-08-08T07:40:00.000Z
 echo ${str:11:8}

 str=2020-08-08T07:40:00.000Z
 cut -c12-19 <<< $str

 str=2020-08-08T07:40:00.000Z
 awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str
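
The same timestamp can also be cut by delimiter rather than by position, which does not depend on the date part having a fixed width. A sketch using only parameter expansion:

```shell
str=2020-08-08T07:40:00.000Z

t=${str#*T}     # drop everything up to and including the first "T"
t=${t%%.*}      # drop the fractional seconds and the trailing "Z"

echo "$t"
```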