
Extract substring in Bash

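Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

To emphasize the point: I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.

I am very interested in the number of different ways this can be accomplished.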

Using cut, in a more generic form:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"

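A generic solution where the number can be anywhere in the filename, using the first such sequence:

number=$(echo "$filename" | grep -E -o '[[:digit:]]{5}' | head -n1)

Another solution, to extract exactly a part of a variable: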

number=${filename:offset:length}

If your filename always has the format stuff_digits_…, you can use awk:

number=$(echo "$filename" | awk -F _ '{ print $2 }')

Yet another solution removes everything except digits:

number=$(echo "$filename" | tr -cd '[:digit:]')
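
One caveat with the tr approach is that it keeps every digit in the name, not only the five-digit block. The sketch below contrasts it with the grep variant, using a hypothetical filename that has extra digits later on.

```shell
#!/usr/bin/env bash
# Hypothetical filename with a second group of digits ("34") after the target.
filename='someletters_12345_morele34ters.ext'

# tr -cd deletes everything that is NOT a digit, so all digits are concatenated.
all_digits=$(printf '%s' "$filename" | tr -cd '[:digit:]')

# grep -o prints each match on its own line; head keeps the first one.
first_five=$(printf '%s\n' "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)

echo "$all_digits"   # prints 1234534
echo "$first_five"   # prints 12345
```

If the name is guaranteed to contain no other digits, both variants give the same result.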

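There is also the expr command (an external utility rather than a Bash builtin; the match keyword is a GNU extension):

INPUT="someletters_12345_moreleters.ext"
SUBSTRING=$(expr match "$INPUT" '.*_\([[:digit:]]*\)_.*')
echo "$SUBSTRING"

Building on Joe's answer (which did not work for me):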

If x is constant, the following parameter expansion performs substring extraction:

b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length.
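
When x is not known in advance, the offset can be computed rather than hard-coded. The following is a minimal sketch, assuming the digits start right after the first underscore; the variable names prefix and offset are illustrative only.

```shell
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'

# Everything before the first underscore ("someletters", 11 characters).
prefix=${a%%_*}

# The digits start one character past the underscore: 11 + 1 = 12.
offset=$(( ${#prefix} + 1 ))

# offset is evaluated as an arithmetic expression inside the expansion.
b=${a:offset:5}
echo "$b"   # prints 12345
```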

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix in two steps:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"
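
If there are other underscores, it is probably still feasible, although trickier. If anyone knows how to perform both expansions in a single expression, I would like to know too.

Both solutions presented are pure Bash, with no process spawning involved, hence very fast.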
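
For the case with additional underscores, one possible approach (a different technique from the two expansions above, shown only as a sketch) is to split the name on underscores and pick the first field that consists of exactly five digits. The input name here is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical name with extra underscores on both sides of the digits.
input='some_letters_12345_more_leters.ext'

# Split on "_" into an array; the IFS assignment applies to this read only.
IFS=_ read -r -a parts <<< "$input"

digits=''
for part in "${parts[@]}"; do
    # Accept only a field made of exactly five digits.
    if [[ $part =~ ^[0-9]{5}$ ]]; then
        digits=$part
        break
    fi
done

echo "$digits"   # prints 12345
```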

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of that will also work in ksh93.

Here is how I would do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
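
Explanation:

Bash-specific:

  • [[ ]]
    indicates a conditional expression
  • =~
    indicates that the condition is a regular expression match
  • &&
    chains the commands: the next one runs only if the prior command was successful

Regular expression (RE):
_([[:digit:]]{5})_

  • _
    are literals that demarcate/anchor the boundaries of the string being matched
  • ()
    creates a capture group
  • [[:digit:]]
    is a character class; I think it speaks for itself
  • {5}
    means exactly five of the preceding character, class (as in this example), or group must match

In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, the saved details are discarded and character-by-character processing continues after the _. For example, if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before a match was found.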
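
The same test can be wrapped in a function so that a failed match is reported through the exit status. This is a sketch only; extract_five_digits is a hypothetical helper name, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first five-digit run that sits between
# two underscores, or returns 1 when there is none.
extract_five_digits() {
    local name=$1
    if [[ $name =~ _([[:digit:]]{5})_ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

num=$(extract_five_digits 'someletters_12345_moreleters.ext')
echo "$num"      # prints 12345

# The "false starts" input from the explanation still ends in a match.
tricky=$(extract_five_digits '_1 _12 _123 _1234 _12345_')
echo "$tricky"   # prints 12345

# Only four digits between the underscores: the function returns non-zero.
if ! extract_five_digits 'someletters_1234_moreleters.ext' >/dev/null; then
    echo 'no match'
fi
```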

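Just try to use
cut -c startIndx-stopIndx

Here is a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores: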

str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

If someone wants more rigorous information, you can also search for it in man bash like this:

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
${parameter:offset:length}
    Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below). If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets. If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.

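
A few of the cases described in the manual text can be tried directly. The sketch below assumes Bash 4.2 or newer, because a negative length is not accepted by older versions; the variable names are illustrative.

```shell
#!/usr/bin/env bash
name='someletters_12345_moreleters.ext'

# Offset only: everything from index 12 to the end.
rest=${name:12}

# Negative offset: note the space after the colon, which keeps this
# from being parsed as the ":-" (use default value) expansion.
ext=${name: -3}

# Negative length (assumes Bash 4.2+): stop 4 characters before the end.
no_ext=${name:0:-4}

echo "$rest"     # prints 12345_moreleters.ext
echo "$ext"      # prints ext
echo "$no_ext"   # prints someletters_12345_moreleters
```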
I'm surprised this pure Bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345

You probably want to reset IFS to its previous value, or unset IFS afterwards.
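
One way to avoid the manual reset is to confine the IFS change to a function with local. This is a minimal sketch; second_field is a hypothetical name, and the unquoted expansion would also undergo pathname expansion if the input contained glob characters such as * or ?.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the second "_"-separated field of its argument.
second_field() {
    local IFS=_        # local, so the caller's IFS is left untouched
    set -- $1          # unquoted on purpose: split the argument on "_"
    printf '%s\n' "$2"
}

a='someletters_12345_moreleters.ext'
digits=$(second_field "$a")
echo "$digits"   # prints 12345
```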


Similar to substr('abcdefg', 2-1, 3) in PHP:

echo 'abcdefg'|tail -c +2|head -c 3

Following the requirement

I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side

here are some grep approaches:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345
host:/tmp$ asd=someletters_12345_moreleters.ext 
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$ 
set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'
input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str
str="someletters_123-45-24a&13b-1_moreleters.ext"
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST" 
while read -r; do
> x=$REPLY
> done < test1.txt
echo $x
ST
> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345
> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
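
The \1 back-reference described in the excerpt can also be used without the GNU-only \+ operator. The sketch below is one possible portable spelling that writes "one or more digits" as [0-9][0-9]* and additionally requires a trailing underscore.

```shell
#!/usr/bin/env bash
var='someletters_12345_moreletters.ext'

# -n suppresses automatic printing; the trailing p prints only when the
# substitution succeeded. \1 is the text captured by \( ... \).
digits=$(printf '%s\n' "$var" | sed -n 's/.*_\([0-9][0-9]*\)_.*/\1/p')

echo "$digits"   # prints 12345
```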
function substring() {
    local str="$1" start="${2}" end="${3}"
    
    if [[ "$start" == "" ]]; then start="0"; fi
    if [[ "$end"   == "" ]]; then end="${#str}"; fi
    
    local length="((${end}-${start}+1))"
    
    echo "${str:${start}:${length}}"
} 
    substring 01234 0
    01234
    substring 012345 0
    012345
    substring 012345 0 0
    0
    substring 012345 1 1
    1
    substring 012345 1 2
    12
    substring 012345 0 1
    01
    substring 012345 0 2
    012
    substring 012345 0 3
    0123
    substring 012345 0 4
    01234
    substring 012345 0 5
    012345
    substring 012345 0
    012345
    substring 012345 1
    12345
    substring 012345 2
    2345
    substring 012345 3
    345
    substring 012345 4
    45
    substring 012345 5
    5
    substring 012345 6
    
    substring 012345 3 5
    345
    substring 012345 3 4
    34
    substring 012345 2 4
    234
    substring 012345 1 3
    123
 str=2020-08-08T07:40:00.000Z
 echo ${str:11:8}
 str=2020-08-08T07:40:00.000Z
 cut -c12-19 <<< $str
 str=2020-08-08T07:40:00.000Z
 awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str