How to extract a substring using bash


I have a string that represents a date, like this:

    "May 5 2014"

I would like to know how to extract the "5" from it.

What I have tried so far:

   echo "May 5 2014" | sed 's/[^0-9]*\s//'

This returns "5 2014".

Sorry for the basic question. I am just new to bash.

Using cut:

echo "May 5 2014" | cut -d' ' -f2

Or awk:

echo "May 5 2014" | awk '{print $2}'
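One difference between the two is worth keeping in mind: cut treats every single space as a delimiter, while awk's default field splitting collapses runs of whitespace. A small sketch, assuming a hypothetical input in which the day is padded with an extra space:

```shell
#!/usr/bin/env bash
# Hypothetical input: the day is padded with an extra space.
padded="May  5 2014"

# cut sees an empty field between the two spaces, so field 2 is empty.
day_cut=$(echo "$padded" | cut -d' ' -f2)

# awk's default splitting collapses runs of blanks, so field 2 is the day.
day_awk=$(echo "$padded" | awk '{print $2}')

echo "cut: '$day_cut'"
echo "awk: '$day_awk'"
```

If the spacing of the input is not guaranteed, the awk form is the more forgiving of the two.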

If you want to do it without external utilities, it takes two steps:

s="May 5 2014"
t="${s#* }"
echo "${t% *}"
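The two expansions can be wrapped in a small function. This is only a sketch: the name second_field is made up for illustration, and it assumes the fields are separated by exactly one space. It uses %% rather than % in the second step so that inputs with more than three fields still yield the second one.

```shell
#!/usr/bin/env bash
# second_field is a hypothetical helper name, not part of bash.
second_field() {
    local s=$1
    local t="${s#* }"          # drop the shortest prefix ending in a space: "5 2014"
    printf '%s\n' "${t%% *}"   # drop the longest suffix starting with a space: "5"
}

second_field "May 5 2014"
```

Both steps are handled by the shell itself, so no process is started.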

With sed, one possibility is:

echo "May 5 2014" | sed 's/.* \([0-9]*\) .*/\1/'

Another:

echo "May 5 2014" | sed 's/[^ ]* //;s/ [^ ]*//'

Another:

echo "May 5 2014" | sed 's/\(.*\) \(.*\) \(.*\)/\2/'
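For comparison, the attempt in the question, s/[^0-9]*\s//, only strips the leading "May " and leaves "5 2014", because nothing in it removes the year. A hedged variant that anchors the pattern and captures only the digits is sketched below. It assumes a sed that accepts -E for extended regular expressions, and day_with_sed is an illustrative name.

```shell
#!/usr/bin/env bash
# day_with_sed is a hypothetical helper name used for illustration.
day_with_sed() {
    # ^[^ ]+   : the first word (the month)
    # ' +'     : one or more spaces
    # ([0-9]+) : the day, captured as \1
    # ' .*$'   : a space and the rest of the line
    printf '%s\n' "$1" | sed -E 's/^[^ ]+ +([0-9]+) .*$/\1/'
}

day_with_sed "May 5 2014"
```

Because the pattern allows one or more spaces after the month, a padded day such as "May  5 2014" is handled as well.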

And grep:

echo "May 5 2014" | grep -oP '\b\d{1,2}\b'

Or perl:

echo "May 5 2014" | perl -lanE 'say $F[1]'

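As a curiosity:

echo "May 5 2014" | xargs -n1 | sed -n 2p

And finally a pure bash solution, which does not launch any external command: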

aaa="May 5 2014"
[[ $aaa =~ (.*)[[:space:]](.*)[[:space:]](.*) ]] && echo ${BASH_REMATCH[2]}
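A slightly stricter variant of the same idea, as a sketch: the pattern is kept in a variable, which avoids quoting surprises on the right-hand side of =~, and only digits are accepted for the day. The names re and day_from_regex are illustrative.

```shell
#!/usr/bin/env bash
day_from_regex() {
    # Month name, blanks, 1-2 digit day, blanks, 4-digit year.
    local re='^[[:alpha:]]+[[:space:]]+([0-9]{1,2})[[:space:]]+[0-9]{4}$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # input did not look like "Month D YYYY"
    fi
}

day_from_regex "May 5 2014"
```

The non-zero return status lets a caller tell a malformed string apart from a valid one.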

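Edit: Because Keith Reynolds asked for some benchmarks, I tested the following scripts. time is not a perfect benchmarking tool, but it gives some insight.

Each test prints its result N times (counted with wc).
Note that the external commands are run only 10_000 times, while the pure bash solutions run 100_000 times.

Here is the script: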

xbench_with_read() {
    let i=$1; while ((i--)); do
        read _ day _ <<< 'May 5 2014'
        echo $day
    done
}

xbench_regex_3x_assign() {
    let i=$1; while ((i--)); do
        aaa="May 5 2014"
        re="(.*) (.*) (.*)"
        [[ $aaa =~ $re ]] && month="${BASH_REMATCH[1]}" && day="${BASH_REMATCH[2]}" && year="${BASH_REMATCH[3]}" && echo "$day"
    done
}

xbench_regex_1x_assign() {
    let i=$1; while ((i--)); do
        aaa="May 5 2014"
        re="(.*) (.*) (.*)"
        [[ $aaa =~ $re ]] && day=${BASH_REMATCH[2]} && echo "$day"
    done
}

xbench_var_expansion() {
    let i=$1; while ((i--)); do
        s="May 5 2014"
        t="${s#* }"
        echo "${t% *}"
    done
}

xbench_ext_cut() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | cut -d' ' -f2
    done
}

xbench_ext_grep() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | grep -oP '\b\d{1,2}\b'
    done
}

xbench_ext_sed() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | sed 's/\(.*\) \(.*\) \(.*\)/\2/'
    done
}

xbench_ext_xargs() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | xargs -n1 | sed -n 2p
    done
}

title() {
    echo '~ -'$___{1..20} '~' >&2
    echo "Timing $1 $2 times" >&2
}

for script in $(compgen -A function | grep xbench)
do
    cnt=100000
    #external programs run 10x less times
    [[ $script =~ _ext_ ]] && cnt=$(( $cnt / 10 ))
    title $script $cnt
    time $script $cnt | wc -l
done
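To time a single candidate rather than looping over every xbench function, something like the following could be used. It is a sketch: extract_day and run_n are hypothetical helpers, and TIMEFORMAT='%R' asks bash's time keyword to print only the elapsed real time in seconds.

```shell
#!/usr/bin/env bash
# Candidate under test: the parameter-expansion approach.
extract_day() {
    local t="${1#* }"
    printf '%s\n' "${t% *}"
}

# run_n (hypothetical name) calls a function N times with a fixed input.
run_n() {
    local fn=$1 i=$2
    while ((i--)); do
        "$fn" "May 5 2014"
    done
}

TIMEFORMAT='%R'    # print only the elapsed real time, in seconds
time run_n extract_day 1000 > /dev/null
```

The timing goes to stderr, so redirecting stdout to /dev/null keeps the terminal from dominating the measurement.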

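So, sorted by real execution time:

Pure bash solutions, 100,000 runs

xbench_var_expansion - real 0m5.148s - 5.2 sec
xbench_regex_1x_assign - real 0m11.215s - 11.2 sec
xbench_regex_3x_assign - real 0m14.669s - 14.7 sec
xbench_with_read - real 0m27.700s - 27.7 sec

No surprise here: variable expansion is simply the fastest solution.

External programs, only 10,000 runs

xbench_ext_cut - real 0m37.752s - 37.8 sec
xbench_ext_sed - real 0m41.628s - 41.6 sec
xbench_ext_grep - real 1m35.570s - 95.6 sec
xbench_ext_xargs - real 1m42.235s - 102.2 sec

Two surprises here (at least for me):

the grep solution is 2x slower than the sed solution
xargs (the curiosity solution) is only slightly slower than grep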
Environment:

$ uname -a
Darwin marvin.local 13.1.0 Darwin Kernel Version 13.1.0: Thu Jan 16 19:40:37 PST 2014; root:xnu-2422.90.20~2/RELEASE_X86_64 x86_64
$ LC_ALL=C bash --version
GNU bash, version 4.2.45(2)-release (i386-apple-darwin13.0.0)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

With awk:

echo "May 5 2014" | awk '{print $2}'

If you are writing a script that needs to parse date strings, you can certainly do it with tools such as sed, and in fact there are already several answers here that handle it well. However, my suggestion is to let the date program do the heavy lifting for you:

$ date -d "May 5 2014" +%-d
5

The maintainers of the date program have no doubt spent a great deal of time getting date parsing right. Why not take advantage of that work instead of rolling your own?

Edit: added a BSD solution, e.g. for Mac OS X. On BSD, you need to tell date with -f format what format the incoming date is in, and give the output format with +format. -j means do not set the date.

date -j -f '%b %d %Y' 'May 5 2014' '+%d'

Bash's built-in read command can split input into multiple variables:

read first second remainder <<< "May 5 2014"
read _ day _ <<< 'May 5 2014 utc'

Parameter expansion can be used with offset (:4) and length (:1) values. If the string format changes, just adjust the offset and length values. Here is an example:

$ date_format="May 5 2014"
$ echo "${date_format:4:1}"
5
$ date_format="2014 May 5"
$ echo "${date_format: -1:1}" # <- Watch that space before the negative value
5
$ date_format="5 May 2014"
$ echo "${date_format:0:1}"
5

@devnull My date string is stored in a variable called $a. When I run "newvar = $(echo $a | cut -d' ' -f2)", I get an error saying newvar is not found.

@dot Eliminate the spaces around =.

I like the variations here. In fact, I had just written your first possibility myself, but forgot the extra space before [0-9]*.

@SS781 The best way is to use cut, as devnull already answered. cut is small: quick to start and short to type in a script ;). But in reality, the best is a pure bash solution, which nobody has shown yet, because pure bash does not launch any external program...

Nice one! Unfortunately, it does not work on BSD systems (OS X), which need a different syntax.

Note that although date does not accept every date format you can think of, it is still far more flexible than assuming a single format: it accepts a number of different spellings of the same date.

Doing anything other than this for date parsing is a bit odd (and a duplicate of many other questions). BSD was not mentioned in the question. +1

@BroSlow Linux is not mentioned either... You cannot assume everyone uses GNU date; there are plenty of Mac users here too. Anyway, I agree with this answer. Parsing dates with date is good, but you need to be mindful of the different syntax on each OS.

@jm666 Nothing against BSD (although I dislike the BSD variants of some tools such as find, stat, etc.). GNU is just more common, and questions where the OP asks about BSD tend to be tagged osx, solaris, bsd, etc... But clearly, providing multiple solutions is good.

This is really nice :) +1

@davesines Not only neat, but also: read month day year

@KeithReynolds @devnull's pure bash solution is 5x faster than this read solution, and my pure bash regex solution is 3x faster than this read solution. So it is neat, but not the fastest :)

@jm666 I also found that, if you are only looking for the day, your pure bash regex solution is 3x faster than this read solution. On the other hand, with read month day year on my system ...

@KeithReynolds For 100,000 runs: the read solution takes 27 seconds, the regex with 3x assignment 10 seconds, the regex with 1x assignment 8 seconds, and devnull's solution 5.4 seconds. Anyway, it really does not matter much; it is all pure bash.
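To cope with the GNU/BSD difference raised in the date answer and its comments, a script could try one syntax and fall back to the other. This is a sketch under the assumption that the input always looks like "May 5 2014" and that a non-GNU date rejects the -d form with a non-zero status; day_of_month is a made-up name.

```shell
#!/usr/bin/env bash
# day_of_month is a hypothetical helper name used for illustration.
day_of_month() {
    # GNU date: -d parses the string; %-d prints the day without padding.
    date -d "$1" +%-d 2>/dev/null && return
    # BSD date (e.g. OS X): -j means do not set the clock, -f gives the input format.
    # %d is zero-padded, so a leading zero is stripped afterwards.
    local d
    d=$(date -j -f '%b %d %Y' "$1" '+%d' 2>/dev/null) || return 1
    printf '%s\n' "${d#0}"
}

day_of_month "May 5 2014"
```

Trying the GNU form first is an arbitrary choice; a script that mostly runs on OS X could just as well reverse the order.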