Linux: How do I parse HTTP headers using Bash?

linux bash curl


I need to get 2 values from the headers of a web page that I fetch with curl. I have been able to get the values individually using:

response1=$(curl -I -s http://www.example.com | grep HTTP/1.1 | awk {'print $2'})
response2=$(curl -I -s http://www.example.com | grep Server: | awk {'print $2'})

But I can't figure out how to grep the values separately using a single curl request, like this:

response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}

Every attempt results in either an error message or empty values. I'm sure it is just a syntax issue.

Use process substitution.
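
As a minimal sketch of the process-substitution idea (not necessarily the code the answer originally showed), a single `awk` pass can emit both values and `read` can pick them up from `< <(...)`. The helper names `status_and_server` and `sample_response` are assumptions, and the canned response only stands in for `curl -sI http://www.example.com` so that the example runs offline.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read HTTP headers on stdin, print "<status> <server>".
status_and_server() {
    awk '
        { sub(/\r$/, "") }                          # strip the trailing CR
        /^HTTP\//                { status = $2 }    # status line, e.g. HTTP/1.1 200 OK
        tolower($1) == "server:" { server = $2 }    # first word of the Server value
        END                      { print status, server }
    '
}

# Canned response standing in for: curl -sI http://www.example.com
sample_response() {
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: ECS (demo)\r\n\r\n'
}

# One request, two variables. Because read runs in the current shell
# (the pipeline lives inside the process substitution), both variables survive.
read -r http_status server < <(sample_response | status_and_server)

echo "$http_status"
echo "$server"
```

Like the `awk '{print $2}'` calls in the question, this keeps only the first word of the Server value.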

A full `bash` solution. It demonstrates how to easily parse other headers without requiring `awk`:
shopt -s extglob # Required to trim whitespace; see below

while IFS=':' read key value; do
    # trim whitespace in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}

    case "$key" in
        Server) SERVER="$value"
                ;;
        Content-Type) CT="$value"
                ;;
        HTTP*) read PROTO STATUS MSG <<< "$key${value:+:$value}"
                ;;
     esac
done < <(curl -sI http://www.google.com)
echo "$STATUS"
echo "$SERVER"
echo "$CT"


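Per RFC 2616, HTTP headers are modeled as described in RFC 822, which states clearly in section 3.1.2:

"The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in the actual text, they are removed by the action of unfolding the field.)"

So the script above should catch any RFC-[2]822 compliant header, with one notable exception: folded headers.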
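
A folded header is a value continued on following lines that begin with whitespace. One way to cope with this, sketched here under assumptions, is to unfold the stream before parsing it. The function name `unfold_headers` and the sample response are hypothetical, and joining the parts with a single space is a simplification of strict RFC unfolding.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join continuation lines (lines starting with a space
# or tab) onto the header line that precedes them. Reads stdin, writes stdout.
unfold_headers() {
    local line current=''
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                       # drop the trailing CR, if any
        if [[ -n $current && $line == [[:blank:]]* ]]; then
            # Continuation: strip its leading blanks, append with one space.
            current+=" ${line#"${line%%[![:blank:]]*}"}"
        else
            # A new header (or the blank terminator): flush the previous one.
            [[ -n $current ]] && printf '%s\n' "$current"
            current=$line
        fi
    done
    [[ -n $current ]] && printf '%s\n' "$current"
    return 0
}

# Canned response with one folded header, standing in for real curl output.
sample_response() {
    printf 'HTTP/1.1 200 OK\r\nX-Long: first part\r\n  second part\r\nServer: demo\r\n\r\n'
}

unfolded=$(sample_response | unfold_headers)
printf '%s\n' "$unfolded"
```

The unfolded stream has one header per line, so it can be fed to a `while IFS=':' read key value` loop unchanged.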
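If you want to extract more than a couple of headers, you can stuff all of them into a bash associative array. Here is a simple function which assumes that any given header occurs only once. (Don't use it for `Set-Cookie`; see below.)

Note that the SO response includes one or more cookies in `Set-Cookie` headers, but we can only see the last one, because the naive script overwrites entries that have the same header name. (As it happens, there was only one cookie, but we have no way of knowing that.) While the script could be extended to special-case `Set-Cookie`, a better approach is probably to provide a cookie jar file and use the `-b` and `-c` curl options to maintain it.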
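
If special-casing repeated headers is still wanted, one possible shape is to collect every value of a given header into an indexed array instead of overwriting a single entry. This is a sketch under assumptions: `header_values` and `sample_response` are hypothetical names, and the canned response replaces a live `curl -sI` call.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every value of header NAME ($1) found on stdin,
# one per line, matching the header name case-insensitively.
header_values() {
    local want=${1,,} line name value
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                 # drop the trailing CR
        [[ -z $line ]] && break            # a blank line ends the headers
        [[ $line == *:* ]] || continue     # skip the status line
        name=${line%%:*}
        value=${line#*:}
        value=${value#"${value%%[![:space:]]*}"}   # trim leading whitespace
        if [[ ${name,,} == "$want" ]]; then
            printf '%s\n' "$value"
        fi
    done
    return 0
}

# Canned response with two cookies, standing in for real curl output.
sample_response() {
    printf 'HTTP/1.1 200 OK\r\n'
    printf 'Set-Cookie: a=1; path=/\r\n'
    printf 'Content-Type: text/html\r\n'
    printf 'set-cookie: b=2; HttpOnly\r\n\r\n'
}

# mapfile keeps one array element per cookie, so none is overwritten.
mapfile -t cookies < <(sample_response | header_values Set-Cookie)
printf 'cookie: %s\n' "${cookies[@]}"
```

For real sessions the cookie jar route is usually simpler: passing the same file to both `-b` and `-c` lets curl read and update the cookies itself.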
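Improved and modernized using Bash >= 4.2 features (note that namerefs need Bash 4.3 or later, and `${var@a}` needs Bash 4.4 or later):

- Use a `declare -n` nameref variable to refer to the associative array
- Use `declare -l` to automatically lowercase the variable's value
- Use `${var@a}` to query the variable's declaration attributes
- Changed to process the input stream rather than invoking the `curl` command
- Make it compatible with …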
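
The three Bash features listed above can be tried in isolation. The following is only an illustrative sketch: `set_header` and `demo_headers` are hypothetical names, and it assumes Bash 4.4 or later because of `${var@a}`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: store VALUE ($3) under a lowercased KEY ($2)
# in the associative array whose name is passed as $1.
set_header() {
    local -n target=$1   # nameref: "target" aliases the caller's array
    local -l key=$2      # -l lowercases anything assigned to "key"
    target[$key]=$3
}

declare -A demo_headers=()
set_header demo_headers 'Content-Type' 'text/html'

# ${var@a} expands to the attribute flags of the variable;
# an associative array reports "A".
attrs=${demo_headers@a}

printf '%s\n' "${!demo_headers[@]}" "${demo_headers[content-type]}" "$attrs"
```

Passing the array by name avoids the `declare -g` and `unset` juggling that a function needs when it only has the array name as a string.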

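#!/usr/bin/env bash

shopt -s extglob # Requires extended globbing

# Process the input headers stream into an associative array
# @Arguments
# $1: The associative array receiving the headers
# @Input
# &1: The headers stream
parse_headers() {
  if [ $# -ne 1 ]; then
    printf 'Need an associative array name argument\n' >&2
    return 1
  fi
  local -n header=$1 # Nameref argument
  # Check that the argument is the name of an associative array
  case ${header@a} in
    A | At) ;;
    *)
      printf \
        'Variable %s with attributes %s is not a suitable associative array\n' \
        "${!header}" "${header@a}" >&2
      return 1
      ;;
  esac
  header=() # Clear the associative array
  local -- line rest v
  local -l k # Automatically lowercased
  # Get the first line, assuming HTTP/1.0 or above. Note that these fields
  # have Capitalized names.
  IFS=$' \t\n\r' read -r header['Proto'] header['Status'] rest
  # Drop the CR from the message, if there was one.
  header['Message']="${rest%%*([[:space:]])}"
  # Now read the rest of the headers.
  while IFS=: read -r line rest && [ -n "$line$rest" ]; do
    rest=${rest%%*([[:space:]])}
    rest=${rest##*([[:space:]])}
    line=${line%%*([[:space:]])}
    [ -z "$line" ] && break # Blank line is the end of the headers stream
    if [ -n "$rest" ]; then
      k=$line
      v=$rest
    else
      # Handle folded header (joined to the previous value with one space)
      # See: https://tools.ietf.org/html/rfc2822#section-2.2.3
      v+=" ${line##*([[:space:]])}"
    fi
    header["$k"]="$v"
  done
}

declare -A HTTP_HEADERS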
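parse_headers HTTP_HEADERS # reads the headers stream from standard input

Comments:

`$response | …` will not work, because the value of `$response` is not a command. `echo $response | …` should work.

If there were 20 attributes to read, would you suggest the same approach?

@jp For myself, I would use either `awk` or `bash`. In most cases, using both does not add much. But without enough background, I am only guessing that you want a hybrid solution.

@jp I posted another answer that demonstrates how to use `bash` on its own. Depending on your needs, it may be a better solution.

Why is the `HTTP*` case different from the other cases? I'm a bash n00b, so please excuse me if the question is really basic.

@jps `IFS=':'` means that I break the input into key/value at the `:` character. The HTTP status line does not have that format, so it is a special case.

I think the `HTTP*` case could be better written as `read PROTO STATUS MSG` …

@rici Thanks for your comment. Very nice catch! I have modified my answer accordingly.

@jp I have updated my answer to trim whitespace in the value field. You now have all the basic building blocks to adapt it to your particular needs. By the way, the problem you are having with parsing one very specific header is strange. You should definitely post it (including the raw header data, of course) as a new question. It would be an interesting puzzle to solve…

How would you add a timeout to the solution?

@Djurez: I would probably use the `timeout` command to wrap the whole script, but bash's `read` builtin has an option that sets a timeout, if a per-line timeout is acceptable.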
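
The per-line variant mentioned in the last comment could look roughly like this. It is a sketch under assumptions: `slow_server` is a hypothetical stand-in for a `curl` call that stalls after the status line, and the 0.3 second limit is arbitrary. `read -t` returns an exit status greater than 128 when the timeout expires, which is how the loop tells a stall from the end of input.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a server that stalls after the first line.
slow_server() {
    printf 'HTTP/1.1 200 OK\r\n'
    sleep 1
    printf 'Server: late\r\n\r\n'
}

lines=()
timed_out=no
while true; do
    if IFS= read -r -t 0.3 line; then
        lines+=("${line%$'\r'}")          # keep the line, minus its CR
    else
        rc=$?                             # exit status of the failed read
        (( rc > 128 )) && timed_out=yes   # > 128 means the timeout expired
        break                             # otherwise: end of input
    fi
done < <(slow_server)

printf 'lines read: %d, timed out: %s\n' "${#lines[@]}" "$timed_out"
```

To bound the total run time instead of each line, the whole script can be wrapped with the `timeout` command, as the comment suggests.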

Sample output of the pure-bash loop (status, server, content type):
302
GFE/2.0
text/html; charset=UTF-8

# Call this as: headers ARRAY URL
headers () {
  {
    # (Re)define the specified variable as an associative array.
    unset $1;
    declare -gA $1;
    local line rest

    # Get the first line, assuming HTTP/1.0 or above. Note that these fields
    # have Capitalized names.
    IFS=$' \t\n\r' read $1[Proto] $1[Status] rest
    # Drop the CR from the message, if there was one.
    declare -gA $1[Message]="${rest%$'\r'}"
    # Now read the rest of the headers. 
    while true; do
      # Get rid of the trailing CR if there is one.
      IFS=$'\r' read line rest;
      # Stop when we hit an empty line
      if [[ -z $line ]]; then break; fi
      # Make sure it looks like a header
      # This regex also strips leading and trailing spaces from the value
      if [[ $line =~ ^([[:alnum:]_-]+):\ *(( *[^ ]+)*)\ *$ ]]; then
        # Force the header to lower case, since headers are case-insensitive,
        # and store it into the array
        declare -gA $1[${BASH_REMATCH[1],,}]="${BASH_REMATCH[2]}"
      else
        printf "Ignoring non-header line: %q\n" "$line" >> /dev/stderr
      fi
    done
  } < <(curl -Is "$2")
}

$ headers so http://stackoverflow.com/
$ for h in ${!so[@]}; do printf "%s=%s\n" $h "${so[$h]}"; done | sort
Message=OK
Proto=HTTP/1.1
Status=200
cache-control=public, no-cache="Set-Cookie", max-age=43
content-length=224904
content-type=text/html; charset=utf-8
date=Fri, 25 Jul 2014 17:35:16 GMT
expires=Fri, 25 Jul 2014 17:36:00 GMT
last-modified=Fri, 25 Jul 2014 17:35:00 GMT
set-cookie=prov=205fd7f3-10d4-4197-b03a-252b60df7653; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
vary=*
x-frame-options=SAMEORIGIN