Parsing name-value pairs in a CSV line using a shell script


I have two lines of input like these:

TASK1,6,INITIAL,2013-01-15 19:20:40,PREPARING,2013-01-15 19:21:12,SCHEDULED,2013-01-15 19:21:13,TRANSLATING,2013-01-15 19:21:13,LOADING,2013-01-15 19:36:37,COMPLETE,2013-01-15 19:36:42
TASK2,5,INITIAL,2013-01-15 19:20:44,PREPARING,2013-01-15 19:21:13,SCHEDULED,2013-01-15 19:21:14,TRANSLATING,2013-01-15 19:36:37,TERMINAL,2013-01-15 20:28:10
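I need to loop over a file of lines like these and calculate several time differences for each line. I'm fine with the calculations themselves, but I've spent a long time trying to work out how to parse this variable-length string of name-value pairs.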

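Basically, the number after the task ID is the count of statuses, followed by the statuses themselves and the times at which they occurred.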

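What I'd like to do is take one of these lines and end up with something like this, with the values assigned to their respective variables (using the first line as an example):

In this case, I would want my output to be: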

$TASK_ID=TASK1
$STATUS_COUNT=11
$INITIAL=2013-01-15 20:20:40
$PREPARING=2013-01-15 20:21:12
$SCHEDULED=2013-01-15 20:21:12
$TRANSLATING=2013-01-15 20:21:13
$LOADING=<NULL>
$COMPLETE=<NULL>
$TERMINAL=2013-01-15 20:36:42
I'm stumped. Can anyone help?

Thanks in advance.

#!/bin/bash

# Splitting on commas, read the task ID and status count followed by all of the statuses,
# which we'll parse later.
while IFS=, read -r TASK_ID STATUS_COUNT STATUSES; do
(
    # Subtly, but importantly, we put the loop body inside parentheses so each loop
    # iteration runs in a sub-shell. This ensures that the $LOADING, $COMPLETE, etc.
    # variables we set don't leak into future iterations.

    echo "TASK_ID      = $TASK_ID"
    echo "STATUS_COUNT = $STATUS_COUNT"

    # Convert the comma-separated string $STATUSES into an array using `read -a'.
    IFS=, read -ra STATUSES <<< "$STATUSES"

    # Assign the statuses to named variables. A side benefit of this is that only the
    # last value of each status type is used.
    for ((i = 0; i < ${#STATUSES[@]}; i += 2)); do
        declare "${STATUSES[$i]}=${STATUSES[$((i+1))]}"
    done

    # Print each of the statuses, or <NULL> if that stage wasn't listed.
    echo "INITIAL      = ${INITIAL:-<NULL>}"
    echo "PREPARING    = ${PREPARING:-<NULL>}"
    echo "SCHEDULED    = ${SCHEDULED:-<NULL>}"
    echo "TRANSLATING  = ${TRANSLATING:-<NULL>}"
    echo "LOADING      = ${LOADING:-<NULL>}"
    echo "COMPLETE     = ${COMPLETE:-<NULL>}"
    echo "TERMINAL     = ${TERMINAL:-<NULL>}"

    echo
)
done
@glennjackman I believe I recently learned about read -a from you. Thanks again for the help! The only problem here is the last case (a task that ran multiple times): I'd want LOADING and COMPLETE to be null, because those occurred in the first run (set of statuses) but not in the second. If it helps to split the runs, INITIAL always starts a new run. Using some date math, I was able to weed out the timestamps by adding a few statements.
$ ./tasks < tasks.txt
TASK_ID      = TASK1
STATUS_COUNT = 6
INITIAL      = 2013-01-15 19:20:40
PREPARING    = 2013-01-15 19:21:12
SCHEDULED    = 2013-01-15 19:21:13
TRANSLATING  = 2013-01-15 19:21:13
LOADING      = 2013-01-15 19:36:37
COMPLETE     = 2013-01-15 19:36:42
TERMINAL     = <NULL>

TASK_ID      = TASK2
STATUS_COUNT = 5
INITIAL      = 2013-01-15 19:20:44
PREPARING    = 2013-01-15 19:21:13
SCHEDULED    = 2013-01-15 19:21:14
TRANSLATING  = 2013-01-15 19:36:37
LOADING      = <NULL>
COMPLETE     = <NULL>
TERMINAL     = 2013-01-15 20:28:10

TASK_ID      = TASK1
STATUS_COUNT = 11
INITIAL      = 2013-01-15 20:20:40
PREPARING    = 2013-01-15 20:21:12
SCHEDULED    = 2013-01-15 20:21:13
TRANSLATING  = 2013-01-15 20:21:13
LOADING      = 2013-01-15 19:36:37
COMPLETE     = 2013-01-15 19:36:42
TERMINAL     = 2013-01-15 20:36:42
events=(INITIAL PREPARING SCHEDULED TRANSLATING LOADING COMPLETE TERMINAL)

while IFS=, read -r TASK_ID STATUS_COUNT rest; do
    IFS=, read -ra STATUSES <<< "$rest"

    for (( i=0; i < ${#STATUSES[@]}; i+=2 )); do
        # If this is the initial event, a new run has started: reset all statuses.
        if [[ ${STATUSES[i]} == "${events[0]}" ]]; then
            for event in "${events[@]}"; do
                declare "$event="
            done
        fi
        declare "${STATUSES[i]}=${STATUSES[i+1]}"
    done
    for var in TASK_ID STATUS_COUNT "${events[@]}"; do
        printf '$%s = %s\n' "$var" "${!var:-<NULL>}"
    done

done
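
As a variation on the script above, here is a minimal sketch that keeps the statuses in an associative array instead of creating variables with declare. It assumes bash 4.0 or later. The names parse_task and STATUS are made up for this illustration and do not come from either answer. It resets the array whenever INITIAL appears, on the assumption stated in the comments that INITIAL always starts a new run. The two sample lines are the ones from the question.

```shell
#!/bin/bash
# Sketch only: parse_task and STATUS are hypothetical names chosen for this example.
# Requires bash 4.0+ for associative arrays.

events=(INITIAL PREPARING SCHEDULED TRANSLATING LOADING COMPLETE TERMINAL)
declare -A STATUS

# parse_task LINE
# Sets the globals TASK_ID and STATUS_COUNT and fills STATUS[name]=timestamp.
parse_task() {
    local rest pairs i
    # The last variable named to read receives the remainder of the line, commas included.
    IFS=, read -r TASK_ID STATUS_COUNT rest <<< "$1"
    IFS=, read -ra pairs <<< "$rest"

    STATUS=()
    for (( i = 0; i + 1 < ${#pairs[@]}; i += 2 )); do
        # Assumption: INITIAL always starts a new run, so drop statuses from earlier runs.
        if [[ ${pairs[i]} == "${events[0]}" ]]; then
            STATUS=()
        fi
        STATUS["${pairs[i]}"]="${pairs[i+1]}"
    done
}

# Demonstration using the two sample lines from the question.
while IFS= read -r line; do
    parse_task "$line"
    printf 'TASK_ID      = %s\n' "$TASK_ID"
    printf 'STATUS_COUNT = %s\n' "$STATUS_COUNT"
    for event in "${events[@]}"; do
        printf '%-12s = %s\n' "$event" "${STATUS[$event]:-<NULL>}"
    done
    echo
done <<'EOF'
TASK1,6,INITIAL,2013-01-15 19:20:40,PREPARING,2013-01-15 19:21:12,SCHEDULED,2013-01-15 19:21:13,TRANSLATING,2013-01-15 19:21:13,LOADING,2013-01-15 19:36:37,COMPLETE,2013-01-15 19:36:42
TASK2,5,INITIAL,2013-01-15 19:20:44,PREPARING,2013-01-15 19:21:13,SCHEDULED,2013-01-15 19:21:14,TRANSLATING,2013-01-15 19:36:37,TERMINAL,2013-01-15 20:28:10
EOF
```

Because the statuses live in one array rather than in separately named variables, clearing them is a single assignment, and a status name in the input cannot overwrite an unrelated shell variable. To process a real file, the here-document could be replaced with a redirect such as done < tasks.txt.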