Regex: copying the first string to the second line


I have a text file in the following format:

abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375 Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375 aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375 abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

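Here, I call the first string before the first space the word (for example, abacası). A string that starts after the first space and ends with a number is a definition (for example, Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875).
I want to do the following: if a line contains more than one definition (the first line has one, the second has two, the third has three), insert a line break and put the first string (the word) at the start of each new line. Expected output: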
abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
abacı Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
abacılarla aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
abacılarla abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

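My text file has almost 1,500,000 lines, and the number of definitions per line is not fixed; it can be anywhere from 1 to 5.

Assuming each line is the word followed by definitions of three space-separated fields each (key, colon, score):
awk '{for (i=2; i<NF; i+=3) print $1, $i, $(i+1), $(i+2)}' file

(This is the perl equivalent of Avinash's answer.)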
Here is a working sed command:
sed -r '/^indirger(ken|di)/{s/([0-9]+[.][0-9]+ )(indirge)/\1\n\2/g}' my_file

Output:
indirgerdi indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]+YDH[Past] : 22.2626953125 
indirge[Verb]+[Pos]+Hr[Aor]+YDH[Past]+[A3sg] : 18.720703125
indirgerken indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]-Yken[Adv+While] : 19.6201171875
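
The sed command above targets only lines that begin with indirgerken or indirgerdi, and it inserts a line break without repeating the word. As a rough sketch of how the same substitution idea might be generalized, the Python function below inserts a line break plus the word after every score that is followed by another definition. The name split_line is hypothetical, and the code assumes every score is a decimal number separated from the next definition by a single space.

```python
import re

# A score followed by a space and more text means another definition follows.
SCORE_THEN_SPACE = re.compile(r'(\d+\.\d+) (?=\S)')

def split_line(line):
    """Move each extra definition to its own line, prefixed by the word.

    Hypothetical helper; assumes the layout 'word key : score key : score ...'.
    """
    word, _, rest = line.partition(" ")
    # Replace the space after each non-final score with a newline and the word.
    # A function replacement keeps backslashes in `word` from being interpreted.
    rest = SCORE_THEN_SPACE.sub(lambda m: m.group(1) + "\n" + word + " ", rest)
    return word + " " + rest

if __name__ == "__main__":
    sample = ("indirgerdi indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]+YDH[Past] : 22.2626953125 "
              "indirge[Verb]+[Pos]+Hr[Aor]+YDH[Past]+[A3sg] : 18.720703125")
    print(split_line(sample))
```

Because the word is taken from each line itself, no per-word address such as /^indirger(ken|di)/ is needed.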

A small Python script does the job. The input is read from input.txt and the output is written to output.txt.
import re

# A definition is "<key> : <score>", where the score is a decimal number.
definition = re.compile(r'\S+ : \d+\.\d+')

# Stream the file line by line instead of loading all of it into memory.
with open("input.txt", encoding="utf-8") as src, \
     open("output.txt", "w", encoding="utf-8") as dst:
    for line in src:
        # The word is everything before the first space.
        word, _, rest = line.partition(" ")
        # Write one output line per definition, each prefixed with the word.
        for s in definition.findall(rest):
            dst.write(word + " " + s + "\n")

Bash and grep:
#!/bin/bash

while IFS=' ' read -r in1 in2 in3 in4; do
    if [[ -n $in4 ]]; then
        prepend="$in1"
        echo "$in1 $in2 $in3 $in4"
    else
        echo "$prepend $in1 $in2 $in3"
    fi
done < <(grep -o '[[:alnum:]][^:]\+ : [[:digit:].]\+' "$1")

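grep -o prints each match on its own line, and the while loop reads that output, splitting every line on single spaces. If in4 is a zero-length string, the line is missing the word, so we prepend the most recently seen one.
The script takes the input file name as an argument, and a simple redirection saves the output to a file: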
./script inputfile > outputfile
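
The extract-then-prefix logic of the script above can be sketched in Python as follows. This is only an illustration under stated assumptions: expand and MATCH are hypothetical names, \w stands in for [[:alnum:]] (it also admits an underscore), and the newline is excluded from the character class because the whole text is scanned at once rather than line by line as grep does.

```python
import re

# Same shape as the grep pattern: text without a colon, " : ", then a score.
MATCH = re.compile(r'\w[^:\n]+ : [\d.]+')

def expand(text):
    """Yield 'word key : score' lines (hypothetical helper mirroring the script)."""
    prepend = ""
    for m in MATCH.finditer(text):
        fields = m.group(0).split(" ")
        if len(fields) >= 4:
            # First match on an input line: it still carries the word.
            prepend = fields[0]
            yield " ".join(fields)
        else:
            # Later match: only key, colon and score, so prefix the remembered word.
            yield prepend + " " + " ".join(fields)

if __name__ == "__main__":
    sample = "abacı Abaç[Noun] : 16.3037109375 Aba[Noun] : 23.0185546875\n"
    for out in expand(sample):
        print(out)
```

As with the grep pattern, a key must be at least two characters long to match, which holds for the sample data.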


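I assume the following format:
word definitionkey : definitionvalue [definitionkey : definitionvalue …]

None of these elements may contain spaces, and they are always separated by a single space.
The following code should work: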
awk '{ for (i=2; i<=NF; i+=3) print $1, $i, $(i+1), $(i+2) }' file

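awk runs the block once per input line (a guard such as NF>=4 {…} would make sense here, because a line with fewer than four fields is an error). NF is the number of fields, and a dollar sign ($) means we want the value of the given field, so $1 is the value of the first field, $NF is the value of the last field, and $(i+1) is the value of the third field (assuming i=2). print puts a space between its arguments by default and adds a newline at the end (otherwise we would need printf "%s %s %s %s\n", $1, $i, $(i+1), $(i+2), which is a bit harder to read).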
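
For readers less familiar with awk, the field-stepping logic might look like this in Python. It is a minimal sketch, not part of the answer: expand_fields is a hypothetical name, the length check plays the role of a guard that skips lines with fewer than four fields, and, unlike the awk loop, an incomplete trailing group is dropped rather than printed.

```python
def expand_fields(line):
    """Return one 'word key : score' string per definition (hypothetical helper)."""
    fields = line.split()          # like awk's default whitespace splitting
    if len(fields) < 4:            # skip lines with fewer than four fields
        return []
    word = fields[0]               # corresponds to $1
    # Step through the remaining fields three at a time: key, ":", score.
    return [" ".join([word] + fields[i:i + 3])
            for i in range(1, len(fields) - 2, 3)]

if __name__ == "__main__":
    for out in expand_fields("abacı Abaç[Noun] : 16.3037109375 Aba[Noun] : 23.0185546875"):
        print(out)
```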
Please find the bash code below:
    #!/bin/bash
    # read.sh
    # Read each line into an array: element 0 is the word,
    # followed by three fields (key, colon, score) per definition.
    while read -r -a fields
    do
            # Step through the definitions three fields at a time.
            for ((j = 1; j + 2 < ${#fields[@]}; j += 3))
            do
                    # Print the word followed by one complete definition.
                    printf '%s %s %s %s\n' "${fields[0]}" "${fields[j]}" "${fields[j+1]}" "${fields[j+2]}"
            done
    done

I have tested it, and it works.
Testing
At the bash shell prompt, issue the following command:
     $ ./read.sh < input.txt > output.txt

where read.sh is the script, input.txt is the input file, and output.txt is where the output is written.