Regex: extracting strings from text files and renaming the files accordingly in bash

regex bash

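I have a lot of randomly named text files (about 70,000 files). All I know is that somewhere in the first 30 lines there are two lines of the form

Author: Samuel Richardson

and

Title: Clarissa, Volume 5 (of 9)

I am not sure about the letter case of these two lines.
I would like to extract the title and the author and rename each file accordingly, to something like "Clarissa, Volume 5 (of 9) ,___, Samuel Richardson.txt" (I use ,___, so that there is a valid separator between the author and the title).
My code is: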
for filename in *.txt; do

    title=$(head -n 30 $filename.txt | grep -i 'Title:' | sed -n 's/^.*Title: //p')
    author=$(head -n 30 $filename.txt | grep -i 'Author:' | sed -n 's/^.*Author: //p')
    new_name="$title ,___, $author"

    mv $filename $new_name.txt
done

It does not work as expected. This sub-snippet:
echo "title: $title _"
echo "author: $author _"

new_name="$title ,___, $author"

echo $new_name

prints the following output:
 _tle: Clarissa, Volume 5 (of 9)
 _thor: Samuel Richardson
 ,___, Samuel Richardson)

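In addition, I don't know how to save the first 30 lines extracted by the head command in a variable firstlines, so that they don't have to be recomputed. The code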
firstlines=$(head -n 30 randomname.txt)

followed by

title=$($firstlines | grep -i 'Title:' | sed -n 's/^.*Title: //p')

prints a "command not found" error.
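@Poshi's right: your main problem is line endings. It looks like each line ending includes a carriage return (\r). By itself, \r just moves the cursor back to the beginning of the line. Combined with \n it works fine, because the cursor then moves to the beginning of the next line. On its own, though, it causes exactly what you are seeing: some text, then the cursor returns to the beginning of the line, then more text overwrites what was originally there.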
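
One way to confirm that a captured value really ends in a carriage return is to dump its bytes. The following is a minimal sketch, not part of the original answer: the sample string stands in for what `sed` would capture from a CRLF file, and the variable names `cr` and `clean` are chosen here only for illustration.

```shell
#!/bin/bash
# Build a value that ends in a carriage return, as it would when
# extracted from a file with Windows-style (CRLF) line endings.
cr=$(printf '\r')
title="Clarissa, Volume 5 (of 9)$cr"

# od -c prints each byte; a trailing \r shows up explicitly before \n.
printf '%s\n' "$title" | od -c

# Strip one trailing carriage return with parameter expansion.
clean=${title%"$cr"}
printf '%s\n' "$clean" | od -c
```

In the first dump, a `\r` entry appears just before the final `\n`; in the second it is gone.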
Edit: It might help if I offered a solution. Something like the following should work, inserted before the assignment to new_name:
title=$(echo -e $title | sed 's/\r//')
author=$(echo -e $author | sed 's/\r//')

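As for the second question, the reason you get "command not found" is that the first word in the variable $firstlines is not a command. You need something like: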
title=$(echo -e $firstlines | grep -i 'Title:' | sed -n 's/^.*Title: //p')
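
The difference between running a variable's contents as a command and feeding them to a command as input can be sketched as follows. The sample text in `firstlines` is made up for the demonstration. It uses `printf` with the variable double-quoted, which is an assumption about how one might prefer to write it: quoting keeps the newlines intact, so `sed` still sees one line per record.

```shell
#!/bin/bash
# Hypothetical sample standing in for the output of: head -n 30 file
firstlines='Some header line
Title: Clarissa, Volume 5 (of 9)
Author: Samuel Richardson'

# Feed the variable to the pipeline as input instead of executing it.
title=$(printf '%s\n' "$firstlines" | sed -n 's/^.*Title: //p')
author=$(printf '%s\n' "$firstlines" | sed -n 's/^.*Author: //p')

printf 'title=[%s]\nauthor=[%s]\n' "$title" "$author"
```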

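@Poshi's comment about line endings is right, and @B.Shefter's answer is correct but has some problems (unquoted variable references, reliance on nonstandard features of echo and sed), so I thought I'd write a rewrite that (hopefully) fixes those problems.
Also, I'll repeat the advice I gave in a comment: use mv -n or mv -i to avoid overwriting files if anything goes wrong, and make a backup first. (You do have a backup anyway, right? You should always back up anything you don't want to lose.)
Anyway, here's my take: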
#!/bin/bash

for filename in *.txt; do

    # Grab the first 30 lines with carriage returns removed:
    firstlines=$(head -n 30 "$filename" | tr -d '\r')

    # Capture the title and author. Note that sed doesn't have case-insensitive
    # patterns, so use e.g. [Tt] to manually make them case-insensitive. Also, use
    # [[:blank:]]* to allow any number of spaces and/or tabs after the ":".
    title=$(echo "$firstlines" | sed -n 's/^.*[Tt][Ii][Tt][Ll][Ee]:[[:blank:]]*//p')
    if [ -z "$title" ]; then
        echo "Unable to find Title: in $filename; skipping" >&2
        continue
    fi

    author=$(echo "$firstlines" | sed -n 's/^.*[Aa][Uu][Tt][Hh][Oo][Rr]:[[:blank:]]*//p')
    if [ -z "$author" ]; then
        echo "Unable to find Author: in $filename; skipping" >&2
        continue
    fi

    new_name="$title ,___, $author.txt"

    # Note: the filenames here will contain spaces, so double-quoting is *critical*
    mv -i "$filename" "$new_name"
done
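
Before running a loop like this over the real files, it may be worth trying the same steps on a throwaway copy. The sketch below is only an illustration under stated assumptions: the directory comes from `mktemp -d`, the file name `random123.txt` and its contents are invented, and `mv -n` is used so that an existing target would never be overwritten.

```shell
#!/bin/bash
# Create a scratch directory with one sample file using CRLF line endings.
workdir=$(mktemp -d)
printf 'Some header\r\nTitle: Clarissa, Volume 5 (of 9)\r\nAuthor: Samuel Richardson\r\n' \
    > "$workdir/random123.txt"

# Same extraction steps as the loop above, applied to the sample file.
firstlines=$(head -n 30 "$workdir/random123.txt" | tr -d '\r')
title=$(printf '%s\n' "$firstlines" | sed -n 's/^.*[Tt][Ii][Tt][Ll][Ee]:[[:blank:]]*//p')
author=$(printf '%s\n' "$firstlines" | sed -n 's/^.*[Aa][Uu][Tt][Hh][Oo][Rr]:[[:blank:]]*//p')
new_name="$title ,___, $author.txt"

# -n refuses to overwrite an existing file with the same name.
mv -n "$workdir/random123.txt" "$workdir/$new_name"
ls "$workdir"
```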

Quick guess… check the line endings, they may be Windows-style.

If anything goes wrong, this could make a huge mess and possibly lose data. I recommend using mv -n or mv -i to avoid accidentally overwriting files, and backing up everything beforehand would be a very good idea.

Calling head twice, grep twice and sed twice in the loop, with 4 pipes and 2 command substitutions, is very inefficient (and will take a long time on a large data set). Instead, just use awk or sed for the whole thing. A simple sed -n '/^\(Author:\|Title:\)/p' file will find the two lines beginning with "Author:" or "Title:", allowing you to trim both and leave only the wanted information.
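
The single-process idea from the last comment could look roughly like the following awk sketch. It is an illustration, not the commenter's code: the function name `extract_meta` and the tab-separated output format are assumptions made here. It stops reading after line 30, strips a trailing carriage return, and matches the two labels case-insensitively at the start of a line.

```shell
#!/bin/bash
# extract_meta FILE: print "title<TAB>author" taken from the first 30 lines
# of FILE; exit non-zero if either field is missing. (Hypothetical helper.)
extract_meta() {
    awk '
        NR > 30 { exit }                 # only look at the first 30 lines
        { sub(/\r$/, "") }               # drop a trailing carriage return
        (tolower($0) ~ /^title:/) && (title == "") {
            sub(/^[^:]*:[ \t]*/, ""); title = $0; next
        }
        (tolower($0) ~ /^author:/) && (author == "") {
            sub(/^[^:]*:[ \t]*/, ""); author = $0; next
        }
        END {
            if (title != "" && author != "") printf "%s\t%s\n", title, author
            else exit 1
        }
    ' "$1"
}
```

Inside the renaming loop, the two fields could then be split on the tab with something like `IFS=$'\t' read -r title author < <(extract_meta "$filename") || continue`, which starts one awk process per file instead of several head, grep and sed processes.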