How to pass a list of strings to xargs for simultaneous use with wget


I have an array (list) of strings, each containing a URL with a custom directory name appended:

urls="http://domain.com/book1**Shakespeare http://domain.com/book2**King http://domain.com/book3**Twain"

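The URL part of each string (before the **) requests a .zip file, and is what I want to pass to wget.
The second part of each string (after the **) is the directory I want the file from that wget request to be placed in.

So, in the end, the directory structure I expect is: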

/Shakespeare/
    book.zip
/King/
    book.zip
/Twain/
    book.zip

Here is what I have so far:

echo $urls | xargs -n 1 -P 8 | sed 's/\*\*.*//'

This correctly outputs the URLs I need to pass to wget, without the **author suffix on the end of each one. For example,

http://domain.com/book2**King

becomes

http://domain.com/book2

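Now I would like to pass these newly formatted download URLs to wget, while also somehow passing along the stripped

**author

part, to be supplied as part of wget's destination option.

My main reason for using xargs is that I can pass it a list of URLs and set them all off at the same time. I was hoping I could download them simultaneously into different destination directories in the same call.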
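What you are asking for is quite cumbersome to do with shell pipes. The main problem is that you are trying to take the standard output of one process (echo and/or sed) and use it as the arguments to another process (wget). A pipe by itself will not help you here because, by design, it connects the stdout of one process to the stdin of the next. Doing it this way conflates the content a tool processes with the arguments that describe the processing. So pipes are not really what you want.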
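To illustrate the difference between data arriving on standard input and words passed as command-line arguments, here is a minimal sketch. The sample words book1 and book2 are placeholders, not taken from the question. xargs, which the question already uses, is the usual tool for turning the first into the second.

```shell
# A pipe delivers text on standard input: cat simply reads and prints it.
printf 'book1 book2\n' | cat

# xargs also reads standard input, but hands each word to the command
# as an argument; -n 1 means one word per invocation.
printf 'book1 book2\n' | xargs -n 1 echo arg:
```

The first command prints the line unchanged, while the second runs echo twice and prints "arg: book1" and "arg: book2" on separate lines.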

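You could hack this together with sed or awk, plus tools like split and paste, but you would need to write at least a full shell script rather than just a pipeline. I would really recommend a more full-featured scripting language, in particular one with better string handling. The other thing you want is the ability to launch subprocesses.
All of this suggests Python is a good choice. Below is a sample implementation (tested, but not rigorously) that should do what you want.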

import multiprocessing as mp
import os
import urllib.request  # Check out the `requests` 3rd-party library too, it's great

# Split string into (URL, author) pairs. You can read this from stdin or a file, too.
urls = 'http://domain.com/book1**Shakespeare http://domain.com/book2**King'  # etc
args = [entry.split('**') for entry in urls.split()]

def download_file(url, author_name):
    # Create the author's directory if it does not exist yet
    os.makedirs(author_name, exist_ok=True)
    # Transfer the URL's contents to a local file
    with urllib.request.urlopen(url) as u, open(os.path.join(author_name, 'book.zip'), 'wb') as f:
        f.write(u.read())

# Run the download function in a pool of worker processes (defaults to CPU count)
# A simple `os.system()` or `os.popen()` call would work too
if __name__ == '__main__':
    with mp.Pool() as pool:
        # starmap unpacks each (url, author_name) pair into two arguments
        pool.starmap(download_file, args)
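A shell-only alternative, following the sed and xargs explanation further down, could look roughly like the sketch below. The helper name to_wget_args and the echo dry run are assumptions added for illustration, and -P 8 is carried over from the question. Note that xargs -P sets the number of parallel processes, while wget -P sets the download directory.

```shell
urls="http://domain.com/book1**Shakespeare http://domain.com/book2**King http://domain.com/book3**Twain"

# Hypothetical helper: turn each "URL**DIR" entry into the three
# words "URL -P DIR" by replacing ** with " -P ".
to_wget_args() {
    printf '%s\n' "$1" | sed 's/\*\*/ -P /g'
}

# Dry run: "echo wget" prints each command instead of executing it.
# -n 3 gives every invocation exactly three words.
to_wget_args "$urls" | xargs -n 3 echo wget

# Real run, with up to 8 downloads in parallel (xargs -P is supported
# by GNU and BSD xargs, but is not part of POSIX):
# to_wget_args "$urls" | xargs -n 3 -P 8 wget
```

The dry run prints one line per entry, starting with "wget http://domain.com/book1 -P Shakespeare", so the generated commands can be checked before anything is downloaded.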
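This should do what you need, though a better approach might be to move the logic upstream, i.e. to wherever you generate $urls. Also, it is not clear to me whether you need to append .zip. If so, you can do that in the sed pattern as well.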
Explanation:

wget can take the -P option to specify the prefix, i.e. the download location.
So if your goal is to run each command like this:

wget http://domain.com/book1 -P Shakespeare

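then I would first use sed to replace each ** with " -P " (note the surrounding spaces), and then pipe into xargs with -n3, because by the time the input reaches xargs you really want to pass three words at a time to wget, and that is exactly what xargs is for.

I have never really used Bash before, so I expect my approach is very clumsy, since it took me a long time to figure out! What got me started was a small automation workflow I had built that handles this. It involves some "shell script" fragments for renaming, moving, and so on, and I thought I might improve it by porting the whole process to a Bash script. I have never really worked with Python either, though I appreciate the push in that direction and will give it a try!

Bang, this is exactly what I wanted, and it works great. (There is no need to append ".zip", as that is handled by a URL redirect.) Thanks a lot!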