Reading a txt file multiple times in bash (threading) - Bash, Curl, Xargs, GNU Parallel


Below is a simple bash script that checks HTTP status codes:

 # -r keeps backslashes literal; IFS= preserves leading and trailing whitespace
 while IFS= read -r url
    do
        urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
        echo "$url  $urlstatus" >> urlstatus.txt
    done < "$1"

#!/bin/bash
while read -r LINE; do
    curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE" & echo
    echo "$LINE"
done


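You are reading the file line by line and passing each line to curl, which fetches the content; only when curl has finished does the loop read the next line. To avoid this, append & echo so that curl runs in the background.
A quick-and-dirty example: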
file="/tmp/url-list.txt"
echo "hello 1" >> "$file"
echo "hello 2" >> "$file"
echo "hello3" >> "$file"
while IFS= read -r line; do
  # sleep + echo run in the background; the second echo runs immediately
  sleep 3 && echo "I run after sleep 3 - $line" & echo "I run at the same time as sleep 3"
done < "$file"
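The loop above starts every job in the background, but it neither waits for the jobs to finish nor limits how many run at once. A minimal sketch of one way to do both in plain shell follows. The names `run_batched`, `max_jobs` and `probe` are hypothetical and not part of the original answer, and the batching is deliberately simple: each batch must finish completely before the next one starts.

```shell
#!/bin/sh
# Sketch: read lines from stdin and run "$2 <line>" in the background,
# at most $1 jobs per batch. run_batched is a hypothetical helper name.
run_batched() {
    max_jobs=$1
    cmd=$2
    running=0
    while IFS= read -r line; do
        "$cmd" "$line" &              # start the job without blocking the loop
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                      # block until the whole batch has finished
            running=0
        fi
    done
    wait                              # collect the jobs of the last, partial batch
}

# Example use with the curl check from the question (probe is hypothetical):
#   probe() { printf '%s  %s\n' "$1" "$(curl -o /dev/null --silent --head \
#             --write-out '%{http_code}' --max-time 5 "$1")"; }
#   run_batched 20 probe < url-list.txt >> urlstatus.txt
```

Because the jobs finish in arbitrary order, the output lines are not guaranteed to follow the order of the input file.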

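You mentioned that you had no luck with GNU parallel. Maybe try it like this:
# %% produces a literal % in awk's printf, so curl still receives %{http_code}
format='curl -o /dev/null --silent --head --write-out "%%{http_code}" "%s"; echo "%s"\n'

awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel

If you need 128 simultaneous processes:
awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel -P128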
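The same fan-out can also be attempted with xargs. What follows is a rough sketch and not part of the original answer: `check_with_xargs` is a hypothetical helper name, and `-P` is an extension available in GNU and BSD xargs rather than a POSIX feature. Each URL is handed to `sh -c` as the positional argument `$1` instead of being pasted into a generated command line, so characters in the URL are not interpreted by the shell.

```shell
#!/bin/sh
# Sketch: check URLs read from stdin with up to $1 curl processes at once
# (default 8). check_with_xargs is a hypothetical helper name.
check_with_xargs() {
    xargs -P "${1:-8}" -I{} sh -c '
        code=$(curl -o /dev/null --silent --head --max-time 5 \
                    --write-out "%{http_code}" "$1")
        printf "%s  %s\n" "$1" "$code"
    ' _ {}
}

# Example use:
#   check_with_xargs 16 < url-list.txt >> urlstatus.txt
```

Each result is printed as one line of "URL, two spaces, status code", matching the format of the script in the question. The lines appear in completion order, not input order.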

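GNU parallel and xargs also process one line at a time (tested).
Can you give an example? If you use -j, you should be able to run several processes at once.
I would write it like this: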
doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
export -f doit
cat "$1" | parallel -j0 -k doit >> urlstatus.txt

Given the input:
Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get the output:
Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

Which looks right:
000 if domain does not exist
301/302 for redirection
200 for success
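To check a larger result file against this reading of the codes, a small summary can help. The sketch below assumes the "URL, two spaces, status code" line format written to urlstatus.txt; `summarize_status` and the label names are hypothetical. The code is taken from the last field because, as the sample shows, a line can contain spaces before it.

```shell
#!/bin/sh
# Sketch: count result lines per category, using the last field as the code.
# summarize_status and the labels are hypothetical names.
summarize_status() {
    awk '
        NF == 0 { next }                       # skip blank lines
        {
            code = $NF
            if (code == "000")      label = "no response"
            else if (code ~ /^3/)   label = "redirect"
            else if (code ~ /^2/)   label = "success"
            else                    label = "other"
            count[label]++
        }
        END { for (l in count) print l, count[l] }
    ' "$1" | sort
}

# Example use:
#   summarize_status urlstatus.txt
```

A large share of "no response" lines on hosts that are known to exist may point to timeouts rather than missing domains, since curl reports 000 whenever it receives no HTTP response.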

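Reading a line from the file takes no time; curl is what takes the time... See whether you can run curl as a background process. That is something I often do with parallel. If you show what you tried with parallel and xargs, maybe someone will spot a small, fixable problem.
sh filelist.txt and -n1 were also tested.
Ah, that explains it. Consider walking through man parallel_tutorial. It will explain this and give you a better understanding of why it works the way it does.
Actually, your code (I tested it) is quite fast, but the status code it gives is not the actual status code but 000. Any suggestions?
You need to give us a few example lines from the input file so we can test and debug. Update your question with this. (And consider leaving -j at 0.)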