Python: get required data from a table based on a key
I have a dataset in a file consisting of three columns (IP address, port, domain name), as shown below:
172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.16 52234 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.59 57106 ssl.gstatic.com
172.56.146.18 58897 ssl.gstatic.com
172.56.146.16 52258 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 55694 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.32 64281 ssl.gstatic.com
172.56.146.39 60581 ssl.gstatic.com
172.56.146.13 57137 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 64763 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 57135 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 51318 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
I also have a set of keys in another file, where each key consists of only an IP address and a port:
172.56.146.15 49333
172.56.146.16 52233
172.56.146.46 56483
172.56.146.14 58928
172.56.146.16 61981
172.56.146.13 64576
172.56.146.14 58157
172.56.146.18 62666
172.56.146.15 55682
172.56.146.14 57054
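Now I want to go through the key set line by line, use each key (IP address and port) as input to my dataset, and get back the domain name stored in the dataset for that key. For example, for 172.56.146.15 49333 I should get the result "domain not found", for 172.56.146.46 56483 I should get ssl.gstatic.com, and so on. Can anyone tell me how to do this with shell commands or a script, so that the output looks like the following (one line per key, in the same order as the key set):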
domain not found
ssl.gstatic.com
r5---sn-uhvcpax0n5-x5ue.googlevideo.com
With GNU bash:
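#!/bin/bash
# For each key (IP and port), print the matching dataset line, if any.
while read -r ip port _; do
  # -F matches literally, so dots are not wildcards; the trailing space
  # keeps a port such as 5223 from also matching 52234.
  grep -F -- "$ip $port " dataset || echo "$ip $port domain not found"
done < keys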
Output:
172.56.146.15 49333 domain not found
172.56.146.16 52233 domain not found
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 58928 domain not found
172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
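One caveat with substring-style matching in general: a key can match a line whose port merely starts with the same digits. The Python snippet below is only an illustration of that pitfall; the function names and the sample key are made up for this example and do not come from the key file above.

```python
def substring_match(key, line):
    """Mimic an unanchored fixed-string search: True if key occurs anywhere."""
    return key in line


def exact_match(key, line):
    """True only if the line's first two fields equal the key's two fields."""
    return line.split()[:2] == key.split()[:2]


line = "172.56.146.16 52234 r5---sn-uhvcpax0n5-x5ue.googlevideo.com"
key = "172.56.146.16 5223"  # hypothetical key: its port is a prefix of 52234

print(substring_match(key, line))  # True: a false positive
print(exact_match(key, line))      # False
```

Comparing whole fields, rather than searching for the key as a substring, avoids this kind of false positive.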
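Two solutions; both read the data file into an array and then look up the array value for each line of the key file. SO.sh is the name of the script file, data is the data file, and keys is the file with the keys.
#!/usr/bin/awk -f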
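# First file (data): store the domain under the key (IP, port)
NR == FNR {
    datafile[$1, $2] = $3
    next
}
# Second file (keys): look up the domain for each key.
# Testing with "in" avoids creating empty entries for missing keys.
{
    if (($1, $2) in datafile)
        print datafile[$1, $2]
    else
        print "domain not found"
}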
Assuming the awk script is stored in SO.awk, the two are invoked as follows:
./SO.sh data keys
./SO.awk data keys
For large files, the awk solution will be orders of magnitude faster.
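Since the question title mentions Python, the same idea (read the data into a table once, then look up each key) could be sketched in Python roughly as follows. It is faster than rescanning the data file for every key because each lookup is a single hash-table access. This is only a sketch: the function names load_table and lookup_keys are made up for illustration, and it assumes whitespace-separated columns as in the sample data.

```python
def load_table(lines):
    """Build a dict mapping (ip, port) -> domain from three-column lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            table[(fields[0], fields[1])] = fields[2]
    return table


def lookup_keys(table, key_lines, missing="domain not found"):
    """Return one result per key line, in the same order as the keys."""
    results = []
    for line in key_lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank lines in the key file
            continue
        results.append(table.get((fields[0], fields[1]), missing))
    return results


# Small demo using a few lines taken from the question.
data = [
    "172.56.146.46 56483 ssl.gstatic.com",
    "172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com",
]
keys = ["172.56.146.15 49333", "172.56.146.46 56483", "172.56.146.16 61981"]

for domain in lookup_keys(load_table(data), keys):
    print(domain)
# domain not found
# ssl.gstatic.com
# r5---sn-uhvcpax0n5-x5ue.googlevideo.com
```

To run it on the real files, open file objects can be passed in place of the lists, since both functions only iterate over lines.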
Use this:
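#!/bin/bash
# Usage: pass the key file as the first argument; keys are looked up in table.txt.
# bash is required here: [[ ]] and <<< are not available in plain sh.
while IFS='' read -r line || [[ -n "$line" ]]; do
  # -F: literal match; -m 1: first hit only; trailing space ends the port field.
  if match=$(grep -F -m 1 -s -- "$line " table.txt); then
    read -r _ _ domain <<< "$match"   # third column is the domain
    echo "$domain"
  else
    echo "domain not found"
  fi
done < "$1"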
Result:
domain not found
domain not found
ssl.gstatic.com
domain not found
r5---sn-uhvcpax0n5-x5ue.googlevideo.com
r2---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
ssl.gstatic.com
r4---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
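You should provide some starter code; you can't expect us to do this for you…
So what should the output look like? The domain appended to the key? Just the domain name? Does the order of the lines in the output matter?
Just answered above. The output should be a single column containing the domain names, as shown above.
@Seekheart: Even some help with the direction would be good. For example, I just wanted to see how to generate a unique key (as a function of IP/port number) using Python. I know there may be multiple ways, but any simple/fast solution would be highly appreciated.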
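On the Python point raised in the last comment: a tuple of (IP, port) is hashable and can be used directly as a dictionary key, so no special key-generating function is strictly needed. A minimal sketch, where make_key is a hypothetical helper name and converting the port to int is an assumption about how the port should be normalized:

```python
def make_key(ip, port):
    """Return a hashable key for an (IP, port) pair.

    The port is converted to int so that "56483" and 56483 give the same key;
    drop the conversion if ports should be compared as raw strings.
    """
    return (ip.strip(), int(port))


domains = {}
domains[make_key("172.56.146.46", "56483")] = "ssl.gstatic.com"

print(domains.get(make_key("172.56.146.46", 56483), "domain not found"))
print(domains.get(make_key("172.56.146.15", "49333"), "domain not found"))
# ssl.gstatic.com
# domain not found
```

A tuple is preferable to gluing the two strings together without a separator, because 172.56.146.1 with port 65981 and 172.56.146.16 with port 5981 would then produce the same string.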