Awk: selecting data rows
I have to process the following data file with awk:
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
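I need to know the total number of visits and signups for each year. My problem is that I can't find a way to select only the first four rows and the last four rows. Can anyone give me some hints on how to do this?
Example output (visits only):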
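You can match the "VISITS" and "SIGNUPS" lines and set a variable that indicates which type of record is currently being processed.
For example: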
BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2 ; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}
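The script above prints one line per value rather than totals. As a minimal sketch of how the same mode variable could drive the per-year sums the question asks for, the version below accumulates into a single array keyed by section and column. The file name `data.txt` and the names `total`, `ncol`, and `modes` are assumptions made for illustration, not part of the original answer.

```shell
# Recreate the sample input from the question (file name data.txt is an assumption).
cat > data.txt <<'EOF'
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
EOF

totals=$(awk -F: '
# Header row: remember the year label of each column.
$1 == "YEARS" { for (i = 2; i <= NF; i++) year[i] = $i; ncol = NF; next }
# Section marker: remember which kind of record follows.
$1 == "VISITS" || $1 == "SIGNUPS" { mode = $1; next }
# Data row: add each value to the total for (section, column).
{ for (i = 2; i <= NF; i++) total[mode, i] += $i }
END {
    # Loop over column indexes instead of "for (k in total)" so the
    # years come out in the order of the header row.
    n = split("VISITS SIGNUPS", modes, " ")
    for (m = 1; m <= n; m++)
        for (i = 2; i <= ncol; i++)
            print modes[m], year[i], total[modes[m], i]
}' data.txt)
printf '%s\n' "$totals"
```

Each output line has the form `section year total`, for example `VISITS 1995 47197`. Keying on the column index rather than the year value is a design choice: it keeps the output in header order without relying on array traversal order, which awk leaves unspecified.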
Instead of:
split( x, d ); split( x, ym )
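When you say "select only the first four rows and the last four rows", I assume you mean processing the visits and the signups separately: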
awk -F: '
$1 == "YEARS"   {for (i=2; i<=NF; i++) {yr[i] = $i}; next}
$1 == "VISITS"  {visits = 1; signups = 0; next}
$1 == "SIGNUPS" {visits = 0; signups = 1; next}
visits {
    for (i=2; i<=NF; i++) {
        v_d[$1] += $i     # visits by domain
        v_y[yr[i]] += $i  # visits by year
    }
}
signups {
    for (i=2; i<=NF; i++) {
        s_d[$1] += $i     # signups by domain
        s_y[yr[i]] += $i  # signups by year
    }
}
END {
    OFS=FS
    print "VISITS"
    for (d in v_d) print d, v_d[d]
    for (y in v_y) print y, v_y[y]
    print "SIGNUPS"
    for (d in s_d) print d, s_d[d]
    for (y in s_y) print y, s_y[y]
}'
Can you post an example of the expected output for the input you provided?
VISITS
Domain1:73834
Domain2:413822
Domain3:1205541
Domain4:249758
1999:99842
2000:292882
1995:47197
1996:218694
1997:1161273
1998:123067
SIGNUPS
Domain1:6847
Domain2:90384
Domain3:516885
Domain4:68384
1999:27795
2000:10289
1995:15277
1996:92201
1997:482988
1998:53950
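Taken literally, the question asks how to select the first four and the last four data rows. If the file always has exactly the layout shown (header on line 1, VISITS rows on lines 3 to 6, SIGNUPS rows on lines 8 to 11), the rows can also be picked by record number with `NR`. The following is only a sketch under that fixed-layout assumption; the file name `data.txt` and the names `sum` and `ncol` are illustrative.

```shell
# Recreate the sample input from the question (file name data.txt is an assumption).
cat > data.txt <<'EOF'
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
EOF

visits=$(awk -F: '
# Line 1 holds the year labels.
NR == 1 { for (i = 2; i <= NF; i++) year[i] = $i; ncol = NF; next }
# Lines 3-6 are the four VISITS rows (assumed fixed layout).
NR >= 3 && NR <= 6 { for (i = 2; i <= NF; i++) sum[i] += $i }
# Print one "year:total" line per column, in header order.
END { for (i = 2; i <= ncol; i++) print year[i] ":" sum[i] }
' data.txt)
printf '%s\n' "$visits"
```

Changing the range to `NR >= 8 && NR <= 11` would give the signups totals instead. This is more brittle than matching the `VISITS` and `SIGNUPS` marker lines, since adding or removing a domain shifts the line numbers.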