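So, I'm trying to write an awk script that finds the three IPs with the most hits, listed in descending order of hit count. I'm working from an Apache web log that looks like this: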
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 ...
To do this, I use the following:
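$1 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ {
    hitCounter[$1]++
    notIndexed=1
    for(i in ips) {
        if (i==$1) { notIndexed=0 }
    }
    if(notIndexed==1) {
        ips[indexx]=$1
        indexx++
    }
}
This rule matches a line whose first field is an IP, then increments that IP's hit count in the "hitCounter" array, which is indexed by IP. Next, I check the list of IPs, "ips", to see whether the IP that was just hit is already in it. If not, I add the IP to the "ips" array and increment the index counter. In theory, each index in "ips" should then correspond to an entry in "hitCounter". Finally, I have...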
END {
    indexxx=0
    for (i in hitCounter) {
        if (i>hitCounter[firstIP])
            firstIP=ips[indexxx]
        else if (i>hitCounter[secondIP])
            secondIP=ips[indexxx]
        else
            thirdIP=ips[indexxx]
        indexxx++
    }
}
This is where I go through the hit counts in "hitCounter" and compare them with the hit counts of the three top-hit variables; if an IP's count is greater than one of them, that variable is set to the current IP.
This seems workable to me, and I should get "192.168.72.177 192.168.198.92" as output, but what I get instead is "192.168.198.92 192.168.198.92".
Why?
Edit: Sorry, this is how I print the final result; it goes right after the "hitCounter" for-in loop...
print "The most hits were from "firstIP" "secondIP" "thirdIP
Posted on 2012-05-06 01:07:34
Rather than searching the list of IP addresses every time to see whether the IP is already there, I would do this:
$1 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ {
    hitCounter[$1]++
}
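END {
    for (ip in hitCounter) {
        # ip is the key (the address); hitCounter[ip] is its hit count
        if (hitCounter[ip] > hitCounter[firstIP]) {
            # new leader: shift the old first and second down one place
            thirdIP = secondIP
            secondIP = firstIP
            firstIP = ip
        } else if (hitCounter[ip] > hitCounter[secondIP]) {
            thirdIP = secondIP
            secondIP = ip
        } else if (hitCounter[ip] > hitCounter[thirdIP]) {
            thirdIP = ip
        }
    }
}
I think part of your confusion comes from treating i in for (i in hitCounter) as the value rather than the key.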
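To make the key-versus-value point concrete, here is a minimal sketch of the whole script wrapped in a shell function. The function name top3_hits and the firstHits/secondHits/thirdHits variables are my own additions for illustration, not part of the question; keeping the counts in separate variables avoids looking up hitCounter with an empty key before the first IP is seen. I also used a simpler anchored pattern with + instead of {1,3}, on the assumption that your awk may not support interval expressions.

```sh
# top3_hits: hypothetical helper; reads an access log (files or stdin) and
# prints the three client IPs with the most hits, busiest first.
top3_hits() {
    awk '
    # Count one hit per line whose first field looks like a dotted quad.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { hitCounter[$1]++ }

    END {
        for (ip in hitCounter) {      # ip is the key (the address)
            n = hitCounter[ip]        # n is the value (the hit count)
            if (n > firstHits) {
                thirdIP = secondIP; thirdHits = secondHits
                secondIP = firstIP; secondHits = firstHits
                firstIP = ip;       firstHits = n
            } else if (n > secondHits) {
                thirdIP = secondIP; thirdHits = secondHits
                secondIP = ip;      secondHits = n
            } else if (n > thirdHits) {
                thirdIP = ip;       thirdHits = n
            }
        }
        print "The most hits were from " firstIP " " secondIP " " thirdIP
    }' "$@"
}
```

You would call it as top3_hits access.log, or pipe the log into it. If two IPs have the same count, which one ranks higher depends on the order in which awk walks the array, and that order is not specified.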
https://stackoverflow.com/questions/10463499