Analyzing the nginx access.log with Python
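nginx has two main logging directives: log_format, which sets the log format, and access_log, which specifies the log file's path, format, and buffer size. See ngx_http_log_module for details. Logging is usually configured in the nginx configuration file (/usr/local/nginx/conf/nginx.conf).
A line in the nginx log looks like this: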
42.57.99.126 - - [02/Oct/2018:20:40:22 +0800] "GET /favicon.ico HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Linux; U; Android 8.0.0; zh-cn; MI 6 Build/OPR1.170623.027) AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 Chrome/37.0.0.0 MQQBrowser/7.8 Mobile Safari/537.36"
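log_format accepts many variables that describe server activity. When access_log does not name a format, nginx uses the predefined combined format:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';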
To record more detail, define a custom log_format. The variables that can be used, along with their descriptions, are listed in the ngx_http_log_module documentation.
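As a rough illustration of how the fields of a log line map to nginx variables, here is a minimal Python sketch that parses one line with a regular expression. It assumes the log uses the predefined combined format; the names LOG_PATTERN and parse_line are made up for this example, and the user agent in the sample is shortened.

```python
import re

# Regex for the predefined "combined" format; group names mirror nginx variables.
# Assumption: the log was written with that format and nothing was customized.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be summed or compared later.
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


# Sample line based on the one above, with a shortened user agent.
sample = ('42.57.99.126 - - [02/Oct/2018:20:40:22 +0800] '
          '"GET /favicon.ico HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Linux)"')
record = parse_line(sample)
print(record["remote_addr"], record["status"], record["body_bytes_sent"])
```

Parsing by pattern rather than by whitespace position is more tolerant of requests that contain unusual spacing, at the cost of having to update the pattern whenever log_format changes.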
Requirement: find the 10 IPs with the most requests in the nginx access.log
1. awk implementation
awk '{a[$1]++} END{for(i in a) print a[i], i}' access.log | sort -nr | head -n 10
2. Python script
#!/usr/bin/python
# coding=utf8
log_file = "data/access.log"
ip = {}  # key: IP address, value: number of requests from that IP
with open(log_file) as f:
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        ip_attr = fields[0]
        # add 1 to this IP's count, starting from 0 the first time it is seen
        ip[ip_attr] = ip.get(ip_attr, 0) + 1
# sort by request count, highest first, and keep the 10 busiest IPs
s = sorted(ip.items(), key=lambda x: x[1], reverse=True)
print(s[:10])
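The same count can be written more compactly with collections.Counter from the standard library. The following is only a sketch: the function name top_ips and the demo lines are invented for illustration, and it assumes the client IP is the first whitespace-separated field, as in the log line shown earlier.

```python
from collections import Counter


def top_ips(lines, n=10):
    """Return the n most frequent first fields (client IPs) as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Made-up log lines, used only to show the call.
demo = [
    '10.0.0.1 - - [02/Oct/2018:20:40:22 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.2 - - [02/Oct/2018:20:40:23 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.1 - - [02/Oct/2018:20:40:24 +0800] "GET /a HTTP/1.1" 404 564 "-" "curl"',
]
print(top_ips(demo, n=2))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `with open("data/access.log") as f: print(top_ips(f))`, which avoids loading the whole log into memory.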
3. Traffic statistics
#!/usr/bin/python
# coding=utf8
log_file = "/usr/local/nginx/logs/access.log"
ip = {}    # key: IP address, value: number of requests from that IP
flow = {}  # key: IP address, value: total bytes sent to that IP
total = 0  # total bytes over all requests (avoids shadowing the built-in sum)
with open(log_file) as f:
    for line in f:
        fields = line.split()
        if len(fields) < 10:  # skip blank or truncated lines
            continue
        ip_attr = fields[0]
        # the 10th field is the response size in bytes
        size = int(fields[9])
        total += size
        # for a repeated IP, add 1 to its count and add the bytes to its traffic
        ip[ip_attr] = ip.get(ip_attr, 0) + 1
        flow[ip_attr] = flow.get(ip_attr, 0) + size
print(ip)
print(flow)
print(total / 1024 / 1024)  # total traffic in MiB
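The script above stops with a ValueError as soon as the tenth field of some line is not a number, which can happen with malformed requests. A more tolerant variant might look like the sketch below. The function name traffic_by_ip and the demo lines are hypothetical, and it assumes the response size is the tenth whitespace-separated field.

```python
from collections import defaultdict


def traffic_by_ip(lines):
    """Return (bytes_per_ip, total_bytes), skipping lines without a numeric size."""
    bytes_per_ip = defaultdict(int)
    total = 0
    for line in lines:
        fields = line.split()
        # fields[9] is assumed to be the response size; skip lines where it is not.
        if len(fields) < 10 or not fields[9].isdecimal():
            continue
        size = int(fields[9])
        bytes_per_ip[fields[0]] += size
        total += size
    return dict(bytes_per_ip), total


# Made-up log lines; the last one is a malformed request whose fields are shifted.
demo = [
    '10.0.0.1 - - [02/Oct/2018:20:40:22 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.2 - - [02/Oct/2018:20:40:23 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.1 - - [02/Oct/2018:20:40:24 +0800] "GET /a HTTP/1.1" 404 564 "-" "curl"',
    '10.0.0.3 - - [02/Oct/2018:20:40:25 +0800] "-" 400 0 "-" "-"',
]
per_ip, total_bytes = traffic_by_ip(demo)
# List IPs from the heaviest to the lightest consumer.
for address, size in sorted(per_ip.items(), key=lambda item: item[1], reverse=True):
    print(address, size)
print(total_bytes / 1024 / 1024, "MiB")
```

Skipped lines are silently dropped here; counting them in a separate variable would make it easier to notice when the log format does not match the assumption.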
Count requests per IP in the log
cat access.log | awk '{ips[$1]+=1} END{for(ip in ips) print ip, ips[ip]}'
Count requests per IP between 3 and 6 o'clock (the pattern matches hours 03 through 06 in 2021)
grep "2021:0[3-6]" img.log | awk '{ips[$1]+=1} END{for(ip in ips) print ips[ip], ip}' | sort -nr
Count requests per IP between 3 and 6 o'clock, keeping only IPs with at least 200 requests
grep '2021:0[3-6]' banma_access.log | awk '{ips[$1]+=1} END{for(ip in ips) if(ips[ip]>=200) print ips[ip], ip}' | sort -nr
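For comparison, the time-window filter can also be sketched in Python. The names HOUR_PATTERN and busy_ips and the demo data are invented for this example. Unlike the grep pattern, this version does not restrict the year, and it assumes timestamps look like [02/Oct/2018:20:40:22 +0800].

```python
import re
from collections import Counter

# Captures the hour from a timestamp such as [02/Oct/2018:20:40:22 +0800].
HOUR_PATTERN = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} ")


def busy_ips(lines, first_hour=3, last_hour=6, min_hits=200):
    """Return (ip, hits) pairs for requests whose hour lies in the given range."""
    counts = Counter()
    for line in lines:
        match = HOUR_PATTERN.search(line)
        if match is None:
            continue  # no recognizable timestamp on this line
        if first_hour <= int(match.group(1)) <= last_hour:
            counts[line.split()[0]] += 1
    # most_common() sorts by count, highest first, like sort -nr.
    return [(ip, hits) for ip, hits in counts.most_common() if hits >= min_hits]


# Made-up log lines: three requests in the window from one IP, one from another,
# and one request outside the window.
demo = [
    '10.0.0.1 - - [01/Jan/2021:03:15:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.1 - - [01/Jan/2021:04:20:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.1 - - [01/Jan/2021:06:59:59 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.2 - - [01/Jan/2021:05:00:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.1 - - [01/Jan/2021:07:00:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
]
print(busy_ips(demo, min_hits=2))
```

Comparing the hour as a number makes the window easy to change, for example to a range that crosses two-digit hours, which the character class 0[3-6] cannot express.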