
Parsing Nginx Access Log Using Command Line

Written by Santosh Prasad

My manager asked me to check some HTTP response codes to investigate the status of our website, so I needed to look through the Nginx access log. After spending some time searching the internet, I found a few commands for parsing the access log.
In nginx version nginx/1.10.1, the default log format is named “main”.

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"';

Unfortunately, the default log format was not sufficient for me. I had configured Nginx to work as a reverse proxy to manage web traffic load, so I defined a custom log format named “upstream”.

log_format upstream '$http_x_forwarded_for - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$remote_addr" '
                    'rt=$request_time ut=$upstream_response_time '
                    '[for $host via $upstream_addr] "$gzip_ratio"';

Here:

  • $http_x_forwarded_for : Logs the real client IP (taken from the X-Forwarded-For header) instead of the IP of our Nginx proxy.
  • $remote_user : The user name supplied with HTTP Basic authentication. Most modern apps do not use it, so it will be blank for most apps.
  • [$time_local] : The request timestamp in the server's time zone.
  • “$request” : The full request line: the method (GET, POST, PUT, etc.), the URI with its arguments, and the HTTP version.
  • ‘$status $body_bytes_sent’ : The response code generated by the server, followed by the number of body bytes sent to the client.
  • “$http_referer” : The referring URL.
  • rt=$request_time : The request processing time, in seconds.
  • ut=$upstream_response_time : The upstream response time, in seconds.
  • $host : The host name.
  • $upstream_addr : The address of the upstream server that handled the request.
  • “$gzip_ratio” : The compression ratio achieved.
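To see how these fields line up once a log line is split on whitespace, here is a small sketch. The log line is made up for illustration (the IPs, URL and timings are all hypothetical); it only follows the layout of the “upstream” format described above.

```shell
# Hypothetical log line in the "upstream" format (every value is invented).
line='203.0.113.7 - - [12/Mar/2017:10:15:32 +0000] "GET /sample-page/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0" "10.0.0.5" rt=0.120 ut=0.118 [for example.com via 10.0.0.9:8080] "2.35"'

# awk splits on whitespace by default, so for this layout:
#   $1 = client IP, $7 = request path, $9 = status code
summary=$(printf '%s\n' "$line" | awk '{print $1, $7, $9}')
echo "$summary"

# The user agent usually contains spaces, so the position of rt= varies.
# Locate it by pattern instead of by field number.
rt=$(printf '%s\n' "$line" |
  awk 'match($0, /rt=[0-9.]+ ut=/) { print substr($0, RSTART + 3, RLENGTH - 7) }')
echo "$rt"
```

This is why the commands later in the article can rely on $7 and $9: everything before the status code has a fixed number of space-separated pieces, while the fields after the user agent do not.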

Add the “upstream” parameter at the end of the access_log line to tell Nginx to use the custom log format, as shown below:

access_log  /var/log/nginx/vhost/looklinix.com_access.log upstream;

Let's go ahead and parse the Nginx access log from the command line.

List All HTTP Response Status Codes

# cd /var/log/nginx/vhost/
# cat looklinix.com_access.log | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort

You will get output like the following:

    114 500
  12277 304
    123 416
      1 504
   1614 302
 185148 200
   1892 301
   2188 404
   2508 499
     27 403
     32 401
      3 400
    366 206
    456 502
    477 444
     79 405

Using the awk command:

 # awk '{print $9}' looklinix.com_access.log | sort | uniq -c | sort

You will get output like the following:

    114 500
  12277 304
    123 416
      1 504
   1614 302
 185148 200
   1892 301
   2188 404
   2508 499
     27 403
     32 401
      3 400
    366 206
    456 502
    477 444
     79 405

You can see that more than 2,000 requests returned a 404 (Not Found) response code.
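When the list of individual codes gets long, it can help to group them by class (2xx, 3xx, 4xx, 5xx). The following is only a sketch: sample_line is a hypothetical helper that writes made-up log lines in the “upstream” layout so the example runs on its own, and on a real server you would point awk at your own access log instead.

```shell
# Hypothetical helper: prints one made-up log line in the "upstream" format.
sample_line() {
  printf '%s - - [12/Mar/2017:10:15:32 +0000] "GET %s HTTP/1.1" %s 512 "-" "curl/7.50" "10.0.0.5" rt=0.010 ut=0.009 [for example.com via 10.0.0.9:8080] "-"\n' "$1" "$2" "$3"
}

# Small throwaway log: three 200s, two 404s and one 502.
log=$(mktemp)
{
  sample_line 203.0.113.7 / 200
  sample_line 203.0.113.7 /sample-page/ 200
  sample_line 198.51.100.4 / 200
  sample_line 198.51.100.4 /sfn.php 404
  sample_line 192.0.2.10 /license.php 404
  sample_line 192.0.2.10 / 502
} > "$log"

# Count responses per status class in a single awk pass, then sort
# numerically so the largest count comes first.
classes=$(awk '$9 ~ /^[1-5][0-9][0-9]$/ { count[substr($9, 1, 1) "xx"]++ }
               END { for (c in count) print count[c], c }' "$log" | sort -rn)
echo "$classes"
rm -f "$log"
```

The check on $9 skips any line where the ninth field is not a three-digit status code, for example a malformed request line.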

List All Broken (404) Requests

Now let's find out which requests are broken and returning 404. The command below lists every requested page that returned 404.

 # awk '$9 == 404 {print $7}' looklinix.com_access.log | sort | uniq -c | sort -rn

You will get output like the following:

      9 /wp-content/uploads/sfn.php
      9 /wp-content/plugins/woocommerce-product-options/includes/image-upload.php
      9 /wp-content/plugins/revslider/temp/update_extract/sfn.php
      9 /wp-content/plugins/revslider/temp/update_extract/revslider/db.php
      9 /wp-content/plugins/revslider/sfn.php
      9 /wp-content/plugins/Login-wall-etgFB/login_wall.php?login=cmd&z3=c2ZuLnBocA%3D%3D&z4=L3dwLWNvbnRlbnQvcGx1Z2lucy8%3d
      9 /wp-content/plugins/jquery-html5-file-upload/jquery-html5-file-upload.php
      9 /wp-content/plugins/formcraft/file-upload/server/php/upload.php
      9 /tiny_mce/plugins/tinybrowser/upload_file.php?folder=/&type=file&feid=&obfuscate=&sessidpass=
      9 /sfn.php
      9 /license.php
      8 /wp-content/uploads/2017/01/mail-client.jpg&_nc_hash=AQCLYYiKAUaGBX1g
      8 /wp-content/plugins/wptf-image-gallery/lib-mbox/ajax_load.php?url=/etc/passwd
      8 /wp-content/plugins/wp-symposium/server/php/index.php
      8 /wp-content/plugins/wp-mobile-detector/cache/db.php
      8 /wp-content/plugins/wp-ecommerce-shop-styling/includes/download.php?filename=../../../../../../../../../etc/passwd
      8 /wp-content/plugins/./simple-image-manipulator/controller/download.php?filepath=/etc/passwd
      8 /wp-content/plugins/recent-backups/download-file.php?file_link=/etc/passwd
      8 /wp-content/plugins/front-end-upload/destination.php
      8 /wp-content/plugins/candidate-application-form/downloadpdffile.php?fileName=../../../../../../../../../../etc/passwd
      8 /wp-content/cache/autoptimize/css/autoptimize_e039f9699b9008b4a87e6e80c5bf48b5.css
      7 /questions/question/php-fpm-and-nginx-502-bad-gateway/
      7 /?p=1602
      7 /author/santosh-prasad/page/4/
      6 /wp-content/uploads/2017/02/web-based-monitoring-tool.jpg&_nc_hash=AQC7GIulBxipvMss
      6 /wp-content/themes/infocus/lib/scripts/dl-skin.php
      6 /wp-content/plugins/simple-ads-manager/js/slider/tmpl.js
      6 /pagead/gen_204?id=
      6 /?p=1894
      6 /?p=1756
      6 /?p=1532
      6 /mdocs-posts/?mdocs-img-preview=../../../wp-config.php
      6 /easy-steps-to-upgrade-php-5-3-to-php-5-6-on-centos-6-x-and-rhel-6-x/http:%5C/%5C/www.looklinux.com%5C/wp-login.php?action=lostpassword
      6 /category/uncategorized/
      5 /wp-content/plugins/google-mp3-audio-player/direct_download.php?file=../../../wp-config.php
      5 /wp-content/plugins/db-backup/download.php?file=../../../wp-config.php
      5 /wp-content/cache/autoptimize/css/autoptimize_b2fab305691e9655969910635d0b8352.css
      5 /sample-page/
      5 /?p=1653
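Many of the 404s above are the same script probed with different query strings, so it can be useful to count them by path only. Here is a minimal sketch of that idea. The sample_line helper and the URLs it writes are hypothetical and exist only to make the example self-contained.

```shell
# Hypothetical helper: prints one made-up log line in the "upstream" format.
sample_line() {
  printf '%s - - [12/Mar/2017:10:15:32 +0000] "GET %s HTTP/1.1" %s 512 "-" "curl/7.50" "10.0.0.5" rt=0.010 ut=0.009 [for example.com via 10.0.0.9:8080] "-"\n' "$1" "$2" "$3"
}

log=$(mktemp)
{
  sample_line 203.0.113.7 '/download.php?file=/etc/passwd' 404
  sample_line 203.0.113.7 '/download.php?file=../wp-config.php' 404
  sample_line 198.51.100.4 /sfn.php 404
  sample_line 198.51.100.4 /sample-page/ 200
} > "$log"

# For 404 responses only, drop everything from "?" onward and count by path.
paths=$(awk '$9 == 404 { path = $7; sub(/\?.*/, "", path); count[path]++ }
             END { for (p in count) print count[p], p }' "$log" | sort -rn)
echo "$paths"
rm -f "$log"
```

Here both download.php requests are counted together, which makes a single heavily probed script stand out.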

List All 301 Permanently Moved URLs

We can also list the top ten 301 (Moved Permanently) URLs using the command below.

# awk '$9 == 301 {print $7}' looklinix.com_access.log | sort | uniq -c | sort -rn | head

You will get output like the following:

      5 /looklinux/preview.php?title=vo7fw
      8 /how-to-run-process-or-program-on-specific-cpu-cores-in-linux
      8 /how-to-access-linux-terminal-using-chrome-web-browser
      8 /easy-steps-to-clone-your-hard-drive-using-dd
      8 /best-5-linux-open-source-text-editors
      8 /awstats/awstats.pl?config=looklinux.com
      7 /top-5-web-based-linux-monitoring-tools
      1 /basic-mysql-commands-database-administrator
      1 /awstats-log-analyzer-installation-and-configuration-on-centos-fedora-and-rhel-system/
      1 /awstats-log-analyzer-installation-and-configuration-on-centos-fedora-and-rhel-system
      1 /author/santosh-prasad/page/2/?ap_ajax_action=search_mentions&%23038;action=ap_ajax
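The 404 and 301 commands differ only in the status code, so they can be wrapped in one small shell function. This is just a sketch: top_urls and sample_line are names I made up for illustration, and the log lines are invented.

```shell
# top_urls STATUS LOGFILE [N]
# Hypothetical helper: prints the N most requested URLs (default 10)
# whose response status equals STATUS.
top_urls() {
  awk -v status="$1" '$9 == status { print $7 }' "$2" |
    sort | uniq -c | sort -rn | head -n "${3:-10}"
}

# Hypothetical helper: prints one made-up log line in the "upstream" format.
sample_line() {
  printf '%s - - [12/Mar/2017:10:15:32 +0000] "GET %s HTTP/1.1" %s 512 "-" "curl/7.50" "10.0.0.5" rt=0.010 ut=0.009 [for example.com via 10.0.0.9:8080] "-"\n' "$1" "$2" "$3"
}

log=$(mktemp)
{
  sample_line 203.0.113.7 /old-page 301
  sample_line 203.0.113.7 /old-page 301
  sample_line 198.51.100.4 /old-page 301
  sample_line 198.51.100.4 /old-post 301
  sample_line 192.0.2.10 /old-post 301
  sample_line 192.0.2.10 /old-feed 301
  sample_line 192.0.2.10 /sfn.php 404
} > "$log"

# Top two redirected URLs in the sample log.
result=$(top_urls 301 "$log" 2)
echo "$result"
rm -f "$log"
```

On a real server the call would look like top_urls 404 looklinix.com_access.log 20.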

We can also check which source IPs the requests for a broken URL are coming from. The example below uses /survey/report/na as the URL.

# awk -F\" '($2 ~ "/survey/report/na"){print $1}' looklinix.com_access.log | awk '{print $1}' | sort | uniq -c | sort -r

You will get output like the following:

3 197.210.28.52,107.167.112.38
2 63.139.29.90
2 175.139.178.106
1 99.95.1.42
1 87.112.31.172
1 86.7.38.59
1 86.163.13.111
1 86.156.51.243
1 86.153.19.27
1 86.149.8.56
1 86.139.199.120
1 86.130.71.225
1 86.129.148.242
1 85.255.235.145
1 85.211.50.44
1 84.51.152.254
1 82.40.13.61
1 82.21.137.68
1 82.15.34.126
1 81.157.121.242
1 81.155.254.63
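The first line of the output above holds two comma-separated addresses in one field. That happens when the X-Forwarded-For header carries a chain of addresses, where the leftmost one is normally the original client. The sketch below counts 404 requests by that first address only. The sample_line helper and all addresses are hypothetical.

```shell
# Hypothetical helper: prints one made-up log line in the "upstream" format.
sample_line() {
  printf '%s - - [12/Mar/2017:10:15:32 +0000] "GET %s HTTP/1.1" %s 512 "-" "curl/7.50" "10.0.0.5" rt=0.010 ut=0.009 [for example.com via 10.0.0.9:8080] "-"\n' "$1" "$2" "$3"
}

log=$(mktemp)
{
  sample_line 203.0.113.7,198.51.100.4 /survey/report/na 404
  sample_line 203.0.113.7,198.51.100.4 /survey/report/na 404
  sample_line 203.0.113.7 /survey/report/na 404
  sample_line 192.0.2.10 /survey/report/na 404
  sample_line 198.51.100.99 / 200
} > "$log"

# Keep only the first address of the forwarded chain, then count the
# 404 responses per client.
clients=$(awk '$9 == 404 { split($1, chain, ","); count[chain[1]]++ }
               END { for (ip in count) print count[ip], ip }' "$log" | sort -rn)
echo "$clients"
rm -f "$log"
```

One caveat: if a proxy in front of Nginx writes the chain with a space after each comma, the whitespace split shifts every later field, so $9 would no longer be the status code for those lines.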

I hope this article helps you parse your Nginx logs. If you have any questions or problems, please leave a comment in the comment section.

Thanks:)

