Bash pipe fun

How about “recursively look at a log of hostnames used to request my site content. Sort them and ensure that only unique ip address and hostname combinations are counted. Find how many use my ‘.biz’ hostname to land on my site”:

find . -iname '*ecommerce-host_log*' | nice cat | nice xargs cut --delimiter=' ' -f 1,4 | nice sort | nice uniq | nice grep \.biz | nice wc -l

I wasn’t sure which commands would be most processor-intensive, so I used “nice” liberally.


Apache custom logging

Aren’t you interested in seeing what requests users, bots, or script kiddies make of your site, especially those things that client-side JavaScript-based analytics packages don’t tell you?

Under Apache, custom logging can give you lots of information you might not have seen otherwise. I’ll let the documentation for Apache’s mod_log_config say most of this, but as a quick preview, you could try defining a custom log format up near the top of your httpd.conf with

LogFormat "%a %t %{Host}i \"%r\"" hostlog

for example, then in all of your Directory containers, you could do

CustomLog logs/forest-monsen-site-host-log hostlog

Then, in my case, /var/log/httpd/forest-monsen-site-host-log would contain lines like
192.168.0.3 [31/Aug/2010:08:53:24 -0500] www.forestmonsen.com "GET /aggregator/sources/2 HTTP/1.0"
192.168.0.5 [31/Aug/2010:08:53:24 -0500] www.forestmonsen.org "GET /images/house.gif HTTP/1.1"

And I’d be able to tell which hostname was originally requested by the user — before any of my mod_rewrite rules got to it. Good stuff.