Which directories were archived?

I archived a directory. It took two hours, then tar exited with a non-zero status (which signals an error). I was only testing something, though, and all I cared about was whether certain specific subdirectories had made it into the archive. So I needed a way to look deep inside, quickly, and find those particular directories.

GNU tar will let you “test” an archive with -t, but I only wanted a list of the directories archived. Then I wanted that sorted. So…

$ nice tar -tjvf data.tar.bz2 | tr -s ' ' | cut -d' ' -f 6- | cut -d / -f -2 > tardirs.txt
$ sort tardirs.txt > tardirs_sorted.txt
$ uniq tardirs_sorted.txt > tardirs_uniq_sorted.txt

The -tjvf arguments to tar list the archive’s contents verbosely, the “tr” command collapses adjacent spaces so that the first “cut” command will output the sixth field onward (the file path), and the second “cut” command will reduce a path like “folder/folder/folder/fun.txt” to “folder/folder.” Note that “uniq” only removes adjacent duplicate lines, which is why “sort” has to run first.
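
If you’d rather skip the intermediate files, sort -u collapses the sort and uniq steps into a single pass over the same pipeline:

$ nice tar -tjvf data.tar.bz2 | tr -s ' ' | cut -d' ' -f 6- | cut -d / -f -2 | sort -u > tardirs_uniq_sorted.txt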


Bash pipe fun

How about “recursively look at a log of hostnames used to request my site content. Sort them and ensure that only unique IP address and hostname combinations are counted. Find how many use my ‘.biz’ hostname to land on my site”:

find . -iname '*ecommerce-host_log*' | nice xargs cut --delimiter=' ' -f 1,4 | nice sort | nice uniq | nice grep '\.biz' | nice wc -l

I wasn’t sure which commands would be most processor-intensive, so I used “nice” liberally.
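
A couple of optional refinements: -print0 with xargs -0 keeps any filenames containing spaces intact, sort -u folds the sort and uniq steps together, and grep -c replaces grep plus wc -l. Same count either way:

find . -iname '*ecommerce-host_log*' -print0 | nice xargs -0 cut -d' ' -f 1,4 | nice sort -u | nice grep -c '\.biz'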


Apache custom logging

Aren’t you interested in seeing what requests users, bots, or script kiddies make of your site, especially those things that client-side JavaScript-based analytics packages don’t tell you?

Under Apache, custom logging can give you lots of information you might not have seen otherwise. I’ll let the documentation for Apache’s mod_log_config say most of this, but as a quick preview, you could try defining a custom log format up near the top of your httpd.conf with

LogFormat "%a %t %{Host}i \"%r\"" hostlog

for example, then in each of your VirtualHost containers, you could do

CustomLog logs/forest-monsen-site-host-log hostlog

Then, in my case, /var/log/httpd/forest-monsen-site-host-log would contain lines like

192.168.0.3 [31/Aug/2010:08:53:24 -0500] www.forestmonsen.com "GET /aggregator/sources/2 HTTP/1.0"
192.168.0.5 [31/Aug/2010:08:53:24 -0500] www.forestmonsen.org "GET /images/house.gif HTTP/1.1"

And I’d be able to tell which hostname was originally requested by the user — before any of my mod_rewrite rules got to it. Good stuff.
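
For reference, here’s how the two pieces might fit together in a virtual host. This is just a sketch: the server name, document root, and log path are placeholders, not my actual config.

LogFormat "%a %t %{Host}i \"%r\"" hostlog

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example
    # Client IP, timestamp, the Host header the client sent, and the request line
    CustomLog logs/example-host-log hostlog
</VirtualHost>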


Server move complete

I migrated a bunch of stuff from a CentOS 4 server to Ubuntu 8.04 LTS over the last couple of days.

  • Five websites: one Moodle site and one Drupal site backed by MySQL databases, and three static sites, plus SSL setup.
  • Added some software. How can I work without vim and slocate?
  • Security hardening, including a service review, permissions, firewall setup, administrative access through SSH, sudo config, and Postfix with spam filtering.
  • Nagios server monitoring config.

I checked my work logs and decided that I did pretty well, considering I got it all done in 10 hours 35 minutes.


Set Debian or Ubuntu server timezone

This one’s an easy one, straight from the tzselect(1) manpage:

sudo dpkg-reconfigure tzdata
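
To confirm the change took, check the timezone file and the current date (the /etc/timezone path is a Debian/Ubuntu convention):

cat /etc/timezone
date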


Install a Java Virtual Machine (JVM) plugin for Google Chrome beta running on Ubuntu Linux

Interested in getting Java to work in the just-released Google Chrome on your Ubuntu install? You can always try linking directly to the plugin binary:

$ locate libnpjp2.so
/usr/lib/jvm/java-6-sun-1.6.0.16/jre/lib/i386/libnpjp2.so
$ sudo mkdir /opt/google/chrome/plugins
$ cd /opt/google/chrome/plugins/
$ sudo ln -s /usr/lib/jvm/java-6-sun-1.6.0.16/jre/lib/i386/libnpjp2.so .

Works for me!
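
One caveat: locate reads from a database that’s only rebuilt periodically, so if it comes up empty for a freshly installed JVM, refresh the database and try again:

$ sudo updatedb
$ locate libnpjp2.so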


Flush DNS cache in Ubuntu

Interested in flushing your Ubuntu DNS cache? Note: I’m running Jaunty Jackalope as of the date of this post.

Well, Ubuntu doesn’t cache DNS by default. The caching happens in your router or at your assigned DNS servers. You could restart your router, if you have access to it, or wait until the time-to-live has expired.

You can install a local resolver that will cache DNS addresses, if you like. It will speed up your Web access slightly, since your Web browser will check the local cache first. I imagine the time you save will be measured in milliseconds.

Do that with:

sudo apt-get update && sudo apt-get install nscd

And to clear your local cache, restart the service:

sudo /etc/init.d/nscd restart
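
If you’re curious whether the cache is earning its keep, nscd will print its statistics, including cache hit counts:

sudo nscd -g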


Recursively find and list filesize and full path on the command line

Can’t beat the command line for flexibility and power in accomplishing system administration tasks. Here’s one way to recursively list the filesizes and full paths of files with a particular extension from the command line:

nice find . -name "*.swf" -type f -print0 | xargs -0r ls -skS | less

This is a succinct way to say:
“Show me all Flash files in the current directory hierarchy, descending to unlimited depth. Print the full filename on standard output followed by a null character. Send each filename in turn to the ‘ls’ command, which will look up each file’s size and print that in 1K blocks followed by the filename. (If there aren’t any results from the first command, don’t even run the ‘ls’ command, since that will just give us a list of all the files in the current directory.) Finally, send all that output to the ‘less’ command, which will allow me to page through and view it easily.”

EDIT: Added -r switch to xargs command to ensure we don’t see a list of all files, if the first ‘find’ command doesn’t find any. That sort of thing could be confusing.
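
A variation, assuming GNU find: its -printf can report each file’s size in 1K blocks itself, which skips ls entirely and sidesteps the fact that xargs may split a very long file list into several ls invocations, each sorted separately:

nice find . -name '*.swf' -type f -printf '%k %p\n' | sort -rn | less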


Resuming scp transfers using rsync

Well, since you love that good ol’ command line, I’ll pass on to you something I found today out there on the Internets. scp (“secure copy”) is great, but it can’t resume a transfer that failed partway through.

What you can do instead, since you have rsync installed, is:

rsync --partial --progress --bwlimit=10 --rsh=ssh user@host:/remote/file/path /local/file/path

Works good!
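
Incidentally, -P is shorthand for --partial --progress, and any reasonably recent rsync already uses ssh as its default remote shell, so this shorter form should behave identically (drop --bwlimit if you don’t need the 10 KB/s throttle):

rsync -P --bwlimit=10 user@host:/remote/file/path /local/file/path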