Recursively find and list filesize and full path on the command line

You can’t beat the command line for flexibility and power in system administration tasks. Here’s one way to recursively list the sizes and paths of all files with a particular extension:

nice find . -name "*.swf" -type f -print0 | xargs -0r ls -skS | less

This is a succinct way to say:
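“Show me all Flash files in the current directory hierarchy, descending to unlimited depth, and run that search at a lower CPU priority. Print each file’s path on standard output followed by a null character instead of a newline, so that filenames containing spaces or newlines are handled safely. Pass those filenames, in batches, to the ‘ls’ command, which will look up each file’s size, print it in 1K blocks followed by the filename, and sort the list largest first. (If the first command finds nothing, don’t run ‘ls’ at all, since with no arguments it would just list every file in the current directory.) Finally, send all that output to the ‘less’ command, which will let me page through it easily.”

A few caveats: ‘ls -s’ reports the disk space each file occupies, which can differ from its exact byte count; the paths are relative to the current directory (start ‘find’ at "$PWD" instead of ‘.’ if you want absolute paths); and if there are a great many files, xargs may run ‘ls’ more than once, in which case each batch is sorted separately.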

EDIT: Added the -r switch to the xargs command so that we don’t see a list of every file in the current directory when ‘find’ doesn’t find any matches. That sort of thing could be confusing.
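If you want one list sorted across all matching files, with absolute paths, here is an alternative sketch. It assumes GNU find, whose -printf action supports %k (disk space used, in 1K blocks) and %p (the file’s path); BSD and macOS find do not support -printf. The function name list_by_size is hypothetical, chosen for illustration.

```shell
# list_by_size (hypothetical helper name): print "size<TAB>path" for every
# regular file under directory $1 whose name matches pattern $2, largest first.
# Assumes GNU find, which provides -printf.
list_by_size() {
    # %k = disk space used in 1K blocks, %p = path as reached from $1.
    # A single sort process sees every line, so the ordering is global,
    # and an empty search simply produces empty output (no xargs needed).
    nice find "$1" -name "$2" -type f -printf '%k\t%p\n' | sort -rn
}

# Usage: page through all Flash files below the current directory.
# list_by_size "$PWD" '*.swf' | less
```

Because the output is one line per file, a filename that itself contains a newline would be split across two lines; the null-delimited xargs pipeline does not have that limitation.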

Still filtering spam primarily using the “From:” header? Then read this.

I’m working with an organization that has been refusing “share this” e-mails from our Web site; specifically, e-mails that originate at our Web server but carry that organization’s domain name in the “From:” header.

Here’s the problem with this. Let’s say that Joe Bloggs works at Bloggy Spot, and his e-mail address is “”. His coworker Carl really wants to forward him a relevant article from the Time Magazine Web site, so he fills out the site’s “share this” form, enters his own e-mail address (which is required) and Joe’s, and hits “send.”

But since, as far as you can tell, that message from Time Magazine originates outside your network, and yet it claims to come from Joe’s coworker Carl (“”), you refuse it. “Sorry, can’t deliver to Joe,” you say. “There’s no way you could be Carl. Carl wouldn’t send e-mail from anywhere other than here.”

Don’t refuse those e-mails. Allow them. Rely on other, more robust methods, and be happy.

Why shouldn’t you base your filtering on the From: header?

For two reasons.

First, you’re trying to fight against something that has been part of the nature of e-mail since its beginning, and second, you’re trying to fight against the nature of the Web today.

  1. This has been the nature of e-mail since its beginning.
    The e-mail standards have always let e-mail clients, and hence people, put whatever they want in the “From:” field, so from the beginning, conscientious system administrators have had to rely on much more robust methods of content and spam filtering. Looking in the “From:” header for an e-mail supposedly sent from “,” and refusing e-mail on that basis, will only make things harder for users. In fairness, the system administrator at the organization I’m negotiating with did point out that they already have multiple other layers of filtering and spam protection in place. I argued that since that was the case, and since those methods are much more reliable, they should rely on those instead.

    Perhaps you see the issue: a system that relied only on this level of filtering would be quite easy to defeat, and a system that relied on more filtering than this wouldn’t need this kind of quasi-effective filtering anyway.

  2. This is the nature of the Web today.
    When you visit a Web site and forward an article to someone you know, the message in the vast majority of cases comes “from” your e-mail address. This is done so that the recipient will recognize the sender and be more likely to accept the e-mail when it arrives. Many of the Web’s most popular sites follow this practice.

    The New York Times, Time Magazine, CNN, and Fox News sites, for example, allow — and in the case of the Times, require — a user to enter their own e-mail address as the “From:” address. Yahoo!, the Web’s third most visited site, does this as well. I’m sure there are many, many more examples.

Spam is a big problem for organizations, but when filtering spam, you’ve got to choose your battles carefully. If you hamstring your users too much, the costs probably won’t be worth the benefits.

Use crawl-delay in your robots.txt file to slow down robots

You can use the “Crawl-delay” directive in your robots.txt file to slow down Web crawlers:
User-agent: *
Crawl-delay: 15

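The value is in seconds and is generally interpreted as the minimum wait between successive requests. Crawl-delay is a nonstandard extension, so not every crawler honors it; Googlebot, for example, ignores it.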