[Novalug] Old distros (from 90s)

Rich Kulawiec rsk@gsp.org
Sun Mar 8 16:13:00 EDT 2015


On Sat, Mar 07, 2015 at 04:14:56AM -0500, jerry w via Novalug wrote:
> Spammed wannabes,
> aka script kiddies, those not well versed in regex ????

Now that I have a little more time, let me give an extended version
of my initial answer.

It's really not necessary for anyone to be versed in regex in order to
harvest addresses: they appear -- in immediately-usable form -- in
the headers of billions of mail messages, Usenet articles, and elsewhere.

Any piece of harvesting software which is installed on any system
will, in all likelihood, examine every file on the system for anything
of the form:

	string@string

and return it to its controller as a plausible email address.  Thus
if john@example.com appears on any system anywhere on the planet and
that system is subsequently compromised, then john@example.com will
be picked up.  If that harvesting software is only a little more
sophisticated, then it will be picked up *and* associated with
useful metadata, e.g.:

	Address: john@example.com
	Harvested: 2015-03-08 19:20:21 UTC
	Context: mail folder
	Timestamp: 2015-01-06 06:30:47 PST
	IP: 192.168.4.7

which tells the controller when the address was harvested, where on
the system it was found, what the timestamp associated with that is
(which might be the timestamp on the mail message it was found in)
and the IP address of the reporting system.
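To show how little sophistication this takes, here is a minimal Python sketch of the crude "string@string" scan, paired with a metadata record like the one above. It is applied to one piece of text rather than to every file on a system. The pattern, the `HarvestRecord` fields and the `scan_text` helper are hypothetical illustrations, not code from any real harvester. The per-message timestamp field is omitted for brevity.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Crude "string@string" pattern: a run of characters that are not
# whitespace or common delimiters, an "@", then another such run.
ADDRESS_RE = re.compile(r"""[^\s<>"'(),;:@]+@[^\s<>"'(),;:@]+""")


@dataclass
class HarvestRecord:
    address: str      # the plausible address
    harvested: str    # when the scan ran, in UTC
    context: str      # where the text came from, e.g. "mail folder"
    reporter_ip: str  # address of the reporting host (illustrative)


def scan_text(text: str, context: str, reporter_ip: str) -> list[HarvestRecord]:
    """Return one record per distinct string@string token found in text."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    seen: set[str] = set()
    records = []
    for match in ADDRESS_RE.finditer(text):
        # Drop sentence-ending dots and fold case so duplicates collapse
        # (strictly, the local part may be case-sensitive).
        address = match.group(0).rstrip(".").lower()
        if address not in seen:
            seen.add(address)
            records.append(HarvestRecord(address, now, context, reporter_ip))
    return records


sample = "From: john@example.com\nTo: fred@example.net, <betty@example.net>\n"
for rec in scan_text(sample, "mail folder", "192.168.4.7"):
    print(rec.address, "|", rec.context)
# john@example.com | mail folder
# fred@example.net | mail folder
# betty@example.net | mail folder
```

No knowledge of header syntax is needed: the delimiters in the pattern are enough to pull addresses out of From:, To: and angle-bracketed forms alike.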

That metadata could get more interesting:

	Mail-from: john@example.com
	Mail-to: fred@example.net
	Mail-to: betty@example.net
	Mail-to: sue@example.net
	Mail-to: george@example.net
	Mail-subject: next thursday

by including still more details from the message in which it was found.

Why do that?  Because it's quite helpful to know who john@example.com
corresponds with, whether you're trying to get past his automatic or
eyeball spam filters, or whether you're trying to phish/spearphish him.
(Some of you are probably thinking that this allows the construction
of a social graph around john@example.com.  Yes, it will.  The graph
will have nodes that are addresses -- and they may be coincident --
and edges that represent mail from/to, perhaps weighted by frequency.)
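The social graph described in that parenthesis is easy to build from the Mail-from/Mail-to metadata. What follows is a minimal Python sketch. It assumes each harvested message has already been reduced to a (sender, recipients) pair. `build_graph` and `correspondents` are hypothetical names chosen for illustration.

```python
from collections import Counter


def build_graph(messages):
    """messages: iterable of (sender, recipients) pairs.

    Returns a Counter mapping directed (from, to) edges to the number of
    messages seen on that edge, i.e. edges weighted by frequency.
    """
    edges = Counter()
    for sender, recipients in messages:
        # dict.fromkeys drops duplicate recipients but keeps their order
        for rcpt in dict.fromkeys(r.lower() for r in recipients):
            edges[(sender.lower(), rcpt)] += 1
    return edges


def correspondents(edges, address):
    """Rank everyone who exchanged mail with address, in either direction."""
    address = address.lower()
    weights = Counter()
    for (src, dst), count in edges.items():
        if src == address:
            weights[dst] += count
        elif dst == address:
            weights[src] += count
    return weights.most_common()


messages = [
    ("john@example.com", ["fred@example.net", "betty@example.net",
                          "sue@example.net", "george@example.net"]),
    ("fred@example.net", ["john@example.com"]),
    ("john@example.com", ["fred@example.net"]),
]
print(correspondents(build_graph(messages), "john@example.com"))
# [('fred@example.net', 3), ('betty@example.net', 1),
#  ('sue@example.net', 1), ('george@example.net', 1)]
```

Even three messages are enough to reveal that fred@example.net is john@example.com's most frequent correspondent. That is exactly the sort of name a phisher would forge in the From: line.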

There's more metadata that can be included, but this is enough to
illustrate the point, which is that addresses are trivial to pick
up and, when associated with metadata, they can reveal quite a bit of
useful information to attackers.  Of course, use of regex and a little
bit of scripting can reveal yet more information: I'm sure anyone on this
list with basic fluency in Perl or Python or Ruby et al. is completely
capable of crafting such software.

All mail infrastructure, all mail clients, all mail servers, all
mail *everything* relies on the addresses being present and ...

Things like RegistrarBoundaries.pm from CPAN make validating the
right-hand side of addresses much easier than it might be otherwise.
That Perl module encapsulates knowledge of all the TLDs and their
descendants, so it knows for example that @example.co.uk is valid
and @example.zz.uk is not.  It knows about all the generic TLDs,
including all the new ones that are being overrun by spammers;
it knows about ccTLDs like .es and .br and .ca; it knows which
characters are allowed where -- e.g. example-.com is not valid --
and so on.  This makes it fast and simple to determine that
john@example.com is valid and mary@example.go.in is not.
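A right-hand-side check of this kind can be sketched in Python. This is not the RegistrarBoundaries.pm API. It combines the RFC 1123 hostname-label rules with a tiny, hypothetical suffix list that was picked only to reproduce the examples in the paragraph above. A real module carries a complete, maintained list.

```python
import re

# Hypothetical toy subset of suffixes under which names are registered.
# A real module ships a complete list covering every TLD.
KNOWN_SUFFIXES = {
    "com", "net", "org", "edu", "gov",  # generic TLDs
    "es", "ca", "com.br",               # a few ccTLD registration points
    "co.uk", "org.uk", "ac.uk",         # second-level zones under .uk
}

# Hostname label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen (RFC 1123 rules).
LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def rhs_is_plausible(domain: str) -> bool:
    """Return True if domain could be the right-hand side of a real address."""
    labels = domain.lower().rstrip(".").split(".")
    if len(domain) > 253 or not all(LABEL_RE.fullmatch(label) for label in labels):
        return False
    # Accept if some trailing run of labels is a known suffix and at least
    # one label precedes it (so a bare suffix like "co.uk" is rejected).
    return any(".".join(labels[i:]) in KNOWN_SUFFIXES
               for i in range(1, len(labels)))


for d in ["example.com", "example.co.uk", "example.zz.uk",
          "example-.com", "example.go.in"]:
    print(d, rhs_is_plausible(d))
```

This returns True for example.com and example.co.uk and False for the other three. The label check rejects example-.com, while the suffix lookup rejects example.zz.uk and example.go.in.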

And given that there are a few hundred million fully-compromised systems on
the 'net right now, along with probably an equal number of
partly-compromised systems, harvesters have a VERY high probability
of discovering every email address that's in use.

And all of this is just from harvesters that prowl compromised systems.
As I pointed out in URLs referenced upthread, there are MANY other methods
available.  Even modestly-skilled operations are quite good at using
a number of them and integrating the data (which isn't a particularly
difficult task) to build up per-address database entries that hold
a lot of information about every address.

There's nothing you or I or anyone can do to stop this, because there's
nothing we can do to address the current security environment at Internet
scale.  (And even if we did: the information's out.  There's no getting
it back.)  That's why tactics like rskNOSPAM@gsp.org and rsk at gsp dot org
are 100.00% worthless. The only thing they do is impede real live
non-spamming human beings.  That's it.  Spammers not only aren't
obstructed by them, they don't even see them.

The same goes for the farcical exercise we see on some web pages
these days: "address protected from spambots".  Oh, it *might* be,
although some spambots know JavaScript and how to solve CAPTCHAs, so
maybe not.  But it doesn't matter: the first time someone, anyone,
uses it, it begins to appear on multiple systems: their laptop,
their outbound mail server, the recipient's inbound mail server,
the recipient's desktop.  And the second time, it appears on
more systems; and the third, and the fourth, and so on.  It won't
be long until it appears on a system that's already compromised
and running a harvester.
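The "it won't be long" argument can be written as a back-of-the-envelope formula. Assume, purely for illustration, that each of the n systems holding a copy of an address is compromised independently with the same probability p. Real compromises are neither independent nor uniform, and the numbers below are not measurements. Under those assumptions, the chance that at least one copy reaches a harvester is:

```latex
\begin{aligned}
P(\text{harvested}) &= 1 - (1 - p)^{n} \\
p = 0.05,\ n = 20:&\quad 1 - 0.95^{20} \approx 1 - 0.358 \approx 0.64 \\
p = 0.05,\ n = 100:&\quad 1 - 0.95^{100} \approx 1 - 0.006 \approx 0.994
\end{aligned}
```

Every use of the address raises n, and nothing ever lowers it, so this probability only climbs toward 1.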

All that those mechanisms do, again, is inconvenience and annoy
real live non-spamming human beings.  Spammers don't see them.

I don't like this situation.  Fifteen years ago, things weren't like
this and there existed some moderately effective methods for keeping
addresses away from spammers, at least for a time.  But then came SoBig
and friends, and the rise of the zombies (bots), and all of that
became irrelevant.  The rule now is that one should presume that ALL
email addresses are in the hands of spammers already -- because even
if that's not true today, it will very likely be true tomorrow.

One more outcome of this is that massive, curated, well-maintained
databases of addresses are out there.  Some spammers use them.
Some buy them.  Some lease them.  Some buy access to partial dumps of
them (e.g., "fresh addresses, no older than 1 year, no .edu or .gov
domains, no freemail providers").  There's an entire thriving market
for this and it's quite competitive.  So even someone with zero,
zip, nada expertise in address harvesting can lay their hands on
a sizable corpus of addresses at low cost.  There's thus not much
of a barrier to even script-kiddie newbie spammers, provided
they're willing to pony up a little bitcoin.

Another outcome is that there is a sizable related sleazy and abusive
"business" called "e-pending": wanna-be spammers who have real names
but not email addresses pay some of those database maintainers to use
their copious metadata to make plausible guesses as to the email address
associated with a proper name (and perhaps a geographic location or
telephone area code).   Of course those doing the purchasing know
exactly what they're doing, and part of that price gives them
plausible deniability, since they can always blame the e-pender for
anything and everything.  Everyone makes money, and hundreds of
millions of Internet users get abused...again.   And again.

So.  Bottom line.  Assume that any/all addresses are already in the
hands of the enemy and prepare defenses accordingly.  Don't waste
even five seconds of your life trying to pretend otherwise.

---rsk


