[Novalug] Appropriate Swap (was Re: PS Re: boot trick??)

Bryan J. Smith b.j.smith@ieee.org
Sun Dec 13 01:36:20 EST 2009


I don't understand your desktop viewpoint at all.  Increasing swap size does
not increase how much you swap.  The Linux kernel is not dumb.  It actually
knows what is and isn't usable.  Furthermore, you can shrink the size of the
swap slice (I pre-allocate a slice size equal to /, /tmp and /var if I'm not
using LVM, but I may not make the slice that big).

Furthermore, if you want real, _effective_ swap, try ...
1.  Tune your kernel -- turn down swappiness and increase min_free_kbytes
2.  Further tune your VM -- the four (4) main dirty tunables:  the ratio percents and the expire/writeback centisecs
3.  Don't use swap at all (create/reserve the slice, but don't enable it, or disable it at times)
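Items #1 and #2 can be sketched as a sysctl fragment.  The values below are illustrative starting points only, not recommendations for any particular box -- tune per workload:

```
# /etc/sysctl.conf fragment -- illustrative values, tune per workload.
# Item 1: prefer reclaiming page cache over swapping anonymous pages,
# and keep a larger free-memory reserve.
vm.swappiness = 10
vm.min_free_kbytes = 65536

# Item 2: the four (4) main dirty-writeback tunables -- the percents
# and the centisecs.
vm.dirty_background_ratio = 5       # % of RAM dirty before background flush
vm.dirty_ratio = 20                 # % of RAM dirty before writers block
vm.dirty_expire_centisecs = 1500    # age (1/100 s) before dirty data must flush
vm.dirty_writeback_centisecs = 500  # how often the flusher wakes up
```

Apply with `sysctl -p`, or echo values into the files under /proc/sys/vm to experiment before committing them.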

Secondly, regarding enterprise systems:  I said _desktop_, or more specifically
in my example, _notebook_.  I've been running with 2GiB since 1999 (yes,
1999), 4GiB since 2005 and I'm looking to go 8GiB as soon as it drops to
$250 for a DDR3 SO-DIMM pair.  That's _all_ I was trying to address,
especially given the original poster's sizing on memory.  ;)

But even for enterprise, again, #1 and #2.  Only do #3 if you really have
issues.  Furthermore ...
4.  Run x86-64 and use hugepages (which cannot be paged currently)
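Reserving hugepages is a one-liner.  A hedged sketch (the 4GiB reservation is purely illustrative, sized for something like an Oracle SGA); this is root-only:

```
# Reserve 2048 x 2MiB hugepages = 4GiB.  Hugepage-backed memory is
# locked in RAM and is not paged/swapped out.
echo 2048 > /proc/sys/vm/nr_hugepages
grep Huge /proc/meminfo       # verify the reservation took

# Persist across reboots via /etc/sysctl.conf:
#   vm.nr_hugepages = 2048
```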

I have tuned major -- and I mean major -- household-name financial systems
in my career, and I have dropped memory consumption 2-3x and increased
performance by over a full order of magnitude (typically 20-30x) with just
a few tunables.  The existence of swap is not the issue.  Never remove swap.
Just make sure it isn't used for performance reasons.  Neither its existence nor
size, again, is ever the issue, honestly.  ;)

Of course, talking more Oracle, their "legacy" (32-bit) guidelines were 2x for
up to 1.7GiB (max SGA for non-PAE), 1x for 4-8GiB with PAE (this is a "gray
area"), and 0.75x beyond 8GiB, IIRC.  That is very dated though and, in
general, you should be running x86-64 with hugepages (which don't page).
I don't know how many times I've run into Oracle consultants who were
brilliant for their Oracle-side tuning, but didn't know jack about Linux at all
(and were extremely dated -- like info 8-10 years old).

As far as Solaris, I'm _not_ remotely interested in Solaris.  Does not apply
to Linux at all, and I regularly have to "take the 'virtual baseball bat' to Solaris
admins" who start to make those assumptions.  If you mean Sun with Linux/
x86-64, then it's no different from other 128GiB to 1TiB x86-64 platforms (or
anything else beyond 32-64GiB RAM):  create a 16-32GiB swap and that's it.
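Creating and enabling that fixed swap is only a couple of commands.  A minimal sketch using a swap file (on my own boxes it's a dedicated slice like sda5 instead; the path is illustrative and the demo file is deliberately tiny -- a real one would be the 16-32GiB above):

```shell
# Create a small demo swap area as a file (a real one would be a
# 16-32GiB file or a dedicated slice such as /dev/sda5).
dd if=/dev/zero of=/tmp/demo.swap bs=1M count=16 2>/dev/null
mkswap /tmp/demo.swap        # writes the swap signature
# swapon /tmp/demo.swap      # root only -- actually enables the area
# swapon -s                  # root only -- lists active swap areas
```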

Again, I was referring to consumer desktop/notebook sizing with real examples
of 1-4GiB and the potential of 8GiB at some point.  And my swap size is based
on the fact that I'm reserving a slice equal to /, /tmp and /var, even if not used.

-- Bryan

P.S.  When not doing LVM, I create swap as the sda5 slice.  Why?  Because
the 5th slice in the legacy BIOS/DOS Disk Label is the first "Logical" partition.
The first "Logical" partition "loses" its first 512-byte sector to the "Extended"
disk label (partition table).  That means it's not a perfect cylinder size, unlike
the 1st-4th "Primary" slices (again, in the legacy BIOS/DOS Disk Label --
Microsoft also calls this the "MBR Partition Table") or the 6th-onward "Logical"
slices in the "Extended" partition table.

When doing LVM, I don't worry about reserving for Swap, and size as I see fit.
I end up doing 8-16GiB swap for 4GiB RAM currently, 16GiB when I plan to
move to 8GiB RAM.
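Under LVM that swap is just another logical volume.  A sketch, assuming a volume group named vg0 (the name is purely illustrative); all of this is root-only:

```
lvcreate -L 16G -n swap vg0    # carve a 16GiB LV for swap
mkswap /dev/vg0/swap           # write the swap signature
swapon /dev/vg0/swap           # enable it now
# /etc/fstab entry so it comes back at boot:
# /dev/vg0/swap   none   swap   sw   0 0
```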

But regardless of disk labels, I still create a base /, /tmp and /var that is at least
2x memory for desktop/notebooks of 1-4GiB RAM.  It's too darn easy for
people to eat up 2GiB of /tmp or /var doing various operations.  That was my
original point.  People make all sorts of weird sizing and everything else for
different filesystems, and I try to stay with a "base" size that ...
A) is a multiple of RAM
B) is on an exact cylinder boundary (or perfect set of Extents in the case of LVM)
C) all match, because I like symmetry
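For (B), the arithmetic is simple with the classic translated CHS geometry of 255 heads x 63 sectors x 512-byte sectors (geometry assumed for illustration; real disks vary):

```shell
# One "cylinder" in the classic translated CHS geometry:
CYL=$((255 * 63 * 512))             # 255 heads x 63 sectors x 512 bytes
BASE=$((4 * 1024 * 1024 * 1024))    # a 4GiB target "base" size
CYLS=$(( (BASE + CYL - 1) / CYL ))  # round up to whole cylinders
echo "$CYLS cylinders = $((CYLS * CYL)) bytes"
# -> 523 cylinders = 4301821440 bytes
```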

My other filesystems are then multiples of the "base" size.  /usr might be 1x what
/ is, or it might be 2-4x if I'm installing quite a bit.  Many times it's just 1x and then
I create /usr/local and/or /opt, depending on what I need.  I slice off /var/www if
it's a web server, /var/spool or other things, maybe /srv (for those that know why
it was created ;), etc...  I'm almost always doing LVM in those cases where I have
a lot.


I know people debate many things -- "I might grow bigger so I make one big
filesystem" -- but that is a Windows-world attitude in my opinion.  Localizing
fragmentation, possible downtime/fscks (you can bring the rest of the system
back up while a filesystem requiring a full fsck or other repair is still in
progress), etc. is just better.  Especially when LVM exists.
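That "might grow bigger" worry is exactly what LVM answers: grow the one volume that needs it.  A sketch of growing /var online (VG/LV names are illustrative, an ext3 filesystem is assumed); root-only:

```
lvextend -L +4G /dev/vg0/var    # grow the logical volume by 4GiB
resize2fs /dev/vg0/var          # grow the ext3 filesystem into the new space
df -h /var                      # confirm the new size
```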



----- Original Message ----
From: John Franklin <franklin@elfie.org>

For desktops, I'll keep it at or below 1x RAM.  If I have a bare 1GB and I'm using a full second gig of swap, my performance is going to be (how shall I put this?) suboptimal.  RAM is cheap enough that I'm more likely to add RAM than suffer constant swapping.

For enterprise deployments, I try to keep swap down to a few gig if I have any at all.  Swap isn't required on Linux and the performance metrics that dev and QA publish rarely takes into consideration swap performance.  If the production system is using swap, the app is probably operating out of spec or someone dropped the ball during capacity planning.  A little bit of swap (to keep it running) plus a Nagios alert when swap is used so we know to investigate.  The action required will be one of (a) up the RAM, (b) deploy another server (to handle more load), or (c) pull the server cause it's been hacked.

Also, 2x/4x RAM doesn't work so well on Very Large Systems.  Sun has a couple X86 boxes (the X4600 and X4640) that can support 512GB of RAM.  2x / 4x would be 1TB to 2TB of swap.  Perhaps a bit excessive.  If you go SPARC, the M9000 can support 4TB of RAM.  16TB of SWAP would be a shelf of disks each for / and swap.

jf


