I passed the VCP5 exam a few weeks ago, and while studying (hard, it’s not an easy one!) I suddenly spotted two opposing concepts: VMware TPS (Transparent Page Sharing) and ASLR (Address Space Layout Randomization).
I first heard about ASLR while playing with OpenBSD and reading about its security features (they go far beyond just randomizing memory pages!). While studying for the VCP5 exam, I realized that page sharing should suffer when memory pages get randomized: starting two identical machines could end with few identical memory pages to share.
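You can see ASLR at work from inside any Linux guest by comparing the memory map of two runs of the same binary; with randomization on, the stack lands at a different address each time. A quick sketch, nothing VMware-specific:

```shell
# Each grep invocation is a new process, so it prints its own
# stack mapping from /proc/self/maps.
grep stack /proc/self/maps
grep stack /proc/self/maps
# With ASLR enabled, the two start addresses differ between runs.
```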
After the exam, I started looking for performance tests on shared pages with ASLR active, and found a very detailed blog entry called Windows 7 Transparent Page Sharing and the ASLR story. They found that ASLR could reduce TPS effectiveness, but not by much.
I couldn’t resist the temptation of doing my own tests. My test lab is a cheap ESX5 HP ProLiant ML110 G5 server with 8GB of RAM, and a W2K3 vCenter virtual machine running on the PowerBook I’m writing this on.
I created an Ubuntu 10.10 32-bit virtual machine with 1GB of memory, installed VMware Tools, and cloned it 15 times. Ubuntu comes with ASLR activated by default, so when all the virtual machines powered on, I thought TPS would not do a good job. I was wrong: 1.85GB of memory were shared after a few hours. Not too bad.
Then I turned off ASLR in every Ubuntu machine by running “sudo sysctl -w kernel.randomize_va_space=0”, and restarted all the virtual machines. After a few hours, TPS had found 2.83GB of shared memory.
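For reference, this is the full recipe I used on each guest. Note that a plain sysctl write does not survive a reboot, so if you want the setting to stick you also have to drop it into /etc/sysctl.conf:

```shell
# Check the current setting: 0 = off, 1 = partial, 2 = full (Ubuntu's default)
cat /proc/sys/kernel/randomize_va_space

# Turn ASLR off for the running kernel (needs root)
sudo sysctl -w kernel.randomize_va_space=0

# Make the change survive guest reboots
echo "kernel.randomize_va_space = 0" | sudo tee -a /etc/sysctl.conf
```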
Much better, but if all the machines are basically the same one and ASLR is turned off, why are only 2.83GB shared out of the 15GB of total assigned memory?
The answer is probably that the Ubuntu machines have 1GB assigned but are doing nothing, so there is no memory pressure on the ESX host: memory is overcommitted, but under these circumstances there is no swapping, no ballooning, and plenty of unaccessed memory.
But what happens if I power on twenty 1GB machines on my 8GB ESX host and wait a few hours?…
This is when VMware shines, when all the memory reclamation techniques start to work: you can see ballooning, compression, and zero-page reclamation adding to TPS, and only 249MB have been paged to disk.
When you look at each VM, you can see that the machines with low Shared and Unaccessed memory values have started to balloon memory.
| VM | Consumed (MB) | Shared (MB) | Ballooned (MB) | Unaccessed (MB) |
|-------|-----|-----|-----|-----|
| ubu00 | 284 | 789 | 0 | 23 |
| ubu01 | 134 | 236 | 649 | 40 |
| ubu02 | 123 | 211 | 649 | 56 |
| ubu03 | 336 | 129 | 0 | 602 |
| ubu04 | 131 | 218 | 649 | 56 |
| ubu05 | 214 | 861 | 0 | 3 |
| ubu06 | 414 | 239 | 0 | 426 |
| ubu07 | 248 | 821 | 0 | 16 |
| ubu08 | 399 | 289 | 0 | 392 |
| ubu09 | 407 | 463 | 0 | 212 |
| ubu10 | 129 | 229 | 649 | 39 |
| ubu11 | 165 | 901 | 0 | 11 |
| ubu12 | 206 | 849 | 0 | 21 |
| ubu13 | 272 | 728 | 0 | 69 |
| ubu14 | 117 | 212 | 649 | 53 |
| ubu15 | 133 | 217 | 649 | 34 |
| ubu16 | 336 | 734 | 0 | 11 |
| ubu17 | 355 | 303 | 0 | 425 |
| ubu18 | 252 | 390 | 0 | 430 |
| ubu19 | 127 | 237 | 649 | 34 |
But why do some machines have high Unaccessed memory values while others have “accessed” all their memory? It turns out Linux uses all its free memory as disk cache (see Linux ate my RAM), so I restarted all the virtual machines slowly, waiting for each Ubuntu vCPU to drop to 0MHz before starting the next one, and… all the virtual machines started, without ballooning or swapping!!!
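The “Linux ate my RAM” effect is easy to check from inside any of the guests:

```shell
# On an idle guest, most "used" memory is really page cache:
free -m
# The buffers/cache figure is memory Linux borrowed for disk caching;
# it is handed back as soon as applications need it.
# (Root can empty it by hand with: sync; echo 3 > /proc/sys/vm/drop_caches)
```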
Well, the trick is that every VM is doing nothing and is full of unaccessed memory. Moreover, remember that ASLR is turned off, and while I waited between power-ons TPS started its job, so the ESX host never ran out of free memory.
| VM | Consumed (MB) | Shared (MB) | Ballooned (MB) | Unaccessed (MB) |
|-------|-----|-----|-----|-----|
| ubu00 | 286 | 66 | 0 | 713 |
| ubu01 | 288 | 64 | 0 | 713 |
| ubu02 | 350 | 78 | 0 | 643 |
| ubu03 | 287 | 67 | 0 | 713 |
| ubu04 | 283 | 65 | 0 | 718 |
| ubu05 | 285 | 67 | 0 | 713 |
| ubu06 | 264 | 263 | 0 | 540 |
| ubu07 | 282 | 67 | 0 | 717 |
| ubu08 | 280 | 68 | 0 | 717 |
| ubu09 | 283 | 66 | 0 | 717 |
| ubu10 | 308 | 75 | 0 | 641 |
| ubu11 | 362 | 68 | 0 | 637 |
| ubu12 | 286 | 63 | 0 | 715 |
| ubu13 | 365 | 63 | 0 | 637 |
| ubu14 | 292 | 151 | 0 | 647 |
| ubu15 | 232 | 125 | 0 | 720 |
| ubu16 | 177 | 189 | 0 | 722 |
| ubu17 | 162 | 209 | 0 | 719 |
| ubu18 | 166 | 205 | 0 | 724 |
| ubu19 | 163 | 208 | 0 | 723 |
I left these machines powered on for one day, and the same pattern emerged: some machines ballooning, others full of unaccessed memory, and the rest with high shared values.
| VM | Consumed (MB) | Shared (MB) | Ballooned (MB) | Unaccessed (MB) |
|-------|-----|-----|-----|-----|
| ubu00 | 303 | 590 | 0 | 179 |
| ubu01 | 290 | 248 | 0 | 529 |
| ubu02 | 251 | 835 | 0 | 11 |
| ubu03 | 158 | 231 | 649 | 33 |
| ubu04 | 279 | 800 | 0 | 4 |
| ubu05 | 295 | 513 | 0 | 257 |
| ubu06 | 189 | 884 | 0 | 11 |
| ubu07 | 322 | 383 | 0 | 360 |
| ubu08 | 305 | 162 | 0 | 599 |
| ubu09 | 300 | 173 | 0 | 595 |
| ubu10 | 245 | 842 | 0 | 5 |
| ubu11 | 316 | 724 | 0 | 27 |
| ubu12 | 304 | 160 | 0 | 604 |
| ubu13 | 263 | 822 | 0 | 18 |
| ubu14 | 156 | 270 | 623 | 22 |
| ubu15 | 167 | 896 | 0 | 19 |
| ubu16 | 134 | 229 | 649 | 49 |
| ubu17 | 307 | 608 | 0 | 151 |
| ubu18 | 271 | 816 | 0 | 5 |
| ubu19 | 406 | 257 | 0 | 427 |
You can see how part of the unaccessed memory has been used by Ubuntu, giving ESXi the chance to find more shared pages, ending with an incredible 10.3GB of shared memory!
The previous tests were done with ASLR turned off, so I logged into each Ubuntu machine and ran “sudo sysctl -w kernel.randomize_va_space=2”. Then I restarted all the virtual machines and waited another full day…
You can see 9.2GB of shared memory with ASLR activated! Think about it: my host has only 8GB of RAM, and is running ESXi plus 20 virtual machines with 1GB assigned to each one.
It’s true that these virtual machines are only running system processes. But remember those memory-oversized virtual servers your boss wanted? After a few hours running, they will end up with a few hundred MB of active memory, and TPS will recover all those shared pages (even with ASLR activated 🙂
In the next post, I will run some tests using “lookbusy” to generate memory load on two virtual machines, and test again how ASLR affects TPS in a more realistic situation.
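As a preview, this is roughly how I plan to drive the load; the parameters here are just a guess at this point and may change for the actual tests:

```shell
# Generate ~50% CPU load and keep 512MB of memory touched,
# assuming lookbusy is installed on the guest.
if command -v lookbusy >/dev/null; then
    lookbusy -c 50 -m 512MB &
    LOAD_PID=$!
    sleep 60            # let TPS and ballooning react to the pressure
    kill "$LOAD_PID"
else
    echo "lookbusy is not installed"
fi
```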