Sunday, July 1, 2018
Biggus diskus plus 31.5.0 and how to superphish in your copious spare time
One of the many great advances that Mac OS 9 had over the later operating system was the extremely flexible (and persistent!) RAM disk feature, which I use on almost all of my OS 9 systems to this day as a cache store for Classilla and as a temporary work area. It's not just for laptops!
While OS X can configure and use RAM disks, of course, it's not as nicely integrated as the RAM Disk in Classic is and it isn't natively persistent, though the very nice Esperance DV prefpane comes pretty close to duplicating the earlier functionality. Esperance will let you create a RAM disk up to 2GB in size, which for most typical uses of a transient RAM disk (cache, scratch volume) would seem to be more than enough, and can back it up to disk when you exit. But there are some heavy duty tasks that 2GB just isn't enough for -- what if you, say, wanted to compile a PowerPC fork of Firefox in one, he asked nonchalantly, picking a purpose at random not at all intended to further this blog post?
The 2GB cap actually originates from two specific technical limitations. The first applies to G3 and G4 systems: they can't have more than 2GB of total physical RAM anyway. Although OS X RAM disks are "sparse" and only occupy as much RAM as their contents need, if you filled a RAM disk with 2GB of data, even on a 2GB-equipped MDD G4 you'd start spilling memory pages to the real hard disk and thrash so badly you'd be worse off than if you had just used the hard disk in the first place. The second limit applies to G5 systems too, even in Leopard -- the RAM disk is served by /System/Library/PrivateFrameworks/DiskImages.framework/Resources/diskimages-helper, a 32-bit process limited to a 4GB address space minus executable code and mapped-in libraries (it didn't become 64-bit until Snow Leopard). In practice this leaves exactly 4629672 512-byte disk blocks, or about 2,260MB (roughly 2.2GB), as the largest possible standalone RAM disk image on PowerPC. A full single-architecture build of TenFourFox takes about 6.5GB. Poop.
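The block arithmetic is easy to check in the shell. hdiutil's ram:// URLs take a count of 512-byte sectors, so this minimal sketch converts the maximum slice size into bytes and whole MiB. The helper names blocks_to_bytes and blocks_to_mib are purely illustrative.

```sh
#!/bin/sh
# Convert a count of 512-byte disk blocks (the unit used by hdiutil's
# ram:// URLs) into bytes and into whole mebibytes (1 MiB = 1048576 bytes).
# blocks_to_bytes and blocks_to_mib are hypothetical helper names.
blocks_to_bytes() {
    echo $(( $1 * 512 ))
}

blocks_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

# The largest single RAM disk image discussed above.
echo "bytes: $(blocks_to_bytes 4629672)"   # bytes: 2370392064
echo "MiB:   $(blocks_to_mib 4629672)"     # MiB:   2260
```

About 2,260MiB works out to roughly 2.2GiB per image, which is why one image alone can't hold a 6.5GB build.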
It dawned on me during one of my careful toilet thinking sessions that the way awound, er, around this pwob-, um, problem (after a speech pathology wefewwal) was to RAID volumes together. I am chagrined to find that others had independently come up with this idea before, but let's press on anyway. At this point I'm assuming you're going to do this on a G5, because doing this on a G4 (or, egad, G3) would be absolutely nuts, and that your G5 has at least 8GB of RAM. The performance improvement we can expect depends on how the RAM disk is constructed and on how much the tasks being performed on it are limited by disk access time. 10.4 gives me two choices: concatenated, where data moves from one component volume process to the next as each fills up, or striped [RAID 0], where data is interleaved across all of the component volume processes. Building TenFourFox is admittedly a rather CPU-bound task, but there is a non-trivial amount of disk access, so let's see how we go.
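To make the difference between the two layouts concrete, here is a simplified model of how a logical block number on the array might map to a component volume and an offset within it. It assumes four equal slices and a hypothetical stripe unit of 64 blocks (32KB); Apple's RAID implementation may use a different chunk size and keeps its own metadata, so this illustrates the idea rather than diskutil's internals. The function names are made up for this sketch.

```sh
#!/bin/sh
# Simplified model of concatenated vs. striped (RAID 0) block placement.
# Assumptions (illustrative only): NSLICES equal slices of SLICE_BLOCKS
# 512-byte blocks each, and a hypothetical stripe unit of CHUNK_BLOCKS blocks.
NSLICES=4
SLICE_BLOCKS=4629672
CHUNK_BLOCKS=64

# concat_locate LBA -> "slice offset": fill slice 0 completely, then slice 1, ...
concat_locate() {
    echo "$(( $1 / SLICE_BLOCKS )) $(( $1 % SLICE_BLOCKS ))"
}

# stripe_locate LBA -> "slice offset": consecutive chunks rotate across slices.
stripe_locate() {
    chunk=$(( $1 / CHUNK_BLOCKS ))
    slice=$(( chunk % NSLICES ))
    offset=$(( (chunk / NSLICES) * CHUNK_BLOCKS + $1 % CHUNK_BLOCKS ))
    echo "$slice $offset"
}

for lba in 0 64 128 192 256; do
    echo "block $lba: concat -> $(concat_locate "$lba"), stripe -> $(stripe_locate "$lba")"
done
```

A run of sequential blocks stays on slice 0 under concatenation until that slice fills, but rotates through all four slices when striped.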
Since I need at least 6.5GB, I decided the easiest way to handle this was four images of just over 2GB each (roughly 8.8GB all told). Obviously, the 8GB of RAM I had in my Quad G5 wasn't going to be enough, so (after an order to MemoryX) a couple of days later I had a 16GB memory kit (8 x 2GB) at my doorstep for installation. (As an aside, this means my Quad is now pretty much maxed out: between the 16GB of RAM and the Quadro FX 4500, it's now the most powerful configuration of the most powerful Power Mac Apple ever made. That's the same kind of sheer bloody-mindedness that puts 256MB of RAM into a Quadra 950.)
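One way to size such an array is to divide the space the job needs by the capacity of one slice and round up. This sketch assumes the 4629672-block slice from above and expresses the 6.5GB build as 6656MiB (6.5 x 1024). It ignores HFS+ and RAID metadata overhead, so the slice count is a floor, not a recommendation. slices_needed and array_mib are hypothetical helper names.

```sh
#!/bin/sh
# Estimate how many RAM disk slices a job needs and the total capacity of
# an array. HFS+ and RAID overhead are ignored, so results are lower bounds.
# slices_needed and array_mib are hypothetical helper names.
SLICE_BLOCKS=4629672
SLICE_MIB=$(( SLICE_BLOCKS * 512 / 1048576 ))   # 2260 MiB per slice

# slices_needed MIB -> smallest slice count covering MIB (ceiling division)
slices_needed() {
    echo $(( ($1 + SLICE_MIB - 1) / SLICE_MIB ))
}

# array_mib N -> capacity of N slices in MiB, computed from raw blocks
array_mib() {
    echo $(( $1 * SLICE_BLOCKS * 512 / 1048576 ))
}

echo "minimum slices for 6656 MiB: $(slices_needed 6656)"   # 3
echo "capacity of 4 slices: $(array_mib 4) MiB"             # 9042
```

Under these assumptions three slices would cover the build with only a little over 100MiB to spare before any filesystem overhead, while four slices give about 9042MiB (roughly 8.8GiB).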
Now to configure the RAM disk array. I ripped off a script from someone on Mac OS X Hints and modified it to be somewhat more performant. Here it is (it's a shell script you run in the Terminal, or you could use Platypus or something to make it an app; it works on 10.4 and 10.5):
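% cat ~/bin/ramdisk
#!/bin/sh
# Do nothing if the array is already mounted.
/bin/test -e /Volumes/BigRAM && exit
# Create four RAM-backed devices of 4629672 512-byte blocks each (the
# largest a single diskimages-helper can serve) and format them as HFS+
# volumes r1 through r4, all in parallel.
diskutil erasevolume HFS+ r1 \
    `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r2 \
    `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r3 \
    `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r4 \
    `hdiutil attach -nomount ram://4629672` &
# Wait for all four volumes to come up.
wait
# Combine them into one striped (RAID 0) array mounted at /Volumes/BigRAM.
diskutil createRAID stripe BigRAM HFS+ \
    /Volumes/r1 /Volumes/r2 /Volumes/r3 /Volumes/r4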
Notice that I'm using stripe here -- you would substitute concat for stripe above if you wanted that mode, but read on first before you do that. Open Disk Utility before starting the script and watch the side pane as it runs if you want to understand what it's doing. You'll see the component volume processes start, reconfigure themselves and get aggregated, and then the main array come up. It's sort of a nerdily beautiful disk image ballet.
One complication, however, is that you can't simply unmount the array and expect the component RAM volumes to go away by themselves; instead, you have to seek out and kill the component volumes first, and then the array will go away by itself. If you fail to do that, you'll run out of memory verrrrry quickly, because the RAM will not be reclaimed! Here's a script for that too. I haven't tested it on 10.5, but I don't see why it wouldn't work there either.
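% cat ~/bin/noramdisk
#!/bin/sh
# Do nothing if the array isn't mounted.
/bin/test -e /Volumes/BigRAM || exit
# Unmount the array so its members can be detached.
diskutil unmountDisk /Volumes/BigRAM
# Take the four member lines from the checkRAID report, keep characters
# 3-10 (the device identifier), drop Unknown entries, strip the s3 slice
# suffix to get the whole diskN device, and eject each one to free its RAM.
diskutil checkRAID BigRAM | tail -5 | head -4 |
  cut -c 3-10 | grep -v Unknown |
  sed s/s3// | xargs -n 1 diskutil eject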
This script needs a little explanation. It unmounts the RAM disk array so it can be modified, then goes through the list of its components, isolates the diskN device backing each one and ejects it. When all of the array's components are gone, OS X removes the array, and that's it. Naturally, shutting down or restarting will also wipe the array away.
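The sed stage is what turns a slice identifier into the whole-disk identifier that diskutil eject needs. The script hard-codes s3; a slightly more general pattern strips any trailing slice number. The device names below are made-up sample input, not real checkRAID output, and to_whole_disk is a hypothetical helper name.

```sh
#!/bin/sh
# Turn BSD slice identifiers (diskNsM) into whole-disk identifiers (diskN)
# by deleting the trailing "s<number>". Reads one identifier per line.
# to_whole_disk is a hypothetical helper name.
to_whole_disk() {
    sed 's/s[0-9][0-9]*$//'
}

# Hypothetical sample identifiers standing in for the component list.
printf 'disk2s3\ndisk3s3\ndisk4s3\ndisk5s3\n' | to_whole_disk
```

In the real script the resulting names are then handed to xargs -n 1 diskutil eject.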
(If you want to use these scripts for a different-sized array, adjust the number of diskutil erasevolume commands in the mounter script, and make sure the last line has the right number of images [like /Volumes/r1 /Volumes/r2 by themselves for a 2-image array]. In the unmounter script, change the tail and head parameters to the number of images plus one and the number of images, respectively [e.g., tail -3 | head -2 for a 2-image array].)
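Rather than editing the scripts by hand, a hypothetical helper (not part of the original scripts) can generate those variable parts for any slice count. It only prints commands and runs nothing. It follows the pattern of the mounter script, lets you pick stripe or concat, and prints the tail/head window the unmounter needs. The names gen_ramdisk and unmount_window are made up for this sketch; review the output before running any of it.

```sh
#!/bin/sh
# Print (do not run) a mounter script for an N-slice RAM disk array.
# Usage: gen_ramdisk MODE N   where MODE is stripe or concat.
# gen_ramdisk and unmount_window are hypothetical helper names.
gen_ramdisk() {
    mode=$1 n=$2 i=1 vols=""
    printf '%s\n' '#!/bin/sh' '/bin/test -e /Volumes/BigRAM && exit'
    while [ "$i" -le "$n" ]; do
        # printf '%s' avoids echo's escape processing of the backslash.
        printf '%s\n' "diskutil erasevolume HFS+ r$i \\" \
            "    \`hdiutil attach -nomount ram://4629672\` &"
        vols="$vols /Volumes/r$i"
        i=$(( i + 1 ))
    done
    printf '%s\n' 'wait' "diskutil createRAID $mode BigRAM HFS+$vols"
}

# Print the unmounter's pipeline window: tail -(N+1) | head -N.
unmount_window() {
    printf '%s\n' "tail -$(( $1 + 1 )) | head -$1"
}

gen_ramdisk concat 2
unmount_window 2
```

Printing instead of executing keeps the generator harmless to experiment with on any machine.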
Since downloading the source code from Mozilla is network-bound (especially on my network), I just dumped it to the hard disk and patched it there as well, so that a problem with the array would not require downloading and patching everything again. Once that was done, I made a copy on the RAM disk with hg clone esr31g /Volumes/BigRAM/esr31g and started the build. My hard disk, for comparison, is a 7200rpm Western Digital SATA drive with a 64MB buffer; remember that all PowerPC OS X-compatible controllers only support SATA I. Here are the timings, with the Quad G5 in Highest performance mode:
hard disk: 2 hours 46 minutes
concatenated: 2 hours 15 minutes (18.7% improvement)
striped: 2 hours 8 minutes (22.9% improvement)
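The percentages are time saved relative to the hard disk run. A quick check with awk, after converting each time to minutes (166, 135 and 128); the last line compares striping against concatenation and reproduces the 5.2% figure discussed below. saved_pct is an illustrative helper name.

```sh
#!/bin/sh
# Percentage of build time saved: (baseline - new) / baseline * 100.
# Times in minutes: 2h46m = 166, 2h15m = 135, 2h08m = 128.
# saved_pct is a hypothetical helper name.
saved_pct() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

echo "concatenated vs. hard disk: $(saved_pct 166 135)%"   # 18.7%
echo "striped vs. hard disk:      $(saved_pct 166 128)%"   # 22.9%
echo "striped vs. concatenated:   $(saved_pct 135 128)%"   # 5.2%
```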
Considering how much of this is limited by the speed of the processors, this is a rather nice boost, and I bet it will be even faster with unified builds in 38ESR (these are somewhat more disk-bound, particularly during linking). Since I've just saved almost two hours of build time across all four CPU-specific builds, this is the way I intend to build TenFourFox in the future.
The 5.2% improvement of striping over concatenation observed here doesn't look very large, but it is statistically significant, and the real difference is actually larger than this test would indicate -- if our task were primarily disk-bound, the gulf would be quite wide. Striping is faster here because each 2GB slice of the RAM disk array is an independent instance of diskimages-helper, and since we have four slices, each slice can run on one of the Quad's cores. By spreading disk access equally among all the processes, we spread it equally over all the processors and achieve lower latency and higher efficiency. This would probably not be true with fewer cores, and indeed for dual G5s two slices (or concatenating four) may be better; the earliest single-processor G5s should almost certainly use concatenation only.
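The load-spreading argument can be illustrated with a toy count of which slice services each chunk of a sequential write. This sketch assumes four slices and a write small enough to fit in the first slice when concatenated; it models only where requests land, not how diskimages-helper schedules I/O or which core each process runs on. count_chunks is a hypothetical helper name.

```sh
#!/bin/sh
# Count how many of NCHUNKS consecutive chunks, written from the start of
# the array, land on each of NSLICES slices (concatenated vs. striped).
# Assumes the whole write fits in slice 0 when concatenated.
# count_chunks is a hypothetical helper name.
NSLICES=4

# count_chunks MODE NCHUNKS -> space-separated per-slice counts
count_chunks() {
    awk -v mode="$1" -v total="$2" -v n="$NSLICES" 'BEGIN {
        for (s = 0; s < n; s++) cnt[s] = 0
        for (c = 0; c < total; c++) {
            # striped: chunk c goes to slice c % n; concatenated: slice 0
            slice = (mode == "stripe") ? c % n : 0
            cnt[slice]++
        }
        out = cnt[0]
        for (s = 1; s < n; s++) out = out " " cnt[s]
        print out
    }'
}

echo "concat, 16 chunks: $(count_chunks concat 16)"   # 16 0 0 0
echo "stripe, 16 chunks: $(count_chunks stripe 16)"   # 4 4 4 4
```

With concatenation one helper process does all the work for this write; with striping each of the four processes gets an equal share, which is what lets a quad spread the load.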
Some of you will ask how this compares to an SSD, and frankly I don't know. Although I've done some test builds on an SSD, I've been using a Patriot Blaze SATA III drive connected to my FW800 drive toaster to avoid problems with interfacing, so I doubt any numbers I'd get off that setup would be particularly generalizable. Besides, I'd rather use the RAM disk anyhow, because I don't have to worry about TRIM, write cycles or cleaning up. However, I would be very surprised if an SSD in a G5 achieved speeds faster than RAM, especially given the (comparatively, mind you) lower SATA bandwidth.
And, with that, 31.5.0 is released for testing (release notes, hashes, downloads). This only contains ESR security/stability fixes; you'll notice the changesets hash the same as 31.4.0 because they are, in fact, the same. The build finalizes Monday PM Pacific as usual.
31.5.0 would have been out earlier (experiments with RAM disks notwithstanding) except that I was waiting to see what Mozilla would do about the Superfish/Komodia debacle. The fact that Lenovo was loading adware that MITM-ed HTTPS connections on their PCs ("Superfish") was bad enough, but the secret root certificate it installed came with a private key protected by an easily cracked password, allowing a bad actor to create phony certificates. And now it looks like Komodia, the company that developed the technology behind Superfish, has through its willful bad-faith actions caused the same problem to exist, hidden, in other kinds of adware it powers.
Assuming you were not tricked into accepting their root certificate in some other fashion (their nastyware doesn't run on OS X and, as near as I can tell, never has), your Power Mac is not at risk. But these kinds of malicious, malfeasant and incredibly ill-constructed root certificates need to be nuked from orbit (as well as the companies that try to sneak them onto users' machines; I suggest napalm, castration and feathers), and they will be marked as untrusted in future versions of TenFourFox and Classilla so that false certificates signed with them will not be honoured under any circumstances, even by mistake. Unfortunately, it's also yet another example of how the roots are the most vulnerable part of secure connections (previously, previously).
Development on IonPower continues. Right now I'm trying to work out a serious bug with Baseline stubs and not having a lot of luck; if I can't get this working by 38.0, we'll ship 38 with PPCBC (targeting a general release by 38.0.2 in that case). But I'm trying as hard as I can!