Sunday, 12 February 2012

Joggler: eMMC hilarity

This is part of my series on running an unmodified Debian on the Joggler. See here for other posts on the same topic.

Since my last post, the helpful folks at Debian have added the missing sdhci_pci module to the installer packages, so now, daily builds of debian-installer do indeed detect the Joggler's internal storage. However, it doesn't quite work right. (shock!)

If you boot up d-i and install to the internal storage, everything will appear to be working fine, and it will install... but if you switch to tty4 where syslog is being displayed, you'll see about a million copies of this message:
mmc2: Too large timeout requested for CMD25!
Every single write to the internal storage causes this message to be printed (CMD25 is the write command). It doesn't prevent the install from working, but when you boot the actual system the console is just filled with this message and you can't see what you're doing. You can change the kernel loglevel so it doesn't spam those messages to the console (adding loglevel=4 to the kernel command line is sufficient here, to disable messages of level warning and below), but they are still getting spewed into klogd, which will, via syslogd... write them to your logfiles.

Yes, you guessed it: this causes more MMC writes, which causes more log spam, which causes more MMC writes... so the system starts spending an unfortunate amount of its resources spiralling its logs out of control. Oh dear. It's not as bad as it could be because the log is buffered, but it's still rather unfortunate. It's possible to hack the logging configuration to "solve" this in various ways, but rather than mess about with that, I wanted to look into why the kernel was printing this message in the first place.
Various other people report getting the same message on their Joggler when they try and run a system from internal storage, but they seem to mostly decide that since it doesn't break anything (directly) they can ignore it; they don't appear to have noticed that the logging is cyclic. I couldn't find anyone who knew what was actually going on, so I compiled my own kernel with a bunch of extra kprintf's in the MMC timeout code, to work it out for myself.

The Joggler's internal storage is an eMMC device - this is basically the innards of an MMC card, soldered onto the motherboard instead of being inside a plastic shell to make it a removable card. It's hardwired into one of the MMC/SD controllers of the Joggler's Poulsbo chipset. eMMC has a few minor differences from regular MMC in how you talk to it, but it's largely the same.

The way timeouts work in MMC is a bit fiddly. The card has a register called CSD, Card Specific Data, which contains a bunch of parameters for how this card is to be accessed. Timeout calculation starts with one of the CSD fields, TAAC. TAAC represents the time it takes to set up a single read access, and is stored in a value/exponent form that lets it represent times from 1 nanosecond up to 80 milliseconds. There's also a second part to the read time, NSAC, which is the number of card clock cycles required for the read access, and you add the two together after dividing NSAC by the clock speed, but in this case NSAC is very small and so I'm just going to ignore it (you will see why later). The Joggler's eMMC device specifies TAAC as 20 milliseconds, which is nearly the maximum value.

Now, we don't care about the read time because we're not getting errors on read, so how do you find the write time? Well, rather than have a similar value for writes, the CSD instead contains a multiplier, R2W_FACTOR - you multiply the read time by it, and that's how long a write takes. R2W_FACTOR is stored as a power of two between 1 and 128. The Joggler's R2W_FACTOR is 128, the maximum value. Can you see where this is going yet? :)

This makes the Joggler's MMC write time 2560 milliseconds: two and a half seconds! This is pretty implausible, but it looks like the manufacturers of the eMMC part just specified very large values to err on the side of caution. But wait! This is just how long a write is "expected" to take: the MMC spec says that to calculate a timeout, you have to multiply that by ten! So, the card's desired write timeout is twenty-five and a half seconds. You're starting to see where this kernel message comes from, now.. but this isn't even the whole story.

The first problem with this, other than its sheer ridiculousness, is that the kernel is calculating this in nanoseconds, since the smallest possible value for TAAC is one nanosecond. It's using an unsigned int, which means it's 32-bit on a 32-bit system like the Joggler, and the largest unsigned 32-bit value interpreted as nanoseconds is only 4.29 seconds. So, the Joggler's card is actually hitting an integer overflow. This is technically a kernel bug, since the spec allows for values this large and thus the kernel should be taking care to not let this overflow, but presumably values this big are rare in reality. So, because of the overflow, the kernel is actually using a timeout of 4.125 seconds.

The second problem with this is that even if you use the smaller value of 4.125 seconds, the way the timeout works doesn't allow for such a large value. The timeout is not being handled in software, the MMC controller does it in hardware. The Joggler uses an SDHCI controller, which is the standard for MMC/SD controller interfaces. SDHCI timeouts are based on a particular clock in the controller, the speed of which is set by the designer. The OS only has to give the controller a value specifying how many clock cycles it should wait for, as a power of two between 2^13 and 2^27. The Poulsbo SDHCI controller has a timeout clock of 48MHz, so the largest timeout it can handle is 2.8 seconds. So, even the overflowed value of 4.125 seconds is too large.

The controller driver detects this, caps it at 2.8 seconds and continues, and then the writes do work, because in reality the card doesn't require such a large timeout; but because this calculation is actually done for every single MMC command, not just once, it spams the kernel log with a message about this every time. I think this is unnecessary and the kernel should probably at most print this once, when the card is inserted and the CSD read, or just not print it at all, since there's nothing you can do about it anyway; the card's data is just crazy and the only "fix" for this warning is to throw the card out and get another one, which is tricky if it's a soldered eMMC part. I'll send a message to the Linux MMC folks and see what they think...

Stay tuned for another part soon, where I'll talk about the several other irritating issues with running Debian, even when installed to a USB device! :)