Here’s the link to Part 1.

A very significant limitation of the STM32F4xx family (STM32F405 / 407 / 415 / 417) is that fully a third of its internal RAM is inaccessible to the DMA controller. Of the 192 kB of available RAM, only 128 kB can be accessed by the DMA. The other 64 kB, known as the CCM (core-coupled memory), cannot be read or written by DMA.

For a Cortex-M4 processor that is promoted using DSP-type benchmarks (filters, FFTs, etc.), this is a glaring oversight. DSP-type operations are all about reading data in, processing it, and writing the results out. Two of those three tasks require DMA if they’re to be performed efficiently, and on the STM32F4xx family the DMA is unusable for a third of the RAM. For me personally, coming from a long DSP background, this stilted memory architecture is crazy beyond words.

Still, it’s not the first time the hardware designers have made life tough for the software folks, and it won’t be the last. We just have to deal with it as best we can. I’ve been attempting to get the SDIO SD card interface working under interrupt rather than DMA, so that data held in the additional 64 kB of RAM we’re paying for can be transferred through the SDIO. This post shares a few things I’ve learned.

ST SD Card Interrupt Examples

As far as I can find, there aren’t any. I’ve looked through both the STM32F2xx and STM32F4xx software examples, and they all use DMA exclusively for the data handling. If you come across any ST example code that handles SD card data via interrupt, please let me know.

Double-Handling the Data

This is an option, and I have considered it. The idea would be (as an example):

  • DMA data from SD card into the 128 kB of RAM
  • Software copy the data into the 64 kB of RAM
  • process the data
  • Software copy the results from the 64 kB of RAM into the 128 kB of RAM
  • DMA the results from the 128 kB of RAM to the SD card

Obviously what I’ve listed is the worst case, and it’s ugly as sin. You really wouldn’t want to do it. Still, if you did, an efficient software copy routine would be essential. This Stellaris forum posting contains details of a fast assembly-language memory copy routine for the Cortex-M3/M4. I’ve played with it and it works well.
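For the two software-copy steps in the list, here is a minimal sketch of a portable C word copy. It is not the assembler routine from the forum post; `copy_words` is a hypothetical name, and it assumes both buffers are 32-bit aligned, do not overlap, and are sized in whole words (a 512-byte sector is 128 words).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit words from src to dst.
   Assumes word-aligned, non-overlapping buffers. */
void copy_words(uint32_t *restrict dst, const uint32_t *restrict src, size_t nwords)
{
    /* Main loop: load four words into locals, then store all four. */
    while (nwords >= 4) {
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;
        src += 4;
        dst += 4;
        nwords -= 4;
    }
    /* Tail: copy the remaining 0 to 3 words one at a time. */
    while (nwords--) {
        *dst++ = *src++;
    }
}
```

Grouping four loads before four stores gives the compiler a chance to emit LDM/STM pairs on the Cortex-M4, but whether it does depends on the compiler and optimisation level, so check the disassembly before relying on it.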

SDIO Requests More Data Than It Needs

If you’re using the STM32F2xx / STM32F4xx SDIO under interrupt to transmit data to an SD card, you’ll probably be using the “transmit FIFO half empty” (TXFIFOHE) interrupt flag. When it triggers, your interrupt handler needs to write 8 words (32 bytes) to the SDIO FIFO.

The problem is that the SDIO will request more data than it actually requires. If you’re not careful, this can result in reading past the end of your data buffer, possibly generating a bus fault or hard fault. To explain, take a look at this example code snippet from within an SDIO interrupt handler:

if (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
  ptr = source_addr;    // address of source data to Tx to card
  while (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
    BUTTON_OUT_HIGH
    SDIO->FIFO = *ptr++;     // write first word (32 bits = 4 bytes) to the FIFO 
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;      // 8th word of data written to the SDIO FIFO
    BUTTON_OUT_LOW
  }
  source_addr = ptr;   // remember data position for next time
}

You can see it checks whether the Tx FIFO Half-Empty flag is set and, if so, writes 8 words (32 bytes) of data to the FIFO, updates its data pointer, and that’s it. We’ve made it slightly more efficient by wrapping the writes in a while() loop, so it repeats until the Tx FIFO no longer needs more data. This lets it fill the FIFO more quickly at startup, when the FIFO is empty.

The BUTTON_OUT_xxxx macros drive a GPIO pin high and low so we can see on the oscilloscope what’s happening.

When writing a single 512-byte block / sector to the SD card, we would expect to see 512 / 32 = 16 writes of 32 bytes each to the FIFO. Let’s look at the scope:

STM32F407 SD Card Interface Write

There are a few things of great interest to be seen here.

At the start of the scope plot, on the left, we can see 4 writes in very quick succession. This is thanks to the while() loop in the code. The SDIO Tx FIFO is 32 words deep, and the TXFIFOHE flag stays set as long as there is room for at least another 8 words, so filling the empty FIFO takes 4 sets of 8 words. This is good: we’re getting the Tx FIFO filled very quickly.

If we count the total number of writes on the scope plot, we see 19. Huh? We expected 16; what gives? 19 writes means we’ve read 608 bytes from our data buffer (that is, 96 bytes past its end) and given them to the SDIO; that’s too much for a 512-byte write. The reason is the title of this section: the SDIO requests more data than it needs. It appears the designers of the SDIO block did not give it the intelligence to compare its FIFO level with its DCOUNT register. If the FIFO has enough empty space to accept another 8 words, it sets its TXFIFOHE flag to request more data, EVEN THOUGH IT DOES NOT NEED IT TO COMPLETE THE CURRENT TRANSFER. Be aware of this.

Changing our SDIO IRQ handler slightly to consider the DCOUNT register, for example like this:

if ((SDIO->STA & SDIO_FLAG_TXFIFOHE) && (SDIO->DCOUNT >= 32)) {

does not help. DCOUNT counts the bytes still to be sent to the card, but from DCOUNT alone we cannot know how many of those bytes are already sitting in the FIFO.

To deal with this, you need to keep your own “data remaining count” variable, which you can count down as you give data to the SDIO FIFO. Then when your count variable reaches zero, you should turn off the TXFIFOHE interrupt (by clearing its bit in the SDIO->MASK register).
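A minimal sketch of that “data remaining” approach follows. So that it can be built and exercised off-target, it uses a hypothetical `sdio_regs_t` stand-in for the three SDIO registers it touches, and hypothetical function names `sdio_tx_prepare` / `sdio_tx_service`; on the real part you would use the device header’s `SDIO` pointer instead. The flag macros reuse the Standard Peripheral Library names from the snippet above, with TXFIFOHE at bit 14 of both STA and MASK. Error, DATAEND and flag-clearing handling are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SDIO registers this sketch uses. */
typedef struct {
    volatile uint32_t STA;
    volatile uint32_t MASK;
    volatile uint32_t FIFO;
} sdio_regs_t;

#define SDIO_FLAG_TXFIFOHE (1u << 14)  /* STA: room for at least 8 words */
#define SDIO_IT_TXFIFOHE   (1u << 14)  /* MASK: TXFIFOHE interrupt enable */

static const uint32_t *tx_ptr;   /* next source word to hand to the FIFO */
static uint32_t tx_words_left;   /* our own "data remaining" count, in words */

/* Load the count and unmask TXFIFOHE before the transfer is started.
   Assumes nbytes is a multiple of 4 (a 512-byte sector is). */
void sdio_tx_prepare(sdio_regs_t *sdio, const uint32_t *buf, uint32_t nbytes)
{
    tx_ptr = buf;
    tx_words_left = nbytes / 4u;
    sdio->MASK |= SDIO_IT_TXFIFOHE;
}

/* Transmit part of the SDIO interrupt handler. */
void sdio_tx_service(sdio_regs_t *sdio)
{
    /* Fill only while the FIFO has room AND we still owe data: the second
       test is what stops us reading past the end of the buffer. */
    while ((sdio->STA & SDIO_FLAG_TXFIFOHE) && tx_words_left > 0u) {
        uint32_t n = (tx_words_left < 8u) ? tx_words_left : 8u;
        tx_words_left -= n;
        while (n--) {
            sdio->FIFO = *tx_ptr++;
        }
    }
    /* Everything handed over: stop the SDIO asking for more. */
    if (tx_words_left == 0u) {
        sdio->MASK &= ~SDIO_IT_TXFIFOHE;
    }
}
```

Even if TXFIFOHE keeps asserting after the last word, the count guard means the handler never touches memory beyond the 512-byte block, and the final mask write stops further TXFIFOHE interrupts.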

Something else to note from this scope capture is the interrupt rate and CPU utilisation. In this example the SDIO clock is 20 MHz, which with a 4-bit data bus means we can write data to the card at 10 MB/s. Given that we’re writing 32 bytes at a time (except at the very beginning, where we write four times that), we’re writing data every 3.2 microseconds, and the scope shot bears this out. That corresponds to an interrupt rate of 312.5 kHz! This is a very high rate for a small processor, so CPU utilisation should be expected to be high. From the scope shot we can estimate we’re spending about 12% to 15% of our 120 MHz processor doing nothing except servicing these SDIO interrupts. It’s a steep price to pay for making so much RAM inaccessible to the DMA.
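The arithmetic behind those figures, worked through as a sketch. It assumes a 4-bit bus (implied by 10 MB/s at 20 MHz), ignores per-block CRC and card-busy time, and turns the 12% to 15% scope estimate into a rough cycle budget per interrupt:

```latex
\begin{aligned}
R &= 20\ \text{MHz} \times 4\ \text{bits} = 80\ \text{Mbit/s} = 10\ \text{MB/s} \\
T &= \frac{32\ \text{B}}{10\ \text{MB/s}} = 3.2\ \mu\text{s},
  \qquad f_{\text{IRQ}} = \frac{1}{T} = 312.5\ \text{kHz} \\
N &= \frac{120\ \text{MHz}}{312.5\ \text{kHz}} = 384\ \text{CPU cycles between interrupts} \\
0.12\,N &\approx 46, \qquad 0.15\,N \approx 58\ \text{cycles spent in the handler}
\end{aligned}
```

In other words, each 8-word interrupt costs roughly 46 to 58 cycles including entry and exit, out of a 384-cycle window.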

Tx FIFO Underrun

Getting a data transmit (sending data to the card) to start up properly on the STM32F4xx / F2xx can be very tricky. Here’s my understanding.

When you enable the SDIO data transfer (via the DTEN bit in the SDIO_DCTRL register), the FIFO is empty, so the TXFIFOHE interrupt triggers immediately. At the same time, the SDIO peripheral starts attempting to write data to the SD card. Data must therefore appear in the Tx FIFO extremely quickly; otherwise a Tx FIFO underrun will occur and the SDIO peripheral will shut down.

It is not possible to pre-load the FIFO before enabling the SDIO. I’ve tried and it doesn’t work. I believe the FIFO is hardware-cleared until the SDIO is enabled, or something similar to that.

What this means is that at the moment of SDIO turn-on (when the DTEN bit is set), that TXFIFOHE interrupt must be serviced straight away. At that point in time it must be the highest-priority interrupt in the system, or the only one enabled. If it’s delayed for any reason, for example because another interrupt occurs at that time, a Tx FIFO underrun will very quickly follow. Think very carefully about your enabled interrupts at that critical SDIO transmit start-up point. You may want to consider using the NVIC to give the SDIO the highest priority, permitted to preempt all other interrupts, or come up with some other scheme to ensure that first TXFIFOHE interrupt can execute immediately.
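A target-only sketch of that NVIC approach, assuming ST’s CMSIS device header (`stm32f4xx.h`) and the standard CMSIS-Core calls `NVIC_SetPriority` / `NVIC_EnableIRQ`. The function names, the timeout value and the single-block setup are hypothetical placeholders, and the SD write command itself (CMD24) is not shown. The idea is simply: SDIO at priority 0, every other interrupt numerically higher, TXFIFOHE unmasked before DTEN, and DTEN written last.

```c
#include "stm32f4xx.h"   /* ST CMSIS device header: SDIO, SDIO_IRQn, NVIC_* */

/* Call once at init. On Cortex-M a lower number is more urgent, so every
   other enabled interrupt must be given a numerically larger priority. */
static void sdio_nvic_setup(void)
{
    NVIC_SetPriority(SDIO_IRQn, 0);
    NVIC_EnableIRQ(SDIO_IRQn);
}

/* Arm a single 512-byte block write. */
static void sdio_tx_start_block(void)
{
    SDIO->DTIMER = 0xFFFFFFFFu;   /* placeholder data timeout */
    SDIO->DLEN   = 512u;
    /* Unmask before enabling, so the first TXFIFOHE is taken at once. */
    SDIO->MASK  |= SDIO_MASK_TXFIFOHEIE | SDIO_MASK_TXUNDERRIE;
    /* DTDIR = 0 (controller to card), DTMODE = 0 (block),
       DBLOCKSIZE = 9 (2^9 = 512 bytes). DTEN goes in last. */
    SDIO->DCTRL = (9u << 4) | SDIO_DCTRL_DTEN;
}
```

Raising the SDIO priority only helps if nothing else is also at priority 0 and no code holds interrupts globally disabled (for example with `__disable_irq()`) across the moment DTEN is set.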

SDIOIT Status Bit

The SDIO_STA status register contains the SDIOIT bit, which the reference manual describes only as “SDIO interrupt received”. I’ve seen this bit set from time to time, but I’ve never worked out what it means. If you understand what it actually represents, please let me know.