13 Jan
Posted by: frank in: Electronics, Programming, STM32
It’s funny how a topic as apparently mundane as the DMA controllers on the STM32F2xx and STM32F4xx processors can be such a can of worms. I’ve already provided 2 postings on the subject, here and here, and now we have a third.
This one is a biggie. From what we’re seeing, it appears you can only have 2 DMA transfers taking place at any one time: having 3 simultaneous DMA transfers causes intermittent failures.
To explain….
We’ve been doing some work with the STM32F207 and STM32F407 processors. We had been performing 3 DMAs simultaneously. Specifically: receiving data from the DCMI port, receiving data from the ADC, and either receiving or sending data to the SDIO port (SD card). What we saw happening was:
We initially put these behaviours down to buggy peripherals, but one day we noticed the problems stopped when we turned off the DCMI port. Hmm, how puzzling. Further investigation showed that we had no problems if any 2 of these 3 peripherals were running, which led to the question: “what’s common between these 3 peripherals?” The obvious answer: the DMA, specifically DMA2 in this case.
It fits the symptoms very nicely. What we theorise is that when a peripheral needs a DMA service, it raises a flag which results in the DMA servicing that peripheral, either by writing data to it or reading data from it. If 3 peripherals raise their flags simultaneously, one of them can be “permanently forgotten”: it doesn’t get serviced, then or later. In the case of the ADC this results in an overrun. In the case of the SDIO it eventually results in the FIFO running empty (when writing to an SD card). In the case of the DCMI we saw some apparent failures, but they proved hard to nail down, so I’ll reserve judgement as to exactly what happens there.
Based on this theoretical understanding, we put together a test case demonstrating the problem (on both the F2x and F4x) and sent it to ST at the beginning of Nov 2011. We heard back at the beginning of Dec that they’d replicated the problem, but we haven’t heard anything since.
Then towards the end of December I was speaking with another company using the STM32F2xx, and they just happened to mention a similar problem. They also had 3 simultaneous DMAs active; however, theirs were SPI, USART and SDIO. A different set of 3 from ours, and theirs were spread across both DMA1 and DMA2. Yet their basic symptom of unexplained data stoppages was the same.
Based on all this, I can only assume that any 3 simultaneous DMAs, from any one DMA controller or a mix of both DMA controllers, can produce this problem.
In our case, we took the lowest-bandwidth peripheral, the ADC, and switched it to interrupt-driven operation. Because the STM32F407 ADC has no internal FIFO it can overrun very easily, so its interrupt must have the highest priority and be allowed to preempt other interrupts. Since we made this change, all 3 peripherals have behaved normally.
I need to re-emphasise that what we believe to be happening is no more than a theory based on the symptoms we’ve been seeing. Although ST has confirmed seeing these symptoms, they haven’t provided any information about what’s actually going on, so we’re flying in the dark to a certain degree. What I can say with some confidence is that you’d be very wise to limit the number of simultaneous DMAs to 2. At least until ST provides more information about what’s actually causing this.
Update June 2012:
In January 2012 ST provided a response in their user forums here:
It does not answer all the questions, but it seems to indicate the problem lies not with the number of DMAs, but instead might lie specifically with DMA2. They stated:
We confirm your findings and it is a limitation that concerns only our DMA2, and here is the detailed description :
DMA2 controller could corrupt data when managing AHB and APB2 peripherals in a concurrent way.
Description :
This case is somehow critical for peripherals embedding FIFO and generates data corruption. For memories, the impact is a multiple access but the data is not corrupted. AHB Peripherals embedding FIFO are DCMI, CRYPT, HASH. on STM32F2/40xx without CRYPTO only the DCMI is impacted.
Implications:
The data transferred by the DMA to the AHB peripherals could be corrupted in case of a FIFO target or multiply accesses in case of memories access.
Workarounds :
Avoid concurrent AHB and APB2 transfer using DMA2. One of the following approach could be used to solve the issue:
* If DMA2 is used to manage AHB peripheral (DCMI, CRYPT, HASH), we can use the Cortex-M CPU to manage APB2 peripherals.
* If DMA2 is used to manage APB2 peripheral, we can use the CPU to manage AHB peripheral (DCMI, CRYPT, HASH).
Obviously, we will update our errata on web soon.
I don’t know how ST measures time – they say their errata will be updated “soon”, but as I write this it’s 5 months since they stated that, and their STM32 errata, at revision 2.0, does not contain this information, let alone any more detail about it. If anyone knows anything more, please post a comment.
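ST’s workaround boils down to one rule: never let DMA2 serve an AHB peripheral and an APB2 peripheral at the same time. The sketch below is a start-up sanity check for that rule, assuming you describe each concurrently active DMA2 stream in a small table. The `dma2_use_t` type and `dma2_plan_is_safe()` are hypothetical (nothing like this exists in ST’s library); the bus assignments in the comments (DCMI, CRYP and HASH on AHB2; ADC, SDIO, SPI1 and USART1/6 on APB2) follow the STM32F2/F4 reference manuals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bus that the peripheral served by a DMA2 stream sits on. */
typedef enum { BUS_NONE, BUS_AHB, BUS_APB2 } dma2_bus_t;

/* One entry per DMA2 stream that may be active concurrently (hypothetical). */
typedef struct {
    const char *peripheral;  /* e.g. "DCMI", "SDIO", "ADC1" */
    dma2_bus_t  bus;         /* DCMI/CRYP/HASH: BUS_AHB; ADC/SDIO/SPI1/USART1: BUS_APB2 */
} dma2_use_t;

/* Returns true if the plan follows ST's workaround, i.e. DMA2 never
 * serves AHB and APB2 peripherals at the same time. */
bool dma2_plan_is_safe(const dma2_use_t *uses, size_t n)
{
    bool has_ahb = false;
    bool has_apb2 = false;
    for (size_t i = 0; i < n; i++) {
        if (uses[i].bus == BUS_AHB)
            has_ahb = true;
        else if (uses[i].bus == BUS_APB2)
            has_apb2 = true;
    }
    return !(has_ahb && has_apb2);
}
```

Our original DCMI + SDIO + ADC arrangement fails this check; if it does, ST’s advice is to move one side (the AHB peripheral or the APB2 peripherals) to CPU handling.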
30 Dec
Posted by: frank in: Electronics, Programming, STM32
A very significant limitation with the STM32F4xx family (STM32F405 / 407 / 415 / 417) is that fully a third of its internal RAM is inaccessible to the DMA controller. Of the 192 kB of available RAM, only 128 kB can be accessed by the DMA. The other 64 kB, known as the CCM, cannot be read or written by DMA.
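Because a buffer in CCM handed to the DMA simply won’t work, it can be worth guarding DMA set-up code with an address check. A minimal sketch, assuming the CCM location given in ST’s STM32F405/407 documentation (64 kB starting at 0x10000000); `dma_can_reach()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* STM32F405/407/415/417: 64 kB core-coupled memory (CCM) data RAM.
 * Only the CPU can reach it; the DMA controllers cannot. */
#define CCM_BASE 0x10000000UL
#define CCM_SIZE 0x00010000UL

/* Returns true if [addr, addr + len) lies entirely outside the CCM.
 * Assumes addr + len does not wrap past 4 GB. */
bool dma_can_reach(uint32_t addr, uint32_t len)
{
    uint32_t end = addr + len;   /* one past the last byte */
    return end <= CCM_BASE || addr >= CCM_BASE + CCM_SIZE;
}
```

On the target, something like `assert(dma_can_reach((uint32_t)buf, sizeof buf))` during development should catch a DMA buffer that the linker has placed in CCM.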
For a Cortex-M4 processor that is promoted using DSP type benchmarks (filters and FFTs etc), this is a glaring oversight. DSP type operations are all about reading data in, processing the data, and writing the resultant data out. Two of those three tasks require the DMA if they’re to be performed efficiently, and on the STM32F4xx family the DMA is unusable for a third of its RAM. For me personally, coming from a long DSP background, this stilted memory architecture is crazy beyond words.
Still, it’s not the first time the hardware designers have made life tough for the software folks, and it won’t be the last. We just have to deal with it as best we can. I’ve been attempting to get the SDIO SD Card interface working under interrupt, so that the additional 64 kB of RAM we’re paying for can be accessed by the SDIO. This post will share a few things I’ve learned.
ST SD Card Interrupt Examples
As far as I can find, there aren’t any. I’ve looked through both the STM32F2xx and STM32F4xx software examples, and they all use DMA exclusively for data handling. If you come across any ST example code doing SD card data handling via interrupt, please let me know.
Double-Handling the Data
This is an option, and I have considered it. The idea would be (as an example):
Obviously what I’ve listed is worst-case and ugly as sin. You really wouldn’t want to do it. Still, if you did, an efficient software-copy routine would be essential. This Stellaris forum posting contains details for a fast assembler Cortex-M3/M4 memory copy routine. I’ve played with it and it works well.
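A minimal sketch of the copy step, assuming a 512-byte SD block as the unit of work: each block of CCM-resident data is copied into a small DMA-reachable bounce buffer, and the bounce buffer is what the SDIO DMA then sends. `bounce_next_block()` and `sd_dma_write_block()` are hypothetical names, and `memcpy()` stands in for the fast assembler copy routine mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SD_BLOCK_BYTES 512u

/* Copy the block starting at `offset` of a CCM-resident source (`len` bytes
 * in total) into `bounce`, which must sit in DMA-reachable SRAM. A final
 * partial block is zero-padded. Returns the number of payload bytes copied,
 * or 0 once `offset` has reached `len`. */
size_t bounce_next_block(uint8_t bounce[SD_BLOCK_BYTES],
                         const uint8_t *src, size_t len, size_t offset)
{
    if (offset >= len)
        return 0;
    size_t chunk = len - offset;
    if (chunk > SD_BLOCK_BYTES)
        chunk = SD_BLOCK_BYTES;
    memcpy(bounce, src + offset, chunk);                /* the "double handling" copy */
    memset(bounce + chunk, 0, SD_BLOCK_BYTES - chunk);  /* pad a short last block */
    return chunk;
}

/* Typical use (sd_dma_write_block() is hypothetical):
 *
 *   static uint8_t bounce[SD_BLOCK_BYTES];   // linked into SRAM, not CCM
 *   for (size_t off = 0, n; (n = bounce_next_block(bounce, ccm_buf, len, off)) != 0; off += n)
 *       sd_dma_write_block(first_block + off / SD_BLOCK_BYTES, bounce);
 */
```

A single bounce buffer serialises the copy and the DMA; two buffers used alternately would let the CPU fill one while the DMA drains the other, at the cost of another 512 bytes of SRAM.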
SDIO Requests More Data Than It Needs
If you’re using the STM32F2xx / STM32F4xx SDIO to transmit data to an SD Card, under interrupt you’ll probably be using the “transmit FIFO half empty” TXFIFOHE interrupt flag. When this triggers, you know your interrupt handler software needs to write 8 words (32 bytes) to the SDIO FIFO.
The problem is that the SDIO will request more data than it actually requires, which, if you’re not careful, can result in you reading past the end of your data buffer, possibly generating a bus fault or hard fault. To explain, take a look at this example code snippet from within an SDIO interrupt handler:

```c
if (SDIO->STA & SDIO_FLAG_TXFIFOHE)
{
    ptr = source_addr;          // address of source data to Tx to card
    while (SDIO->STA & SDIO_FLAG_TXFIFOHE)
    {
        BUTTON_OUT_HIGH
        SDIO->FIFO = *ptr++;    // write first word (32 bits = 4 bytes) to the FIFO
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;    // 8th word of data written to the SDIO FIFO
        BUTTON_OUT_LOW
    }
    source_addr = ptr;          // remember data position for next time
}
```
You can see it checks whether the Tx FIFO Half-Empty flag is set and, if so, writes 8 words (32 bytes) of data to the FIFO, updates its data pointer, and that’s it. We’ve made it slightly more efficient by wrapping it in the while() loop, so it repeats until the Tx FIFO no longer requests more data; this lets it fill the FIFO more quickly at startup, when the FIFO is empty.
The BUTTON_OUT_xxxx macros set and clear a GPIO pin so we can see on the oscilloscope what’s happening.
When writing a single block / sector to the SD card, which is 512 bytes, we would expect to see 512 / 32 = 16 writes (of 32 bytes) to the FIFO. Let’s look at the scope:
There are a few things of great interest to be seen here.
At the start of the scope plot, on the left, we can see 4 writes in very quick succession. This is thanks to the while() loop in the code. The SDIO Tx FIFO is 32 words deep, so the TXFIFOHE remains set until the FIFO is full, which requires 4 sets of 8 words to be written. This is good – we’re getting the Tx FIFO filled very quickly.
If we count the total number of writes on the scope plot, we see 19. Huh? We expected to see 16; what gives? 19 means we’ve read 608 bytes from our data buffer (actually: right past the end of our data buffer) and given it to the SDIO; that’s too much for a 512 byte write. The reason is the title of this section: the SDIO requests more data than it needs. It appears the designers of the SDIO block did not give it the intelligence to compare its FIFO level with its DCOUNT register. If the FIFO contains sufficient empty space to accept another 8 words, it will set its TXFIFOHE flag to request more data, EVEN THOUGH IT DOES NOT NEED IT TO COMPLETE THE CURRENT TRANSFER. Be aware of this.
Changing our SDIO IRQ handler slightly to consider the DCOUNT register, for example like this:
if ((SDIO->STA & SDIO_FLAG_TXFIFOHE) && (SDIO->DCOUNT >= 32)) {
does not help, because we cannot know the amount of data currently held in the FIFO.
To deal with this, you need to keep your own “data remaining count” variable, which you can count down as you give data to the SDIO FIFO. Then when your count variable reaches zero, you should turn off the TXFIFOHE interrupt (by clearing its bit in the SDIO->MASK register).
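A minimal host-side sketch of the “data remaining count” fix, assuming a 512-byte (128-word) block and modelling TXFIFOHE as “at least 8 free words in the 32-word FIFO”. `sdio_model_t` and all function names are hypothetical; on the real part the inner loop would be the `SDIO->FIFO = *ptr++;` writes from the handler above, and disabling would mean clearing the TXFIFOHE interrupt-enable bit in SDIO->MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for the parts of the SDIO block that matter here.
 * It is a model for experimenting on a PC, not ST's register layout. */
typedef struct {
    unsigned fifo_level;    /* words currently held in the 32-word Tx FIFO */
    bool     txfifohe_ie;   /* stands in for the TXFIFOHE enable bit in SDIO->MASK */
    unsigned words_pushed;  /* total words written to the FIFO */
    uint32_t last_word;     /* most recent word written to the FIFO */
} sdio_model_t;

#define SDIO_FIFO_WORDS 32u
#define SDIO_HALF_WORDS  8u

/* Like the real flag, this only looks at free FIFO space, never at how
 * much of the transfer is still outstanding. */
bool sdio_txfifohe(const sdio_model_t *s)
{
    return SDIO_FIFO_WORDS - s->fifo_level >= SDIO_HALF_WORDS;
}

/* The card clocking words out of the FIFO. */
void sdio_model_drain(sdio_model_t *s, unsigned words)
{
    s->fifo_level = (s->fifo_level > words) ? s->fifo_level - words : 0u;
}

static const uint32_t *tx_ptr;        /* next word to send */
static uint32_t        tx_words_left; /* our own "data remaining" count */

void sdio_tx_start(sdio_model_t *s, const uint32_t *buf, uint32_t words)
{
    tx_ptr = buf;
    tx_words_left = words;
    s->txfifohe_ie = true;            /* enable the TXFIFOHE interrupt */
}

/* Interrupt handler body: never hands over more than tx_words_left words. */
void sdio_tx_isr(sdio_model_t *s)
{
    while (tx_words_left > 0u && sdio_txfifohe(s)) {
        uint32_t n = (tx_words_left < SDIO_HALF_WORDS) ? tx_words_left : SDIO_HALF_WORDS;
        for (uint32_t i = 0; i < n; i++) {   /* real code: SDIO->FIFO = *tx_ptr++; */
            s->last_word = *tx_ptr++;
            s->fifo_level++;
            s->words_pushed++;
        }
        tx_words_left -= n;
    }
    if (tx_words_left == 0u)
        s->txfifohe_ie = false;       /* real code: clear TXFIFOHE's bit in SDIO->MASK */
}
```

Driven against the model, the handler makes 16 batches of 8 words (the first 4 back to back, as on the scope) and then switches its own interrupt off, instead of the 19 batches seen above.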
Something else to note from this scope capture is the interrupt rate and CPU utilisation. In this example the SDIO clock is 20 MHz, meaning we can write data to the card at 10 MB/s. Given that we’re writing 32 bytes at a time (except at the very beginning where we write 4 times that), we calculate we’re writing data every 3.2 microseconds. The scope shot bears this out. This corresponds to an interrupt rate of 312.5 kHz! This is a very high rate for a small processor, and the CPU utilisation should be expected to be high. From the scope shot we can estimate we’re spending about 12% – 15% of our 120 MHz processor doing nothing except servicing these SDIO interrupts. It’s a steep price to pay for making so much RAM inaccessible to the DMA.
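The arithmetic behind those figures fits in a few lines of C. This is a sketch assuming the 4-bit bus moves 4 bits per SDIO clock (so 20 MHz gives 10 MB/s) and ignoring CRC and block overhead; the function names are hypothetical.

```c
#include <stdint.h>

/* 4-bit wide bus: 4 bits per SDIO clock, i.e. half a byte per clock. */
uint32_t sdio_bytes_per_s(uint32_t sdio_clk_hz)
{
    return sdio_clk_hz / 2u;
}

/* TXFIFOHE interrupts per second if each one moves `bytes_per_irq` bytes. */
uint32_t sdio_irq_hz(uint32_t sdio_clk_hz, uint32_t bytes_per_irq)
{
    return sdio_bytes_per_s(sdio_clk_hz) / bytes_per_irq;
}

/* CPU clock cycles available between two consecutive interrupts. */
uint32_t cycles_per_irq(uint32_t cpu_hz, uint32_t irq_hz)
{
    return cpu_hz / irq_hz;
}
```

With the numbers from this post, `sdio_irq_hz(20000000, 32)` is 312500 and `cycles_per_irq(120000000, 312500)` is 384, so a 12% to 15% load corresponds to roughly 46 to 58 CPU cycles spent per interrupt.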
Tx FIFO Underrun
Getting data transmit (sending data to the card) to start up properly on the STM32F4xx / 2xx can be very tricky. Here’s my understanding.
When you enable the SDIO (via the DTEN bit in the SDIO_DCTRL register) the FIFO is empty. So the TXFIFOHE interrupt will trigger immediately, and at the same time the SDIO peripheral will start attempting to write data to the SD card. Hence data must appear in the Tx FIFO extremely quickly, otherwise a Tx FIFO underrun will occur and the SDIO peripheral will shut down.
It is not possible to pre-load the FIFO before enabling the SDIO. I’ve tried and it doesn’t work. I believe the FIFO is hardware-cleared until the SDIO is enabled, or something similar to that.
What this means is that at the moment of SDIO turn-on (when the DTEN bit is set), that TXFIFOHE interrupt must trigger. At that point in time it must be the highest priority interrupt in the system, or be the only interrupt. If it’s delayed for any reason, for example because another interrupt occurs at that time, then a Tx FIFO underrun will very quickly follow. Think very carefully about your enabled interrupts at that critical SDIO transmit start-up point. You may want to consider using the NVIC to make the SDIO be the highest priority interrupt, permitted to preempt all other interrupts. Or, come up with some other scheme to ensure that first TXFIFOHE interrupt can execute immediately.
SDIOIT Status Bit
The SDIO_STA status register contains the SDIOIT bit with a very vague description. I’ve seen this bit being set from time to time but I’ve never worked out what it means. If you understand what it actually represents, please let me know.
I’m not generally one to repost articles and links to articles – you won’t find much of that on my blog. Nor am I particularly pro-this or anti-that when it comes to technologies, brands, companies, etc. I have Apple products and non-Apple products. I’ve used various flavours of Mac OS, of DOS, of Windows, of Unix, of Linux, and much more. It’s all good to me.
However, I’m not agnostic when it comes to the ongoing development of technology. I’ve been in this business long enough to realise that every new development builds on the developments that came before it. Like kicking steps up a snowbank, making each new step upwards requires you to stand in the step you’ve got. That’s the way innovation and technology develop, and it’s critical for everyone that this is able to continue.
Gizmodo recently posted this article:
If Apple Wins We All Lose
Yesterday’s news that courts had ruled against HTC in favor of Apple was a tidy little victory for Apple. But HTC is just an initial skirmish in a much larger fight. The real war is against Android, and if Apple wins that, we’ll all lose…
It’s well worth the read. Many patent systems and legal systems around the world are increasingly allowing existing “technology holders” to suppress and quash the development of new evolutionary technologies. The USA is the most obvious example of this, but it’s certainly not the only one. And the trends indicate this problem is getting worse, not better. Truly, when it comes to anti-competitive behaviour and legal policies, this is an area where, indeed, “we all lose”.
If you’re seeing a Hard Fault exception on your Cortex-M3 or Cortex-M4 processor, this handler and information may help. I can’t take credit for it – this code was provided by Joseph Yiu on a few different forums, as well as in his book (The Definitive Guide to the ARM Cortex-M3). I’m simply providing some assistance on how to install and use it.
Hard Fault Handler Installation
These instructions work for an STM32F2xx or STM32F4xx processor using a GNU-based toolchain (eg Yagarto or Sourcery G++). They should work with other processors and toolchains but may require a small tweak – no doubt your compiler will be pleased to tell you if it’s not happy! As always with programming, the following is not the only way to do it – it’s simply the way I did it. If you want to rearrange things or do things a bit differently then feel free.
Joseph’s hard fault handler is in two pieces – a small piece of assembly, and a small piece of C. You need the processor’s hardfault exception vector to jump to the assembly, and then the assembly code will itself call the C code.
Here’s the assembly code. It extracts the location of the stack frame, then passes it as a pointer to the C code, which is named hard_fault_handler_c.
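```asm
    .syntax unified
    .cpu cortex-m3
    .thumb
    .text

    .global HardFault_Handler
    .extern hard_fault_handler_c
    .thumb_func                       @ mark as Thumb so the vector table entry gets bit 0 set
    .type HardFault_Handler, %function

HardFault_Handler:
    TST   LR, #4                      @ EXC_RETURN bit 2: 0 = MSP was in use, 1 = PSP
    ITE   EQ
    MRSEQ R0, MSP                     @ R0 = address of the stacked exception frame
    MRSNE R0, PSP
    B     hard_fault_handler_c        @ R0 is the C handler's first argument
```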
This assembly needs to be immediately called when the hard fault exception occurs. For the STM32F processors, their vector table is found in the ST-supplied file startup_stm32f2xx.s (or similar). If you look at the vectors list, you’ll see something like this:
```asm
g_pfnVectors:
    .word _estack
    .word Reset_Handler
    .word NMI_Handler
    .word HardFault_Handler
    .word MemManage_Handler
    .word BusFault_Handler
    .word UsageFault_Handler
    @ ... and so on
```
Given that the name in the vector table is HardFault_Handler, we give the assembler code that name (and declare the name as a global so the linker can find it). If your vector table contains a different name, then change the name of the assembler code to suit.
You need to include this bit of assembler in your build. Just save the assembler code in its own .s file (eg: hardfault.s) and then include it in your build, the same way as your other .s files (like that startup file) are.
Now we need to add the C code. Here it is:
```c
// From Joseph Yiu, minor edits by FVH
// hard fault handler in C,
// with stack frame location as input parameter
// called from HardFault_Handler in file xxx.s
#include <stdio.h>
#include "stm32f2xx.h"   // for SCB; stm32f2xx_it.c already pulls this in

void hard_fault_handler_c (unsigned int * hardfault_args)
{
  // Registers pushed automatically by the core on exception entry
  unsigned long stacked_r0  = (unsigned long) hardfault_args[0];
  unsigned long stacked_r1  = (unsigned long) hardfault_args[1];
  unsigned long stacked_r2  = (unsigned long) hardfault_args[2];
  unsigned long stacked_r3  = (unsigned long) hardfault_args[3];
  unsigned long stacked_r12 = (unsigned long) hardfault_args[4];
  unsigned long stacked_lr  = (unsigned long) hardfault_args[5];
  unsigned long stacked_pc  = (unsigned long) hardfault_args[6];
  unsigned long stacked_psr = (unsigned long) hardfault_args[7];

  printf ("\n\n[Hard fault handler - all numbers in hex]\n");
  printf ("R0 = %lx\n", stacked_r0);
  printf ("R1 = %lx\n", stacked_r1);
  printf ("R2 = %lx\n", stacked_r2);
  printf ("R3 = %lx\n", stacked_r3);
  printf ("R12 = %lx\n", stacked_r12);
  printf ("LR [R14] = %lx  subroutine call return address\n", stacked_lr);
  printf ("PC [R15] = %lx  program counter\n", stacked_pc);
  printf ("PSR = %lx\n", stacked_psr);
  printf ("BFAR = %lx\n", (*((volatile unsigned long *)(0xE000ED38))));
  printf ("CFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED28))));
  printf ("HFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED2C))));
  printf ("DFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED30))));
  printf ("AFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED3C))));
  printf ("SCB_SHCSR = %lx\n", (unsigned long) SCB->SHCSR);

  while (1);
}
```
This code goes wherever the existing (previous) Hard Fault Handler went. In our example, the vector table pointed to a function called HardFault_Handler. We are replacing that function with the assembler code, so the original HardFault_Handler function needs to be commented out (otherwise we’ll have two functions with the same name). For the STM32F2xx all exception handlers are found in the file stm32f2xx_it.c. So comment out the function HardFault_Handler() in that C file, and paste the C code for Joseph’s hard_fault_handler_c() into the same file.
That’s it. In summary, you commented out the old hard fault handler, and you added in some assembly code and some C code instead. Try building your project and see what happens.
Note that this code will only work if the main stack pointer hasn’t been badly corrupted prior to the hard fault occurring – if the stack pointer is off in never-never land then the C handler may not work. In my experience this has never been a problem.
Hard Fault Handler Usage
The big thing the above handler gives you is the program counter, which is the address the processor was executing when the hard fault occurred. You can then look at your listing file (or map file) to see which function and instruction that was. Also useful is LR, the link register, which contains the return address of the last function call; it can show you where you came from to get to this point.
A few tips.
The typical reason for a hardfault is actually a bus error (which was promoted to a hard fault), because software tried to access an invalid region of memory. There are a couple of common ways of doing this.
One is a bad pointer. It might be uninitialised, or not properly bounded (you ran off the end of an array for example).
Another, more subtle, way of getting a bad pointer is being sloppy with the scope of variables. This is quite common with main() but can happen anywhere. For example, you define a buffer or a structure as a local variable within main(), but then access that buffer or structure from some other function (by passing a pointer to it). This is dangerous: you need to declare the buffer or structure as “static” if you want to do that. Otherwise the buffer or structure might be optimised away, or its stack space reused, and not actually exist when that other function tries to access it. If that structure contained a pointer, for example, that pointer would become a bad pointer.
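A small sketch of the pattern, using an init function rather than main() to keep it simple. The “driver” that keeps a pointer for later use (for example from an interrupt handler) is hypothetical; the point is only where the buffer’s storage lives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver that stores the pointer it is given and uses it
 * later, e.g. from an interrupt handler. */
static uint8_t *drv_buf;
static size_t   drv_len;

void driver_attach_buffer(uint8_t *buf, size_t len) { drv_buf = buf; drv_len = len; }
uint8_t *driver_buffer(void) { return drv_buf; }

/* What the interrupt handler would do with the stored pointer. */
void driver_isr_store(uint8_t byte, size_t idx)
{
    if (drv_buf != NULL && idx < drv_len)
        drv_buf[idx] = byte;
}

void app_init(void)
{
    /* BAD: a plain local `uint8_t rx_buf[64];` stops existing when
     * app_init() returns, leaving the driver with a dangling pointer. */
    static uint8_t rx_buf[64];   /* GOOD: static storage lasts for the whole program */
    driver_attach_buffer(rx_buf, sizeof rx_buf);
}
```

Making the buffer a file-scope (global or file-static) variable works just as well; what matters is that its lifetime is the whole program, not one function call.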
Imprecise Bus Fault
For the Cortex M3, an imprecise bus fault (as indicated by bit 10 in the CFSR register) means that a write to an invalid address was attempted. If you look at the program counter, the naughty naughty write is usually present in the 3 or so instructions leading up to the program counter address. Because of the Cortex M3 write buffer system, the program counter might have advanced slightly before the actual bus write took place, hence you need to look back slightly to find the erroneous write.
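Decoding the CFSR value printed by the handler by hand gets tedious. Below is a sketch of a small decoder, assuming the bit assignments in the ARMv7-M Architecture Reference Manual (CFSR at 0xE000ED28, with bit 10 being IMPRECISERR as described above). `cfsr_decode()` and `hfsr_forced()` are hypothetical helpers, and only a selection of bits is listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t mask; const char *name; } fault_bit_t;

/* Selected Configurable Fault Status Register bits (ARMv7-M):
 * bits 7:0 MMFSR, bits 15:8 BFSR, bits 31:16 UFSR. */
static const fault_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL: MPU instruction access violation" },
    { 1u << 1,  "DACCVIOL: MPU data access violation" },
    { 1u << 7,  "MMARVALID: MMFAR holds the faulting address" },
    { 1u << 8,  "IBUSERR: bus fault on instruction fetch" },
    { 1u << 9,  "PRECISERR: precise data bus error (see BFAR)" },
    { 1u << 10, "IMPRECISERR: imprecise data bus error" },
    { 1u << 11, "UNSTKERR: bus fault while unstacking on exception return" },
    { 1u << 12, "STKERR: bus fault while stacking on exception entry" },
    { 1u << 15, "BFARVALID: BFAR holds the faulting address" },
    { 1u << 16, "UNDEFINSTR: undefined instruction" },
    { 1u << 17, "INVSTATE: invalid EPSR state (e.g. Thumb bit clear)" },
    { 1u << 18, "INVPC: invalid EXC_RETURN on exception return" },
    { 1u << 19, "NOCP: coprocessor access (e.g. FPU not enabled)" },
    { 1u << 24, "UNALIGNED: unaligned access trap" },
    { 1u << 25, "DIVBYZERO: divide by zero trap" },
};

/* Prints every listed bit that is set in `cfsr` to `out` (if non-NULL)
 * and returns how many were found. */
size_t cfsr_decode(uint32_t cfsr, FILE *out)
{
    size_t hits = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (out != NULL)
                fprintf(out, "  %s\n", cfsr_bits[i].name);
            hits++;
        }
    }
    return hits;
}

/* HFSR (0xE000ED2C) bit 30, FORCED: a configurable fault such as a bus
 * fault was escalated to a hard fault. */
bool hfsr_forced(uint32_t hfsr)
{
    return (hfsr & (1u << 30)) != 0u;
}
```

From hard_fault_handler_c() this could be called as `cfsr_decode(*(volatile uint32_t *)0xE000ED28, stdout);`. An escalated bus error typically shows up as FORCED in HFSR together with one of the BFSR bits in CFSR.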
The discussion around the STM32F2xx and STM32F4xx DMA controllers is becoming a bit too large for a single posting, so here we are with Part 2!
Clearing a DMA Interrupt
In theory this is very simple – in practice not so much. The documentation for the “DMA low interrupt flag clear register” (DMA_LIFCR) and the “DMA high interrupt flag clear register” (DMA_HIFCR) shows bits such as CTCIFx: Stream x clear transfer complete interrupt flag (x = 3..0). The documentation says to write a “1” to the bit to clear the interrupt flag. That is correct.
However, I was finding that in certain circumstances the interrupt kept recurring immediately, even though I was clearing it.
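The 0x00000F40 value written to LIFCR in the handler code in this post is simply stream 1’s five flag bits. A small sketch that computes the clear value for any stream, assuming the flag layout in the STM32F2/F4 reference manuals (per stream: FEIF at +0, DMEIF +2, TEIF +3, HTIF +4, TCIF +5, with group offsets 0, 6, 16 and 22); `dma_clear_mask()` is a hypothetical helper, not part of ST’s library.

```c
#include <stdint.h>

/* Bit offset of a stream's flag group within LISR/LIFCR (streams 0..3)
 * or HISR/HIFCR (streams 4..7). */
static const uint8_t dma_flag_shift[4] = { 0u, 6u, 16u, 22u };

/* FEIF (+0), DMEIF (+2), TEIF (+3), HTIF (+4), TCIF (+5) of one group. */
#define DMA_ALL_FLAGS 0x3Du

/* Value to write to LIFCR (streams 0..3) or HIFCR (streams 4..7) to clear
 * every interrupt flag of `stream`. */
uint32_t dma_clear_mask(unsigned stream)
{
    return (uint32_t)DMA_ALL_FLAGS << dma_flag_shift[stream & 3u];
}

/* Nonzero if the stream's flags live in the high registers (HISR/HIFCR). */
int dma_uses_high_reg(unsigned stream)
{
    return stream >= 4u;
}
```

For example, `dma_clear_mask(1)` gives 0x00000F40 (write it to LIFCR), and `dma_clear_mask(5)` also gives 0x00000F40, but for HIFCR.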
DMA Interrupt Re-entering Reason #1
There is a well-known reason for this. The Cortex-M3 / M4 is a pipelined processor. If you clear the interrupt at the very bottom of your interrupt service routine code, immediately before you exit your interrupt routine, the clearing of the interrupt might not have propagated through the processor before your interrupt routine exited, meaning that the interrupt will be immediately triggered again.
It’s important to ensure the interrupt clear has actually happened before you exit your interrupt code. The easy way to do this is to clear the interrupt at the very top of your interrupt handler, before you do anything else. Then the time taken by executing your interrupt handler code allows lots of time for the interrupt clear to propagate. If this isn’t possible, or if your interrupt handler is extremely short, write to the interrupt clear register, then do a while() loop on the interrupt flag register to wait until the clear happens. Note that if you do this, make sure you put a timeout of some description in your while() loop to prevent a hangup. The interrupt clear should normally take effect very quickly.
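The clear-then-poll approach with a timeout might look like the sketch below. It assumes the flag bits sit in the same positions in the status register (LISR/HISR) as in the clear register (LIFCR/HIFCR), as they do on the STM32F2/F4. The register pointers are parameters so the logic can be tried on a PC; `dma_clear_flags_and_wait()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write the clear bits, then poll the status register until the flags read
 * back as clear or the timeout expires. On the STM32 `ifcr` would be
 * &DMA2->LIFCR and `isr` &DMA2->LISR (or the HIFCR/HISR pair). */
bool dma_clear_flags_and_wait(volatile uint32_t *ifcr,
                              const volatile uint32_t *isr,
                              uint32_t flags, uint32_t max_polls)
{
    *ifcr = flags;                    /* write 1s to clear */
    while ((*isr & flags) != 0u) {
        if (max_polls-- == 0u)
            return false;             /* never cleared: report it instead of hanging */
    }
    return true;
}
```

In an interrupt handler, for example, `dma_clear_flags_and_wait(&DMA2->LIFCR, &DMA2->LISR, 0x00000F40, 100)` would be the last thing done before returning.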
DMA Interrupt Re-entering Reason #2
My interrupt re-entering was not caused by #1 above, and I spent a frustrating day hunting down the reason, finally discovering reason #2.
My interrupt handler code looked generally like this:
```c
void DMA2_Stream1_IRQHandler (void)
{
  DMA2->LIFCR = (uint32_t)0x00000F40;   // clear DMA IRQ flags

  ... do a bunch of stuff ....

  // setup the DMA for a new transfer - DMA currently on so turn it off first
  DMA2->S1CR = (uint32_t)0x02025410;    // need to turn off DMA first to change settings
  while (DMA2->S1CR & DMA_CR_EN);       // wait until DMA is actually off

  DMA2->S1M0AR = ...
  DMA2->S1NDTR = ...
  ...

  DMA2->S1CR = (uint32_t)0x02065510;    // everything setup but not enabled
  DMA2->LIFCR = (uint32_t)0x00000F40;   // clear any pending (old) DMA2 Stream 1 interrupts
  DMA2->S1CR = (uint32_t)0x02065511;    // everything setup and enabled
}
```
You can see two interrupt clears taking place. One at the top of the interrupt handler (as it should be), and a second clear shortly before I re-enable the DMA controller, to ensure nothing is pending when the DMA is started up again.
Remember too, that the STM32F2xx / STM32F4xx DMA won’t even start if it has a pending interrupt (see STM32F2xx DMA Controllers Part 1) so clearing the interrupt flags is not optional – it must be done (even if you’re not using interrupts).
This second clear is not necessarily essential – it’s just for my own peace of mind, to be sure the DMA will start up correctly.
The reason the interrupt immediately re-enters is here:
```c
// setup the DMA for a new transfer - DMA currently on so turn it off first
DMA2->S1CR = (uint32_t)0x02025410;  // need to turn off DMA first to change settings
```
In this example the DMA is currently enabled, due to it being in a double-buffer / circular mode. I turn off the DMA. This results in the DMA TCIF flag being set. Again, so to speak – the TCIF was set earlier which triggered this interrupt, I cleared the flag at the top of the interrupt handler, and now that I’m turning off the DMA the TCIF flag has become set a second time. In the DMA configuration register bit 4, “TCIE: Transfer complete interrupt enable” is set, meaning that when the TCIF flag becomes set, it’s passed on to the NVIC (the Cortex-M3 / M4 Nested Vectored Interrupt Controller).
Hence at this point, even though the STM32F2xx / STM32F4xx DMA has been turned off, the simple act of turning it off has caused a new DMA interrupt to be pending in the NVIC. Because this particular code is in an interrupt service routine, this pending interrupt hasn’t executed yet, but it will the moment we exit our interrupt handler. This is the source of the endlessly recurring DMA interrupt.
There are a few possible solutions to this problem. Here’s one:
```c
void DMA2_Stream1_IRQHandler (void)
{
  DMA2->LIFCR = (uint32_t)0x00000F40;   // clear DMA IRQ flags

  ... do a bunch of stuff ....

  // setup the DMA for a new transfer - DMA currently on so turn it off first
  DMA2->S1CR = (uint32_t)0x02025410;    // need to turn off DMA first to change settings
  while (DMA2->S1CR & DMA_CR_EN);       // wait until DMA is actually off
  NVIC_ClearPendingIRQ (DMA2_Stream1_IRQn);   // clear pending DMA IRQ from the NVIC

  DMA2->S1M0AR = ...
  DMA2->S1NDTR = ...
  ...

  DMA2->S1CR = (uint32_t)0x02065510;    // everything setup but not enabled
  DMA2->LIFCR = (uint32_t)0x00000F40;   // clear any pending (old) DMA2 Stream 1 interrupts
  DMA2->S1CR = (uint32_t)0x02065511;    // everything setup and enabled
}
```
Using the NVIC_ClearPendingIRQ() function works in this example because we’re in an interrupt service routine, so the new interrupt is pending but hasn’t had the opportunity to execute yet. Using NVIC_ClearPendingIRQ() in non-interrupt code would not work, because the interrupt will trigger the moment the DMA is turned off. In non-interrupt code (ie your regular main code) you need to prevent the interrupt from reaching the NVIC to begin with. Which would mean making sure bit 4 (TCIE: Transfer complete interrupt enable) in the DMA configuration register is clear (zero) before turning off the DMA. Or disabling the DMA interrupt in the NVIC before turning off the DMA. For example:
NVIC_DisableIRQ (DMA2_Stream1_IRQn);
(I personally haven’t tried this one though, because NVIC_ClearPendingIRQ() worked in my case.)
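For the non-interrupt case, the other suggestion above (clear TCIE before turning the stream off) might look like this minimal, untested sketch. It assumes the SxCR bit positions from the reference manual (EN = bit 0, TCIE = bit 4), and takes register pointers as parameters so the sequence can be exercised on a PC; `dma_stream_stop_quietly()` and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SXCR_EN   (1u << 0)   /* SxCR bit 0: stream enable */
#define SXCR_TCIE (1u << 4)   /* SxCR bit 4: transfer complete interrupt enable */

/* Stop a DMA stream from non-interrupt code without leaving a TCIF
 * interrupt pending in the NVIC. On the STM32 `cr` would point at the
 * stream's configuration register and `ifcr` at LIFCR or HIFCR. Returns
 * false if EN never reads back as 0 within `max_polls` polls. */
bool dma_stream_stop_quietly(volatile uint32_t *cr, volatile uint32_t *ifcr,
                             uint32_t clear_mask, uint32_t max_polls)
{
    *cr &= ~SXCR_TCIE;        /* 1. TCIF set by the disable can no longer reach the NVIC */
    *cr &= ~SXCR_EN;          /* 2. ask the stream to stop */
    while (*cr & SXCR_EN) {   /* 3. wait, with a timeout, until it has stopped */
        if (max_polls-- == 0u)
            return false;
    }
    *ifcr = clear_mask;       /* 4. discard the TCIF (and friends) left behind */
    return true;              /* 5. caller reprograms, sets TCIE again, sets EN last */
}
```

Step 4 matters: if TCIF were still set when TCIE is turned back on, the interrupt would fire at that point instead.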
STM32F DMA interrupts can be a source of great confusion – hopefully this post has clarified things a little.
Some time ago I built a signal generator, which is a reasonably handy bit of equipment. But I was periodically frustrated with not having a suitable probe to easily inject a signal into a circuit. The standard BNC – BNC cable has its uses, but isn’t too great for that kind of thing. And a dangling bit of wire causes nothing but troubles.
What I really wanted was an oscilloscope probe. Scope probes have a nice sharp tip on them, they have a ground lead with a clip, they even have a hooky tip for grabbing onto things. Scope probes are great. But some experiments proved that a scope probe makes a terrible signal generator cable, because scope probes contain resistor-divider circuits, trimmer capacitors, etc, which get in the way of the signal generator output.
So I decided to hack a scope probe, to build a signal generator cable / probe. From DealExtreme I ordered a cheap oscilloscope probe – their SKU 32422 for a big $6. About three weeks later it landed in my mailbox. I quickly found it can easily be taken apart. Like so:

The strain relief (on the left of the photo) can simply be pulled off. That holds the little square box covering the PCB in place – with the strain relief gone the cover now slides off as well. A minute of quality time with the soldering iron removes the resistors and trimmer capacitor on the PCB (done before I took these photos), then I soldered a wire across the middle of the PCB to connect the BNC connector back to the centre of the cable. Here’s a close-up:
Then the PCB cover and strain relief were slipped back into place, and the X1/X10 switch on the probe handle was placed in the X1 position, then wrapped in tape to make sure it couldn’t be accidentally switched to X10. When the switch is in the X10 position a big resistor is switched in series with the tip – don’t want that! Finally I stuck a label on the probe so I won’t confuse it with the regular scope probes. Done!
This is awesome. For a few dollars and maybe 10 minutes work we get a nice signal generator probe complete with sharp tip, hooky tip, ground lead, etc. Can’t beat that.
Recently I purchased a Class 10 SD Card from a major retailer. I was very excited – now I have a fast card for testing. Here’s my new baby:
Note the 10 in the part-circle indicating the speed. For those who don’t know, the “class” speed system for SD cards is very simple: the class number is the card’s minimum sequential write speed in MB/s (megabytes per second). So a class 4 card can be written to at 4 MB/s, a class 6 at 6 MB/s, and so on. So yes, a class 10 card should be able to be written to at 10 MB/s.
And it worked really well for small files, but when writing large files it would sometimes slow down. Some investigation produced the following oscilloscope screen capture:
What you’re seeing is the card crying out, “I’m Busy!” for extended periods of time. The top (purple) trace is a trigger pulse. The middle (yellow) trace is the CMD line, and the bottom (green) trace is the DATA0 line. SD cards hold their DATA0 line low when they’re busy; this is how they tell you to wait. I was seeing this Duracell SD card pull busy for up to 250 ms at a time. It seems to me it’d be real tough to sustain 10 MB/s writes to the card, if the card is busy a quarter of a second at a time, every few seconds.
So I wrote to Duracell tech support (yes, they really do have one), and they replied, “Thanks very much – we’ll look into it.”
Most interestingly, I’m pretty sure I read somewhere that the SD Card spec actually allows for these long “busy” periods to happen. So a card can actually do this, yet remain compliant with the spec. That can make life hard for a user of the card however.
I have a pretty good collection of SD cards, so I started looking at a bunch of them, comparing their labels to what the internal card register says. Almost all cards had the label agree with the register. But I did find an interesting “happy” exception. A Lexar “class 4″ card actually read as a class 6. And for large file sizes, this Lexar “class 4″ card actually outperforms the “class 10″ Duracell. Like the Duracell. the Lexar also asserts busy from time to time. But the Lexar busy durations are far shorter than the Duracell – a maximum of around 80 ms, compared to a quarter of a second for the Duracell. Making the Lexar much easier to deal with from a file-writing perspective.
How about that hey? A Lexar “class 4″ card outperforming a Duracell “class 10″ SD card. When it comes to SD cards, things aren’t always as they seem. And sometimes it really is worth paying the extra money for the top-brand cards.
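To put those busy periods in perspective: data keeps arriving while the card is busy, so it has to be buffered somewhere. A back-of-envelope helper, taking 1 MB as 10^6 bytes and assuming data arrives at the card’s rated class speed; `busy_buffer_bytes()` is a hypothetical name.

```c
#include <stdint.h>

/* Bytes that must be buffered to keep accepting data at `rate_bytes_per_s`
 * while the card holds DATA0 low (busy) for `busy_ms` milliseconds. */
uint32_t busy_buffer_bytes(uint32_t rate_bytes_per_s, uint32_t busy_ms)
{
    return (uint32_t)(((uint64_t)rate_bytes_per_s * busy_ms) / 1000u);
}
```

`busy_buffer_bytes(10000000, 250)` is 2,500,000 bytes (about 2.5 MB) for the Duracell’s 250 ms at 10 MB/s, while `busy_buffer_bytes(6000000, 80)` is 480,000 bytes for the Lexar’s 80 ms at its class 6 rating, which goes some way to explaining why the Lexar is easier to live with.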
Update 27 Sept 2011: I’ve done a bunch more testing on my various cards and I’ve managed to write to this particular Duracell card at 9 MB/s. Pretty darn close to a “class 10” card if you ask me (and I might yet be able to write faster still). How? By writing a 32 MB block of data consecutively to the card without any filesystem – just a raw data write as fast as possible. This highlights an important consideration with SD cards – the write speed really is affected by the amount of data you write. With a filesystem on the card, writing one 32 kB cluster at a time, the “write plus programming time” gets pretty slow. To hit the big write-speed numbers you really need to find a way to write much larger blocks of data at a time.
The STM32F2xx has a great SD Card interface. It’s a true 4-bit parallel interface, and in general it works pretty well. I have come across a few, fairly minor but still significant, considerations when using the interface that I thought I’d pass on.
Initialisation Sequence
Proper initialization of the SD card is important: SD cards have no reset line, so if a card’s internal state machine wanders off somewhere you don’t expect, it’s not going to behave. It’s a good idea to have some way of removing power from the card (a p-channel MOSFET, for example) so you can reset it if it goes crazy on you.
At power-up it seems to help to have all signals in the idle state, which is high. This can be done by first configuring the SDIO pins as GPIO outputs driven high, and then switching those pins to their SDIO alternate function.
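The two-step pin hand-over can be sketched as below. This is a host-runnable illustration, not ST library code: `gpio_port_t` is a hypothetical stand-in holding only the MODER, ODR and AFR registers, and the helper names are mine. It assumes the usual STM32F2xx/F4xx SDIO mapping (PC8 to PC11 for D0 to D3, PC12 for CK, PD2 for CMD) on alternate function 12. Pull-up and output-speed settings are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for one GPIO port (only the registers used here). */
typedef struct {
    volatile uint32_t MODER;   /* 2 bits per pin: 00 in, 01 out, 10 AF, 11 analog */
    volatile uint32_t ODR;     /* 1 bit per pin */
    volatile uint32_t AFR[2];  /* 4 bits per pin: AFR[0] pins 0-7, AFR[1] pins 8-15 */
} gpio_port_t;

#define MODE_OUTPUT 0x1u
#define MODE_AF     0x2u
#define AF_SDIO     12u        /* AF12 = SDIO on STM32F2xx/F4xx */

static void set_mode(gpio_port_t *p, unsigned pin, uint32_t mode)
{
    p->MODER = (p->MODER & ~(0x3u << (pin * 2))) | (mode << (pin * 2));
}

/* Step 1: drive the pins high as plain GPIO outputs (the SD idle state).
 * ODR is set before MODER so the pin never glitches low. */
void sdio_pins_idle_high(gpio_port_t *p, const unsigned *pins, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        p->ODR |= 1u << pins[i];
        set_mode(p, pins[i], MODE_OUTPUT);
    }
}

/* Step 2: select AF12 and hand the pins over to the SDIO peripheral. */
void sdio_pins_to_af(gpio_port_t *p, const unsigned *pins, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned pin = pins[i];
        unsigned idx = pin >> 3, shift = (pin & 7u) * 4;
        p->AFR[idx] = (p->AFR[idx] & ~(0xFu << shift)) | (AF_SDIO << shift);
        set_mode(p, pin, MODE_AF);
    }
}
```

On hardware you would call `sdio_pins_idle_high()` for GPIOC pins 8 to 12 and GPIOD pin 2, let the lines settle, then call `sdio_pins_to_af()` before enabling the SDIO peripheral.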
Don’t Stop The Clock
The SD Card specification allows the clock to be stopped, which can be helpful at times. The STM32F2xx allows for this, but when I first tried it, it didn’t work – I wasn’t able to re-establish communications with the card afterwards. Soon afterwards ST published an erratum saying this feature doesn’t work. I don’t know if they plan to fix this, and I personally don’t care too much – I don’t plan on using it anyway. But if your application needs this, you should probably check with ST what their intentions are.
Voltage Level Translators & SD Card Timing
The STM32F2xx can be run at 1.8V; however, the SD Card is a 3.3V device (most of them, anyway). Ideally the processor would have a Vcc pin specifically for the SDIO pins (some processors do have this), but the STM32F2xx does not, so in this example its SDIO pins will be at 1.8V. This is fine for reading from the SD card, because the STM32F2xx is 3.3V tolerant on its inputs. But it’s no good for writing to the card, because the SD card won’t recognize 1.8V as being a logic “high”. A level translator is required.
Some processors provide a “direction” pin as part of their SDIO interface; this can be used to drive an external low-cost bidirectional buffer. The STM32F2xx does not provide this pin, so an automatic-switching bidirectional buffer is required. ST makes one; however, the most commonly used buffer appears to be the Texas Instruments TXS0108. There are several others.
Using an external buffer substantially affects the timing for SD card reads. Some processors provide a “clock input” pin as part of their SDIO interface, which is the clock used for read cycles. The STM32F2xx does not implement this either – the SDIO clock output by the STM32F2xx is the clock used for both writes and reads.
What this means is that for an SD read cycle, the clock “arrives” at the processor much earlier than the data does. Consider the read cycle for a moment. The processor outputs a rising edge of the clock, and the processor then expects to clock in data on the next falling edge of that clock. Once the SD card sees the rising edge there will be a delay within the card before it outputs the data. So the sequence looks like this:
Rising edge output from processor -> clock delayed through level translator en route to card -> delay due to SD card response time and then data is placed on bus -> data delayed through level translator en route to processor -> processor reads data on falling edge of clock
With a 25 MHz SD clock, assuming a perfect 50% duty cycle, there’s only 20 ns between the rising and the falling edge of the clock. In those 20 ns we need 2 trips through the level translator (clock going out and data coming in) plus the delay due to the SD card, plus any setup time for the processor read (which is zero thankfully). 20 ns is not sufficient – the level translator simply isn’t fast enough, nor necessarily is the card.
You need to do the timing analysis yourself for your exact components and system, but I think you’ll find that running at 25 MHz with a level translator on the STM32F2xx simply isn’t possible. Somewhere between 15 – 20 MHz for the SDIO is probably where you’ll end up.
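To make the budget above concrete, here is a small calculator that follows the read sequence just described (two translator trips, card delay, processor setup). The delay figures in the usage note are placeholders, not datasheet values. It also picks a clock divider, assuming the STM32F2xx relation SDIO_CK = SDIOCLK / (CLKDIV + 2) with a 48 MHz SDIOCLK and the divider not bypassed; check your reference manual before relying on it. All names are hypothetical.

```c
/* All delays in nanoseconds. Take real values from your translator's
 * datasheet and the SD card / SD specification timing tables. */
typedef struct {
    double t_translator;  /* one-way delay through the level translator */
    double t_card;        /* card output delay after it sees the clock edge */
    double t_setup;       /* processor input setup time */
} sd_read_path_t;

/* Time between the rising edge the processor drives and the falling edge it
 * samples on, assuming a perfect 50% duty cycle. */
double half_period_ns(double f_mhz)
{
    return 1000.0 / (2.0 * f_mhz);
}

/* Clock out through the translator, card delay, data back through the
 * translator, plus setup. */
double read_path_ns(const sd_read_path_t *p)
{
    return 2.0 * p->t_translator + p->t_card + p->t_setup;
}

/* Highest SDIO clock (MHz) at which the read path fits in half a period. */
double max_sdio_mhz(const sd_read_path_t *p)
{
    return 1000.0 / (2.0 * read_path_ns(p));
}

/* Smallest CLKDIV (0..255) whose SDIO_CK = SDIOCLK / (CLKDIV + 2) does not
 * exceed f_max_mhz, or -1 if none does. */
int pick_clkdiv(double sdioclk_mhz, double f_max_mhz)
{
    for (int d = 0; d <= 255; d++)
        if (sdioclk_mhz / (d + 2) <= f_max_mhz)
            return d;
    return -1;
}
```

With a made-up 5 ns translator delay and 14 ns card delay, the read path takes 24 ns, which only fits below about 20.8 MHz; with a 48 MHz SDIOCLK the next available step below 24 MHz is CLKDIV = 1, giving 16 MHz.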
Busy Signalling and Data Transfer
The STM32F2xx SDIO port contains hardware support for the card to signal busy. If the card cannot accept data it indicates this by pulling its Data0 line low. Once it’s able to accept data again, it sets Data0 high and data transfer can continue. For the most part the STM32F2xx SD interface handles this pretty seamlessly, pausing when the card is busy and continuing when it’s able. It’s quite transparent to the programmer. One intermittent exception I’ve found is when the processor is about to start sending data to the card. If the card is signalling ‘busy’ at the time the processor wants to commence sending data, sometimes (not always) the processor attempts the data transmit, stops (when it realises the card is busy), and then generates a CRC error (SDIO_STA register bit 1). This error halts the entire SD transfer. This behaviour is intermittent – usually the processor handles a ‘busy’ at the beginning of a transfer normally; however, sometimes it results in an SDIO port CRC error.
The solution to this problem, quite obviously, is to wait until the card is not busy before initiating a data transfer. There are a couple of ways of accomplishing this. One is to tie the SDIO Data0 line to a free GPIO pin, which you can then read to ensure the pin is high before kicking off a data transfer. Another is to poll the card (this is what I do). Probably the easiest way is to send the card a CMD13 “SEND_STATUS” command. The response to this command is the 32-bit “R1” response, which is the card’s “card status”. Bit 8 (READY_FOR_DATA) will be high if the card is ready for data, low if the card is busy. Just sit in a loop, sending the card CMD13 commands and checking bit 8 of the response until it’s high.
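A minimal sketch of that polling loop, with a bounded number of attempts so a stuck card can’t hang the firmware. `send_cmd13_fn` is a hypothetical hook standing in for your own SDIO command/response code; it is assumed to place the RCA in bits 31:16 of the CMD13 argument and return the 32-bit R1 card status. `fake_cmd13` is a host-side test double only.

```c
#include <stdint.h>
#include <stdbool.h>

#define R1_READY_FOR_DATA (1u << 8)   /* card status bit 8 */

/* Hypothetical hook: send CMD13 (argument = rca << 16), store the R1
 * response in *r1, return false on a command error. */
typedef bool (*send_cmd13_fn)(uint32_t rca, uint32_t *r1, void *ctx);

/* Poll until READY_FOR_DATA is set or max_polls attempts have been made.
 * Returns true when it is safe to start the data transfer. */
bool sd_wait_ready_for_data(send_cmd13_fn send_cmd13, uint32_t rca,
                            unsigned max_polls, void *ctx)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t r1;
        if (!send_cmd13(rca, &r1, ctx))
            return false;                 /* command failed: report, don't spin */
        if (r1 & R1_READY_FOR_DATA)
            return true;
    }
    return false;                         /* still busy: treat as a timeout */
}

/* Host-side test double: a card that stays busy for busy_polls polls. */
typedef struct { unsigned busy_polls, calls; } fake_card_t;

static bool fake_cmd13(uint32_t rca, uint32_t *r1, void *ctx)
{
    fake_card_t *c = ctx;
    (void)rca;
    c->calls++;
    if (c->calls > c->busy_polls)
        *r1 = (4u << 9) | R1_READY_FOR_DATA;  /* state 4 (tran), ready */
    else
        *r1 = (7u << 9);                      /* state 7 (prg), busy */
    return true;
}
```

In practice you would size `max_polls` (or switch to a real time-based timeout) from the busy durations you measure on your cards, which the earlier scope captures suggest can reach hundreds of milliseconds.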
SDIO_STA register TXACT bit
Be careful interpreting the TXACT bit (bit 12) in the STM32F2xx SDIO_STA register. The documentation says:
Bit 12 TXACT: Data transmit in progress
This would imply the bit is set while a data transfer is in progress, and clear when it’s not. You might think you can look at this bit to determine if the SDIO port has finished transmitting data to the card, so you know when you can start transmitting your next chunk of data to the card.
I’ve found this bit only behaves that way for single-block writes (CMD24 commands). For multi-block writes (CMD25 commands) I’ve seen this bit remain set even after the SDIO has sent all the data to the card and the SDIO_DCOUNT register is zero. If the card is still in its receive-data state (state 6) at the completion of the data transfer, the TXACT bit may still be set.
There are other ways to know if the SDIO port has finished transmitting its data. If you’re using DMA (and you should be) then you can check your DMA NDTR register to confirm it’s zero. The SDIO_STA register DATAEND bit (bit 8) will have been set (and generated an interrupt if you have it enabled) at the completion of the data transfer. And of course the SDIO_DCOUNT register will be zero. You don’t need to rely on the TXACT bit, and I suggest you don’t, because it can be a bit misleading, at least the way it’s currently documented.
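Here is one way to express that “transfer finished” test without looking at TXACT. It works on a hypothetical snapshot struct so it can run on a host; on the part the three values would be read from SDIO->STA, SDIO->DCOUNT and the DMA stream’s NDTR. The NDTR check assumes the DMA, not the SDIO, is acting as flow controller.

```c
#include <stdint.h>
#include <stdbool.h>

#define SDIO_STA_DATAEND (1u << 8)
#define SDIO_STA_TXACT   (1u << 12)

/* Hypothetical register snapshot. */
typedef struct {
    uint32_t sdio_sta;
    uint32_t sdio_dcount;
    uint32_t dma_ndtr;
} sdio_tx_snapshot_t;

/* "Transmit finished" without consulting TXACT, which may stay set after
 * a multi-block (CMD25) write. */
bool sdio_tx_done(const sdio_tx_snapshot_t *s)
{
    return (s->sdio_sta & SDIO_STA_DATAEND) != 0
        && s->sdio_dcount == 0
        && s->dma_ndtr == 0;
}
```

The second case in the test mirrors the CMD25 situation above: TXACT still set, but everything else says the data has gone.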
CRC Error with CMD5
The SDIO peripheral checks a CRC on every command response, regardless of whether the response actually contains one. This results in the SDIO hardware flagging CRC errors for responses that carry no CRC. Be aware that when you send CMD5 to the card, the response does not contain a CRC. The SDIO hardware will generate a CRC error in this case: the CCRCFAIL bit in the SDIO_STA register will be set, and will generate an interrupt if you have that interrupt enabled in the SDIO_MASK register. This reported CRC error is bogus – make sure your software is prepared to accommodate this “special case” for CMD5.
Update Dec 2011: This is now mentioned in the STM32F4xx errata, however the STM32F2xx errata still does not mention this.
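A sketch of how a response handler might accommodate the CMD5 special case. The enum and function names are hypothetical; the bit positions are the SDIO_STA flags CCRCFAIL (bit 0), CTIMEOUT (bit 2) and CMDREND (bit 6). After handling, the flags would still need clearing through SDIO->ICR.

```c
#include <stdint.h>

#define SDIO_STA_CCRCFAIL (1u << 0)
#define SDIO_STA_CTIMEOUT (1u << 2)
#define SDIO_STA_CMDREND  (1u << 6)

typedef enum { RESP_OK, RESP_TIMEOUT, RESP_CRC_FAIL, RESP_PENDING } resp_result_t;

/* Classify a command response from an SDIO_STA snapshot. For CMD5 the
 * response carries no valid CRC, so CCRCFAIL means "response received". */
resp_result_t classify_response(uint32_t cmd_index, uint32_t sta)
{
    if (sta & SDIO_STA_CTIMEOUT)
        return RESP_TIMEOUT;
    if (sta & SDIO_STA_CCRCFAIL)
        return (cmd_index == 5u) ? RESP_OK : RESP_CRC_FAIL;
    if (sta & SDIO_STA_CMDREND)
        return RESP_OK;
    return RESP_PENDING;
}
```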
Standard Peripherals Library SD Card Software
The Standard Peripherals Library for the STM32F2xx is a set of example software routines that can be downloaded from the ST website. If you’re using this processor the library is very valuable. Aside from providing a bunch of examples for using different peripherals and features of the chip, it also provides a standard set of definitions, some example start-up code, and more.
This doesn’t mean you should blindly use this code for your production product however. ST tries to make it clear this code is “example” code, and in many cases that’s all it is. Certainly this statement is true for the SD Card examples. You need to go through the code carefully and make sure it meets your requirements, or modify it to suit your needs if it doesn’t. It’s a great starting point, but don’t assume it’s anything more than just a starting point.
With regards to the SD Card example code in there, I’ve come across a few things worth noting.
Timeouts
The SD Card specification suggests timeout values for various operations, and the STM32F2xx SDIO peripheral contains a hardware timer you can use to implement this. It’s a simple clock counter. Alternatively you can use one of the many general-purpose timer/counters the processor provides.
The example SDIO code sometimes uses the SDIO “clock counter” timeout; when it does, it sets it to its maximum value. That’s not very useful – yes, it will eventually time out, but not for a really long time. Usually when the example code implements a timeout, it uses a simple loop counter, for example:
static SD_Error CmdError(void)
{
  SD_Error errorstatus = SD_OK;
  uint32_t timeout;
  timeout = SDIO_CMD0TIMEOUT; /*!< 10000 */
  while ((timeout > 0) && (SDIO_GetFlagStatus(SDIO_FLAG_CMDSENT) == RESET))
  {
    timeout--;
  }
The problem with this is you’ve no idea what the value of the timeout is. A compiler can potentially optimize it away to nothing, or it could take a long time. In practice I’ve found these timeouts expiring very prematurely, resulting in the functions returning errors before the SDIO transaction has had a chance to complete. There are also many places in the example code where there’s no timeout implemented at all, meaning the code can potentially hang-up in those locations.
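One way to give those timeouts a known duration is to measure elapsed time against a clock rather than counting loop iterations. This is a sketch under assumptions: `ms_clock_fn` is a hypothetical millisecond clock (on the part, typically a counter incremented from the SysTick interrupt), and `flag_fn` stands in for something like a check of `SDIO_GetFlagStatus(SDIO_FLAG_CMDSENT)`. The `sim_*` functions are host-side test doubles.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t (*ms_clock_fn)(void *ctx);  /* hypothetical ms tick source */
typedef bool (*flag_fn)(void *ctx);          /* hypothetical flag check */

/* Wait for flag_set() or for timeout_ms to elapse. The unsigned subtraction
 * gives the right elapsed time even when the tick counter wraps. */
bool wait_flag_ms(flag_fn flag_set, ms_clock_fn now_ms, uint32_t timeout_ms, void *ctx)
{
    uint32_t start = now_ms(ctx);
    while (!flag_set(ctx)) {
        if ((uint32_t)(now_ms(ctx) - start) >= timeout_ms)
            return flag_set(ctx);   /* one last look before declaring timeout */
    }
    return true;
}

/* Host-side simulation: the clock advances 1 ms per read, and the flag
 * becomes set once the clock reaches set_at. */
typedef struct { uint32_t t; uint32_t set_at; } sim_t;
static uint32_t sim_now(void *ctx) { sim_t *s = ctx; return s->t++; }
static bool sim_flag(void *ctx) { sim_t *s = ctx; return s->t >= s->set_at; }
```

Unlike the loop counter, the compiler can’t shrink this to nothing, and the timeout stays the same if the core clock or optimisation level changes.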
4 GB Maximum Card Size
The SDIO example code uses an unsigned 32-bit variable (a uint32_t) for the card address. For example:
SD_ReadBlock (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize)
A little math: 2^32 = 4 GB. Beyond that this address variable overflows. SD Cards can be up to 2 TB in size (2^32 x 512 bytes). Whether this limitation is a problem for you depends upon what kind of cards you intend to support. You may want to consider changing the following 5 functions to use a “sector” parameter instead of an “address” parameter. Given that modern large cards all use 512-byte sectors (or blocks) this allows the code to match up with how the card behaves.
SD_ReadBlock (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize)
SD_ReadMultiBlocks (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize, uint32_t NumberOfBlocks)
SD_WriteBlock(uint8_t *writebuff, uint32_t WriteAddr, uint16_t BlockSize)
SD_WriteMultiBlocks (uint8_t *writebuff, uint32_t WriteAddr, uint16_t BlockSize, uint32_t NumberOfBlocks)
SD_Erase(uint32_t startaddr, uint32_t endaddr)
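If you switch those functions to a sector parameter, something still has to turn the sector into the command argument, because standard-capacity (SDSC) cards take a byte address while SDHC/SDXC cards take a block number. A minimal sketch with a hypothetical helper name, using a 64-bit intermediate so the byte-address case can detect overflow instead of silently wrapping:

```c
#include <stdint.h>
#include <stdbool.h>

#define SD_SECTOR_SIZE 512u

/* Hypothetical helper for a sector-based API: compute the 32-bit argument
 * for read/write commands. Returns false if an SDSC byte address would not
 * fit in 32 bits. */
bool sd_sector_to_arg(uint32_t sector, bool high_capacity, uint32_t *arg)
{
    if (high_capacity) {
        *arg = sector;                      /* block addressing */
        return true;
    }
    uint64_t byte_addr = (uint64_t)sector * SD_SECTOR_SIZE;
    if (byte_addr > UINT32_MAX)
        return false;                       /* beyond 4 GB: not representable */
    *arg = (uint32_t)byte_addr;
    return true;
}
```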
SD Card Initialisation
This was mentioned earlier, but just to reiterate. Card initialization seems to be more reliable if the SDIO pins are placed in a high “idle” state before the pins are switched to the SDIO peripheral “alternate function”. The example code does not do this.
Blocksize
I haven’t personally experienced this, but it’s been reported on the forums that some smaller cards (eg 2 GB) have problems because the example read and write functions do not issue a CMD16 blocksize command to the card before performing the transaction. SDHC cards have a fixed blocksize of 512 bytes and do not require the CMD16 command; however, non-SDHC cards do need that command.
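A sketch of the missing step, assuming a hypothetical `send_r1_cmd_fn` hook that sends a command with an R1 response and reports any error. Since the block length setting should persist until changed or the card is reset, calling this once after initialisation may well be enough; calling it before each transaction, as the forum reports suggest, is the more defensive option. `log_cmd` is a host-side test double.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical hook: send command cmd with argument arg, expect R1,
 * return false on timeout, CRC failure or card-status error bits. */
typedef bool (*send_r1_cmd_fn)(uint8_t cmd, uint32_t arg, void *ctx);

/* Ensure a 512-byte block length. SDHC/SDXC cards are fixed at 512 bytes,
 * so CMD16 is only sent to standard-capacity cards. */
bool sd_ensure_block_len(send_r1_cmd_fn send, bool high_capacity, void *ctx)
{
    if (high_capacity)
        return true;
    return send(16u, 512u, ctx);            /* CMD16 SET_BLOCKLEN */
}

/* Host-side test double that records the last command sent. */
typedef struct { int count; uint8_t last_cmd; uint32_t last_arg; } cmd_log_t;

static bool log_cmd(uint8_t cmd, uint32_t arg, void *ctx)
{
    cmd_log_t *l = ctx;
    l->count++;
    l->last_cmd = cmd;
    l->last_arg = arg;
    return true;
}
```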
SDIO_SetPowerState() function
Many thanks to Brad & Andrew over at the STM32 forum for finding this one. The issue is that during SDIO port power-up, which is part of the SDIO initialisation routines, the power-up may not always succeed. If you find the function SDIO_SetPowerState() contains this:
SDIO->POWER &= PWR_PWRCTRL_MASK;
SDIO->POWER |= SDIO_PowerState;
then try changing those two lines to this:
if (SDIO_PowerState == SDIO_PowerState_ON)
  SDIO->POWER |= SDIO_PowerState;
else
  SDIO->POWER &= PWR_PWRCTRL_MASK;
I believe the problem with the original code is described in the documentation for the SDIO_POWER register:
Note: At least seven HCLK clock periods are needed between two write accesses to this register.
Note: After a data write, data cannot be written to this register for three SDIOCLK (48 MHz) clock periods plus two PCLK2 clock periods.
You can see the original code does two writes to the register in quick succession. That would be bad. This code change helped things for me.
SD_SendSDStatus() function
Updated 7 Dec 2011. This was a tough one to reproduce reliably, and hence to find. In the STM32F2xx SD code, the function SD_Init() calls the function SD_GetCardStatus(), which in turn calls the function SD_SendSDStatus(), passing it a pointer to a buffer, like so:
errorstatus = SD_SendSDStatus((uint32_t *)SDSTATUS_Tab);
The purpose of SD_SendSDStatus() is not immediately obvious from the code (read the comment block for SD_GetCardStatus() if you need a good laugh), but what it does is send an ACMD13 command to the card. It then retrieves the 512-bit status that the card sends in reply and writes it into the buffer. The problem? This:
static uint8_t SDSTATUS_Tab[16];
By my math, 16 bytes = 128 bits, but the SD Status is 512 bits (64 bytes). So what happens is that SD_SendSDStatus() writes 64 bytes of data into a 16-byte buffer, resulting in a big buffer overrun and a bunch of innocent SRAM locations being stomped on, which creates all manner of flaky problems. The simple and obvious fix is to increase the size of SDSTATUS_Tab, although a more robust solution would include a rewrite of SD_SendSDStatus().
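A sketch of both fixes. Declaring the buffer as 16 words gives the full 512 bits and also keeps it word-aligned for the `(uint32_t *)` cast at the call site; the compile-time assert catches anyone shrinking it again. `sd_store_status()` is a hypothetical bounded copy showing the shape a more robust SD_SendSDStatus() could take: it refuses to write past the caller’s buffer.

```c
#include <stdint.h>
#include <stddef.h>

#define SD_STATUS_BITS  512u
#define SD_STATUS_WORDS (SD_STATUS_BITS / 32u)   /* 16 words = 64 bytes */

static uint32_t SDSTATUS_Tab[SD_STATUS_WORDS];
_Static_assert(sizeof(SDSTATUS_Tab) == 64, "SD Status is 512 bits");

/* Hypothetical bounded store of words read from the SDIO FIFO. */
int sd_store_status(uint32_t *dst, size_t dst_words,
                    const uint32_t *fifo_words, size_t n_words)
{
    if (n_words > dst_words)
        return -1;                  /* would overrun: refuse */
    for (size_t i = 0; i < n_words; i++)
        dst[i] = fifo_words[i];
    return 0;
}
```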
Summary
It should be clear by now that the Standard Peripherals Library SDIO code cannot reliably be used as-is. It has far too many limitations, ranging from a serious lack of error handling (and, with its premature timeouts, sometimes generating errors of its own) to outright functional restrictions. It’s a good starting point to show how things can work, but it’s far from being production-ready. For any real product you have no choice but to grab a copy of the SD Card specification and get busy. With that said, I’ve found the SD Card interface on the STM32F2xx to perform pretty well and the library code to be a big time-saver. Just don’t expect it to be production-ready code.
More information can be found in the posting: SDIO Interface Part 2.
19 Aug
Posted by: frank in: Computers, Electronics, Programming, STM32
I’m using the STM32F2xx DCMI port to receive data from a CMOS image sensor. There aren’t too many Cortex-M3 or M4 parts that have this ability natively; the STM32F2xx is a rare breed. It’s a very nice feature. Along the way I’ve learnt a few things about the DCMI port, some of which are documented, some of which are not.
DCMI Interrupt Only At Frame End
There’s no way to get a DCMI interrupt at the start of a frame. If you need that, route the image sensor’s frame sync pin to a processor GPIO pin, and generate a GPIO interrupt on the rising or falling edge as appropriate.
Data Count Register
One of the things you notice once you start trying to use the DCMI port is that it doesn’t have a simple “data count” register. The way the DCMI port works is that it receives data from an external source and puts it into a small FIFO, which then gets fed to the STM32F2xx DMA controller. For some data types you know exactly how much data you’re going to receive; for other types (particularly JPEG data) you have no idea. But you always need to know how much you have received, and with the DCMI there’s no easy way to know. It’s a surprisingly difficult problem to solve, particularly when you start inserting other things into your image data buffers (like image file headers and the like). When I queried ST tech support about this, they suggested setting things up so an interrupt was generated every time a byte was received. Can you imagine 4 million interrupts a second?
I ended up writing some code to count with the help of the DMA controller. The DMA doesn’t have a “data count up” register either; however, it does have a “data count down” register, the NDTR. It gets tricky when you’re using the double-buffer or circular-buffer modes of the DMA controller, because the NDTR register is automatically reloaded by hardware. But with enough puzzling, thinking, head-scratching and experimenting it can be done, although it’s only accurate to multiples of 4 bytes – see the “zero padding” discussion below.
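The core arithmetic of counting with NDTR can be sketched as follows. This is not the author’s code, just an illustration under stated assumptions: a hypothetical `completed_buffers` counter that the DMA transfer-complete ISR increments on every reload, an NDTR counted in 32-bit words (the DCMI data register is read as words), and a fixed `buf_words` reload value. It deliberately ignores the awkward race where NDTR has reloaded but the ISR hasn’t run yet; handling that is where most of the head-scratching goes.

```c
#include <stdint.h>

/* Bytes received so far, given the number of completed buffers (from the
 * hypothetical transfer-complete ISR), the NDTR reload value in words, and
 * a fresh read of NDTR. Always a multiple of 4, as noted above. */
uint32_t dcmi_bytes_received(uint32_t completed_buffers,
                             uint32_t buf_words, uint32_t ndtr_now)
{
    uint32_t words_in_current = buf_words - ndtr_now;  /* NDTR counts down */
    return (completed_buffers * buf_words + words_in_current) * 4u;
}
```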
Ideally I’d like to see ST add a simple “data count” register to the DCMI port, but I’m guessing it probably won’t happen for these parts.
Zero Padding
Talking about JPEG mode, there’s a nasty little “gotcha” in the DCMI port that the documentation does mention in the “JPEG Format” section, but I’ll re-iterate it here. It very much ties in to the lack of a data count register mentioned earlier.
If the amount of data received in a frame is not a multiple of 4 bytes, the DCMI will pad out the end of the frame to a multiple of 4 bytes by inserting zeros. For example, if your frame is 1001 bytes long, you’ll receive from the DCMI port 1004 bytes, consisting of 1001 bytes of “real” data, followed by 3 bytes of zeros.
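The padding rule is just “round up to the next multiple of 4”. A two-line sketch (hypothetical helper names) that reproduces the 1001-byte example:

```c
#include <stdint.h>

/* The DCMI pads each frame up to a whole number of 32-bit words. */
uint32_t dcmi_padded_len(uint32_t frame_bytes) { return (frame_bytes + 3u) & ~3u; }
uint32_t dcmi_pad_bytes(uint32_t frame_bytes)  { return dcmi_padded_len(frame_bytes) - frame_bytes; }
```

The catch described next is that the receiver only ever sees the padded length, so it can’t run this calculation in reverse.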
And therein lies the rub. If you (a) don’t know how many bytes you’re expecting to receive, and (b) don’t know how many bytes were actually received by the DCMI port because it doesn’t have a data count register, then how do you know whether those 3 bytes at the end are padding zeros, or “real data” zeros? In short, you don’t know, there’s no way to tell. You need to consider this, and decide whether or not it might be a problem in your application.
Strangely enough, the DCMI documentation only mentions this zero padding for the JPEG mode, however I would assume (and it’s only an assumption – not confirmed with ST) that this zero padding would occur in the other capture modes as well.
Control Signals Polarity
The documentation is remarkably confusing on this point. If a control signal (eg vsync) is active-high, what DCMI setting is required for that? It’s a bit too difficult to explain, so here’s a repost of a code snippet I posted on the STM32F forum for exactly that question.
// DCMI configuration
// For the image sensor, vsync is "frame sync" and is high when the sensor is transmitting an image.
// hsync is "line sync" and is high when the sensor is transmitting data, low when it's not. It
// will go high and low many times during a frame. The sensor outputs data on the falling edge
// of the clock, hence this processor must clock data in on the rising edge of the clock.
// Note: the F2xx datasheet is incredibly confusing regarding V & Hsync polarity. "Polarity_Low" here
// means the DCMI_CR register bit VSPOL/HSPOL is clear, which means the CPU interprets a logic-1 (high)
// sync as meaning "valid data available". Makes no sense I know, but the processor logic is inverted
// from what you'd expect. Tell it zero, it grabs on a 1.
DCMI_InitTypeDef DCMI_InitStructure;
DCMI_InitStructure.DCMI_CaptureMode = DCMI_CaptureMode_Continuous;    // repeatedly process frames
DCMI_InitStructure.DCMI_SynchroMode = DCMI_SynchroMode_Hardware;
DCMI_InitStructure.DCMI_PCKPolarity = DCMI_PCKPolarity_Rising;        // read data on clk rising edge
DCMI_InitStructure.DCMI_VSPolarity = DCMI_VSPolarity_Low;             // data valid on vsync high
DCMI_InitStructure.DCMI_HSPolarity = DCMI_HSPolarity_Low;             // data valid on hsync high
DCMI_InitStructure.DCMI_CaptureRate = DCMI_CaptureRate_All_Frame;
DCMI_InitStructure.DCMI_ExtendedDataMode = DCMI_ExtendedDataMode_8b;
DCMI_Init(&DCMI_InitStructure);
DCMI_JPEGCmd(ENABLE);                                                 // enable DCMI JPEG mode
Hopefully this will help. As a bonus this code snippet shows how to enable the DCMI JPEG mode.
Unexpected “Frame Skipping” in Continuous Mode
The STM32F2xx DCMI control register DCMI_CR contains a couple of bits called FCRC to select which frames are grabbed. Clearing the bits means “grab all frames”, which is how I use the port.
However, I’ve found that despite this, the DCMI can appear to “skip frames”. Take a look at the following two oscilloscope captures. The top trace is a GPIO pin toggled at the start of the DCMI interrupt service routine. The middle trace is the image sensor frame sync pin, and the bottom trace is the image sensor clock pin. As you can plainly see, the first screenshot shows the DCMI triggering on every frame (as expected), whereas the second screenshot shows the DCMI triggering on every second frame (as you’d certainly not expect).
The difference between these two is a DCMI FIFO overflow. In the case of the first screenshot, the DMA controller is always taking the received data from the DCMI, so the DCMI internal FIFO never overflows. In the second case, the DMA is being turned off halfway through the frame, every second frame. This causes the STM32F2xx DCMI port FIFO to overflow and generate a DCMI error, badly enough to prevent it from generating an “end of frame” interrupt at the end of that frame. However it resets itself at the start of the next frame, which allows it to capture and complete the next frame normally. Then for the subsequent frame the DMA is turned off partway through and the cycle repeats itself.
This was rather unexpected. It seemed like a good idea, turning off the DMA controller once the necessary data was received to save on internal bus bandwidth. However it appears the DCMI peripheral doesn’t like this idea so much, because it doesn’t generate IT_FRAME interrupts properly when this happens. I have not yet had this DCMI behavior confirmed by ST, however it is clear and repeatable on my hardware. I will update this post if I receive any information from ST.
Depending upon the application, losing the IT_FRAME interrupt at the end of a corrupted or overflow frame may, or may not, be a problem. One way to work around this (ahem) “undocumented feature” is to also route the frame sync signal to a GPIO pin, and have the GPIO pin generate an interrupt at the end of each frame. That way you’re guaranteed a GPIO interrupt at the completion of every frame, even if you don’t receive the DCMI IT_FRAME interrupt.
STM32F2xx DCMI Overflow
In my working with the DCMI port I’ve found some other DCMI-overflow related behavior that’s worth mentioning, even though I can’t explain it entirely clearly.
In the second screenshot above, we see IT_FRAME interrupts being missed due to a periodic DCMI overflow, which in turn is caused by software turning off the DCMI’s DMA controller.
The DCMI port can generate an overflow interrupt, called IT_OVR. I tried enabling this interrupt, to see if I would get this interrupt for those frames which are missing the IT_FRAME interrupt. Turning on the IT_OVR interrupt (by setting the OVR_IE bit in the DCMI_IER register) was actually a bit of a disaster – I no longer received any IT_FRAME interrupts at all, and the DCMI port stopped receiving data. It effectively killed the system.
The behavior I saw was this. With both the FRAME and OVR interrupts enabled, initially (after powerup) the DCMI port worked fine, receiving data and passing it to the DMA. After 20 or 30 frames of data I triggered an overflow by momentarily turning off the DMA controller. This resulted in the IT_OVR interrupt occurring and no IT_FRAME interrupt taking place. I expected this state of affairs to be brief. In fact what happened was the IT_OVR interrupt triggered continuously after that point, and the IT_FRAME never triggered again. Nothing I could dream up would clear the OVR overflow error – it always errored again on the next frame, and the next, etc. Even turning off the DCMI port and re-enabling it didn’t cure the problem. It took a reset of the STM32F2xx to have the DCMI port working normally again. This behavior happened even if the OVR interrupt service routine did nothing at all – not even clear the interrupt. Somehow, enabling that OVR interrupt causes the DCMI to behave differently than leaving that interrupt disabled.
I can’t rule out that there was something I was doing wrong, or strange. But for identical incoming DCMI data, if I didn’t enable the OVR interrupt, the DCMI port would recover from an overflow. But if I did enable the OVR interrupt, once an overflow occurred, the DCMI port appeared broken forever. Very strange.
For this reason, obviously, I don’t enable the OVR overflow interrupt – I simply pretend it doesn’t exist.
Other testing I performed, reading the status of the overflow bit in the DCMI port, indicated that the port seems to be tolerant of a certain amount of overflow. Either that or it’s a bit quick to set the flag. I found that even during normal DCMI operation, with the port receiving data and everything apparently going perfectly, the OVR bit could become set. But the DCMI port seemed to ignore it and just continue. I never saw any data corruption or missed data despite the bit being set. Of course forcing a “real” overflow, by momentarily disabling the DMA controller, would also set the OVR flag and in addition would usually cause the IT_FRAME for that particular frame to be skipped. But the subsequent frame would be OK again.
Long story short: don’t enable the overflow interrupt. And if you do, test it (by generating some overflows) very carefully to ensure you don’t end up in some apparently unrecoverable state.
That’s it for the DCMI – the next post will talk about the STM32F2xx SDIO SD Card interface.
If you’re moving a bunch of data around, the DMA controllers are going to be your best friends. However they have their quirks and undocumented features. This post will describe what I’ve learned about them, and hopefully others can contribute if they can shed light as well.
Note there is also a Part 2 to this post.
FIFOs
The DMA controller contains a bunch of small FIFOs, one for each stream. You don’t need to use them (well, sometimes you do, but usually it’s optional), however you’d be a bit crazy not to. The FIFOs allow the DMA to function without losing data if something else with a higher priority is using the bus. But the FIFOs do introduce complexities, as we shall soon see.
DMA Configuration Register
A number of bits in the configuration register (eg channel selection) are marked: “These bits are protected and can be written only if EN is ‘0’”. What does that mean? The register defaults to zero. If I want to set these bits and enable the DMA controller, do I need to perform two writes: one to set these bits while keeping EN = 0, then a second write that only sets EN = 1?
The short answer is “yes”, you should do two writes to the configuration register to enable the DMA. My experiments seem to indicate that performing a single write to the configuration register, setting those bits AND setting EN = 1 in one hit, appears to work. But that’s not guaranteed, and ST tech support recommends performing two writes.
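The two-write sequence looks like this. To make it observable on a host, register writes go through a hypothetical `reg_write_fn` hook; on the part each call is simply a store to the stream’s CR register (for example `DMA2_Stream1->CR`). The sketch assumes the stream is already disabled, that is, EN has been cleared and has read back as 0. The `write_log_t` recorder is a test double.

```c
#include <stdint.h>

#define DMA_SxCR_EN (1u << 0)

/* Hypothetical register-write hook. */
typedef void (*reg_write_fn)(uint32_t value, void *ctx);

/* First write: full configuration with EN = 0, so protected fields such as
 * CHSEL are accepted. Second write: same value with EN = 1. */
void dma_stream_start(reg_write_fn write_cr, uint32_t config, void *ctx)
{
    uint32_t cfg = config & ~DMA_SxCR_EN;
    write_cr(cfg, ctx);
    write_cr(cfg | DMA_SxCR_EN, ctx);
}

/* Host-side recorder of the values written. */
typedef struct { uint32_t writes[4]; int n; } write_log_t;

static void log_write(uint32_t v, void *ctx)
{
    write_log_t *l = ctx;
    if (l->n < 4)
        l->writes[l->n++] = v;
}
```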
Start Me Up
Configuring the various STM32F2xx DMA registers and setting EN = 1 should cause the DMA to start. Imagine my surprise then, when the DMA worked the first time, but not the second.
From performing some experiments I discovered, and ST subsequently confirmed, that the DMA stream will NOT start up if its interrupt status bits are set. This is not described in the current documentation. Even if you’re not using interrupts, you MUST clear the appropriate bits in the DMA Interrupt Flag Clear register. I don’t know which bit in particular the hardware cares about, so I just clear them all for that stream.
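“Clear them all for that stream” needs the right bits in the right register. Per the reference manual’s flag layout, each stream owns a 6-bit group holding FEIF (bit 0), DMEIF (bit 2), TEIF (bit 3), HTIF (bit 4) and TCIF (bit 5), with bit 1 reserved; streams 0 to 3 live in LIFCR and streams 4 to 7 in HIFCR, at bit offsets 0, 6, 16 and 22. The helper name below is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define DMA_STREAM_ALL_FLAGS 0x3Du   /* FEIF | DMEIF | TEIF | HTIF | TCIF */

/* Mask that clears every interrupt flag of a stream (0..7). *use_hifcr says
 * whether to write it to HIFCR (streams 4-7) or LIFCR (streams 0-3). */
uint32_t dma_stream_clear_mask(unsigned stream, bool *use_hifcr)
{
    static const uint8_t offset[4] = { 0, 6, 16, 22 };
    *use_hifcr = stream >= 4;
    return DMA_STREAM_ALL_FLAGS << offset[stream & 3u];
}
```

On hardware the result would be written to `DMA2->HIFCR` or `DMA2->LIFCR` (as indicated by `use_hifcr`) just before the two-write enable.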
Reading DMA Registers
This wasn’t obvious to me from the documentation: the address registers (M0AR and M1AR) do not change during the course of a DMA transfer. For example, in the documentation, for the configuration register MINC bit, it states:
Memory address pointer is incremented after each data transfer (increment is done according to MSIZE)
Certainly sounds like the address register would increment, doesn’t it? In fact they do not. What actually happens is when the DMA starts up (or restarts in the case of circular or double-buffer mode) the address register(s) are copied to internal shadow registers that we lowly programmers cannot see. It’s the shadow address registers which increment during a DMA; not the visible M0AR / M1AR registers. M0AR and M1AR remain unaltered during the DMA.
This is an important point. If you think you can simply read the value of M0AR at the completion of a DMA to determine where the DMA stopped writing, for example, that won’t work. M0AR will still be pointing to the beginning of your data buffer.
NDTR does decrement, though – you can read this register (at any time) to see where the DMA controller is.
Circular Mode
The documentation for circular mode is pretty skimpy. It says:
When the circular mode is activated, the number of data items to be transferred is automatically reloaded with the initial value programmed during the stream configuration phase, and the DMA requests continue to be served.
This tells us the NDTR register is reloaded once it falls to zero, in the same manner as the double-buffer mode. What about everything else? I asked ST tech support a bunch of questions and here’s what I learned.
M0AR is the address register used for circular mode (M1AR is only used for double-buffer mode and nothing else). When circular mode is enabled, once NDTR counts down to zero, the following takes place:
In short, the DMA will keep writing to the same buffer, over and over again, triggering an interrupt each time it fills the buffer, and it’ll do this forever until it’s disabled (usually by software writing to the DMA configuration register).
As noted before, you’ll never see the value of the M0AR register change so don’t bother trying, however you will see the NDTR register decrement as the DMA proceeds to write the buffer.
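Since M0AR never moves, the DMA’s write position in circular mode has to be derived from NDTR as `len - NDTR`. A minimal ring-buffer reader built on that idea, assuming byte-sized transfers and a consumer that keeps up (the DMA never laps it). The `dma_ring_t` type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const volatile uint8_t *buf;  /* the buffer M0AR points at */
    uint32_t len;                 /* value programmed into NDTR */
    uint32_t rd;                  /* consumer's read index */
} dma_ring_t;

static uint32_t ring_write_index(const dma_ring_t *r, uint32_t ndtr)
{
    uint32_t w = r->len - ndtr;   /* NDTR counts down from len */
    return (w == r->len) ? 0u : w; /* NDTR reading 0 just before reload */
}

/* Copy out everything written since the last call, up to max bytes. */
size_t dma_ring_read(dma_ring_t *r, uint32_t ndtr, uint8_t *out, size_t max)
{
    uint32_t w = ring_write_index(r, ndtr);
    size_t n = 0;
    while (r->rd != w && n < max) {
        out[n++] = r->buf[r->rd];
        r->rd = (r->rd + 1u == r->len) ? 0u : r->rd + 1u;
    }
    return n;
}
```

On the part you would pass a fresh read of the stream’s NDTR each time, for example from a periodic task or the half/full transfer interrupts.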
Double-Buffer Mode
This is simply a variant of Circular Mode, where two address pointers M0AR and M1AR are used instead of one, and the STM32F2xx DMA reloads them alternately, each time NDTR counts down to zero. So the DMA fills buffer 0, then buffer 1, then buffer 0, then buffer 1, etc, for example. The really nice thing about this mode is you’re permitted to update the not-currently-in-use address register on the fly, which means you can extend this mode to as many buffers as you wish by changing the inactive register. The CT bit in the DMA configuration register tells you which address register is currently in use; you’re free to update the other register to point to a new buffer.
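Queuing the next buffer comes down to reading CT (bit 19 of the stream configuration register) and writing the other address register. A host-runnable sketch with a hypothetical `dma_stream_regs_t` stand-in for the stream’s CR, M0AR and M1AR; typically this would run in the transfer-complete interrupt, right after the hardware has switched targets.

```c
#include <stdint.h>

#define DMA_SxCR_CT (1u << 19)   /* current target: 0 = M0AR, 1 = M1AR */

/* Hypothetical stand-in for one stream's registers. */
typedef struct {
    volatile uint32_t CR;
    volatile uint32_t M0AR;
    volatile uint32_t M1AR;
} dma_stream_regs_t;

/* Point the idle memory register at next_buf; returns which one (0 or 1)
 * was written. The register selected by CT is left alone. */
int dma_dbm_queue_buffer(dma_stream_regs_t *s, uint32_t next_buf)
{
    if (s->CR & DMA_SxCR_CT) {   /* DMA using M1AR, so M0AR is free */
        s->M0AR = next_buf;
        return 0;
    }
    s->M1AR = next_buf;          /* DMA using M0AR, so M1AR is free */
    return 1;
}
```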
Burst Mode
It states this in the documentation, but just to reiterate: if you’re using the DMA in burst mode (for example, allowing the FIFO to fill with 16 bytes / 4 words, then writing the whole FIFO to memory in one burst), which is the most efficient method, be aware that your buffer needs to be aligned on a 1kB address boundary. If it’s not, and a burst straddles a 1kB boundary, it’ll generate an AHB bus error.
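The underlying rule is that no single burst may cross a 1 kB address boundary; 1 kB alignment is the simplest way to guarantee that. A small checker (hypothetical names) you could run over a buffer before arming the DMA, assuming `len` is a multiple of the burst size:

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a burst of burst_bytes starting at addr crosses a 1 kB boundary. */
bool burst_crosses_1k(uint32_t addr, uint32_t burst_bytes)
{
    return (addr >> 10) != ((addr + burst_bytes - 1u) >> 10);
}

/* Check every burst of a buffer of len bytes starting at addr. */
bool buffer_burst_safe(uint32_t addr, uint32_t len, uint32_t burst_bytes)
{
    for (uint32_t off = 0; off < len; off += burst_bytes)
        if (burst_crosses_1k(addr + off, burst_bytes))
            return false;
    return true;
}
```

By this check a buffer aligned to the burst size also passes, since 1 kB is a multiple of every burst length; whether you rely on that or stick to 1 kB alignment is your call.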
Flow Controller
The documentation is very clear about this, but I’ve still seen it come up on the forums. In almost all cases the DMA itself will be the flow controller. The only possible exception is the STM32F2xx SDIO port, where you have the option of making the peripheral the flow controller.
STM32F2xx FIFO Flush
I found this topic to be very confusing, primarily because the documentation mentions it very briefly, and that very briefness (is “briefness” even a word?) led to a number of misconceptions which took some time to iron out with ST tech support. Hopefully the following section will help clarify things.
In an “ideal” DMA operation, you would specify the amount of data to be transferred in the NDTR register, set things like burst mode (or not) and the FIFO threshold, and start the DMA running. Ideally everything would be a neat multiple of everything else; for example, you set a FIFO threshold of “FIFO full” (16 bytes) and your data quantity is a multiple of 16 bytes. The DMA would run to completion, in this case after an integer number of FIFO fills and empties, and life’s good.
As we all know, real life isn’t always this neat and tidy. What happens if you turn off the DMA stream in software partway through a transfer? In that case the DMA can “terminate” with data still in its FIFO. This is what the FIFO flush is for.
For starters, it’s worth noting that the DMA will not accept more data than the NDTR register specifies. For example, if your data source is the DCMI port, which can happily throw data at the DMA all day long, and you program NDTR with 18 bytes along with a “FIFO full” threshold and burst mode, the DMA will not round up and take 32 bytes (two full FIFO loads) from the DCMI. It will accept only 18 bytes, one and an eighth FIFO loads (so to speak). When emptying the FIFO, the DMA will write the first 16 bytes as a burst, as you specified, and the remaining two bytes as single accesses. Hence the DMA completes normally, even though the amount of data wasn’t a neat multiple of the FIFO or burst size.
This is an important point, because it means the DMA can never overrun the end of your destination buffer, even in the event of a FIFO flush, provided NDTR is never programmed with a value larger than your buffer can hold. NDTR sets the maximum amount of data the DMA will accept from the data source, and hence the maximum amount of data that can be written into the destination buffer.
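Both points can be expressed in a small host-side sketch. `split_transfer` reproduces the 18-byte example (one 16-byte burst plus two single accesses, assuming byte-sized single accesses for the remainder), and `ndtr_fits_buffer` is the check that keeps the no-overrun guarantee. Both functions are hypothetical, and the check assumes NDTR counts items of the peripheral data size (PSIZE), which is how I read the reference manual for peripheral-to-memory transfers; NDTR itself is a 16-bit count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NDTR holds a 16-bit item count. */
#define NDTR_MAX 0xFFFFu

/* Hypothetical result type: how a transfer splits into bursts + singles. */
typedef struct {
    uint32_t bursts;  /* number of full bursts of burst_bytes */
    uint32_t singles; /* leftover bytes, written as single byte accesses */
} burst_split;

static burst_split split_transfer(uint32_t total_bytes, uint32_t burst_bytes)
{
    assert(burst_bytes > 0u);
    burst_split s = { total_bytes / burst_bytes, total_bytes % burst_bytes };
    return s;
}

/* True if NDTR = ndtr items of psize_bytes each cannot write past the
 * end of a destination buffer of buf_bytes.
 * ASSUMPTION: NDTR counts peripheral-size (PSIZE) items. */
static bool ndtr_fits_buffer(uint32_t ndtr, uint32_t psize_bytes,
                             uint32_t buf_bytes)
{
    assert(psize_bytes == 1u || psize_bytes == 2u || psize_bytes == 4u);
    if (ndtr > NDTR_MAX) {
        return false; /* larger than the 16-bit NDTR field can hold */
    }
    return (uint64_t)ndtr * psize_bytes <= buf_bytes;
}
```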
If software disables the DMA stream partway through a transfer, there may be data sitting in the DMA FIFO at that moment. This is the situation a FIFO flush handles: the DMA keeps emptying the FIFO, writing that data to memory, until the FIFO is empty.
I don’t know what value the NDTR register holds in this scenario. Is it the value at the moment the DMA stream was turned off, or the value after the flush has completed, i.e. including the data written out by the flush? I suspect the latter, but I don’t know for certain.
In any case, it’s important to note that the TCIF flag will be set. If you turn off the DMA and want to know whether the flush is complete, all you need to do is wait for the TCIF flag to be set; at that point you know the DMA stream is finished and idle. Even though you might think the transfer wasn’t complete because you halted it partway through, the DMA controller considers it complete and sets its TCIF flag (potentially generating an interrupt, if you have that enabled). If you plan on restarting the DMA so that it continues from where you stopped it, remember to update the M0AR (and possibly the M1AR) register, because the address register will still be pointing to the start of your buffer, not to where the DMA just finished writing.
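A minimal sketch of the stop, wait and resume sequence, written against a hypothetical `stream_model` struct rather than real registers so it can run on a host. It assumes a normal (non-circular, non-double-buffer) peripheral-to-memory transfer and, importantly, that NDTR read after TCIF already accounts for the flushed data, which is exactly the open question above. `stop_and_rearm` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the parts of one stream we need; on hardware
 * these are the stream's CR/NDTR/M0AR registers and its TCIF flag in
 * LISR/HISR. */
typedef struct {
    volatile uint32_t cr;
    volatile uint32_t ndtr;
    volatile uint32_t m0ar;
    volatile uint32_t tcif; /* 1 once the stream has finished, incl. flush */
} stream_model;

#define DMA_CR_EN 1u /* EN is bit 0 of the stream configuration register */

/* Disable the stream, wait for TCIF, and repoint M0AR past the data
 * already written. ASSUMPTION: NDTR read after TCIF includes the flushed
 * data. Returns the number of items still to transfer. */
static uint32_t stop_and_rearm(stream_model *s, uint32_t initial_ndtr,
                               uint32_t psize_bytes)
{
    s->cr &= ~DMA_CR_EN;          /* request the stop; FIFO flush follows */
    while (!s->tcif) {
        /* wait for the flush to finish */
    }
    uint32_t remaining = s->ndtr;
    uint32_t done = initial_ndtr - remaining;
    s->m0ar += done * psize_bytes; /* M0AR still held the buffer start */
    s->ndtr = remaining;           /* reprogram NDTR explicitly */
    s->tcif = 0u;                  /* clear the flag before re-enabling */
    return remaining;
}
```

On real hardware you would clear TCIF by writing the stream’s CTCIF bit in LIFCR/HIFCR, and you would probably want a timeout on the wait loop.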
That’s it for this post. Those are the things I’ve learnt so far about the STM32F2xx DMA controllers, along with the questions that are still open. The next post will cover the STM32F2xx DCMI (Digital Camera Interface) peripheral.