Frank's Random Wanderings

A bunch of random musings, with a leaning towards electronics & computers.

There’s a new project posted on the main website – a simple RF signal generator that (mostly) covers the range 640 MHz to 5.8 GHz.

RF Signal Generator

The project can be considered as being in two halves. First is the RF half. This is provided by a Linear Technology LTC6946 demo board. The 6946 is a PLL synthesizer. Put simply, you program its internal control registers with suitable PLL multiplication / divide factors, and it outputs the resulting frequency. The synthesizer is programmed via an SPI port.

The second half is a Texas Instruments Stellaris Cortex-M3 development board with a 3.5″ touch-screen color display. It’s programmed to be the user interface – you can enter your desired frequency on the touchscreen, it calculates the values for the various LTC6946 internal registers, then programs them via SPI. It provides a really nice user interface to what is otherwise a very simple project. Then everything is wrapped up in a simple metal box.
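Here is a minimal integer-N sketch in C of the kind of calculation the touchscreen software has to do. It assumes the usual LTC6946 relationship f_OUT = (f_REF / R) × N / O. It also assumes the LTC6946-3 variant's VCO range of roughly 3.84 to 5.79 GHz and an output divider O of 1 to 6, so check those figures against the datasheet for your part. The function and type names are hypothetical. The sketch only finds N and O; it does not produce the chip's actual register bit layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed VCO range for the LTC6946-3 variant; check your datasheet. */
#define VCO_MIN_HZ 3840000000ULL
#define VCO_MAX_HZ 5790000000ULL

typedef struct {
    uint32_t n;          /* feedback (N) divider */
    uint32_t o;          /* output (O) divider, assumed 1..6 */
    uint64_t actual_hz;  /* frequency actually produced (truncated to whole Hz) */
} pll_setting_t;

/* Hypothetical solver: f_vco = f_pfd * N must land inside the VCO range,
 * and f_out = f_vco / O. Tries the smallest O first. Returns false if no
 * O puts the VCO in range. f_pfd_hz is f_REF / R and must be non-zero. */
bool pll_solve(uint64_t f_out_hz, uint64_t f_pfd_hz, pll_setting_t *out)
{
    for (uint32_t o = 1; o <= 6; o++) {
        uint64_t n = (f_out_hz * o + f_pfd_hz / 2) / f_pfd_hz;  /* nearest integer N */
        uint64_t f_vco = f_pfd_hz * n;                           /* VCO actually produced */
        if (f_vco < VCO_MIN_HZ || f_vco > VCO_MAX_HZ)
            continue;
        out->n = (uint32_t)n;
        out->o = o;
        out->actual_hz = f_vco / o;
        return true;
    }
    return false;
}
```

With a 1 MHz phase-detector frequency, pll_solve(1000000000, 1000000, &s) picks O = 4 and N = 4000. Under these assumed VCO limits there is a gap between about 2.9 GHz (5.79 GHz / 2) and 3.84 GHz, so 3.2 GHz, for example, returns false. That would be consistent with the “(mostly)” above.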

Full information on the build, including the source code for the software, is provided on the main site. Enjoy!

I’ve been doing some work with a Kinetis MKL25Z128 Cortex-M0+ processor. I love what you get for the very small amount of money this thing costs. It even contains a 16-bit A/D converter, which is very rare at this price point. Along with everything else, it contains a USB port.

There are a number of Freescale application notes about how to use the USB port, implement bootloaders, etc. To be honest, a lot of them are a bit of a mess, and if you frequent the Freescale forums you’ll see plenty of discussions with people asking pretty basic questions (like why the sample code doesn’t even compile). So finding something that actually works is a bit of a rarity.

Freescale has an appnote called AN4764 which describes a very small USB bootloader. The whole thing compiles down to only around 4 kB of flash, which is quite impressive. They also provide a simple PC application to work with it. The source code, appnote, etc. can be found here:

AN4764 Simple HID Bootloader

The source code zip file in the link above contains prebuilt images for most processors, but for some reason the MKL25Z128 processor (also known as the “Freedom board” processor) is missing. IAR is required to build the projects. (Why Freescale used the IAR compiler instead of their own CodeWarrior compiler is another unsolved mystery.) I installed the small (32 kB code size) free version of IAR and built the MKL25Z128 project. To save others from needing to do the same, I’ve posted the entire project, including the executable and the appnote PDF, here:

Click here to download the files.

The function Check_Bootloader_Mode() in file kl25z_bootloader.c has been modified to remove most of the checks it was doing. It used to calculate & compare a checksum on the application, plus it looked to see if a couple of words of flash were erased. Now all it does is look at GPIO pin PE31 – if that pin is low it runs the USB bootloader, and if that pin is high it runs the application. That’s it.
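Here is a minimal C sketch of that decision logic, written so it can be tested off-target. The function names are hypothetical; the actual modified Check_Bootloader_Mode() is in the download. On the KL25Z the input would be the value read from the GPIOE_PDIR register. The application's vector table would sit at 0x1000, and its second word is the reset handler the bootloader jumps to.

```c
#include <stdbool.h>
#include <stdint.h>

#define APP_BASE      0x1000u      /* application start address used in this post */
#define BOOT_PIN_MASK (1u << 31)   /* PE31 */

/* True if the USB bootloader should run: PE31 reads low after reset.
 * On target, pass the GPIOE_PDIR register value. */
bool bootloader_requested(uint32_t gpioe_pdir)
{
    return (gpioe_pdir & BOOT_PIN_MASK) == 0;
}

/* A Cortex-M vector table starts with the initial stack pointer, followed by
 * the reset handler address. Given the application's table (at APP_BASE on
 * target), return the address the bootloader would branch to. On hardware the
 * bootloader would also load MSP from app_vectors[0] before jumping. */
uint32_t app_reset_vector(const uint32_t *app_vectors)
{
    return app_vectors[1];
}
```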

For a quick summary of how to use it: program the bootloader into your processor. To make the USB bootloader run, GPIO PE31 needs to be low after reset (i.e., put a switch on PE31 and hold it low during reset or power-on). When plugged into a Windows PC it’ll be recognised as a HID device. Then run the PC application to download your application code. Your application code needs to be built for a start address of 0x1000, seeing as the bootloader (about 3.5 kB) lives in the bottom 4 kB of flash. Do that by editing your linker file. Then, somewhere at the beginning of your application code, you need to point the processor to your new interrupt vector table (if you’re using interrupts), like so:

SCB_VTOR = (uint32_t)0x1000;   /* vector table now starts at the application's base address */

Once your application starts, interrupts will be disabled. In CodeWarrior you can use the following macros to enable and disable interrupts:

#define EnableInterrupts asm(" CPSIE i");
#define DisableInterrupts asm(" CPSID i");

That’s it. This is a super-easy way to get a USB bootloader running on a Freescale Kinetis processor.

Older versions of FreeNAS would email periodic status reports; however, the newer 9.x versions no longer do this. FreeNAS now only sends emails when something strange or bad happens. It makes sense – lots of “everything’s fine” emails tend to numb you, and you end up deleting every email from the FreeNAS system, including potentially the interesting ones.

Nonetheless, a weekly email can be nice. Specifically I’d like a weekly email containing the SMART information for the harddisks, so I can see their temperatures, whether any are starting to reallocate sectors (which is an early failure warning), etc. A weekly email with that kind of information would be pretty useful.

This thread on the FreeNAS forums provides a lot of information on how to do it, and it’s pretty simple to set up. But some of the information there is a bit outdated, plus they suggest placing your script in a location that gets overwritten when you update FreeNAS. So here we’ll do things a little more simply. Here’s my script:

#!/bin/sh

# Emails a status report: SMART data for each disk, plus zpool status.

(
echo "To: YOURNAME@YOUREMAIL.COM"
echo "Subject: Freenas status report"
echo ""    # an empty line ends the mail headers
) > /var/cover

echo "Drive Reports for 2 disks:" >> /var/cover
echo " " >> /var/cover
smartctl -a /dev/ada0 >> /var/cover
echo " " >> /var/cover
smartctl -a /dev/ada1 >> /var/cover
echo " " >> /var/cover

echo "ZPool Status:" >> /var/cover
zpool status >> /var/cover
zpool get all YOURPOOLNAME >> /var/cover

sendmail -t < /var/cover

exit 0

You can see the script is calling smartctl twice, once for each of the two disks in the system. The script also calls “zpool status” and “zpool get all” to obtain the health and status of the specified data pool. “/var/cover” is simply a temporary file used to store all the data – at the end of the script it’s passed to sendmail to email everything out.

To use the script you only need to change two things: your email address and the name of your storage pool. (If your disks aren’t /dev/ada0 and /dev/ada1, adjust the smartctl lines too.) Beyond that you can of course edit it to your heart’s delight.

Save your script in your storage pool, i.e. in your shared folder. The full directory path will probably start with /mnt. For example:
/mnt/share/something/somethingelse/emailstatusreport.sh
There are a couple of advantages to this. One is that your zpool doesn’t get overwritten when you update FreeNAS, so updates stay as easy as ever and you won’t lose your script. The other is that, by living in your zpool, the script is naturally included in your backups (you do do backups, right?).

Then set up a cron job (easy to do through the FreeNAS user interface) and specify something like this to call your script (remember to select ‘root’ as the cron user):

/bin/sh /mnt/share/something/somethingelse/emailstatusreport.sh

That’s it! Enjoy your status emails.

Introduction

NAS (Network Attached Storage) devices, also known as home servers, are pretty common these days. They’re great for having a central point to store files; as a media server for holding photos and home movies; as a place to store backups; etc. Since 2008 I’ve had a 2-bay Synology device, the DS207+. It’s been both reliable and simple to use, but a source of frustration has been my reliance upon Synology to keep its software updated, fix bugs, etc. Technology marches on and Synology no longer supports these units. Synology’s latest software stream is at 4.x and my DS207+ is stuck at 3.x; there are no more feature improvements or bug fixes for me.

The DS207+ is also starting to look rather slow, so it’s high time for a new NAS. This time around I want to avoid the proprietary solutions from companies like Synology, QNAP, etc, and go with something completely open so I can modify it, update it, etc to my heart’s content for as long as I choose. At the same time I’m looking for higher performance, both in network throughput as well as error resiliency. It’s time to roll my own, and this post describes what I ended up with.

UER – The Dreaded “Uncorrectable Error Rate”

We’re all aware that harddisks are not perfect devices – they can fail completely, so backups are super important. What’s less commonly known, but growing in importance as harddisks get larger, is the UER – the uncorrectable error rate.

The simple fact is that harddisks store data on magnetic media, and sometimes that data gets corrupted. Actually, it gets corrupted quite often, so harddisks have sophisticated error detection & correction algorithms (Reed-Solomon encoding etc) to deal with those errors so you never see them. Well, mostly. Sometimes an error happens that the disk cannot correct – this is an uncorrectable error.

The uncorrectable error rate is given in the datasheet for the harddisk, and for consumer drives it’s typically on the order of 1 error per 10^14 bits read. Some drives are rated better, some worse. Consider that 10^14 bits is around 12 TB. Back when you had an 80 GB disk, you’d have to read the entire contents of your disk quite a few times to have a real chance of suffering an uncorrectable error. Today 4 TB disks are quite common, and reading that entire disk only 3 times (about 12 TB) would give you a sizeable chance of suffering an uncorrectable error.
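To put numbers on this: the expected number of uncorrectable errors is simply the bits read multiplied by the UER. The small C sketch below uses illustrative function names and decimal units (1 TB = 10^12 bytes). Three full reads of a 4 TB disk at 1 error per 10^14 bits gives about 0.96 expected errors. If errors are treated as independent (a simplifying assumption), the chance of at least one is roughly 1 - e^(-0.96), or about 62%. A single full read of an 80 GB disk gives about 0.0064 expected errors, well under 1%.

```c
/* Decimal units throughout: 1 TB = 10^12 bytes. */

/* Expected number of uncorrectable read errors when reading `bytes_read`
 * bytes from a disk rated at `uer_per_bit` errors per bit (e.g. 1e-14). */
double expected_ure(double bytes_read, double uer_per_bit)
{
    return bytes_read * 8.0 * uer_per_bit;
}

/* Convert a bit count to terabytes, e.g. 10^14 bits -> 12.5 TB. */
double bits_to_tb(double bits)
{
    return bits / 8.0 / 1e12;
}
```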

Depending upon the error, you might never notice. If it happened in the middle of a video you would probably never see it. However if it happened in the first few bytes of that video it might corrupt the header and make the video unplayable. If it happened in a program file the program might crash. If it happened in part of the disk filesystem then the impact could be much worse.

Most NAS filesystems blithely assume the disk is perfect. My Synology is like that, as are the vast majority of consumer NASes. However, some filesystems checksum all data coming off the disk, and can thereby detect a harddisk uncorrectable error (even if it happens within the filesystem data itself) and, given redundancy such as a mirror, correct it. The most well-known of these filesystems is ZFS. I want a system that uses the ZFS filesystem.

DRAM Errors

Harddisks are not the only things that can corrupt your data. Imagine copying your latest home movie from your camera to your NAS. The video data is stored in the DRAM of the NAS during the process of being written to disk. If a DRAM error occurs, then an error will be written to disk in the middle of your movie, and you probably won’t even know.

The error rate of DRAM is higher than the error rate of a harddisk. A 2009 Google study found an average of more than 3000 errors per DRAM stick (DIMM) per year. That’s in a busy server which is always switched on; a home computer is not typically always on and hence the error rate will be lower. Nonetheless, DRAM errors are very real and surprisingly common.

The main solution for DRAM errors is to use ECC DRAM. It’s slightly more expensive than non ECC DRAM (approximately 20% more expensive in my experience), but it has the ability to automatically correct single-bit errors, which is what most DRAM errors are.
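To illustrate how single-bit correction works, here is a toy Hamming(7,4) code in C. Real ECC DIMMs use a wider SECDED code, with 8 check bits protecting each 64-bit word, which is why ECC modules are 72 bits wide. The principle is the same, though: the check bits are chosen so that recomputing them over a corrupted word yields a “syndrome” equal to the position of the flipped bit. This is an illustration only, not how any particular memory controller implements it.

```c
#include <stdint.h>

/* Codeword layout: bit (pos - 1) of the byte holds Hamming position pos (1..7).
 * Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7. */
static const int data_pos[4] = {3, 5, 6, 7};

static int get_pos(uint8_t cw, int pos) { return (cw >> (pos - 1)) & 1; }

uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t cw = 0;
    for (int i = 0; i < 4; i++)
        if (nibble & (1u << i))
            cw |= (uint8_t)(1u << (data_pos[i] - 1));
    /* Parity bit p covers every position whose index has bit p set,
     * chosen so that the XOR over its whole group is zero. */
    for (int p = 1; p <= 4; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 7; pos++)
            if ((pos & p) && pos != p)
                parity ^= get_pos(cw, pos);
        if (parity)
            cw |= (uint8_t)(1u << (p - 1));
    }
    return cw;
}

/* Returns the corrected data nibble. *syndrome receives the position of the
 * flipped bit, or 0 if the codeword was clean. */
uint8_t hamming74_decode(uint8_t cw, int *syndrome)
{
    int s = 0;
    for (int p = 1; p <= 4; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 7; pos++)
            if (pos & p)
                parity ^= get_pos(cw, pos);
        if (parity)
            s |= p;                          /* failing groups spell out the bad position */
    }
    if (s != 0)
        cw ^= (uint8_t)(1u << (s - 1));      /* flip the bad bit back */
    if (syndrome)
        *syndrome = s;
    uint8_t nibble = 0;
    for (int i = 0; i < 4; i++)
        if (get_pos(cw, data_pos[i]))
            nibble |= (uint8_t)(1u << i);
    return nibble;
}
```

Flipping any one of the 7 bits of an encoded nibble and decoding it returns the original nibble, with the syndrome naming the flipped position.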

Being able to use ECC DRAM requires both a processor and a motherboard chipset that support ECC DRAM. Hence this is a driving force when choosing which hardware to buy.

The Build – Software

There are quite a few pre-packaged NAS software builds available. Some are free, some are not (but the paid ones are still inexpensive), some are open-source and some are closed. Some have good support and some have minimal. I spent quite a bit of time reviewing all I could find, which was a lot. In the end I selected what is likely the most popular build out there: FreeNAS. It’s free, very popular, well supported with lots of help available, has a strong feature set, and meets all my requirements (and then some).

One constraint derived from the FreeNAS decision was their recommendation to choose a motherboard with an Intel network chip. Some other network chips do work, but for a trouble-free install an Intel network chip comes highly recommended.

The Build – Hardware

I already had a PC case and decent power supply, so I needed a processor, motherboard and RAM. My choice:

  • Processor: Intel G3220 3 GHz LGA1150 Haswell $65
  • Motherboard: Supermicro LGA1150 X10SLL-F $158
  • RAM: Kingston ValueRAM KVR1333D3E9S/4G DDR3-1333 ECC $57 x 2 = $114 (8 GB total)

As you can see the most expensive component was the motherboard. Although I could find much cheaper LGA1150 motherboards, I couldn’t find any that explicitly said they supported ECC RAM, nor did they have Intel network interface chips. Although I paid full price, this motherboard does go on sale. I saw it on amazon.com for less than $100 at one point.

To offer up a slightly lower-cost alternative, I’ve seen the following system used by some folks on the FreeNAS forums. Costs listed are what I see on Newegg. This one is AMD-based (not Intel), but it obviously does work.

  • Processor: AMD FX-4300 (3.8GHz) $120
  • Motherboard: Asus M5A78L-M/USB3 $70
  • RAM: as above

So in the end I spent $337 (plus tax) to put together my new NAS. For that money I was quite happy – it’s a similar cost to what I’d spend on another Synology or QNAP, yet it meets the criteria I listed at the beginning of this post. In my view, much better bang for the buck.

The only thing remaining is to stick in some drives. I chose Western Digital Green drives and put them in a simple mirror arrangement. Yes, there are “better” drives, but these are cheap, relatively reliable (judging from the Newegg customer reviews), relatively low-power, and, thanks to the simple mirror arrangement, easy to replace if one fails.

How Fast Is It?

My home network uses gigabit Ethernet. The only speed number I care about is reading from the system, so I read about 12 GB of data from the FreeNAS server to my PC and timed it, then repeated the exercise to check that the result was consistent. The result: a read speed of 410 Mbit/s. (This required the data to pass through two network switches, and my PC is not exactly cutting-edge any longer. Still, it’s a decent number as far as I’m concerned.)
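For reference, here is the arithmetic behind that figure as a small C sketch. It assumes decimal units, i.e. 12 GB = 12 × 10^9 bytes, and the helper names are illustrative. 410 Mbit/s is about 51 MB/s, or about 41% of gigabit Ethernet’s raw 1000 Mbit/s line rate, and moving 12 GB at that speed takes roughly 234 seconds.

```c
/* Decimal units throughout: 1 MB = 10^6 bytes, 1 Mbit = 10^6 bits. */

/* Throughput in Mbit/s for `bytes` transferred in `seconds`. */
double mbit_per_s(double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;
}

/* Throughput in MB/s for `bytes` transferred in `seconds`. */
double mbyte_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

/* Seconds needed to move `bytes` at `mbit` Mbit/s. */
double transfer_seconds(double bytes, double mbit)
{
    return bytes * 8.0 / (mbit * 1e6);
}
```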

And The Synology?

It’s still in use. These days it contains a couple of old but still good drives in a JBOD arrangement. About once a week I power it on to perform a backup (automated, using a shell script) from the FreeNAS to the Synology. Although it’s much slower than the FreeNAS and I wouldn’t want to use it for daily tasks, it makes a good backup machine. If lightning ever strikes the FreeNAS system, taking out everything including its mirrored drives, there will still be a copy of the data on the Synology. As far as I’m concerned, you can never have too many backups!

It’s funny how a topic as apparently mundane as the DMA controllers on the STM32F2xx and STM32F4xx processors can be such a can of worms. I’ve already provided 2 postings on the subject, here and here, and now we have a third.

This one is a biggie. From what we’re seeing, it appears you can only have 2 DMA transactions taking place at any one time. It appears that having 3 simultaneous DMA transfers causes intermittent failures.

To explain:

We’ve been doing some work with the STM32F207 and STM32F407 processors. We had been performing 3 DMAs simultaneously. Specifically: receiving data from the DCMI port, receiving data from the ADC, and either receiving data from or sending data to the SDIO port (SD card). What we saw happening was:

  • Sometimes the SDIO transfer would simply stop partway through. There was no error, and the card was not busy. There was no apparent reason for the data transfer to stop, but it would. Querying the SDIO status register SDIO_STA would indicate its transmit FIFO was empty (in the case of writing to the SD card) and that it was in the middle of a transfer.
  • Sometimes the ADC would report an ADC overrun error. This should be impossible. With the ADC data being read out via DMA, completely outside of any software control, there’s no way it should ever overflow. But sometimes it would.

We initially put these behaviours down to buggy peripherals, but one day we noticed the problems stopped when we turned off the DCMI port. Hmm, how puzzling. Further investigation showed that we had no problems if any 2 of these 3 peripherals were running, which led to the question: “what’s common between these 3 peripherals?” The obvious answer: the DMA. In this case, specifically DMA2.

It fits the symptoms very nicely. What we theorise is that when a peripheral needs a DMA service, it raises a flag which results in the DMA servicing that peripheral, either by writing data to it or reading data from it. If 3 peripherals raise their flags simultaneously, it can happen that one of them is “permanently forgotten”. It doesn’t get serviced, then or later. In the case of the ADC this results in an overrun. In the case of the SDIO this results in it eventually running its FIFO empty (when writing to an SD card). In the case of the DCMI we saw some apparent failures, but they proved hard to nail down, so I’ll reserve judgement there as to exactly what happens.

Based on this theoretical understanding, we put together a test case demonstrating the problem (on both the F2x and F4x) and sent it to ST at the beginning of Nov 2011. We heard back at the beginning of Dec that they’d replicated the problem, but we haven’t heard anything since.

Then towards the end of December I was speaking with another company using the STM32F2xx, and they just happened to mention a similar problem. They also had 3 simultaneous DMAs active; however, theirs were SPI, USART and SDIO. A different set of 3 from ours, and theirs were spread across both DMA1 and DMA2. Yet their basic symptom of unexplained data stoppages was the same.

Based on all this, I can only assume that any 3 simultaneous DMAs, from any one DMA controller or a mix of both DMA controllers, can produce this problem.

In our case, we took the lowest bandwidth peripheral, namely the ADC, and moved it to operating under interrupt. Because the STM32F407 ADC has no internal FIFO it can overrun very easily, hence its interrupt must be at the highest priority and be permitted to preempt other interrupts. Since we’ve done this all 3 peripherals have behaved normally.
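The post doesn’t show the interrupt-driven ADC code, so the following is only a minimal sketch of one common way to do it. The ADC end-of-conversion ISR pushes each sample into a single-producer / single-consumer ring buffer, and the main loop drains it. All names here are hypothetical. On an STM32F4 the ISR would be ADC_IRQHandler reading ADC1->DR, and you would raise its priority with something like NVIC_SetPriority(ADC_IRQn, 0). Those target-specific parts are left out so the buffer logic can be tested on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADC_RING_SIZE 256u   /* must be a power of two; holds SIZE - 1 samples */

typedef struct {
    volatile uint16_t buf[ADC_RING_SIZE];
    volatile uint32_t head;      /* next slot to write; only the ISR changes it */
    volatile uint32_t tail;      /* next slot to read; only the main loop changes it */
    volatile uint32_t overruns;  /* samples dropped because the buffer was full */
} adc_ring_t;

/* On target: called from ADC_IRQHandler with (uint16_t)ADC1->DR.
 * Keep it short, since the ADC has no FIFO to absorb a slow ISR. */
void adc_ring_push(adc_ring_t *r, uint16_t sample)
{
    uint32_t next = (r->head + 1u) & (ADC_RING_SIZE - 1u);
    if (next == r->tail) {       /* full: drop the sample but count it */
        r->overruns++;
        return;
    }
    r->buf[r->head] = sample;
    r->head = next;              /* publish only after the data is stored */
}

/* Main-loop side: returns false when no sample is waiting. */
bool adc_ring_pop(adc_ring_t *r, uint16_t *out)
{
    if (r->tail == r->head)
        return false;
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1u) & (ADC_RING_SIZE - 1u);
    return true;
}
```

Because only the ISR writes head and only the main loop writes tail, no locking is needed on a single-core Cortex-M.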

I need to re-emphasise that what we believe to be happening is no more than a theory based on the symptoms we’ve been seeing. Although ST has confirmed seeing these symptoms, they haven’t provided any information about what’s actually going on, so we’re flying in the dark to a certain degree. What I can say with some confidence is that you’d be very wise to limit the number of simultaneous DMAs to 2. At least until ST provides more information about what’s actually causing this.

Update June 2012:

In January 2012 ST provided a response in their user forums here:

It does not answer all the questions, but it seems to indicate the problem lies not with the number of DMAs, but instead might lie specifically with DMA2. They stated:

We confirm your findings and it is a limitation that concerns only our DMA2, and here is the detailed description :

DMA2 controller could corrupt data when managing AHB and APB2 peripherals in a concurrent way.

Description :
This case is somehow critical for peripherals embedding FIFO and generates data corruption. For memories, the impact is a multiple access but the data is not corrupted. AHB Peripherals embedding FIFO are DCMI, CRYPT, HASH. on STM32F2/40xx without CRYPTO only the DCMI is impacted.

Implications:
The data transferred by the DMA to the AHB peripherals could be corrupted in case
of a FIFO target or multiply accesses in case of memories access.

Workarounds :
Avoid concurrent AHB and APB2 transfer using DMA2. One of the following approach could be used to solve the issue:
* If DMA2 is used to manage AHB peripheral (DCMI, CRYPT, HASH), we can use the Cortex-M CPU to manage APB2 peripherals.
* If DMA2 is used to manage APB2 peripheral, we can use the CPU to manage AHB peripheral (DCMI, CRYPT, HASH).

Obviously, we will update our errata on web soon.

I don’t know how ST measures time – they say their errata will be updated “soon”, but as I write this it’s 5 months since they stated that, and their STM32 errata, at revision 2.0, does not contain this information, let alone any more detail about it. If anyone knows anything more, please post a comment.

Here’s the link to Part 1.

A very significant limitation with the STM32F4xx family (STM32F405 / 407 / 415 / 417) is that fully a third of its internal RAM is inaccessible to the DMA controller. Of the 192 kB of available RAM, only 128 kB can be accessed by the DMA. The other 64 kB, known as the CCM, cannot be read or written by DMA.
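If some buffers do end up in CCM, a cheap defensive check before handing a buffer to a DMA stream can save a lot of head-scratching. This hypothetical C helper assumes the STM32F405/407/415/417 memory map, where the 64 kB CCM sits at 0x10000000. On target you would pass (uint32_t)(uintptr_t)buf. Placing variables in CCM in the first place usually needs a dedicated section in your linker script (for example a .ccmram section used via __attribute__((section(".ccmram")))), which depends on your toolchain setup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32F405/407/415/417 memory map: 64 kB of CCM data RAM at 0x10000000. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE 0x00010000u

/* Hypothetical guard: true if any byte of [addr, addr + len) falls inside CCM,
 * meaning the buffer must NOT be given to a DMA stream. 64-bit math avoids wrap. */
bool buffer_touches_ccm(uint32_t addr, uint32_t len)
{
    uint64_t start = addr;
    uint64_t end = (uint64_t)addr + len;              /* exclusive */
    uint64_t ccm_end = (uint64_t)CCM_BASE + CCM_SIZE;
    return len != 0 && start < ccm_end && end > CCM_BASE;
}
```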

For a Cortex-M4 processor that is promoted using DSP type benchmarks (filters and FFTs etc), this is a glaring oversight. DSP type operations are all about reading data in, processing the data, and writing the resultant data out. Two of those three tasks require the DMA if they’re to be performed efficiently, and on the STM32F4xx family the DMA is unusable for a third of its RAM. For me personally, coming from a long DSP background, this stilted memory architecture is crazy beyond words.

Still, it’s not the first time the hardware designers have made life tough for the software folks, and it won’t be the last. We just have to deal with it as best we can. I’ve been attempting to get the SDIO SD Card interface working under interrupt, so that the additional 64 kB of RAM we’re paying for can be accessed by the SDIO. This post will share a few things I’ve learned.

ST SD Card Interrupt Examples

As far as I can find, there aren’t any. I’ve looked through both the STM32F2xx and STM32F4xx software examples, and they all use DMA exclusively for the data handling. If you come across any ST example code doing SD card data handling via interrupt, please let me know.

Double-Handling the Data

This is an option, and I have considered it. The idea would be (as an example):

  • DMA data from SD card into the 128 kB of RAM
  • Software copy the data into the 64 kB of RAM
  • process the data
  • Software copy the results from the 64 kB of RAM into the 128 kB of RAM
  • DMA the results from the 128 kB of RAM to the SD card

Obviously what I’ve listed is worst-case and ugly as sin. You really wouldn’t want to do it. Still, if you did, an efficient software-copy routine would be essential. This Stellaris forum posting contains details for a fast assembler Cortex-M3/M4 memory copy routine. I’ve played with it and it works well.
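The copy-process-copy flow above might look something like this in C. This is a hypothetical sketch. memcpy() stands in for the faster assembler copy routine mentioned above, and halve_samples() is a made-up processing step. On target, dma_buf would live in the DMA-accessible 128 kB SRAM while work would be placed in CCM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 128u   /* one 512-byte SD sector */

/* Example processing step (hypothetical): halve every sample in place. */
void halve_samples(uint32_t *data, size_t words)
{
    for (size_t i = 0; i < words; i++)
        data[i] /= 2u;
}

/* Steps 2 to 4 of the list: copy a DMA'd block into the CCM work buffer,
 * process it there, and copy the result back so the DMA can send it out. */
void double_handle_block(uint32_t *dma_buf, uint32_t *work,
                         void (*process)(uint32_t *, size_t))
{
    memcpy(work, dma_buf, BLOCK_WORDS * sizeof *work);   /* SRAM -> CCM */
    process(work, BLOCK_WORDS);                           /* CPU-only work in CCM */
    memcpy(dma_buf, work, BLOCK_WORDS * sizeof *work);   /* CCM -> SRAM */
}
```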

SDIO Requests More Data Than It Needs

If you’re using the STM32F2xx / STM32F4xx SDIO to transmit data to an SD Card, under interrupt you’ll probably be using the “transmit FIFO half empty” TXFIFOHE interrupt flag. When this triggers, you know your interrupt handler software needs to write 8 words (32 bytes) to the SDIO FIFO.

The problem is that the SDIO will request more data than what it actually requires, which could, if you’re not careful, result in you reading past the end of your data buffer, possibly generating some kind of a bus fault or hard fault. To explain, take a look at this example code snippet from within an SDIO interrupt handler:

if (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
  ptr = source_addr;    // address of source data to Tx to card
  while (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
    BUTTON_OUT_HIGH
    SDIO->FIFO = *ptr++;     // write first word (32 bits = 4 bytes) to the FIFO 
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;      // 8th word of data written to the SDIO FIFO
    BUTTON_OUT_LOW
  }
  source_addr = ptr;   // remember data position for next time
}

You can see it’s checking to see if the Tx FIFO Half-Empty flag is set, and if so, it writes 8 words (32 bytes) of data to the FIFO, updates its data pointer, and that’s it. We’ve made it slightly more efficient by wrapping it in the while() loop, so it does it repeatedly until the Tx FIFO is no longer needing more data – this allows it to more quickly fill the FIFO at startup when the FIFO is empty.

The BUTTON_OUT_xxxx sets a GPIO pin so we can see on the oscilloscope what’s happening.

When writing a single block / sector to the SD card, which is 512 bytes, we would expect to see 512 / 32 = 16 writes (of 32 bytes) to the FIFO. Let’s look at the scope:

STM32F407 SD Card Interface Write

There are a few things of great interest to be seen here.

At the start of the scope plot, on the left, we can see 4 writes in very quick succession. This is thanks to the while() loop in the code. The SDIO Tx FIFO is 32 words deep, so the TXFIFOHE remains set until the FIFO is full, which requires 4 sets of 8 words to be written. This is good – we’re getting the Tx FIFO filled very quickly.

If we count the total number of writes on the scope plot, we see 19. Huh? We expected to see 16; what gives? 19 means we’ve read 608 bytes from our data buffer (actually: right past the end of our data buffer) and given it to the SDIO; that’s too much for a 512 byte write. The reason is the title of this section: the SDIO requests more data than it needs. It appears the designers of the SDIO block did not give it the intelligence to compare its FIFO level with its DCOUNT register. If the FIFO contains sufficient empty space to accept another 8 words, it will set its TXFIFOHE flag to request more data, EVEN THOUGH IT DOES NOT NEED IT TO COMPLETE THE CURRENT TRANSFER. Be aware of this.

Changing our SDIO IRQ handler slightly to consider the DCOUNT register, for example like this:

if ((SDIO->STA & SDIO_FLAG_TXFIFOHE) && (SDIO->DCOUNT >= 32)) {

does not help, because we cannot know the amount of data currently held in the FIFO.

To deal with this, you need to keep your own “data remaining count” variable, which you can count down as you give data to the SDIO FIFO. Then when your count variable reaches zero, you should turn off the TXFIFOHE interrupt (by clearing its bit in the SDIO->MASK register).
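Here is a minimal C sketch of that approach. It is written against a hypothetical three-register stand-in (sdio_regs_t) so the logic can be exercised on a PC. On the STM32F4 you would use the CMSIS SDIO registers (SDIO->STA, SDIO->MASK, SDIO->FIFO) with the SDIO_FLAG_TXFIFOHE / SDIO_IT_TXFIFOHE bits, which are bit 14 in both registers. The key points are that the loop never writes more words than remain, and that TXFIFOHE is masked as soon as the count reaches zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three SDIO registers used here. On the STM32F4
 * use the CMSIS SDIO peripheral (SDIO->STA, SDIO->MASK, SDIO->FIFO) instead. */
typedef struct {
    volatile uint32_t STA;
    volatile uint32_t MASK;
    volatile uint32_t FIFO;
} sdio_regs_t;

#define STA_TXFIFOHE  (1u << 14)  /* same bit as SDIO_FLAG_TXFIFOHE */
#define MASK_TXFIFOHE (1u << 14)  /* same bit as SDIO_IT_TXFIFOHE */

typedef struct {
    const uint32_t *src;   /* next word of the caller's data buffer */
    size_t words_left;     /* our own "data remaining" count */
} sdio_tx_t;

/* Transmit part of the SDIO interrupt handler. */
void sdio_tx_service(sdio_regs_t *sdio, sdio_tx_t *tx)
{
    /* Stop when our data runs out, even if the SDIO still asks for more. */
    while (tx->words_left > 0 && (sdio->STA & STA_TXFIFOHE)) {
        size_t n = tx->words_left < 8 ? tx->words_left : 8;  /* 8 words = half the FIFO */
        for (size_t i = 0; i < n; i++)
            sdio->FIFO = *tx->src++;
        tx->words_left -= n;
    }
    if (tx->words_left == 0)
        sdio->MASK &= ~MASK_TXFIFOHE;   /* no more TXFIFOHE interrupts for this transfer */
}
```

With 20 words queued and TXFIFOHE held asserted, it writes 8 + 8 + 4 words and then masks the interrupt, rather than reading past the end of the buffer.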

Something else to note from this scope capture is the interrupt rate and CPU utilisation. In this example the SDIO clock is 20 MHz, meaning that with the 4-bit data bus we can write data to the card at 10 MB/s. Given that we’re writing 32 bytes at a time (except at the very beginning, where we write 4 times that), we calculate we’re writing data every 3.2 microseconds. The scope shot bears this out. This corresponds to an interrupt rate of 312.5 kHz! This is a very high rate for a small processor, and the CPU utilisation should be expected to be high. From the scope shot we can estimate we’re spending about 12% – 15% of our 120 MHz processor doing nothing except servicing these SDIO interrupts. It’s a steep price to pay for making so much RAM inaccessible to the DMA.
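The interrupt-rate arithmetic can be checked with a few lines of C. The helper names are illustrative, and the values are integer Hz and bytes in decimal units. The same numbers also give the CPU budget per interrupt: at 120 MHz and 312.5 kHz there are 384 CPU cycles between interrupts, so a 12 to 15% load corresponds to roughly 46 to 58 cycles spent in each one.

```c
#include <stdint.h>

/* Card write rate: SDIO clock x data-bus width, converted to bytes/s.
 * Divides by 8 first to stay within 32 bits (truncates if clk % 8 != 0). */
uint32_t sdio_bytes_per_s(uint32_t sdio_clk_hz, uint32_t bus_width_bits)
{
    return sdio_clk_hz / 8u * bus_width_bits;
}

/* One TXFIFOHE interrupt per `bytes_per_irq` bytes (32 = 8 FIFO words). */
uint32_t irq_rate_hz(uint32_t bytes_per_s, uint32_t bytes_per_irq)
{
    return bytes_per_s / bytes_per_irq;
}

/* CPU cycles available between consecutive interrupts. */
uint32_t cycles_between_irqs(uint32_t cpu_hz, uint32_t irq_hz)
{
    return cpu_hz / irq_hz;
}
```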

Tx FIFO Underrun

Getting data transmit (sending data to the card) to start up properly on the STM32F4xx / 2xx can be very tricky. Here’s my understanding.

When you enable the SDIO (via the DTEN bit in the SDIO_DCTRL register) the FIFO is empty. So the TXFIFOHE interrupt will trigger immediately, and at the same time the SDIO peripheral will start attempting to write data to the SD card. Hence data must appear in the Tx FIFO extremely quickly, otherwise a Tx FIFO underrun will occur and the SDIO peripheral will shut down.

It is not possible to pre-load the FIFO before enabling the SDIO. I’ve tried and it doesn’t work. I believe the FIFO is hardware-cleared until the SDIO is enabled, or something similar to that.

What this means is that at the moment of SDIO turn-on (when the DTEN bit is set), that TXFIFOHE interrupt must trigger. At that point in time it must be the highest priority interrupt in the system, or be the only interrupt. If it’s delayed for any reason, for example because another interrupt occurs at that time, then a Tx FIFO underrun will very quickly follow. Think very carefully about your enabled interrupts at that critical SDIO transmit start-up point. You may want to consider using the NVIC to make the SDIO be the highest priority interrupt, permitted to preempt all other interrupts. Or, come up with some other scheme to ensure that first TXFIFOHE interrupt can execute immediately.

SDIOIT Status Bit

The SDIO_STA status register contains the SDIOIT bit with a very vague description. I’ve seen this bit being set from time to time but I’ve never worked out what it means. If you understand what it actually represents, please let me know.

If Apple Wins We All Lose

I’m not generally one to repost articles and links to articles – you won’t find much of that on my blog. Nor am I particularly pro-this or anti-that when it comes to technologies, brands, companies, etc. I have Apple products and non-Apple products. I’ve used various flavours of Mac OS, of DOS, of Windows, of Unix, of Linux, and much more. It’s all good to me.

However, I’m not agnostic when it comes to the on-going development of technology. I’ve been in this business long enough to realise that every new development builds upon the developments that came before it. Like kicking steps up a snowbank, making each new step upwards requires you to stand in the step you’ve got. That’s the way innovation and technology develops, and it’s critical for everyone that this is able to continue.

Gizmodo recently posted this article:

If Apple Wins We All Lose

Yesterday’s news that courts had ruled against HTC in favor of Apple was a tidy little victory for Apple. But HTC is just an initial skirmish in a much larger fight. The real war is against Android, and if Apple wins that, we’ll all lose…

It’s well worth the read. Many patent systems and legal systems around the world are increasingly allowing existing “technology holders” to suppress and quash the development of new evolutionary technologies. The USA is the most obvious example of this, but it’s certainly not the only one. And the trends indicate this problem is getting worse – not better. Truly, when it comes to anti-competitive behaviour and legal policies, this is an area where indeed, “we all lose”.

Cortex-M3 / M4 Hard Fault Handler

If you’re seeing a Hard Fault exception on your Cortex-M3 or Cortex-M4 processor, this handler and information may help. I can’t take credit for it – this code was provided by Joseph Yiu on a few different forums, as well as in his book (The Definitive Guide to the ARM Cortex-M3). I’m simply providing some assistance on how to install and use it.

Hard Fault Handler Installation

These instructions work for an STM32F2xx or STM32F4xx processor using a GNU-based toolchain (eg Yagarto or Sourcery G++). They should work with other processors and toolchains but may require a small tweak – no doubt your compiler will be pleased to tell you if it’s not happy! As always with programming, the following is not the only way to do it – it’s simply the way I did it. If you want to rearrange things or do things a bit differently then feel free.

Joseph’s hard fault handler is in two pieces – a small piece of assembly, and a small piece of C. You need the processor’s hardfault exception vector to jump to the assembly, and then the assembly code will itself call the C code.

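Here’s the assembly code. It tests bit 2 of the EXC_RETURN value in LR to find out whether the exception stack frame was pushed onto the main stack (MSP) or the process stack (PSP), loads that stack pointer into R0, then branches to the C code, named hard_fault_handler_c, which receives R0 as a pointer to the stack frame.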

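.syntax unified
.cpu cortex-m3
.thumb

.text
.global HardFault_Handler
.extern hard_fault_handler_c
.type HardFault_Handler, %function
.thumb_func                   @ mark as Thumb code so the vector table entry gets bit 0 set

HardFault_Handler:
  TST LR, #4                  @ bit 2 of EXC_RETURN: 0 = MSP, 1 = PSP
  ITE EQ
  MRSEQ R0, MSP               @ frame was pushed onto the main stack
  MRSNE R0, PSP               @ frame was pushed onto the process stack
  B hard_fault_handler_c      @ R0 = first argument to the C handler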
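The four instructions boil down to a single bit test. As an illustration only, here is that decision modelled in host-side C; select_fault_stack is a made-up name and this is not something you’d put on the target:

```c
#include <stdint.h>

/* Illustrative model of the HardFault_Handler assembly (not target code).
 * exc_return is the value in LR on exception entry (EXC_RETURN).
 * Bit 2 clear -> the frame is on the main stack (MSP);
 * bit 2 set   -> the frame is on the process stack (PSP).
 * "TST LR, #4" followed by MRSEQ / MRSNE makes exactly this choice. */
uint32_t select_fault_stack(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & 0x4u) ? psp : msp;
}
```

For example, an EXC_RETURN of 0xFFFFFFF9 (return to Thread mode using MSP) selects MSP, while 0xFFFFFFFD (return to Thread mode using PSP) selects PSP. Bare-metal code that never switches to the process stack will always take the MSP branch.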

This assembly needs to run the moment the hard fault exception occurs, before any C code has had a chance to push anything onto the stack and move the stack pointer. For the STM32F processors, the vector table is in the ST-supplied file startup_stm32f2xx.s (or similar). If you look at the vectors list, you’ll see something like this:

g_pfnVectors:
  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler
  /* ...remaining vectors... */

Given that the name in the vector table is HardFault_Handler, we give the assembler code that name (and declare the name as a global so the linker can find it). If your vector table contains a different name, then change the name of the assembler code to suit.

You need to include this bit of assembler in your build. Just save the assembler code in its own .s file (eg: hardfault.s) and then include it in your build, the same way as your other .s files (like that startup file) are.

Now we need to add the C code. Here it is:

// From Joseph Yiu, minor edits by FVH
// hard fault handler in C,
// with stack frame location as input parameter
// called from HardFault_Handler in file xxx.s
#include <stdio.h>
#include "stm32f2xx.h"   // CMSIS device header, provides SCB (use the header for your part)

void hard_fault_handler_c (unsigned int * hardfault_args)
{
  // the eight words pushed by the hardware on exception entry
  unsigned int stacked_r0  = hardfault_args[0];
  unsigned int stacked_r1  = hardfault_args[1];
  unsigned int stacked_r2  = hardfault_args[2];
  unsigned int stacked_r3  = hardfault_args[3];
  unsigned int stacked_r12 = hardfault_args[4];
  unsigned int stacked_lr  = hardfault_args[5];
  unsigned int stacked_pc  = hardfault_args[6];
  unsigned int stacked_psr = hardfault_args[7];

  printf ("\n\n[Hard fault handler - all numbers in hex]\n");
  printf ("R0 = %x\n", stacked_r0);
  printf ("R1 = %x\n", stacked_r1);
  printf ("R2 = %x\n", stacked_r2);
  printf ("R3 = %x\n", stacked_r3);
  printf ("R12 = %x\n", stacked_r12);
  printf ("LR [R14] = %x  subroutine call return address\n", stacked_lr);
  printf ("PC [R15] = %x  program counter\n", stacked_pc);
  printf ("PSR = %x\n", stacked_psr);

  // System Control Block fault status / address registers
  printf ("BFAR = %lx\n", (*((volatile unsigned long *)(0xE000ED38))));  // valid only if CFSR bit 15 (BFARVALID) is set
  printf ("CFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED28))));
  printf ("HFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED2C))));
  printf ("DFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED30))));
  printf ("AFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED3C))));
  printf ("SCB_SHCSR = %lx\n", (unsigned long) SCB->SHCSR);

  while (1);   // stop here so the values can be read, or a debugger attached
}

This C code goes wherever the existing (previous) hard fault handler was. In our example, the vector table pointed to a function called HardFault_Handler. We are replacing that function with the assembler code, so the original HardFault_Handler function needs to be commented out (otherwise we’ll have two functions with the same name). For the STM32F2xx, all exception handlers are found in the file stm32f2xx_it.c, so comment out the function HardFault_Handler() in that C file, and paste the C code for Joseph’s hard_fault_handler_c() into the same file.

That’s it. In summary, you commented out the old hard fault handler, and you added in some assembly code and some C code instead. Try building your project and see what happens.

Note that this code will only work if the main stack pointer hasn’t been badly corrupted prior to the hard fault occurring – if the stack pointer is off in never-never land then the C handler may not work. In my experience this has never been a problem.

Hard Fault Handler Usage

The most useful thing the above handler gives you is the stacked program counter: the address of the instruction the processor was executing when the hard fault occurred. You can then look up that address in your listing file (or map file) to see which function and instruction it was. LR, the Link Register, is also useful: it contains the return address of the last function call, so it can show you where you came from to get to this point.
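If you’d rather read named fields than remember which hardfault_args index holds the PC, the stack frame can be described with a struct. This is a sketch: exception_frame_t and frame_pc are hypothetical names, not part of Joseph’s code or CMSIS. On a Cortex-M4 with the FPU active, the hardware pushes extra floating-point registers after these eight words, but the first eight stay in the same order.

```c
#include <stddef.h>
#include <stdint.h>

/* The first eight words of the exception stack frame on a Cortex-M3/M4,
 * in the same order hard_fault_handler_c() reads them. */
typedef struct {
    uint32_t r0;    /* hardfault_args[0] */
    uint32_t r1;    /* hardfault_args[1] */
    uint32_t r2;    /* hardfault_args[2] */
    uint32_t r3;    /* hardfault_args[3] */
    uint32_t r12;   /* hardfault_args[4] */
    uint32_t lr;    /* hardfault_args[5]: return address of the last call */
    uint32_t pc;    /* hardfault_args[6]: instruction executing at the fault */
    uint32_t psr;   /* hardfault_args[7] */
} exception_frame_t;

/* Eight 32-bit words with no padding. */
_Static_assert(sizeof(exception_frame_t) == 8 * sizeof(uint32_t),
               "exception frame must be 32 bytes");

/* Hypothetical accessor for the stacked PC. */
uint32_t frame_pc(const exception_frame_t *frame)
{
    return frame->pc;
}
```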

A few tips.

The typical reason for a hard fault is actually a bus error that was escalated to a hard fault (the separate bus fault handler is disabled by default), because software tried to access an invalid region of memory. There are a couple of common ways of doing this.

One is a bad pointer. It might be uninitialised, or not properly bounded (you ran off the end of an array for example).

Another, more subtle, way of getting a bad pointer is by being sloppy with the scope of variables. This is quite common with main() but can happen anywhere. For example, you define a buffer or a structure as a local variable within main(), but then you access that buffer or structure from some other function (by passing a pointer to it). This is dangerous – you need to declare the buffer or structure as “static” if you want to do that. Otherwise the buffer or structure might be optimised away and not actually exist when that other function tries to access it. If that structure contained a pointer for example, that would become a bad pointer.
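A minimal sketch of the safe version, using hypothetical names (start_receive, app_init, g_rx_buffer) for a buffer whose address is handed to code that runs later, such as an interrupt handler:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: a pointer that outlives the function that set it
 * up, e.g. a buffer later filled by an interrupt handler. */
static uint8_t *g_rx_buffer;   /* used later by other code */
static size_t   g_rx_length;

void start_receive(uint8_t *buf, size_t len)
{
    g_rx_buffer = buf;         /* only the pointer is kept, not the data */
    g_rx_length = len;
}

void app_init(void)
{
    /* "static" gives the buffer a fixed address for the whole program run,
     * so g_rx_buffer stays valid after app_init() returns. A plain local
     * here would leave g_rx_buffer pointing at reused stack space. */
    static uint8_t rx_buffer[64];
    start_receive(rx_buffer, sizeof rx_buffer);
}
```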

Imprecise Bus Fault

For the Cortex M3, an imprecise bus fault (as indicated by bit 10 in the CFSR register) means that a write to an invalid address was attempted. If you look at the program counter, the naughty naughty write is usually present in the 3 or so instructions leading up to the program counter address. Because of the Cortex M3 write buffer system, the program counter might have advanced slightly before the actual bus write took place, hence you need to look back slightly to find the erroneous write.
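To read the CFSR value printed by the handler without counting bits by hand, here is a small decoder for the bus fault bits (CFSR bits 8 to 15). The bit positions are the Cortex-M3/M4 ones; the enum and function names are my own illustrative choices. Note that for an imprecise fault BFARVALID is not set, so the BFAR value printed by the handler shouldn’t be trusted.

```c
#include <stdint.h>

/* Bus Fault Status bits within the CFSR (0xE000ED28) on Cortex-M3/M4. */
#define CFSR_IBUSERR      (1u << 8)    /* bus error on instruction fetch */
#define CFSR_PRECISERR    (1u << 9)    /* precise data bus error */
#define CFSR_IMPRECISERR  (1u << 10)   /* imprecise data bus error */
#define CFSR_UNSTKERR     (1u << 11)   /* bus error while unstacking on exception return */
#define CFSR_STKERR       (1u << 12)   /* bus error while stacking on exception entry */
#define CFSR_BFARVALID    (1u << 15)   /* BFAR holds the faulting address */

typedef enum {
    BUS_FAULT_NONE,
    BUS_FAULT_PRECISE,     /* stacked PC is the faulting instruction */
    BUS_FAULT_IMPRECISE,   /* look a few instructions before the stacked PC */
    BUS_FAULT_OTHER        /* fetch, stacking or unstacking error */
} bus_fault_kind_t;

bus_fault_kind_t classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_IMPRECISERR)
        return BUS_FAULT_IMPRECISE;
    if (cfsr & CFSR_PRECISERR)
        return BUS_FAULT_PRECISE;
    if (cfsr & (CFSR_IBUSERR | CFSR_UNSTKERR | CFSR_STKERR))
        return BUS_FAULT_OTHER;
    return BUS_FAULT_NONE;
}

/* The printed BFAR is only meaningful when this returns nonzero. */
int bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}
```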

The discussion around the STM32F2xx and STM32F4xx DMA controllers is becoming a bit too large for a single posting, so here we are with Part 2!

Clearing a DMA Interrupt

In theory this is very simple – in practice not so much. The documentation for the “DMA low interrupt flag clear register” (DMA_LIFCR) and the “DMA high interrupt flag clear register” (DMA_HIFCR) shows bits such as CTCIFx: Stream x clear transfer complete interrupt flag (x = 3..0). The documentation says to write a “1” to the bit to clear the interrupt flag. That is correct.

However, I found that in certain circumstances the interrupt kept recurring immediately, even though I was clearing it.

DMA Interrupt Re-entering Reason #1

There is a well-known reason for this. The Cortex-M3 / M4 is a pipelined processor. If you clear the interrupt at the very bottom of your interrupt service routine, immediately before you exit, the clear might not have propagated through to the peripheral before your interrupt routine exits, meaning the interrupt is immediately triggered again.

It’s important to ensure the interrupt clear has actually happened before you exit your interrupt code. The easy way to do this is to clear the interrupt at the very top of your interrupt handler, before you do anything else. Then the time taken by executing your interrupt handler code allows lots of time for the interrupt clear to propagate. If this isn’t possible, or if your interrupt handler is extremely short, write to the interrupt clear register, then do a while() loop on the interrupt flag register to wait until the clear happens. Note that if you do this, make sure you put a timeout of some description in your while() loop to prevent a hangup. The interrupt clear should normally take effect very quickly.
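Here is a sketch of the clear-then-poll approach with a bounded loop. The function name and the max_polls parameter are hypothetical; it’s written against plain volatile pointers so it can be exercised off-target. On an STM32F2xx / STM32F4xx the register pair for stream 1 would be DMA2->LIFCR and DMA2->LISR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write 1s to clear the flags in `mask`, then poll the matching status
 * register until those flags read back as zero, reading it at most
 * max_polls times. Returns true if the clear took effect in time. */
bool clear_flags_and_wait(volatile uint32_t *clear_reg,
                          const volatile uint32_t *status_reg,
                          uint32_t mask, uint32_t max_polls)
{
    *clear_reg = mask;                     /* write-1-to-clear */
    while (max_polls-- > 0u) {
        if ((*status_reg & mask) == 0u)
            return true;                   /* clear has propagated */
    }
    return false;                          /* timed out: flag still set */
}
```

For stream 1 this would be called as clear_flags_and_wait(&DMA2->LIFCR, &DMA2->LISR, 0x00000F40, 1000); the limit of 1000 polls is an arbitrary choice.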

DMA Interrupt Re-entering Reason #2

My interrupt re-entering was not caused by #1 above, and I spent a frustrating day hunting down the reason, finally discovering reason #2.

My interrupt handler code looked generally like this:

void DMA2_Stream1_IRQHandler (void)
{
  DMA2->LIFCR = (uint32_t)0x00000F40;      // clear DMA IRQ flags
 
  ... do a bunch of stuff ....
 
  // setup the DMA for a new transfer - DMA currently on so turn it off first
  DMA2->S1CR = (uint32_t)0x02025410;      // need to turn off DMA first to change settings
  while (DMA2->S1CR & DMA_CR_EN);         // wait until DMA is actually off
 
  DMA2->S1M0AR = ...     
  DMA2->S1NDTR = ...     
  ...
  DMA2->S1CR = (uint32_t)0x02065510;      // everything setup but not enabled     
  DMA2->LIFCR = (uint32_t)0x00000F40;     // clear any pending (old) DMA2 Stream 1 interrupts
  DMA2->S1CR = (uint32_t)0x02065511;      // everything setup and enabled   
}

You can see two interrupt clears taking place. One at the top of the interrupt handler (as it should be), and a second clear shortly before I re-enable the DMA controller, to ensure nothing is pending when the DMA is started up again.

Remember too, that the STM32F2xx / STM32F4xx DMA won’t even start if it has a pending interrupt (see STM32F2xx DMA Controllers Part 1) so clearing the interrupt flags is not optional – it must be done (even if you’re not using interrupts).

This second clear is not necessarily essential – it’s just for my own peace of mind, to be sure the DMA will start up correctly.
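For reference, the 0x00000F40 written to LIFCR in the handler is every clear bit for stream 1. Each stream has the same group of five clear bits (CFEIF, CDMEIF, CTEIF, CHTIF and CTCIF at relative bits 0, 2, 3, 4 and 5; bit 1 is reserved), and the groups start at bits 0, 6, 16 and 22 for streams 0 to 3 in LIFCR, with the same layout for streams 4 to 7 in HIFCR. A small helper (a hypothetical name, not part of ST’s library) can compute the mask instead of hard-coding it:

```c
#include <stdint.h>

/* Bit offset of each stream's flag group within LIFCR (streams 0-3)
 * or HIFCR (streams 4-7). */
static const uint8_t k_dma_flag_offset[4] = { 0u, 6u, 16u, 22u };

/* CFEIF (bit 0), CDMEIF (bit 2), CTEIF (bit 3), CHTIF (bit 4) and
 * CTCIF (bit 5) before shifting; bit 1 is reserved. */
#define DMA_STREAM_ALL_FLAGS 0x3Du

/* Value to write to LIFCR (stream 0-3) or HIFCR (stream 4-7) to clear
 * every interrupt flag of the given stream. */
uint32_t dma_clear_all_mask(unsigned stream)
{
    return DMA_STREAM_ALL_FLAGS << k_dma_flag_offset[stream & 3u];
}
```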

The reason the interrupt immediately re-enters is here:

// setup the DMA for a new transfer - DMA currently on so turn it off first
DMA2->S1CR = (uint32_t)0x02025410;      // need to turn off DMA first to change settings

In this example the DMA is currently enabled, because it is running in a double-buffer / circular mode. I turn off the DMA, and that sets the DMA TCIF flag again. The TCIF flag was set earlier, which triggered this interrupt; I cleared it at the top of the interrupt handler; and now turning off the DMA has set it a second time. In the DMA configuration register, bit 4 (“TCIE: Transfer complete interrupt enable”) is set, so when the TCIF flag becomes set it is passed on to the NVIC (the Cortex-M3 / M4 Nested Vectored Interrupt Controller).

Hence at this point, even though the STM32F2xx / STM32F4xx DMA has been turned off, the simple act of turning it off has caused a new DMA interrupt to be pending in the NVIC. Because this particular code is in an interrupt service routine, this pending interrupt hasn’t executed yet, but it will the moment we exit our interrupt handler. This is the source of the endlessly recurring DMA interrupt.

There are a few possible solutions to this problem. Here’s one:

void DMA2_Stream1_IRQHandler (void)
{
  DMA2->LIFCR = (uint32_t)0x00000F40;      // clear DMA IRQ flags
 
  ... do a bunch of stuff ....
 
  // setup the DMA for a new transfer - DMA currently on so turn it off first
  DMA2->S1CR = (uint32_t)0x02025410;      // need to turn off DMA first to change settings
  while (DMA2->S1CR & DMA_CR_EN);         // wait until DMA is actually off
 
  NVIC_ClearPendingIRQ (DMA2_Stream1_IRQn);    // clear pending DMA IRQ from the NVIC
 
  DMA2->S1M0AR = ...     
  DMA2->S1NDTR = ...     
  ...
  DMA2->S1CR = (uint32_t)0x02065510;      // everything setup but not enabled     
  DMA2->LIFCR = (uint32_t)0x00000F40;     // clear any pending (old) DMA2 Stream 1 interrupts
  DMA2->S1CR = (uint32_t)0x02065511;      // everything setup and enabled   
}

Using the NVIC_ClearPendingIRQ() function works in this example because we’re in an interrupt service routine, so the new interrupt is pending but hasn’t had the opportunity to execute yet. Using NVIC_ClearPendingIRQ() in non-interrupt code (i.e. your regular main code) would not work, because the interrupt will trigger the moment the DMA is turned off. There you need to stop the interrupt from reaching the NVIC in the first place, either by making sure bit 4 (TCIE: Transfer complete interrupt enable) in the DMA configuration register is clear (zero) before turning off the DMA, or by disabling the DMA interrupt in the NVIC before turning off the DMA. For example:

NVIC_DisableIRQ (DMA2_Stream1_IRQn);

(I personally haven’t tried this one though, because NVIC_ClearPendingIRQ() worked in my case.)
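The TCIE route for non-interrupt code comes down to a one-line register calculation. This is an untested sketch: the macro and function names are hypothetical (chosen so they don’t clash with ST’s own DMA_SxCR_* definitions), and it assumes you then wait for EN to read back as zero, clear the stale flags in LIFCR, and set TCIE again together with EN when restarting, as in the handler above.

```c
#include <stdint.h>

#define DMA_CR_EN_BIT    (1u << 0)   /* stream enable */
#define DMA_CR_TCIE_BIT  (1u << 4)   /* transfer complete interrupt enable */

/* Given the current stream configuration value, return the value that
 * turns the stream off with TCIE also cleared, so the TCIF flag set by
 * the disable is not forwarded to the NVIC. */
uint32_t dma_cr_stop_quietly(uint32_t cr)
{
    return cr & ~(DMA_CR_EN_BIT | DMA_CR_TCIE_BIT);
}
```

Applied to the configuration used above, 0x02065511 becomes 0x02065500: the same settings, but disabled and with the transfer complete interrupt masked.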

STM32F DMA interrupts can be a source of great confusion – hopefully this post has clarified things a little.

Some time ago I built a signal generator, which is a reasonably handy bit of equipment. But I was periodically frustrated with not having a suitable probe to easily inject a signal into a circuit. The standard BNC – BNC cable has its uses, but isn’t too great for that kind of thing. And a dangling bit of wire causes nothing but trouble.

What I really wanted was an oscilloscope probe. Scope probes have a nice sharp tip on them, they have a ground lead with a clip, they even have a hooky tip for grabbing onto things. Scope probes are great. But some experiments proved that a scope probe makes a terrible signal generator cable, because scope probes contain resistor-divider circuits, trimmer capacitors, etc, which get in the way of the signal generator output.

So I decided to hack a scope probe, to build a signal generator cable / probe. From DealExtreme I ordered a cheap oscilloscope probe – their SKU 32422 for a big $6. About three weeks later it landed in my mailbox. I quickly found it can easily be taken apart. Like so:

Oscilloscope probe disassembled
The strain relief (on the left of the photo) can simply be pulled off. That holds the little square box covering the PCB in place – with the strain relief gone the cover now slides off as well. A minute of quality time with the soldering iron removes the resistors and trimmer capacitor on the PCB (done before I took these photos), then I soldered a wire across the middle of the PCB to connect the BNC connector back to the centre of the cable. Here’s a close-up:

hacked oscilloscope probe

Then the PCB cover and strain relief were slipped back into place, and the X1/X10 switch on the probe handle was placed in the X1 position, then wrapped in tape to make sure it couldn’t be accidentally switched to X10. When the switch is in the X10 position a big resistor is switched in series with the tip – don’t want that! Finally I stuck a label on the probe so I won’t confuse it with the regular scope probes. Done!

This is awesome. For a few dollars and maybe 10 minutes’ work we get a nice signal generator probe complete with sharp tip, hooky tip, ground lead, etc. Can’t beat that.