Frank's Random Wanderings

A bunch of random musings, with a leaning towards electronics & computers.

I’ve recently been working on a project with a small 320×240 TFT display connected to a microcontroller, in this case an ARM Cortex-M0+. Being able to display images is a really nice feature, but images can consume a lot of space in the microcontroller’s limited flash memory.

The controller chip built into the display accepts RGB565 data, which is to say, 16 bits per pixel. One idea to reduce the size of the images is to use RGB332, which is 8 bits per pixel. For displaying photographs this drop in color depth will be visually noticeable, however for icons and similar things RGB332 would be just fine.
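
To make the two formats concrete, here is a minimal sketch of packing 8-bit-per-channel colour into each one. The function names are made up for illustration, and the RGB332 layout assumed is RRRGGGBB, the same bit order the conversion program further down unpacks.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB332 (RRRGGGBB): keep the top 3/3/2 bits. */
static uint8_t pack_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

/* Pack 8-bit R, G, B into RGB565 (RRRRRGGGGGGBBBBB): keep the top 5/6/5 bits. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

At 320×240, a full-screen image works out to 153,600 bytes in RGB565 and 76,800 bytes in RGB332.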

Images can be converted to RGB332 using the free download program “Image2LCD” found here:

Now the microcontroller needs a fast way of converting the 8-bit RGB332 “pixel data” of the saved image file back into the 16-bit RGB565 “pixel data” that the display needs to receive. The obvious answer is a lookup table. I couldn’t quickly find such a table on the web so I spent a little time on the weekend writing a program to generate a table. Here’s my resulting RGB332 to RGB565 lookup table:

const Uint16 RGB332to565lookupTable[256] = {
    0x0000, 0x000a, 0x0015, 0x001f, 0x0120, 0x012a, 0x0135, 0x013f, 
    0x0240, 0x024a, 0x0255, 0x025f, 0x0360, 0x036a, 0x0375, 0x037f, 
    0x0480, 0x048a, 0x0495, 0x049f, 0x05a0, 0x05aa, 0x05b5, 0x05bf, 
    0x06c0, 0x06ca, 0x06d5, 0x06df, 0x07e0, 0x07ea, 0x07f5, 0x07ff, 
    0x2000, 0x200a, 0x2015, 0x201f, 0x2120, 0x212a, 0x2135, 0x213f, 
    0x2240, 0x224a, 0x2255, 0x225f, 0x2360, 0x236a, 0x2375, 0x237f, 
    0x2480, 0x248a, 0x2495, 0x249f, 0x25a0, 0x25aa, 0x25b5, 0x25bf, 
    0x26c0, 0x26ca, 0x26d5, 0x26df, 0x27e0, 0x27ea, 0x27f5, 0x27ff, 
    0x4800, 0x480a, 0x4815, 0x481f, 0x4920, 0x492a, 0x4935, 0x493f, 
    0x4a40, 0x4a4a, 0x4a55, 0x4a5f, 0x4b60, 0x4b6a, 0x4b75, 0x4b7f, 
    0x4c80, 0x4c8a, 0x4c95, 0x4c9f, 0x4da0, 0x4daa, 0x4db5, 0x4dbf, 
    0x4ec0, 0x4eca, 0x4ed5, 0x4edf, 0x4fe0, 0x4fea, 0x4ff5, 0x4fff, 
    0x6800, 0x680a, 0x6815, 0x681f, 0x6920, 0x692a, 0x6935, 0x693f, 
    0x6a40, 0x6a4a, 0x6a55, 0x6a5f, 0x6b60, 0x6b6a, 0x6b75, 0x6b7f, 
    0x6c80, 0x6c8a, 0x6c95, 0x6c9f, 0x6da0, 0x6daa, 0x6db5, 0x6dbf, 
    0x6ec0, 0x6eca, 0x6ed5, 0x6edf, 0x6fe0, 0x6fea, 0x6ff5, 0x6fff, 
    0x9000, 0x900a, 0x9015, 0x901f, 0x9120, 0x912a, 0x9135, 0x913f, 
    0x9240, 0x924a, 0x9255, 0x925f, 0x9360, 0x936a, 0x9375, 0x937f, 
    0x9480, 0x948a, 0x9495, 0x949f, 0x95a0, 0x95aa, 0x95b5, 0x95bf, 
    0x96c0, 0x96ca, 0x96d5, 0x96df, 0x97e0, 0x97ea, 0x97f5, 0x97ff, 
    0xb000, 0xb00a, 0xb015, 0xb01f, 0xb120, 0xb12a, 0xb135, 0xb13f, 
    0xb240, 0xb24a, 0xb255, 0xb25f, 0xb360, 0xb36a, 0xb375, 0xb37f, 
    0xb480, 0xb48a, 0xb495, 0xb49f, 0xb5a0, 0xb5aa, 0xb5b5, 0xb5bf, 
    0xb6c0, 0xb6ca, 0xb6d5, 0xb6df, 0xb7e0, 0xb7ea, 0xb7f5, 0xb7ff, 
    0xd800, 0xd80a, 0xd815, 0xd81f, 0xd920, 0xd92a, 0xd935, 0xd93f, 
    0xda40, 0xda4a, 0xda55, 0xda5f, 0xdb60, 0xdb6a, 0xdb75, 0xdb7f, 
    0xdc80, 0xdc8a, 0xdc95, 0xdc9f, 0xdda0, 0xddaa, 0xddb5, 0xddbf, 
    0xdec0, 0xdeca, 0xded5, 0xdedf, 0xdfe0, 0xdfea, 0xdff5, 0xdfff, 
    0xf800, 0xf80a, 0xf815, 0xf81f, 0xf920, 0xf92a, 0xf935, 0xf93f, 
    0xfa40, 0xfa4a, 0xfa55, 0xfa5f, 0xfb60, 0xfb6a, 0xfb75, 0xfb7f, 
    0xfc80, 0xfc8a, 0xfc95, 0xfc9f, 0xfda0, 0xfdaa, 0xfdb5, 0xfdbf, 
    0xfec0, 0xfeca, 0xfed5, 0xfedf, 0xffe0, 0xffea, 0xfff5, 0xffff 
};

The microcontroller code then simply has to:

  • Read a byte from the image
  • Use the byte as an index into the table to read a word from the table
  • Write the word to the display

And repeat for every pixel data byte in the image.
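
The steps above can be sketched roughly as follows. This is only an illustration: the function names are invented, the pixels go to a plain buffer rather than a real display, and the table is built at run time (using the same scaling values as the generator program in this post) purely so the example stands alone. On the microcontroller you would pass the const table from flash instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Same scaling tables as the generator program in this post. */
static const uint8_t b3to6[8] = { 0, 9, 18, 27, 36, 45, 54, 63 };
static const uint8_t b3to5[8] = { 0, 4, 9, 13, 18, 22, 27, 31 };
static const uint8_t b2to5[4] = { 0, 10, 21, 31 };

/* Fill a 256-entry RGB332 -> RGB565 table (stand-in for the const table). */
static void build_table(uint16_t table[256])
{
    for (int i = 0; i < 256; i++) {
        uint16_t r = b3to5[(i >> 5) & 0x07];
        uint16_t g = b3to6[(i >> 2) & 0x07];
        uint16_t b = b2to5[i & 0x03];
        table[i] = (uint16_t)((r << 11) | (g << 5) | b);
    }
}

/* Read a byte, look it up, write the word; repeat for every pixel. */
static void expand_rgb332(const uint16_t table[256], const uint8_t *src,
                          uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = table[src[i]];   /* on real hardware: send to the display */
    }
}
```

The inner loop is one load, one indexed load and one store per pixel, which is why a lookup table is attractive on a small Cortex-M0+.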

For those who want to generate the table themselves, here’s my little C program:

// Creates a file of 256 entries being a lookup table for RGB332 to RGB565 conversion.
// (c) Frank Van Hooft 2015
// Free software - do as you like with it.
#include <stdio.h>
typedef unsigned int Uint16;
Uint16 ConvertRGB332toRGB565(unsigned char rgb332);
const unsigned char b3to6lookup[8] = { 0, 9, 18, 27, 36, 45, 54, 63 };
const unsigned char b3to5lookup[8] = { 0, 4, 9, 13, 18, 22, 27, 31 };
const unsigned char b2to5lookup[4] = { 0, 10, 21, 31 };
int main()
{
	FILE *fp;
	int i;
	printf("\nGenerating file RGB332to565lookuptable.c... ");
	fp = fopen("RGB332to565lookuptable.c", "w");
	if (fp == NULL) {			// could not create the output file
		printf("failed\n");
		return 1;
	}
	fprintf(fp, "const Uint16 RGB332to565lookupTable[256] = {");
	for (i = 0; i<256; i++) {
		if (!(i % 8)) fprintf(fp, "\n    ");	// 8 entries per line
		fprintf(fp, "0x%.4x, ", ConvertRGB332toRGB565(i));
	}
	fprintf(fp, "\n};\n");
	fclose(fp);
	return 0;
}

Uint16 ConvertRGB332toRGB565(unsigned char rgb332)
{
	Uint16 red, green, blue;
	red = (rgb332 & 0xe0) >> 5;		// rgb332 3 red bits now right justified
	red = (Uint16)b3to5lookup[red];		// 3 bits converted to 5 bits
	red = red << 11;			// red bits now 5 MSB bits
	green = (rgb332 & 0x1c) >> 2;		// rgb332 3 green bits now right justified
	green = (Uint16)b3to6lookup[green];	// 3 bits converted to 6 bits
	green = green << 5;			// green bits now 6 "middle" bits
	blue = rgb332 & 0x03;			// rgb332 2 blue bits are right justified
	blue = (Uint16)b2to5lookup[blue];	// 2 bits converted to 5 bits, right justified
	return (Uint16)(red | green | blue);
}

Notice this conversion table is not optimised for any particular purpose. The resulting lookup table is a “palette conversion” – it’s mapping an 8-bit color palette to a 16-bit color palette. Obviously with an 8-bit starting point, only 256 colors from the 65536 possible 16-bit colors can be selected. For a general purpose conversion there is no perfect answer – something like the above is about as good as you can do. The above lookup table attempts to cover the 16-bit RGB565 color space as evenly as possible.

However, if you’re doing this conversion for a specific image it can be possible to optimise the palette for it. For example, if your image was mostly shades of red, you could place a lot more “shades of red” 16-bit RGB565 entries into your lookup table (at the expense of green and blue entries). Some early computer games, like Doom, used 8-bit color, and optimised their palettes for the types of images they were displaying. Those who have played Doom know it’s lots of shades of brown. There’s no end to the tweaking that can be done for specific applications.
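
If you do go down the custom-palette road, each source pixel has to be mapped to the closest entry in whatever palette you chose. Here is a rough sketch of that step, assuming a plain squared-distance match in RGB565 space. The function names are hypothetical, and a real tool would probably use a perceptual colour metric instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared distance between two RGB565 colours, with red and blue scaled
   up by 2 so all three channels sit on green's 6-bit scale. */
static uint32_t rgb565_distance(uint16_t a, uint16_t b)
{
    int dr = 2 * ((int)((a >> 11) & 0x1F) - (int)((b >> 11) & 0x1F));
    int dg = (int)((a >> 5) & 0x3F) - (int)((b >> 5) & 0x3F);
    int db = 2 * ((int)(a & 0x1F) - (int)(b & 0x1F));
    return (uint32_t)(dr * dr + dg * dg + db * db);
}

/* Return the index of the palette entry closest to the given colour. */
static size_t nearest_palette_index(const uint16_t *palette, size_t count,
                                    uint16_t colour)
{
    size_t best = 0;
    uint32_t best_distance = UINT32_MAX;
    for (size_t i = 0; i < count; i++) {
        uint32_t d = rgb565_distance(palette[i], colour);
        if (d < best_distance) {
            best_distance = d;
            best = i;
        }
    }
    return best;
}
```

This runs on the PC when preparing the image; the microcontroller still only does the table lookup.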

Recently I’ve been trying to use Freescale’s C90TFS Flash Driver for erasing and programming on-chip flash on their Kinetis MKL25Z128 Cortex-M0+ processor. This is the same processor as used on the Freescale Freedom board. At the time of writing, the latest version of the TFS flash driver is 1.03.

The documentation states it’s been tested and verified on a pretty long list of processors and demo boards, including the Kinetis freedom board, Coldfire processors, etc. To be honest, I kinda doubt that, for a reason which will become apparent in a moment.

I added the appropriate driver files to my existing Kinetis CodeWarrior project, then wrote some simple code, based on the included NormalDemo_Flash.c code, to test it. It didn’t work.

Bug in file SSD_FTFx_Internal.h

In file user_cfg.h you must specify a #define FLASH_DERIVATIVE. This basically specifies your processor. Based on this single entry, the file SSD_FTFx_Internal.h contains a long set of if – else if to pull in an appropriate .h configuration file which itself contains a long list of definitions specific to your exact processor. The C90TFS flash library supports several different CPU cores, a bunch of different processors, etc, so this is a key part of making the library work correctly.

Unfortunately the file SSD_FTFx_Internal.h contains a bad bug.

/* Select file .h for each derivatives */
#if (FTFx_KX_256K_256K_4K_2K_2K == FLASH_DERIVATIVE)
    #include "FTFx_KX_256K_256K_4K_2K_2K.h"
#elif (FTFx_KX_128K_128K_4K_2K_2K == FLASH_DERIVATIVE)
    #include "FTFx_KX_128K_128K_4K_2K_2K.h"
#elif ((FTFx_KX_512K_0K_4K_2K_0K == FLASH_DERIVATIVE) || (FTFx_KX_256K_0K_4K_2K_0K == FLASH_DERIVATIVE))
    #include "FTFx_KX_(512_256)K_0K_4K_2K_0K.h"
#elif (FTFx_CX_256K_32K_2K_1K_1K == FLASH_DERIVATIVE)
    #include "FTFx_CX_256K_32K_2K_1K_1K.h"
#elif ((FTFx_CX_128K_32K_2K_1K_1K == FLASH_DERIVATIVE) || (FTFx_CX_64K_32K_2K_1K_1K == FLASH_DERIVATIVE)\
    || (FTFx_CX_32K_32K_2K_1K_1K == FLASH_DERIVATIVE))
    #include "FTFx_CX_(128_64_32)K_32K_2K_1K_1K.h"
#elif (FTFx_KX_512K_512K_16K_4K_4K == FLASH_DERIVATIVE)
    #include "FTFx_KX_512K_512K_16K_4K_4K.h"
#elif ((FTFx_KX_1024K_0K_16K_4K_0K == FLASH_DERIVATIVE) || (FTFx_KX_1024K_0K_4K_4K_0K == FLASH_DERIVATIVE)\
    || (FTFx_KX_2048K_0K_4K_4K_0K))
    #include "FTFx_KX_(2048_1024)K_0K_(16_4)K_4K_0K.h"
#elif ((FTFx_KX_128K_0K_2K_1K_0K == FLASH_DERIVATIVE)||(FTFx_KX_64K_0K_2K_1K_0K == FLASH_DERIVATIVE)\
    ||(FTFx_KX_32K_0K_2K_1K_0K == FLASH_DERIVATIVE))
    #include "FTFx_KX_(128_64_32)K_0K_2K_1K_0K.h"
#elif ((FT......

In the little snippet above, the 6th line from the bottom:

    || (FTFx_KX_2048K_0K_4K_4K_0K))

should be:

    || (FTFx_KX_2048K_0K_4K_4K_0K == FLASH_DERIVATIVE))

Because of this bug, if your processor selection isn’t one of the first few in the list, you’ll end up pulling in the include file FTFx_KX_(2048_1024)K_0K_(16_4)K_4K_0K.h, which is for a Cortex-M processor with the larger 4k flash block sizes, etc. Not suitable for a MKL25Z128 with 1k flash sectors, certainly not suitable for many other processors. How this could work at all for other processors like the Coldfires, I’ve currently got no idea.

Like most bugs, once found it’s an easy fix to edit that line.
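
To see why the missing comparison matters, here is a stripped-down sketch of the same pattern. The DERIV_x names and values are made up, and it assumes the real derivative macros expand to nonzero integers, which is what the behaviour described above suggests.

```c
/* Hypothetical derivative IDs; the real ones live in the driver headers. */
#define DERIV_A 1
#define DERIV_B 2
#define DERIV_C 3

#define FLASH_DERIVATIVE DERIV_C   /* the part we actually selected */

/* Buggy form: the last term has no comparison, so it is just "2",
   which the preprocessor treats as true for every FLASH_DERIVATIVE. */
#if ((DERIV_A == FLASH_DERIVATIVE) || (DERIV_B))
    #define BUGGY_BRANCH_TAKEN 1
#else
    #define BUGGY_BRANCH_TAKEN 0
#endif

/* Fixed form: every term is compared against FLASH_DERIVATIVE. */
#if ((DERIV_A == FLASH_DERIVATIVE) || (DERIV_B == FLASH_DERIVATIVE))
    #define FIXED_BRANCH_TAKEN 1
#else
    #define FIXED_BRANCH_TAKEN 0
#endif

int buggy_branch_taken(void) { return BUGGY_BRANCH_TAKEN; }
int fixed_branch_taken(void) { return FIXED_BRANCH_TAKEN; }
```

With DERIV_C selected, the buggy condition is still true, so every later #elif in the chain is skipped.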

Executing Code from RAM

For processors like the MKL25Z128, you cannot execute code from flash at the same time as a flash sector is being erased or programmed. This has some ramifications.

First, there’s a driver function FlashCommandSequence() which must live in RAM. It’s called by the flash driver – you don’t directly call it. But you must copy it into RAM. There’s an example of how to do that in the NormalDemo_Flash.c file, but it involves creating a separate section in RAM in the linker file. If you’d prefer not to fiddle with the linker file, here’s another way.

#define SIZE_OF_FLASH_CMD_SEQ 0x50		// Assumed size (plus padding) of function FlashCommandSequence() - check map file
// Global SRAM memory where we put code to be executed from RAM. Code is copied into
// here when using the flash_write() function below.
uint8_t ram_for_command_sequence_func[SIZE_OF_FLASH_CMD_SEQ]  __attribute__ ((aligned (4)));
uint8_t *source;
uint32_t i;
/* copy FlashCommandSequence routine code to RAM, so later on it can execute from RAM */
source = (uint8_t*)((uint32_t)FlashCommandSequence & 0xfffffffe);	// clear the Thumb bit to get the real code address
for(i=0; i < SIZE_OF_FLASH_CMD_SEQ; i++, source++)
    ram_for_command_sequence_func[i] = *source;
pFlashCommandSequence = (pFLASHCOMMANDSEQUENCE)((uint8_t *)ram_for_command_sequence_func + 1);	// set the Thumb bit again for calling

Hopefully this is sufficient to give you the idea of how to do it. Looking at my .map file, the size of FlashCommandSequence() was 0x30, so allowing 0x50 above gave lots of slack in case things might change a little. Then you pass the address of the function in RAM to the flash library through the sector erase and program functions, for example like so:

returnCode = FlashEraseSector(&flashSSDConfig, ADDR_DATA_FLASH_SECTOR, size, pFlashCommandSequence);
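
The "& 0xfffffffe" and "+ 1" in the copy code deserve a word. On Cortex-M parts a function pointer has bit 0 set to mark Thumb code, so the actual instruction bytes start one below the pointer value, and any pointer you call through needs that bit set again. A small sketch of just that address arithmetic follows; the helper names are invented and the addresses in the usage note are arbitrary examples.

```c
#include <stdint.h>

/* Address of the first instruction byte: strip the Thumb bit (bit 0). */
static uintptr_t thumb_code_start(uintptr_t function_pointer)
{
    return function_pointer & ~(uintptr_t)1;
}

/* Value to call through for Thumb code placed at ram_address: set bit 0. */
static uintptr_t thumb_entry_point(uintptr_t ram_address)
{
    return ram_address | (uintptr_t)1;
}
```

For example, a function pointer of 0x00000411 means the code starts at 0x00000410, and a copy placed at 0x1FFFF000 must be called via 0x1FFFF001.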

Note that in the Freescale flash library, you’ll find a couple of source files for all the functions. You’ll find one is pre-compiled; each function is just an array of bytes. The other is normal C source code. I initially tried using the precompiled version of FlashCommandSequence() because I thought it would be convenient to know the exact size. However that function did not work on the MKL25Z128. Perhaps it was compiled for a different CPU; I don’t know. When I switched to using the C version of FlashCommandSequence(), compiling it normally as part of my project and copying it into RAM as shown above, it worked as expected.

Callbacks & Interrupts

The library functions can execute your code while they’re waiting for flash erases and programs to take place. This is done by giving them a pointer to a callback function. On many processors, like the Freedom board processor, the callback would also need to be in RAM, because on these processors you cannot execute code from flash while erasing or programming a flash sector.

For this reason, it’s probably more trouble than it’s worth. Set the callback function to NULL_CALLBACK so the callback function isn’t used. Like so:

    FLASH_SSD_CONFIG flashSSDConfig =
    {
        FTFx_REG_BASE,          /* FTFx control register base */
        PFLASH_BLOCK_BASE,      /* base address of PFlash block */
        PBLOCK_SIZE,            /* size of PFlash block */
        DEFLASH_BLOCK_BASE,     /* base address of DFlash block */
        0x0U,                   /* size of DFlash block */
        EERAM_BLOCK_BASE,       /* base address of EERAM block */
        0x0U,                   /* size of EEE block */
        DEBUGENABLE,            /* background debug mode enable bit */
        NULL_CALLBACK           /* pointer to callback function - none for us */
    };

Similarly, you should make sure no interrupts might trip off which might execute code from flash during that time. The simplest thing to do is just to globally disable interrupts when you’re erasing or programming flash, then reenable interrupts afterwards. Here are a couple of macros which work in CodeWarrior for the Kinetis to do that.

#define EnableInterrupts asm(" CPSIE i");
#define DisableInterrupts asm(" CPSID i");

A usage example:

DisableInterrupts;
returnCode = FlashEraseSector(&flashSSDConfig, ADDR_DATA_FLASH_SECTOR, size, pFlashCommandSequence);
EnableInterrupts;

Yes, globally turning interrupts off and on is a bit crude, but in a lot of cases it’s quite good enough. Of course you can get much more fancy if you want.
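
One step up in fanciness is to remember whether interrupts were already disabled and restore that state afterwards, so the flash routines can be called safely from code that has interrupts off. The sketch below shows the idea with a plain variable standing in for the Cortex-M PRIMASK register so it can run on a PC. The helper names are hypothetical, and on the target the marked lines would become PRIMASK reads and writes.

```c
#include <stdint.h>

/* Host stand-in for the PRIMASK register (1 = interrupts masked). */
static uint32_t sim_primask = 0;

/* Mask interrupts and return the previous mask state. */
static uint32_t critical_enter(void)
{
    uint32_t previous = sim_primask;   /* on target: read PRIMASK */
    sim_primask = 1;                   /* on target: CPSID i */
    return previous;
}

/* Put the mask back the way critical_enter() found it. */
static void critical_exit(uint32_t previous)
{
    sim_primask = previous;            /* on target: write PRIMASK */
}
```

Wrapping the erase call between critical_enter() and critical_exit() then leaves interrupts exactly as they were, whichever state that was.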

It’s taken me a while to get this flash library working reliably. Between the bug and the sometimes ambiguous documentation, it’s not quite trivial. Hopefully this will help others get flash erasing and writing working. Once you know the tricks, it’s pretty straightforward.

There’s a new project posted on the main website – a simple RF signal generator that (mostly) covers the range 640 MHz to 5.8 GHz.

RF Signal Generator

The project can be considered as being in two halves. First is the RF half. This is provided by a Linear Technology LTC6946 demo board. The 6946 is a PLL synthesizer. Put simply, you program its internal control registers with suitable PLL multiplication / divide factors, and it outputs the resulting frequency. The synthesizer is programmed via an SPI port.

The second half is a Texas Instruments Stellaris Cortex-M3 development board with a 3.5″ touch-screen color display. It’s programmed to be the user interface – you can enter your desired frequency on the touchscreen, it calculates the values for the various LTC6946 internal registers, then programs them via SPI. It provides a really nice user interface to what is otherwise a very simple project. Then everything is wrapped up in a simple metal box.

Full information on the build, including the source code for the software, is provided on the main site. Enjoy!

I’ve been doing some work with a Kinetis MKL25Z128 Cortex-M0+ processor. I love what you get for the very small amount of money this thing costs. It even contains a 16-bit A/D converter, which is very rare at this price point. Along with everything else, it contains a USB port.

There are a number of Freescale application notes about how to use the USB port, implement bootloaders, etc. To be honest a lot of them are a bit of a mess, and if you frequent the Freescale forums you’ll see plenty of discussions with people asking pretty basic questions (like, why doesn’t the sample code even compile). So to find something actually working is a bit of a rarity.

Freescale has an appnote called AN4764 which describes a very small USB bootloader. The whole thing compiles down to only around 4 kB of flash which is quite impressive. They also provide a simple PC application to work with it. The source code, appnote, etc, can be found here:

AN4764 Simple HID Bootloader

The source code zip file in the link above contains prebuilt images for most processors, but for some reason the MKL25Z128 processor (also known as the “Freedom board” processor) is missing. IAR is required to build the projects. (Why Freescale used the IAR compiler instead of their own CodeWarrior compiler is another unsolved mystery.) I installed the small (32 kB code size) free version of IAR and built the MKL25Z128 project. To save others from needing to do the same, I’ve posted the entire project, including the executable and the appnote PDF, here:

Click here to download the files.

The function Check_Bootloader_Mode() in file kl25z_bootloader.c has been modified to remove most of the checks it was doing. It used to calculate & compare a checksum on the application, plus it looked to see if a couple of words of flash were erased. Now all it does is look at GPIO bit PE31 – if that pin is low it runs the USB bootloader, and if that pin is high it runs the application. That’s it.

For a quick summary of how to use it: program it into your processor. To make the USB bootloader run, GPIO PE31 needs to be low after reset (i.e. put a switch on PE31 and hold it low during reset or power-on). When plugged into a Windows PC it’ll be recognised as a HID device. Then run the PC application to download application code. Your application code needs to be built for a start address of 0x1000, seeing as the bootloader lives at the bottom 3.5 kB of flash. Do that by editing your linker file. Then somewhere at the beginning of your application code you need to point the processor to your new interrupt vector table (if you’re using interrupts), like so:

SCB_VTOR = (uint32)0x1000;
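
For background on what lives at 0x1000: a Cortex-M vector table starts with the initial stack pointer in word 0 and the reset handler address in word 1, followed by the exception and interrupt vectors. The sketch below reads those two words from an image and does a basic plausibility check. It is an illustration only, not what the AN4764 code does; the names are hypothetical, and the table is passed in as an array so it can run on a PC.

```c
#include <stdint.h>

#define APP_BASE 0x1000u   /* application start address used in this post */

typedef struct {
    uint32_t initial_sp;     /* vector table word 0 */
    uint32_t reset_handler;  /* vector table word 1 (bit 0 set = Thumb) */
} app_entry_t;

/* Pull the two start-up words out of an application's vector table. */
static app_entry_t read_app_entry(const uint32_t *vector_table)
{
    app_entry_t entry = { vector_table[0], vector_table[1] };
    return entry;
}

/* Rough check: erased flash reads 0xFFFFFFFF, and a real reset vector
   has the Thumb bit set and points at or above APP_BASE. */
static int app_entry_plausible(app_entry_t entry)
{
    return (entry.initial_sp != 0xFFFFFFFFu)
        && ((entry.reset_handler & 1u) != 0)
        && (entry.reset_handler >= APP_BASE);
}
```

On the target, the pointer passed in would simply be the address 0x1000 cast to a const uint32_t pointer.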

Once your application starts, interrupts will be disabled. In CodeWarrior you can use the following macros to enable and disable interrupts:

#define EnableInterrupts asm(" CPSIE i");
#define DisableInterrupts asm(" CPSID i");

That’s it. This is a super-easy way to get a USB bootloader running on a Freescale Kinetis processor.

Update Oct 2014
I’ve made a minor update to the bootloader (link to the file is above) to correct a problem where sometimes, if you don’t have an external pullup resistor on Port E bit 31, the bootloader might not always jump to your code. The reason was the code didn’t always wait long enough after a power-on, when using the internal weak pull-up. I’ve increased the delay a little (just a millisecond or two) to give it a little extra time to detect PE31 as being high. Ideally you should have an external pull-up resistor on PE31, but if you want to save a penny and don’t have one, this minor change helps.

Older versions of FreeNAS would email periodic status reports, however the newer 9.x versions no longer do this. FreeNAS now only sends emails when something strange or bad happens. It makes sense – lots of “everything’s fine” emails tend to numb you and you end up deleting every email from the FreeNAS system, including potentially the interesting ones.

Nonetheless, a weekly email can be nice. Specifically I’d like a weekly email containing the SMART information for the harddisks, so I can see their temperatures, whether any are starting to reallocate sectors (which is an early failure warning), etc. A weekly email with that kind of information would be pretty useful.

This thread on the FreeNAS forums provides a lot of information on how to do it, and it’s pretty simple to set up. But some of the information in there is a bit outdated, plus they suggest placing your script in a location that gets overwritten when you update FreeNAS. So here we’ll do things a little more simply. Here’s my script:

#!/bin/sh
# emails a status report
(
echo "To: YOURNAME@YOUREMAIL.COM"
echo "Subject: Freenas status report"
echo " "
) > /var/cover
echo "Drive Reports for 2 disks:" >> /var/cover
echo " " >> /var/cover
smartctl -a /dev/ada0 >> /var/cover
echo " " >> /var/cover
smartctl -a /dev/ada1 >> /var/cover
echo " " >> /var/cover
echo "ZPool Status:" >> /var/cover
zpool status >> /var/cover
zpool get all YOURPOOLNAME >> /var/cover
sendmail -t < /var/cover
exit 0

You can see the script is calling smartctl twice, once for each of the two disks in the system. The script also calls “zpool status” and “zpool get all” to obtain the health and status of the specified data pool. “/var/cover” is simply a temporary file used to store all the data – at the end of the script it’s passed to sendmail to email everything out.

To use the script you only need change two things: your email address, and the name of your storage pool. Beyond that you can of course edit it to your heart’s delight.

Save your script in your storage pool, i.e. in your shared folder. The full directory path will probably start with /mnt. For example:

There are a couple of advantages to this. One is your zpool doesn’t get overwritten when you update FreeNAS so it’s as easy as always to do updates and you won’t lose your script. The other is by putting it in your zpool your script is naturally backed-up as part of your backups (you do do backups right??).

Then set up a cron job (easy to do through the FreeNAS user interface) and specify something like this to call your script (remember to select the cron user as ‘root’):

/bin/sh /mnt/share/something/somethingelse/

That’s it! Enjoy your status emails.


NAS (Network Attached Storage) devices, also known as home servers, are pretty common these days. They’re great for having a central point to store files; as a media server for holding photos and home movies; as a place to store backups; etc. Since 2008 I’ve had a 2-bay Synology device, the DS207+. It’s been both reliable and simple to use, but a source of frustration has been my reliance upon Synology to keep its software updated, fix bugs, etc. Technology marches on and Synology no longer supports these units. Synology’s latest software stream is at 4.x and my DS207+ is stuck at 3.x; there are no more feature improvements or bug fixes for me.

The DS207+ is also starting to look rather slow, so it’s high time for a new NAS. This time around I want to avoid the proprietary solutions from companies like Synology, QNAP, etc, and go with something completely open so I can modify it, update it, etc to my heart’s content for as long as I choose. At the same time I’m looking for higher performance, both in network throughput as well as error resiliency. It’s time to roll my own, and this post describes what I ended up with.

UER – The Dreaded “Uncorrectable Error Rate”

We’re all aware that harddisks are not perfect devices – they can fail completely, so backups are super important. What’s less commonly known, but growing in importance as harddisks get larger, is the UER – the uncorrectable error rate.

The simple fact is that harddisks store data on magnetic media, and sometimes that data gets corrupted. Actually, it gets corrupted quite often, so harddisks have sophisticated error detection & correction algorithms (Reed-Solomon encoding etc) to deal with those errors so you never see them. Well, mostly. Sometimes an error happens that the disk cannot correct – this is an uncorrectable error.

The uncorrectable error rate is given in the datasheet for the harddisk, and it’s typically on the order of 1 bit in 10^14 for consumer drives. Some are better, some are worse. Consider that 10^14 bits is around 12 TB. Back when you had an 80 GB disk, you’d have to read the entire contents of your disk quite a few times to have a chance of suffering an uncorrectable error. Today 4 TB disks are quite common, and reading that entire disk only 3 times would result in a sizeable chance of suffering an uncorrectable error.
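
The arithmetic behind those figures is simple enough to sketch. The function below (a made-up name) gives the expected number of uncorrectable bit errors for a given amount of data read. It assumes the datasheet rate applies uniformly and independently to every bit, which is a simplification, and uses decimal units (1 TB = 10^12 bytes).

```c
/* Expected uncorrectable bit errors when reading bytes_read bytes from a
   drive rated at one error per bits_per_error bits. */
static double expected_errors(double bytes_read, double bits_per_error)
{
    return bytes_read * 8.0 / bits_per_error;
}
```

Reading a 4 TB disk three times is 12 TB, which gives about 0.96 expected errors at 1 in 10^14. One pass over an 80 GB disk gives about 0.0064.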

Depending upon the error, you might never notice. If it happened in the middle of a video you would probably never see it. However if it happened in the first few bytes of that video it might corrupt the header and make the video unplayable. If it happened in a program file the program might crash. If it happened in part of the disk filesystem then the impact could be much worse.

Most NAS filesystems blithely assume the disk is perfect. My Synology is like that, as are the vast majority of consumer NAS’s. However some filesystems have the ability to checksum all data coming off the disk, and thereby can detect an uncorrectable harddisk error, even if it happens within the filesystem data itself, and repair it from a redundant copy (such as the other half of a mirror). The most well-known of these filesystems is called ZFS. I want a system that uses the ZFS filesystem.

DRAM Errors

Harddisks are not the only things that can corrupt your data. Imagine copying your latest home movie from your camera to your NAS. The video data is stored in the DRAM of the NAS during the process of being written to disk. If a DRAM error occurs, then an error will be written to disk in the middle of your movie, and you probably won’t even know.

The error rate of DRAM is higher than the error rate of a harddisk. A 2009 Google study found an average of more than 3000 errors per DRAM stick (DIMM) per year. That’s in a busy server which is always switched on; a home computer is not typically always on and hence the error rate will be lower. Nonetheless, DRAM errors are very real and surprisingly common.

The main solution for DRAM errors is to use ECC DRAM. It’s slightly more expensive than non ECC DRAM (approximately 20% more expensive in my experience), but it has the ability to automatically correct single-bit errors, which is what most DRAM errors are.

Being able to use ECC DRAM requires both a processor and a motherboard chipset that support ECC DRAM. Hence this is a driving force when choosing which hardware to buy.

The Build – Software

There are quite a few pre-packaged NAS software builds available. Some are free, some are not (but the paid ones are still inexpensive), some are open-source and some are closed. Some have good support and some have minimal. I spent quite a bit of time reviewing all I could find, which was a lot. In the end I selected what is likely the most popular build out there: FreeNAS. It’s free, very popular, well supported with lots of help available, a strong feature set, and meets all my requirements (and then some).

One constraint derived from the FreeNAS decision was their recommendation to choose a motherboard with an Intel network chip. Some other network chips do work, but for a trouble-free install an Intel network chip comes highly recommended.

The Build – Hardware

I already had a PC case and decent power supply, so I needed a processor, motherboard and RAM. My choice:

  • Processor: Intel G3220 3 GHz LGA1150 Haswell $65
  • Motherboard: Supermicro LGA1150 X10SLL-F $158
  • RAM: Kingston ValueRAM KVR1333D3E9S/4G DDR3-1333 ECC $57 x 2 = $114 (8 GB total)

As you can see the most expensive component was the motherboard. Although I could find much cheaper LGA1150 motherboards, I couldn’t find any that explicitly said they supported ECC RAM, nor did they have Intel network interface chips. Although I paid full price, this motherboard does go on sale. I saw it on sale for less than $100 at one point.

To offer up a slightly lower-cost alternative, I’ve seen the following system used by some folks on the FreeNAS forums. Costs listed are what I see on newegg. This is AMD based (not Intel) but it obviously does work.

  • Processor: AMD FX-4300 (3.8GHz) $120
  • Motherboard: Asus M5A78L-M/USB3 $70
  • RAM: as above

So in the end I spent $337 (plus tax) to put together my new NAS. For that money I was quite happy – it’s a similar cost to what I’d spend on another Synology or QNAP, yet fills the criteria I listed at the beginning of this post. In my view, much better bang for the buck.

The only thing remaining is to stick in some drives. I chose Western Digital Green drives, and put them in a simple mirror arrangement. Yes there are “better” drives, but these are cheap, relatively reliable (from the newegg customer reviews), relatively low-power, and due to the simple mirror arrangement are easy to replace if one fails.

How Fast Is It?

My home network uses gigabit ethernet. The only speed number I care about is reading from the system. So I read about 12 GB of data from the freenas server to my PC and timed it. Then repeated the exercise to see if I got the same result. Which was: a read speed of 410 Mbits/sec. (This required the data to pass through two network switches, and my PC is not exactly cutting-edge any longer. Still, it’s a decent number as far as I’m concerned.)
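
The read-speed figure is just data moved divided by elapsed time. A minimal sketch of that calculation follows; the function name is made up, and the 234-second duration in the usage note is back-calculated from the numbers above for illustration, not a recorded timing.

```c
/* Throughput in megabits per second (decimal: 1 Mbit = 10^6 bits). */
static double throughput_mbit_per_s(double bytes_moved, double seconds)
{
    return bytes_moved * 8.0 / seconds / 1e6;
}
```

12 GB (12 × 10^9 bytes) in roughly 234 seconds comes out at about 410 Mbits/sec.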

And The Synology?

It’s still in use. These days it contains a couple of old but still good drives in a JBOD arrangement. About once a week I power it on to perform a backup (automated, using a shell script) from the freenas to the synology. Although it’s much slower than the freenas and I wouldn’t want to use it for daily tasks, it makes a good backup machine. If lightning ever strikes the freenas system, taking out everything including its mirrored drives, there will still be a copy of the data on the synology. As far as I’m concerned, you can never have too many backups!

It’s funny how a topic as apparently mundane as the DMA controllers on the STM32F2xx and STM32F4xx processors can be such a can of worms. I’ve already provided 2 postings on the subject, here and here, and now we have a third.

This one is a biggie. From what we’re seeing, it appears you can only have 2 DMA transactions taking place at any one time. It appears that having 3 simultaneous DMA transfers causes intermittent failures.

To explain….

We’ve been doing some work with the STM32F207 and STM32F407 processors. We had been performing 3 DMAs simultaneously. Specifically: receiving data from the DCMI port, receiving data from the ADC, and either receiving or sending data to the SDIO port (SD card). What we saw happening was:

  • Sometimes the SDIO transfer would simply stop partway through. There was no error, and the card was not busy. There was no apparent reason for the data transfer to stop, but it would. Querying the SDIO status register SDIO_STA would indicate its transmit FIFO was empty (in the case of writing to the SD card) and that it was in the middle of a transfer.
  • Sometimes the ADC would report an ADC overrun error. This should be impossible. With the ADC data being read out via DMA, completely outside of any software control, there’s no way it should ever overflow. But sometimes it would.

We initially put these behaviours down to buggy peripherals, but one day we noticed the problems stopped when we turned off the DCMI port. Hmm, how puzzling. Further investigation showed that we had no problems as long as only 2 of these 3 peripherals were running, whichever 2 they were, which led to the question: “what’s common between these 3 peripherals?” The obvious answer: the DMA. In this case, specifically DMA2.

It fits the symptoms very nicely. What we theorise is that when a peripheral needs a DMA service, it raises a flag which results in the DMA servicing that peripheral, either by writing it data or reading data from it. If 3 peripherals raise their flags simultaneously, it can happen that one of them is “permanently forgotten”. It doesn’t get serviced, then or later. In the case of the ADC this results in an overrun. In the case of the SDIO this results in it eventually running its FIFO empty (when writing to a SD card). In the case of the DCMI we saw some apparent failures, but they proved hard to nail down so I’ll reserve judgement there as to exactly what happens.

Based on this theoretical understanding, we put together a test case demonstrating the problem (on both the F2x and F4x) and sent it to ST at the beginning of Nov 2011. We heard back at the beginning of Dec that they’d replicated the problem, but we haven’t heard anything since.

Then towards the end of December I was speaking with another company using the STM32F2xx and they just happened to mention a similar problem. They also had 3 simultaneous DMAs active, however theirs were: SPI, USART and SDIO. A different set of 3 to us, and theirs were spread across both DMA1 and DMA2. Yet their basic symptom of unexplained data stoppages was the same.

Based on all this, I can only assume that any 3 simultaneous DMAs, from any one DMA controller or a mix of both DMA controllers, can produce this problem.

In our case, we took the lowest bandwidth peripheral, namely the ADC, and moved it to operating under interrupt. Because the STM32F407 ADC has no internal FIFO it can overrun very easily, hence its interrupt must be at the highest priority and be permitted to preempt other interrupts. Since we’ve done this all 3 peripherals have behaved normally.

I need to re-emphasise that what we believe to be happening is no more than a theory based on the symptoms we’ve been seeing. Although ST has confirmed seeing these symptoms, they haven’t provided any information about what’s actually going on, so we’re flying in the dark to a certain degree. What I can say with some confidence is that you’d be very wise to limit the number of simultaneous DMAs to 2. At least until ST provides more information about what’s actually causing this.

Update June 2012:

In January 2012 ST provided a response in their user forums here:

It does not answer all the questions, but it seems to indicate the problem lies not with the number of DMAs, but instead might lie specifically with DMA2. They stated:

We confirm your findings and it is a limitation that concerns only our DMA2, and here is the detailed description :

DMA2 controller could corrupt data when managing AHB and APB2 peripherals in a concurrent way.

Description :
This case is somehow critical for peripherals embedding FIFO and generates data corruption. For memories, the impact is a multiple access but the data is not corrupted. AHB Peripherals embedding FIFO are DCMI, CRYPT, HASH. on STM32F2/40xx without CRYPTO only the DCMI is impacted.

The data transferred by the DMA to the AHB peripherals could be corrupted in case
of a FIFO target or multiply accesses in case of memories access.

Workarounds :
Avoid concurrent AHB and APB2 transfer using DMA2. One of the following approach could be used to solve the issue:
* If DMA2 is used to manage AHB peripheral (DCMI, CRYPT, HASH), we can use the Cortex-M CPU to manage APB2 peripherals.
* If DMA2 is used to manage APB2 peripheral, we can use the CPU to manage AHB peripheral (DCMI, CRYPT, HASH).

Obviously, we will update our errata on web soon.

I don’t know how ST measures time – they say their errata will be updated “soon”, but as I write this it’s 5 months since they stated that, and their STM32 errata, at revision 2.0, does not contain this information, let alone any more detail about it. If anyone knows anything more, please post a comment.

Here’s the link to Part 1.

A very significant limitation with the STM32F4xx family (STM32F405 / 407 / 415 / 417) is that fully a third of its internal RAM is inaccessible to the DMA controller. Of the 192 kB of available RAM, only 128 kB can be accessed by the DMA. The other 64 kB, known as the CCM, cannot be read or written by DMA.

For a Cortex-M4 processor that is promoted using DSP type benchmarks (filters and FFTs etc), this is a glaring oversight. DSP type operations are all about reading data in, processing the data, and writing the resultant data out. Two of those three tasks require the DMA if they’re to be performed efficiently, and on the STM32F4xx family the DMA is unusable for a third of its RAM. For me personally, coming from a long DSP background, this stilted memory architecture is crazy beyond words.

Still, it’s not the first time the hardware designers have made life tough for the software folks, and it won’t be the last. We just have to deal with it as best we can. I’ve been attempting to get the SDIO SD Card interface working under interrupt, so that the additional 64 kB of RAM we’re paying for can be accessed by the SDIO. This post will share a few things I’ve learned.

ST SD Card Interrupt Examples

As far as I can find, there aren’t any. I’ve looked through both the STM32F2xx and STM32F4xx software examples, and they all use DMA exclusively for the data handling. If you come across any ST example code doing SD card data handling via interrupt, please let me know.

Double-Handling the Data

This is an option, and I have considered it. The idea would be (as an example):

  • DMA data from SD card into the 128 kB of RAM
  • Software copy the data into the 64 kB of RAM
  • process the data
  • Software copy the results from the 64 kB of RAM into the 128 kB of RAM
  • DMA the results from the 128 kB of RAM to the SD card

Obviously what I’ve listed is worst-case and ugly as sin. You really wouldn’t want to do it. Still, if you did, an efficient software-copy routine would be essential. This Stellaris forum posting contains details for a fast assembler Cortex-M3/M4 memory copy routine. I’ve played with it and it works well.
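To give a feel for what the software-copy step involves, here is a minimal C sketch. It is not the assembler routine from the forum posting, just a plain word-at-a-time copy, unrolled four words per pass. The name copy_words is made up for this example, and it assumes both buffers are word-aligned, do not overlap, and that the length is given in whole 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 'nwords' 32-bit words from src to dst.
   Assumes both pointers are word-aligned and the regions do not overlap. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    /* Main loop: four words per pass to cut loop overhead. */
    while (nwords >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        nwords -= 4;
    }
    /* Tail: up to three leftover words. */
    while (nwords--) {
        *dst++ = *src++;
    }
}
```

For a 512 byte sector you would call it with nwords = 128. How you place one of the buffers in the 64 kB of CCM RAM depends on your linker script and toolchain, so that part is not shown.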

SDIO Requests More Data Than It Needs

If you’re using the STM32F2xx / STM32F4xx SDIO to transmit data to an SD Card, under interrupt you’ll probably be using the “transmit FIFO half empty” TXFIFOHE interrupt flag. When this triggers, you know your interrupt handler software needs to write 8 words (32 bytes) to the SDIO FIFO.

The problem is that the SDIO will request more data than what it actually requires, which could, if you’re not careful, result in you reading past the end of your data buffer, possibly generating some kind of a bus fault or hard fault. To explain, take a look at this example code snippet from within an SDIO interrupt handler:

  ptr = source_addr;    // address of source data to Tx to card
  while (SDIO->STA & SDIO_STA_TXFIFOHE)    // Tx FIFO half empty: room for 8 more words
  {
    SDIO->FIFO = *ptr++;     // write first word (32 bits = 4 bytes) to the FIFO
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;
    SDIO->FIFO = *ptr++;     // 8th word of data written to the SDIO FIFO
  }
  source_addr = ptr;   // remember data position for next time

You can see it’s checking to see if the Tx FIFO Half-Empty flag is set, and if so, it writes 8 words (32 bytes) of data to the FIFO, updates its data pointer, and that’s it. We’ve made it slightly more efficient by wrapping it in the while() loop, so it does it repeatedly until the Tx FIFO is no longer needing more data – this allows it to more quickly fill the FIFO at startup when the FIFO is empty.

The BUTTON_OUT_xxxx sets a GPIO pin so we can see on the oscilloscope what’s happening.

When writing a single block / sector to the SD card, which is 512 bytes, we would expect to see 512 / 32 = 16 writes (of 32 bytes) to the FIFO. Let’s look at the scope:

STM32F407 SD Card Interface Write

There are a few things of great interest to be seen here.

At the start of the scope plot, on the left, we can see 4 writes in very quick succession. This is thanks to the while() loop in the code. The SDIO Tx FIFO is 32 words deep, so the TXFIFOHE remains set until the FIFO is full, which requires 4 sets of 8 words to be written. This is good – we’re getting the Tx FIFO filled very quickly.

If we count the total number of writes on the scope plot, we see 19. Huh? We expected to see 16; what gives? 19 means we’ve read 608 bytes from our data buffer (actually: right past the end of our data buffer) and given it to the SDIO; that’s too much for a 512 byte write. The reason is the title of this section: the SDIO requests more data than it needs. It appears the designers of the SDIO block did not give it the intelligence to compare its FIFO level with its DCOUNT register. If the FIFO contains sufficient empty space to accept another 8 words, it will set its TXFIFOHE flag to request more data, EVEN THOUGH IT DOES NOT NEED IT TO COMPLETE THE CURRENT TRANSFER. Be aware of this.

Changing our SDIO IRQ handler slightly to consider the DCOUNT register, for example like this:


does not help, because we cannot know the amount of data currently held in the FIFO.

To deal with this, you need to keep your own “data remaining count” variable, which you can count down as you give data to the SDIO FIFO. Then when your count variable reaches zero, you should turn off the TXFIFOHE interrupt (by clearing its bit in the SDIO->MASK register).
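Here is a minimal sketch of that count-down idea, written so it can be tried on a PC as well as on the target. The names tx_state_t and tx_feed_fifo are made up for this example, and the FIFO register is passed in as a pointer (on the real part that would be the address of SDIO->FIFO), so treat it as an illustration under those assumptions rather than a drop-in handler.

```c
#include <stdint.h>

#define FIFO_BURST_WORDS 8u   /* TXFIFOHE means there is room for 8 words */

/* Hypothetical transfer state kept by the driver, not by the SDIO. */
typedef struct {
    const uint32_t *src;        /* next word to send */
    uint32_t words_remaining;   /* our own "data remaining" count */
} tx_state_t;

/* Call this from the interrupt handler each time TXFIFOHE is seen.
   'fifo' points at the FIFO data register.
   Returns 1 while more data is still to come, and 0 once everything
   has been handed over and the TXFIFOHE interrupt should be masked. */
int tx_feed_fifo(tx_state_t *st, volatile uint32_t *fifo)
{
    uint32_t n = st->words_remaining;

    if (n > FIFO_BURST_WORDS) {
        n = FIFO_BURST_WORDS;     /* never write more than the FIFO asked for */
    }
    st->words_remaining -= n;     /* count down what we are about to send */

    while (n--) {
        *fifo = *st->src++;       /* one 32-bit word into the FIFO */
    }
    return st->words_remaining != 0;
}
```

For a 512 byte block you would start with words_remaining = 128. The function then writes exactly 16 bursts of 8 words and never reads past the end of the buffer, no matter how many more times the SDIO raises TXFIFOHE. When it returns 0, clear the TXFIFOHE bit in SDIO->MASK as described above.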

Something else to note from this scope capture is the interrupt rate and CPU utilisation. In this example the SDIO clock is 20 MHz, meaning we can write data to the card at 10 MB/s (over a 4-bit wide bus: 20 MHz × 4 data lines = 80 Mbit/s). Given that we’re writing 32 bytes at a time (except at the very beginning where we write 4 times that), we calculate we’re writing data every 3.2 microseconds. The scope shot bears this out. This corresponds to an interrupt rate of 312.5 kHz! This is a very high rate for a small processor, and the CPU utilisation should be expected to be high. From the scope shot we can estimate we’re spending about 12% – 15% of our 120 MHz processor doing nothing except servicing these SDIO interrupts. It’s a steep price to pay for making so much RAM inaccessible to the DMA.
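The arithmetic behind those numbers can be written out as a few small C functions. The function names are made up for this example, and the 4-bit bus width is my assumption, chosen because it is what makes a 20 MHz clock come out at 10 MB/s.

```c
#include <stdint.h>

/* Card write bandwidth in bytes per second for a given SDIO clock
   and bus width (1 or 4 data lines). */
uint32_t sdio_bytes_per_sec(uint32_t clk_hz, uint32_t bus_bits)
{
    return (clk_hz / 8u) * bus_bits;
}

/* Interrupts per second when each interrupt writes 'burst_bytes'. */
uint32_t sdio_irq_rate_hz(uint32_t bytes_per_sec, uint32_t burst_bytes)
{
    return bytes_per_sec / burst_bytes;
}

/* CPU clock cycles that elapse between two consecutive interrupts. */
uint32_t cycles_per_irq(uint32_t cpu_hz, uint32_t irq_rate_hz)
{
    return cpu_hz / irq_rate_hz;
}
```

With a 20 MHz clock and 4 data lines this gives 10,000,000 bytes per second, 312,500 interrupts per second for 32 byte bursts, and 384 CPU cycles between interrupts at 120 MHz. If the 12% to 15% estimate from the scope is right, each interrupt is costing somewhere around 46 to 58 cycles.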

Tx FIFO Underrun

Getting data transmit (sending data to the card) to start up properly on the STM32F4xx / STM32F2xx can be very tricky. Here’s my understanding.

When you enable the SDIO (via the DTEN bit in the SDIO_DCTRL register) the FIFO is empty. So the TXFIFOHE interrupt will trigger immediately, and at the same time the SDIO peripheral will start attempting to write data to the SD card. Hence data must appear in the Tx FIFO extremely quickly, otherwise a Tx FIFO underrun will occur and the SDIO peripheral will shut down.

It is not possible to pre-load the FIFO before enabling the SDIO. I’ve tried and it doesn’t work. I believe the FIFO is hardware-cleared until the SDIO is enabled, or something similar to that.

What this means is that at the moment of SDIO turn-on (when the DTEN bit is set), that TXFIFOHE interrupt must trigger. At that point in time it must be the highest priority interrupt in the system, or be the only interrupt. If it’s delayed for any reason, for example because another interrupt occurs at that time, then a Tx FIFO underrun will very quickly follow. Think very carefully about your enabled interrupts at that critical SDIO transmit start-up point. You may want to consider using the NVIC to make the SDIO be the highest priority interrupt, permitted to preempt all other interrupts. Or, come up with some other scheme to ensure that first TXFIFOHE interrupt can execute immediately.

SDIOIT Status Bit

The SDIO_STA status register contains the SDIOIT bit with a very vague description. I’ve seen this bit being set from time to time but I’ve never worked out what it means. If you understand what it actually represents, please let me know.

If Apple Wins We All Lose

I’m not generally one to repost articles and links to articles – you won’t find much of that on my blog. Nor am I particularly pro-this or anti-that when it comes to technologies, brands, companies, etc. I have Apple products and non-Apple products. I’ve used various flavours of Mac OS, of DOS, of Windows, of Unix, of Linux, and much more. It’s all good to me.

However, I’m not agnostic when it comes to the on-going development of technology. I’ve been in this business long enough to realise that every new development builds upon some earlier development. Like kicking steps up a snowbank, making each new step upwards requires you to stand in the step you’ve got. That’s the way innovation and technology develop, and it’s critical for everyone that this is able to continue.

Gizmodo recently posted this article:

If Apple Wins We All Lose

Yesterday’s news that courts had ruled against HTC in favor of Apple was a tidy little victory for Apple. But HTC is just an initial skirmish in a much larger fight. The real war is against Android, and if Apple wins that, we’ll all lose…

It’s well worth the read. Many patent systems and legal systems around the world are increasingly allowing existing “technology holders” to suppress and quash the development of new evolutionary technologies. The USA is the most obvious example of this, but it’s certainly not the only one. And the trends indicate this problem is getting worse – not better. Truly, when it comes to anti-competitive behaviour and legal policies, this is an area where indeed, “we all lose”.

Cortex-M3 / M4 Hard Fault Handler

If you’re seeing a Hard Fault exception on your Cortex-M3 or Cortex-M4 processor, this handler and information may help. I can’t take credit for it – this code was provided by Joseph Yiu on a few different forums, as well as in his book (The Definitive Guide to the ARM Cortex-M3). I’m simply providing some assistance on how to install and use it.

Hard Fault Handler Installation

These instructions work for an STM32F2xx or STM32F4xx processor using a GNU-based toolchain (eg Yagarto or Sourcery G++). They should work with other processors and toolchains but may require a small tweak – no doubt your compiler will be pleased to tell you if it’s not happy! As always with programming, the following is not the only way to do it – it’s simply the way I did it. If you want to rearrange things or do things a bit differently then feel free.

Joseph’s hard fault handler is in two pieces – a small piece of assembly, and a small piece of C. You need the processor’s hardfault exception vector to jump to the assembly, and then the assembly code will itself call the C code.

Here’s the assembly code. It extracts the location of the stack frame, then passes it as a pointer to the C code, which is named hard_fault_handler_c.

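.syntax unified
.cpu cortex-m3
.thumb
.global HardFault_Handler
.extern hard_fault_handler_c
.thumb_func                   @ mark the label below as Thumb code
HardFault_Handler:
  TST LR, #4                  @ bit 2 of EXC_RETURN says which stack was in use
  ITE EQ
  MRSEQ R0, MSP               @ bit clear: frame is on the main stack
  MRSNE R0, PSP               @ bit set: frame is on the process stack
  B hard_fault_handler_c      @ R0 = pointer to the stack frame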

This assembly needs to be immediately called when the hard fault exception occurs. For the STM32F processors, their vector table is found in the ST-supplied file startup_stm32f2xx.s (or similar). If you look at the vectors list, you’ll see something like this:

  .word  _estack
  .word  Reset_Handler
  .word  NMI_Handler
  .word  HardFault_Handler
  .word  MemManage_Handler
  .word  BusFault_Handler
  .word  UsageFault_Handler

Given that the name in the vector table is HardFault_Handler, we give the assembler code that name (and declare the name as a global so the linker can find it). If your vector table contains a different name, then change the name of the assembler code to suit.

You need to include this bit of assembler in your build. Just save the assembler code in its own .s file (eg: hardfault.s) and then include it in your build, the same way as your other .s files (like that startup file) are.

Now we need to add the C code. Here it is:

// From Joseph Yiu, minor edits by FVH
// hard fault handler in C,
// with stack frame location as input parameter
// called from HardFault_Handler in file xxx.s
#include <stdio.h>   // for printf

void hard_fault_handler_c (unsigned int * hardfault_args)
{
  unsigned int stacked_r0;
  unsigned int stacked_r1;
  unsigned int stacked_r2;
  unsigned int stacked_r3;
  unsigned int stacked_r12;
  unsigned int stacked_lr;
  unsigned int stacked_pc;
  unsigned int stacked_psr;

  // The exception stack frame holds R0-R3, R12, LR, PC and PSR, in that order
  stacked_r0 = ((unsigned long) hardfault_args[0]);
  stacked_r1 = ((unsigned long) hardfault_args[1]);
  stacked_r2 = ((unsigned long) hardfault_args[2]);
  stacked_r3 = ((unsigned long) hardfault_args[3]);
  stacked_r12 = ((unsigned long) hardfault_args[4]);
  stacked_lr = ((unsigned long) hardfault_args[5]);
  stacked_pc = ((unsigned long) hardfault_args[6]);
  stacked_psr = ((unsigned long) hardfault_args[7]);

  printf ("\n\n[Hard fault handler - all numbers in hex]\n");
  printf ("R0 = %x\n", stacked_r0);
  printf ("R1 = %x\n", stacked_r1);
  printf ("R2 = %x\n", stacked_r2);
  printf ("R3 = %x\n", stacked_r3);
  printf ("R12 = %x\n", stacked_r12);
  printf ("LR [R14] = %x  subroutine call return address\n", stacked_lr);
  printf ("PC [R15] = %x  program counter\n", stacked_pc);
  printf ("PSR = %x\n", stacked_psr);

  // Fault status and address registers (read as unsigned long, so print with %lx)
  printf ("BFAR = %lx\n", (*((volatile unsigned long *)(0xE000ED38))));
  printf ("CFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED28))));
  printf ("HFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED2C))));
  printf ("DFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED30))));
  printf ("AFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED3C))));
  printf ("SCB_SHCSR = %lx\n", (unsigned long) SCB->SHCSR);

  while (1);   // stop here so the output can be read
}

This code goes wherever the existing (previous) Hard Fault Handler went. In our example, the vector table pointed to a function called HardFault_Handler. We are replacing that function with the assembler code, so the original HardFault_Handler function needs to be commented out (otherwise we’ll have two functions with the same name). For the STM32F2xx all exception handlers are found in the file stm32f2xx_it.c. So comment out the function HardFault_Handler() from that C file, and paste the C code for Joseph’s hard_fault_handler_c() into the same file.

That’s it. In summary, you commented out the old hard fault handler, and you added in some assembly code and some C code instead. Try building your project and see what happens.

Note that this code will only work if the main stack pointer hasn’t been badly corrupted prior to the hard fault occurring – if the stack pointer is off in never-never land then the C handler may not work. In my experience this has never been a problem.

Hard Fault Handler Usage

The big thing the above handler gives you is the program counter, which is the address where the processor was executing when the hard fault occurred. You can then look at your listing file (or map file) to see what function and instruction that was. Also useful is LR, the Link Register, which contains the return address of the last function call – it can show you where you came from to get to this point.

A few tips.

The typical reason for a hard fault is actually a bus fault (which was escalated to a hard fault), because software tried to access an invalid region of memory. There are a couple of common ways of doing this.
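If you want the handler's HFSR and CFSR printout to tell you which kind of fault was escalated, the check can be sketched like this. The enum and function names are made up for this example. The bit positions are the ones I understand the ARMv7-M architecture to define (FORCED is bit 30 of HFSR, VECTTBL is bit 1, and CFSR is split into MMFSR, BFSR and UFSR fields), but do check them against the reference manual for your part.

```c
#include <stdint.h>

#define HFSR_FORCED   (1ul << 30)   /* a configurable fault was escalated */
#define HFSR_VECTTBL  (1ul << 1)    /* bus fault while reading the vector table */

typedef enum {
    FAULT_NONE,
    FAULT_VECTOR_TABLE,
    FAULT_MEMMANAGE,
    FAULT_BUS,
    FAULT_USAGE
} fault_class_t;

/* Classify a hard fault from the HFSR and CFSR values printed by the handler. */
fault_class_t classify_hard_fault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL) {
        return FAULT_VECTOR_TABLE;
    }
    if (!(hfsr & HFSR_FORCED)) {
        return FAULT_NONE;              /* nothing was escalated */
    }
    if (cfsr & 0x0000FF00ul) {
        return FAULT_BUS;               /* BFSR, bits 15:8 */
    }
    if (cfsr & 0x000000FFul) {
        return FAULT_MEMMANAGE;         /* MMFSR, bits 7:0 */
    }
    if (cfsr & 0xFFFF0000ul) {
        return FAULT_USAGE;             /* UFSR, bits 31:16 */
    }
    return FAULT_NONE;
}
```

For example, HFSR = 0x40000000 together with CFSR = 0x00000400 comes out as FAULT_BUS, which is the escalated bus fault case described above.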

One is a bad pointer. It might be uninitialised, or not properly bounded (you ran off the end of an array for example).

Another, more subtle, way of getting a bad pointer is by being sloppy with the lifetime of variables. This is quite common with main() but can happen anywhere. For example, you define a buffer or a structure as a local variable within main(), but then some other code (an interrupt handler or an RTOS task, say) accesses that buffer or structure through a saved pointer. This is dangerous – a local variable lives on the stack and is only guaranteed to exist while the function that defines it is still executing, and some RTOS ports reuse main()’s stack once the scheduler has started. You need to declare the buffer or structure as “static” if you want to do that. Otherwise the buffer or structure might not actually exist when that other code tries to access it. If that structure contained a pointer for example, that would become a bad pointer.

Imprecise Bus Fault

For the Cortex M3, an imprecise bus fault (as indicated by bit 10 in the CFSR register) means that a write to an invalid address was attempted. If you look at the program counter, the naughty naughty write is usually present in the 3 or so instructions leading up to the program counter address. Because of the Cortex M3 write buffer system, the program counter might have advanced slightly before the actual bus write took place, hence you need to look back slightly to find the erroneous write.
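A small sketch of how the CFSR and BFAR values from the handler can be interpreted for bus faults follows. The function names are made up for this example. Bit 10 is the imprecise flag mentioned above; bits 9 (precise error) and 15 (BFAR valid) are as I understand the ARMv7-M definitions, so check them against your reference manual.

```c
#include <stdint.h>

#define CFSR_PRECISERR    (1ul << 9)    /* precise data bus error */
#define CFSR_IMPRECISERR  (1ul << 10)   /* imprecise data bus error */
#define CFSR_BFARVALID    (1ul << 15)   /* BFAR holds the faulting address */

/* Returns 1 if the CFSR value reports an imprecise bus fault. */
int bus_fault_is_imprecise(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0;
}

/* For a precise bus fault with a valid BFAR, stores the faulting
   address in *addr_out and returns 1. Otherwise returns 0 and
   leaves *addr_out untouched. */
int bus_fault_address(uint32_t cfsr, uint32_t bfar, uint32_t *addr_out)
{
    if ((cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID)) {
        *addr_out = bfar;
        return 1;
    }
    return 0;
}
```

In the imprecise case BFAR is of no help, which is why you end up reading back through the few instructions before the stacked program counter instead.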