Frank's Random Wanderings

Blackfin Fast JPEG Encoding

Recently I’ve been doing a lot of embedded uClinux programming for the Analog Devices Blackfin BF537 processor. Writing drivers, application code, things like that. It’s a lot of fun.


One of the things I’d been battling with was finding a reasonably fast JPEG encoder. Now you need to understand that the BF537 doesn’t have a JPG compression engine or anything like that, so it’ll never be screaming fast. The default uClinux JPEG library is libjpeg. Libjpeg is fairly slow. I wrote a reasonably-optimised DCT function in Blackfin assembler for libjpeg & posted it. You can find the files, and a description, at my original posting:

Faster JPEG encoding – files attached

What’s cool is it appears to have been accepted into the mainstream blackfin uClinux distribution (SVN). So pretty soon everyone will be getting this faster libjpeg code by default.

Which is nice, and it helps, but it’s still too slow. My example 752 x 512 RGB image was taking around 260 ms to JPG encode using the “improved” libjpeg function. Too long. So I looked for alternatives.

I had tried FFMPEG. It’s faster, because the Analog Devices folks wrote some optimisations for it. It’s tricky to tell exactly how fast, because FFMPEG will duplicate frames to give you the framerate you request. Just because FFMPEG gives you 25 frames/sec doesn’t mean you’re getting 25 different frames per second. But that aside, the big problem I had with FFMPEG was its tendency to malloc() and free() chunks of memory constantly. This would eventually result in memory fragmentation & malloc() failing, sometimes in minutes, sometimes in hours. Problems like this I can do without, and it’d be a hard one to fix.

SCS Vision wrote a JPEG encoder for their leanXcam product. Their encoder appears to be based on one written by Nitin Gupta. Based on comments they wrote in their comment block (which I assumed to be correct & did not personally verify), I calculated it would encode my image in around 290 ms. Obviously that’s not the big improvement I’m looking for.

Analog Devices has a very fast optimised JPEG encoder available, right off their Blackfin website, for free. But it only works with their proprietary VisualDSP++ (VDSP++) tools. It’s not compatible with any of the open-source tools (gcc, uClinux, etc) and it’s only distributed in object code form, which makes it impossible to port. Tantalising, but useless.

Then recently I had a break. I stumbled across the Surveyor robot folks. A very bright fellow over there wrote a pretty decent JPG encoder, which took advantage of an Analog Devices optimised DCT function, as well as doing some other smart things in the code. It only handled YUV images, whereas my image is RGB. So before I could try it, I had to hack up a quick RGB24 to YUV422 conversion routine. But once I did, I was encoding my image in 165 ms. That’s clearly a substantial improvement over the libjpeg 260 ms.
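As a rough illustration of that conversion step, here is a minimal sketch of an RGB24 to YUV422 routine in C. It is not the routine from the download. The function name, the input byte order (R, G, B), the output byte order (Y0, U, Y1, V) and the use of full-range JFIF coefficients are all assumptions, so check them against what your camera delivers and what the encoder expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of one byte. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical helper: convert packed RGB24 (assumed byte order R, G, B)
 * to packed YUV422 (assumed byte order Y0, U, Y1, V).
 * Each horizontal pair of pixels shares one U and one V sample, taken
 * from the average colour of the pair. The width must be even.
 */
void rgb24_to_yuv422(const uint8_t *rgb, uint8_t *yuv,
                     size_t width, size_t height)
{
    assert(rgb != NULL && yuv != NULL && width % 2 == 0);

    size_t pairs = (width / 2) * height;
    for (size_t i = 0; i < pairs; i++) {
        int r0 = rgb[0], g0 = rgb[1], b0 = rgb[2];
        int r1 = rgb[3], g1 = rgb[4], b1 = rgb[5];
        int r = (r0 + r1) / 2, g = (g0 + g1) / 2, b = (b0 + b1) / 2;

        /* Full-range JFIF coefficients scaled by 256, with rounding.
         * The +128 chroma offset (128 * 256 = 32768) is added before the
         * shift so that the shifted value is never negative. */
        int y0 = (77 * r0 + 150 * g0 + 29 * b0 + 128) >> 8;
        int y1 = (77 * r1 + 150 * g1 + 29 * b1 + 128) >> 8;
        int u  = (-43 * r - 85 * g + 128 * b + 32768 + 128) >> 8;
        int v  = (128 * r - 107 * g - 21 * b + 32768 + 128) >> 8;

        yuv[0] = clamp_u8(y0);
        yuv[1] = clamp_u8(u);
        yuv[2] = clamp_u8(y1);
        yuv[3] = clamp_u8(v);

        rgb += 6;   /* two RGB24 pixels consumed */
        yuv += 4;   /* one YUV422 pixel pair produced */
    }
}
```

The input buffer holds 3 bytes per pixel and the output 2 bytes per pixel, so for a 752 x 512 image that is 1,155,072 bytes in and 770,048 bytes out.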

With that success, I set to further optimising the code, both theirs and mine. By the time I was finished, my image was JPEG encoding in 96 ms. Go baby go!

Fast JPG encoding seems to be a common question for Blackfin, so I’m posting the final code here. Click on the link below to get to the download page:

Blackfin Fast JPEG Encoder

Now here’s how to call the function.

If you’re familiar with libjpeg, you’ll find this easier, because there’s nothing to set up or tear down. encode_image() needs a pointer to the raw image in memory, a pointer to a chunk of memory where it can write the resulting JPG image, and a few other parameters. It returns another pointer, which is the next free location in memory after the data it has just written, so that you can work out how large the just-created JPG image is. It’s up to you to write the JPG image to a file (if indeed you even want to do that).

Here’s the function prototype:

It’s pretty simple. Pointers to the input raw image buffer (YUV or RGB) and an output image (JPG) buffer. A quality factor which ranges from 1 (highest quality) to 8 (lowest quality). I use 2. Setting this to 1 results in much longer execution time & much larger file sizes. Image format can be:

  • RGB  – this is RGB24
  • FOUR_TWO_TWO   – this is YUV422

FOUR_ZERO_ZERO (monochrome YUV) is reported in the comments below as also working.

Image width & height should be self-explanatory.

To call it, set up your image buffers as you see fit, then do something like this:

The size of your JPG image, in bytes, is:  (jpeg_buff_end - jpeg_buff)

If you want to save your JPG to a file, you can do something like:
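A minimal sketch of that step in C, not the listing from the original post. The helper name save_jpeg() is hypothetical; it only assumes that you hold the start of the output buffer (jpeg_buff) and the pointer returned by encode_image() (jpeg_buff_end).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the bytes in [start, end) to a file.
 * start would be jpeg_buff and end would be jpeg_buff_end.
 * Returns 0 on success, -1 on any I/O error.
 */
int save_jpeg(const char *path, const unsigned char *start,
              const unsigned char *end)
{
    assert(path != NULL && start != NULL && end >= start);

    size_t size = (size_t)(end - start);   /* JPG size in bytes */

    FILE *fp = fopen(path, "wb");          /* binary mode: no translation */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(start, 1, size, fp);

    /* fclose() can report a failed flush, so check it as well. */
    if (fclose(fp) != 0 || written != size)
        return -1;

    return 0;
}
```

A call would then look something like save_jpeg("image.jpg", jpeg_buff, jpeg_buff_end), where the file name is only an example.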

That’s it.  Easy to use, and it’s the fastest thing, by far, I’m aware of for JPEG encoding on Blackfin uClinux.

18 thoughts on “Blackfin Fast JPEG Encoding”

  1. Mark

    Thanks for posting this. I have adapted it to run on an STM32F429. There are 3 slight problems using a 24-bit BMP image: red and blue are swapped, there seems to be a bit of wrap-around on the RHS, and the image is flipped vertically but not horizontally. I suspect these should be easy to fix. Next stage: taking the code and porting it into an FPGA.

  2. frank Post author

    The blackfin IS a 32bit micro. Running it on a STM32 is certainly possible; probably the biggest challenge will be memory because the STM32 only has a small amount of internal RAM.

  3. Brad

    Hi, I am not sure if this blog is checked or updated anymore, but I am going to go ahead and put this out there.

    Has anyone got this going on a 32-bit micro? I have an STM32F207 and would like to give this encoder a try, but I see a lot of the operations are 8-bit.

  4. Jim Dallas

    Hi Frank – really interesting.
    We currently use the Analog JPEG libraries on a Blackfin based camera – and are looking at power efficient ways of cutting both a JPEG thumbnail – say 256×192 – and a ‘web’ image 1024×768 at the same time we cut the 3MP file. We want to do this on the device so we can browse the device images by Bluetooth. E.g. one file would be 1/8th by 1/8th, the other would be 1/2 x 1/2.

    Presumably one way we could do this would be to take your code, then add a downsample from the YUV in memory (two steps) and cut two new JPEGs? Another way could be to do something with the JPEG at an interim stage – but my guess is that gets tricky (e.g. is there an easyish way to transform a 2×2 set of DCTs to a single DCT at some point in the encode, then send that to be packed up into a new 1/4 size JPEG? Is there a way at the end to take a JPEG and reduce it by 1/2 x 1/2 without decoding?). Just musing – will post back if we find anything.


  5. André

    Thanks for putting this together. While adapting it to my needs I came across the fact that the Huffman coding is taking a lot of time. The following code on an i7 using gcc 4.4.1 brought a 25% increase in speed for the Huffman coding. This is a code fragment. huf_ctx just has the context, huf_type can be either UINT32 or UINT64. The real increase is due to putbyte dealing with the probable case first.



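    /* Note: the braces, the "++", the "return" statements and the "<<" shifts
       were lost from the posted fragment and are reconstructed here. */
    static __inline__ void putbyte(huf_ctx *hc, UINT8 data)
    {
        *hc->out++ = data;
        if (data != 0xff)
            return;         /* probable case first: no stuffing needed */
        *hc->out++ = 0;     /* JPEG byte stuffing: 0xFF is followed by 0x00 */
    }

    static __inline__ void putbits(huf_ctx *hc, int numbits, UINT32 data)
    {
        int bits_in_next_word;

        bits_in_next_word = (int)(hc->bitindex + numbits) - (int)(sizeof(huf_type) * 8);
        if (bits_in_next_word < 0) {
            /* Everything still fits in the current word. */
            hc->lcode = (hc->lcode << numbits) | data;
            hc->bitindex += numbits;
            return;
        }
        /* Fill the current word, flush it byte by byte, keep the remainder. */
        hc->lcode = (hc->lcode << (sizeof(huf_type) * 8 - hc->bitindex)) | (data >> bits_in_next_word);
        switch (sizeof(huf_type)) {
        case 8:
            putbyte(hc, hc->lcode >> 56);
            putbyte(hc, hc->lcode >> 48);
            putbyte(hc, hc->lcode >> 40);
            putbyte(hc, hc->lcode >> 32);
            /* fall through */
        case 4:
            putbyte(hc, hc->lcode >> 24);
            putbyte(hc, hc->lcode >> 16);
            /* fall through */
        case 2:
            putbyte(hc, hc->lcode >> 8);
            putbyte(hc, hc->lcode);
        }
        hc->lcode = data;
        hc->bitindex = bits_in_next_word;
    }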

  6. frank Post author

    Glad to hear it’s working for you Martin. That’s good news. The real thanks of course must go to the folks who created the initial code. They did a great job.

  7. Martin Banky


    I just wanted to thank you for this implementation. I’m using it on an i486 800MHz SBC, with uClibc and a highly modified version of Palantir. With your code (modified for my needs and ported to C++), Palantir went from 13.25fps to 25.25fps! An absolutely incredible increase in speed!


  8. frank Post author

    First off, run the code through a profiler to see which functions are taking the most time. It might not even be your code – you might be calling some really slow libraries. Then try to optimise the code algorithmically in C as much as you can. Only when you’ve done all you can that way, do you take the painful step of writing assembler.

  9. Bill Strahan

    What about a REALLY fast JPEG decoder? I’ve got a CE app that we distribute on an XScale based device, written in .NET. I’ve put together a large C++ library to handle some image stuff (rotation, zooming, etc.) but the bottleneck is how quickly the images are being decoded by .NET.

    They’re just 256x256 tiles but they take more than 100 ms to decode on a 624 MHz processor. I’m sure it could be MUCH faster, but don’t know where to begin.


  10. Steve Howell

    Hi Frank,

    Thanks for the very quick reply! I don’t think it’s a 64/32 bit problem. I have 32 bit int’s, 16 bit short’s and 8 bit char’s, as expected by the code. Thanks anyway. I’ll keep investigating.


  11. frank Post author

    Hi Steve,

    Hmm, I don’t know. At one point I had a similar problem to what you’re describing, where the JPG image started out correct but then got messed up. It was due to the assembly language DCT algorithm being used at the time. Switching to the C code DCT function fixed the problem, and I then wrote an assembly version of that C code to make it run faster. Is there any chance this might be a data type problem (eg you’re running on a 64 bit machine when this code perhaps assumes 32 bits) or something like that?

    Good luck!


  12. Steve Howell


    I’ve been trying to build and use this jpeg code in a Microsoft Visual C++ project to convert 24 bit RGB images to JPG. I removed the __attribute__ stuff and switched it back to using all the C routines instead of the assembler ones, but I can’t get it to work properly. It seems to produce a JPEG file in which the first few 8×8 blocks of pixels look correct but the rest of the image is corrupted. On quality level 8 the length of the strip of blocks which look right is longer than it is on quality level 1. I don’t suppose you’d have any idea what I might have got wrong?

    The only changes I made to the code were:
    Remove the __attribute__s.
    In “encodeMCU”, called “quantization” instead of “quantization_asm”.
    In “DCT”, commented out the part where it just calls “jpegdct” and returns.
    In “huffman”, made it call the PUTBITS macro instead of “putbits_asm”.
    In “read_rgb24_format”, I changed the “#if 0″s to “#if 1″s and vice versa, thus stopping it using “read_rgb_yuv”.

    Any help gratefully accepted!


    Steve Howell

  13. Franklin

    You are a genius! The following change produces the expected picture:

    INT16 *CB_Ptr = CR;//CB;
    INT16 *CR_Ptr = CB;//CR;

    Next step, the inverted image.

    Thanks a lot

  14. frank Post author

    Kudos to you for getting it largely running on wince. Obviously you commented out the assembler routines to use the C functions (DCT, huffman, etc).

    Chances are the image is inverted because that’s how it comes from the camera. A great many cameras return an inverted image. Not all, but lots.

    A BMP image is actually upside down. So if you’re going from a BMP to a JPG, that’s likely what you’re seeing. A BMP stores the image inverted. So if the image comes from the camera inverted & is stored in a BMP “as-is”, the BMP viewer will invert it, resulting in a right-way-up image when viewing the BMP. But a JPG file stores the image as-is, so an inverted source image will result in an inverted JPG image.

    RGB24 is a somewhat ambiguous thing. Some drivers think the byte order is red – green – blue. Other drivers think it’s blue- green – red. I’ve seen both. In either case green will be correct, but sometimes red & blue will be swapped.

    This can easily be seen by pointing the camera at a strongly-coloured red & blue object. If the colours display backwards, you know where the problem is.
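Both problems can be corrected on the raw buffer before it goes to the encoder. The following is a minimal sketch in C, not code from the download: the function names are hypothetical, pixels are assumed to be packed at 3 bytes each, and the stride (bytes per row) is passed explicitly because formats such as BMP pad each row to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap the first and third byte of every 3-byte
 * pixel in a tightly packed buffer, turning RGB into BGR or back. */
void swap_red_blue(uint8_t *pixels, size_t width, size_t height)
{
    assert(pixels != NULL);

    size_t count = width * height;
    for (size_t i = 0; i < count; i++) {
        uint8_t tmp = pixels[3 * i];
        pixels[3 * i] = pixels[3 * i + 2];
        pixels[3 * i + 2] = tmp;
    }
}

/* Hypothetical helper: reverse the order of the rows in place, so that a
 * bottom-up image becomes top-down. stride is the number of bytes per row. */
void flip_rows(uint8_t *pixels, size_t stride, size_t height)
{
    assert(pixels != NULL);

    if (height < 2)
        return;                       /* nothing to flip */

    uint8_t *top = pixels;
    uint8_t *bottom = pixels + (height - 1) * stride;
    while (top < bottom) {
        /* Swap one pair of rows byte by byte. */
        for (size_t i = 0; i < stride; i++) {
            uint8_t tmp = top[i];
            top[i] = bottom[i];
            bottom[i] = tmp;
        }
        top += stride;
        bottom -= stride;
    }
}
```

Doing the swap inside the colour conversion itself would avoid an extra pass over the image; the separate functions here are only to keep the two fixes easy to see.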

    A blue cast (or any other colour cast) over the image may also be caused by a wrong (or completely missing) white balance algorithm. Artificial lighting can also cause a colour cast (our eyes/brains are remarkably good at ignoring such things) – try changing the lighting a bit.

    Have fun!

  15. Frank

    Hi Frank,
    I was looking for a fast Jpeg encoder for windows ce and found yours. I had it working after making a few minimal changes of the C code version. The output is very fast. The only problem is that the image comes inverted and the resulting picture is blue-ish. I am pretty sure my camera outputs RGB24.
    Thanks a lot

  16. Strubi

    Hi Frank 🙂

    Just some quick feedback: FOUR_ZERO_ZERO works, too. Tested it with a monochrome cam.

