Friday, May 31, 2013

Optimizing JPEG decoder on Raspberry Pi

Months ago, I spent a lot of time getting the Raspberry Pi to perform fast JPEG decoding.

While it is fast, it is just barely not fast enough for what I want to do with Dexter, so I am now going to spend another chunk of time optimizing it further.

Here are some current decoding speed statistics. I used a high-bitrate JPEG to stress it to its limit.
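
Roughly speaking, each number below is a count of fields pushed through one stage of the pipeline divided by wall-clock time. Here is a minimal sketch of that kind of timing loop in C; decode_and_render() is a hypothetical placeholder for whichever stage is being measured, not Dexter's actual code:

#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for whichever stage is being benchmarked
   (decode only, decode + transfer, or the full decode/transfer/render path). */
static void decode_and_render(void)
{
    /* OpenMAX JPEG decode + GLES2 upload/draw would go here. */
}

int main(void)
{
    const int fields = 500;               /* run this many fields */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < fields; i++)
        decode_and_render();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f fields per second\n", fields / elapsed);
    return 0;
}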

--------------------------------

Decoding from a memory buffer and rendering on the screen repeatedly:
62 fps (_fields_ per second)

When disabling the fragment shader that handles interlacing the video:
63 fps

Without transferring the decoded image to GLES2 but leaving everything else the same:
80 fps

Transferring the decoded image but not rendering it:
77 fps

Only decoding the JPEG, not transferring it or rendering it:
103 fps

-------------------------

Bottlenecks:
- Transferring the decoded image costs about 20 fps (a typical GLES2 upload path is sketched after this list)
- Rendering the image costs about 20 fps (but exercising the fragment shader only costs 1 fps, which is a good sign since I rely on it).  It's possible I could optimize the vertex shader to bring the rendering cost down.
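
The "transfer" here is the CPU-side upload of the decoded field into a GLES2 texture. For reference, a common shape for that upload path is to allocate the texture storage once and then overwrite it each field with glTexSubImage2D, as in the sketch below; the RGB format and 720x240 field size are assumptions purely for illustration, not necessarily what Dexter uses:

#include <GLES2/gl2.h>

/* Assumed field dimensions, for illustration only. */
#define FIELD_W 720
#define FIELD_H 240

/* One-time setup: allocate texture storage without any pixel data. */
static GLuint create_field_texture(void)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, FIELD_W, FIELD_H, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, NULL);
    return tex;
}

/* Per-field transfer: overwrite the existing storage each field instead of
   re-creating the texture. */
static void upload_field(GLuint tex, const unsigned char *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, FIELD_W, FIELD_H,
                    GL_RGB, GL_UNSIGNED_BYTE, pixels);
}

Stock GLES2 has no pixel buffer objects, so avoiding this cost probably means not doing a CPU-side upload at all, which is where the decode-to-texture idea below comes in.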

Conclusions:
- If I can decode the JPEG directly to a texture, then I don't have to transfer it in memory at all (this may give a boost of up to 20 fps; see the sketch after this list)
- Rendering may be running suboptimally
- I may be able to increase JPEG decoding speed by cutting out some internal memcpy's I am doing right now and by increasing the OpenMAX input buffer size
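
On the Pi, "decoding directly to a texture" usually means going through an EGLImage: wrap a GL texture in an EGLImage and let the VideoCore side (e.g. the egl_render OpenMAX component) write the decoded image into it, so the pixels never come back through the CPU. Below is a rough sketch of the GL/EGL half of that idea; the OpenMAX wiring is omitted, and the RGBA format and function names are assumptions for illustration, not Dexter's actual code:

#include <stdint.h>
#include <GLES2/gl2.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Create a GL texture and wrap it in an EGLImage that the decoder/renderer
   component could be pointed at (via OMX_UseEGLImage on its output port).
   dpy/ctx are an already-initialised EGL display and context; error
   handling is omitted for brevity. */
static EGLImageKHR make_decode_target(EGLDisplay dpy, EGLContext ctx,
                                      int width, int height, GLuint *tex_out)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);   /* storage only */

    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");

    EGLImageKHR img = create_image(dpy, ctx, EGL_GL_TEXTURE_2D_KHR,
                                   (EGLClientBuffer)(uintptr_t)tex, NULL);
    *tex_out = tex;
    return img;
}

The hello_pi examples use this approach with the video decoder tunneled into egl_render; whether the JPEG (image_decode) path can feed egl_render the same way is something I would need to verify.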

Goal:
- Increase the overall decoding/rendering speed from 62 fps to over 100 fps if I can manage it.  I think at bare minimum I will need to get it over 80 fps to give me some breathing room.
