What was the Problem?

The open-source OPUS decoder library, running on a small embedded system, consumed too much CPU. This became a bottleneck when other programs were running on the same system.

What was the customer looking for?

To decode OPUS streams on the embedded system while keeping as much CPU free as possible.

What is the Solution?

Media Magic proposed optimizing the OPUS decoder for the underlying CPU architecture, using both the features the architecture provides and other functional optimizations. The salient tasks we worked on were:

  • Removed functionally unused parts of the codec to avoid unnecessary decision making.

  • Profiled the code to find the functions consuming the most CPU.
  • The first level of optimization was done in C itself: functional blocks of code were tuned to avoid buffer and data copies and to reuse buffers wherever possible.
  • The second level of optimization was to source ready-made optimized routines from third-party optimization libraries and use them where they helped. Here we were always careful about licensing issues.
  • The next level of optimization was to write hand-coded assembly for all critical functions. This requires a thorough understanding of the underlying architecture, instruction set and compiler behaviour.

  • After the first round of optimization, we repeated the profiling-optimization cycle until the decoder's CPU usage reached a satisfactory level. In each cycle, we decided whether the performance met our requirement; if not, we identified further functions for assembly-level optimization and optimized them.

  • Because the encoded bitstream was also produced by our own OPUS encoder, we were able to tune the bitstream and strip some redundant headers, avoiding the encoding, transmission, processing and decoding overheads for those headers.
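The C-level tuning described in the list (avoiding buffer copies and reusing buffers) can be illustrated with a small sketch. This is not code from the project or from the OPUS library: the gain stage, the function names `apply_gain_copy` and `apply_gain_inplace`, and the 960-sample frame size (20 ms of mono audio at 48 kHz) are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical post-decode stage: scale 16-bit PCM by a Q15 gain.
 * All names and the frame size are illustrative assumptions. */
#define FRAME_SAMPLES 960   /* assumed: 20 ms of mono audio at 48 kHz */

/* Clamp a 32-bit intermediate result to the int16_t range. */
static inline int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Baseline: writes into a scratch buffer, then copies the frame back.
 * Relies on arithmetic right shift of negative values, as GCC provides. */
void apply_gain_copy(int16_t *pcm, size_t n, int16_t gain_q15)
{
    int16_t scratch[FRAME_SAMPLES];
    assert(n <= FRAME_SAMPLES);
    for (size_t i = 0; i < n; i++)
        scratch[i] = sat16(((int32_t)pcm[i] * gain_q15) >> 15);
    memcpy(pcm, scratch, n * sizeof *pcm);   /* extra pass over the data */
}

/* Tuned: identical arithmetic, but the caller's buffer is reused,
 * so there is no scratch buffer and no memcpy. */
void apply_gain_inplace(int16_t *pcm, size_t n, int16_t gain_q15)
{
    for (size_t i = 0; i < n; i++)
        pcm[i] = sat16(((int32_t)pcm[i] * gain_q15) >> 15);
}
```

Both functions produce the same samples; the in-place version simply touches each frame once instead of twice, which is the kind of saving that profiling would be expected to confirm on a small embedded CPU.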

Customer Benefits

  • The optimized OPUS decoder saves a lot of CPU cycles on the embedded platform, allowing the application to run other critical programs on the same CPU. This saved a lot of hardware cost for our customer.
  • The customer benefited hugely in terms of time-to-market from our expertise in codecs, optimization and architecture.
  • We enabled our customer to reuse the same optimized source code on other similar platforms with only minor porting tweaks.


LibOPUS, C language, ARMv5TE, NEON, WMMX, gprof, arm-gnu compiler toolchains, GDB, Valgrind.
