Demystifying Voice Quality When Using Graphics Processing Units for Transcoding


I believe Graphics Processing Units (GPUs) are the right solution for voice transcoding as VoIP services continue their migration to virtual, cloud-based solutions. However, an argument I have recently heard is that codec transcoding is more efficient and produces better voice quality with fixed-point processing, whereas GPUs are designed for floating-point calculations.

If this argument were true, it would imply that using GPUs results in inferior voice quality. Fortunately, actual test results show the argument to be false.

We compared voice quality for CPU (fixed-point) and GPU (floating-point) transcoding across three codecs: G.729AB, AMR-WB, and EVRC-B, using the speech test vector from the G.729 standard specification. Voice quality was measured using the PESQ standard (ITU-T P.862). The highlights are:

  • G.729AB
    • Testing was run without Discontinuous Transmission (DTX), a.k.a. “silence suppression”, so packets were sent during periods of silence
    • GPU measurements were within 0.4% of CPU measurements
  • EVRC-B
    • Testing was done at two bitrates: 9.3 and 8.5 kbps
    • GPU measurements differed from CPU measurements by less than 0.9%
  • AMR-WB
    • Testing was done across the full range of bitrates, from 6.6 to 23.85 kbps
    • GPU measurements ranged from 0.7% better to 0.55% worse than CPU measurements

In summary, our testing showed that GPU floating-point processing scored within 1% of CPU fixed-point processing, or better. In our experience, a difference of less than 1% results in no perceived degradation in voice quality.
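
For readers who want to run a similar comparison on their own PESQ measurements, here is a minimal sketch of the arithmetic. It assumes the percentages above are relative differences between the GPU and CPU PESQ scores, which is one reasonable reading and not a documented description of our test harness. The function names and the sample scores are hypothetical placeholders, not measured results.

```python
def percent_difference(gpu_score: float, cpu_score: float) -> float:
    """Signed difference of the GPU score relative to the CPU score, in percent.

    Positive means the GPU scored better, negative means it scored worse.
    """
    if cpu_score <= 0:
        raise ValueError("cpu_score must be positive")
    return (gpu_score - cpu_score) / cpu_score * 100.0


def within_tolerance(gpu_score: float, cpu_score: float,
                     tolerance_pct: float = 1.0) -> bool:
    """True if the GPU score is no more than tolerance_pct worse than the CPU score."""
    return percent_difference(gpu_score, cpu_score) >= -tolerance_pct


if __name__ == "__main__":
    # Placeholder PESQ scores for illustration only (not from our testing).
    cpu_pesq = 3.80
    gpu_pesq = 3.78
    diff = percent_difference(gpu_pesq, cpu_pesq)
    print(f"GPU vs CPU: {diff:+.2f}%")
    print("Within 1% or better:", within_tolerance(gpu_pesq, cpu_pesq))
```

The check is one-sided on purpose: a GPU score that is higher than the CPU score always passes, matching the "within 1% or better" wording.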

But if you want another source that shows similar results, check out 3GPP TR 26.976 version 10.0.0 Release 10, Performance characterization of the Adaptive Multi-Rate Wideband (AMR-WB) speech codec.

Specifically, look at Annex B, which contains the verification results for the AMR-WB floating-point codec, and section B.7, which compares AMR-WB PESQ scores for a floating-point versus a fixed-point encoder. Section B.7 concludes with the following statement:

“It is most likely, from the data, that there is no significant subjective difference between V5.3.0 of the fixed-point AMR-WB encoder with CR011 implemented and V0.2.2 of the floating-point AMR-WB encoder.”

Beyond the voice quality equivalence of using GPUs for voice transcoding, there are other reasons that GPUs are the right solution for VoIP services in the virtual, cloud deployment models that service providers and enterprise customers are increasingly adopting.

As we have shown in prior blogs, when compared head-to-head, GPUs clearly exceed CPUs in performance and scale because they are designed for high-volume, compute-intensive processing, which is exactly what audio transcoding requires. They also do so at lower power cost and in far less rack space.

In a virtual, cloud deployment model, especially in public clouds, GPUs are readily available and easier than ever to access. They are already offered for high-volume, compute-intensive applications like machine learning and analytics, so extending their use to real-time voice codec transcoding is simple and very attractive.

In conclusion, using GPUs for voice transcoding provides voice quality that is as good as any other option and, when combined with the added benefits highlighted above, is clearly the right solution for voice transcoding in the cloud.

