Single-chip DSP implementation of MPEG audio coding


Abstract To develop an inexpensive single-chip fixed-point DSP audio encoder, the computation and storage requirements of the reference algorithm of the MPEG audio coding standard are analyzed in depth. Taking into account the required coding quality and the speed of the processor, and combining the results of computer simulation, the key points of a single-chip fixed-point DSP implementation are identified. Based on the ADSP-2181 from Analog Devices (AD), a hardware and software scheme for real-time MPEG audio layer 2 encoding is designed and implemented, making full use of the chip's hardware structure, which is optimized for signal processing. The results show that, with MAC-based precision extension of the filter bank and an improved psychoacoustic model algorithm, both coding quality and real-time performance can be guaranteed.
Keywords: compression coding; audio processing; digital signal processing

The MPEG [1] audio compression algorithm is the first international standard for high-fidelity digital audio compression. Since its adoption by the International Organization for Standardization and the International Electrotechnical Commission in late 1992, it has been widely used in digital audio storage, multimedia transmission over the Internet, and digital audio broadcasting (DAB) [2]. However, the MPEG audio encoding algorithm is quite complicated, placing high demands on computation and storage, and the market demand for encoders is not large, so no dedicated ASIC chip has appeared so far. The common approach is to implement MPEG audio compression on a general-purpose DSP, but only a few companies, such as DEC and Philips, have completed the algorithm on a single-chip DSP, and their products are expensive to purchase and come without source code. Reference [3] implements MPEG audio layer 2 coding with two TI TMS320C30s, but using two DSPs not only complicates the control circuitry but also adds off-chip memory, and the cost remains high. Studying a software and hardware scheme with proprietary rights and a low price has therefore become an inevitable choice.

1 Principle of MPEG audio coding

MPEG audio coding is a subband coding algorithm based on the characteristics of human hearing; it belongs to the class of perceptual audio coding methods. The basic structure of a perceptual audio coding algorithm is shown in Figure 1. Depending on whether the encoder emphasizes frequency resolution or time resolution, it can be classified as a subband encoder or a transform encoder. The MPEG audio layer 2 algorithm divides the audio signal into 32 subbands in the frequency domain and therefore belongs to the subband encoders. In Figure 1, the time-frequency mapping, also called the filter bank, maps the input audio signal into subsampled frequency components. Depending on the nature of the filter bank used, i.e., its resolution in the frequency domain, these frequency components are also called subband samples or frequency lines.

[Figure 1, parts (a) and (b), omitted]
Figure 1 Block diagram of the perceptual audio coder

The output of the filter bank, or of a time-frequency transform operating in parallel with it, is fed to the psychoacoustic model to estimate the time-varying masking threshold. The psychoacoustic model exploits the known simultaneous masking effects, including the masking characteristics of tonal and non-tonal components. If the forward and backward (temporal) masking effects are also used, the accuracy of the masking-threshold estimate can be improved further. The subband samples or frequency lines are quantized and coded under the criterion that the spectrum of the quantization noise stays below the masking threshold, which keeps the noise perceived by the human ear to a minimum. Depending on the complexity requirements, block companding or entropy coding methods can be used.
Frame packing combines the quantized and coded output with the associated side information in a prescribed format for use by the decoder.

2 Coding quality and DSP speed

Implementing MPEG audio coding on a single ADSP-2181 requires solving two problems: first, how to guarantee the audio coding quality; second, how to make full use of the DSP's computing speed. These two requirements often conflict, and the best trade-off between them must be found.
In general, the quality of an MPEG audio encoder is determined mainly by the quality of its psychoacoustic model. For an application on a single 16-bit fixed-point DSP, however, this conclusion no longer holds. Analysis shows that the finite-word-length effect becomes the main factor limiting coding quality. In the analysis filter bank in particular, the truncation effect introduces noise about 33 times the quantization error of the 16-bit A/D converter, and the finite-length representation of the window coefficients reduces the filter response's original 96 dB sidelobe attenuation to less than 70 dB. Therefore, to guarantee coding quality, the precision of the analysis filter bank must be extended.
Regarding speed, the first idea that comes to mind is to use fast algorithms, and we did try fast algorithms for the subband filtering [4]. Practice showed, however, that these fast algorithms are not well suited to a DSP, for the following reasons: (1) they count only additions and multiplications and ignore operations such as data moves and addressing, whereas on a DSP whose instructions are all single-cycle, the number of multiplications and additions is not more important than the other operations; (2) they do not take the DSP's hardware characteristics into account, so they cannot fully exploit the parallel processing capability of the DSP's multiply-accumulator (MAC); (3) the ADSP-2181 is optimized for 16-bit arithmetic, and when precision extension is required, the computation load of these algorithms grows sharply.
Based on the above analysis of the quality and speed requirements, we chose a polyphase filter-bank implementation suited to the DSP's multiply-accumulate instruction and adopted a MAC-based precision-extension method, which largely resolves the conflict between coding quality and DSP speed. In addition, the input of sampled data, the psychoacoustic model, and the scale-factor coding were tailored to the ADSP-2181, reducing the amount of computation and guaranteeing real-time performance.

3 Software design of the algorithm

Software design is the core of a single-chip DSP implementation of MPEG audio coding; the requirements on coding quality and speed must be met through careful design of the DSP software.
(1) Analysis filter bank with MAC-based precision extension The analysis filter bank of MPEG audio coding can be implemented in several ways. The polyphase structure is the one recommended by the MPEG standard; its mathematical expression is

Y_k = ∑_{j=0}^{7} C_{k+64j} · X_{k+64j} ,  k = 0, 1, …, 63    (1)
S_i = ∑_{k=0}^{63} M_{i,k} · Y_k ,  i = 0, 1, …, 31    (2)

Analysis shows that double-word expansion of Y_k can reduce the noise caused by the truncation effect by a factor of about 33. However, since the ADSP-2181 supports only 16-bit multiply-accumulate operations, equation (1) must be converted, i.e.,

Y_k = HY_k + 2^{-16} LY_k    (3)

HY_k = ∑_{j=0}^{7} HC_{k+64j} · X_{k+64j} ,  LY_k = ∑_{j=0}^{7} LC_{k+64j} · X_{k+64j}    (4)

where HC and LC are the high and low 16-bit halves of the double-word window coefficients, C = HC + 2^{-16} LC.

In this way the DSP's multiply-accumulator structure can still be used; the amount of computation only roughly doubles, and the storage requirement grows by only 64 words.
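The precision extension of equations (3)-(4) can be modelled in C as follows (an illustrative sketch; the actual implementation is ADSP-2181 assembly, and the function and variable names are ours): each 32-bit coefficient is split into a signed high half and an unsigned low half, two 16×16 MAC passes accumulate HY and LY separately, and the results are recombined.

```c
#include <stdint.h>

/* Illustrative C model of eqs. (3)-(4): a 32-bit coefficient c is split
 * as c = hc * 2^16 + lc (hc: signed high half, lc: unsigned low half),
 * so a 16x16-bit multiply-accumulator can deliver a full-precision
 * result with two MAC passes instead of one. */
int64_t mac_extended(const int32_t c[], const int16_t x[], int n) {
    int64_t hy = 0, ly = 0;
    for (int i = 0; i < n; i++) {
        int16_t  hc = (int16_t)(c[i] >> 16);      /* signed high half  */
        uint16_t lc = (uint16_t)(c[i] & 0xFFFF);  /* unsigned low half */
        hy += (int32_t)hc * x[i];                 /* high-part MAC pass */
        ly += (int64_t)lc * x[i];                 /* low-part MAC pass  */
    }
    /* Y = HY + 2^-16 * LY, here rescaled by 2^16 to stay in integers */
    return hy * 65536 + ly;
}
```

The result equals the exact sum of 32×16-bit products, which is why the operation count only doubles: every tap costs two 16-bit MACs instead of one.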
(2) Organization of the input data The organization of the input data must both allow the raw audio samples to be acquired easily from the analog-to-digital converter and store them in on-chip data RAM in a form suitable as input to the polyphase filter bank and to the FFT of the psychoacoustic model. The polyphase filter bank shifts in 32 new audio samples each time and discards the 32 oldest ones, as follows:
X_i = X_{i-32} ,  i = 511, 510, …, 32
X_i = next input audio sample ,  i = 31, 30, …, 0
However, the ADSP-2181 is not well suited to block data moves: each assignment takes two instructions, so the shift alone would cost 1024 instruction cycles per analysis filtering. By using the ADSP-2181's multi-channel autobuffered serial port and indirect addressing capability, and organizing the input audio data properly, a sliding-window method can replace the physical moving of data, as shown in Figure 2.
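In C the sliding window can be modelled as a power-of-two circular buffer whose index wraps by masking, analogous to the DSP's circular (modulo) addressing (names are ours; the real code uses the serial-port autobuffer and index registers):

```c
#include <stdint.h>

#define WIN  512   /* analysis window length           */
#define STEP  32   /* new samples shifted in each call */

/* Sliding window as a circular buffer: writing 32 new samples
 * implicitly discards the 32 oldest ones, so no block move of
 * 480 samples is ever performed. */
typedef struct {
    int16_t buf[WIN];
    int     head;            /* next write position */
} window_t;

void window_push(window_t *w, const int16_t in[STEP]) {
    for (int i = 0; i < STEP; i++) {
        w->buf[w->head] = in[i];
        w->head = (w->head + 1) & (WIN - 1);  /* wrap: WIN is a power of two */
    }
}

/* X[i] in the numbering used above: X[0] is the newest sample. */
int16_t window_x(const window_t *w, int i) {
    return w->buf[(w->head - 1 - i) & (WIN - 1)];
}
```

Each call to `window_push` costs only 32 writes instead of 480 moves plus 32 writes, which is the saving the sliding window provides.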

[Figure 2 omitted]
Figure 2 Sliding-window technique for polyphase filtering

To guarantee continuity across frame boundaries, the input data buffer is designed as a circular buffer whose length holds two frames of audio input; while the DSP processes one frame of data, the input can be buffered into the other frame, saving the overhead of moving data. This organization of the input data also suits the FFT of the psychoacoustic model. The FFT needs the bit-reverse addressing mode of the ADSP-2181, and since the FFT computation and the buffering of input data proceed simultaneously, the FFT pointer must be bit-reversed while the input-buffer pointer must not be; otherwise the input audio samples would be stored out of order. The ADSP-2181 provides exactly this capability: its first address-pointer group I0, I1, I2, I3 supports bit-reverse addressing, while the second group I4, I5, I6, I7 is unaffected by the bit-reverse mode. Therefore a pointer from the second group is used for input buffering, and a pointer from the first group is used for the FFT computation.
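The bit-reverse addressing used for the FFT pointer can be illustrated in software (on the ADSP-2181 the hardware applies this mapping automatically to pointer group I0-I3):

```c
/* Software analogue of the ADSP-2181 bit-reverse addressing mode:
 * reverse the low `bits` bits of index i, as an in-place radix-2 FFT
 * needs when reordering its input or output. */
unsigned bitrev(unsigned i, int bits) {
    unsigned r = 0;
    for (int b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For example, with an 8-point FFT (bits = 3), index 1 (001) maps to 4 (100) and index 6 (110) maps to 3 (011); applying the mapping twice returns the original index.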
(3) Improvement of the psychoacoustic model One problem in implementing the psychoacoustic model on a DSP is the large number of logarithm operations. Although a logarithm can be approximated by a polynomial, the huge amount of computation involved shows that this is not a wise choice. In the improved psychoacoustic model, the FFT output is not converted to the logarithmic domain immediately; instead, piecewise-linear segments are used to approximate the masking curves in the linear domain, with the segmentation kept consistent with the standard for simplicity. Although this method is relatively rough, as analyzed above the psychoacoustic model is not the main limitation in a 16-bit fixed-point implementation, so it is acceptable.
After the masking threshold is obtained, the signal-to-mask ratio for bit allocation must still be converted from the linear to the logarithmic domain. Here we adopt an approximate calculation using the ADSP-2181 shifter: the exponent of the two's-complement fractional energy value is extracted, and since each bit of an energy value corresponds to about 3 dB, multiplying the exponent by 3 approximates the dB value, ignoring the influence of the mantissa.
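A C model of this shifter-based approximation (the exponent detection is emulated with a loop; on the ADSP-2181 the shifter performs it in hardware, and the function names are ours):

```c
#include <stdint.h>

/* Count redundant sign bits of a positive Q15 fraction, emulating the
 * exponent-detect operation of the ADSP-2181 shifter.
 * Assumes v > 0. */
static int q15_exponent(int16_t v) {
    int e = 0;
    while (v > 0 && v < 0x4000) {   /* normalize until v is in [0.5, 1) */
        v <<= 1;
        e++;
    }
    return e;
}

/* Approximate 10*log10 of a Q15 energy value: each exponent bit is
 * worth about 3 dB; the mantissa (at most ~3 dB) is ignored. */
double approx_energy_db(int16_t energy_q15) {
    return -3.0 * (q15_exponent(energy_q15) + 1);
}
```

For example, an energy of 0.25 (0x2000 in Q15) gives -6 dB against a true value of -6.02 dB; the worst-case error from the ignored mantissa is about 3 dB.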
(4) Coding of the scale factors The MPEG audio coding standard specifies 63 scale factors in total, but not all of them can be represented exactly by a 16-bit binary number. If double words were used for precision extension, quantization would then face the heavy overhead of double-word division. Therefore only the subset that can be represented exactly by 16-bit two's-complement fractions is used, i.e., the scale factors whose indices are multiples of 3 and no greater than 45.
With this scale-factor subset, the scale-factor code no longer needs to be found by successive comparison; it can be obtained directly from the maximum amplitude of the subband, which simplifies the coding of the scale factors.
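Restricted to power-of-two scale factors, the index can be read straight off the exponent of the largest subband amplitude. A hedged C sketch (the mapping index = 3 × exponent is our illustration of the idea, not the standard's table lookup):

```c
#include <stdint.h>

/* Pick a scale-factor index directly from the maximum subband amplitude.
 * With only power-of-two scale factors allowed (index a multiple of 3,
 * at most 45), the smallest scale factor covering |max_amp| follows
 * from exponent detection instead of comparing against all 63 entries.
 * Assumes max_amp > -32768. Mapping 3*e <-> 2^-e is illustrative. */
int scalefactor_index(int16_t max_amp) {
    int16_t v = (max_amp < 0) ? (int16_t)-max_amp : max_amp;
    int e = 0;
    while (v > 0 && v < 0x4000) {   /* redundant sign bits of |max_amp| */
        v <<= 1;
        e++;
    }
    int idx = 3 * e;                /* scale factor 2^-e -> index 3e     */
    return (idx > 45) ? 45 : idx;   /* clamp to the representable subset */
}
```

The exponent detection here is the same shifter primitive used for the dB approximation above, so the two computations share hardware on the DSP.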
(5) Software simulation results Combining the above algorithm improvements with the characteristics of the ADSP-2181 and the MPEG standard, the design was simulated with Analog Devices' development software. Table 1 lists the computation and storage requirements of each module obtained from the simulation. The simulation was performed at a sampling rate of 48 kHz in stereo mode, with a 1 kHz sine wave as the input signal and an output bit rate of 192 kbit/s.
Table 1 shows that the performance of the ADSP-2181 is fully utilized. The simulation results also show that under the above conditions the signal-to-noise ratio of the decoded output reaches about 80 dB, demonstrating that the algorithm improvements are effective.

Table 1 Computation and storage requirements of each module

Module                          | Computation /(10^6 instructions/s) | Program storage /10^3 words | Data storage /10^3 words
Subband filtering               | 18                                 | 3.0                         | 6.5
Psychoacoustic model            | 10                                 | 3.5                         | 1.5
Bit allocation and quantization | 2                                  | 2.0                         | -
Bitstream formatting            | 1                                  | 0.5                         | 1.0


4 Hardware design

The hardware structure block diagram is shown in Figure 3. The basic functions of each module are as follows:

[Figure 3 omitted]
Figure 3 Hardware structure block diagram

DSP core: in addition to executing the entire encoding algorithm, it initializes and configures the analog-to-digital conversion circuit, selects the sampling clock through the auxiliary control circuit, and accepts encoding parameters from the host through the interface circuit.
Auxiliary control circuit: implemented with an FPGA and its supporting circuits; it provides clock generation, FIFO status monitoring, address decoding, and other functions.
Output buffer: temporary storage for the encoded bit stream; it also provides a fully asynchronous output interface, which is especially useful in applications that require lip synchronization between image and sound.
External memory: includes the BDMA space and the I/O space.
Analog-to-digital conversion circuit: digitizes the audio and connects directly to serial port 0 of the DSP. The sampling frequency is determined by an externally supplied clock at 256 times the sampling rate, and the circuit must be initialized before normal operation.
Interface circuit: divided into two parts, the coded-output interface and the host interface. The host interface uses an RS-232 interface chip to connect DSP serial port 1 to the host's serial port; the DSP implements the asynchronous serial communication using interrupts and its internal timer.
The above scheme has been implemented in a key science and technology project of the national "Ninth Five-Year Plan", and the real-time encoded and decoded audio has passed subjective listening tests.

* Supported by a key science and technology project of the national "Ninth Five-Year Plan". Authors: Lin Sheng, Men Aidong, School of Telecommunications Engineering, Beijing University of Posts and Telecommunications, Beijing 100876. The first author is 25 years old, male, and a doctoral student.

References

[1] ISO/IEC 11172-3: 1993. Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s. Part 3: Audio
[2] Brandenburg K, Dehéry Y F, Johnston J D, et al. ISO-MPEG-1 audio: a generic standard for coding of high-quality digital audio. J Audio Eng Soc, 1994, 42(10): 780-791
[3] Wang Jianwei, Dong Zaiwang, Yin Rifang. Research and real-time implementation of the MPEG audio coding algorithm. Journal of Tsinghua University, 1997, 37(10): 45-48
[4] Konstantinides K. Fast subband filtering in MPEG audio coding. IEEE Signal Processing Letters, 1994

