Electrical & Computer Engineering 520.445/645
Audio Signal Processing
Fall, 2024
Project 3
Linear Predictive Coding
Deadline: 10/17/2024 at 11:59pm
This project implements a Linear Predictive Coding compression scheme. You are given 10 audio samples to evaluate your system, and your goal is:
(a) Compress the signals into as few coefficients as you can so that they can be transmitted over a communication channel. Your transmitter will estimate the prediction parameters a_k and gain G for each signal on a frame-by-frame basis.
(b) Using the parameters {a_k, G}, re-synthesize the original sentence with the highest possible quality.
The ultimate goal is a reconstructed waveform with the best possible quality, using the smallest number of coefficients. Your report MUST include an estimate of the average number of parameters in bits/second or samples/second.
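As a starting point, the frame-by-frame estimation of the prediction parameters a_k and gain G can be done with the standard autocorrelation method and the Levinson-Durbin recursion. The sketch below is a minimal illustration, not a complete analyzer: windowing, pre-emphasis, and frame overlap are deliberately left out, and the function name is our own.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a_1..a_p and gain G for one frame using
    the autocorrelation method and the Levinson-Durbin recursion.
    Returns a with a[0] = 1, so the synthesis filter is G / A(z)."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] == 0.0:
        return a, 0.0                      # silent frame: nothing to predict
    err = r[0]                             # prediction-error energy so far
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    gain = np.sqrt(err)                    # G^2 equals the residual energy of the frame
    return a, gain
```

A quick sanity check is to feed it a long synthetic AR(1) signal and verify that the estimated a_1 matches the generating coefficient.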
Your evaluation should be done in two parts:
- Part 1: Evaluate your system on the 10 audio files provided.
- Part 2: Record yourself saying the sentence "The synthesized signal is supposed to be of high standard". For this second part, optimize the choice of parameters specifically for this sentence to improve compression and quality. Note that this sentence has a large number of fricatives, specifically the phoneme /s/. What do you need to change in your parameters to improve compression/quality? Discuss your choices in your report and save the improved synthesized sentence.
Voicing (10% of grade):
LPC requires knowledge of voicing information. In order not to hamper your progress in the project, you are provided voicing information that you can use in case your own voicing code does not work. For each of the 10 audio files, you have a file X.txt that contains estimates of the voiced/unvoiced decision and pitch frequency at a frame rate of 100 frames/sec (higher than the typical frame rate for LPC). A zero value indicates unvoiced or silence, and a non-zero value is the voiced pitch frequency (in Hz). Note that these estimates are obtained from a correlation analysis of the signal. They are only approximations, may not be accurate for every frame, and should not be considered absolute truth. As a reminder, these values are provided only as a backup in case your own voicing analysis is not working well.
A component of your project is to implement your own voicing estimator, using either the signal's autocorrelation, the LPC residual signal, or any other method. You are expected to implement your voicing analysis from scratch rather than using existing packages in MATLAB or Python.
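As one illustration of the autocorrelation route, a voiced/unvoiced decision can be read off the peak of the normalized autocorrelation over the expected pitch-lag range. The threshold and frequency bounds below are illustrative choices, not tuned values, and the function name is our own.

```python
import numpy as np

def voicing_decision(frame, fs, fmin=60.0, fmax=400.0, threshold=0.3):
    """Voiced/unvoiced decision and pitch estimate from the normalized
    autocorrelation of one frame. Returns 0.0 for unvoiced or silence,
    otherwise the estimated pitch frequency in Hz."""
    frame = frame - np.mean(frame)
    r0 = np.dot(frame, frame)
    if r0 == 0.0:
        return 0.0                           # silence
    lag_min = int(fs / fmax)                 # shortest plausible pitch period
    lag_max = min(int(fs / fmin), len(frame) - 1)
    if lag_max <= lag_min:
        return 0.0                           # frame too short to decide
    # Normalized autocorrelation over the candidate pitch-lag range
    corr = np.array([np.dot(frame[:-lag], frame[lag:]) / r0
                     for lag in range(lag_min, lag_max + 1)])
    best = int(np.argmax(corr))
    if corr[best] < threshold:
        return 0.0                           # weak periodicity: unvoiced
    return fs / (lag_min + best)
```

On a clean 100 Hz sinusoid this returns roughly 100 Hz; on white noise the peak stays below the threshold and the frame is declared unvoiced. Real speech will need more care (e.g., median smoothing across frames to suppress octave errors).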
For part 2, you will be recording your own voice. If your own voicing code is not working properly, you can use existing packages or visual inspection of spectral slices to generate your voicing information for part 2 and proceed with the rest of the project.
Notes:
- A number of factors may affect the quality of your synthesized speech. For instance, what goes on in a given frame is not independent of what happened in previous frames. As the pitch period changes, you will need to know where the last pitch impulse occurred in the previous frame in order to determine the location of the next impulse in the current frame. You should also examine the benefit of using different glottal pulse shapes for your voiced segments.
- You can change the vocal tract filter once per frame, or you can interpolate between frames. The voicing information is provided to you at a 10 msec frame interval, but you should explore the frame rate that works best.
- Listen to your synthesized speech and see if you can isolate the main sources of distortion.
- If you wish, you can use an automated quality metric to evaluate the quality of your synthesized signal. Documentation and MATLAB code for this metric (PESQ: Perceptual Evaluation of Speech Quality) are included with the project documents. A Python implementation can be found here: https://pypi.org/project/pesq/. You are not required to use this metric; it is up to you to decide how you evaluate improvements in signal quality as you resynthesize your audio.
- If you use any built-in functions (in MATLAB or Python), please indicate that in your report and comment on it in your code.
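The first note above, about carrying the last pitch-impulse position across frame boundaries, can be handled with one piece of state: the absolute sample index of the next impulse. The sketch below shows only that bookkeeping; glottal-pulse shaping and gain scaling are omitted, and the unit-impulse/white-noise excitation and function name are illustrative assumptions.

```python
import numpy as np

def impulse_excitation(pitch_hz_per_frame, frame_len, fs):
    """Build an excitation signal frame by frame, carrying the position
    of the next pitch impulse across frame boundaries so the pulse
    train stays continuous as the pitch period changes."""
    rng = np.random.default_rng(0)
    excitation = np.zeros(len(pitch_hz_per_frame) * frame_len)
    next_impulse = 0                      # absolute sample index of the next pulse
    for i, f0 in enumerate(pitch_hz_per_frame):
        start, end = i * frame_len, (i + 1) * frame_len
        if f0 > 0:                        # voiced: impulse train at the pitch period
            period = int(round(fs / f0))
            # After an unvoiced stretch, restart the train at this frame
            next_impulse = max(next_impulse, start)
            while next_impulse < end:
                excitation[next_impulse] = 1.0
                next_impulse += period
        else:                             # unvoiced: white-noise excitation
            excitation[start:end] = rng.standard_normal(frame_len)
    return excitation
```

Feeding this excitation through the per-frame synthesis filter G / A(z) gives the reconstructed speech; without the carried-over impulse index, each frame would restart the pulse train and introduce audible pitch glitches.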
Deliverables (on canvas):
1. A report (5 pages) that describes how you programmed the LPC vocoder (e.g., decisions made on frame length, excitation generation, frame overlaps, etc.), along with any graphics/plots. Do not include code in your report.
2. Your synthesized audio for all 11 files (the 10 provided plus your own recording).
3. A link to your code (e.g., Google Colab) or submit your code directly. You are free to use MATLAB or Python. Make sure your code works properly and includes all functions necessary to test your system. If your code crashes for any reason, points will be deducted.
Grading Rubrics
Components                                                              Points

Code
  Code works for estimating LPC coefficients                                15
  Code works for reconstructing audio from LPC coefficients                 15

Performance
  Reasonable signal quality                                                 10
  Reasonable compression rate                                               10
  Compare different compression rates vs. quality based on parameters       10

Report
  Discussion of different glottal shapes                                    7.5
  Discussion of different frame rates and what works best                   7.5
  Discussion on using the remaining error and increase in bit rate          7.5
  Discussion of parameter optimization for own recording                    7.5

Voicing
  Code for voicing performs well                                              5
  Report: discussion of voicing analysis approach                             3
  Report: evaluation approach for voicing derivation                          2

Total                                                                      100