ECE5550: Applied Kalman Filtering
DYNAMIC SYSTEMS WITH NOISY INPUTS
3.1: Scalar random variables
■ The purpose of Kalman filters is to estimate the hidden internal state
of some system where that state is affected by noise and where our
measurements of system output are also corrupted by noise.
■ By definition, noise is not deterministic—it is random in some sense.
■ So, to discuss the impact of noise on the system dynamics, we must review the concept of a “random variable” (RV), X.
• We cannot predict exactly what we will get each time we measure or sample the random variable, but
• We can characterize the probability of each sample value by the “probability density function” (pdf).
Probability density functions (pdf)
■ We denote the probability density function (pdf) of RV X as $f_X(x)$.
■ $f_X(x_0)\,dx$ is the probability that random variable X lies in the interval $[x_0, x_0 + dx]$.
■ Properties that are true of all pdfs:
1. $f_X(x) \ge 0 \;\; \forall x$.
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
3. $\Pr(X \le x_0) = \int_{-\infty}^{x_0} f_X(x)\,dx$, which is the RV’s cumulative distribution function (cdf).
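The three properties above can be checked numerically for any candidate pdf. The sketch below does this for an exponential density, which is an illustrative choice and not a distribution taken from these notes; the helper names `exp_pdf` and `integrate` are likewise hypothetical.

```python
import math

def exp_pdf(x, rate=2.0):
    """Illustrative pdf (assumption): rate * exp(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0.0 else 0.0

def integrate(f, a, b, n=100_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Property 1: the pdf is non-negative at every point we sample.
nonneg = all(exp_pdf(-1.0 + 0.01 * k) >= 0.0 for k in range(1000))

# Property 2: the total area is 1 (the tail beyond x = 20 is negligible here).
total_area = integrate(exp_pdf, -1.0, 20.0)

# Property 3: the cdf at x0 is the area to the left of x0.
x0 = 0.5
cdf_numeric = integrate(exp_pdf, -1.0, x0)
cdf_exact = 1.0 - math.exp(-2.0 * x0)  # closed form for this particular pdf

print(nonneg, total_area, cdf_numeric, cdf_exact)
```

The infinite integration limits are replaced by finite ones, which is only reasonable because this pdf is zero for negative x and decays quickly for large x.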
■ Problem: Apart from simple examples, it is often difficult to determine $f_X(x)$ accurately. ➠ Use approximations to capture the key behavior.
■ Need to define key characteristics of $f_X(x)$.
EXPECTATION: Describes the expected outcome of a random trial.
$$\bar{x} = E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx.$$
■ Expectation is a linear operator (very important for working with it).
■ So, for example, the first moment about the mean is $E[X - \bar{x}] = E[X] - \bar{x} = 0$.
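Linearity is easy to confirm on a small example. The sketch below uses a discrete RV (a fair six-sided die, chosen purely for illustration), so the integral becomes a sum over the probability mass function; exact fractions avoid any rounding questions.

```python
from fractions import Fraction

# Hypothetical discrete RV: a fair six-sided die (illustrative assumption).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
    """E[g(X)] = sum over x of g(x) * Pr(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)             # E[X]

a, b = 3, 5                                      # arbitrary constants
lhs = expectation(lambda x: a * x + b, pmf)      # E[aX + b]
rhs = a * mean + b                               # a E[X] + b

# First moment about the mean: E[X - mean].
first_central_moment = expectation(lambda x: x - mean, pmf)

print(mean, lhs, rhs, first_central_moment)
```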
STATISTICAL AVERAGE: Different from expectation.
■ Consider a (discrete) RV X that can assume n values $x_1, x_2, \ldots, x_n$.
■ Define the average by making many measurements, $N \to \infty$. Then,
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{n} m_i x_i,$$
where $m_i$ is the number of times the value of the measurement is $x_i$.
■ In the limit, $m_1/N \to \Pr(X = x_1)$ and so forth (assuming ergodicity), so
$$\bar{x} = \sum_{i=1}^{n} x_i \Pr(X = x_i).$$
■ So, statistical means can converge to expectation in the limit. (A similar property can be shown for continuous RVs, but it is harder to do.)
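The convergence of the statistical average to the expectation can be observed in simulation. This is a minimal sketch that assumes a fair six-sided die as the discrete RV and uses Python's pseudo-random generator as a stand-in for repeated measurements; `statistical_average` is a hypothetical helper name.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so that the run is repeatable

values = [1, 2, 3, 4, 5, 6]                # x_1 ... x_n for a hypothetical fair die
expectation = sum(values) / len(values)    # sum of x_i * Pr(X = x_i), with Pr = 1/6

def statistical_average(num_samples):
    """(1/N) * sum_i m_i * x_i, where m_i counts how often x_i was observed."""
    counts = Counter(random.choice(values) for _ in range(num_samples))
    return sum(counts[x] * x for x in values) / num_samples

for N in (10, 1_000, 100_000):
    print(N, statistical_average(N), expectation)
```

For small N the average can be far from the expectation; the gap typically shrinks as N grows.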
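VARIANCE: Second moment about the mean.
$$\mathrm{var}(X) = \sigma_X^2 = E[(X - \bar{x})^2] = \int_{-\infty}^{\infty} (x - \bar{x})^2 f_X(x)\,dx,$$
$$\sigma_X^2 = E[X^2] - (E[X])^2,$$
that is, the variance is equal to the mean-square minus the square-mean.
STANDARD DEVIATION: Measure of dispersion about the mean of the samples of X: $\sigma_X = \sqrt{\mathrm{var}(X)}$.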
■ The expectation and variance capture key features of the actual pdf. Higher-order moments are available, but we won’t need them!
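As a quick check that the two forms of the variance agree, the sketch below computes both from the same set of samples. The data are drawn from a uniform distribution on [0, 1], an arbitrary illustrative choice whose true variance is 1/12.

```python
import math
import random

random.seed(1)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]  # illustrative data

n = len(samples)
mean = sum(samples) / n
mean_square = sum(x * x for x in samples) / n

# Second moment about the mean, computed directly.
var_central = sum((x - mean) ** 2 for x in samples) / n

# Mean-square minus square-mean.
var_shortcut = mean_square - mean ** 2

std_dev = math.sqrt(var_central)

print(var_central, var_shortcut, std_dev)
```

In floating-point arithmetic the shortcut form can lose accuracy when the mean is large compared with the standard deviation, so the direct form is often the safer choice in code.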
KEY POINT FOR UNDERSTANDING VARIANCE: Chebychev’s inequality
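■ Chebychev’s inequality states (for positive $\varepsilon$)
$$\Pr(|X - \bar{x}| \ge \varepsilon) \le \frac{\sigma_X^2}{\varepsilon^2},$$
which implies that probability is concentrated around the mean.
■ It may be proven as follows:
$$\Pr(|X - \bar{x}| \ge \varepsilon) = \int_{-\infty}^{\bar{x} - \varepsilon} f_X(x)\,dx + \int_{\bar{x} + \varepsilon}^{\infty} f_X(x)\,dx.$$
■ For the two regions of integration, $|x - \bar{x}|/\varepsilon \ge 1$, or $(x - \bar{x})^2/\varepsilon^2 \ge 1$. So,
$$\Pr(|X - \bar{x}| \ge \varepsilon) \le \int_{-\infty}^{\bar{x} - \varepsilon} \frac{(x - \bar{x})^2}{\varepsilon^2} f_X(x)\,dx + \int_{\bar{x} + \varepsilon}^{\infty} \frac{(x - \bar{x})^2}{\varepsilon^2} f_X(x)\,dx.$$
■ Since $f_X(x)$ is non-negative, extending the integration to the whole real line can only increase the right-hand side, so we also have
$$\Pr(|X - \bar{x}| \ge \varepsilon) \le \int_{-\infty}^{\infty} \frac{(x - \bar{x})^2}{\varepsilon^2} f_X(x)\,dx = \frac{\sigma_X^2}{\varepsilon^2}.$$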
■ This inequality shows that probability is clustered around the mean, and that the variance is an indication of the dispersion of the pdf.
■ That is, variance (later on, covariance too) informs us of how uncertain we are about the value of a random variable.
• Low variance means that we are very certain of its value;
• High variance means that we are very uncertain of its value.
■ The mean and variance give us an estimate of the value of a random variable, and how certain we are of that estimate.
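Chebychev's inequality holds for any distribution with finite variance, so it can be illustrated on deliberately non-Gaussian data. The sketch below compares the observed fraction of samples at least ε away from the mean with the bound variance/ε². The exponential distribution and the helper names are illustrative assumptions.

```python
import random

random.seed(2)
# Illustrative, deliberately non-Gaussian data: exponential with rate 1.
samples = [random.expovariate(1.0) for _ in range(100_000)]

n = len(samples)
mean = sum(samples) / n
variance = sum((x - mean) ** 2 for x in samples) / n

def tail_fraction(eps):
    """Fraction of samples with |x - mean| >= eps (empirical tail probability)."""
    return sum(1 for x in samples if abs(x - mean) >= eps) / n

def chebyshev_bound(eps):
    """Upper bound variance / eps^2 from Chebychev's inequality."""
    return variance / eps ** 2

for eps in (1.0, 2.0, 3.0):
    print(eps, tail_fraction(eps), chebyshev_bound(eps))
```

The bound is usually loose: it guarantees only that the tail fraction cannot exceed it, which is why a small variance implies tight clustering around the mean.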
The most important distribution for this course
■ The Gaussian (normal) distribution is of key importance to Kalman filters. (We will explain why this is true later—see “main point #7” on pg. 3–14.)
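■ Its pdf is defined as:
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left( -\frac{(x - \bar{x})^2}{2\sigma_X^2} \right).$$
■ Symmetric about $\bar{x}$.
■ Peak proportional to $1/\sigma_X$ at $\bar{x}$.
■ Notation: $X \sim \mathcal{N}(\bar{x}, \sigma_X^2)$.
■ Probability that X is within $\pm\sigma_X$ of $\bar{x}$ is 68%; probability that X is within $\pm 2\sigma_X$ of $\bar{x}$ is about 95%; probability that X is within $\pm 3\sigma_X$ of $\bar{x}$ is 99.7%.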
• A ±3σX range almost certainly covers observed samples.
■ “Narrow” distribution ➠ Sharp peak. High confidence in predicting X.
■ “Wide” distribution ➠ Poor knowledge of what to expect for X.
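The coverage percentages and the peak-height behavior of the Gaussian can be reproduced with the standard library alone. This sketch relies on the identity Pr(|X − mean| ≤ kσ) = erf(k/√2) for a Gaussian RV; the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Scalar Gaussian pdf with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def prob_within(k):
    """Pr(|X - mean| <= k * sigma) for a Gaussian RV, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, prob_within(k))

# Peak height scales as 1/sigma: halving sigma doubles the peak.
peak_wide = gaussian_pdf(0.0, 0.0, 1.0)
peak_narrow = gaussian_pdf(0.0, 0.0, 0.5)
print(peak_wide, peak_narrow)
```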
3.2: Vector random variables
■ With very little change in the preceding, we can also handle vectors of random variables.
■ X is described by the (scalar-valued) joint pdf $f_X(x)$ of the vector X.
■ $f_X(x_0)$ means $f_X(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$.
■ That is, $f_X(x_0)\,dx_1\,dx_2 \cdots dx_n$ is the probability that X is between $x_0$ and $x_0 + dx$.
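■ Properties of joint pdf $f_X(x)$:
1. $f_X(x) \ge 0 \;\; \forall x$. Same as before.
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x)\,dx_1\,dx_2 \cdots dx_n = 1$. Basically the same.
3. $\bar{x} = E[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x\, f_X(x)\,dx_1\,dx_2 \cdots dx_n$. Basically the same.
4. Correlation matrix: Different.
$$\Sigma_X = E[X X^T] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x x^T f_X(x)\,dx_1\,dx_2 \cdots dx_n.$$
5. Covariance matrix: Different. Define $\tilde{X} = X - \bar{x}$. Then,
$$\Sigma_{\tilde{X}} = E[(X - \bar{x})(X - \bar{x})^T] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x - \bar{x})(x - \bar{x})^T f_X(x)\,dx_1\,dx_2 \cdots dx_n.$$
$\Sigma_{\tilde{X}}$ is symmetric and positive-semi-definite (psd). This means
$$y^T \Sigma_{\tilde{X}}\, y \ge 0 \quad \forall y.$$
PROOF: For all $y \ne 0$,
$$0 \le E[(y^T \tilde{X})^2] = y^T E[\tilde{X} \tilde{X}^T]\, y = y^T \Sigma_{\tilde{X}}\, y.$$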
■ Notice that correlation and covariance are the same for zero-mean random vectors.
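■ The covariance entries have specific meaning:
$$(\Sigma_{\tilde{X}})_{ii} = \sigma_{X_i}^2, \qquad (\Sigma_{\tilde{X}})_{ij} = \rho_{ij}\,\sigma_{X_i}\sigma_{X_j} = (\Sigma_{\tilde{X}})_{ji}.$$
• The diagonal entries are the variances of each vector component;
• The correlation coefficient $\rho_{ij}$ is a measure of linear dependence between $X_i$ and $X_j$, with $|\rho_{ij}| \le 1$.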
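The correlation matrix, covariance matrix, and correlation coefficient can all be estimated from samples. The sketch below does so for a two-component random vector in plain Python. The data model (the second component is a linear function of the first plus independent noise) and all names are illustrative assumptions.

```python
import math
import random

random.seed(3)

# Illustrative two-component random vector: X2 depends linearly on X1 plus noise.
data = []
for _ in range(50_000):
    x1 = random.gauss(1.0, 2.0)
    x2 = 0.5 * x1 + random.gauss(0.0, 1.0)
    data.append((x1, x2))

n = len(data)
mean = [sum(row[i] for row in data) / n for i in range(2)]

def second_moment(centered):
    """2x2 sample estimate of E[Z Z^T], with Z = X - mean if centered, else Z = X."""
    shift = mean if centered else [0.0, 0.0]
    return [[sum((r[i] - shift[i]) * (r[j] - shift[j]) for r in data) / n
             for j in range(2)] for i in range(2)]

correlation = second_moment(centered=False)  # estimate of E[X X^T]
covariance = second_moment(centered=True)    # estimate of E[(X - mean)(X - mean)^T]

# Diagonal entries are variances; the off-diagonal entry gives rho_12.
sigma1 = math.sqrt(covariance[0][0])
sigma2 = math.sqrt(covariance[1][1])
rho12 = covariance[0][1] / (sigma1 * sigma2)

def quadratic_form(y):
    """y^T * covariance * y, which should be non-negative for every y (psd)."""
    return sum(y[i] * covariance[i][j] * y[j] for i in range(2) for j in range(2))

print(covariance, rho12, quadratic_form([1.0, -2.0]))
```

Because the mean here is nonzero, the correlation and covariance matrices differ; they would coincide for zero-mean data.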
The most important multivariable distribution for this course
■ The multivariable Gaussian is of key importance for Kalman filtering.
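■ Its pdf is
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_{\tilde{X}}|^{1/2}} \exp\left( -\frac{1}{2} (x - \bar{x})^T \Sigma_{\tilde{X}}^{-1} (x - \bar{x}) \right),$$
where $|\Sigma_{\tilde{X}}|$ denotes the determinant of the covariance matrix.
■ Notation: $X \sim \mathcal{N}(\bar{x}, \Sigma_{\tilde{X}})$.
■ Contours of constant $f_X(x)$ are hyper-ellipsoids centered at $\bar{x}$, with directions governed by $\Sigma_{\tilde{X}}$. The principal axes (the eigenvectors of $\Sigma_{\tilde{X}}$) decouple $\Sigma_{\tilde{X}}$.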
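To make the link between contours and eigenvectors concrete, the sketch below evaluates a two-dimensional Gaussian pdf and finds the principal axes of its covariance in closed form. The covariance values and function names are hypothetical. Points placed one "standard deviation" out along each principal axis (distance √λ along the corresponding eigenvector) should lie on the same contour.

```python
import math

mean = (0.0, 0.0)
cov = [[4.0, 2.0], [2.0, 2.0]]  # hypothetical covariance matrix, for illustration only

def gaussian_pdf_2d(x, mean, cov):
    """Two-dimensional Gaussian pdf, using the explicit 2x2 inverse and determinant."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean).
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def principal_axes(cov):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    half_trace = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # orientation of the major axis
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    return (half_trace + radius, major), (half_trace - radius, minor)

(lam_big, major), (lam_small, minor) = principal_axes(cov)

# Points at distance sqrt(eigenvalue) along each principal axis.
p_major = gaussian_pdf_2d((math.sqrt(lam_big) * major[0], math.sqrt(lam_big) * major[1]), mean, cov)
p_minor = gaussian_pdf_2d((math.sqrt(lam_small) * minor[0], math.sqrt(lam_small) * minor[1]), mean, cov)

print(lam_big, major, lam_small, minor, p_major, p_minor)
```

For dimensions above two, a general symmetric eigen-solver would be needed; the closed form here is specific to the 2x2 case.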
■ Two-dimensional zero-mean case: (Let $\sigma_1 = \sigma_{X_1}$ and $\sigma_2 = \sigma_{X_2}$.)