
5 Data Analysis
Missing Data
In MATLAB, NaN (Not a Number) values represent missing data. NaN values
allow variables with missing data to maintain their structure—in this case,
24-by-1 vectors with consistent indexing across all three intersections.
Check the data at the third intersection for NaN values using the MATLAB
isnan function:
c3 = count(:,3); % Data at intersection 3
c3NaNCount = sum(isnan(c3))
c3NaNCount =
0
isnan returns a logical vector the same size as c3, with entries indicating the
presence (1) or absence (0) of NaN values for each of the 24 elements in the
data. In this case, the logical values sum to 0, so there are no NaN values
in the data.
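To see what isnan returns when values are missing, here is a minimal sketch on a small made-up vector. The variable names x, missing, numMissing, and missingIdx are illustrative assumptions and do not come from the traffic data set:

```matlab
% Hypothetical 6-by-1 vector with two missing entries (not the traffic data)
x = [11; NaN; 7; 13; NaN; 9];

missing    = isnan(x);       % logical vector, same size as x; 1 where x is NaN
numMissing = sum(missing);   % number of NaN entries
missingIdx = find(missing);  % row indices of the NaN entries
```

Summing the logical vector counts the missing entries, and find reports where they occur, so the original indexing of the data is preserved.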
NaN values are introduced into the data in the section on “Outliers” on page
5-4.
See “Removing and Interpolating Missing Values” in the MATLAB Data
Analysis documentation for more information on handling missing data in
MATLAB.
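As a rough illustration of the two approaches named in that title, the sketch below removes NaN values with logical indexing and, alternatively, fills them by linear interpolation with interp1. The data and variable names are hypothetical, and this is only one simple way to do it, not necessarily the procedure the referenced documentation describes:

```matlab
% Hypothetical data with interior missing values (not the traffic data)
x = [11; NaN; 7; 13; NaN; 9];
t = (1:numel(x))';            % sample index, used as the interpolation grid

valid = ~isnan(x);            % logical mask of non-missing entries

% Option 1: remove missing values (the vector gets shorter)
xRemoved = x(valid);

% Option 2: fill missing values by linear interpolation between neighbors
xFilled = x;
xFilled(~valid) = interp1(t(valid), x(valid), t(~valid));
```

Removal changes the length of the vector, so the indexing no longer lines up with other variables. Interpolation keeps the original size, but with these default settings it only fills gaps that have valid neighbors on both sides.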
Outliers
Outliers are data values that are dramatically different from patterns in
the rest of the data. They may be due to measurement error, or they may
represent significant features in the data. Identifying outliers, and deciding
what to do with them, depends on an understanding of the data and its source.
One common method for identifying outliers is to look for values more than
a certain number of standard deviations σ from the mean μ. The following
code plots a histogram of the data at the third intersection together with
lines at μ and μ + nσ, for n = 1, 2:
bin_counts = hist(c3); % Histogram bin counts
N = max(bin_counts); % Maximum bin count
mu3 = mean(c3); % Data mean
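Separately from the plotting code, the standard-deviation rule described in this section can be sketched on a small made-up vector. The data, the threshold n = 2, the two-sided test using abs, and the names x, outlierMask, and xClean are all assumptions made for illustration:

```matlab
% Hypothetical data with one unusually large value (not the traffic data)
x = [10; 12; 11; 13; 12; 11; 10; 12; 13; 60];

mu    = mean(x);     % data mean
sigma = std(x);      % data standard deviation
n     = 2;           % assumed threshold, in standard deviations

% Flag values more than n standard deviations from the mean (two-sided)
outlierMask = abs(x - mu) > n*sigma;

% Mark flagged values as missing while keeping the vector the same size
xClean = x;
xClean(outlierMask) = NaN;
```

Replacing flagged values with NaN rather than deleting them keeps the vector length and indexing unchanged. Whether a flagged value is a measurement error or a real feature still depends on knowledge of the data and its source.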