Functional singular value decomposition and multi-resolution anomaly detection Public Deposited

Downloadable Content

Download PDF
Last Modified
  • March 21, 2019
Creator
  • Zhang, Lingsong
    • Affiliation: College of Arts and Sciences, Department of Statistics and Operations Research
Abstract
  • This dissertation has two major parts. The first part discusses the connections and differences between the statistical tool of Principal Component Analysis (PCA) and the related numerical method of Singular Value Decomposition (SVD), and related visualization methods. The second part proposes a Multi-Resolution Anomaly Detection (MRAD) method for time series with long range dependence (LRD). PCA is a popular method in multivariate analysis and in Functional Data Analysis (FDA). Compared to PCA, SVD is more general, because it not only provides a direct approach to calculate the principal components (PCs), but also simultaneously yields the PCAs for both the row and the column spaces. SVD has been used directly to explore and analyze data sets, and has been shown to be an insightful analysis tool in many fields. However, the connection and differences between PCA and SVD have seldom been explored from a statistical view point. Here we explore the connections and differences between PCA and SVD, and extend the usual SVD method to variations including different centerings based on various types of means. A generalized scree plot is developed to provide a visual aid for selection of different centerings. Several matrix views of the SVD components are introduced to explore different features in data, including SVD surface plots, image plots, rotation movies, and curve movies. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, and show local variations and interactions between columns and rows. Several toy examples are designed iii to compare the different types of centerings, and three real applications are used to illustrate the matrix views. In the field of Internet traffic anomaly detection, different types of network anomalies exist at different time scales. This motivates anomaly detection methods that effectively exploit multiscale properties. Because time series of Internet measurements exhibit long range dependence (LRD) and self-similarity (SS), the classical outlier detection methods base on short-range dependent time series may not be suitable for identifying network anomalies. Based on a time series collected at a single scale (the finest scale), we aggregate to form time series of various scales, and propose a MRAD procedure to find anomalies which appear at different time scales. We show that this MRAD method is more conservative than a typical outlier detection method based on a given scale, and has larger power on average than any single scale outlier detection method based on some reasonable assumptions. Asymptotic distribution of the test statistic is developed as well. An MRAD map is developed to show candidate anomalies and the corresponding significance probabilities (p values). This method can be easily extended to be implemented in real time. Simulations and real examples are reported as well, to illustrate the usefulness of the MRAD method. Keywords: Principal Component Analysis, Functional Data Analysis, Exploratory Data Analysis, Network Intrusion Detection, Outlier detection, Level Shift, Multiscale analysis, Long Range Dependence, Multiple Comparison, p values, Time Series, false discovery rate.
Date of publication
DOI
Resource type
Rights statement
  • In Copyright
Advisor
  • Marron, James Stephen
Language
Access
  • Open access
Parents:

This work has no parents.

Items