Learning a symbolic representation for multivariate time series classification
Title | Learning a symbolic representation for multivariate time series classification |
Publication Type | Journal Article |
Year of Publication | 2015 |
Authors | Baydogan, M. Gokce, and G. Runger |
Journal | Data Mining and Knowledge Discovery |
Volume | 29 |
Issue | 2 |
Pagination | 400-422 |
Date Published | 03/2015 |
ISSN | 1384-5810 |
Keywords | codebook, Decision trees, supervised learning |
Abstract | Multivariate time series (MTS) classification has gained importance with the increase in the number of temporal datasets in different domains (such as medicine, finance, multimedia, etc.). Similarity-based approaches, such as nearest-neighbor classifiers, are often used for univariate time series, but MTS are characterized not only by individual attributes, but also by their relationships. Here we provide a classifier based on a new symbolic representation for MTS (denoted as SMTS) with several important elements. SMTS considers all attributes of MTS simultaneously, rather than separately, to extract information contained in the relationships. Symbols are learned from a supervised algorithm that does not require pre-defined intervals, nor features. An elementary representation is used that consists of the time index, and the values (and first differences for numerical attributes) of the individual time series as columns. That is, there is essentially no feature extraction (aside from first differences) and the local series values are fused to time position through the time index. The initial representation of raw data is quite simple conceptually and operationally. Still, a tree-based ensemble can detect interactions in the space of the time index and time values and this is exploited to generate a high-dimensional codebook from the terminal nodes of the trees. Because the time index is included as an attribute, each MTS is learned to be segmented by time, or by the value of one of its attributes. The codebook is processed with a second ensemble where now implicit feature selection is exploited to handle the high-dimensional input. The constituent properties produce a distinctly different algorithm. Moreover, MTS with nominal and missing values are handled efficiently with tree learners. 
Experiments demonstrate the effectiveness of the proposed approach in terms of accuracy and computation times on a large collection of multivariate (and univariate) datasets. |
URL | http://dx.doi.org/10.1007/s10618-014-0349-y |
DOI | 10.1007/s10618-014-0349-y |
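
The elementary representation described in the abstract (time index, attribute values, and first differences as columns) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names elementary_rows and symbol_histogram are hypothetical, padding the first difference at the initial time point with 0.0 is an assumption, and the symbol ids passed to symbol_histogram stand in for terminal-node ids that a tree ensemble would assign to each row. Counting those ids per series is one plausible way to turn terminal nodes into a codebook vector, not a detail stated in the abstract.

```python
from collections import Counter
from typing import List, Sequence


def elementary_rows(series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build one row per time point: [time index, values..., first differences...].

    series[t][m] is the value of attribute m at time t.
    """
    rows = []
    for t, values in enumerate(series):
        if t == 0:
            # Assumption: no previous point exists, so pad differences with 0.0.
            diffs = [0.0] * len(values)
        else:
            previous = series[t - 1]
            diffs = [v - p for v, p in zip(values, previous)]
        rows.append([float(t), *values, *diffs])
    return rows


def symbol_histogram(symbols: Sequence[int], n_symbols: int) -> List[int]:
    """Count how often each symbol id (e.g. a terminal-node id) occurs in one series."""
    counts = Counter(symbols)
    return [counts.get(s, 0) for s in range(n_symbols)]


# A toy MTS with 3 time points and 2 numerical attributes.
mts = [[1.0, 10.0], [2.0, 8.0], [4.0, 9.0]]
rows = elementary_rows(mts)

# Hypothetical symbol ids, one per row, as a trained tree might assign them.
histogram = symbol_histogram([2, 0, 2], n_symbols=4)
```

Each row of rows would be labeled with the class of its parent series and passed to a tree-based ensemble. Because the time index is an ordinary column, a tree can split on it as well as on the attribute values.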