INVESTIGATION OF ORTHOGONAL POLYNOMIAL KERNELS AS SIMILARITY FUNCTIONS FOR PATTERN CLASSIFICATION BY SUPPORT VECTOR MACHINES

Abstract or description

A kernel function is an important component of the support vector machine (SVM) kernel-based classifier. Its importance comes from the elegant mathematical characteristics of a kernel. These amount to mapping non-linearly separable classes to an implicit higher-dimensional feature space, where they can become linearly separable and hence easier to classify. These characteristics are those prescribed by the underpinning positive semi-definite (PSD) property. The properties of this feature space can, however, be difficult to interpret, which makes it difficult to customize or select an appropriate kernel for the classification task at hand. Moreover, because the feature space is constructed only implicitly, its high dimensionality does not usually provide apparent or intuitive information about the natural representations of the data in the input space. On the other hand, SVM kernels have also been regarded in many contexts as similarity functions that measure the resemblance between two patterns, which can be from the same class or from different classes. However, despite the elegant theory of PSD kernels and its remarkable implications for the performance of many learning algorithms, few research efforts seem to have studied kernels from this similarity perspective. Patterns from the same class share more similar characteristics than those belonging to different classes. This similarity perspective can therefore provide a more tangible means of crafting or selecting appropriate kernels than the properties of implicit high-dimensional feature spaces that one might not even be able to calculate.
This thesis therefore aims to: (i) investigate the similarity-based properties that can be exploited to characterise kernels (with a focus on the so-called “orthogonal polynomial kernels”) when they are used as similarity functions, and (ii) assess the influence of these properties on the performance of the SVM classifier. To this end, the thesis defines a similarity-based model of what the shape of an SVM kernel should ideally look like when it is used to measure the similarity between its two inputs. The model proposes that the similarity curve should be maximized when the two kernel inputs are identical, and that it should decay monotonically as the inputs differ more and more from each other. Motivated by the pictorial characteristics of the Chebyshev kernels reported in the literature, the thesis adopts this kernel-shape perspective to also study other orthogonal polynomial kernels (such as the Legendre and Hermite kernels). This study underpins the assessment of the proposed ideal shape of the similarity curve for kernel-based pattern classification by SVMs.
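The ideal similarity model described above can be checked numerically on a one-dimensional slice of a kernel. The following minimal Python sketch assumes scalar inputs sampled on a grid over [-1, 1]. It tests whether the curve z -> k(x0, z) peaks at z = x0 and is non-increasing as z moves away from x0 on either side. The function name `has_ideal_shape`, the grid, the tolerance and the Gaussian width `sigma` are illustrative assumptions, not the procedure used in the thesis.

```python
import math
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]


def gaussian(x: float, z: float, sigma: float = 0.5) -> float:
    """Gaussian (RBF) kernel on scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def linear(x: float, z: float) -> float:
    """Linear kernel on scalar inputs."""
    return x * z


def has_ideal_shape(kernel: Kernel, x0: float, grid: Sequence[float],
                    tol: float = 1e-12) -> bool:
    """Check the similarity model for the curve z -> kernel(x0, z).

    The curve must be non-decreasing while approaching x0 from the left and
    non-increasing while moving away from x0 to the right, so that its
    maximum on the grid is reached at z == x0.
    """
    points = sorted(set(grid) | {x0})  # make sure x0 itself is sampled
    left = [kernel(x0, z) for z in points if z <= x0]   # ascending z, ends at x0
    right = [kernel(x0, z) for z in points if z >= x0]  # ascending z, starts at x0
    if any(b < a - tol for a, b in zip(left, left[1:])):
        return False  # similarity dropped while getting closer to x0
    if any(b > a + tol for a, b in zip(right, right[1:])):
        return False  # similarity grew while moving away from x0
    return True


grid = [i / 20 - 1 for i in range(41)]  # 41 points from -1 to 1
print(has_ideal_shape(gaussian, 0.5, grid))  # True
print(has_ideal_shape(linear, 0.5, grid))    # False
```

The Gaussian kernel passes the check, whereas the linear kernel fails it. The reason is that x0 * z keeps growing for z > x0, so the linear kernel does not reach its maximum at identical inputs.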
The analysis of these polynomial kernels revealed that they are naturally constructed from smaller kernel building blocks, which are combined by summation and multiplication operations. A novel similarity fusion framework is therefore developed in this thesis to investigate the effect of these fusion operations on the shape characteristics of the kernels and on their classification performance. The framework is developed in three stages. Stage 1 kernels are the building blocks constructed from polynomial order n alone (the highest order under consideration). Stage 2 kernels combine all the Stage 1 kernel blocks (from order 0 to n) using a summation fusion operation. Stage 3 kernels combine Stage 2 kernels with another kernel via a multiplication fusion operation. The analysis of the shape characteristics of these three-stage polynomial kernels revealed that their inherent fusion operations are synergistic in nature. They bring the kernel shapes closer to the ideal similarity function model, and hence enable more accurate similarity measures to be calculated and better classification performance to be achieved. Experimental results showed that the summative and multiplicative fusion operations improved the classification accuracy by average factors of 17.35% and 19.16%, respectively, depending on the dataset and the polynomial function employed.
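The three-stage construction can be illustrated with a minimal Python sketch. It rests on three assumptions:

- Chebyshev polynomials of the first kind are applied element-wise to input vectors in [-1, 1].
- A Stage 1 block is the linear kernel between the element-wise transformed vectors.
- The second kernel used in Stage 3 is a Gaussian kernel. This choice is purely hypothetical.

The exact kernel definitions in the thesis (including any weighting functions) may differ. The function names and the parameter `sigma` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def chebyshev(i: int, t: float) -> float:
    """Chebyshev polynomial of the first kind T_i(t), via T_{k+1} = 2t T_k - T_{k-1}."""
    if i == 0:
        return 1.0
    prev, curr = 1.0, t
    for _ in range(1, i):
        prev, curr = curr, 2.0 * t * curr - prev
    return curr


def stage1(x: Vector, z: Vector, n: int) -> float:
    """Stage 1 block: linear kernel between element-wise T_n(x) and T_n(z)."""
    return sum(chebyshev(n, a) * chebyshev(n, b) for a, b in zip(x, z))


def stage2(x: Vector, z: Vector, n: int) -> float:
    """Stage 2: summation fusion of the Stage 1 blocks of orders 0..n."""
    return sum(stage1(x, z, i) for i in range(n + 1))


def gaussian(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel on vectors (hypothetical second kernel for Stage 3)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def stage3(x: Vector, z: Vector, n: int, sigma: float = 1.0) -> float:
    """Stage 3: multiplication fusion of Stage 2 with a second kernel."""
    return stage2(x, z, n) * gaussian(x, z, sigma)


print(stage2([0.5], [0.5], 2))  # 1.5
print(stage3([0.5], [0.5], 2))  # 1.5
```

For x = z = [0.5] and n = 2, the Stage 1 blocks of orders 0, 1 and 2 evaluate to 1, 0.25 and 0.25, so the Stage 2 kernel returns 1.5. The Gaussian factor equals 1 for identical inputs, so Stage 3 also returns 1.5.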
On the other hand, the shapes of the Stage 2 polynomial kernels have also been shown to oscillate beyond a certain threshold within the standard normalized input space of [-1,1]. A simple adaptive data normalization approach is therefore proposed to confine the data to the threshold window where these kernels exhibit the sought-after ideal shape characteristics. This eliminates the possibility of any data point being located outside that window, where these oscillations are observed. Implementing the adaptive data normalization approach accordingly leads to a more accurate calculation of similarity measures and improves the classification performance. When compared to the standard normalized input space, experimental results (obtained with the Stage 2 kernels) demonstrate the effectiveness of the proposed adaptive data normalization approach, with an average accuracy improvement factor of 11.772%, depending on the dataset and the polynomial function utilized.
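A minimal Python sketch of the adaptive normalization idea follows. It assumes per-feature min-max scaling into [-t, t] instead of [-1, 1]. The threshold t (a hypothetical parameter named `threshold` here) is taken as given rather than derived from the kernel shape. As an assumed safeguard, unseen points are clipped into the window. The function names `fit_ranges` and `transform` are illustrative. Setting `threshold = 1.0` recovers the standard [-1, 1] normalization.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fit_ranges(train: Matrix) -> List[Tuple[float, float]]:
    """Record the (min, max) of every feature in the training data."""
    return [(min(col), max(col)) for col in zip(*train)]


def transform(data: Matrix, ranges: List[Tuple[float, float]],
              threshold: float) -> Matrix:
    """Map each feature linearly from [min, max] to [-threshold, threshold].

    Values falling outside the training range are clipped to the window, so
    no point lands in the region where the kernel would oscillate.
    """
    out: Matrix = []
    for row in data:
        new_row = []
        for v, (lo, hi) in zip(row, ranges):
            if hi == lo:
                scaled = 0.0  # constant feature: map it to the centre
            else:
                scaled = -threshold + 2.0 * threshold * (v - lo) / (hi - lo)
            new_row.append(max(-threshold, min(threshold, scaled)))
        out.append(new_row)
    return out


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
ranges = fit_ranges(train)
print(transform(train, ranges, threshold=0.5))          # [[-0.5, -0.5], [0.0, 0.0], [0.5, 0.5]]
print(transform([[20.0, 0.0]], ranges, threshold=0.5))  # [[0.5, -0.5]]
```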
Finally, a new perspective is introduced in which the use of orthogonal polynomials is viewed as a way of transforming the input space into another vector space, of the same dimensionality as the input space, prior to the kernel calculation step. Based on this perspective, a novel processing approach based on vector concatenation is proposed. Unlike the previous approaches, it ensures that the quantities processed by each polynomial order are always formulated in vector form. In this way, the attributes embedded in the structure of the original vectors are kept intact. The proposed concatenated processing approach can also be used with any polynomial function, regardless of the parity combination of its monomials (only odd, only even, or a combination of both). Moreover, the thesis proposes evaluating the Gaussian kernel on the vectors processed by the polynomial kernels, instead of the linear kernel used in the previous approaches. This choice rests on the more accurate similarity shape characteristics of the Gaussian kernel, as well as its renowned ability to implicitly map the input space to a feature space of higher dimensionality. Experimental results demonstrate the superiority of the concatenated approach for all three polynomial-kernel stages of the developed similarity fusion framework and for all the polynomial functions under investigation. When the Gaussian kernel is evaluated on the vectors processed using the concatenated approach, the results show a statistically significant improvement of 22.269% in the average classification accuracy. This is relative to evaluating the linear kernel on vectors processed using the previously proposed approaches.
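The concatenated processing approach, combined with a Gaussian kernel, can be sketched in Python as follows. The sketch assumes Legendre polynomials applied element-wise. It concatenates the per-order vectors P_0(x), ..., P_n(x) into a single vector of length d(n + 1). The function names and `sigma` are illustrative assumptions rather than the thesis's implementation. A useful sanity check follows directly from the definition of the dot product: the linear kernel on the concatenated vectors equals the sum over orders of the per-order linear kernels. Replacing the linear kernel with the Gaussian kernel is therefore what changes the similarity measure.

```python
import math
from typing import List

Vector = List[float]


def legendre(i: int, t: float) -> float:
    """Legendre polynomial P_i(t) via Bonnet's recurrence."""
    if i == 0:
        return 1.0
    prev, curr = 1.0, t
    for k in range(1, i):
        prev, curr = curr, ((2 * k + 1) * t * curr - k * prev) / (k + 1)
    return curr


def concatenate_orders(x: Vector, n: int) -> Vector:
    """Concatenate the element-wise vectors P_0(x), ..., P_n(x) (length len(x) * (n + 1))."""
    return [legendre(i, a) for i in range(n + 1) for a in x]


def linear(u: Vector, v: Vector) -> float:
    """Linear kernel (dot product)."""
    return sum(a * b for a, b in zip(u, v))


def gaussian(u: Vector, v: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel on vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def concatenated_gaussian_kernel(x: Vector, z: Vector, n: int,
                                 sigma: float = 1.0) -> float:
    """Gaussian kernel evaluated on the concatenated polynomial vectors."""
    return gaussian(concatenate_orders(x, n), concatenate_orders(z, n), sigma)


x, z = [0.5, -0.2], [0.1, 0.4]
# Summation over orders 0..3 of the per-order linear kernels.
summed = sum(legendre(i, a) * legendre(i, b) for i in range(4) for a, b in zip(x, z))
print(math.isclose(linear(concatenate_orders(x, 3), concatenate_orders(z, 3)), summed))  # True
print(concatenated_gaussian_kernel(x, x, n=3))  # 1.0
```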

Item Type:	Thesis (Doctoral)
Faculty:	School of Computing and Digital Technologies > Computing
Depositing User:	Jeffrey HENSON
Date Deposited:	27 Jun 2018 13:22
Last Modified:	24 Feb 2023 13:51
URI:	https://eprints.staffs.ac.uk/id/eprint/4572
