Support Vector Machines (SVM) are one of the most popular supervised learning methods in Machine Learning (ML). Many researchers have reported superior results compared with older ML techniques.
SVM can be applied to regression problems as well as classification problems. Here I describe a classification application on a cancer dataset.
SVM has been widely used throughout ML, including medical research, face recognition, spam email filtering, document classification, and handwriting recognition. In the medical field, SVM has been applied by practitioners in:
- White blood cell classification
- Cancer prediction
- Gene classification
Researchers have claimed better results than logistic regression and decision trees and also Neural Networks.
SVM Linear Applications
Overview of method
SVM is a popular classifier for linear applications because it has yielded excellent generalization performance on many statistical problems with minimal prior knowledge, even when the dimension of the input space (the number of features) is very high.
SVM Nonlinear Applications
For nonlinear problems, SVM uses the kernel trick to implicitly map the data into a higher-dimensional space where a separating hyperplane can be defined more easily.
SVM works by separating the classes with a hyperplane. Many hyperplanes may separate the data, so SVM selects the one that separates the classes best. The kernel trick is used when the classes cannot be separated well in the original feature space.
A line is considered bad if it passes too close to the points, because it will be sensitive to noise. The objective is to find the line that passes as far as possible from all points: the maximum-margin hyperplane.
SVM first finds the points of each class that lie closest to the other class. These points are known as support vectors. The distance between the support vectors and the dividing hyperplane is called the margin. The SVM algorithm seeks to maximize the margin, so the optimal hyperplane is the one with the maximum margin.
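To make the idea of support vectors and the margin concrete, here is a minimal sketch on a made-up, linearly separable toy dataset (the six points and the very large C value are my own assumptions for illustration, not part of the cancer example). It fits a linear SVM with scikit-learn and reads back the support vectors and the distance from the hyperplane to the closest points.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values chosen for illustration).
X = np.array([[0, 0], [1, 0], [0, 1],    # class 0
              [3, 3], [4, 3], [3, 4]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (assumption for this sketch).
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# Distance from the hyperplane to the closest points (the support vectors).
# The full gap between the two classes is twice this value.
margin = 1.0 / np.linalg.norm(w)

print("support vectors:")
print(clf.support_vectors_)
print("w =", w, "b =", b)
print("margin =", margin)
```

Only the points nearest the boundary end up as support vectors. Moving any of the other points slightly would not change the fitted hyperplane.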
Types of SVM Kernels
The main idea behind a kernel function is to transform the training data so that it more closely resembles a linearly separable set of data. This transform typically increases the dimensionality of the data to achieve a separable dataset. There are several kernel functions available, each with its own advantages.
- Linear Kernel
- Polynomial Kernel
- RBF (Radial Basis Function) kernel, also known as the Gaussian kernel
- Hyperbolic tangent kernel
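As a small illustration of how a kernel stands in for a transform to a higher dimension, the sketch below uses a degree-2 polynomial kernel on 2-D inputs. The helper names phi and poly2_kernel and the two sample points are hypothetical, chosen only for this example. It shows that the kernel value computed in the original 2-D space equals the dot product of the explicitly transformed 3-D vectors.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D input."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """Kernel computed directly in the original 2-D space: (x . z)^2."""
    return np.dot(x, z) ** 2

# Two made-up sample points.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # dot product after mapping to 3-D
implicit = poly2_kernel(x, z)       # same quantity without ever mapping

print(explicit, implicit)
```

Both values agree (up to floating-point rounding), which is why the SVM never needs to build the higher-dimensional vectors explicitly.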
I will describe these kernels and typical applications in a future article.
I usually apply the linear kernel first. It is fast and often yields good results. Often I will then run the RBF kernel to compare the results. In the example below the linear kernel provides somewhat better results.
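The workflow of trying the linear kernel first and then the RBF kernel could look roughly like the sketch below. The 80/20 split, the random_state value and the default SVC settings are my assumptions here, so the numbers it prints will depend on those choices and may differ from the results quoted in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 20% held out for testing, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel)          # default C and gamma (assumption)
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)   # accuracy on the test set
    print(kernel, round(scores[kernel], 3))
```

Note that the RBF kernel is sensitive to feature scaling, so standardizing the features first may change the comparison.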
Example Application – Cancer Dataset
The Breast Cancer Wisconsin dataset included with Python sklearn is a classification dataset that details measurements for breast cancer recorded by the University of Wisconsin Hospitals. The dataset comprises 569 rows and 30 features, plus a target column that labels each tumor as malignant or benign.
Calling cancer = datasets.load_breast_cancer() returns a Bunch object, which I convert into a dataframe. You can inspect the data with print(df.shape). In the output you will see (569, 31), which means there are 569 rows and 31 columns (the 30 features plus the target). Using print(df.head()) lists the first five rows of the dataset.
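A minimal sketch of this loading step is shown here. The column name "target" is my own choice for the label column, and the use of pandas for the dataframe is an assumption.

```python
import pandas as pd
from sklearn import datasets

# Load the dataset; the result is a Bunch object (a dict-like container).
cancer = datasets.load_breast_cancer()

# Build a dataframe from the 30 feature columns, then append the label.
# In scikit-learn's encoding, 0 = malignant and 1 = benign.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target

print(df.shape)    # (569, 31)
print(df.head())   # first five rows
```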
The cancer dataset is derived from images of tumors recorded by medical staff and labeled as malignant or benign. The features (columns) of the dataset are listed below:
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
The model_selection module of the scikit-learn library provides the train_test_split() function, which divides the data into training data and test data.
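The split could be done along these lines. The test_size of 0.2, the random_state and the stratify option are assumptions on my part, since the exact settings are not stated above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# X holds the 30 feature columns, y holds the malignant/benign labels.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for testing (assumed ratio). stratify=y keeps
# the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)   # (455, 30) (114, 30)
```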
Training the Algorithm
Now that we have the data divided into the training and test sets, we are ready to train the algorithm. scikit-learn contains an svm module with built-in classes for different SVM applications. The kernel parameter sets the kernel type, and I have chosen the linear kernel for this application.
The fit() method of the SVM class is invoked to train the algorithm on the training data output from the train_test_split() function.
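A sketch of the training step follows. I am assuming the SVC classifier class from sklearn.svm, the variable name svclassifier, and the same hypothetical 80/20 split as before. All other SVC settings are left at their defaults.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # assumed split

# kernel is a keyword argument of SVC; "linear" selects the linear kernel.
svclassifier = SVC(kernel="linear")
svclassifier.fit(X_train, y_train)

# Predict labels for the held-out test rows.
y_pred = svclassifier.predict(X_test)
print(y_pred[:10])
```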
Assessing the quality of the Algorithm
The quality of the predictions is assessed here using the confusion matrix, which shows the misclassifications as well as the correct classifications achieved by the algorithm.
Here we see that the accuracy achieved using the linear kernel was 94.7%, which is a good accuracy.
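The assessment described above can be sketched as follows, using confusion_matrix and accuracy_score from sklearn.metrics. The split settings are again my assumptions, so the accuracy printed by this sketch depends on them and may not match the figure quoted above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # assumed split

svclassifier = SVC(kernel="linear")
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

# Rows are true classes, columns are predicted classes.
# The diagonal holds the correct classifications.
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print(cm)
print("accuracy:", round(acc, 3))
```

The accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples.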
Advantages and Disadvantages of Support Vector Machines:
Advantages of SVM
As a classification technique, the SVM has a number of advantages:
Practitioners have reported SVM outperforming many older, established machine learning algorithms such as neural networks and decision trees.
Accuracy is often dependent on the kernel method selected for the application. However, many practitioners find the Radial Basis Function (RBF) Kernel provides a robust kernel suitable for many problems.
Disadvantages of SVM
In applications where the number of features is much greater than the number of training samples, SVM can perform poorly unless the kernel and the regularization are chosen carefully to avoid overfitting.
O. L. Mangasarian and W. H. Wolberg: “Cancer diagnosis via linear programming”, SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
William H. Wolberg and O.L. Mangasarian: “Multisurface method of pattern separation for medical diagnosis applied to breast cytology”, Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.