SVM Application List

This list of Support Vector Machine applications grows thanks to visitors like you who ADD new entries. Thank you in advance for your contribution. 

Support vector machines-based generalized predictive control

This work presents an application of the previously proposed Support Vector Machines Based Generalized Predictive Control (SVM-Based GPC) method to the problem of controlling chaotic dynamics with small parameter perturbations. The Generalized Predictive Control (GPC) method, which belongs to the class of Model Predictive Control, requires an accurate model of the plant, which plays a crucial role in the control loop. Chaotic systems, however, exhibit very complex behavior, so it is considerably difficult to obtain an accurate model of them over the whole phase space. In this work, the Support Vector Machines (SVMs) regression algorithm is used to obtain an acceptable model of the chaotic system to be controlled. SVM-Based GPC exploits some advantages of the SVM approach and uses the obtained model in the GPC structure. Simulation results on several chaotic systems indicate that the SVM-Based GPC scheme provides excellent performance with respect to local stabilization of the target (an originally unstable equilibrium point). Furthermore, it performs targeting to some extent, that is, the task of steering the chaotic system towards the target by applying relatively small parameter perturbations. It considerably reduces the waiting time until the system, starting from random initial conditions, enters the local control region, a small neighborhood of the chosen target. Moreover, SVM-Based GPC maintains its performance when the measured output is corrupted by additive Gaussian noise.

Entered by: Serdar Iplikci <iplikci@pau.edu.tr> - Monday, October 23, 2006 at 18:05:17 (GMT)
Comments:


Dynamic Reconstruction of Chaotic Systems from Inter-spike Intervals Using Least Squares Support Vector Machines

This work presents a methodology for dynamic reconstruction of chaotic systems from inter-spike interval (ISI) time series obtained via integrate-and-fire (IF) models. In this methodology, least squares support vector machines (LSSVMs) have been employed for approximating the dynamic behaviors of the systems under investigation.

Entered by: Serdar Iplikci <iplikci@pau.edu.tr> - Monday, May 29, 2006 at 12:53:56 (GMT)
Comments:


Application of The Kernel Method to the Inverse Geosounding Problem

Determining the layered structure of the earth demands the solution of a variety of inverse problems; in the case of electromagnetic soundings at low induction numbers, the problem is linear, for the measurements may be represented as a linear functional of the electrical conductivity distribution. In this work, an application of the Support Vector (SV) Regression technique to the inversion of electromagnetic data is presented. We take advantage of the regularizing properties of the SV learning algorithm and use it as a modeling technique with synthetic and field data. The SV method presents better recovery of synthetic models than Tikhonov's regularization. As the SV formulation is solved in the space of the data, which has a small dimension in this application, a smaller problem than that considered with Tikhonov's regularization is produced. For field data, the SV formulation develops models similar to those obtained via linear programming techniques, but with the added characteristic of robustness.

Entered by: Hugo Hidalgo <hugo@cicese.mx> - Wednesday, March 22, 2006 at 14:04:25 (MST)
Comments:


Support Vector Machines Based Modeling of Seismic Liquefaction Potential

This paper investigates the potential of a support vector machine based classification approach to assess liquefaction potential from actual standard penetration test (SPT) and cone penetration test (CPT) field data. Support vector machines are based on statistical learning theory and have been found to work well in comparison to neural networks in several other applications. Both CPT and SPT field data sets are used with support vector machines to predict the occurrence and nonoccurrence of liquefaction based on different input parameter combinations. With the SPT and CPT test data sets, highest accuracies of 96% and 97%, respectively, were achieved with support vector machines. This suggests that support vector machines can effectively be used to model the complex relationship between different soil parameters and the liquefaction potential. Several other combinations of input variables were used to assess the influence of different input parameters on liquefaction potential. The proposed approach suggests that neither the normalized cone resistance value with CPT data nor the calculation of the standardized SPT value with SPT data is required. Further, support vector machines require few user-defined parameters and provide better performance in comparison to the neural network approach.

Entered by: Mahesh Pal <mpce_pal@yahoo.co.uk> - Wednesday, February 22, 2006 at 06:50:07 (GMT)
Comments:


SVM for Geo- and Environmental Sciences

Statistical learning theory for geo(spatial) and spatio-temporal environmental data analysis and modelling. Comparisons with geostatistical predictions and simulations

Entered by: Mikhail Kanevski <Mikhail.Kanevski@unil.ch> - Sunday, February 12, 2006 at 16:30:07 (GMT)
Comments:


SVM for Protein Fold and Remote Homology Detection

Motivation: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. Results: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes.

Entered by: Huzefa Rangwala <rangwala@cs.umn.edu> - Sunday, November 06, 2005 at 06:02:08 (GMT)
Comments:


content based image retrieval

Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: (1) an SVM classifier is unstable on a small-sized training set; (2) SVM’s optimal hyper-plane may be biased when the positive feedback samples are much less than the negative feedback samples; and (3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which is named random subspacing SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspacing SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
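The two sampling ideas named above (asymmetric bagging and random subspaces) can be sketched as follows. This is only an illustration under stated assumptions: each bag keeps all positive samples and bootstraps as many negatives as there are positives, and each subspace is a uniformly random subset of feature indices. The function names and parameters are hypothetical, and training and aggregating the per-bag SVMs is not shown.

```python
import random

def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Return n_bags training sets: all positives plus resampled negatives.

    Assumption: negatives are drawn with replacement, and each bag holds
    as many negatives as positives so the classes are balanced.
    """
    rng = random.Random(seed)
    k = len(positives)
    bags = []
    for _ in range(n_bags):
        sampled_negatives = [rng.choice(negatives) for _ in range(k)]
        bags.append((list(positives), sampled_negatives))
    return bags

def random_subspaces(n_features, subspace_size, n_subspaces, seed=0):
    """Return n_subspaces sorted lists of distinct feature indices."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]
```

One classifier would then be trained per bag (or per bag and subspace pair) and the outputs combined, for example by voting; the combination rule is left open here.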

Entered by: Dacheng Tao <Dacheng Tao> - Tuesday, October 11, 2005 at 19:03:18 (GMT)
Comments:


Data Classification Using SSVM

Smoothing methods, extensively used for solving important mathematical programming problems and applications, are applied here to generate and solve an unconstrained smooth reformulation of the support vector machine for data or pattern classification using a completely arbitrary kernel. The basic SVM is reformulated into a smooth support vector machine (SSVM), which possesses the mathematical property of strong convexity and turns basic SVM classification into an unconstrained minimization problem. In this work, the Newton-Armijo algorithm is used to solve the SSVM unconstrained optimization problem. On larger datasets, SSVM is faster than SVMlight. SSVM can also generate a highly nonlinear separating surface such as a checkerboard.
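The entry does not say which smoothing function is used. As an illustration only, the sketch below uses the smooth approximation of the plus function max(x, 0) that is common in the SSVM literature, p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x)). The function names and the default alpha are assumptions, and the Newton-Armijo solver itself is not shown.

```python
import math

def plus(x):
    """The non-smooth plus function max(x, 0)."""
    return max(x, 0.0)

def smooth_plus(x, alpha=5.0):
    """Smooth approximation p(x, alpha) = x + log(1 + exp(-alpha*x)) / alpha.

    Evaluated in the algebraically equivalent form
    max(x, 0) + log1p(exp(-alpha*|x|)) / alpha to avoid overflow
    for large negative x.
    """
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha
```

The approximation is always at least max(x, 0) and tightens as alpha grows, which is what makes the reformulated objective differentiable enough for a Newton-type method.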

Entered by: Aduru . Venkateswarlu <venkatsherma@yahoo.com> - Monday, September 19, 2005 at 04:35:39 (GMT)
Comments:


DTREG SVM and decision tree modeling

DTREG builds SVM and decision tree based predictive models.

Entered by: Phil Sherrod <phil.sherrod@sandh.com> - Saturday, September 10, 2005 at 20:32:24 (GMT)
Comments:


DTREG - SVM and Decision Tree Predictive Modeling

DTREG builds Support Vector Machine (SVM) and Decision Tree predictive models. SVM features automatic grid search for optimal parameter selection and V-fold cross-validation for measuring model generalization. Decision tree models provided by DTREG include classical single trees, TreeBoost series of boosted trees and Decision Tree Forest of parallel trees that "vote" on the outcome.

Entered by: Phil Sherrod <phil.sherrod@sandh.com> - Friday, August 26, 2005 at 20:09:46 (GMT)
Comments: DTREG supports Linear, Polynomial, Sigmoid and Radial Basis kernel functions. It can handle problems with millions of data rows and hundreds of variables.


Facial expression classification

Facial expression classification using statistical models of shape and SVMs.

Entered by: John Ghent <jghent@cs.may.ie> - Tuesday, August 09, 2005 at 10:14:08 (GMT)
Comments:


End-depth and discharge prediction in semi-circular and circular shaped channels

The results of an application of a support vector machine based modelling technique to determine the end-depth ratio and discharge of a free overfall occurring over an inverted smooth semi-circular channel and a circular channel with flat bases are presented in this paper. The results of the study indicate that the support vector machine technique can be used effectively for predicting the end-depth ratio and the discharge for such channels. For subcritical flow, the predicted value of the end-depth ratio compares favorably to the values obtained by using empirical relations derived in previous studies, while for supercritical flow, the support vector machines perform equally well and are found to work better than the empirical relationship proposed in earlier studies. The results also suggest the usefulness of support vector machine based modelling techniques in predicting the end-depth ratio and discharge for a semi-circular channel using the model created for the circular channel data, and vice versa, for supercritical flow conditions.

Entered by: mahesh pal <mpce_pal@yahoo.co.uk> - Monday, August 01, 2005 at 10:20:34 (GMT)
Comments:


Identification of alternative exons using SVM

Alternative splicing is a major component of the regulatory action on mammalian transcriptomes. It is estimated that over half of all human genes have more than one splice variant. Previous studies have shown that alternatively spliced exons possess several features that distinguish them from constitutively spliced ones. Recently, we have demonstrated that such features can be used to distinguish alternative from constitutive exons. In the current study, we used advanced machine learning methods to generate a robust classifier of alternative exons. RESULTS: We extracted several hundred local sequence features of constitutive as well as alternative exons. Using feature selection methods, we find seven attributes that are dominant for the task of classification. Several less informative features help to slightly increase the performance of the classifier. The classifier achieves a true positive rate of 50% for a false positive rate of 0.5%. This result enables one to reliably identify alternatively spliced exons in exon databases that are believed to be dominated by constitutive exons.

Entered by: Gideon Dror <gideon@mta.ac.il> - Monday, June 20, 2005 at 11:55:09 (GMT)
Comments: 2 classes: 243 positive and 1753 negative instances. 228 features in total, Gaussian kernel. Baseline systems: neural networks and Naive Bayes. SVM outperformed them in terms of area under the ROC curve and, most importantly, in its ability to achieve a very high true positive rate (50%) at a very low false positive rate (0.5%). This performance would enable an effective scan of exon databases in search of novel alternatively spliced exons, in the human or other genomes.
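The operating point quoted above (true positive rate at a fixed, very low false positive rate) can be read off classifier scores as in the following sketch. It is an illustration, not the authors' evaluation code: the function name is hypothetical, labels are assumed to be 1 for alternative exons and anything else for constitutive ones, and both classes are assumed non-empty.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best true positive rate among thresholds whose FPR is <= max_fpr.

    A sample is predicted positive when its score is >= the threshold.
    Every distinct score is tried as a threshold (quadratic time, which
    is fine for a sketch).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best
```

With heavily imbalanced data such as the 243 positive and 1753 negative instances mentioned above, this kind of operating point is often more informative than overall accuracy.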


Support Vector Machines For Texture Classification

Sathishkumar is currently pursuing an M.Tech in computer vision and image processing at Amrita Institute of Technology (Amrita Vishwa Vidya Peetham), Coimbatore, Tamilnadu, India. He received his B.E. from Bharathiar University in 2003. His interests are soft computing and data mining techniques for image processing.

Entered by: sathishkumar <sathishkumar.maddy@gmail.com> - Thursday, June 02, 2005 at 05:02:34 (GMT)
Comments:


SVM application in E-learning

Personalized, learner-centered learning is receiving increasing attention due to the increased learning rate. AI techniques that tailor the content to the learner depending on his context and needs are being deployed for such tasks. SVMs stand out due to their better performance, especially in handling the high-dimensional feature spaces that text content has. Lecture material could be reprocessed to create a suitable feature space, and the contents then presented to the learner according to his needs. This will save time and also avoid information overload.

Entered by: sandeep dixit <sandeepdixit2004@yahoo.com> - Thursday, March 31, 2005 at 15:14:17 (GMT)
Comments:


text classification with SVMs

Classification of large volumes of documents into classes.

Entered by: Duong DInh DUng <dungngtq8@yahoo.com> - Thursday, March 24, 2005 at 06:03:04 (GMT)
Comments:


Isolated Handwritten Jawi Characters Categorization Using Support Vector Machines (SVM).

Isolated Handwritten Jawi Characters Categorization Using Support Vector Machines (SVM). Application is targeted at routing in a mobile computing environment.

Entered by: Suhaimi Abd Latif <suhaimie@iiu.edu.my> - Wednesday, January 19, 2005 at 06:02:27 (GMT)
Comments:


Image Clustering

Clustering is an important task for image compression. This clustering can be done by SVM efficiently.

Entered by: Ahmed Yousuf Saber <saber_uap@yahoo.com> - Wednesday, January 19, 2005 at 02:16:09 (GMT)
Comments:


NewsRec, an SVM-driven Personal Recommendation System for News Websites

NewsRec is an SVM-driven personal recommender system designed for news websites; it uses SVMs to predict whether articles are interesting or not.

Entered by: Christian Bomhardt <christian.bomhardt@etu.uni-karlsruhe.de> - Monday, October 11, 2004 at 15:26:58 (GMT)
Comments: about 1200 datasets, about 30000 features, linear kernel, SVMs are very fast compared to other methods and can handle the large number of features.


Equbits Foresight

Equbits Foresight is an SVM-based predictive modeling application designed for HTS and ADME-Tox chemists.

Entered by: Ravi Mallela <ravi@equbits.com> - Saturday, October 09, 2004 at 15:37:20 (GMT)
Comments:


Speaker/Speech Recognition

Utterance verification for speech recognition: SVMs are used to accept keywords or reject non-keywords. Speaker verification/recognition: the PolyVar telephone database is used. A new method for normalizing the polynomial kernel for use with SVMs: YOHO database, text independent, best EER = 0.34%. Combined Gaussian mixture model in SVM outputs: text-independent speaker verification, best EER = 1.56%.

Entered by: MEHDI GHAYOUMI <M_GHAYOUMI@YAHOO.COM> - Tuesday, March 09, 2004 at 06:25:10 (GMT)
Comments:


Analysis and Applications of Support Vector Forecasting Model Based on Chaos Theory

A novel support vector forecasting model based on chaos theory is presented. It adopts support vector machines as the nonlinear forecaster. The number of input variables is determined by computing the saturated embedding dimension of the reconstructed phase space, and the maximum effective number of forecasting steps is determined by computing the largest Lyapunov exponent of the chaotic time series. Application results in modeling an aeroengine compressor show that the presented method achieves much better precision, which indicates that the method is feasible and effective. The method offers guidance for nonlinear forecasting of chaotic time series with support vector machines.
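Phase space reconstruction turns a scalar time series into fixed-length input vectors for a regressor. The sketch below shows one common way to do this with a time-delay embedding; it is an illustration under assumptions, not the author's code. The function name, the delay `tau`, and the forecast `horizon` are hypothetical parameters, and estimating the embedding dimension or the Lyapunov exponent is not shown.

```python
def delay_embed(series, dim, tau=1, horizon=1):
    """Build (inputs, targets) pairs from a scalar time series.

    Each input is [s[i], s[i+tau], ..., s[i+(dim-1)*tau]] and the target
    is the value `horizon` steps after the last component of the input.
    """
    span = (dim - 1) * tau
    n = len(series) - span - horizon
    inputs, targets = [], []
    for i in range(max(n, 0)):
        inputs.append([series[i + k * tau] for k in range(dim)])
        targets.append(series[i + span + horizon])
    return inputs, targets
```

Here `dim` plays the role of the number of input variables; the resulting pairs could be passed to any support vector regression implementation.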

Entered by: xunkai <skyhawkf119@163.com> - Monday, February 23, 2004 at 04:51:26 (GMT)
Comments: It seems impossible, but SVMs do perfectly well!


A Comparison Of The Performance Of Artificial Neural Networks And Support Vector Machines For The Prediction Of Traffic Speed and Travel Time

The ability to predict traffic variables such as speed, travel time or flow based on real time data and historic data collected by various systems in transportation networks is vital to the intelligent transportation systems (ITS) components such as in-vehicle route guidance systems (RGS), advanced traveler information systems (ATIS), and advanced traffic management systems (ATMS). This predicted information enables the drivers to select the shortest path for the intended journey. Accurate prediction of traffic speed and travel time is also useful for evaluating the planning, design, operations and safety of roadways. In the context of prediction methodologies, different time series, and artificial neural networks (ANN) models have been developed in addition to the historic and real time approach. The present paper proposes the application of a recently developed pattern classification and regression technique called support vector machines (SVM) for the short-term prediction of traffic speed and travel time. An ANN model is also developed and a comparison of the performance of both these approaches is carried out, along with real time and historic approach results. Data from the freeways of San Antonio, Texas is used for the analysis.

Entered by: Lelitha Vanajakshi <lelitha@yahoo.com> - Friday, January 30, 2004 at 17:39:08 (GMT)
Comments: When little training data was available, SVM outperformed ANN; when enough data was available, both performed more or less the same.


none

none

Entered by: leechs <leechs@sohu.com> - Sunday, January 25, 2004 at 13:44:16 (GMT)
Comments:


svm learning

svm face learning

Entered by: burak <burakkaragoz2002@yahoo.com> - Monday, December 08, 2003 at 16:06:03 (GMT)
Comments:


Protein Structure Prediction

The task of predicting protein structure from protein sequence is an important application of support vector machines. A protein's function is closely related to its structure, which is difficult to determine experimentally. There are mainly two types of methods for predicting protein structure. The first type includes threading and comparative modeling, which rely on a priori knowledge of the similarity between sequences and known structures. The second type, called de novo or ab-initio methods, predicts the protein structure from the sequence alone without relying on similarity to known structures. Currently, it is difficult to predict high-resolution 3D structure with ab-initio methods for studying the docking of macro-molecules, predicting protein-partner interactions, designing and improving ligands, and studying protein-protein interactions. The prediction of protein relative solvent accessibility gives us useful information for predicting tertiary protein structure. The SVMpsi method, which uses support vector machines (SVMs) and the position specific scoring matrix (PSSM) generated from PSI-BLAST, has been applied to achieve better prediction accuracy of the relative solvent accessibility. We have introduced a three-dimensional local descriptor that contains information about the expected remote contacts via the long-range interaction matrix as well as neighbor sequences. The support vector machine approach has successfully been applied to solvent accessibility prediction by considering long-range interactions and handling unbalanced data.

Entered by: Dr. Haesun Park <hpark@cs.umn.edu> - Friday, July 11, 2003 at 18:59:18 (GMT)
Comments:


Support vector classifiers for land cover classification

SVM was used to classify different land covers using remote sensing data. Results from this study suggest that multi-class support vector machines perform well in comparison with neural network and decision tree classifiers.

Entered by: Mahesh Pal <mpce_pal@yahoo.co.uk> - Wednesday, May 21, 2003 at 07:17:46 (GMT)
Comments:


Intrusion Detection

Intrusion Detection Systems (IDSs) have become more widely used to protect computer networks. However, it is difficult to build highly effective IDS since some of the pattern recognition problems involved are intractable. In this paper, we propose the use of Support Vector Machines (SVMs) for intrusion detection and analyze its performance. We conduct experiments using a large set of DARPA-provided intrusion data. Two groups of SVMs are built to perform, respectively, binary classifications (normal pattern vs. attack pattern) and five-class classifications (normal pattern, and four classes of attack patterns). Detailed analysis is provided on the classification accuracy and time performance (regarding both learning time and running time). Performance comparison between SVMs and neural networks (that are built using the same training and testing data) is also given. Based on the simulation results, we argue that SVMs are superior to neural networks in several important aspects of IDS. Our results therefore indicate that SVMs are superior candidates to be used as the learning machines for IDSs.

Entered by: Srinivas Mukkamala <srinivas@cs.nmt.edu> - Thursday, January 09, 2003 at 05:02:19 (GMT)
Comments: SVMs are superior to ANNs for intrusion detection in three critical respects: SVMs train and run an order of magnitude faster; SVMs scale much better; and SVMs give higher classification accuracy. For details on the number of classes, kernels used, input features, number of support vectors, and input feature selection and ranking methods, please read our latest versions. If you need our latest versions or any assistance, please send the author an email: srinivas@cs.nmt.edu Sincerely, Srinivas Mukkamala


The Gaussian Dynamic Time Warping (GDTW) kernel for On-line Handwriting Recognition

In recent years the task of on-line handwriting recognition has gained immense importance in everyday applications, mainly due to the increasing popularity of the personal digital assistant (PDA). Currently a next generation of "smart phones" and tablet-style PCs, which also rely on handwriting input, is further targeting the consumer market. However, in the majority of these devices the handwriting input method is still not satisfactory. In current PDAs people still use input methods that abstract from the natural writing style, e.g. the widespread Graffiti.

Thus there is demand for a handwriting recognition system which is accurate, efficient and which can deal with the natural handwriting of a wide range of different writers.


Entered by: Claus Bahlmann <bahlmann@informatik.uni-freiburg.de> - Monday, September 09, 2002 at 11:52:27 (GMT)
Comments:

Usual SVM kernels are designed to deal with data of fixed dimension. However, on-line handwriting data is not of a fixed dimension, but of a variable-length sequential form. In this respect SVMs cannot be applied to handwriting recognition (HWR) straightforwardly.

We have addressed this issue by developing an appropriate SVM kernel for sequential data, the Gaussian dynamic time warping (GDTW) kernel. The basic idea of the GDTW kernel is that, instead of the squared Euclidean distance in the usual Gaussian kernel, it uses the dynamic time warping distance. In addition to on-line handwriting recognition, the GDTW kernel can be straightforwardly applied to all classification problems where DTW gives a reasonable distance measure, e.g. speech recognition or genome processing.

Experiments have shown a superior recognition rate in comparison to an HMM-based classifier for relatively small training sets (~ 6000) and comparable rates for larger training sets.
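The idea of swapping the squared Euclidean distance for a DTW distance inside a Gaussian kernel can be sketched as follows. This is a minimal illustration, not the authors' implementation: the local cost (squared Euclidean distance between feature tuples), the absence of path-length normalization, and the names `dtw_distance`, `gdtw_like_kernel`, and `gamma` are all assumptions.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of tuples.

    Local cost is the squared Euclidean distance between aligned points;
    the alignment may repeat points, so the lengths of a and b can differ.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def gdtw_like_kernel(a, b, gamma=1.0):
    """Gaussian-style kernel with the DTW distance in the exponent."""
    return math.exp(-gamma * dtw_distance(a, b))
```

Note that a kernel built this way is not guaranteed to be positive definite in general, so an SVM solver may need to tolerate that; the sketch makes no claim about how the original work handles this point.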


forecast

forecast stock

Entered by: shen <shen0204@yahoo.com.tw> - Thursday, September 05, 2002 at 07:24:00 (GMT)
Comments:


Detecting Steganography in digital images

Techniques for information hiding have become increasingly more sophisticated and widespread. With high-resolution digital images as carriers, detecting hidden messages has become considerably more difficult. This paper describes an approach to detecting hidden messages in images that uses a wavelet-like decomposition to build higher-order statistical models of natural images. Support vector machines are then used to discriminate between untouched and adulterated images.

Entered by: Siwei Lyu <lsw@cs.dartmouth.edu> - Thursday, August 22, 2002 at 15:58:54 (GMT)
Comments: 2 classes; 3600 training examples; over 18,000 testing samples; 1100 SVs; RBF kernel; LibSVM.


Fast Fuzzy Cluster

real time adaptive pattern recognition

Entered by: Michael Bickel <awareai@aol.com> - Tuesday, July 23, 2002 at 01:11:11 (GMT)
Comments:


Breast Cancer Prognosis: Chemotherapy Effect on Survival Rate

A linear support vector machine (SVM) is used to extract 6 features from a total of 31 features from the Wisconsin breast cancer dataset of 253 patients. We cluster the 253 breast cancer patients into three prognostic groups: Good, Intermediate and Poor. Each of the three groups has a significantly distinct Kaplan-Meier survival curve. Of particular significance is the Intermediate group, because this group comprises patients for whom chemotherapy gives distinctly better survival times than for those in the same group who did not undergo chemotherapy. This is the reverse of the case for the overall population studied, for which patients without chemotherapy have better longevity. We prescribe a procedure that utilizes three nonlinear smooth support vector machines (SSVMs) for classifying breast cancer patients into the three prognostic groups above with 82.7% test set correctness. These results suggest that patients in the Good group should not receive chemotherapy, while Intermediate group patients should receive chemotherapy, based on our survival curve analysis. To our knowledge this is the first instance of a classifiable group of breast cancer patients for which chemotherapy enhances survival.

Entered by: Yuh-Jye Lee <yjlee@cs.ccu.edu.tw> - Wednesday, October 24, 2001 at 19:38:50 (MDT)
Comments:


Underground Cable Temperature Prediction

Support Vector Regression is used to predict the temperature of a cable buried underground, based on weather data from the previous 24 hours. Training examples typically 100-200, testing examples 400. Outperformed the previous hybrid forecasting system of a neural net with fuzzy logic. The software used was not my own but is a Matlab toolbox implementation written by Steve Gunn at Southampton University (http://www.isis.ecs.soton.ac.uk/resources/svminfo/). Training takes about a minute - I'm now using about 400 training examples and the entire testing dataset is about 26000 examples. Dimensions of the feature space are about 50: 5 weather elements repeated ten times over the past 24 hours. So far the predictions on the entire dataset are within about a degree; I could probably optimise it further but I'm running out of time!

Entered by: Robin Willis <rew198@soton.ac.uk> - Friday, May 04, 2001 at 08:31:41 (PDT)
Comments:


Image classification

Classification of natural images from the Corel database using a SVM and a color histogram as input feature

Entered by: Olivier Chapelle <chapelle@research.att.com> - Tuesday, April 04, 2000 at 13:50:39 (PDT)
Comments: Number of classes = 6 or 14. Dimension of the input features = 4096. Kernel = RBF with various distances. SVM outperforms KNN. The choice of the distance in the RBF kernel is critical.
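An RBF kernel "with various distances" can be written as exp(-rho * d(x, y)) with a pluggable distance d. The entry does not list the distances that were tried; the three below (L1, squared L2, and chi-square) are assumptions chosen because they are common for histogram features, and all names and the parameter `rho` are hypothetical.

```python
import math

def l1(x, y):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_squared(x, y):
    """Squared Euclidean distance (gives the usual Gaussian RBF kernel)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def chi_square(x, y):
    """Chi-square distance; bins that are empty in both inputs are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)

def rbf_kernel(x, y, distance, rho=1.0):
    """Generalized RBF kernel exp(-rho * distance(x, y))."""
    return math.exp(-rho * distance(x, y))
```

For example, `rbf_kernel(h1, h2, chi_square)` compares two normalized color histograms `h1` and `h2`; swapping the `distance` argument is the only change needed to compare kernel variants.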



Particle and Quark-Flavour Identification in High Energy Physics

The aim is to classify events of high-energy electron-positron collisions according to the quark-flavour they originate from. A second application is to identify particles in an event (e.g. muons). The event data comes from a big detector, the OPAL experiment, which measures physical properties of the particles coming out of the collision. Monte Carlo methods are used to simulate these collisions in the detector. This allows the use of supervised machine-learning methods to classify the data. The input-variable distributions of the different quark-classes partially have a very high overlap. No complete separation is possible. Identifying muons is an easier problem, where neural methods prove to be successful.
We compared the performance of SVMs (RBF kernel) with NNs (trained with backprop). The amounts of available data are very large; we tested on 3x100k patterns for the quark-flavour problem.

Entered by: Philippe Vannerem <philippe.vannerem@cern.ch> - Tuesday, October 19, 1999 at 16:17:56 (PDT)
Comments: We saw only small differences in performance between NNs and SVMs.


Object Detection

The problem of object detection is to differentiate a certain class of objects (the A class) from all other patterns and objects (the not-A class). This is contrasted with object recognition where the problem is to be able to differentiate between elements of the same class. Entered by: Constantine Papageorgiou <cpapa@ai.mit.edu> - Wednesday, October 06, 1999 at 14:33:37 (PDT)
Comments (entered by Isabelle Guyon): In Papageorgiou-Oren-Poggio-98, the authors investigate face detection and pedestrian detection. On static images, they obtain 75% correct face detection at a rate of 1 false detection in 7500 windows and 70% correct pedestrian detection at a rate of 1 false detection in 15000 windows. Of particular interest is their method to increase the number of negative examples with a "bootstrap method": they start with a training set consisting of positive examples (faces or pedestrians) and a small number of negative examples that are "meaningless" images. After a first round of training and testing on fresh examples, the negative examples corresponding to false detections are added to the training set. Training/test set enlargement is iterated. The dynamic system that uses motion to refine performance is roughly 20% better. In this first paper, the authors reduce the dimensionality of input space down to 20-30 input features before training the SVM. Thousands of examples are used for training. In contrast, in Papageorgiou-Poggio-99, again on the problem of pedestrian detection in motion pictures, the authors train an SVM with 5201 examples directly in a 6630 dimensional input space consisting of wavelet features at successive time steps. They find that their system is simpler and faster than traditional HMM or Kalman filter systems and has lower false positive rates than static systems.
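
Illustrative note: the "bootstrap method" for collecting negative examples can be sketched as a loop that retrains after adding false detections. To keep the sketch short and dependency-free, a nearest-centroid rule stands in for the SVM; this is a swapped-in classifier, not the authors' system, and all names and data below are hypothetical.

```python
def train_centroids(positives, negatives):
    """Stand-in classifier (not an SVM): one mean vector per class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return mean(positives), mean(negatives)


def predict(model, x):
    """Return True when x is closer to the positive centroid."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return d_pos < d_neg


def bootstrap_negatives(positives, negatives, pool, rounds=3):
    """Grow the negative set with false detections found in `pool`.

    `pool` holds fresh windows known to contain no object, so every
    positive prediction on it is a false detection.
    """
    negatives = list(negatives)
    pool = list(pool)
    model = train_centroids(positives, negatives)
    for _ in range(rounds):
        false_detections = [x for x in pool if predict(model, x)]
        if not false_detections:
            break
        negatives.extend(false_detections)
        pool = [x for x in pool if x not in false_detections]
        model = train_centroids(positives, negatives)
    return model, negatives


# Toy 2-D features: two positives, one "meaningless" initial negative.
positives = [[2.0, 2.0], [2.2, 1.8]]
initial_negatives = [[-3.0, -3.0]]
pool = [[0.0, 0.0], [0.2, -0.1], [-2.0, -2.5]]
model, negatives = bootstrap_negatives(positives, initial_negatives, pool)
print(len(negatives), predict(model, [0.0, 0.0]))
```

In this toy run the two pool points near the origin are first mistaken for objects, join the negative set, and are rejected after retraining.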

Combustion Engine Knock Detection

In our research project we developed an engine knock detection system for combustion engine control using advanced neural detection algorithms. Knocking is an undesired fast combustion which can destroy the engine. To approach this problem, we collected a large database covering different engine states (2000 and 4000 revolutions per minute; non-knocking, borderline- and hard-knocking). Because of the high non-linearity of the problem, neural approaches are very promising, and we have already shown their high performance in this application. Entered by: Matthias Rychetsky <rychetsky@mes.tu-darmstadt.de> - Tuesday, October 05, 1999 at 01:03:12 (PDT)
Comments: We compared for our database (unfortunately not public domain) SVM approaches, MLP nets and Adaboost. The SV Machines outperformed all other approaches significantly. For this application real time calculation is an issue, therefore we currently examine methods to reduce computational burden at recall phase (e.g. reduced set algorithms or integer based approaches).

Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites

In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points from which regions encoding proteins start, the so-called translation initiation sites. This can be modeled as a classification problem. We demonstrate the power of support vector machines for this task, and show how to successfully incorporate biological prior knowledge by engineering an appropriate kernel function. Entered by: Alexander Zien <Alexander.Zien@gmd.de> - Friday, September 24, 1999 at 02:27:26 (PDT)
Comments: SVMs beat a neural network.

Detection of Remote Protein Homologies

A core problem in statistical biosequence analysis is the annotation of new protein sequences with structural and functional features. To a degree, this can be achieved by relating the new sequences to proteins for which such structural properties are already known. Although numerous techniques have been applied to this problem with success, the detection of remote protein homologies has remained a challenge. Entered by: Isabelle Guyon <isabelle@clopinet.com> - Monday, September 20, 1999 at 15:04:05 (PDT)
Comments: Jaakkola et al combine SVMs with HMMs and show the superiority of this approach compared to several baseline systems.

Function Approximation and Regression

Function approximation and regression problems seek to determine from pairs of examples (x,y) an approximation to an unknown function y=f(x). The application of SVMs to such problems has been intensively benchmarked with "synthetic data" coming from known functions. Although this demonstrated that SVMs are a very promising technique, this hardly qualifies as an application. There are only a few applications to real world problems. For example, in the Boston housing problem, house prices must be predicted from socio-economic and environmental factors, such as crime rate, nitric oxide concentration, distance to employment centers, and age of a property. Entered by: Isabelle Guyon <isabelle@clopinet.com> - Monday, September 20, 1999 at 14:23:51 (PDT)
Comments: Drucker-97 finds that SVMs outperform the baseline system (bagging) on the Boston housing problem. It is noted that SVMs can make a real difference when the dimensionality of input space and the order of the approximation create a dimensionality of feature space which is intractable with other methods. The results of Drucker et al are further improved in Stitson-99 (overall 35% better than the baseline method).

3-D Object Recognition Problems

3-D Object Recognition encompasses a wide variety of problems in Pattern Recognition that have to do with classifying representations of 3-dimensional objects. This ranges from face recognition to Automatic Target Recognition (ATR) from radar images. Some of the challenges include that objects are usually seen from only one angle at a time, and may be partially occluded. After training, the system must be able both to classify correctly the objects of interest and reject other "confusers" or "distractors". Entered by: Isabelle Guyon <isabelle@clopinet.com> - Saturday, September 18, 1999 at 17:54:51 (PDT)
                     Pontil Massimiliano <pontil@ai.mit.edu> - Monday, October 04, 1999 at 18:48:38 (PDT)
                    Danny Roobaert <roobaert@nada.kth.se> - Friday, October 08, 1999 at 12:01:21 (PDT)
                    Edited by Isabelle Guyon - Thursday, October 14, 1999.
Comments: SVMs have been used either in the classification stage or in the pre-processing (Kernel Principal Component Analysis).
In Blanz-96, Support Vector Classifiers show excellent performance, leaving behind other methods. Osuna-97 demonstrates that SVCs can be trained on very large data sets (50,000 examples). The classification performance reaches that of one of the best known systems while being 30 times faster at run time.
In Schölkopf-97, the advantage of KPCA is more measured in terms of simplicity, ensured convergence, and ease of understanding of the non-linearities.
Zhao-98 notes that SVCs with Gaussian kernels handle the rejection of unknown "confusers" particularly well. Friess-98 reports performance on the sonar data of Gorman and Sejnowski (1988). Their kernel Adatron SVM has a 95.2% success rate, compared to 90.2% for the best Backpropagation Neural Networks.
Papageorgiou-98 applies SVM with a wavelet preprocessing to face and people detection, showing improvements with respect to their base system.
Roobaert-99 shows that an SVM system working on raw data, not incorporating any domain knowledge about the task, matches the performance of their baseline system that does incorporate such knowledge.
Massimiliano Pontil points out that, as shown by the comparison with other techniques, it appears that SVMs can be effectively trained even if the number of examples is much lower than the dimensionality of the object space. In the paper Pontil-Verri-97, linear SVMs are used for 3-D object recognition. The potential of SVMs is illustrated on a database of 7200 images of 100 different objects. The proposed system does not require feature extraction and performs recognition on images regarded as points of a space of high dimension without estimating pose. The excellent recognition rates achieved in all the performed experiments indicate that SVMs are well-suited for aspect-based recognition.
In Roobaert-99, 3 methods for the improvement of Linear Support Vector Machines are presented in the case of Pattern Recognition with a number of irrelevant dimensions. A method for 3D object recognition without segmentation is proposed.
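
Illustrative note: the kernel Adatron mentioned in the comments above is usually described as a simple gradient ascent on the SVM dual, alpha_i <- max(0, alpha_i + eta * (1 - y_i * f(x_i))). The sketch below assumes that form, a bias-free decision function, an RBF kernel, and toy XOR data; the learning rate, epoch count and all names are hypothetical choices, not details from Friess-98.

```python
import math


def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_adatron(X, y, kernel, eta=0.5, epochs=500):
    """Return multipliers alpha for a bias-free kernel classifier."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Margin of example i under the current multipliers.
            margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Gradient step on the dual, clipped so that alpha_i >= 0.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - margin))
    return alpha


def decision(x, X, y, alpha, kernel):
    return sum(a * t * kernel(p, x) for a, t, p in zip(alpha, y, X))


# XOR: not linearly separable in input space, separable with an RBF kernel.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]
alpha = kernel_adatron(X, y, rbf)
predictions = [1 if decision(x, X, y, alpha, rbf) > 0 else -1 for x in X]
print(predictions)
```

Examples whose multiplier stays at zero would drop out of the decision function; in this symmetric toy problem all four points end up as support vectors.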

Text Categorization

Text categorization is the assignment of natural language texts to one or more predefined categories based on their content. Applications include: assigning subject categories to documents to support text retrieval, routing, and filtering; email or files sorting into folder hierarchies; web page sorting into search engine categories. Entered by: Isabelle Guyon <isabelle@clopinet.com> - Friday, September 17, 1999 at 15:19:48 (PDT). Last modified, October 13, 1999.
Comments: Joachims-98 reports that SVMs are well suited to learn in very high dimensional spaces (> 10000 inputs). They achieve substantial improvements over the currently best performing methods, eliminating the need for feature selection. The tests were run on the Ohsumed corpus of William Hersh and on Reuters-21578. Dumais et al report that they use linear SVMs because they are both accurate and fast (to train and to use). They are 35 times faster to train than the next most accurate classifier that they tested (Decision Trees). They have applied SVMs to the Reuters-21578 collection, emails and web pages. Drucker et al classify emails as spam and non spam. They find that boosting trees and SVMs have similar performance in terms of accuracy and speed. SVMs train significantly faster. Joachims-99 reports that transduction is a very natural setting for many text classification and information retrieval tasks. Transductive SVMs improve performance especially in cases with very small amounts of labelled training data.

Time Series Prediction and Dynamic Reconstruction of Chaotic Systems

Dynamic reconstruction is an inverse problem that deals with reconstructing the dynamics of an unknown system, given a noisy time-series representing the evolution of one variable of the system with time. The reconstruction proceeds by utilizing the time-series to build a predictive model of the system and, then, using iterated prediction to test what the model has learned from the training data on the dynamics of the system. Entered by: Isabelle Guyon <isabelle@clopinet.com> - Thursday, September 16, 1999 at 14:54:32 (PDT)
Comments: Müller et al report excellent performance of SVMs. They set a new record on the Santa Fe competition data set D, 37% better than the winning approach during the competition. Mattera et al report that SVMs are effective for such tasks and that their main advantage is the possibility of trading off the required accuracy with the number of Support Vectors.
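
Illustrative note: the two steps described in this entry, building a predictive model from the time series and then testing it by iterated prediction, can be sketched as follows. The one-step model here is a plain linear extrapolation standing in for a trained SVM regressor, and the embedding dimension and all names are assumptions made for the sketch.

```python
def delay_vectors(series, dim):
    """Pair each window of `dim` past values with the value that follows."""
    return [(series[i:i + dim], series[i + dim])
            for i in range(len(series) - dim)]


def iterate_prediction(model, seed, steps):
    """Run the one-step model freely, feeding predictions back as inputs."""
    window = list(seed)
    forecast = []
    for _ in range(steps):
        nxt = model(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]
    return forecast


def linear_extrapolation(window):
    """Stand-in one-step model (not an SVM): continue the last slope."""
    return 2 * window[-1] - window[-2]


series = [0.0, 1.0, 2.0, 3.0, 4.0]
pairs = delay_vectors(series, dim=2)       # training pairs for a regressor
forecast = iterate_prediction(linear_extrapolation, series[-2:], steps=3)
print(pairs[0], forecast)
```

With a learned model, errors compound over the iterated steps, which is what makes this a test of whether the dynamics were captured rather than only the next value.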

Support Vector Machine Classification of Microarray Gene Expression Data

We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics, including a simple dot product of gene expression vectors, polynomial versions of the dot product, and a radial basis function. The radial basis function SVM appears to provide superior performance in classifying functional classes of genes when compared to the other SVM similarity metrics. In addition, SVM performance is compared to four standard machine learning algorithms. SVMs have many features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. Entered by: Nello Cristianini <nello.cristianini@bristol.ac.uk> - Friday, September 10, 1999 at 03:11:46 (PDT)
Comments: SVMs outperformed all other classifiers, when provided with a specifically designed kernel to deal with very imbalanced data.
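
Illustrative note: the three similarity metrics named in this entry (dot product, polynomial versions of the dot product, radial basis function) can be written down in a few lines. The exact parameterization is an assumption: the "+ 1" offset, the degree, the width sigma and the toy expression vectors are hypothetical, and the specially designed kernel for imbalanced data mentioned in the comment is not reproduced here.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial(x, z, degree=2):
    """Polynomial version of the dot product: (x . z + 1) ** degree."""
    return (dot(x, z) + 1.0) ** degree


def radial_basis(x, z, sigma=1.0):
    """Gaussian RBF: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two made-up expression vectors (one value per experiment).
g1 = [1.0, 0.0, -1.0]
g2 = [0.5, 0.5, -1.0]
print(dot(g1, g2), polynomial(g1, g2), radial_basis(g1, g2))
```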

Handwritten digit recognition problems

Support vector classifiers were applied to the recognition of isolated handwritten digits optically scanned. This is a subtask of zipcode automatic reading and courtesy amount recognition on checks. Entered by: Isabelle Guyon <isabelle@clopinet.com> - Thursday, September 09, 1999 at 16:27:38 (PDT)
Comments: This is one of the first applications of SVCs. It was demonstrated that SVCs could be applied directly to pixel maps and nearly match or outperform other techniques requiring elaborate pre-processing, architecture design (structured neural networks), and/or a metric incorporating prior knowledge of the task (tangent distance) -- see e.g. Lecun-95. Elaborate metrics such as tangent distance can be used in combination with SVCs (Schölkopf-96-97) and yield improved performance. SVCs are also attractive for handwriting recognition tasks because they lend themselves to easy writer adaptation and data cleaning, by making use of the support vectors (Matic-93 and Guyon-96). In Friess-98, the kernel Adatron SVM slightly outperforms the original SVM on the USPS character recognition benchmark.

Breast cancer diagnosis and prognosis

Support vector machines have been applied to breast cancer diagnosis and prognosis. The Wisconsin breast cancer dataset contains 699 patterns with 10 attributes for a binary classification task (the tumor is malignant or benign). Entered by: Prof. Olvi L. Mangasarian <olvi@cs.wisc.ed> - Thursday, September 09, 1999 at 15:25:50 (PDT)
Modified by: Isabelle Guyon <isabelle@clopinet.com> - Monday, September 20, 1999 at 9:30 (PDT)
Comments: Mangasarian et al use an underlying linear programming formulation that can be interpreted as an SVM. Their system (XCYT) is a highly accurate non-invasive breast cancer diagnostic program currently in use at University of Wisconsin Hospital. Friess et al report that the Wisconsin breast cancer dataset has been extensively studied. Their system, which uses Adatron SVMs, has a 99.48% success rate, compared to 94.2% (CART), 95.9% (RBF), 96% (linear discriminant), 96.6% (Backpropagation network), all results reported elsewhere in the literature.

Support Vector Decision Tree Methods for Database Marketing

We introduce a support vector decision tree method for customer targeting in the framework of large databases (database marketing). The goal is to provide a tool to identify the best customers based on historical data (model development). Then this tool is used to forecast the best potential customers among a pool of prospects through a process of scoring. We begin by recursively constructing a decision tree. Each decision consists of a linear combination of the independent attributes. A linear program motivated by the support vector machine method from Vapnik's Statistical Learning Theory is used to construct each decision. A gains chart table is used to verify the goodness of fit of the targeting, the likely prospects, and the expected utility of profit. Successful results are given for three industrial problems. The linear program automatically performs dimensionality reduction. The method consistently produced trees with a very small number of decision nodes. Each decision consisted of a relatively small number of attributes. The trees produced a clear division of the population into likely prospects, unlikely prospects, and ambiguous prospects. The largest training dataset tested contained 15,700 points with 866 attributes. The commercial optimization package used, CPLEX, is capable of solving even larger problems. Entered by: Kristin P Bennett <bennek@rpi.edu> - Thursday, September 09, 1999 at 15:16:07 (PDT)
Comments: The support vector decision tree performed better than C4.5. SVDT produced very simple trees using few attributes.
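
Illustrative note: the entry does not define its gains chart table, so the sketch below assumes the common marketing definition: prospects are ranked by score, split into equal-sized bins, and each row reports the cumulative share of actual responders captured so far. The function name, bin count and toy data are hypothetical.

```python
def gains_table(scores, responded, n_bins=10):
    """Rows of (fraction of prospects contacted, fraction of responders captured)."""
    ranked = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    total = sum(r for _, r in ranked)
    n = len(ranked)
    rows, captured = [], 0
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        captured += sum(r for _, r in ranked[start:end])
        rows.append((end / n, captured / total if total else 0.0))
    return rows


# Ten made-up prospects: model score and whether they actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
gains = gains_table(scores, responded, n_bins=5)
for contacted, share in gains:
    print(contacted, share)
```

A scoring that targets well captures most responders in the first few rows, well ahead of the diagonal that random selection would give.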