Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien (eds.)
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- book
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.001.0001
- Subject:
- Computer Science, Machine Learning
In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no labeled data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. It first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. It then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of benchmark experiments. Finally, the book looks at interesting directions for SSL research and closes with a discussion of the relationship between semi-supervised learning and transduction.
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0001
- Subject:
- Computer Science, Machine Learning
This chapter first presents definitions of supervised and unsupervised learning in order to clarify the nature of semi-supervised learning (SSL). SSL is halfway between supervised and unsupervised learning: in addition to unlabeled data, the algorithm is provided with some supervision information, but not necessarily for all examples. Often, this information consists of the targets associated with some of the examples. Other forms of partial supervision are possible; for example, there may be constraints such as "these points have (or do not have) the same target." This different setting corresponds to a different view of semi-supervised learning: in succeeding chapters, SSL is seen as unsupervised learning guided by constraints. A problem related to SSL, transductive learning, was introduced by Vapnik several decades ago. In this setting, a labeled training set and an unlabeled test set are provided, and the idea of transduction is to perform predictions only for the test points.
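A compact formalization of this setting, in the standard SSL notation (the symbols below are the conventional choice, not a quotation from the chapter):

```latex
% Semi-supervised setting: l labeled and u unlabeled examples, usually
% with u >> l, all drawn from the same distribution P(x, y):
X_L = \{(x_1, y_1), \dots, (x_l, y_l)\}, \qquad
X_U = \{x_{l+1}, \dots, x_{l+u}\}.
```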
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0025
- Subject:
- Computer Science, Machine Learning
This chapter presents a fictitious discussion inspired by real discussions between the editors of this book and a number of people, including Vladimir Vapnik. It involves three researchers, called here A, B, and C for simplicity, without implying any one-to-one mapping to real persons. The topic of the discussion is: what is the difference between semi-supervised learning and transductive learning?
Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0011
- Subject:
- Computer Science, Machine Learning
This chapter shows how the different graph-based algorithms for semi-supervised learning can be cast into a common framework in which one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of size n (the total number of data points). The cost criterion naturally leads to an extension of such algorithms to the inductive setting, where test samples arrive one at a time: the derived induction formula can be evaluated in O(n) time, which is much more efficient than solving the linear system again exactly (which in general costs O(kn²) time for a sparse graph in which each data point has k neighbors). This inductive formula is also used to show that when the similarity between points satisfies a locality property, the algorithms are plagued by the curse of dimensionality with respect to the dimensionality of an underlying manifold.
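To make the quadratic criterion concrete, here is a minimal numerical sketch in which all n label scores are recovered by solving one n × n linear system built from a graph Laplacian. The Gaussian similarity, the weights mu and eps, and the function name are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

def propagate_labels(X, y, labeled_mask, mu=1.0, eps=1e-6, sigma=1.0):
    """Minimize ||f_L - y_L||^2 + mu * f'Lf + mu*eps*||f||^2 over f.

    The minimizer solves a single linear system of size n, the total
    number of labeled plus unlabeled points.
    """
    n = len(X)
    # Gaussian similarity graph (an assumed choice of edge weights W).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian
    S = np.diag(labeled_mask.astype(float))   # selects the labeled points
    # Closed-form solution: (S + mu*L + mu*eps*I) f = S y
    A = S + mu * L + mu * eps * np.eye(n)
    return np.linalg.solve(A, S @ y)
```

Here X has shape (n, d), y holds the known targets (zeros at unlabeled positions), and labeled_mask is a boolean vector; averaging the returned scores over a new point's graph neighbors, weighted by similarity, then gives the kind of O(n) inductive prediction the abstract refers to.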
Fabio Cozman and Ira Cohen
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0004
- Subject:
- Computer Science, Machine Learning
This chapter presents a number of conclusions. First, labeled and unlabeled data contribute to a reduction in variance in semi-supervised learning under maximum-likelihood estimation. Second, when the model is "correct," maximum-likelihood methods are asymptotically unbiased with both labeled and unlabeled data. Third, when the model is "incorrect," there may be different asymptotic biases for different values of λ. Asymptotic classification error may also vary with λ: an increase in the number of unlabeled samples may lead to a larger asymptotic estimation bias and to a larger classification error. If the performance obtained from a given set of labeled data is better than the performance with infinitely many unlabeled samples, then at some point the addition of unlabeled data must decrease performance.
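The λ in these statements can be read as the asymptotic fraction of labeled samples, which weights the two data sources in the limiting maximum-likelihood objective. The following is a reconstruction consistent with that setup, not a verbatim formula from the chapter:

```latex
% theta: model parameters; lambda: asymptotic fraction of labeled
% samples. The limiting objective mixes labeled and unlabeled terms:
L_\lambda(\theta) = \lambda\, \mathbb{E}\bigl[\log p(x, y \mid \theta)\bigr]
                  + (1 - \lambda)\, \mathbb{E}\bigl[\log p(x \mid \theta)\bigr].
```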
Jason Weston, Christina Leslie, Eugene Ie, and William Stafford Noble
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0019
- Subject:
- Computer Science, Machine Learning
This chapter describes an experimental study of large-scale semi-supervised learning for the problem of protein classification. The protein classification problem, a central problem in computational biology, is to predict the structural class of a protein given its amino acid sequence. Such a classification helps biologists to understand the function of a protein. Building an accurate protein classification system, as with many tasks, depends critically upon choosing a good representation of the input sequences of amino acids. Early work using string kernels with support vector machines (SVMs) for protein classification achieved state-of-the-art classification performance. However, such representations are based only on labeled data (examples with known three-dimensional structures, organized into structural classes), while in practice, unlabeled data are far more plentiful.
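To make "string kernel" concrete, here is a minimal sketch of a k-mer spectrum kernel, one simple member of the family used in this line of work; the function name and the choice k = 3 are illustrative, not the chapter's exact kernel.

```python
from collections import Counter

def spectrum_kernel(s1, s2, k=3):
    """K(s1, s2) = <phi(s1), phi(s2)>, where phi(s) counts every
    length-k substring (k-mer) of the sequence s."""
    c1 = Counter(s1[i:i + k] for i in range(len(s1) - k + 1))
    c2 = Counter(s2[i:i + k] for i in range(len(s2) - k + 1))
    # Sparse inner product over the k-mers the two sequences share.
    return sum(c1[m] * c2[m] for m in c1.keys() & c2.keys())

# Example: similarity of two short amino acid sequences.
print(spectrum_kernel("MKVLAAGLLL", "MKVLSAGLLA"))
```

A kernel of this kind can be handed directly to an SVM, which is what makes it natural to ask, as this chapter does, how unlabeled sequences can improve the representation.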
Vladimir Vapnik
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0024
- Subject:
- Computer Science, Machine Learning
This chapter discusses the difference between transductive inference and semi-supervised learning. It argues that transductive inference captures the intrinsic properties of the mechanism for extracting additional information from unlabeled data, and it shows the important role of transduction in creating noninductive models of inference. In transductive inference the goal is to classify the given u test vectors of interest, while in semi-supervised learning the goal is to find the function that minimizes a risk functional. Semi-supervised learning can thus be seen as related to a particular setting of transductive learning. From a conceptual point of view, transductive inference contains important elements of a new philosophy of inference, and this is the subject of this discussion.
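The contrast can be stated compactly; this is a standard formalization, with the loss ℓ and risk R written out here for illustration:

```latex
% Transduction: predict labels only for the u given test vectors.
\text{Transduction:}\quad \text{find } y^{*}_{l+1}, \dots, y^{*}_{l+u}
  \text{ for the given } x_{l+1}, \dots, x_{l+u}.
% SSL (inductive): return a whole function of small expected risk.
\text{SSL:}\quad \text{find } f \text{ minimizing } R(f) =
  \mathbb{E}_{(x, y) \sim P}\, \ell\bigl(f(x), y\bigr).
```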
Dale Schuurmans, Finnegan Southey, Dana Wilkinson, and Yuhong Guo
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0023
- Subject:
- Computer Science, Machine Learning
This chapter discusses the explicit relationship that semi-supervised learning methods must assert between labeled and unlabeled data. Semi-supervised model selection and regularization methods are presented here that instead require only that the labeled and unlabeled data be drawn from the same distribution. From this assumption, a metric can be constructed over hypotheses based on their predictions for unlabeled data. This metric can then be used to detect untrustworthy training error estimates, leading to model selection strategies that select the richest hypothesis class while providing theoretical guarantees against overfitting. This general approach is then adapted to regularization for supervised regression and for supervised classification with probabilistic classifiers. The regularization adapts not only to the hypothesis class but also to the specific data sample provided, allowing for better performance than regularizers that account only for class complexity.
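A minimal sketch of the kind of metric involved: the distance between two hypotheses is their average prediction disagreement on unlabeled data. The squared-error loss and the function name are illustrative assumptions.

```python
import numpy as np

def unlabeled_distance(f, g, X_unlabeled):
    """Average prediction disagreement of hypotheses f and g on
    unlabeled inputs (squared error is an illustrative loss choice).
    Because labeled and unlabeled data share one distribution, this
    estimates a genuine distance between f and g without any labels."""
    return float(np.mean((f(X_unlabeled) - g(X_unlabeled)) ** 2))
```

Comparing a hypothesis's claimed training error against such distances (for instance, via triangle-inequality violations) is one way untrustworthy training error estimates can be flagged.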
Alon Orlitsky
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0017
- Subject:
- Computer Science, Machine Learning
This chapter discusses density-based metrics induced by Riemannian manifold structures. It presents asymptotically consistent methods to estimate and compute these metrics, along with upper and lower bounds on their estimation and computation errors. Finally, it discusses how these metrics can be used for semi-supervised learning and presents experimental results. Learning algorithms use a notion of similarity between data points to make inferences; semi-supervised algorithms assume that two points are similar to each other if they are connected by a high-density region of the unlabeled data. Apart from semi-supervised learning, such density-based distance metrics also have applications in clustering and nonlinear interpolation.
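One common concrete construction in this spirit builds a neighborhood graph whose edge costs penalize hops through low-density regions, then takes shortest paths; the kNN graph, the exponent q, and the function name below are assumptions, not the chapter's exact estimator.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def density_based_distances(X, k=5, q=2.0):
    """All-pairs distances on a kNN graph whose edge costs grow as
    (Euclidean length)**q with q > 1, so many short hops through a
    dense region are cheaper than one long hop across a sparse one."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)              # inf marks "no edge"
    for i in range(n):
        nbrs = np.argsort(d[i])[1:k + 1]     # k nearest neighbors of i
        G[i, nbrs] = d[i, nbrs] ** q         # density-sensitive edge cost
    G = np.minimum(G, G.T)                   # symmetrize the graph
    return shortest_path(G, method="D")      # Dijkstra, all pairs
```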
Kamal Nigam, Andrew McCallum, and Tom Mitchell
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0003
- Subject:
- Computer Science, Machine Learning
This chapter explores the use of generative models for semi-supervised learning with labeled and unlabeled data in text classification domains. The widely used naive Bayes classifier for supervised learning defines a mixture-of-multinomials generative model. In some domains, model likelihood and classification accuracy are strongly correlated, despite the overly simplified generative model; here, expectation-maximization (EM) finds more likely models and improves classification accuracy. In other domains, likelihood and accuracy are not well correlated under the naive Bayes model; here, a more expressive generative model that allows multiple mixture components per class helps restore a moderate correlation between model likelihood and classification accuracy, and again EM finds more accurate models. Finally, even with a well-correlated generative model, local maxima are a significant hindrance for EM. Here, deterministic annealing does provide much higher-likelihood models, but it often loses the correspondence with the class labels; when that correspondence is easily corrected, highly accurate models result.
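A minimal sketch of the EM-with-naive-Bayes procedure the abstract summarizes, with one mixture component per class; the smoothing constant and function name are illustrative.

```python
import numpy as np

def em_naive_bayes(Xl, yl, Xu, n_classes, n_iters=20, alpha=1.0):
    """EM over a multinomial naive Bayes model.

    Xl: (l, V) word counts of labeled docs; yl: their class labels.
    Xu: (u, V) word counts of unlabeled docs. alpha: Laplace smoothing.
    """
    X = np.vstack([Xl, Xu])
    l = len(Xl)
    # Responsibilities: hard labels for Xl, uniform guesses for Xu.
    R = np.zeros((len(X), n_classes))
    R[np.arange(l), yl] = 1.0
    R[l:] = 1.0 / n_classes
    for _ in range(n_iters):
        # M-step: class priors and per-class word distributions.
        prior = R.sum(axis=0) / R.sum()
        word = R.T @ X + alpha
        word /= word.sum(axis=1, keepdims=True)
        # E-step (unlabeled docs only): posterior responsibilities.
        log_p = np.log(prior) + Xu @ np.log(word).T
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        Ru = np.exp(log_p)
        R[l:] = Ru / Ru.sum(axis=1, keepdims=True)
    return prior, word
```

The multiple-components-per-class variant discussed in the abstract replaces the per-class rows here with several sub-components whose responsibilities are also estimated in the E-step.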
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0021
- Subject:
- Computer Science, Machine Learning
This chapter assesses the strengths and weaknesses of different semi-supervised learning (SSL) algorithms by inviting the authors of each chapter in this book to apply their algorithms to eight benchmark data sets, encompassing both artificial and real-world problems. Details are provided on how the algorithms were applied, especially how hyperparameters were chosen given the few labeled points. The chapter concludes by presenting and discussing the empirical performance.
Maria-Florina Balcan and Avrim Blum
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0022
- Subject:
- Computer Science, Machine Learning
This chapter describes an augmented version of the PAC model, designed with semi-supervised learning in mind, that can be used to think about the problem of learning from labeled and unlabeled data and about many of the different approaches taken. The model provides a unified framework for analyzing when and why unlabeled data can help, within which one can discuss both sample-complexity and algorithmic issues. The model can be viewed as an extension of the standard PAC model in which a compatibility function is also proposed: a measure of the compatibility one believes the target concept should have with the underlying distribution of data. Unlabeled data are potentially helpful in this setting because they allow one to estimate compatibility over the space of hypotheses.
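In symbols, the role of the compatibility function can be sketched as follows; the notation χ and the threshold τ are illustrative, and the chapter's definitions are more general:

```latex
% chi(h, D) in [0, 1] scores how compatible hypothesis h is with the
% unlabeled data distribution D; learning is then restricted to the
% nearly compatible hypotheses, shrinking the effective search space:
\mathcal{H}_{D, \tau} = \{\, h \in \mathcal{H} : \chi(h, D) \ge 1 - \tau \,\}.
```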
Hyunjung Shin and Koji Tsuda
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0020
- Subject:
- Computer Science, Machine Learning
This chapter describes an algorithm for assigning weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated as one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins. Compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph. Compared with the semidefinite-programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal weights, the method can select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.
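Schematically, the joint convex problem has the following shape; this is a reconstruction for illustration, and the chapter's exact program differs in details such as the regularization on the weights:

```latex
% K graphs with Laplacians L_1, ..., L_K are mixed with nonnegative
% weights beta_k, optimized jointly with the label scores f:
\min_{f,\ \beta \ge 0} \; \sum_{i \in \text{labeled}} (f_i - y_i)^2
  + f^{\top} \Bigl( \sum_{k=1}^{K} \beta_k L_k \Bigr) f
  \quad \text{s.t.}\ \sum_{k=1}^{K} \beta_k = 1.
```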