Cozman Fabio and Cohen Ira
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0004
- Subject:
- Computer Science, Machine Learning
This chapter presents a number of conclusions. Firstly, labeled and unlabeled data contribute to a reduction in variance in semi-supervised learning under maximum-likelihood estimation. Secondly, when the model is “correct,” maximum-likelihood methods are asymptotically unbiased both with labeled and unlabeled data. Thirdly, when the model is “incorrect,” there may be different asymptotic biases for different values of λ. Asymptotic classification error may also vary with λ—an increase in the number of unlabeled samples may lead to a larger asymptotic estimation bias and to a larger classification error. If the performance obtained from a given set of labeled data is better than the performance with infinitely many unlabeled samples, then at some point the addition of unlabeled data must decrease performance.
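The abstract does not define λ; for orientation, a common way to write the semi-supervised maximum-likelihood criterion weights the labeled and unlabeled log-likelihoods, as in the sketch below (standard notation assumed here, not necessarily the chapter's own).

```latex
% \lambda-weighted semi-supervised log-likelihood (a sketch in assumed notation)
% l labeled pairs (x_i, y_i), u unlabeled points x_j, model parameters \theta
L_\lambda(\theta) = \lambda \sum_{i=1}^{l} \log p(x_i, y_i \mid \theta)
                  + (1 - \lambda) \sum_{j=1}^{u} \log \sum_{y} p(x_j, y \mid \theta),
\qquad \lambda \in [0, 1].
% \lambda = 1 gives purely supervised estimation and \lambda = 0 purely unsupervised; when the
% model family does not contain the true distribution, the maximizer of L_\lambda can have a
% different asymptotic bias for each \lambda, which is the effect the chapter analyzes.
```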
Balcan Maria-Florina and Blum Avrim
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0022
- Subject:
- Computer Science, Machine Learning
This chapter describes an augmented version of the PAC model, designed with semi-supervised learning in mind, that can be used to think about the problem of learning from labeled and unlabeled data and about many of the different approaches taken. The model provides a unified framework for analyzing when and why unlabeled data can help, in which one can discuss both sample-complexity and algorithmic issues. The model described here can be viewed as an extension of the standard PAC model in which a compatibility function is also proposed—a notion of compatibility that one believes the target concept should have with the underlying distribution of data. Unlabeled data are potentially helpful in this setting because they allow one to estimate compatibility over the space of hypotheses.
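As a sketch of how such a compatibility function can be formalized (the symbols below are illustrative and may differ from the chapter's notation):

```latex
% Compatibility of a hypothesis with the data distribution (illustrative notation)
\chi : \mathcal{H} \times \mathcal{X} \to [0, 1],
\qquad
\mathrm{compat}(h, D) = \mathbb{E}_{x \sim D}\big[\chi(h, x)\big].
% Unlabeled data provide an empirical estimate of compat(h, D), so learning can be
% restricted to the highly compatible hypotheses
\mathcal{H}_{D,\tau} = \{\, h \in \mathcal{H} : \mathrm{compat}(h, D) \ge 1 - \tau \,\},
% and it is the (hopefully much smaller) size of this set that reduces the number of
% labeled examples needed.
```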
Grandvalet Yves and Bengio Yoshua
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0009
- Subject:
- Computer Science, Machine Learning
This chapter promotes the use of entropy regularization as a means to benefit from unlabeled data in the framework of maximum a posteriori estimation. The learning criterion is derived from clearly stated assumptions and can be applied to any smoothly parameterized model of posterior probabilities. The regularization scheme favors low-density separation, without any modeling of the density of input features. The contribution of unlabeled data to the learning criterion induces local optima, but this problem can be alleviated by deterministic annealing. For well-behaved models of posterior probabilities, deterministic annealing expectation-maximization (EM) provides a decomposition of the learning problem in a series of concave subproblems. Other approaches to the semi-supervised problem are shown to be close relatives or limiting cases of entropy regularization. A series of experiments illustrates the good behavior of the algorithm in terms of performance and robustness with respect to the violation of the postulated low-density separation assumption.
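A sketch of an entropy-regularized criterion of the kind described, in assumed notation (l labeled pairs, u unlabeled points):

```latex
% Entropy regularization for semi-supervised MAP estimation (assumed notation)
C(\theta, \lambda) = \sum_{i=1}^{l} \log P(y_i \mid x_i; \theta)
                   - \lambda \sum_{j=1}^{u} H\big(P(\cdot \mid x_j; \theta)\big),
\qquad
H(p) = -\sum_{y} p(y) \log p(y).
% The entropy term penalizes posteriors that remain uncertain on unlabeled points, which
% pushes the decision boundary toward low-density regions; \lambda trades the labeled-data
% fit against this regularizer (a prior on \theta can be added for full MAP estimation).
```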
Weston Jason, Leslie Christina, Ie Eugene, and Noble William Stafford
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0019
- Subject:
- Computer Science, Machine Learning
This chapter describes an experimental study of large-scale semi-supervised learning for the problem of protein classification. The protein classification problem, a central problem in computational biology, is to predict the structural class of a protein given its amino acid sequence. Such a classification helps biologists to understand the function of a protein. Building an accurate protein classification system, as with many tasks, depends critically upon choosing a good representation of the input sequences of amino acids. Early work using string kernels with support vector machines (SVMs) for protein classification achieved state-of-the-art classification performance. However, such representations are based only on labeled data—examples with known three-dimensional (3D) structures, organized into structural classes—while in practice, unlabeled data are far more plentiful.
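To make the representation question concrete, the following minimal sketch computes a k-spectrum string kernel, the simplest member of the string-kernel family mentioned above; the chapter's experiments use more elaborate kernels, and the function names below are illustrative rather than taken from any particular library.

```python
# Minimal k-spectrum string kernel for protein sequences (illustrative sketch).
from collections import Counter

def spectrum_features(seq: str, k: int = 3) -> Counter:
    """Count every length-k substring (k-mer) of an amino acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seq_a: str, seq_b: str, k: int = 3) -> int:
    """Inner product of the two k-mer count vectors."""
    feats_a, feats_b = spectrum_features(seq_a, k), spectrum_features(seq_b, k)
    return sum(count * feats_b[kmer] for kmer, count in feats_a.items())

# Toy usage: similar sequences share many k-mers and so get a larger kernel value.
print(spectrum_kernel("MKTAYIAKQR", "MKTAYIAKQL", k=3))
print(spectrum_kernel("MKTAYIAKQR", "GGSGGSGGSG", k=3))
```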
Chapelle Olivier, Schölkopf Bernhard, and Zien Alexander
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0001
- Subject:
- Computer Science, Machine Learning
This chapter first presents definitions of supervised and unsupervised learning in order to understand the nature of semi-supervised learning (SSL). SSL is halfway between supervised and unsupervised learning. In addition to unlabeled data, the algorithm is provided with some supervision information—but not necessarily for all examples. Often, this information will be the targets associated with some of the examples. Other forms of partial supervision are possible. For example, there may be constraints such as “these points have (or do not have) the same target.” This different setting corresponds to a different view of semi-supervised learning: in succeeding chapters, SSL is seen as unsupervised learning guided by constraints. A problem related to SSL was introduced by Vapnik several decades ago—transductive learning. In this setting, a labeled training set and an unlabeled test set are provided. The idea of transduction is to perform predictions only for the test points.
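For reference, the setting can be written in notation commonly used for semi-supervised learning (a sketch; the symbols are assumed here):

```latex
% Semi-supervised setting in commonly used notation (assumed)
X_l = \{(x_1, y_1), \dots, (x_l, y_l)\}, \qquad
X_u = \{x_{l+1}, \dots, x_{l+u}\}, \qquad \text{typically } u \gg l.
% Inductive SSL: learn a function f : \mathcal{X} \to \mathcal{Y} from X_l \cup X_u that should
% also perform well on points never seen during training.
% Transduction: predict only the labels y_{l+1}, \dots, y_{l+u} of the given points in X_u.
```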
Lawrence Neil D. and Jordan Michael I.
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0008
- Subject:
- Computer Science, Machine Learning
This chapter presents an augmentation of the standard probabilistic classification model which incorporates a null category. Given a suitable probabilistic model for the null category, the chapter obtains a probabilistic counterpart of the margin. By combining this noise model with a Gaussian process classifier (GPC), it obtains a classification methodology that is simultaneously discriminative, semi-supervised, and Bayesian. The approach incorporates the cluster assumption without explicitly modeling the data density and without requiring specialized kernels. GPCs aim to predict the posterior probability of the class label y_i given a covariate vector x_i. Under the standard assumptions generally invoked by GPC practitioners, this posterior probability is unaffected by unlabeled data points, providing no role for unlabeled data.
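One common way to write such a null-category noise model is sketched below; the parameterization and symbols are assumed, not necessarily the chapter's.

```latex
% Null-category noise model (assumed parameterization); f_i = f(x_i) is the latent GP value,
% a > 0 a margin half-width, \Phi the standard normal CDF.
P(y_i = -1 \mid f_i) = \Phi\big(-(f_i + a)\big), \quad
P(y_i = 0 \mid f_i)  = \Phi(f_i + a) - \Phi(f_i - a), \quad
P(y_i = +1 \mid f_i) = \Phi(f_i - a).
% The null category y_i = 0 occupies a band of width 2a around the decision boundary and is
% constrained never to be observed; conditioning unlabeled points on "not null" pushes the
% latent function out of this band, giving a probabilistic counterpart of the margin and a
% route by which unlabeled data influence the classifier.
```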
Corduneanu Adrian and Jaakkola Tommi
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0010
- Subject:
- Computer Science, Machine Learning
This chapter considers two ways of representing the topology over examples, either based on complete knowledge of the marginal density or by grouping together examples whose labels should be related. The learning algorithms and sample complexity issues that result from each representation are discussed here. Information regularization is a principle for assigning labels to unlabeled data points in a semi-supervised setting. The broader principle is based on finding labels that minimize the information induced between examples and labels relative to a topology over the examples; any label variation within a small local region of examples ties together the identities of examples and their labels. Such variation should be minimized unless supported directly or indirectly by the available labeled examples.
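A sketch of an information-regularization objective of this kind, in assumed notation:

```latex
% Information regularization over local regions (assumed notation): R ranges over small
% regions of the input space, M_R is the probability mass of R under the marginal p(x),
% and I_R(x; y) is the mutual information between examples and labels restricted to R.
\min_{Q(y \mid x)} \; \sum_{R} \lambda_R \, M_R \, I_R(x; y)
\quad \text{subject to } Q(y \mid x_i) \text{ agreeing with the labeled examples.}
% Any label variation inside a small region ties example identity to the label and inflates
% I_R(x; y), so the minimizer keeps labels locally constant unless the labeled data,
% directly or through overlapping regions, force a change.
```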
Delalleau Olivier, Bengio Yoshua, and Le Roux Nicolas
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0018
- Subject:
- Computer Science, Machine Learning
This chapter presents a subset selection method that can be used to reduce the original system to one of size m ≪ n. The idea is to solve for the labels of a subset S ⊂ X of only m points, while still retaining information from the rest of the data by approximating their label with a linear combination of the labels in S—using the induction formula presented in Chapter 11. This leads to an algorithm whose computational requirements scale as O(m²n) and memory requirements as O(m²), thus allowing one to take advantage of significantly bigger unlabeled data sets than with the original algorithms.
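A sketch of the kind of approximation involved, in assumed notation and in the spirit of the induction formula referenced above:

```latex
% Approximating labels outside the subset S (assumed notation; W_{ji} is the similarity
% between points x_j and x_i of the underlying graph-based algorithm):
\hat{f}(x_j) \approx \frac{\sum_{i \in S} W_{ji} \, \hat{f}(x_i)}{\sum_{i \in S} W_{ji}},
\qquad j \notin S .
% Substituting this into the original n x n system leaves a linear system in only the
% m = |S| unknowns \hat{f}(x_i), i \in S, which is what brings the cost down to O(m^2 n)
% time and O(m^2) memory.
```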
Vapnik Vladimir
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0024
- Subject:
- Computer Science, Machine Learning
This chapter discusses the difference between transductive inference and semi-supervised learning. It argues that transductive inference captures the intrinsic properties of the mechanism for extracting additional information from the unlabeled data. It also shows the important role of transduction in creating noninductive models of inference. In transductive inference the goal is to classify the given u test vectors of interest, while in semi-supervised learning the goal is to find a function, defined on the whole input space, that minimizes a risk functional. Semi-supervised learning can be seen as related to a particular setting of transductive learning. From a conceptual point of view, transductive inference contains important elements of a new philosophy of inference, and this is the subject of the discussion.
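The contrast can be sketched in standard notation (the chapter's own formulation may differ):

```latex
% Semi-supervised / inductive learning: choose a function over the whole input space that
% minimizes a risk functional,
f^\ast = \arg\min_{f \in \mathcal{F}} \int L\big(y, f(x)\big) \, dP(x, y).
% Transductive inference: given only the u test vectors x_{l+1}, \dots, x_{l+u} of interest,
% choose their labels y_{l+1}, \dots, y_{l+u} directly, with no commitment to a decision rule
% elsewhere in the input space; this narrower problem is the one transduction addresses.
```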