Corduneanu, Adrian and Jaakkola, Tommi
- Published in print: 2006
- Published online: August 2013
- ISBN: 9780262033589
- eISBN: 9780262255899
- Item type: chapter
- Publisher: The MIT Press
- DOI: 10.7551/mitpress/9780262033589.003.0010
- Subject: Computer Science, Machine Learning
This chapter considers two ways of representing the topology over examples: either based on complete knowledge of the marginal density, or by grouping together examples whose labels should be related. The learning algorithms and sample complexity issues that result from each representation are discussed. Information regularization is a principle for assigning labels to unlabeled data points in a semi-supervised setting. The broader principle is based on finding labels that minimize the information induced between examples and labels relative to a topology over the examples; any label variation within a small local region of examples ties together the identities of examples and their labels. Such variation should be minimized unless supported directly or indirectly by the available labeled examples.
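The flavor of the principle can be illustrated with a minimal numerical sketch: pull each example's label distribution toward the average distribution of the local regions it belongs to, keeping labeled examples clamped. Driving within-region label variation to zero also drives the region's example-label mutual information to zero. The function name, the region representation, and the simple averaging scheme are my illustrative choices, not the chapter's algorithm.

```python
import numpy as np

def info_regularize(P, labeled_mask, regions, n_iters=200):
    """Smooth per-example label distributions over local regions.

    P            : (n, k) array; row i is example i's label distribution
    labeled_mask : (n,) boolean array, True where the label is observed
    regions      : list of index arrays, each a small local region
    """
    P = np.asarray(P, dtype=float).copy()
    counts = np.zeros(len(P))          # how many regions cover each example
    for R in regions:
        counts[R] += 1
    unl = ~np.asarray(labeled_mask)
    for _ in range(n_iters):
        new_P = np.zeros_like(P)
        for R in regions:
            new_P[R] += P[R].mean(axis=0)        # region-average distribution
        P[unl] = new_P[unl] / counts[unl, None]  # labeled rows stay clamped
    return P

# Five points on a chain; the two ends are labeled with opposite classes,
# and each overlapping pair of neighbors forms a local region.
P0 = np.full((5, 2), 0.5)
P0[0] = [1.0, 0.0]
P0[4] = [0.0, 1.0]
labeled = np.array([True, False, False, False, True])
regions = [np.array([i, i + 1]) for i in range(4)]
P = info_regularize(P0, labeled, regions)
# Converges to linear interpolation along the chain:
# P[1] ≈ [0.75, 0.25], P[2] ≈ [0.5, 0.5], P[3] ≈ [0.25, 0.75]
```

The label information at the unlabeled ends is supplied entirely by the labeled endpoints, propagated through the overlapping regions, which is the behavior the principle asks for.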
Balcan, Maria-Florina and Blum, Avrim
- Published in print: 2006
- Published online: August 2013
- ISBN: 9780262033589
- eISBN: 9780262255899
- Item type: chapter
- Publisher: The MIT Press
- DOI: 10.7551/mitpress/9780262033589.003.0022
- Subject: Computer Science, Machine Learning
This chapter describes an augmented version of the PAC model, designed with semi-supervised learning in mind, that can be used to help think about the problem of learning from labeled and unlabeled data, and about many of the different approaches taken. The model provides a unified framework for analyzing when and why unlabeled data can help, in which one can discuss both sample-complexity and algorithmic issues. The model described here can be viewed as an extension of the standard PAC model in which one also proposes a compatibility function: the type of compatibility that one believes the target concept should have with the underlying distribution of data. Unlabeled data are potentially helpful in this setting because they allow one to estimate compatibility over the space of hypotheses.
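A toy sketch of how unlabeled data estimates compatibility: take threshold classifiers on the line, and define a hypothesis's compatibility as the fraction of unlabeled points falling outside a margin around its threshold. Unlabeled data then prunes the hypothesis class down to the highly compatible thresholds, after which a couple of labeled examples suffice to pick one. The threshold class, the margin-based compatibility, and all parameter values here are illustrative assumptions, not the chapter's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated clusters on the line; a threshold near 0 separates them.
X_unlab = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(2, 0.3, 500)])
X_lab = np.array([-2.0, 2.1])   # only two labeled examples
y_lab = np.array([0, 1])

thresholds = np.linspace(-4, 4, 161)   # hypothesis class: h_t(x) = [x > t]
gamma = 0.5                            # margin width (illustrative)

# compat(h_t): fraction of unlabeled points farther than gamma from t.
compat = np.array([(np.abs(X_unlab - t) > gamma).mean() for t in thresholds])

# Prune to highly compatible hypotheses, then fit the labeled data.
candidates = thresholds[compat >= 0.99]
errs = [np.mean((X_lab > t).astype(int) != y_lab) for t in candidates]
best_t = candidates[int(np.argmin(errs))]
```

Without the compatibility filter, two labeled points leave a huge set of consistent thresholds; with it, the surviving candidates all sit in the low-density gap between the clusters, so the labeled sample only has to break a small remaining tie.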