Fabio Cozman and Ira Cohen
- Published in print: 2006
- Published online: August 2013
- ISBN: 9780262033589
- eISBN: 9780262255899
- Item type: chapter
- Publisher: The MIT Press
- DOI: 10.7551/mitpress/9780262033589.003.0004
- Subject: Computer Science, Machine Learning
This chapter presents a number of conclusions. Firstly, both labeled and unlabeled data reduce the variance of maximum-likelihood estimates in semi-supervised learning. Secondly, when the model is “correct,” maximum-likelihood estimates are asymptotically unbiased with both labeled and unlabeled data. Thirdly, when the model is “incorrect,” the asymptotic bias may differ for different values of λ, the limiting fraction of labeled samples in the training data. The asymptotic classification error may also vary with λ: an increase in the number of unlabeled samples may lead to a larger asymptotic estimation bias and a larger classification error. If the performance obtained from a given set of labeled data is better than the performance obtained with infinitely many unlabeled samples, then at some point the addition of unlabeled data must decrease performance.
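The role of λ can be illustrated with a small simulation. This is a minimal sketch, not the authors' experiment. It assumes that, with λ the limiting fraction of labeled samples, the maximum-likelihood estimate tends to

$$\theta^*_\lambda = \arg\max_\theta \; \lambda\, \mathbb{E}[\log p(C, X \mid \theta)] + (1 - \lambda)\, \mathbb{E}[\log p(X \mid \theta)],$$

and it replaces both expectations with sample averages. Everything else is hypothetical and chosen only for illustration: the true data have P(C = 1) = 0.2 and X | C = c ~ N(±1, 1), while the fitted model wrongly fixes the class prior at 0.5 (so the model is "incorrect") and estimates only the two class means by EM. The names `sample`, `fit`, and `true_error` are illustrative, not from the chapter.

```python
import math
import random

# Hypothetical ground truth, chosen only for illustration (not from the chapter):
# P(C=1) = 0.2 and X | C=c ~ N(MU[c], 1).
TRUE_PRIOR1 = 0.2
MU = (-1.0, 1.0)


def sample(rng, n):
    """Draw n (label, x) pairs from the true distribution."""
    data = []
    for _ in range(n):
        c = 1 if rng.random() < TRUE_PRIOR1 else 0
        data.append((c, rng.gauss(MU[c], 1.0)))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(labeled, unlabeled_x, lam, tol=1e-10, max_iter=1000):
    """EM for the lam-weighted average log-likelihood of the misspecified model
    p(c, x) = 0.5 * N(x; mean_c, 1), i.e. the class prior is wrongly fixed at 0.5.
    Assumes both data sets are non-empty. Returns the class means (a, b)."""
    n_l, n_u = len(labeled), len(unlabeled_x)
    sums, counts = [0.0, 0.0], [0, 0]
    for c, x in labeled:
        sums[c] += x
        counts[c] += 1
    # Start from the labeled-only estimates (per-class sample means).
    a, b = sums[0] / counts[0], sums[1] / counts[1]
    # Per-sample weights of the labeled and unlabeled terms of the objective.
    w_l, w_u = lam / n_l, (1.0 - lam) / n_u
    total_x = sum(unlabeled_x)
    for _ in range(max_iter):
        # E-step: with equal priors and unit variances,
        # P(C=1 | x) = sigmoid((b - a) * (x - (a + b) / 2)).
        d, m = b - a, (a + b) / 2
        r1 = [sigmoid(d * (x - m)) for x in unlabeled_x]
        r1_sum = sum(r1)
        r1x_sum = sum(r * x for r, x in zip(r1, unlabeled_x))
        # M-step: each mean is a weighted average of the labeled points of its
        # class and the unlabeled points, softly assigned by r1 (or 1 - r1).
        new_a = (w_l * sums[0] + w_u * (total_x - r1x_sum)) / (
            w_l * counts[0] + w_u * (n_u - r1_sum))
        new_b = (w_l * sums[1] + w_u * r1x_sum) / (w_l * counts[1] + w_u * r1_sum)
        converged = abs(new_a - a) + abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error(a, b):
    """Error, under the true distribution, of the model's decision rule
    'predict 1 iff x > (a + b) / 2' (valid when b > a)."""
    t = (a + b) / 2
    return ((1 - TRUE_PRIOR1) * (1 - normal_cdf(t - MU[0]))
            + TRUE_PRIOR1 * normal_cdf(t - MU[1]))


if __name__ == "__main__":
    rng = random.Random(0)
    labeled = sample(rng, 1000)
    unlabeled_x = [x for _, x in sample(rng, 5000)]  # labels discarded
    for lam in (1.0, 0.5, 0.2, 0.0):
        a, b = fit(labeled, unlabeled_x, lam)
        print(f"lambda={lam:.1f}  a={a:+.3f}  b={b:+.3f}  error={true_error(a, b):.3f}")
```

Under these assumptions, the λ = 1 fit should land near the true means, because the fixed prior does not enter the labeled-data estimate of the means, while at λ = 0 the minority-class mean is pulled well below 1 and the error of the induced threshold rule rises. In this toy setup the labeled-only classifier already beats the unlabeled-only one, which is the situation in which, per the conclusions above, adding unlabeled data must eventually hurt. Setting `lam = n_l / (n_l + n_u)` recovers ordinary maximum likelihood on the pooled data.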