*Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux*

- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter

- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0011
- Subject:
- Computer Science, Machine Learning


This chapter shows how the different graph-based algorithms for semi-supervised learning can be cast into a common framework where one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of size *n* (the total number of data points). The cost criterion naturally leads to an extension of such algorithms to the inductive setting, where test examples arrive one at a time: the derived induction formula can be evaluated in *O(n)* time, which is much more efficient than solving the linear system again exactly (which in general costs *O(kn²)* time for a sparse graph in which each data point has *k* neighbors). The inductive formula is also used to show that when the similarity between points satisfies a locality property, these algorithms suffer from the curse of dimensionality with respect to the dimensionality of an underlying manifold.
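The framework summarized above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the chapter's exact algorithm: it assumes a Gaussian similarity kernel, a regularization weight `lam`, and a cost of the form (squared error on labeled points) + `lam`·*f*ᵀ*L f* with *L* the graph Laplacian, whose minimizer solves one linear system of size *n*. The `induct` function is a hypothetical stand-in for the *O(n)* induction formula: a similarity-weighted average of the inferred training labels.

```python
import numpy as np

def solve_transductive(X, y_labeled, labeled_idx, sigma=1.0, lam=1.0):
    """Minimize a quadratic criterion by solving one linear system of size n.

    Cost: sum over labeled points of (f_i - y_i)^2  +  lam * f^T L f,
    where L is the graph Laplacian of a similarity matrix W.
    """
    n = len(X)
    # Gaussian similarity between all pairs (an illustrative local kernel)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    S = np.zeros((n, n))
    S[labeled_idx, labeled_idx] = 1.0         # selects labeled points
    b = np.zeros(n)
    b[labeled_idx] = y_labeled
    # Closed-form minimizer: (S + lam * L) f = S y
    return np.linalg.solve(S + lam * L, b)

def induct(x_new, X, f, sigma=1.0):
    # O(n) induction for a new test point: a similarity-weighted
    # average of the labels inferred on the training set
    w = np.exp(-((X - x_new) ** 2).sum(-1) / (2 * sigma ** 2))
    return float(w @ f) / float(w.sum())
```

On two well-separated clusters with one labeled point each, the solved labels propagate through the graph, and `induct` assigns a new point the sign of its nearest cluster in *O(n)* rather than re-solving the *n*×*n* system.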

*Olivier Delalleau, Yoshua Bengio, and Nicolas Le Roux*

- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter

- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0018
- Subject:
- Computer Science, Machine Learning


This chapter presents a subset selection method that can be used to reduce the original system to one of size *m* ≪ *n*. The idea is to solve for the labels of a subset *S* ⊂ *X* of only *m* points, while still retaining information from the rest of the data by approximating their labels with a linear combination of the labels in *S*, using the induction formula presented in Chapter 11. This leads to an algorithm whose computational requirements scale as *O(m²n)* and memory requirements as *O(m²)*, thus allowing one to take advantage of significantly larger unlabeled data sets than with the original algorithms.
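The algebra behind this reduction can be sketched as follows, again as a hedged illustration rather than the chapter's exact procedure: labels outside the subset are expressed through an interpolation matrix *P* built from the induction formula, and substituting *f = P f_S* into the quadratic cost yields an *m*×*m* system. The dense pairwise computation below does not attain the stated *O(m²n)* time and *O(m²)* memory (that requires never forming the full *n*×*n* matrices); it only demonstrates the substitution. All names and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def solve_reduced(X, y_labeled, labeled_idx, subset_idx, sigma=1.0, lam=1.0):
    """Solve only for the m labels of a subset; approximate every other
    label as a normalized weighted combination of the subset labels."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Interpolation matrix P (n x m): a subset point keeps its own label;
    # any other point uses the induction formula over the subset
    P = W[:, subset_idx] / W[:, subset_idx].sum(axis=1, keepdims=True)
    for k, j in enumerate(subset_idx):
        P[j] = 0.0
        P[j, k] = 1.0
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    s = np.zeros(n)
    s[labeled_idx] = 1.0                      # indicator of labeled points
    b = np.zeros(n)
    b[labeled_idx] = y_labeled
    # Substituting f = P f_S into the quadratic cost gives an m x m system:
    # (P^T (S + lam * L) P) f_S = P^T b
    A = P.T @ (np.diag(s) + lam * L) @ P
    f_S = np.linalg.solve(A, P.T @ b)
    return P @ f_S                            # labels for all n points
```

The linear solve now involves only an *m*×*m* matrix, so the cost of the solve itself drops from cubic in *n* to cubic in *m*, while every point still receives a label through *P*.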