Paul Kockelman
- Published in print:
- 2017
- Published Online:
- July 2017
- ISBN:
- 9780190636531
- eISBN:
- 9780190636562
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780190636531.003.0007
- Subject:
- Linguistics, Sociolinguistics / Anthropological Linguistics
This chapter details the inner workings of spam filters, algorithmic devices that separate desirable messages from undesirable messages. It argues that such filters are a particularly important kind of sieve insofar as they readily exhibit key features of sieving devices in general, and algorithmic sieving in particular. More broadly, it describes the relation between ontology (assumptions that drive interpretations) and inference (interpretations that alter assumptions) as it plays out in the classification and transformation of identities, types, or kinds. Focusing on the unstable processes whereby identifying algorithms, identified types, and evasive transformations are dynamically coupled over time, it also theorizes various kinds of ontological inertia and highlights various kinds of algorithmic ineffability. Finally, it shows how similar issues underlie a much wider range of processes, such as the Turing Test, Bayesian reasoning, and machine learning more generally.
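As a rough illustration of the Bayesian sieving the chapter analyzes, the sketch below implements a toy naive Bayes spam filter. It is not drawn from the chapter itself: the tiny corpus, the whitespace-split word features, and the Laplace smoothing are assumptions made purely for illustration.

```python
# Toy naive Bayes spam filter: a minimal sketch of the kind of algorithmic
# sieve discussed above, not code from the chapter.
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs with label in {'spam', 'ham'}."""
    counts = {'spam': Counter(), 'ham': Counter()}
    totals = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def spam_log_odds(text, counts, totals):
    """Log-odds that a message is spam, with Laplace smoothing."""
    vocab = set(counts['spam']) | set(counts['ham'])
    log_odds = math.log((totals['spam'] + 1) / (totals['ham'] + 1))
    spam_total = sum(counts['spam'].values())
    ham_total = sum(counts['ham'].values())
    for word in text.lower().split():
        p_spam = (counts['spam'][word] + 1) / (spam_total + len(vocab))
        p_ham = (counts['ham'][word] + 1) / (ham_total + len(vocab))
        log_odds += math.log(p_spam / p_ham)
    return log_odds

# Invented toy corpus, purely illustrative.
corpus = [
    ("win money now", "spam"),
    ("cheap meds win prizes", "spam"),
    ("meeting at noon", "ham"),
    ("lunch with the team", "ham"),
]
counts, totals = train(corpus)
print(spam_log_odds("win cheap prizes", counts, totals) > 0)  # sieved as spam
```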
Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence (eds)
- Published in print:
- 2008
- Published Online:
- August 2013
- ISBN:
- 9780262170055
- eISBN:
- 9780262255103
- Item type:
- book
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262170055.001.0001
- Subject:
- Computer Science, Machine Learning
Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time. (An example is email spam filtering, which may fail to recognize spam that differs in form from the spam the automatic filter has been built on.) Despite this, and despite the attention given to the apparently similar problems of semi-supervised learning and active learning, dataset shift has received relatively little attention in the machine learning community until recently. This book offers an overview of current efforts to deal with dataset and covariate shift. The chapters offer a mathematical and philosophical introduction to the problem; place dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning; provide theoretical views of dataset and covariate shift (including decision theoretic and Bayesian perspectives); and present algorithms for covariate shift.
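As a rough illustration of covariate shift, the sketch below reweights training examples by the ratio of test to training input densities (importance weighting, one family of corrections the book surveys). The synthetic one-dimensional data, the linear model, and the assumption that the true densities are known are all invented for illustration; in practice, estimating the weights is itself part of the problem.

```python
# Covariate shift and importance weighting on synthetic data: an illustrative
# sketch, not code from the book.
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def target(x):
    # Nonlinear target, so where a (misspecified) linear fit is accurate
    # depends on where the training mass lies.
    return np.sin(x)

# Only the input distribution changes: training inputs centred at 0,
# test inputs centred at 2.
x_train = rng.normal(0.0, 1.0, 500)
y_train = target(x_train) + rng.normal(0.0, 0.1, 500)
x_test = rng.normal(2.0, 0.5, 500)
y_test = target(x_test)

# Importance weights w(x) = p_test(x) / p_train(x) on the training inputs;
# the densities are assumed known here purely for illustration.
weights = normal_pdf(x_train, 2.0, 0.5) / normal_pdf(x_train, 0.0, 1.0)

def fit_linear(x, y, w=None):
    """Weighted least-squares fit of y ~ a + b*x."""
    if w is None:
        w = np.ones_like(x)
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

for name, w in [("unweighted", None), ("importance-weighted", weights)]:
    a, b = fit_linear(x_train, y_train, w)
    mse = np.mean((a + b * x_test - y_test) ** 2)
    print(f"{name}: test MSE = {mse:.3f}")
```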
Amir Globerson, Choon Hui Teo, Alex Smola, and Sam Roweis
- Published in print:
- 2008
- Published Online:
- August 2013
- ISBN:
- 9780262170055
- eISBN:
- 9780262255103
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262170055.003.0010
- Subject:
- Computer Science, Machine Learning
This chapter considers an adversarial model in which the learning algorithm attempts to construct a predictor that is robust to deletion of features at test time. The problem is formulated as finding the optimal minimax strategy with respect to an adversary that deletes features, and it is shown that the optimal strategy can be found either by solving a quadratic program or by using efficient bundle methods for optimization. The resulting algorithm significantly improves prediction performance on several problems from a spam-filtering challenge task.
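A simplified illustration of the worst-case margin under feature deletion is sketched below. Rather than solving the chapter's quadratic program or using bundle methods, it runs plain subgradient descent on the worst-case hinge loss, so it is a loose approximation of the idea, not the chapter's algorithm; the synthetic data and all parameter values are invented for illustration.

```python
# Training a linear classifier to be robust to test-time deletion of up to k
# features. Illustrative sketch only; the chapter's exact minimax solution
# uses a quadratic program or bundle methods instead of this subgradient loop.
import numpy as np

rng = np.random.default_rng(1)

def worst_case_margin(w, x, y, k):
    """Margin after an adversary deletes up to k features that hurt us most."""
    contrib = y * w * x                    # per-feature contribution to the margin
    top_k = np.sort(contrib)[::-1][:k]     # k largest contributions
    return y * (w @ x) - top_k[top_k > 0].sum()

def train_robust(X, y, k, lam=0.01, lr=0.05, epochs=30):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if worst_case_margin(w, X[i], y[i], k) < 1:
                # Subgradient of the worst-case hinge loss: zero out the
                # adversarially deleted features before the update.
                contrib = y[i] * w * X[i]
                top = np.argsort(contrib)[::-1][:k]
                xi = X[i].copy()
                xi[top[contrib[top] > 0]] = 0.0
                w += lr * (y[i] * xi - lam * w)
            else:
                w -= lr * lam * w
    return w

def adversarial_accuracy(w, X, y, k):
    margins = [worst_case_margin(w, X[i], y[i], k) for i in range(len(y))]
    return float(np.mean(np.array(margins) > 0))

# Synthetic data with many weakly informative, redundant features, so a robust
# predictor should spread its weight rather than rely on a few features.
n, d = 200, 20
y = rng.choice([-1.0, 1.0], n)
X = y[:, None] * 0.5 + rng.normal(0.0, 1.0, (n, d))
w_robust = train_robust(X, y, k=5)
print("training accuracy with 5 features adversarially deleted:",
      adversarial_accuracy(w_robust, X, y, k=5))
```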
Andrew Gelman and Deborah Nolan
- Published in print:
- 2017
- Published Online:
- September 2017
- ISBN:
- 9780198785699
- eISBN:
- 9780191827518
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780198785699.003.0021
- Subject:
- Mathematics, Educational Mathematics
In this chapter, we describe the philosophy, goals, syllabus, and activities for a data science course that we have developed. In this course we integrate topics from computing, statistics, and working with data. This integrated approach addresses many core aspects of statistics training, including statistical thinking, the role of context in addressing a statistical problem, statistical communication through code, and the balance between programming and mathematical approaches to problems. When designing this course, we asked ourselves what our students ought to be able to do computationally. While we do provide a list of technical material, we also considered the broader goals of the course. Examples include plotting on Google Earth and developing a spam filter for unwanted email.