Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence (eds)
- Published in print: 2008
- Published Online: August 2013
- ISBN: 9780262170055
- eISBN: 9780262255103
- Item type: book
- Publisher: The MIT Press
- DOI: 10.7551/mitpress/9780262170055.001.0001
- Subject: Computer Science, Machine Learning
Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time. (An example is email spam filtering, which may fail to recognize spam that differs in form from the spam the automatic filter has been built on.) Despite this, and despite the attention given to the apparently similar problems of semi-supervised learning and active learning, dataset shift received relatively little attention in the machine learning community until recently. This book offers an overview of current efforts to deal with dataset and covariate shift. The chapters offer a mathematical and philosophical introduction to the problem; place dataset shift in relation to transfer learning, transduction, local learning, active learning, and semi-supervised learning; provide theoretical views of dataset and covariate shift (including decision-theoretic and Bayesian perspectives); and present algorithms for covariate shift.
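The abstract defines covariate shift as a change in the input distribution alone, with the input-output relationship unchanged. As a minimal illustrative sketch (not one of the book's specific algorithms), the snippet below shows the generic importance-weighting recipe on synthetic data, assuming scikit-learn and a density-ratio estimate obtained from a simple domain classifier; the data, model, and estimator choices are all assumptions made for illustration.

```python
# Illustrative sketch of importance weighting under covariate shift:
# p(y|x) is shared between training and test data, but p(x) differs, so
# training examples are reweighted by an estimate of w(x) = p_test(x)/p_train(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 1-D problem: a fixed labeling rule, but training inputs drawn
# from a different region than test inputs (covariate shift).
def label(x):
    return (np.sin(1.5 * x) + 0.3 * rng.standard_normal(len(x)) > 0).astype(int)

x_train = rng.normal(-1.0, 1.0, size=500)   # training inputs centred at -1
x_test = rng.normal(+1.0, 1.0, size=500)    # test inputs centred at +1
y_train, y_test = label(x_train), label(x_test)
X_train, X_test = x_train[:, None], x_test[:, None]

# Estimate w(x) via a classifier that separates training from test inputs:
# w(x) is proportional to P(domain=test | x) / P(domain=train | x).
domain_clf = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))]),
)
p_test_given_x = domain_clf.predict_proba(X_train)[:, 1]
w = p_test_given_x / (1.0 - p_test_given_x)
w *= len(X_train) / len(X_test)             # correct for sample-size imbalance

# Fit the predictive model twice: unweighted vs. importance-weighted.
plain = LogisticRegression().fit(X_train, y_train)
weighted = LogisticRegression().fit(X_train, y_train, sample_weight=w)

print("unweighted test accuracy:", plain.score(X_test, y_test))
print("weighted test accuracy:  ", weighted.score(X_test, y_test))
```

The weighting only matters when the model is misspecified, which is exactly the regime the covariate-shift literature studies; the book's chapters present more principled estimators of the density ratio than the domain-classifier shortcut used here.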
Bradley E. Alger
- Published in print: 2019
- Published Online: February 2021
- ISBN: 9780190881481
- eISBN: 9780190093761
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190881481.003.0007
- Subject: Neuroscience, Techniques
This chapter reviews and evaluates reports that scientists often cannot repeat, or “reproduce,” published work. It begins by defining what “reproducibility” means and how reproducibility applies to various kinds of science. The focus then shifts to the Reproducibility Project: Psychology, a systematic effort to repeat published findings in psychology that gave rise to many of the present concerns about reproducibility. The chapter critically examines the Reproducibility Project and points out how the nature of science and the complexity of nature can stymie the best attempts at reproducibility. The chapter also reviews the statistical criticisms of science that John Ioannidis and Katherine Button and their colleagues have raised. The hypothesis is a central issue because it is defined inconsistently across branches of science. The statisticians’ strongest attacks are directed against work that differs from most laboratory experimental science. A weak point in the reasoning behind the Reproducibility Project and the statistical arguments is the assumption that a multi-pronged scientific investigation can be legitimately criticized by close examination of one of its components. Experimental science relies on multiple tests and multiple hypotheses to arrive at its conclusions. Reproducibility is a valid concern for science; it is not a “crisis.”
R. Barker Bausell
- Published in print: 2021
- Published Online: February 2021
- ISBN: 9780197536537
- eISBN: 9780197536568
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780197536537.003.0002
- Subject: Psychology, Social Psychology
Publication bias, defined as a “tendency for positive results to be overrepresented in the published literature,” was recognized and bemoaned as early as the 17th century by the chemist Robert Boyle. In the latter half of the 20th century, it began to be recognized as an increasingly serious scientific problem, characterized by a deluge of positive published results (actually exceeding 95% in some areas of psychology). By the second decade of the 21st century, data mining techniques indicated that the phenomenon had reached epic proportions, not only in psychology and the other social sciences but in many of the life and physical sciences as well: a finding that might have been viewed as an amusing idiosyncratic scientific fact of life if not for the concomitant realization that most of these positive scientific findings were wrong, and that publication bias, if not a cause of this debacle, was at least a major facilitator. This chapter documents the high prevalence of this odd phenomenon across a wide swath of empirical scientific literatures, along with the compulsion it fosters for producing positive rather than reproducible results.
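The claim that most published positive findings were wrong rests on positive-predictive-value arithmetic of the kind associated with Ioannidis and with low-power critiques. A minimal worked sketch of that arithmetic follows; the prior, power, and alpha values are illustrative assumptions, not figures taken from this chapter.

```python
# Illustrative arithmetic only (not taken from the book): the positive-predictive-
# value reasoning behind claims that most published positive findings are wrong.
# The prior, power, and alpha values below are assumptions chosen for illustration.

def ppv(prior_true, power, alpha):
    """Share of statistically significant ("positive") findings that are true."""
    true_positives = prior_true * power            # true effects correctly detected
    false_positives = (1.0 - prior_true) * alpha   # null effects crossing alpha
    return true_positives / (true_positives + false_positives)

# Low-powered field: 10% of tested hypotheses true, 35% power, alpha = 0.05.
print(ppv(prior_true=0.10, power=0.35, alpha=0.05))  # ~0.44: most positives are false
# Better-powered field for contrast: same prior, 80% power.
print(ppv(prior_true=0.10, power=0.80, alpha=0.05))  # ~0.64
```

If journals then publish mostly the positives, the printed record looks overwhelmingly successful even though, under assumptions like these, fewer than half of those positives reflect true effects.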
R. Barker Bausell
- Published in print: 2021
- Published Online: February 2021
- ISBN: 9780197536537
- eISBN: 9780197536568
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780197536537.003.0009
- Subject: Psychology, Social Psychology
But what happens to investigators whose studies fail to replicate? The answer is complicated by the growing use of social media by scientists and by the tenor of the original investigators’ responses to the replicators. Alternative case studies are presented, including John Bargh’s vitriolic outburst following a failure of his classic word-priming study to replicate, Amy Cuddy’s unfortunate experience with power posing, and Matthew Vees’s low-key response, in which he declined to aggressively disparage his replicators, complimented the replicators’ interpretation of their replication, and neither defended his original study nor suggested that its findings might be wrong. In addition to such case studies, surveys on the subject suggest that there are normally no long-term deleterious career or reputational effects on investigators when a study fails to replicate, and that a reasoned (or no) response to a failed replication is the superior professional and affective solution.