Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas, and Alexander Gray
- Published in print:
- 2014
- Published Online:
- October 2017
- ISBN:
- 9780691151687
- eISBN:
- 9781400848911
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691151687.003.0001
- Subject:
- Physics, Particle Physics / Astrophysics / Cosmology
This chapter begins by discussing the meaning of data mining, machine learning, and knowledge discovery. These terms refer to research areas that can all be thought of as outgrowths of multivariate statistics. Their common themes are the analysis and interpretation of data, often involving large quantities of data and even more often resorting to numerical methods. The chapter then presents an incomplete survey of the relevant literature, followed by an introduction to the Python programming language and the Git code management tool. Next, it describes the surveys and data sets used in the examples, plotting and visualizing the data in this book, and how to use this book efficiently.
Will Bridewell, Stuart R. Borrett, and Pat Langley
- Published in print:
- 2009
- Published Online:
- September 2009
- ISBN:
- 9780195381634
- eISBN:
- 9780199870264
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195381634.003.0011
- Subject:
- Psychology, Cognitive Psychology
Scientific modeling is a creative activity that can benefit from computational support. This chapter reports five challenges that arise in developing such aids, as illustrated by PROMETHEUS, a software environment that supports the construction and revision of explanatory models. These challenges include the paucity of relevant data, the need to incorporate prior knowledge, the importance of comprehensibility, an emphasis on explanation, and the practicality of user interaction. The responses to these challenges include the use of quantitative processes to encode models and background knowledge, as well as the combination of AND/OR search through a space of model structures with gradient descent to estimate parameters. The chapter then relates our experiences with PROMETHEUS on three scientific modeling tasks and some lessons we have learned from those efforts, and it concludes by noting additional challenges that were not apparent at the outset of our work.
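The parameter-estimation half of this scheme, gradient descent over a process model's numeric parameters, can be sketched in a few lines. This is a toy illustration only, not PROMETHEUS itself: the process equation, the synthetic data, and the step sizes are all invented for the example.

```python
# Toy gradient-descent parameter estimation for a one-parameter
# process model dx/dt = -k * x (invented example, not PROMETHEUS).

def simulate(k, x0=10.0, dt=0.1, steps=20):
    """Euler-integrate the process equation and return the trajectory."""
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x += dt * (-k * x)
    return xs

data = simulate(0.5)  # synthetic "observations" generated with true k = 0.5

def loss(k):
    """Sum of squared errors between the model trajectory and the data."""
    return sum((a - b) ** 2 for a, b in zip(simulate(k), data))

k, lr, eps = 0.1, 1e-4, 1e-6  # initial guess, learning rate, finite-diff step
for _ in range(2000):
    grad = (loss(k + eps) - loss(k - eps)) / (2 * eps)  # numeric gradient
    k -= lr * grad  # descend toward the k that best fits the data
```

In a PROMETHEUS-style system, an inner loop of this kind would sit beneath the structural AND/OR search that decides which processes appear in the model at all.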
Janice Glasgow and Evan Steeg
- Published in print:
- 1999
- Published Online:
- November 2020
- ISBN:
- 9780195119404
- eISBN:
- 9780197561256
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780195119404.003.0011
- Subject:
- Computer Science, Systems Analysis and Design
The field of knowledge discovery is concerned with the theory and processes involved in the representation and extraction of patterns or motifs from large databases. Discovered patterns can be used to group data into meaningful classes, to summarize data, or to reveal deviant entries. Motifs stored in a database can be brought to bear on difficult instances of structure prediction or determination from X-ray crystallography or nuclear magnetic resonance (NMR) experiments. Automated discovery techniques are central to understanding and analyzing the rapidly expanding repositories of protein sequence and structure data. This chapter deals with the discovery of protein structure motifs. A motif is an abstraction over a set of recurring patterns observed in a dataset; it captures the essential features shared by a set of similar or related objects. In many domains, such as computer vision and speech recognition, there exist special regularities that permit such motif abstraction. In the protein science domain, the regularities derive from evolutionary and biophysical constraints on amino acid sequences and structures. The identification of a known pattern in a new protein sequence or structure permits the immediate retrieval and application of knowledge obtained from the analysis of other proteins. The discovery and manipulation of motifs—in DNA, RNA, and protein sequences and structures—is thus an important component of computational molecular biology and genome informatics. In particular, identifying protein structure classifications at varying levels of abstraction allows us to organize and increase our understanding of the rapidly growing protein structure datasets. Discovered motifs are also useful for improving the efficiency and effectiveness of X-ray crystallographic studies of proteins, for drug design, for understanding protein evolution, and ultimately for predicting the structure of proteins from sequence data. 
Motifs may be designed by hand, based on expert knowledge. For example, the Chou-Fasman protein secondary structure prediction program (Chou and Fasman, 1978), which dominated the field for many years, depended on the recognition of predefined, user-encoded sequence motifs for α-helices and β-sheets. Several hundred sequence motifs have been cataloged in PROSITE (Bairoch, 1992); the identification of one of these motifs in a novel protein often allows for immediate function interpretation.
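Matching a cataloged motif of this kind against a new sequence is mechanical once the pattern is translated into a regular expression. The sketch below handles only the core PROSITE syntax (x for any residue, [..] for alternatives, {..} for exclusions); the test sequence is invented for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate core PROSITE syntax into a Python regular expression.

    Supports 'x' (any residue), [AB] (one of), {AB} (none of), and '-'
    as the element separator; repetition counts like x(2) are omitted.
    """
    out = []
    for elem in pattern.split("-"):
        if elem == "x":
            out.append(".")                      # any amino acid
        elif elem.startswith("{") and elem.endswith("}"):
            out.append("[^" + elem[1:-1] + "]")  # excluded residues
        else:
            out.append(elem)                     # literal or [..] class
    return "".join(out)

# The classic N-glycosylation site pattern: Asn, not Pro, Ser/Thr, not Pro.
regex = prosite_to_regex("N-{P}-[ST]-{P}")
matches = [m.start() for m in re.finditer(regex, "MKNVSAQPLNRT")]
```

Here the motif matches at position 2 of the invented sequence; the second Asn near the end fails because the pattern runs off the sequence.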
Paul Dragos Aligica, Peter J. Boettke, and Vlad Tarko
- Published in print:
- 2019
- Published Online:
- June 2019
- ISBN:
- 9780190267032
- eISBN:
- 9780190267063
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780190267032.003.0004
- Subject:
- Economics and Finance, Public and Welfare
Chapter 3 presents a set of key notions to be employed in framing and approaching the dynamic governance process, and the phenomena associated with it, in ways that are particularly relevant for governance analysis and design: (a) the very idea of process-focused, dynamic governance itself, having at its core the voluntary action principle; (b) the notions of countervailing powers and voluntary-sector, nonstate governance, leading to the overarching and encapsulating notion of polycentricity, the governance keystone of the normative individualist system of classical-liberal inspiration; and (c) the epistemic dimension, and the conceptualization of the role of knowledge discovery, production, aggregation, and distribution in society, as well as the associated epistemic and institutional processes, all seen as a natural complement of the notion of polycentricity.
Maurizio Borghi and Stavroula Karapapa
- Published in print:
- 2013
- Published Online:
- May 2013
- ISBN:
- 9780199664559
- eISBN:
- 9780191758409
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199664559.003.0003
- Subject:
- Law, Intellectual Property, IT, and Media Law
One of the most prominent features of mass digitization is the automated processing of works for various research-related and commercial purposes. This includes text mining or linguistic analysis over masses of texts, image analysis, information extraction, automatic translation, data mining for behavioural profiling, and so on. With a few exceptions, copying for automated processing or computational analysis does not feature in statutory language, and its status remains uncertain. While, historically, ‘machine-reading’ has challenged copyright norms in some instances, there is currently a need for careful consideration of the parameters of its permissibility.
Antony Bryant
- Published in print:
- 2017
- Published Online:
- January 2017
- ISBN:
- 9780199922604
- eISBN:
- 9780190652548
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199922604.003.0016
- Subject:
- Psychology, Social Psychology
Clarification of ideas around Big Data and the ways in which a clear understanding of GTM offers a sound basis for insightful use of Big Data. Anderson’s challenge – the 21st-century version of naïve Baconian induction. Digital data as a key source for researchers, but one to be used with care and judged by the same criteria as other sources. Pathologies of Big Data analyses – apophenia, seeing patterns in everything. Mandelbrot’s abductive leap and the analysis of chaos. Metaphors in the understanding of data – mining, refining, construction, discovery. Leetaru’s Culturomics as an example of big data analytics; its strengths and its weaknesses. The continuing necessity for theoretical sensitivity. The paradox of big data – however much exists today, there will be far more by tomorrow.
James E. Dobson
- Published in print:
- 2019
- Published Online:
- September 2019
- ISBN:
- 9780252042270
- eISBN:
- 9780252051111
- Item type:
- chapter
- Publisher:
- University of Illinois Press
- DOI:
- 10.5622/illinois/9780252042270.003.0004
- Subject:
- Literature, Criticism/Theory
This chapter turns to a lower level of computation to produce a cultural critique and historicization of one of the most important algorithms used in digital humanities and other big-data applications in the present moment: the k-nearest neighbor, or k-NN, algorithm. The chapter reconstructs a partial genealogy, the intellectual history, of this important algorithm, which was key to sense-making in the mid-twentieth century and has found continued life in the twenty-first. In its formalized description, its exposition in the papers introducing and refining the rule, its implementation in algorithmic form, and its actual use, the k-nearest neighbor algorithm draws on dominant mid-twentieth-century ideologies and tropes, including partitioning, segregation, suburbanization, and democratization. In situating the k-NN algorithm within the larger field of other residual and emergent statistical methods, the author seeks to produce an intervention within the developing critical theory of algorithmic governmentality.
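For readers who have not met it, the rule the chapter historicizes is compact: classify a query point by majority vote among its k nearest labeled neighbors. A minimal sketch with invented toy data:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """The basic k-NN rule: majority vote among the k nearest
    training points under Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]

# Two invented 2-D clusters, labeled "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label = knn_classify(train, (0.5, 0.5), k=3)  # query lands in the "a" cluster
```

The rule partitions the plane into regions of like-labeled neighbors, which is exactly the spatial "partitioning" trope the chapter interrogates.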
Gary Smith
- Published in print:
- 2018
- Published Online:
- November 2020
- ISBN:
- 9780198824305
- eISBN:
- 9780191917295
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780198824305.003.0007
- Subject:
- Computer Science, Artificial Intelligence, Machine Learning
I do an extra-sensory perception (ESP) experiment on the first day of my statistics classes. I show the students an ordinary coin—sometimes borrowed from a student—and flip the coin ten times. After each flip, I think about the outcome intently while the students try to read my mind. They write their guesses down, and I record the actual flips by circling H or T on a piece of paper that has been designed so that the students cannot tell from the location of my pencil which letter I am circling. Anyone who guesses all ten flips correctly wins a one-pound box of chocolates from a local gourmet chocolate store. If you want to try this at home, guess my ten coin flips in the stats class I taught in the spring of 2017. My brain waves may still be out there somewhere. Write your guesses down, and we’ll see how well you do. After ten flips, I ask the students to raise their hands and I begin revealing my flips. If a student misses, the hand goes down. Anyone with a hand up at the end wins the chocolates. I had a winner once, which is to be expected since more than a thousand students have played this game. I don’t believe in ESP, so the box of chocolates is not the point of this experiment. I offer the chocolates in order to persuade students to take the test seriously. My real intent is to demonstrate that most people, even bright college students, have a misperception about what coin flips and other random events look like. This misperception fuels our mistaken belief that data patterns uncovered by computers must be meaningful. Back in the 1930s, the Zenith Radio Corporation broadcast a series of weekly ESP experiments. A “sender” in the radio studio randomly chose a circle or square, analogous to flipping a fair coin, and visualized the shape, hoping that the image would reach listeners hundreds of miles away. After five random draws, listeners were encouraged to mail in their guesses.
These experiments did not support the idea of ESP, but they did provide compelling evidence that people underestimate how frequently patterns appear in random data.
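The "to be expected" remark about the single winner is a one-line calculation; the figure of 1,000 players below is the text's round number.

```python
# Chance that one student calls all ten fair flips: (1/2) ** 10 = 1/1024.
p_one = 0.5 ** 10
n = 1000                               # "more than a thousand students"
expected_winners = n * p_one           # about 0.98: roughly one winner
p_at_least_one = 1 - (1 - p_one) ** n  # about 0.62
```

So with a thousand players, one perfect ten-for-ten guesser is almost exactly what chance alone predicts, no mind-reading required.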
Roman M. Krzanowski and Jonathan Raper
- Published in print:
- 2001
- Published Online:
- November 2020
- ISBN:
- 9780195135688
- eISBN:
- 9780197561621
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780195135688.003.0010
- Subject:
- Computer Science, Mathematical Theory of Computation
In part II we describe some possible methods of modeling spatial phenomena with spatial evolutionary algorithms. We will explain what spatial evolutionary models and spatial evolutionary algorithms are and how they can be designed. We will also provide a general framework for spatial evolutionary modeling. We believe that this framework can be used to create evolutionary models (and algorithms) of spatial phenomena that reach well beyond the model discussed in this book. Wherever possible we will give examples to illustrate the concepts, terms, and procedures we discuss. In fact, by the end of part II we will have built, using the principles presented, a complete spatial evolutionary model—a spatial evolutionary model of a wireless communication system. We shall begin our discussion with an explanation of the distinction between spatial evolutionary models and evolutionary models of spatial phenomena. As we shall see, the difference between these two terms, while subtle, is very important for understanding spatial modeling in general and evolutionary spatial modeling in particular. The differences between the terms spatial evolutionary models and evolutionary models of spatial phenomena extend well beyond their lexical dissimilarities and touch upon very basic issues of evolutionary and spatial modeling. The term spatial evolutionary model, as used here, refers to an evolutionary model that constitutes a separate, distinct class of computer evolutionary models. In contrast, the term evolutionary models of spatial phenomena denotes applications of existing evolutionary methods (or mere extensions of established evolutionary methodologies) to problems defined in space. Our view of the science of spatial modeling is shaped by which of these definitions, along with its consequences, we accept.
If we accept that spatial evolutionary models constitute a separate and distinct class of evolutionary models, then we will also have to accept the proposition that they possess unique rules governing their behavior, a unique genome design to represent a model-specific data structure, and a set of unique operators that cannot be readily applied to nonspatial problems. Moreover, it will follow that these evolutionary models also possess a problem-specific language, that is, language specific to the domain of spatial evolutionary models.
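These claims can be made concrete with a deliberately small sketch, loosely in the spirit of the transmitter-placement example and not the book's actual model: the genome is a list of (x, y) transmitter positions (a model-specific spatial data structure), mutation is a spatial jitter operator, and fitness counts the demand points within radio range. All constants are invented.

```python
import random

random.seed(0)  # deterministic toy run
DEMAND = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]
RADIUS = 3.0    # invented coverage radius

def fitness(genome):
    """Number of demand points within RADIUS of any transmitter."""
    return sum(1 for (dx, dy) in DEMAND
               if any((dx - x) ** 2 + (dy - y) ** 2 <= RADIUS ** 2
                      for (x, y) in genome))

def mutate(genome, step=1.0):
    """A spatial operator: jitter one transmitter, clamped to the region."""
    g = list(genome)
    i = random.randrange(len(g))
    x, y = g[i]
    g[i] = (min(10.0, max(0.0, x + random.uniform(-step, step))),
            min(10.0, max(0.0, y + random.uniform(-step, step))))
    return g

# A bare (1 + 1) evolutionary loop over three-transmitter genomes.
best = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(3)]
for _ in range(200):
    candidate = mutate(best)
    if fitness(candidate) >= fitness(best):
        best = candidate
```

Note that both the genome and the mutation operator are meaningless outside a spatial domain, which is precisely the authors' point about spatial evolutionary models forming a distinct class.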