Thomas P. Vartanian
- Published in print:
- 2010
- Published Online:
- January 2011
- ISBN:
- 9780195388817
- eISBN:
- 9780199863396
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195388817.003.0002
- Subject:
- Social Work, Research and Evaluation
This chapter explains the differences between primary and secondary data sets. It describes how secondary data sets are typically collected; the ability of large institutions to assemble sizable, representative data sets using sophisticated sampling designs; how such large data sets support statistical techniques that may not be feasible with smaller data sets; and the differences between cross-sectional and longitudinal data sets.
Jerrold I. Davis, Kevin C. Nixon, and Damon P. Little
- Published in print:
- 2006
- Published Online:
- September 2007
- ISBN:
- 9780199297306
- eISBN:
- 9780191713729
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199297306.003.0007
- Subject:
- Biology, Evolutionary Biology / Genetics
Software for cladistic analysis has been widely available for more than twenty years, and a series of advances made during this time has facilitated the analysis of matrices of ever-increasing size. This chapter provides an overview of the development of parsimony methods for cladistic analysis, describes strategies that have allowed large data matrices to be analysed by conventional methods, and in doing so demonstrates that data sets historically considered intractable could in fact have been readily approached using then-available hardware and software. Preliminary analyses, even when unsuccessful at discovering most-parsimonious trees, can be used to identify appropriate software settings for use during thorough analyses. A useful indicator of the settings that yield the most efficient searches is the excess branch swapping ratio: the ratio between the number of tree rearrangements conducted during a phase of branch swapping in which shorter trees are being discovered and the minimum possible number of rearrangements during that phase. It is concluded that two-stage search strategies, with intensive branch swapping conducted on a small percentage of the most optimal sets of trees obtained by a large number of relatively short searches, are more efficient than one-stage searches.
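The excess branch swapping ratio described above reduces to a simple calculation once the search statistics are in hand. The Python sketch below is a minimal illustration under stated assumptions: the two counts would come from the log of a parsimony search, and the function name and example numbers are hypothetical, not taken from any particular program.

```python
def excess_branch_swapping_ratio(rearrangements_tried: int,
                                 minimum_rearrangements: int) -> float:
    """Ratio of tree rearrangements actually performed during a phase of branch
    swapping in which shorter trees are still being found to the minimum number
    of rearrangements possible in that phase."""
    if minimum_rearrangements <= 0:
        raise ValueError("minimum_rearrangements must be positive")
    return rearrangements_tried / minimum_rearrangements

# Hypothetical example: 2.4 million rearrangements tried vs. 0.8 million minimally needed.
print(excess_branch_swapping_ratio(2_400_000, 800_000))  # -> 3.0
```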
Gidon Eshel
- Published in print:
- 2011
- Published Online:
- October 2017
- ISBN:
- 9780691128917
- eISBN:
- 9781400840632
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691128917.003.0011
- Subject:
- Environmental Science, Environmental Studies
This chapter focuses on empirical orthogonal functions (EOFs). One of the most useful and common eigen-techniques in data analysis is the construction of EOFs. EOFs are a transform of the data; the original set of numbers is transformed into a different set with some desirable properties. In this sense the EOF transform is similar to other transforms, such as the Fourier or Laplace transforms. In all these cases, we project the original data onto a set of functions, thus replacing the original data with the set of projection coefficients on the chosen new set of basis vectors. However, the choice of the specific basis set varies from case to case. The discussions cover data matrix structure convention, reshaping multidimensional data sets for EOF analysis, forming anomalies and removing time mean, missing values, choosing and interpreting the covariability matrix, calculating the EOFs, projection time series, and extended EOF analysis.
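As a rough illustration of the workflow just listed, the following Python sketch arranges synthetic data as a time-by-space matrix, removes the time mean, and obtains the EOFs and their projection time series from the singular value decomposition; the array sizes and random data are assumptions for illustration, not the chapter's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((120, 50))      # 120 time steps, 50 spatial points (synthetic)

anomalies = data - data.mean(axis=0)       # remove the time mean at each point
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)

eofs = Vt                                  # rows: EOF spatial patterns
pcs = U * s                                # columns: projection time series
explained = s**2 / np.sum(s**2)            # fraction of variance captured by each EOF

print(np.round(explained[:3], 3))          # variance explained by the leading EOFs
```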
Lena H. Ting and Stacie A. Chvatal
- Published in print:
- 2010
- Published Online:
- January 2011
- ISBN:
- 9780195395273
- eISBN:
- 9780199863518
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195395273.003.0005
- Subject:
- Neuroscience, Sensory and Motor Systems
This chapter examines methodologies for dimensional analysis and linear decomposition of multivariate data sets, and discusses their implicit hypotheses and interpretations for muscle coordination of movement. It presents tutorials to compare how two common methods—principal components analysis (PCA) and non-negative matrix factorization (NMF)—decompose electromyographic signals into underlying components. To facilitate the integration of such mathematical techniques with physiological hypothesis testing, the chapter focuses on developing an intuitive understanding of the two techniques. It provides a simple two-dimensional tutorial, focusing on how orthogonality constraints in PCA and non-negativity constraints in NMF impact the resulting data decomposition and physiological relevance. Examples are presented using real data sets from human balance control and locomotion. The chapter examines the structure of the resulting components, their robustness across tasks, and their implications for various muscle synergy hypotheses. It also addresses practical issues and caveats in organizing data sets, the selection of the appropriate number of components, and considerations and pitfalls of experimental design and analysis, and offers suggestions and cautions for interpreting results. Based on these comparisons and on work in the visual system over the last decade, evidence is presented for the increased neurophysiological relevance of the factors derived from NMF compared to PCA.
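To make the contrast concrete, here is a hedged Python sketch that applies scikit-learn's PCA and NMF to a synthetic non-negative, EMG-like matrix; the simulated "muscles", the factorization rank of three, and the solver settings are assumptions for illustration, not the chapter's data or methods.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF

rng = np.random.default_rng(1)
W_true = rng.random((200, 3))        # 200 time samples, 3 underlying components
H_true = rng.random((3, 16))         # 16 simulated "muscles"
emg = W_true @ H_true                # non-negative data matrix

pca = PCA(n_components=3).fit(emg)
nmf = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=1).fit(emg)

# Orthogonal PCA components can take negative values; NMF components cannot.
print("PCA components contain negatives:", bool((pca.components_ < 0).any()))
print("NMF components are non-negative:", bool((nmf.components_ >= 0).all()))
```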
Thomas P. Vartanian
- Published in print:
- 2010
- Published Online:
- January 2011
- ISBN:
- 9780195388817
- eISBN:
- 9780199863396
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195388817.003.0003
- Subject:
- Social Work, Research and Evaluation
This chapter examines factors to consider when thinking about using large secondary data sets. These include advantages of using such data, such as cost and time savings; the ability to follow people over long periods, as with longitudinal data sets that already span several decades; the packaging of the data into SAS, SPSS, Stata, and other formats, which facilitates data programming and analysis; and the ability to carry out a number of research projects from a single data set. Disadvantages include lack of control over the framing and wording of survey items, potentially small sample sizes for particular populations, and an inability to recontact survey respondents. The chapter also presents a list of questions to determine whether using secondary data is both feasible and appropriate for a given research study.
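As a hedged illustration of the "packaged formats" point above, pandas can read Stata, SPSS, and SAS files directly; the file names below are placeholders, not real survey releases, and reading SPSS files additionally requires the pyreadstat package.

```python
import pandas as pd

df_stata = pd.read_stata("survey_extract.dta")    # Stata format (placeholder file name)
df_spss = pd.read_spss("survey_extract.sav")      # SPSS format (needs pyreadstat)
df_sas = pd.read_sas("survey_extract.sas7bdat")   # SAS format

print(df_stata.shape, df_spss.shape, df_sas.shape)
```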
J. C. Gower and G. B. Dijksterhuis
- Published in print:
- 2004
- Published Online:
- September 2007
- ISBN:
- 9780198510581
- eISBN:
- 9780191708961
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780198510581.003.0002
- Subject:
- Mathematics, Probability / Statistics
This chapter discusses initial transformations that may be useful before embarking on a Procrustes analysis proper. Data-scaling and configuration-scaling are the terms adopted for the many kinds of transformations that may be deemed desirable before embarking on the actual Procrustes matching. The aim of these is to eliminate possible incommensurabilities of variables within the individual data sets (data-scaling) and size differences between data sets (configuration-scaling). Although some choices of data-scaling have clear justification, other choices are largely subjective. The form and source of the data should help decide what, if any, data-scaling is needed. Three main types of data are considered: (i) sets of coordinates derived from some form of multidimensional scaling (MDS) or, in the case of shape studies, directly measured landmark coordinates; (ii) data matrices whose columns refer to different variables; and (iii) sets of loadings derived from factor analysis.
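A minimal Python sketch of the two kinds of pre-processing described above follows: data-scaling standardizes the variables within each data set, and configuration-scaling centres each configuration and removes size differences. The random data and the choice of unit centroid size are assumptions for illustration.

```python
import numpy as np

def data_scale(X):
    """Standardize each variable (column) within one data set."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def configuration_scale(X):
    """Centre a configuration and scale it to unit centroid size."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

rng = np.random.default_rng(2)
X1 = rng.random((10, 3))
X2 = 5.0 * rng.random((10, 3))                 # a second configuration on a larger scale
A = configuration_scale(data_scale(X1))
B = configuration_scale(data_scale(X2))
print(np.linalg.norm(A), np.linalg.norm(B))    # both 1.0: size differences removed
```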
Peter Temin
- Published in print:
- 2012
- Published Online:
- October 2017
- ISBN:
- 9780691147680
- eISBN:
- 9781400845422
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691147680.003.0002
- Subject:
- History, Ancient History / Archaeology
This chapter discusses how there is little of what economists call data on markets in Roman times, despite extensive information about prices and transactions. Data, as economists use the term, consist of a set of uniform prices that can be compared with one another. According to scholars, extensive markets existed in the late Roman Republic and early Roman Empire. Even though data are scarce, there are enough observations of the price of wheat, the most extensively traded commodity, to perform a test. The problem is that this amounts to only a little data by modern standards. Consequently, the chapter explains why statistics are useful in interpreting small data sets and how one deals with various problems that arise when there are only a few data points.
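As a generic illustration of inference from only a few data points (not Temin's own calculation), the Python sketch below forms a t-based 95% confidence interval from five made-up price observations; the wide interval shows how small samples are handled by quantifying, rather than hiding, their uncertainty.

```python
import numpy as np
from scipy import stats

prices = np.array([4.0, 5.5, 3.8, 6.1, 5.0])      # five hypothetical price observations
mean = prices.mean()
sem = stats.sem(prices)                            # standard error of the mean
lo, hi = stats.t.interval(0.95, len(prices) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```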
Ziheng Yang
- Published in print:
- 2006
- Published Online:
- April 2010
- ISBN:
- 9780198567028
- eISBN:
- 9780191728280
- Item type:
- book
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780198567028.001.0001
- Subject:
- Biology, Evolutionary Biology / Genetics
The field of molecular evolution has experienced explosive growth in recent years due to the rapid accumulation of genetic sequence data, continuous improvements to computer hardware and software, and the development of sophisticated analytical methods. The increasing availability of large genomic data sets requires powerful statistical methods to analyse and interpret them, generating both computational and conceptual challenges for the field. This book provides comprehensive coverage of modern statistical and computational methods used in molecular evolutionary analysis, such as maximum likelihood and Bayesian statistics. It describes the models, methods, and algorithms that are most useful for analysing the ever-increasing supply of molecular sequence data, with a view to furthering our understanding of the evolution of genes and genomes. The book emphasizes essential concepts rather than mathematical proofs. It includes detailed derivations and implementation details, as well as numerous illustrations, worked examples, and exercises.
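As one small worked example of the kind of likelihood-based calculation this literature relies on, the sketch below computes the maximum likelihood distance between two aligned sequences under the Jukes-Cantor (JC69) model, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites; the sequences are made up for illustration.

```python
import math

def jc69_distance(seq1: str, seq2: str) -> float:
    """Maximum likelihood distance under the Jukes-Cantor (JC69) model."""
    assert len(seq1) == len(seq2), "sequences must be aligned and of equal length"
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("distance undefined: too many differences under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# 2 of 20 aligned sites differ, so p = 0.10 and d is about 0.107 substitutions per site.
print(round(jc69_distance("ACGTACGTACGTACGTACGT",
                          "ACGTACGAACGTACTTACGT"), 3))
```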
Gary Goertz and James Mahoney
- Published in print:
- 2012
- Published Online:
- October 2017
- ISBN:
- 9780691149707
- eISBN:
- 9781400845446
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691149707.003.0007
- Subject:
- Sociology, Social Research and Statistics
This chapter examines how the qualitative and quantitative research traditions treat within-case analysis versus cross-case analysis for causal inference. In qualitative research, the primary focus is on specific events and processes taking place within each individual case. Leading qualitative methodologies of hypothesis testing, such as process tracing and counterfactual analysis, are fundamentally methods of within-case analysis. By contrast, quantitative research traditionally involves exclusively cross-case comparison. The chapter begins with a comparison of the typical roles (or nonroles) of within-case and cross-case analysis in case studies versus experiments. It then considers how causal inference in quantitative and qualitative research is linked to the use of “data-set observations” and “causal-process observations,” respectively. It also explains the differences between process-tracing tests and statistical tests and concludes by suggesting that cross-case analysis and within-case analysis can and often should be combined.
Kristin Vanderbilt and David Blankman
- Published in print:
- 2017
- Published Online:
- September 2017
- ISBN:
- 9780300209549
- eISBN:
- 9780300228038
- Item type:
- chapter
- Publisher:
- Yale University Press
- DOI:
- 10.12987/yale/9780300209549.003.0013
- Subject:
- History, History of Science, Technology, and Medicine
Science has become a data-intensive enterprise. Data sets are commonly being stored in public data repositories and are thus available for others to use in new, often unexpected ways. Such re-use of data sets can take the form of reproducing the original analysis, analyzing the data in new ways, or combining multiple data sets into new data sets that are analyzed still further. A scientist who re-uses a data set collected by another must be able to assess its trustworthiness. This chapter reviews the types of errors that are found in metadata referring to data collected manually, data collected by instruments (sensors), and data recovered from specimens in museum collections. It also summarizes methods used to screen these types of data for errors. It stresses the importance of ensuring that metadata associated with a data set thoroughly document the error prevention, detection, and correction methods applied to the data set prior to publication.
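A minimal pandas sketch of the kind of screening discussed above follows: recode a missing-value sentinel and flag values outside plausible physical ranges. The column names, the -999 code, and the thresholds are illustrative assumptions, not standards from the chapter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "air_temp_c": [21.3, 22.1, -999.0, 23.4, 85.0],   # -999 used here as a missing code
    "rel_humidity": [55.0, 57.5, 60.2, 150.0, 48.0],
})
df = df.replace(-999.0, np.nan)                        # recode the missing-value sentinel

checks = pd.DataFrame({
    "temp_missing": df["air_temp_c"].isna(),
    "temp_out_of_range": df["air_temp_c"].notna() & ~df["air_temp_c"].between(-40, 60),
    "rh_out_of_range": ~df["rel_humidity"].between(0, 100),
})
print(checks.sum())                                    # number of rows failing each check
```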
Manuel Arellano
- Published in print:
- 2003
- Published Online:
- July 2005
- ISBN:
- 9780199245284
- eISBN:
- 9780191602481
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/0199245282.003.0005
- Subject:
- Economics and Finance, Econometrics
This chapter analyses the time series properties of panel data sets, focusing on short panels. It discusses time effects and moving average covariances. It presents estimates of covariance structures and tests the permanent income hypothesis.
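As a rough sketch of what such a covariance structure looks like in a short panel, the Python example below simulates many units over five periods with an MA(1)-style error, removes period means, and estimates the T-by-T covariance matrix across units; the data-generating process is an assumption for illustration, not the chapter's model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 2000, 5
eps = rng.standard_normal((N, T + 1))
y = eps[:, 1:] + 0.5 * eps[:, :-1]       # MA(1) errors for each unit

y_dm = y - y.mean(axis=0)                # remove period-specific (time) effects
cov = y_dm.T @ y_dm / (N - 1)            # T x T covariance matrix estimated across units

print(np.round(cov, 2))                  # covariances beyond the first off-diagonal are near zero
```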
Andreas Savvides and Thanasis Stengos
- Published in print:
- 2008
- Published Online:
- June 2013
- ISBN:
- 9780804755405
- eISBN:
- 9780804769761
- Item type:
- book
- Publisher:
- Stanford University Press
- DOI:
- 10.11126/stanford/9780804755405.001.0001
- Subject:
- Economics and Finance, Econometrics
This book provides an in-depth investigation of the link between human capital and economic growth. The book examines the determinants of economic growth through a historical overview of the concept of human capital. The text fosters an understanding of the connection between human capital and economic growth through the exploration of different theoretical approaches, a review of the literature, and the application of nonlinear estimation techniques to a comprehensive data set. The book discusses nonparametric econometric techniques and their application to estimating nonlinearities—which has emerged as one of the most salient features of empirical work in modeling the human capital–growth relationship, and the process of economic growth in general.
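As a hedged sketch of the kind of nonparametric technique referred to above (not the book's own estimator), the Python example below fits a Gaussian-kernel Nadaraya-Watson regression to simulated data, letting the data trace out a nonlinear relationship without a parametric functional form; the variables and bandwidth are assumptions for illustration.

```python
import numpy as np

def kernel_regression(x_grid, x, y, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimator evaluated at the points in x_grid."""
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)                        # e.g., a human capital proxy
y = 0.3 * np.sqrt(x) + rng.normal(0, 0.05, 300)    # nonlinear "growth" relationship plus noise

grid = np.linspace(0.5, 9.5, 10)
print(np.round(kernel_regression(grid, x, y, bandwidth=0.5), 3))
```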
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
- Published in print:
- 2006
- Published Online:
- August 2013
- ISBN:
- 9780262033589
- eISBN:
- 9780262255899
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262033589.003.0021
- Subject:
- Computer Science, Machine Learning
This chapter assesses the strengths and weaknesses of different semi-supervised learning (SSL) algorithms by inviting the authors of each chapter in this book to apply their algorithms to eight benchmark data sets, which encompass both artificial and real-world problems. Details are provided on how the algorithms were applied, and especially on how hyperparameters were chosen given the few labeled points. The chapter concludes by presenting and discussing the empirical performance of the algorithms.
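The benchmark setting can be mimicked with a small hedged sketch: a semi-supervised learner trained with only a handful of labeled points, with unlabeled points marked -1. LabelSpreading on a toy data set stands in for the book's algorithms, which are not specified here; the data set, label count, and kernel settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=300, noise=0.1, random_state=5)
labels = np.full_like(y, -1)          # -1 marks unlabeled points
labels[:10] = y[:10]                  # keep only 10 labeled points

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, labels)
print("accuracy over all points:", (model.transduction_ == y).mean())
```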
Katharina Pistor
- Published in print:
- 2012
- Published Online:
- September 2012
- ISBN:
- 9780199658244
- eISBN:
- 9780199949915
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199658244.003.0007
- Subject:
- Law, Public International Law
This chapter studies the history of the first generation of indicators of governmental institutional quality. These (international) indicators include labels such as ‘bureaucratic efficiency’ and ‘rule of law.’ The discussion also addresses the argument that it is the reversal, not the creation, of indicators designed to justify large-scale development policies by leading multilateral agencies that is problematic. The chapter emphasizes the importance of using alternative data sets and making raw data easily available, in order to challenge present assumptions rather than merely validate them and the policy choices with which they are associated.
Ian Budge, Hans Keman, Michael McDonald, and Paul Pennings
- Published in print:
- 2012
- Published Online:
- September 2012
- ISBN:
- 9780199654932
- eISBN:
- 9780191741685
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199654932.003.0010
- Subject:
- Political Science, Comparative Politics
As citizen-electors and voters are different bodies of people, particularly in terms of the choice situation within which they express their preferences, we cannot simply assume that they are the same: they have to be compared explicitly. Democracy has to guarantee a necessary connection between public policy and citizen preferences, so we first have to see how far voter and electoral preferences correspond; it cannot simply be assumed that they do. Comparing Eurobarometer Left-Right self-placements by electors with the positions staked out by voters in elections, we find only limited election-by-election congruence and responsiveness but very substantial long-term correspondence, especially between their mean positions. This gives an assurance that elections and voters do represent electors fairly over the long term.
David Hoyle
- Published in print:
- 2015
- Published Online:
- October 2017
- ISBN:
- 9780691147611
- eISBN:
- 9781400866595
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691147611.003.0016
- Subject:
- Mathematics, Probability / Statistics
This chapter focuses on the occurrence of Benford's law within the natural sciences, emphasizing that Benford's law is to be expected within many scientific data sets. This is a consequence of the reasonable assumption that a particular scientific process is scale invariant, or nearly scale invariant. The chapter reviews previous work from many fields showing a number of data sets that conform to Benford's law; in each case the underlying scale invariance, or the mechanism that leads to scale invariance, is identified. Having established that Benford's law is to be expected for many data sets in the natural sciences, the second half of the chapter highlights generic potential applications of Benford's law. Finally, direct applications of Benford's law are highlighted, whereby the Benford distribution is used in a constructive way rather than simply to assess an already existing data set.
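The scale-invariance argument can be illustrated with a short Python sketch: a sample whose logarithms are spread uniformly over several orders of magnitude (an idealized scale-invariant data set) reproduces the Benford first-digit probabilities log10(1 + 1/d); the sample itself is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
sample = 10 ** rng.uniform(0, 6, 100_000)             # log-uniform over six decades

first_digits = (sample / 10 ** np.floor(np.log10(sample))).astype(int)
observed = np.array([(first_digits == d).mean() for d in range(1, 10)])
benford = np.log10(1 + 1 / np.arange(1, 10))

print(np.round(observed, 3))                          # empirical first-digit frequencies
print(np.round(benford, 3))                           # Benford probabilities for digits 1-9
```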
Steven J. Miller (ed.)
- Published in print:
- 2015
- Published Online:
- October 2017
- ISBN:
- 9780691147611
- eISBN:
- 9781400866595
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691147611.003.0008
- Subject:
- Mathematics, Probability / Statistics
This chapter reviews Benford's law as it relates to detecting fraud and errors. It starts with an introduction and a review of selected parts of Benford's original 1938 paper, which shows the results of his analysis of 20,229 records from a total of 20 sets of data. Thereafter, four compliant data sets are reviewed, pertaining to interest received amounts from tax return data, census results from the 2000 United States census, results of an analysis of streamflow data, and results of an analysis of accounts payable data. This is followed by a discussion of several Benford analyses of fraudulent data related to accounts payable amounts, payroll data, and reported corporate numbers. The concluding discussion considers these findings and sheds some light on when Benford's law might, or might not, detect fraud or errors.
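A common way to operationalize such screening, sketched here under stated assumptions (synthetic invoice-like amounts, not the chapter's data), is to compare the observed first-digit counts of a data set against Benford expectations with a chi-square test.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(7)
amounts = np.exp(rng.normal(5, 2, 5000))              # synthetic invoice-like amounts

first_digit = (amounts / 10 ** np.floor(np.log10(amounts))).astype(int)
observed = np.array([(first_digit == d).sum() for d in range(1, 10)])
expected = np.log10(1 + 1 / np.arange(1, 10)) * observed.sum()

stat, pvalue = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {pvalue:.3f}")   # a small p signals departure from Benford
```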
Steven J. Miller
- Published in print:
- 2015
- Published Online:
- October 2017
- ISBN:
- 9780691147611
- eISBN:
- 9781400866595
- Item type:
- chapter
- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691147611.003.0001
- Subject:
- Mathematics, Probability / Statistics
This chapter provides a brief overview of Benford's law. It states Benford's law of digit bias and describes its history. The chapter then discusses the origins of Benford's law and gives numerous examples of data sets that follow this law, as well as some that do not. From these examples this chapter extracts several explanations as to the prevalence of Benford's law. Finally, the chapter closes with a quick summary of many of the diverse situations in which Benford's law holds, and why an observation that began in looking at the wear and tear in tables of logarithms has become a major tool in subjects as diverse as detecting tax fraud and building efficient computers.
Emery R. Boose and Barbara S. Lerner
- Published in print:
- 2017
- Published Online:
- September 2017
- ISBN:
- 9780300209549
- eISBN:
- 9780300228038
- Item type:
- chapter
- Publisher:
- Yale University Press
- DOI:
- 10.12987/yale/9780300209549.003.0014
- Subject:
- History, History of Science, Technology, and Medicine
The metadata that describe how scientific data are created and analyzed are typically limited to a general description of data sources, software used, and statistical tests applied and are presented in narrative form in the methods section of a scientific paper or a data set description. Recognizing that such narratives are usually inadequate to support reproduction of the analysis of the original work, a growing number of journals now require that authors also publish their data. However, finer-scale metadata that describe exactly how individual items of data were created and transformed and the processes by which this was done are rarely provided, even though such metadata have great potential to improve data set reliability. This chapter focuses on the detailed process metadata, called “data provenance,” required to ensure reproducibility of analyses and reliable re-use of the data.
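A minimal illustration of fine-grained provenance in this spirit is sketched below: each derived value keeps a record of the operation that produced it and links back to its inputs. The record structure, field names, and example computation are assumptions for illustration, not a standard defined in the chapter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class TrackedValue:
    value: float
    operation: str                                   # how this value was produced
    inputs: List["TrackedValue"] = field(default_factory=list)
    created: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def tracked_mean(values: List[TrackedValue]) -> TrackedValue:
    """Derive a new value while retaining links to every input it came from."""
    result = sum(v.value for v in values) / len(values)
    return TrackedValue(result, operation="mean", inputs=values)

readings = [TrackedValue(v, operation="sensor_reading") for v in (2.0, 3.0, 4.0)]
daily = tracked_mean(readings)
print(daily.value, daily.operation, len(daily.inputs))   # 3.0 mean 3
```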
Hisam Kim
- Published in print:
- 2010
- Published Online:
- February 2013
- ISBN:
- 9780226386850
- eISBN:
- 9780226386881
- Item type:
- chapter
- Publisher:
- University of Chicago Press
- DOI:
- 10.7208/chicago/9780226386881.003.0008
- Subject:
- Economics and Finance, South and East Asia
This chapter explains variables regarding intergenerational transfers in three Korean data sets and compares them with those in the Health and Retirement Study (HRS), a panel data set of the elderly. It finds that two or three out of five Korean households in the study provided some type of financial support for their aged parents. In the face of rapid population aging and prevailing individualism, the social norm for supporting the elderly is shifting from transfers to self-responsibility. As such, individuals might have to consider longevity risk as well as keep a balance between saving for their old age, spending on their children, and investing in their own human capital. Before introducing new welfare programs, the existence and magnitude of latent demand for such services and the potential crowding-out effect of the programs on the private sector should be accounted for and measured in a reasonable way.