Lyn C. Thomas
- Published in print:
- 2009
- Published Online:
- May 2009
- ISBN:
- 9780199232130
- eISBN:
- 9780191715914
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199232130.003.0002
- Subject:
- Mathematics, Applied Mathematics, Mathematical Finance
This chapter describes the different ways of measuring how good a scoring system is. It clarifies that there are three different ways of measuring the systems: their ability to discriminate Goods from Bads, their prediction of the probability of a borrower defaulting, and the accuracy of their categorical calibration. Discrimination, which only requires knowledge of the scorecard itself, is measured using ROC curves, Cumulative Accuracy Profiles (CAP), the Gini coefficient, AUROC, divergence, the Mahalanobis distance, and the Somers' D concordance statistic. Probability predictions, which need the population odds as well as the scorecard, are measured using the binomial and normal tests and the Hosmer-Lemeshow test. Categorical calibration, which needs the cut-off score as well as the scorecard, is measured using the confusion matrix, swap sets, specificity and sensitivity, and Type I and Type II errors. The chapter also explains how, if one has built a suite of scorecards each on a different segment of the population, one can combine the measures of the different scorecards into an overall measure.
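A minimal sketch of two of the discrimination measures the chapter names (AUROC and the Gini coefficient) and of sensitivity/specificity at a cut-off. The scores and the cut-off below are made-up toy values, not data from the chapter; AUROC is computed via its pairwise-concordance interpretation, which is the form underlying the Somers' D statistic.

```python
def auroc(goods, bads):
    """AUROC as the probability that a randomly chosen Good outscores
    a randomly chosen Bad (ties count one half)."""
    pairs = concordant = ties = 0
    for g in goods:
        for b in bads:
            pairs += 1
            if g > b:
                concordant += 1
            elif g == b:
                ties += 1
    return (concordant + 0.5 * ties) / pairs

def confusion_at_cutoff(goods, bads, cutoff):
    """Sensitivity = fraction of Goods accepted at the cut-off;
    specificity = fraction of Bads rejected."""
    tp = sum(1 for g in goods if g >= cutoff)  # Goods correctly accepted
    tn = sum(1 for b in bads if b < cutoff)    # Bads correctly rejected
    return tp / len(goods), tn / len(bads)

goods = [620, 680, 700, 710, 750]  # hypothetical scores of Good borrowers
bads = [540, 600, 630, 660]        # hypothetical scores of Bad borrowers

auc = auroc(goods, bads)
gini = 2 * auc - 1  # Gini coefficient derived from AUROC
sens, spec = confusion_at_cutoff(goods, bads, 650)
print(auc, gini, sens, spec)
```

Note that, as the chapter stresses, the first two numbers need only the scores themselves, while sensitivity and specificity additionally require choosing the cut-off.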
Thomas D. Wickens
- Published in print:
- 2001
- Published Online:
- April 2010
- ISBN:
- 9780195092509
- eISBN:
- 9780199893812
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195092509.003.0007
- Subject:
- Psychology, Cognitive Psychology
In discrimination, signals can be of two or more types, and the observer is presented with one of them and tries to say which it was. This chapter extends the signal-detection model to discrimination and relates it to detection.
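In the standard equal-variance Gaussian version of the signal-detection model, discriminability between two signal types is summarized by d', the difference of the z-transformed hit and false-alarm rates. The sketch below uses hypothetical rates purely for illustration and is not an example from the chapter.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(H) - z(F) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical observer: 84% hits, 16% false alarms -> d' close to 2,
# i.e., the two signal distributions are about two standard deviations apart.
print(d_prime(0.84, 0.16))
```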
Kai R. Larsen and Daniel S. Becker
- Published in print:
- 2021
- Published Online:
- July 2021
- ISBN:
- 9780190941659
- eISBN:
- 9780197601495
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780190941659.003.0004
- Subject:
- Business and Management, Information Technology, Innovation
After preparing your dataset, the business problem should be quite familiar, along with the subject matter and the content of the dataset. This section is about modeling data, using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall machine learning pipeline. Traditionally, we store real-world data in one or more databases or files. This data is extracted, and features and a target (T) are created and submitted to the “Model Data” stage (the topic of this section). Following the completion of this stage, the model produced is examined (Section V) and placed into production. With the model in the production system, current data generated from the real-world environment is fed into the system. In the example case of a diabetes patient, we enter a new patient’s information from the electronic health record into the system, and a database lookup retrieves additional data for feature creation.
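The production-time flow described above can be sketched in a few lines: new intake data arrives, a database lookup retrieves the patient's stored history, the two are combined into features, and a trained model scores them. Everything here (the record, the lookup table, the linear "model" and its weights) is a hypothetical stand-in for illustration, not material from the book.

```python
PATIENT_DB = {  # hypothetical stored data retrieved by database lookup
    "p001": {"prior_admissions": 2, "hba1c": 8.1},
}

def build_features(intake_record, db):
    """Combine newly entered intake data with looked-up history."""
    history = db[intake_record["patient_id"]]
    return {
        "age": intake_record["age"],
        "prior_admissions": history["prior_admissions"],
        "hba1c": history["hba1c"],
    }

def score(features, weights, intercept):
    """A stand-in for the trained model: a simple linear score."""
    return intercept + sum(weights[k] * v for k, v in features.items())

new_patient = {"patient_id": "p001", "age": 63}  # entered at intake
feats = build_features(new_patient, PATIENT_DB)
risk = score(feats, {"age": 0.01, "prior_admissions": 0.3, "hba1c": 0.05}, -1.0)
print(feats, risk)
```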
Kai R. Larsen and Daniel S. Becker
- Published in print:
- 2021
- Published Online:
- July 2021
- ISBN:
- 9780190941659
- eISBN:
- 9780197601495
- Item type:
- book
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780190941659.001.0001
- Subject:
- Business and Management, Information Technology, Innovation
In Automated Machine Learning for Business, we teach the machine learning process using a new development in data science: automated machine learning. AutoML, when implemented properly, makes machine learning accessible to most people because it removes the need for years of experience in the most arcane aspects of data science, such as the math, statistics, and computer science skills required to become a top contender in traditional machine learning. Anyone trained in the use of AutoML can use it to test their ideas and support the quality of those ideas during presentations to management and stakeholder groups. Because the requisite investment is one semester-long undergraduate course rather than a year in a graduate program, these tools will likely become a core component of undergraduate programs, and over time, even the high school curriculum.
Kai R. Larsen and Daniel S. Becker
- Published in print:
- 2021
- Published Online:
- July 2021
- ISBN:
- 9780190941659
- eISBN:
- 9780197601495
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780190941659.003.0005
- Subject:
- Business and Management, Information Technology, Innovation
Although we have evaluated all the measures and selected the best model for this case, and much of the machine learning process has been clarified, our understanding of the problem context is still relatively immature. That is, while we have carefully specified the problem, we still do not fully understand what drives the target. Convincing management to support the implementation of the model typically includes explaining the answers to the “why,” “what,” “where,” and “when” questions embedded in the model. While the model may be the best possible model according to the selected measures, for the particular problem of hospital readmissions it is still not clear why the model predicts that some patients will be readmitted and that others will not. It also remains unknown what features drive these outcomes, where the readmitted patients come from, or whether this is even relevant. In this case, time information (the “when”) is unavailable, so it cannot be examined, but it is easy to imagine that patients admitted in the middle of the night might have worse outcomes due to tired staff or lack of access to the best physicians. If we can convince management that the current analysis is useful, we can likely also make a case for collecting additional data. The new data might include more information on past interactions with each patient, as well as date and time information to test the hypothesis about the effect of time-of-admission and whether the specific staff caring for a patient matters.
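One common way to probe the "what features drive these outcomes" question is permutation importance: shuffle a single feature's values and measure how much the model's accuracy drops. The model, records, and labels below are toy stand-ins invented for illustration, not the book's readmission data.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature across records."""
    base = accuracy(model, rows, labels)
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return base - accuracy(model, permuted, labels)

def model(r):
    # Toy rule: predict readmission if two or more prior admissions.
    return int(r["prior_admissions"] >= 2)

rows = [{"prior_admissions": p, "age": a}
        for p, a in [(0, 50), (3, 61), (1, 70), (4, 55), (2, 48), (0, 66)]]
labels = [0, 1, 0, 1, 1, 0]

print(permutation_importance(model, rows, labels, "prior_admissions"))
print(permutation_importance(model, rows, labels, "age"))  # unused feature: 0.0
```

A feature the model ignores (here, age) shows zero importance, while shuffling a feature the model relies on degrades accuracy; this is one way to start answering management's "why" questions.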