Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: book
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.001.0001
- Subject: Business and Management, Information Technology, Innovation
In Automated Machine Learning for Business, we teach the machine learning process using a new development in data science: automated machine learning. AutoML, when implemented properly, makes machine learning accessible to most people because it removes the need for years of experience in the most arcane aspects of data science, such as the math, statistics, and computer science skills required to become a top contender in traditional machine learning. Anyone trained in the use of AutoML can use it to test their ideas and support the quality of those ideas during presentations to management and stakeholder groups. Because the requisite investment is one semester-long undergraduate course rather than a year in a graduate program, these tools will likely become a core component of undergraduate programs, and over time, even the high school curriculum.
Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.003.0001
- Subject: Business and Management, Information Technology, Innovation
Machine learning is used in search, translation, detecting depression, predicting the likelihood of college dropout, finding lost children, and selling all kinds of products. While barely beyond its inception, the current machine learning revolution will affect people and organizations no less than the Industrial Revolution affected weavers and many other skilled laborers. Machine learning will automate hundreds of millions of jobs that, even a decade ago, were considered too complex for machines ever to take over, including driving, flying, painting, programming, and customer service, as well as many of the jobs previously reserved for humans in the fields of finance, marketing, operations, accounting, and human resources. This section explains how automated machine learning addresses exploratory data analysis, feature engineering, algorithm selection, hyperparameter tuning, and model diagnostics. The section covers the eight criteria considered essential for AutoML to have significant impact: accuracy, productivity, ease of use, understanding and learning, resource availability, process transparency, generalization, and recommended actions.
Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.003.0002
- Subject: Business and Management, Information Technology, Innovation
This section covers the first steps of the Machine Learning Life Cycle Model: how to specify a business problem, acquire subject matter expertise, define the prediction target, define the unit of analysis, identify success criteria, evaluate risks, and finally, decide whether to continue a project. The focus is on who will use the model, whether management is supportive, whether the drivers of the model can be visualized, and how much value a model can produce.
Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.003.0004
- Subject: Business and Management, Information Technology, Innovation
After preparing your dataset, the business problem should be quite familiar, along with the subject matter and the content of the dataset. This section is about modeling data: using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall machine learning pipeline. Traditionally, we store real-world data in one or more databases or files. This data is extracted, and features and a target (T) are created and submitted to the “Model Data” stage (the topic of this section). Following the completion of this stage, the model produced is examined (Section V) and placed into production. With the model in the production system, present data generated from the real-world environment is fed into the system. In the example case of a diabetes patient, we enter a new patient’s electronic health record information into the system, and a database lookup retrieves additional data for feature creation.
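The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the book's method: the records, the `prior_visits` feature, the `readmitted` target, the lookup table, and the cutoff-based "model" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
# Hypothetical historical records, standing in for data extracted from a database.
HISTORY = [
    {"patient_id": 1, "prior_visits": 0, "readmitted": False},
    {"patient_id": 2, "prior_visits": 4, "readmitted": True},
    {"patient_id": 3, "prior_visits": 1, "readmitted": False},
    {"patient_id": 4, "prior_visits": 5, "readmitted": True},
]

# Hypothetical lookup table used in production to enrich a new patient's record.
LOOKUP = {101: {"prior_visits": 3}, 102: {"prior_visits": 0}}


def make_features(record):
    """Feature creation: turn a raw record into model inputs."""
    return {"prior_visits": record["prior_visits"]}


def train(rows):
    """'Model Data' stage: learn a cutoff halfway between the two class means."""
    pos = [make_features(r)["prior_visits"] for r in rows if r["readmitted"]]
    neg = [make_features(r)["prior_visits"] for r in rows if not r["readmitted"]]
    cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return {"cutoff": cutoff}


def predict(model, features):
    """Predict the target (T) for one set of features."""
    return features["prior_visits"] > model["cutoff"]


def score_new_patient(model, patient_id):
    """Production: enrich the new record via a lookup, build features, predict."""
    record = {"patient_id": patient_id, **LOOKUP[patient_id]}
    return predict(model, make_features(record))


model = train(HISTORY)
print(score_new_patient(model, 101), score_new_patient(model, 102))
```

The same `make_features` function is used for training and for production scoring, so new records pass through exactly the feature creation the model was trained on.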
Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.003.0005
- Subject: Business and Management, Information Technology, Innovation
Although we have evaluated all the measures and selected the best model for this case, and much of the machine learning process has been clarified, our understanding of the problem context is still relatively immature. That is, while we have carefully specified the problem, we still do not fully understand what drives the target. Convincing management to support the implementation of the model typically includes explaining the answers to “why,” “what,” “where,” and “when” questions embedded in the model. While the model may be the best overall model according to the selected measures, for the particular problem of hospital readmissions it is still not clear why the model predicts that some patients will be readmitted and that others will not. It also remains unknown what features drive these outcomes, where the patients who were readmitted come from, or whether this is relevant. In this case, time information is unavailable, so the “when” question is not relevant, but it is easy to imagine that patients admitted in the middle of the night might have worse outcomes due to tired staff or lack of access to the best physicians. If we can convince management that the current analysis is useful, we can likely also make a case for the collection of additional data. The new data might include more information on past interactions with each patient, as well as date and time information to test the hypothesis about the effect of time of admission and whether the specific staff caring for a patient matters.
Kai R. Larsen and Daniel S. Becker
- Published in print: 2021
- Published Online: July 2021
- ISBN: 9780190941659
- eISBN: 9780197601495
- Item type: chapter
- Publisher: Oxford University Press
- DOI: 10.1093/oso/9780190941659.003.0006
- Subject: Business and Management, Information Technology, Innovation
This section covers the final stage of the machine learning life cycle. Consider these the most important steps of the entire process: this is the point at which we have the greatest potential to help our organization reap the benefits of machine learning. In traditional information systems development, 60–80% of the cost of a system is incurred during the maintenance phase, so these steps are important. This section covers how to deploy a machine learning model and how to document and maintain it. A chapter covers the seven types of target leakage, followed by time-aware validation and time-series analysis.