*Diane J. Cook and Lawrence B. Holder*

- Published in print:
- 1999
- Published Online:
- November 2020
- ISBN:
- 9780195119404
- eISBN:
- 9780197561256
- Item type:
- chapter

- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780195119404.003.0016
- Subject:
- Computer Science, Systems Analysis and Design

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers ...
More

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.
Less

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.

*Lance Fortnow*

- Published in print:
- 2017
- Published Online:
- May 2018
- ISBN:
- 9780691175782
- eISBN:
- 9781400846610
- Item type:
- chapter

- Publisher:
- Princeton University Press
- DOI:
- 10.23943/princeton/9780691175782.003.0004
- Subject:
- Computer Science, Programming Languages

This chapter looks at some of the hardest problems in NP. Most of the NP problems that people considered in the mid-1970s either turned out to be NP-complete or people found efficient algorithms ...
More

This chapter looks at some of the hardest problems in NP. Most of the NP problems that people considered in the mid-1970s either turned out to be NP-complete or people found efficient algorithms putting them in P. However, some NP problems refused to be so nicely and quickly characterized. Some would be settled years later, and others are still not known. These NP problems include the graph isomorphism, one of the few problems whose difficulty seems somewhat harder than P but not as hard as NP-complete problems like Hamiltonian paths and max-cut. Other NP problems include prime numbers and factoring, and linear programming. The linear programming problem has good algorithms in theory and practice—they just happen to be two very different algorithms.Less

This chapter looks at some of the hardest problems in NP. Most of the NP problems that people considered in the mid-1970s either turned out to be NP-complete or people found efficient algorithms putting them in P. However, some NP problems refused to be so nicely and quickly characterized. Some would be settled years later, and others are still not known. These NP problems include the graph isomorphism, one of the few problems whose difficulty seems somewhat harder than P but not as hard as NP-complete problems like Hamiltonian paths and max-cut. Other NP problems include prime numbers and factoring, and linear programming. The linear programming problem has good algorithms in theory and practice—they just happen to be two very different algorithms.