Eliminating combinatorics from the match in production systems (or rule-based systems) is important for expert systems, real-time performance, machine learning (particularly with respect to the utility issue), parallel implementations and cognitive modeling. In , the unique-attribute representation was introduced to eliminate combinatorics from the match. However, in so doing, unique-attributes engender a sufficiently negative set of trade-offs, so that investigating whether there are alternative representations that yield better trade-offs becomes of critical importance.
This article identifies two promising spaces of such alternatives, and explores a number of the alternatives within these spaces. The first space is generated from local syntactic restrictions on working memory. Within this space, unique-attributes is shown to be the best alternative possible. The second space comes from restrictions on the search performed during the match of individual productions (match-search). In particular, this space is derived from the combination of a new, more relaxed, match formulation (instantiationless match) and a set of restrictions derived from the constraint-satisfaction literature. Within this space, new alternatives are found that outperform unique-attributes in some, but not yet all, domains.
Match algorithms capable of handling large amounts of dat% without giving up expressiveness are a key requirement for successful integration of relational database systems and powerful rule-based systems. Algorithms that have been used for database rule systems have usually been unable to support large and complex rule sets, while the algorithms that have been used for rule-based expert systems do not scale welt with data. Furthermore% these algorithms do not ~ovide support for collection (or set) oriented production languages. This paper ~oposes a basic shift in the nature of match algorithms: from tuple-oriented to collectwn-oriented. A collection-oriented match algorithm matches each condition in a production with a collection of tuples and generatea collection-oriented instarrtiutwns, i.e., instantiation that have collection of tuples corresponding to each condition. This approach shows great promise for efllciently matching expressive productions against large amounts of data. In addition, it provides direct support for collection-oriented production languages. We have found that many existing tuple-oriented match algorithms can be easily rmnsfonned to their collection-oriented analogues. This paper presents the transformation of Rete to Collection Rete as an example and compares the two based on a set of benchmarks. Results presented in this paper show tha~ for large amounts of data, a relatively underoptitnized implementation of Collection Rete achieves orders of magnitude improvement in time and space over an optimized version of Rete. The results establish the feasibility of collection-oriented match for integrated database-production systems.
Knowledge-refinement is a central problem in the field of expert systems[Buchanan and Shortliffe,1984]. It refers to the progressive refinement of the initial knowledge-base of an expert systeminto a high-performance knowledge-base. For rule-based systems, refinement implies the addition,deletion and modification of rules in the system so as to improve the system’sempirical adequacy,i.e., its ability to reach correct conclusions in the problems it is intended to solve[Ginsberget al.,1988].The goal of our research effort is to understand the methodology for refining large rule-basedsystems, as well as to develop tools that will be useful in refining such systems. The vehicle forour investigation isspam, a production system (rule-based system) for the interpretation of aerialimagery[McKeownet al., 1985, McKeownet al., 1989]. It is a mature research system havingover 600 productions, many of which interact with a variety of complex (non-rule-based) geometricalgorithms. A typical scene analysis task requires between50,000 to 400,000 production firings andan execution time of the order of 2 to 4 cpu hours1.Large, compute-intensive systems likespamimpose some unique constraints on knowledge refine-ment. First, the problem of credit/blame-assignment is complicated; it is extremely difficult toisolate a single culprit production (or a set of culprit productions) to blame for an error observed inthe output. As a result, the methodology adopted in well-known systems such asseekandseek2[Politakis and Weiss, 1984, Ginsberget al., 1988], orkrust[Craw and Sleeman, 1991], cannotbe directly employed to refine knowledge inspam. These systems identify errorful output andbackward-chain through the executed rules to localize the source of the error. A second problemis thatspam’s long run-times make it difficult to rely on extensive experimentation for knowledgerefinement. Iterative refinement and generate and test methods typically require many experimentsto improve the knowledge base. In particular,spam’s long run-times prohibit a thorough search ofthe space of possible refinements.Givenspam’s constraints, we have employed a bottom-up approach for knowledge refinement.Specifically, we begin by refining small portions ofspam, and then attempt to understand the inter-actions of these refinements and their impact on intermediate results. Fortunately,spamis alreadydivided into four phases, facilitating the identification of the ”modular” pieces on which we focusour refinement efforts, as well as our evaluation efforts (discussed below). We begin by identifyinggaps and/or faults within small portions ofspam’s individual phases by comparing their output tothat of an expert. The knowledge is modified to more accurately match the expert’s output. Wethen evaluate the new output to see how well the refined knowledge is performing. This methodchains forward from the refined knowledge to an evaluation ofit’s performance. This is in contrastto the backward-chaining systems cited above, which identify errorful output and reason backwardto find the faulty knowledge (i.e., assign credit/blame). Backward chaining is difficult when theinteractions between rules are complex. In particular, inspam, this implies backward-chaininghundreds of thousands of rule firings as well as applicationsof complex algorithmic constraints,which would be extremely difficult.However, forward-chaining does not obviate the credit/blame assignment problem. It introducesthe problem in a new form — that of evaluating the impact of theknowledge refinements. Inparticular, in forward-chaining systems, given complex rule interactions, refinement in one partof the system need not improve the end result. This may occur because the refined portion maycontribute to overall improvement in a relatively small number of cases; or a second culprit compo-nent involved in the chaining may actually minimize the impact of the refinements. Hence, there is a need for evaluation and analysis ofintermediateresults. Of course, these intermediate resultsshould more than exactly evaluate the parts of the system refined as a result of interaction withthe expert — they should progressively yield a better understanding of how to change the rules tobetter the system’s overall performance.spam’s decomposition into phases helps us to a certainextent: the endpoints of the phases serve as potential intermediate evaluation points.In our work so far, we have focused on the second phase inspamlocal-consistency (lcc). Thisphase was chosen because most ofspam’s time is spent in this phase, and because it shows the mostpotential for future growth.lccperforms a modified constraint satisfaction between hypothesesgenerated inspam’s first phase. It applies constraints to a set of plausible hypotheses and prunesthe hypotheses that are inconsistent with those constraints. For example, objects hypothesized ashangar-buildings and objects hypothesized as parking-aprons would participate in the constrainthangar-buildings and parking-aprons are close together. This rule is realized as a distance con-straint on pairs of objects hypothesized to be hangar-buildings and parking-aprons. A successfulapplication of anlccconstraint provides support for each pair of hypotheses, and an unsuccessfulapplication decreases support for that pair of hypotheses.In essence, each constraint in thelccphase classifies pairs of hypotheses — either the constraintsupports that pair, or it does not. (lccandspamare described in further detail in Section 2.)In working toward refining these constraints, we posed several questions:1. What role does this constraint play in the interpretation process? 2. If the constraint does play a role, is it positive (helpful) or negative (unhelpful)?3. If the role is positive, and the constraint is numeric, can we optimize the constraint values?4. How effective (powerful) is this constant? 5. What is this constraint’s impact on run time?To address these questions, we began by asking a user to manually compile a database of correct intermediate outputs — the ground-truth database. The database and the system outputs were compared. An automatic procedure was created to adjust the spam knowledge base so that the correspondence between the ground-truth database and the knowledge base improved. This adjusted knowledge was then re-run through the system and the entire set of intermediate outputs(refined or otherwise) were evaluated. The somewhat surprising result of this analysis was that the expert’s input did not help as much as expected. This result raised questions about why the expert’s input was not as helpful as anticipated, and how the results should be evaluated.In order to better understand both the interactions between constraints and the effects of modifying the knowledge base, we did a brute-force evaluation of the results, providing some answers to the questions we posed above. Namely, the distance constraints do play positive roles in spam’s interpretation process, though they are not very powerful.More importantly, we now see that re-fined knowledge may apply selectively. That is, the set of objects affected by individual constraints largely overlaps, implying that spam could reduce computation by selectively applying constraints and still achieve the similar performance. Tools eliciting the expert’s input must carefully take these interactions into consideration. Finally, we can also conclude that intermediate result evaluation is not straight-forward. For example, the complex structures that spam generates using the results of constraint application, called functional-areas, must be matched and these matches evaluated.Having provided some of the motivation for this work, we begin by describing the spam image interpretation system system. Following this background material, we present our refinement methodology and an analysis of our results. We then re-evaluate the methodology and discuss further experimental results. Finally, we will outline some areas of future work.
Machine learning approaches to knowledge compilation seek to improve the perfonnance of problem-solvers by storing solutions to previously solved problems in an efficient, generalized fonn. The problem-solver retrieves these learned solutions in appropriate later situations to obtain results more efficiently. However, by relying on its learned knowledge to provide a solution, the problem-solver may miss an alternative solution of higher quality - one that could have been generated using the original (non-learned) problem-solving knowledge. This phenomenon is referred to as the ITUlSking effect of learning.
In this paper, we examine a sequence of possible solutions for the masking effect. Each solution refines and builds on the previous one. The fmal solution is based on cascaded filters. When learned knowledge is retrieved, these filters alert the system about the inappropriateness of this knowledge so that the system can then derive a better alternative solution. We analyze conditions under which this solution will perfonn better than the others, and present experimental data supportivt: of the analysis. This investigation is based on a simulated robot domain called Groundworld.
Combinatorial match in production systems (rule-based system) is problematical in several areas of production system application: real-time performance, learning new productions for performance improvement. modeling human cognition. and parallelization. The unique-attribute representation in production systems is a promising approach to eliminate match combinatorics. Earlier investigations have focused on the ability of unique-attributes to alleviate the problems caused b~ combinatorial match 1331. This paper reports on an additional benefit of unique-attributes: a specialized match algorithm called Uni-Rete. Uni-Rete is a specialization of the widely used Rete match algorirhm for unique-attributes. The paper presents performance results for Uni-Rete. which indicate over 10-fold speedup with respect to Rete. It also discusses the implications of Uni-Rete for non-unique-attribute systems.
Robots pursuing complex goals must plan paths according to several criteria of quality, including shortness, safety, speed and planning time. Many sources and kinds of knowledge, such as maps, procedures and perception, may be available or required. Both the quality criteria and sources of knowledge may vary widely over time, and in general they will interact. One approach to address this problem is to express all criteria and goals numerically in a single weighted graph, and then to search this graph to determine a path. Since this is problematic with symbolic or uncertain data and interacting criteria, we propose that what is needed instead is an integration of many kinds of planning capabilities. We describe a hybrid approach to integration, based on experiments with building simulated mobile robots using Soar, an integrated problem-solving and learning system. For flexibility, we have implemented a combination of internal planning, reactive capabilities and specialized tools. We illustrate how these components can complement each other's limitations and produce plans which integrate geometric and task knowledge.
This paper describes an initial exploration into large learning systems, i.e., systems that learn a large number of rules. Given the well-known utility problem in learning systems, efficiency questions are a major concern. But the questions are much broader than just efficiency, e.g., will the effectiveness of the learned rules change with scale? This investigation uses a single problem-solving and learning system, Dispatcher-Soar, to begin to get answers to these questions. Dispatcher-Soar has currently learned 10,112 new productions, on top of an initial system of 1,819 productions, so its total size is 11,931 productions. This represents one of the largest production systems in existence, and by far the largest number of rules ever learned by an AI system. This paper presents a variety of data from our experiments with Dispatcher-Soar and raises important questions for large learning systems1
In the past, researchers working on parallel implementations of production systems have focused on shared- memory multiprocessors and special-purpose architectures. Message-passing computers have not been given as much attention. The main reasons for this have been the large message- passing latency (as large as a few milliseconds) and high message-handling overheads (several hundred microseconds) associated with the first generation message-passing computers. These overheads were too large for parallel implementations of production systems, which require a fine-grain decomposition to obtain a significant speedup. Recent advances in interconnection network technology and processing element design, however, promise to reduce the network latency and message-handling overhead by 2-3 orders of magnitude, making these computers much more interesting for implementation of production systems. In this paper, we examine the suitability of message-passing computers for parallel implementations of production systems. We present two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines. We also present simulation results for the medium- grained mapping and show that it is possible to exploit the available parallelism and to obtain reasonable speedups. Finally, we perform a detailed analysis of the results and suggest solutions for some of the problems. Index Terms- Coarse-grain mapping, concurrent distributed hash table, fine-grain mapping, medium-grain mapping, message- passing computers, OPS5, parallel production systems, Rete net- work, simulation results.
Large production systems (rule-based systems) continue to suffer from extremely slow execution which limits their utility in practical applications as well as in research settings. Most investigations in speeding up these systems have focused on match parallelism. These investigations have revealed that the total speed-up available from this source is insufficient to alleviate the problem of slow execution in large-scale production system implementations. In this paper, we focus on task-level parallelism, which is obtained by a high-level decomposition of the production system. Speed-ups obtained from task-level parallelism will multiply with the speed-ups obtained from match parallelism. The vehicle for our investigation of task-level parallelism is SPAM, a high-level vision system, implemented as a production system. SPAM is a mature research system with a typical run requiring between 50,000 and 400,000 production firings. We report very encouraging speed-ups from task-level parallelism in SPAM… -our parallel implementation shows near linear speed-ups of over 12-fold using 14 processors and points the way to substantial (50- to 100-fold) speed-ups. We present a characterization of task-level parallelism in production systems and describe our methodology for selecting and applying a particular approach to parallelize SPAM. Additionally, we report the speed-ups obtained from the use of virtual shared memory. Overall, task-level parallelism has not received much attention in the literature. Our experience illustrates that it is potentially a very important tool for speeding up large-scale production systems.
The combinatorial match in production systems (rule-based systems) is problematical in several areas of production system application: real-time performance, learning new productions for performance improvement, modeling human cognition, and parallelization. The unique-attribute representation is a promising approach to eliminate match combinatorics. Earlier investigations have focused on the ability of unique-attributes to alleviate the problems caused by combinatorial match [Tambe, Newell and Rosenbloom 90]. This paper reports on an additional benefit of unique-attributes: a specialized match algorithm called Uni-Rete. Uni-Rete is a specialization of the widely used Rete match algorithm for unique-attributes, and it has shown over 10-fold speedup over Rete in performing match.
Large production systems (rule-based systems) continue to suffer from extremely slow execution which limits their utility in practical applications as well as in research settings. Most investigations in speeding up these systems have focused on match (or knowledge-search) parallelism. Although gocd speed-ups have been achieved in this pnxzss, these investigations have revealed the limitations on the total speed-up available from this source. This limited speed-up is insufficient to alleviate the problem of slow execution in large-scale production system implementations. Such large-scale systems are expected to increase as researchers develop increasingly more competent production systems. In this paper, we focus on task-level parallelism, which is obtained by a high-level decomposition of the production system. Speed-ups obtained from task-level parallelism will multiply with the speed-ups obtained from match parallelism. The vehicle for our investigation of task-level parallelism is SPAM. a high-level vision system, implemented as a production system. SPAM is a mature research system with a typical run requiring between 50.000 to 400,000 production firings and an execution time of tbe order of 10 to 100 cpu hours. We report very encouraging speed-ups from task-level parallelism in SPAM - our parallel implementation shows near linear speed-ups of over 12 fold using 14 processors and points the way to substantial (SO-100 fold) speed-ups from task-level parallelism. We present a characterization of task-level parallelism in production systems and describe our methodology for selecting and applying a particular approach to parallel&e SPAM. Additionally, we report the speed-ups obtained from the use of shared virtual memory (network shared memory) in this implementation. Overall, task-level parallelism has not received much attention in the literature. Our experience illustrates that it is potentially a very important tool for speeding up large-scale production systems’.
Real time constraints on AI systems require guaranteeing bounds on these systems’ performance. However, in the presence of sources of uncontrolled combinatorics, it is extremely difficult to guarantee such bounds on their performance. In production systems, the .prirnary source of uncontrolled combinatorics is the production match. To eliminate these combinatorics, the unique-attribute formulation was introduced in (Tambe and Rosenbloom, 1989). which achieved a linear bound on the production match. This formulation leads to several questions: is this unique-attributes formulation the best conceivable production system formulation? In fact, are there other alternative production system formulations? If there are other formulations, how should these alternatives be compared with the unique-attribute formulation? This paper attempts to address these questions in the context of Soar. It identifies independent dimensions along which alternative production system formulations can be specified. These dimensions are based on the fiied class of match algorithms currently employed in production systems. These dimensions create a framework for systematically generating alternative formulations. Using this framework we show that the unique-attribute formulation is the best one within the dimensions investigated. However, if a new class of match algorithms is admitted, by relaxing certain constraints, other competitor fonnulations emerge. The paper indicates which competitor formulations are promising and why. Although some of the concepts, such as unique-attributes, are introduced in the context of Soar, they should also be relevant to other rule-based systems.
Soar is an architecture for a system that is intended to be capable of general intelligence. Chunking, a simpleexperience-based learning mechanism, is Soar’s only learning mechanism. Chunking creates new items ofinformation, called chunks, based on the results of problem-solving and stores them in the knowledge base. Thesechunks are accessed and used in appropriate later situations to avoid the problem-solving required to determinethem. It is already well-established that chunking improves performance in Soar when viewed in terms of thesubproblems required and the number of steps within a subproblem. However, despite the reduction in number ofsteps, sometimes there may be a severe degradation in the total run time. This problem arises due toexpensivechunks, i.e., chunks that require a large amount of effort in accessing them from the knowledge base. They pose amajor problem for Soar, since in their presence, no guarantees can be given about Soar’s performance.In this article, we establish that expensive chunks exist and analyze their causes. We use this analysis to propose asolution for expensive chunks. The solution is based on the notion of restricting the expressiveness of therepresentational language to guarantee that the chunks formed will require only a limited amount of accessing effort.We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support ouranalysis.
Chunking, an experience based-learning mechanism, improves Soar's performance a great deal when viewed in terms of the number of subproblems required and the number of steps within a subproblem. This high-level view of the impact of chunking on performance is based on an deal computational model, which says that the time per step is constant. However, if the chunks created by chunking are expensive, then they consume a large amount of processing in the match, i.e, indexing the knowledge-base, distorting Soar*s constant time-per-stcp model. In these situations, the gain in number of steps does not reflect an improvement in performance; in fact there may be degradation in the total run time of the system. Such chunks form a major problem for the system, since absolutely 10 guarantees can be given about its behavior. I "his article presents a solution to the problem of expensive chunks. The solution is based on the notion of restricting the expressiveness of Soar's representational language to guarantee that chunks formed will require only a limited amount of matching effort. We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support our analysis.
The authors examine the suitability of message-passing computers for parallel implementations of production systems. Two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines, are presented. Simulation results for the medium-grained mapping are presented, and it is shown that it is possible to exploit the available parallelism and to obtain reasonable speedups. The authors perform a detailed analysis of the results and suggest solutions for some of the problems
Until now, most results reported for parallelism in production systems (rule-based systems) have been simulationresults -- very few real parallel implementations exist. In this paper, we present results from our parallelimplementation of OPS5 on the Encore multiprocessor. The implementation exploits very fine-grained parallelismto achieve significant speed-ups. For one of the applications, we achieve 12.4 fold speed-up using 13 processes.Our implementation is also distinct from other parallel implementations in that we parallelize a highly optimizedC-based implementation of OPS5. Running on a uniprocessor, our C-based implementation is 10-20 times fasterthan the standard lisp implementation distributed by Carnegie Mellon University. In addition to presenting theperformance numbers, the paper discusses the amount of contention observed for shared data structures, and thetechniques used to reduce such contention.
Soar is an attempt to realize a set of hypotheses on the nature of general intelligence within a single system. Soar uses a Production system (rule based system) to encode its knowledge base. Its learning mechanism, chunking. adds productions continuously to the production system. The process of searching for relevant knowledge, matching, is known to be a performance bottleneck in production systems. PSM-E is a C-based implementation of the OPS5 production system on the Encore Multimax that has achieved significant speedups in matching. In this paper we describe our im- plementation, Soar/PSM-E, of Soar on the Encore Multimax that is built on top of PSM-E. We fiit describe the exten- sions and modifications required to PSM-E in order to support Soar, especially the capability of adding productions at run time as required by chunking. We present tbe speedups obtained on Soar/PSM-E and discuss some effects of chunk- ing on parallelism. We also analyze the performance of the system and identify the bottlenecks limiting parallelism. Finally, we discuss the work in progress to deal with some of them.
Two important parallel architecture types are the shared-memory architectures and the message-passing architectures. In the past researchers working on the parallel implementations of production systems have focussed either on shared-memory multiprocessors or on special purpose architectures. Message-passing computers have not been studied. The main reasons have been the large message- passing latency (as large as a few milliseconds) and high message reception overheads (several hundred microseconds) exhibited by the first generation message-passing computers. These overheads are too large for the parallel implementation of production systems, where it is necessary to exploit parallelism at a very fine granularity to obtain significant speed-up (subtasks execute about 100 machine instructions). However, recent advances in interconnection network technology and processing node design have cut the network latency and message reception overhead by 2-3 orders of magnitude, making these computers much more interesting. In this paper we present techniques for mapping production systems onto message-passing computers. We show that using a concurrent distributed hash table data structure, it is possible to exploit parallelism at a very fine granularity and to obtain significant speed-ups from paralIelism.
Soar is an attempt to realize a set of hypothesis on the nature of general intelligence within a single system. One central hypothesis is that chunking, Soar's simple experience-based learning mechanism, can form the basis for a general learning mechanism. It is already well established that the addition of chunks improves the performance in Soar a great deal, when viewed in terms of subproblems required and number of steps within a subproblem. But this high level view does not take into account potential offsetting costs that arise from various computational effects. This paper is an investigation into the computational effect of expensive chunks. These chunks add significantly to the time per step by being individually expensive. We decompose the causes of expensive chunks into three components and identify the features of the task environment that give rise to them. We then discuss the implications of the existence of expensive chunks for a complete implementation of Soar.