W. Harvey, D. Kalp, M. Tambe, D. McKeown, and A. Newell. 1990. “ The effectiveness of task-level parallelism for high-level vision.” In ACM/SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPOPP).Abstract
Large production systems (rule-based systems) continue to suffer from extremely slow execution which limits their utility in practical applications as well as in research settings. Most investigations in speeding up these systems have focused on match (or knowledge-search) parallelism. Although gocd speed-ups have been achieved in this pnxzss, these investigations have revealed the limitations on the total speed-up available from this source. This limited speed-up is insufficient to alleviate the problem of slow execution in large-scale production system implementations. Such large-scale systems are expected to increase as researchers develop increasingly more competent production systems. In this paper, we focus on task-level parallelism, which is obtained by a high-level decomposition of the production system. Speed-ups obtained from task-level parallelism will multiply with the speed-ups obtained from match parallelism. The vehicle for our investigation of task-level parallelism is SPAM. a high-level vision system, implemented as a production system. SPAM is a mature research system with a typical run requiring between 50.000 to 400,000 production firings and an execution time of tbe order of 10 to 100 cpu hours. We report very encouraging speed-ups from task-level parallelism in SPAM - our parallel implementation shows near linear speed-ups of over 12 fold using 14 processors and points the way to substantial (SO-100 fold) speed-ups from task-level parallelism. We present a characterization of task-level parallelism in production systems and describe our methodology for selecting and applying a particular approach to parallel&e SPAM. Additionally, we report the speed-ups obtained from the use of shared virtual memory (network shared memory) in this implementation. Overall, task-level parallelism has not received much attention in the literature. Our experience illustrates that it is potentially a very important tool for speeding up large-scale production systems’.
M. Tambe and P. Rosenbloom. 1990. “ A framework for investigating production system formulations with polynomially bounded match.” In National Conference on Artificial Intelligence (AAAI).Abstract
Real time constraints on AI systems require guaranteeing bounds on these systems’ performance. However, in the presence of sources of uncontrolled combinatorics, it is extremely difficult to guarantee such bounds on their performance. In production systems, the .prirnary source of uncontrolled combinatorics is the production match. To eliminate these combinatorics, the unique-attribute formulation was introduced in (Tambe and Rosenbloom, 1989). which achieved a linear bound on the production match. This formulation leads to several questions: is this unique-attributes formulation the best conceivable production system formulation? In fact, are there other alternative production system formulations? If there are other formulations, how should these alternatives be compared with the unique-attribute formulation? This paper attempts to address these questions in the context of Soar. It identifies independent dimensions along which alternative production system formulations can be specified. These dimensions are based on the fiied class of match algorithms currently employed in production systems. These dimensions create a framework for systematically generating alternative formulations. Using this framework we show that the unique-attribute formulation is the best one within the dimensions investigated. However, if a new class of match algorithms is admitted, by relaxing certain constraints, other competitor fonnulations emerge. The paper indicates which competitor formulations are promising and why. Although some of the concepts, such as unique-attributes, are introduced in the context of Soar, they should also be relevant to other rule-based systems.
1990. “ The problem of expensive chunks and its solution by restricting expressiveness.” Machine Learning Journal, 5, 3, Pp. 299-348.Abstract
Soar is an architecture for a system that is intended to be capable of general intelligence. Chunking, a simpleexperience-based learning mechanism, is Soar’s only learning mechanism. Chunking creates new items ofinformation, called chunks, based on the results of problem-solving and stores them in the knowledge base. Thesechunks are accessed and used in appropriate later situations to avoid the problem-solving required to determinethem. It is already well-established that chunking improves performance in Soar when viewed in terms of thesubproblems required and the number of steps within a subproblem. However, despite the reduction in number ofsteps, sometimes there may be a severe degradation in the total run time. This problem arises due toexpensivechunks, i.e., chunks that require a large amount of effort in accessing them from the knowledge base. They pose amajor problem for Soar, since in their presence, no guarantees can be given about Soar’s performance.In this article, we establish that expensive chunks exist and analyze their causes. We use this analysis to propose asolution for expensive chunks. The solution is based on the notion of restricting the expressiveness of therepresentational language to guarantee that the chunks formed will require only a limited amount of accessing effort.We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support ouranalysis.
M. Tambe and P. Rosenbloom. 1989. “ Eliminating expensive chunks by restricting expressiveness.” In International Joint Conference on Artificial Intelligence (IJCAI'89) .Abstract
Chunking, an experience based-learning mechanism, improves Soar's performance a great deal when viewed in terms of the number of subproblems required and the number of steps within a subproblem. This high-level view of the impact of chunking on performance is based on an deal computational model, which says that the time per step is constant. However, if the chunks created by chunking are expensive, then they consume a large amount of processing in the match, i.e, indexing the knowledge-base, distorting Soar*s constant time-per-stcp model. In these situations, the gain in number of steps does not reflect an improvement in performance; in fact there may be degradation in the total run time of the system. Such chunks form a major problem for the system, since absolutely 10 guarantees can be given about its behavior. I "his article presents a solution to the problem of expensive chunks. The solution is based on the notion of restricting the expressiveness of Soar's representational language to guarantee that chunks formed will require only a limited amount of matching effort. We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support our analysis.
Milind Tambe, Anurag Acharya, and Anoop Gupta. 1989. Implementation of production systems on message passing computers : techniques, simulation results and analysis. Carnegie Mellon University Computer Science Dept Technical Report.Abstract
The authors examine the suitability of message-passing computers for parallel implementations of production systems. Two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines, are presented. Simulation results for the medium-grained mapping are presented, and it is shown that it is possible to exploit the available parallelism and to obtain reasonable speedups. The authors perform a detailed analysis of the results and suggest solutions for some of the problems
A. Gupta, C. L. Forgy, D. Kalp, A. Newell, and M. Tambe. 1988. “ Parallel ops5 on the encore multimax.” In International Conference on Parallel Processing (ICPP).Abstract
Until now, most results reported for parallelism in production systems (rule-based systems) have been simulationresults -- very few real parallel implementations exist. In this paper, we present results from our parallelimplementation of OPS5 on the Encore multiprocessor. The implementation exploits very fine-grained parallelismto achieve significant speed-ups. For one of the applications, we achieve 12.4 fold speed-up using 13 processes.Our implementation is also distinct from other parallel implementations in that we parallelize a highly optimizedC-based implementation of OPS5. Running on a uniprocessor, our C-based implementation is 10-20 times fasterthan the standard lisp implementation distributed by Carnegie Mellon University. In addition to presenting theperformance numbers, the paper discusses the amount of contention observed for shared data structures, and thetechniques used to reduce such contention.
M. Tambe, D. Kalp, A. Gupta, C. L. Forgy, B. G. Milnes, and A. Newell. 1988. “ Soar/Psm-e: Investigating match parallelism in a learning production system.” In ACM/SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS).Abstract
Soar is an attempt to realize a set of hypotheses on the nature of general intelligence within a single system. Soar uses a Production system (rule based system) to encode its knowledge base. Its learning mechanism, chunking. adds productions continuously to the production system. The process of searching for relevant knowledge, matching, is known to be a performance bottleneck in production systems. PSM-E is a C-based implementation of the OPS5 production system on the Encore Multimax that has achieved significant speedups in matching. In this paper we describe our im- plementation, Soar/PSM-E, of Soar on the Encore Multimax that is built on top of PSM-E. We fiit describe the exten- sions and modifications required to PSM-E in order to support Soar, especially the capability of adding productions at run time as required by chunking. We present tbe speedups obtained on Soar/PSM-E and discuss some effects of chunk- ing on parallelism. We also analyze the performance of the system and identify the bottlenecks limiting parallelism. Finally, we discuss the work in progress to deal with some of them.
A. Gupta and M. Tambe. 1988. “ Suitability of message passing computers for implementing production systems.” In National Conference on Artificial Intelligence (AAAI).Abstract
Two important parallel architecture types are the shared-memory architectures and the message-passing architectures. In the past researchers working on the parallel implementations of production systems have focussed either on shared-memory multiprocessors or on special purpose architectures. Message-passing computers have not been studied. The main reasons have been the large message- passing latency (as large as a few milliseconds) and high message reception overheads (several hundred microseconds) exhibited by the first generation message-passing computers. These overheads are too large for the parallel implementation of production systems, where it is necessary to exploit parallelism at a very fine granularity to obtain significant speed-up (subtasks execute about 100 machine instructions). However, recent advances in interconnection network technology and processing node design have cut the network latency and message reception overhead by 2-3 orders of magnitude, making these computers much more interesting. In this paper we present techniques for mapping production systems onto message-passing computers. We show that using a concurrent distributed hash table data structure, it is possible to exploit parallelism at a very fine granularity and to obtain significant speed-ups from paralIelism.
Milind Tambe and Allen Newell. 1988. Why some chunks are expensive. Carnegie Mellon University Computer Science.Abstract
Soar is an attempt to realize a set of hypothesis on the nature of general intelligence within a single system. One central hypothesis is that chunking, Soar's simple experience-based learning mechanism, can form the basis for a general learning mechanism. It is already well established that the addition of chunks improves the performance in Soar a great deal, when viewed in terms of subproblems required and number of steps within a subproblem. But this high level view does not take into account potential offsetting costs that arise from various computational effects. This paper is an investigation into the computational effect of expensive chunks. These chunks add significantly to the time per step by being individually expensive. We decompose the causes of expensive chunks into three components and identify the features of the task environment that give rise to them. We then discuss the implications of the existence of expensive chunks for a complete implementation of Soar.