A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries’ actions. However, evaluating such systems poses many challenges, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant.
The primary contribution of this paper is to provide a preliminary framework that can guide the evaluation of such systems, and to apply this framework to the evaluation of ARMOR, a system deployed at LAX since August 2007. The framework helps determine which evaluations could, and should, be run to measure a system’s overall utility. A secondary contribution is to familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains.