A Comparative Evaluation of Level Generators in the Mario AI Framework
Britton Horn, Steve Dahlskog, Noor Shaker, Gillian Smith, Julian Togelius. A Comparative Evaluation of Level Generators in the Mario AI Framework. Proceedings of the 2014 Foundations of Digital Games Conference (FDG 2014), Fort Lauderdale, FL, April 3-7, 2014.
Evaluation is an open problem in procedural content generation research. The field is now in a state where there is a glut of content generators, each serving different purposes and using a variety of techniques. It is difficult to understand, quantitatively or qualitatively, what makes one generator different from another in terms of its output. To remedy this, we have conducted a large-scale comparative evaluation of level generators for the Mario AI Benchmark, a research-friendly clone of the classic platform game Super Mario Bros. In all, we compare the output of seven different level generators from the literature, based on different algorithmic methods, plus the levels from the original Super Mario Bros game. To compare them, we have defined six expressivity metrics, of which two are novel contributions in this paper. These metrics are shown to provide interestingly different characterizations of the level generators. The results presented in this paper, and the accompanying source code, are meant to serve as a benchmark against which to test new level generators and expressivity metrics.
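To make the idea of an expressivity metric concrete, the sketch below computes a simple linearity-style measure over a level's height profile: fit a least-squares line and report the mean absolute deviation from it. This is an illustrative example in the spirit of such metrics, not the paper's exact definitions; the function name and details are assumptions.

```python
def linearity(heights):
    """Mean absolute deviation of a level's height profile from its
    best-fit line (lower = more linear level topology).
    Illustrative metric; not the paper's exact formulation."""
    n = len(heights)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(heights) / n
    # least-squares slope and intercept over (x, height) pairs
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, heights)) / sxx
    intercept = mean_y - slope * mean_x
    return sum(abs(y - (slope * x + intercept))
               for x, y in zip(xs, heights)) / n

# A perfectly sloped profile deviates by zero; a peaked one does not.
print(linearity([0, 1, 2, 3, 4]))  # 0.0
print(linearity([0, 2, 0]) > 0)    # True
```

A metric like this can be computed over many generated levels and plotted as a distribution, which is how expressivity metrics are typically used to characterize and compare generators.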