Finke, R. A., & Slayton, K. (1988). Explorations of creative visual synthesis in mental imagery. Memory And Cognition, 16(3), 252–257. doi:10.3758/BF03197758

Summary

In this paper, Finke and Slayton asked participants to use mental imagery to imagine objects composed out of a given set of basic geometric parts. They found that in almost 40% of trials, participants were able to come up with a recognizible arrangement of the parts, demonstrating that mental imagery isn’t just used as a way of encoding visual information, but that it is an active process of simulation that people can use to discover new things.

Methods

On each trial, participants saw three geometric forms (out of: circle, square, rectangle, triangle, horizontal line, vertical line, D, C, L, T, J, 8, X, V, P) and were told to combine them into a recognizable form. They could scale and rotate the parts, but not skew or otherwise distort them. Participants had 2 minutes to come up with a response, at the end of which they were to write down the name of the form they came up with (if they were able to come up with one) and then to draw out the form. Once they had drawn the form they couldn’t change their answer (to ensure that they were actually using mental imagery).

Algorithm

n/a

Takeaways

I think these result are really cool, and illustrate how powerful mental imagery really can be: it’s not just some epiphenomenon, but a true generative process that can be used to run a certain type of simulation.

I also think this paper is interesting because it presents a uniquely difficult challenge for AI systems or computational models. How do you model the synthesis of something entirely new? I would argue that it’s not truly a new synthesis, because we have strong prior knowledge about object categories and prototypes. One potential way to approach this would be to do something like Google’s inceptionism. First, train a neural network to recognize various categories of images. Second, generate an input image from the parts used in this experiment. Then, “enhance” that image to maximize the network’s objective function, except that rather than modifying the image on the pixel level, modify it by changing the position, rotation, and scale of the parts. I wonder if doing this would actually produce something recognizable—I’m not sure if having such schematized line drawings would work, but it would be a cool thing to try.