We human beings can imagine sounds from a mere glance at a photo: a beach scene may bring the sound of crashing waves to mind, and a picture of a busy crossing may evoke car horns and street advertising. "Imaginary Soundwalk" is a sound installation that focuses on this unconscious behavior.

A "soundwalk" is a walk taken with a focus on listening to the environment. In this installation, viewers move freely through Google Street View as virtual soundwalkers, immersed in a soundscape "imagined" by an artificial intelligence (AI) system.

This work builds on recent developments in cross-modal information retrieval, such as image-to-audio and text-to-image retrieval, using deep learning. The system was trained on video inputs with two models: a well-established, pre-trained image recognition model processes the frames, while a second convolutional neural network reads the audio as spectrogram images and is trained so that the distribution of its output gets as close as possible to that of the first.
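
To make the training idea concrete, here is a minimal sketch of a loss that pulls the audio network's output distribution toward the image model's. The text above only says the two distributions should get "as close as possible", so the choice of KL divergence is an assumption, as are the function names and the toy logits; the real system works on network outputs, not hand-written lists.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far distribution q is from the reference p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(image_logits, audio_logits):
    """Assumed loss: the image model is the fixed reference,
    the audio model is the one being adjusted."""
    reference = softmax(image_logits)
    candidate = softmax(audio_logits)
    return kl_divergence(reference, candidate)


# Hypothetical outputs for one video clip over three scene categories.
image_logits = [2.0, 0.5, -1.0]       # from the pre-trained image model
far_audio_logits = [-1.0, 0.5, 2.0]   # audio model early in training
near_audio_logits = [1.9, 0.6, -0.9]  # audio model after some training

loss_far = distillation_loss(image_logits, far_audio_logits)
loss_near = distillation_loss(image_logits, near_audio_logits)
print(loss_far > loss_near)
```

The loss shrinks as the audio network's output approaches the image model's, which is the behavior the training procedure aims for.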

Once trained, the two networks allow us to retrieve the best-matching sound file for a given scene from our massive environmental sound dataset.
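
The retrieval step can be sketched as a nearest-neighbor lookup: the image model describes the scene as a vector, each sound file has a vector precomputed by the audio model, and the closest sound wins. The cosine similarity measure, the file names, and the three-dimensional vectors below are all illustrative assumptions, not details of the actual system.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors, ignoring their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matching_sound(scene_vector, sound_index):
    """Return the name of the sound whose vector is closest to the scene."""
    return max(
        sound_index,
        key=lambda name: cosine_similarity(scene_vector, sound_index[name]),
    )


# Hypothetical index: sound file name -> vector from the audio network.
sound_index = {
    "waves.wav": [0.8, 0.1, 0.1],
    "traffic.wav": [0.1, 0.8, 0.1],
    "birdsong.wav": [0.1, 0.1, 0.8],
}

# Hypothetical vector from the image model for a beach photo.
beach_scene = [0.7, 0.2, 0.1]

match = best_matching_sound(beach_scene, sound_index)
print(match)
```

A linear scan like this is fine for a toy index; a dataset of the size described would more plausibly use an approximate nearest-neighbor structure.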

The installation consists of a display, a speaker, and a tablet. As an imaginary walker strolls through Google Maps, the display shows the surrounding landscape and the speaker plays the selected sounds; a slight delay before each sound plays nudges viewers to imagine it for themselves. Using the tablet, viewers can also steer the walker themselves.

The soundscapes generated by the AI sometimes amaze us by matching our expectations, but occasionally they ignore cultural and geographical context (the sound of waves on an icy field in Greenland, for instance). These discrepancies and mistakes lead us to contemplate how imagination works and how rich the sound environment surrounding us is.

By externalizing our synesthetic thinking through an installation, we tried to shed light on the power of imagination we all share.


Concept/Machine Learning

Nao Tokui

UI Design/Programming

Shoya Dozono

Machine Learning

Yuma Kajihara

Network Programming

Robin Jungers

Speaker provided by


Various sound files are used under Creative Commons Licenses. see