Understanding deep learning through neuron deletion

Deep neural networks are composed of many individual neurons, which combine in complex and counterintuitive ways to solve a wide range of challenging tasks. This complexity grants neural networks their power but also earns them their reputation as confusing and opaque black boxes. 

Understanding how deep neural networks function is critical for explaining their decisions and enabling us to build more powerful systems. For instance, imagine the difficulty of trying to build a clock without understanding how individual gears fit together. One approach to understanding neural networks, both in neuroscience and deep learning, is to investigate the role of individual neurons, especially those which are easily interpretable.

Our investigation into the importance of single directions for generalisation, soon to appear at the Sixth International Conference on Learning Representations (ICLR), uses an approach inspired by decades of experimental neuroscience — exploring the impact of damage — to determine: how important are small groups of neurons in deep neural networks? Are more easily interpretable neurons also more important to the network’s computation?