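Deep convolutional neural networks (DCNNs) do not perceive objects the way humans do, and that could be dangerous in real-world AI applications, says Professor James Elder, co-author of a York University study published today.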
Published in the Cell Press journal iScience, “Deep learning models fail to capture the configural nature of human shape perception” is a collaborative study by Elder, who holds the York Research Chair in Human and Computer Vision and is co-director of York’s Centre for AI & Society, and Nicholas Baker, Associate Professor of Psychology at Loyola College in Chicago and a former fellow in the VISTA Postdoctoral Program at York University.
The study used novel visual stimuli called “Frankensteins” to explore how the human brain and DCNNs process an object’s holistic, configural properties.
“Frankensteins are just things that have been taken apart and put back together the wrong way,” says Elder. “As a result, they have all the right local features, but in the wrong places.”
The investigators found that while the human visual system is confused by Frankensteins, DCNNs are not, revealing an insensitivity to configural object properties.
“Our results explain why deep AI models fail under certain conditions and point to the need to consider tasks beyond object recognition in order to understand visual processing in the brain,” Elder says. “These deep models tend to take ‘shortcuts’ when solving complex recognition tasks. While these shortcuts may work in many cases, they can be dangerous in some of the real-world AI applications that we are currently working on with our industry and government partners,” Elder points out.
One such application is traffic video safety systems: “Objects in a busy traffic scene (vehicles, bikes, pedestrians) obstruct each other and reach the driver’s eye as a jumble of disconnected fragments,” Elder explains. “The brain needs to put those parts together correctly to determine the correct categories and locations of things. An AI traffic safety monitoring system that is only able to perceive the parts individually would fail in this task, which could lead to a misjudgment of the risks to vulnerable road users.”
According to the researchers, modifications to training and architecture aimed at making the networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict human object judgments on a trial-by-trial basis. “We anticipate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition,” Elder notes.