Scientists who used image-generating AI to interpret brain scans will present at 2023 CVPR
A study that used a deep-learning AI model to decode brain activity and generate images of what subjects saw on screen has been accepted for presentation at the Conference on Computer Vision and Pattern Recognition (CVPR).
Two scientists in Japan were behind the study, published in a paper late last year.
Yu Takagi and Shinji Nishimoto from the Graduate School of Frontier Biosciences at Osaka University used Stable Diffusion, a model developed in Germany in 2022, to analyze brain scans of eight test subjects shown up to 10,000 images while in an MRI machine.
The team used a simple model to translate brain-scan activity into a format that Stable Diffusion could read, enabling it to generate high-quality images closely resembling what the subjects saw. The AI was not trained to produce specific results and was able to generate accurate images without ever being shown the originals.
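The core idea — fitting a simple linear map from brain activity to a latent representation that a generative model then decodes into an image — can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code or data; the array sizes and the least-squares fit are assumptions chosen for clarity.

```python
import numpy as np

# Synthetic stand-ins: 1,000 scan trials, 500 fMRI voxels, a 64-dim latent
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 1000, 500, 64
true_W = rng.normal(size=(n_voxels, latent_dim))

X = rng.normal(size=(n_trials, n_voxels))              # voxel activity per trial
Z = X @ true_W + 0.01 * rng.normal(size=(n_trials, latent_dim))  # latent targets

# Fit the "simple model": a linear map from voxels to latents via least squares
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Predict the latent vector for a new, unseen scan; in the study's pipeline,
# a generative model such as Stable Diffusion would decode this into an image.
x_new = rng.normal(size=(1, n_voxels))
z_pred = x_new @ W
print(z_pred.shape)  # (1, 64)
```

The point of the sketch is that the brain-decoding stage itself can be very simple; the heavy lifting of turning the decoded latent into a realistic image is delegated to the pretrained generative model.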
That their work was accepted for presentation at CVPR in Vancouver, Canada, in June is significant: the conference, one of the highest-impact events on the global computing calendar, is a standard avenue for validating significant research breakthroughs.
Both scientists are members of the Continuous Innovation Network (CINet), which is "a global network of researchers and practitioners" that shares findings and expertise, to support "learning and diffused innovation within the organization and across organizational borders."
Takagi and Nishimoto's groundbreaking research has generated significant interest in the tech community thanks to the rapid strides in AI.
Their December paper, which outlines their discoveries, ranks among the top 1 percent for engagement out of more than 23 million research outputs recorded by data firm Altmetric.
According to Ricardo Silva, a computational neuroscience professor at University College London and a research partner at the Alan Turing Institute, it's difficult to predict the outcome of scientific application at such an early stage.
"It's hard to predict what a successful clinical application might be at this stage, as it is still very exploratory research," Silva told Al Jazeera. However, he also shared his optimism on how the technology can benefit the medical community.
"This may turn out to be one extra way of developing a marker for Alzheimer's detection and progression evaluation by assessing in which ways one could spot persistent anomalies in images of visual navigation tasks reconstructed from a patient's brain activity."
Stoking global interest
Takagi and Nishimoto's research has piqued the interest of academics worldwide. Among them is Leonardo Tanzi, a Ph.D. student at the Polytechnic University of Turin, Italy, who finds the problem of decoding brain representations fascinating but challenging.
"These attempts have demonstrated great but constrained performance in terms of pixel-wise and semantic fidelity, in part due to the small amount of neuroscience data and in part due to the multiple difficulties associated with building complicated generative models," said Tanzi.
To tackle this challenge, academics have used deep-learning techniques such as generative adversarial networks (GANs) and self-supervised learning, but with limited success. Diffusion models, specifically latent diffusion models, have emerged as an alternative to GANs.
However, they are all relatively new, which is why Takagi and Nishimoto presented their findings cautiously. They acknowledged two significant hurdles to achieving actual mind reading: limitations in brain scanning technology and AI itself.
Despite advances in neural interfaces such as electroencephalography (EEG), which records the brain's electrical activity, and fMRI, which measures blood-oxygen changes in the brain, scientists believe that accurately and consistently decoding imagined visual experiences may still be decades away.
According to a 2021 paper by Korea Advanced Institute of Science and Technology researchers, neural interfaces have a "lack of persistent recording stability" due to the complexity of neural tissue, which can respond unpredictably to artificial interfaces.
Current recording methods rely on electrical channels that are "susceptible to electrical noises from surroundings," making it "not yet an easy feat" to achieve good signals with high sensitivity from the target region.
Takagi and Nishimoto's research involved a costly and time-consuming process: subjects needed to spend up to 40 hours in an fMRI scanner. Their framework can in principle be applied to brain-scanning devices other than fMRI, such as EEG, or even to invasive technologies like the brain-computer implants developed by Neuralink. However, because brain shapes vary between individuals, the method is only transferable to some subjects.
Despite this limitation, Takagi envisions potential technological applications in scientific research, communication, and entertainment.
"I'm optimistic for AI, but I'm not optimistic for brain technology," said Takagi, believing that it "is the consensus among neuroscientists."
Yet Takagi and his partner plan to continue their research, focusing on improving the technology and applying it to other modalities. They are already developing an improved image-reconstruction technique for version two of their project.
"And it's happening at a very rapid pace," he said.
Ethical concerns
The breakthrough in AI image generation using brain scans has opened up new possibilities in neuroscience, potentially helping researchers better understand how the brain processes visual information.
Takagi believes this technology can help bridge the communication gap between the brain and computers, and may one day be used to create an interface between the two. However, he emphasizes that, contrary to some misunderstandings, the AI's ability to produce images does not constitute mind reading.
"We can't decode imaginations or dreams; we think this is too optimistic," said Takagi. "But, of course, there is potential in the future."
The potential for misuse and privacy issues are among the ethical implications of AI-generated images of people's thoughts. According to Silva, the most pressing issue is the extent to which data collectors should disclose complete details regarding the uses of collected data.
Consenting to a snapshot of one's thoughts being taken for future clinical purposes is one thing, Silva said; using the data for other ends is quite another.
"It's yet another completely different thing to have it used in secondary tasks such as marketing, or worse, used in legal cases against someone's own interests."
Takagi himself acknowledges these concerns, saying they are not without merit, as the technology could be misused by those with ill intent or without consent.
"For us, privacy issues are the most important thing. If a government or institution can read people's minds, it's a very sensitive issue," he said. "There needs to be high-level discussions to make sure this can't happen."