How Graphs Taught Transformers to Think Outside the Node

I remember, back in my days at Neo4j, when I first read the article Transformers are Graph Neural Networks by Chaitanya K. Joshi. It sparked my curiosity about the relationship between Transformers and Graph Neural Networks (GNNs). At the time, many of my colleagues dismissed that curiosity, but one of the most insightful elaborations on the topic came some years later from Petar Veličković, referring back to Chaitanya's paper during a podcast recording; unfortunately, the session never aired, as I left Neo4j before its release.

In essence, Transformers can be seen as a type of Graph Neural Network. They treat sentences as fully-connected graphs, where every word is linked to every other word. The attention mechanism in Transformers functions similarly to the neighborhood aggregation process in GNNs. This perspective offers a fresh lens for understanding Transformers and highlights exciting opportunities for exploration and refinement. For example, it provokes questions about optimal input formats for natural language processing (NLP), managing long-term dependencies between words, and whether Transformers are learning a form of neural syntax. This understanding could even inspire simplifications to Transformer architecture by removing unnecessary complexity.
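To make the analogy concrete, here is a minimal sketch (my own NumPy illustration, not taken from the article) of single-head self-attention written as message passing over a fully connected word graph: the attention matrix acts as a learned, weighted adjacency, and the weighted sum is the neighborhood aggregation.

```python
# Self-attention viewed as message passing over a fully connected graph.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n_words, d) node features; every word attends to every other word."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise "edge weights"
    A = softmax(scores, axis=-1)              # attention = learned adjacency
    return A @ V                              # neighborhood aggregation

rng = np.random.default_rng(0)
n, d = 5, 8                                   # 5 "words", 8-dim features
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)
```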

During my lectures, I’ve likened this dynamic to the rivalry between Björn Borg and John McEnroe—both incredible players who transformed tennis into something much greater than the sum of their individual talents. As a child born in the seventies, I vividly remember watching their epic Wimbledon matches on a small, grainy television set. Those games were electric—Borg’s calm, almost robotic precision contrasting with McEnroe’s fiery, unpredictable brilliance. It was a rivalry that taught me the power of combining different strengths to create something truly transformative. Similarly, the interplay between Transformers and GNNs has the potential to revolutionize how we approach graph-based reasoning.

Now, five years later, the field has been on a rollercoaster of innovation, continually stretching our mental models. This topic remains close to my heart, and on this quiet Sunday morning, I decided to write down some thoughts that have been bubbling up over time.

Architectural Strengths

The intersection of graph-based reasoning and large language models (LLMs) has revealed distinct architectural strengths of transformers, graph neural networks (GNNs), and hybrid models. Each architecture demonstrates unique advantages depending on the nature of the graph reasoning task.

Transformers: Pioneers of Global Reasoning

Transformers excel in tasks requiring global reasoning, such as graph connectivity, shortest path calculations, and other problems necessitating long-range dependencies. The core strength of transformers lies in their ability to perform parallel computations across the entire input sequence through mechanisms like self-attention. This capability allows them to capture complex relationships between nodes that are far apart in the graph, which is critical for tasks involving holistic graph understanding.

Key Attributes of Transformers in Graph Reasoning

  • Parallelism and Scalability: Transformers leverage self-attention to compute interactions between all pairs of nodes simultaneously. This parallelism ensures efficient processing of large graphs, particularly for tasks requiring aggregation of global information.
  • Flexibility in Encoding: By adapting tokenized graph inputs, transformers can effectively tackle graph problems despite not being inherently designed for graph-structured data. Techniques like attention masking or hierarchical encoding further enhance their adaptability; a minimal masking sketch follows this list.
  • Theoretical Backing: Transformers' equivalence to massively parallel computation (MPC) models underscores their ability to solve parallelizable tasks efficiently. Graph connectivity, for example, can provably be solved by logarithmic-depth transformers, emphasizing their computational advantages for large-scale reasoning.
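As a hedged illustration of the attention-masking idea mentioned above (my own sketch, not taken from any of the cited papers), the same attention computation can be restricted to a graph's actual edges by masking non-edges before the softmax:

```python
# Restricting self-attention to a graph's edges via an additive mask.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edge_masked_attention(X, Wq, Wk, Wv, adj):
    """adj: (n, n) boolean adjacency. Non-edges are set to -inf before the
    softmax, so each node only aggregates from its graph neighbours."""
    adj = adj | np.eye(len(adj), dtype=bool)        # self-loops avoid empty rows
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(adj, scores, -np.inf)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
adj = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=bool)  # path graph
print(edge_masked_attention(X, Wq, Wk, Wv, adj).shape)                  # (4, 8)
```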

GNNs: Masters of Local Inductive Biases

GNNs, in contrast, shine in tasks with a strong local component, such as node degree calculation, edge existence determination, and subgraph matching. By leveraging their message-passing mechanisms, GNNs are particularly well-suited to learning relationships between neighboring nodes, which allows them to achieve remarkable sample efficiency for such tasks.

Key Advantages of GNNs

  • Inductive Bias Favoring Local Structure: The architecture of GNNs inherently respects graph topology, focusing on neighborhood relationships and ensuring that local dependencies are captured effectively.
  • Sample Efficiency: Due to their inductive bias, GNNs can achieve high performance with smaller datasets compared to transformers, making them ideal for resource-constrained training scenarios.
  • Efficient Communication: GNNs' fixed communication strategy, where nodes aggregate information from their immediate neighbors, provides computational efficiency for tasks requiring local analysis.
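For contrast, here is a minimal sketch of one message-passing layer with mean aggregation over immediate neighbours; it illustrates the local inductive bias described above rather than any specific GNN variant.

```python
# One GNN layer: each node updates from itself and the mean of its neighbours.
import numpy as np

def gnn_layer(X, adj, W_self, W_neigh):
    """X: (n, d) node features, adj: (n, n) 0/1 adjacency without self-loops."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)    # avoid divide-by-zero
    neigh_mean = (adj @ X) / deg                        # aggregate neighbours
    return np.maximum(X @ W_self + neigh_mean @ W_neigh, 0.0)  # ReLU update

rng = np.random.default_rng(1)
n, d = 6, 4
X = rng.normal(size=(n, d))
adj = (rng.random((n, n)) < 0.3).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T                # symmetric, no self-loops
H = gnn_layer(X, adj, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(H.shape)  # (6, 4)
```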

Hybrid Models: Bridging Global and Local Reasoning

Hybrid architectures, such as the Graph Sequence Model++ (GSM++), combine the best of both worlds. By integrating transformers for global encoding with GNNs or recurrent models for local encoding, these models demonstrate superior performance across a wide range of graph reasoning tasks.

Key Innovations of Hybrid Models

  • Hierarchical Tokenization: Strategies like hierarchical affinity clustering (HAC) enable efficient partitioning of graphs into sequences that preserve both local and global information.
  • Layered Architectures: Combining GNNs for initial local feature extraction and transformers for global context aggregation ensures that hybrid models can handle diverse task requirements effectively.
  • Mitigation of Model Limitations: Hybrid models address the over-smoothing and over-squashing issues in GNNs while simultaneously overcoming the inefficiency of transformers in capturing fine-grained local details.
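A rough sketch of the layered pattern described above: one local message-passing step followed by one global all-pairs attention step. This only illustrates the local-then-global composition and is not the actual GSM++ architecture.

```python
# Hybrid sketch: local neighbourhood aggregation, then global self-attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_then_global(X, adj, Wn, Wq, Wk, Wv):
    """Local: aggregate neighbour features; global: full self-attention."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    H = X + ((adj @ X) / deg) @ Wn                  # local neighbourhood update
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))     # global all-pairs attention
    return A @ V

rng = np.random.default_rng(3)
n, d = 5, 4
X = rng.normal(size=(n, d))
adj = np.ones((n, n)) - np.eye(n)                   # toy adjacency
Ws = [rng.normal(size=(d, d)) for _ in range(4)]
print(local_then_global(X, adj, *Ws).shape)         # (5, 4)
```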

Encoding Innovations

Encoding graph-structured data for use in LLMs is a pivotal challenge. Recent advancements have introduced innovative methods that enable more effective graph reasoning, and the emergence of methodologies like GraphToken and hybrid encodings is revolutionizing this domain. These methods are not only enhancing computational efficiency but also broadening the scope of applications that benefit from graph reasoning models.

GraphToken: Soft Prompting for Structured Data

GraphToken represents a groundbreaking method for embedding graph-structured information into LLMs. By converting graph features into soft prompts within the LLM’s token space, this method allows for parameter-efficient fine-tuning of frozen LLMs while maintaining state-of-the-art reasoning capabilities. Unlike traditional approaches, GraphToken optimally balances parameter efficiency and reasoning accuracy, making it ideal for large-scale applications.

Key Attributes of GraphToken

  • Parameter Efficiency: Only the small set of soft-prompt parameters is trained; the underlying LLM remains frozen, keeping fine-tuning inexpensive.
  • Generalization: The learned graph tokens carry structural information that supports a broad range of graph reasoning tasks rather than a single narrow objective.
  • Seamless Integration with LLMs: Graph features are injected directly into the LLM's token space as soft prompts, requiring no changes to the model's architecture.
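A hedged sketch of the soft-prompting idea: a small, trainable graph encoder pools the graph into a handful of vectors in the LLM's embedding space, and those vectors are prepended to the frozen token embeddings. All names and shapes below are illustrative assumptions, not the paper's implementation.

```python
# Sketch: turn a graph into a few "soft prompt" vectors for a frozen LLM.
import numpy as np

def graph_to_soft_prompt(adj, X, W_enc, n_prompt_tokens=4):
    """Pool node features into n_prompt_tokens vectors of LLM embedding size."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    H = np.tanh(((adj @ X) / deg) @ W_enc)           # one message-passing step
    pooled = H.mean(axis=0)                          # graph-level readout
    return np.tile(pooled, (n_prompt_tokens, 1))     # (n_prompt_tokens, d_llm)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))                          # 3 nodes, 4-dim features
W_enc = rng.normal(size=(4, 16))                     # 16 = assumed LLM embedding size
soft_prompt = graph_to_soft_prompt(adj, X, W_enc)
print(soft_prompt.shape)                             # (4, 16): four graph tokens
# These vectors would be concatenated in front of the frozen token embeddings
# before running the LLM; only W_enc (the graph encoder) is trained.
```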

Figure from the paper Let Your Graph Do the Talking: Encoding Structured Data for LLMs.

Text-Based Graph Encoding: The "Talk Like a Graph" Paradigm

The "Talk Like a Graph" approach encodes graphs as textual descriptions, leveraging LLMs’ inherent strengths in processing natural language. By transforming graph structures into textual prompts, this method aligns with LLMs’ operational design, offering a straightforward yet powerful way to integrate graph data.

Advantages of Text-Based Encoding

  • Ease of Use: Graphs are serialized into plain-language prompts, so no changes to the LLM or its tokenizer are required.
  • Versatility Across Tasks: The same textual encoding can serve many reasoning tasks, from edge existence to connectivity and shortest paths, simply by changing the question.
  • Benchmark Contributions: The work behind this paradigm also introduced benchmark tasks for systematically evaluating how well LLMs reason over graphs.

Figure: overview of the framework for reasoning with graphs using LLMs, from the paper Talk Like a Graph: Encoding Graphs for Large Language Models.

Hybrid Encoding: Integrating Graph Embeddings and Text

Hybrid encoding strategies combine textual descriptions with graph embeddings to enrich the context provided to LLMs. These methods draw on the interpretability of text and the structural depth of embeddings generated by GNNs or similar models.

Key Innovations in Hybrid Encoding

  • Textual-Augmented Embeddings: Graph embeddings produced by GNNs are paired with textual descriptions, giving the LLM both structural and semantic context.
  • Dynamic Edge Representations: Edge information can be carried as embeddings rather than fixed text, allowing richer representations of relationships.
  • Cross-Modality Integration: Textual and embedding inputs are fused into a single prompt, letting the model reason jointly over both.
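A hedged sketch of the hybrid idea: serialize the edges as text for interpretability and compute simple GNN-style embeddings for structure, returning both so a downstream model can consume them together. The function and field names are illustrative assumptions.

```python
# Combine a textual edge description with simple structural embeddings.
import numpy as np

def hybrid_encoding(nodes, edges, node_features):
    idx = {n: i for i, n in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        adj[idx[u], idx[v]] = adj[idx[v], idx[u]] = 1.0
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    embeddings = (adj @ node_features) / deg          # structural context
    text = "; ".join(f"{u} -- {v}" for u, v in edges) # interpretable context
    return {"text": text, "embeddings": embeddings}

out = hybrid_encoding(["A", "B", "C"], [("A", "B"), ("B", "C")], np.eye(3))
print(out["text"], out["embeddings"].shape)           # "A -- B; B -- C" (3, 3)
```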

The Road Ahead

Encoding innovations like GraphToken, the "Talk Like a Graph" paradigm, and hybrid strategies are reshaping the boundaries of graph reasoning. These approaches not only improve the computational efficiency of processing structured data but also unlock new applications in domains ranging from healthcare to smart cities. Future advancements in encoding techniques are poised to further bridge the gap between structured graph data and the unparalleled reasoning power of LLMs.

Exploring Frontiers in Graph Reasoning

The rapid evolution of graph reasoning and encoding methodologies has opened exciting new directions for exploration and development. These areas represent significant opportunities to push the boundaries of graph-based machine learning and its integration with large language models (LLMs). Recent advances suggest promising trajectories for both theoretical and practical innovation in this field.

Enhanced Hybrid Architectures

Hybrid architectures that integrate transformers and GNNs have shown immense potential, but further research is required to refine their adaptability to diverse tasks. A promising avenue lies in dynamic attention mechanisms that adjust based on task-specific requirements, ensuring that models can focus on either local or global dependencies as needed. Similarly, adaptive tokenization strategies, such as hierarchical clustering or subgraph extraction, can enhance efficiency and scalability for complex graph reasoning tasks.

Recent developments, such as Graph Sequence Model++ (GSM++), have demonstrated that combining local encodings (via GNNs) with global reasoning (via transformers) provides a balanced approach for handling intricate graph structures. These layered architectures are particularly relevant in domains where both fine-grained details and overarching patterns are critical, such as smart city planning or genomic research.

Efficient Encoding Techniques

Encoding methodologies continue to evolve, with a focus on scalability and expressivity for large and dynamic graphs. Advances in spectral embeddings, which leverage graph Laplacians to capture global structure, combined with hierarchical decomposition, offer promising paths for managing computational complexity. Hierarchical approaches break down large graphs into smaller components, making them more manageable for processing while retaining structural integrity.
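As a concrete illustration of the spectral idea, the eigenvectors of the normalized graph Laplacian give each node coordinates that reflect global structure; here is a minimal sketch.

```python
# Spectral embedding from the normalised graph Laplacian.
import numpy as np

def spectral_embedding(adj, k=2):
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt   # normalised Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)                   # ascending eigenvalues
    return eigvecs[:, 1:k + 1]                             # skip the trivial eigenvector

# Two triangles joined by one edge: the embedding separates the two clusters.
adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 1, 0, 0],
                [0, 0, 1, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]], dtype=float)
print(spectral_embedding(adj, k=2))
```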

Sparsification techniques are another area of interest, reducing the density of graph representations while preserving essential information. These techniques can significantly improve efficiency without sacrificing performance, especially in domains with dense connectivity, such as neural networks or transportation grids. Additionally, integrating temporal embeddings for dynamic graphs enables models to handle real-time updates effectively, which is crucial for applications like social media analysis or financial modeling.
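A toy sparsification heuristic, keeping only the k strongest edges per node, illustrates the density-for-efficiency trade-off; principled approaches such as spectral sparsification are more involved, and this sketch is only meant to convey the idea.

```python
# Keep only the k heaviest edges per node of a weighted graph.
import numpy as np

def top_k_sparsify(weights, k=2):
    """weights: (n, n) non-negative edge weights; returns a sparser matrix."""
    n = weights.shape[0]
    keep = np.zeros_like(weights, dtype=bool)
    for i in range(n):
        top = np.argsort(weights[i])[-k:]   # k heaviest edges of node i
        keep[i, top] = True
    keep = keep | keep.T                    # keep an edge if either endpoint wants it
    return np.where(keep, weights, 0.0)

w = np.random.default_rng(2).random((5, 5))
w = (w + w.T) / 2
np.fill_diagonal(w, 0.0)
print((top_k_sparsify(w, k=2) > 0).sum(), "edges kept of", (w > 0).sum())
```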

Domain-Specific Applications

The versatility of graph reasoning models allows for their application across a wide range of domains. By incorporating domain-specific knowledge, these models can unlock new capabilities and achieve superior results. Examples include:

  • Healthcare: Graph models can map patient histories and interactions to identify disease progression patterns and optimize treatment plans. Hybrid encodings can combine molecular interaction graphs with temporal data for drug discovery.
  • Social Network Analysis: Advanced graph reasoning can uncover hidden community structures, influence dynamics, and detect anomalies, aiding in security and marketing.
  • Supply Chain Optimization: By modeling logistics as dynamic graphs, graph reasoning systems can optimize resource allocation, routing, and inventory management.

Interdisciplinary collaboration is key to these applications. By aligning model architectures with domain-specific requirements, researchers can ensure practical impact and real-world relevance.

Explainability and Cross-Modality

As graph reasoning systems become more integral to critical applications, their explainability grows increasingly important. Developing intuitive tools for visualizing attention mechanisms or saliency mappings can help users understand how models arrive at their conclusions. Explainable embeddings that highlight significant nodes, edges, or subgraphs enable transparency in decision-making.

Counterfactual reasoning frameworks offer another promising direction, allowing users to simulate "what-if" scenarios to assess the impact of changes in graph structures. For example, understanding how removing a node might affect network behavior is critical for cybersecurity or infrastructure resilience.
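A tiny "what-if" sketch of that kind of counterfactual check, using networkx (an assumed dependency) to test whether removing each node in turn disconnects a toy graph:

```python
# Counterfactual node removal: which nodes are critical for connectivity?
import networkx as nx

G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
for node in list(G.nodes):
    H = G.copy()
    H.remove_node(node)
    print(f"Remove {node}: still connected -> {nx.is_connected(H)}")
```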

Cross-modality integration presents additional challenges and opportunities. Combining graph reasoning with other data types, such as images, videos, or time-series data, could revolutionize fields like robotics, autonomous vehicles, and environmental monitoring. Multi-modal transformers that fuse textual, visual, and graph-based inputs provide a foundation for this next generation of reasoning systems.

Unified Theoretical Frameworks

Unifying the strengths of transformers, GNNs, and hybrid models requires the development of comprehensive theoretical frameworks. These frameworks should address task complexities, scalability challenges, and the dynamic nature of real-world graphs. A formal taxonomy that categorizes graph reasoning tasks based on computational requirements and architectural compatibility can guide researchers toward optimal solutions.

The integration of symbolic reasoning with neural architectures is another exciting area. Combining logic-based approaches with graph neural reasoning can enhance interpretability and robustness, bridging the gap between structured and unstructured data processing.

Finally, frameworks that incorporate distributed and parallel computing paradigms can significantly advance scalability. By leveraging massively parallel computation models, these theories can inspire architectures capable of handling the ever-growing scale of graph datasets in domains like climate modeling, global logistics, and real-time analytics.

References

  1. Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi. Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning. arXiv:2406.09170.
  2. Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi. Talk Like a Graph: Encoding Graphs for Large Language Models. arXiv:2310.04560.
  3. Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni. Understanding Transformer Reasoning Capabilities via Graph Algorithms. arXiv:2405.18512.
  4. Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, Jonathan Halcrow. Let Your Graph Do the Talking: Encoding Structured Data for LLMs. arXiv:2402.05862.
  5. Ali Behrouz, Ali Parviz, Mahdi Karami, Clayton Sanford, Bryan Perozzi, Vahab Mirrokni. Best of Both Worlds: Advantages of Hybrid Graph Sequence Models. arXiv:2411.15671.
  6. Bryan Perozzi, Clayton Sanford, Jonathan Halcrow. Graph Reasoning in Large Language Models. NeurIPS Expo 2024 presentation.
  7. Chaitanya Joshi. Transformers are Graph Neural Networks. Graph Deep Learning blog.



