Introduction to Deep Learning for Graphs and Where It May Be Heading
In their wonderfully titled A Gentle Introduction to Deep Learning for Graphs, researchers from Italy’s University of Pisa present a clear and engaging tutorial on the main concepts and building blocks involved in neural architectures for graphs.
Graphs are everywhere. In chemistry and material sciences for example they are used to represent the molecular structure of a compound, protein interaction networks, as well as biological and biochemical associations. In social sciences, graph networks are widely used to represent people’s relationships, etc.
In a general sense, graphs are a powerful tool for representing rich and complex data produced by a variety of artificial and natural processes. A graph can be considered as a structured datatype that has nodes (entities that hold information) and edges (connections between nodes that also hold information) and therefore has a compositional nature as well as a relational nature.
The amount of information carried by such data together with the increasing availability of large repositories has motivated a recent surge of interest in deep learning models that can process graphs in an adaptive fashion.
Methodological and practical challenges remain. Ideally, learning models for graphs should be able to generalize to samples that can vary in size and topology. The University of Pisa researchers however found that information about node identity and ordering across multiple samples is rarely available. Additional challenges include restrictions to the differentiability of graphs and the potential increase of model complexity brought by graphs with loops, which are common.
All in all, dealing with graph data is challenging due mainly to its expressiveness and computational complexity. But at the same time, deep learning for graphs is an excellent field in which to try out new neural network methodologies, say the researchers.
The paper is of a tutorial nature that aims to introduce to the readers the topic of deep learning for graphs with a proper review of the historical literature as well as a top-down approach in its exposition, explained paper coauther Federico Errica, a PhD Student at the University of Pisa.
“Upon reading the literature, we noticed many available surveys and reviews of deep learning for graphs, but none of them was of introductory nature. Moreover, almost all of them lacked appropriate referencing to pioneering and consolidated works,” Errica and Davide Bacciu, the paper’s first author, told Synced in an email.
“We thought this could be the first work (to the best of our knowledge) of tutorial nature to introduce readers to the main concepts and architectural aspects of deep learning methods working on graphs, with special attention to foundational works.”
The paper covers most of the techniques used in previous studies without delving deep into technicalities, Errica explained. It focuses on unordered and non-positional graphs, where there is no total order defined over the nodes, and there exists no bijective mapping from edges to a set of natural numbers in the graphs. It also includes a summary of experimental evaluation, applications, and future research avenues.
Deep Learning for Graphs Has a Long-Standing History
The deep learning for graphs field is rooted in neural networks for graphs research and early 1990s works on Recursive Neural Networks (RecNN) for tree structured data. The RecNN approach was later rediscovered in the context of natural language processing applications. Starting with directed acyclic graphs, it has been progressively extended to more complex and richer structures.
The main problem in extending such approaches to general graphs (cyclic or acyclic, directed or undirected) was the processing of cycles, due to the mutual dependencies that occur among the variables defined in the neural recursive units. The earliest models to tackle this problem were the Graph Neural Network (GNN) and the Neural Network for Graphs (NN4G).
The GNN model is based on a state transition system similar to the RecNN, but it allows cycles in state computation. NN4G derives from the idea that mutual dependencies can be managed by leveraging the representations from previous layers in the architecture, so it can break the recursive dependencies in the graph loops with a multi-layered architecture.
These models have pioneered the field by establishing the foundations of two of the main approaches for graph processing — the recurrent approaches represented by GNN and the feedforward approaches represented by NN4G. The latter in particular has now become the predominant approach — under the umbrella of graph convolutional (neural) networks.
Representation Learning in Graphs: A Generalized Formulation and an Architecture Roadmap
To deal with flat data, the popular approach is Convolutional Neural Networks (CNN). But for graph data with structured domains that usually contain more information than flat data, an adaptive processing of the structure is needed to exploit this additional information.
Representation learning for this purpose has attracted interest in the research community because it allows a system to automatically learn to discover the representations needed for feature detection or classification from raw data to solve a task. In the specific case of graphs, ideally no assumptions about the size or topology should be made in order to ensure their general applicability, so graph processing methods ought to be designed in absence of known and fixed causal dependencies.
Regardless of the training objectives, almost all deep learning methods on graphs ultimately produce node representations, or states. This process is referred to as performing an isomorphic transduction of the graph. These representations are the result of visiting graph nodes in parallel by traversing the graph without any specific node ordering.
The overall mechanism shared by all graph learning methods, as depicted in Figure 2, tackles nodes, edges and graph related tasks and is extremely useful. For instance, a graph representation can be easily computed by aggregating its node representations. The work of researchers can therefore revolve around the definition of deep learning models that automatically extract the relevant features from a graph. In the paper, such models are referred to as Deep Graph Networks
DGN refers to a subset of the architecture that produces the final internal node representations. These can be obtained by either connecting all the internal representations computed at each layer or by taking the internal representations produced at the very last layer. Any DGN can be combined with a predictor that solves a task by using the final internal node representations as input.
The authors divide DGNs into three broad categories: Deep Neural Graph Networks (DNGNs), which include models inspired by neural architectures; Deep Bayesian Graph Networks (DBGNs), whose representatives are probabilistic models of graphs; and Deep Generative Graph Networks (DGGNs), which include generative approaches of graphs that can leverage both neural and probabilistic models.
In order to more seamlessly process information of graphs that vary both in size and shape, the researchers tried to build models that work locally at the node level rather than at the graph level — so that all the model cares about is the relationship between a node and its neighbourhood. All the approaches under these categories in the paper are based on local relations and iterative processing to diffuse node contexts across the graph, regardless of their neural or probabilistic nature.
Basic Building Blocks for Modern Deep Learning Architectures for Graphs
The main constituents of local graph learning models by and large determine the kind of representations a model can compute. The paper therefore presents some of the basic building blocks common to such architectures, and explores how they can be assembled or combined to compose an effective learning model for graphs.
The way models aggregate neighbours to compute hidden node representations is at the core of local graph processing. The general neighbourhood aggregation scheme entails that arcs are unattributed or contain the same information. This assumption does not hold in general as arcs in a graph often contain additional information about the nature of the relation, thus mechanisms that leverage arc labels to enrich node representations are needed, according to the paper.
One solution that gained popularity especially in language related tasks is attention mechanisms, which assign a relevance score to each part of the input of a neural layer. Attention to the aggregation function can be applied when the input is graph-structured, and this results in a weighted average of the neighbours where individual weights are a function of a node and its neighbour.
When graphs are large and dense however, it can be unfeasible to perform aggregations over all neighbours for each node. Therefore, alternative strategies such as neighbourhood sampling are needed to reduce the computational burden.
The authors also introduce methods such as graph pooling, which reduces the dimension of a graph after a DGN layer and can be used to discover important communities in a graph, imbue this knowledge in learned representations, and reduce the computational costs in large scale structures.
The authors summarize ways to exploit graph embeddings at different layers. A straightforward approach is to use the graph embedding of the last layer as a representative for the whole graph, while another strategy is to view all the intermediate representations as a sequence and let the model learn a final graph embedding as the output of a Long Short-Term Memory network on the sequence.
Tasks in Graph Representation Learning
The paper explores unsupervised, supervised, generative and adversarial learning tasks for graph representation. Link prediction for instance aims at building node representations that are similar if an arc connects the associated nodes and is a common unsupervised loss used by graph neural networks. Supervised graph learning tasks meanwhile include node classification, graph classification and graph regression. The researchers say generative learning, or learning how to generate a graph from a dataset of available samples, involves the most complex tasks.
The authors summarize and categorize different models for tackling these tasks according to four key properties — context diffusion method, how embeddings are computed, how layers are constructed, and the nature of the approach. They also include additional possible model properties such as ability to handle edges, perform pooling, and attend and sample neighbours.
Applications and Future Directions
Traditional flat or sequential data delivery can’t fully satisfy today’s demanding deep learning models — and graphs are emerging as the solution. The authors believe graph learning can be advantageously applied in domains such as chemistry and drug design, natural language processing, spatio-temporal forecasting, security, social networks, and recommender systems.
“Chemistry and Drug design is arguably the field where more data is and will be available, so we expect the field of deep learning for graphs to influence the discovery of new valid molecules or fast identification of molecular properties,” Errica and Bacciu told Synced in an email.
For the field to move further to a maturity phase, the authors believe certain aspects of deep learning for graphs should be prioritized. They highlight the challenges of formalizing different adaptive graph processing models under a unified framework; and establishing a set of robust benchmarks to test and assess models in fair, consistent and reproducible conditions.