Performance-Adaptive Sampling Strategy (PASS) for GNNs: Open sourcing PASS
March 7, 2022
Co-authors: Jaewon Yang, Minji Yoon, Sufeng Niu, Dash Shi, and Qi He
Graphs are a universal way to represent relationships among entities. Social graphs represent how people interact with each other, professional graphs represent how people collaborate, and so on. Graph Neural Networks (GNNs) are deep learning models specialized for understanding graphs. For a member in a social network, for instance, a GNN looks at the member’s connections (1-hop neighbors in the graph) and the connections of those connections (2-hop neighbors), and leverages this neighborhood information for AI tasks such as search or recommendation. For example, if we want to make a job recommendation for Member A in the figure below, a GNN would use the member’s connections.
An example for using the member connection graph for a job recommendation task. PASS selects relevant connections (B, C, D) and drops irrelevant connections (E, F) for the task.
While GNNs are helpful when building models based on graphs, they face some challenges in how they leverage a member’s neighbors. First, a GNN-based approach does not scale to real-world social networks: many members have a very large number of connections, and leveraging all of them is impractical. For example, a celebrity might have hundreds of millions of connections. A second challenge is that not every connection is relevant to the task at hand. In the job recommendation task in the figure above, for example, connections who work in very different fields (and may simply be personal friends), such as Members E and F, would be irrelevant to the task.
There are some existing methods that address the first issue by sampling a fixed number of neighbors, thereby limiting the scale of the inputs to the GNN, but the drawback of these samplers is that they do not consider which neighbors are more relevant for GNNs. A random sample of neighbors (for instance, ones including Members E and F in the example above) may produce a recommendation that’s less accurate than a sample that includes relevant neighbors (like Members B, C, and D).
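Fixed-number samplers like these can be sketched as a uniform draw of at most k neighbors per node. The snippet below is a generic illustration of this idea (the function name is ours, not from any particular GNN library); note that a uniform draw is exactly as likely to keep Members E and F as Members B, C, and D.

```python
import random

def sample_neighbors(neighbors, k, seed=None):
    """Uniformly sample at most k neighbors, capping the GNN's input size.

    Generic illustration of fixed-size random neighbor sampling; it ignores
    which neighbors are relevant to the downstream task.
    """
    rng = random.Random(seed)
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(list(neighbors), k)

# Member A's connections from the figure; a random sampler may well
# keep the irrelevant E and F and drop a relevant one.
sampled = sample_neighbors(["B", "C", "D", "E", "F"], 3, seed=0)
```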
To solve these problems, we developed a novel GNN method called “Performance-Adaptive Sampling Strategy,” or “PASS.” PASS uses an AI model to select relevant neighbors. This neighbor selection model decides whether to select a given neighbor by looking at the neighbor’s attributes, and it learns to select the neighbors that boost the GNN’s predictive accuracy. An advantage of this approach is that it works well regardless of the specific task the GNN is being used for.
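To make attribute-based neighbor selection concrete, the toy scorer below ranks neighbors with a linear model over their attribute vectors and keeps the top k. This is a hypothetical sketch with made-up attribute names and weights; PASS’s actual selector is a learned neural sampler trained jointly with the GNN, not a hand-set linear model.

```python
def select_relevant_neighbors(neighbors, weights, k):
    """Score each neighbor from its attribute vector with a tiny linear
    model, then keep the top k.

    neighbors: list of (name, attribute_vector) pairs
    weights:   weight vector of the same length as the attribute vectors
               (in PASS these weights would be learned, not hand-set)
    """
    def score(attrs):
        return sum(w * a for w, a in zip(weights, attrs))
    ranked = sorted(neighbors, key=lambda n: score(n[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical attributes: [industry overlap with Member A, interaction freq.]
connections = [
    ("B", [0.9, 0.8]),  # same field, frequent collaborator
    ("C", [0.8, 0.9]),
    ("D", [0.7, 0.7]),
    ("E", [0.1, 0.9]),  # personal friend in a different field
    ("F", [0.0, 0.8]),
]
picked = select_relevant_neighbors(connections, weights=[1.0, 0.5], k=3)
# picked == ["B", "C", "D"]: the task-relevant connections from the figure
```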
We developed an efficient algorithm to train such neighbor selection models. In experiments on seven public benchmark graphs and two LinkedIn graphs, PASS outperformed state-of-the-art GNN methods by 1.3%-10.4%. Furthermore, we showed that PASS achieves robust accuracy even when the input graph contains noisy edges: when we added noisy edges to the benchmark graphs, PASS showed 2-3 times greater accuracy than the baseline methods. To our knowledge, this is the first method that learns to select neighbors so as to maximize a GNN’s predictive performance. In our experiments, PASS achieved higher prediction accuracy while using fewer neighbors than other GNN models. In other words, PASS shows “less is more.”
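To give a flavor of how a neighbor sampler can be trained from task performance, the sketch below applies a generic policy-gradient (REINFORCE) update to a toy softmax sampler: neighbors that lead to good downstream predictions get their selection probability pushed up. This is our own illustrative stand-in, not PASS’s actual training algorithm; see the KDD paper for the real derivation.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(weights, neighbor_attrs, reward_fn, lr=0.5, rng=random):
    """One REINFORCE update for a toy neighbor sampler.

    The sampler picks one neighbor with probability softmax(score) and is
    rewarded when the pick helps the downstream task (reward_fn is a
    stand-in for the GNN's performance signal). Generic policy-gradient
    sketch only, not PASS's exact algorithm.
    """
    scores = [sum(w * a for w, a in zip(weights, x)) for x in neighbor_attrs]
    probs = softmax(scores)
    i = rng.choices(range(len(probs)), weights=probs)[0]
    reward = reward_fn(i)
    # d log p_i / d w = x_i - E_p[x]  (softmax policy gradient)
    expected = [sum(p * x[d] for p, x in zip(probs, neighbor_attrs))
                for d in range(len(weights))]
    return [w + lr * reward * (neighbor_attrs[i][d] - expected[d])
            for d, w in enumerate(weights)]

# Toy setup: neighbor 0 is task-relevant (reward 1), neighbor 1 is not.
attrs = [[1.0, 0.0], [0.0, 1.0]]
w = [0.0, 0.0]
rng = random.Random(42)
for _ in range(100):
    w = reinforce_step(w, attrs, lambda i: 1.0 if i == 0 else 0.0, rng=rng)
# After training, the sampler weights favor the relevant neighbor.
```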
Today, we’re excited to open source the implementation of PASS to help researchers create more efficient, accurate GNN models. PASS opens up a new direction of using AI to select neighbors, and we hope to see more research building on top of it. We will also work on gradually applying PASS to various GNN applications at LinkedIn. For more technical details, please refer to our KDD paper and a blog post from one of the paper’s authors from Carnegie Mellon University.