Seven ICML-2025 Papers Advance Graph Learning, Transformers, and Physics-aware Models

June 4, 2025

Researchers from AC and MLS at the KI Institute will present seven papers at ICML 2025 in Vancouver, Canada. The papers showcase progress in graph learning, transformers, and physics-aware models.

Graph learning

Message-passing neural networks face three common issues: oversmoothing, which makes node representations look alike after many layers; oversquashing, which compresses signals that travel across many hops into a single small vector; and underreach, where information from distant nodes never arrives at all. Adaptive message passing [1] tackles these problems by letting each node decide, layer by layer, how far messages should travel and which neighbours to ignore, keeping representations sharp without altering the graph itself.
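As a rough illustration of this idea, the sketch below shows a message-passing layer in which each receiving node learns a gate that scales its incoming messages. The class name, the sigmoid gate, and the residual update are illustrative choices, not the exact formulation of [1].

```python
import torch
import torch.nn as nn

class GatedMessagePassingLayer(nn.Module):
    """Illustrative per-node message gating (not the exact scheme of [1])."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)                              # turns neighbour features into messages
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # per-node gate in [0, 1]

    def forward(self, x, edge_index):
        src, dst = edge_index                 # edge list: messages flow src -> dst
        messages = self.msg(x[src])           # one message per edge
        g = self.gate(x[dst])                 # each receiving node decides how much to listen
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, g * messages)  # sum the gated messages per receiving node
        return x + agg                        # residual update helps keep representations from collapsing
```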

However, if the training graph contains manipulated or mislabelled nodes, their influence can linger in the model long after training. To address this problem, Cognac [2], a corrective-unlearning routine, surgically removes the influence of poisoned or mislabelled nodes; even when only 5% of the affected nodes are identified, model accuracy nearly returns to its clean-data level.
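For intuition only, a generic corrective-unlearning step can alternate gradient ascent on the identified bad nodes with ordinary descent on the remaining ones, as sketched below. This is a common unlearning pattern, not the actual Cognac procedure of [2]; the `data` interface with `.x`, `.edge_index`, and `.y` is an assumption.

```python
import torch

def corrective_unlearning_step(model, optimizer, loss_fn, data, forget_mask, ascent_weight=1.0):
    """One illustrative unlearning step: push the model away from the flagged
    ("forget") nodes while preserving performance on the rest.
    Generic pattern, not the Cognac algorithm of [2]."""
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    retain_loss = loss_fn(out[~forget_mask], data.y[~forget_mask])  # keep knowledge from clean nodes
    forget_loss = loss_fn(out[forget_mask], data.y[forget_mask])    # labels of flagged nodes are untrusted
    (retain_loss - ascent_weight * forget_loss).backward()          # descent on retain set, ascent on forget set
    optimizer.step()
```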

Two further studies question how progress is measured. The position paper [3] argues that popular benchmarks dominated by tiny molecule graphs say little about real-world impact. The spotlight paper [4] shows that supposedly “complex” knowledge-graph queries in current benchmarks mostly collapse into predicting a single missing edge. The new benchmarks proposed by the authors demand true multi-hop reasoning and reveal that state-of-the-art methods struggle with them, leaving ample room for improvement.

Transformers

To adapt a pre-trained language model to a specialised task, practitioners insert a compact adapter module into each layer. Its weights start at zero, so at first the adapter leaves the frozen backbone untouched, and only the adapter is updated during fine-tuning. A formal justification for this zero-initialisation strategy is given in [5], which also shows that letting the adapter learn a non-linear prompt yields better accuracy than linear approaches.
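The sketch below illustrates the zero-gating idea: input tokens attend over a learnable prompt, and a gating factor initialised at zero scales the adapter's contribution, so the frozen backbone is reproduced exactly at the start of fine-tuning. The module and parameter names are illustrative, and the single-head attention is a simplification of the setting analysed in [5].

```python
import torch
import torch.nn as nn

class ZeroInitPromptAdapter(nn.Module):
    """Simplified zero-initialised prompt adapter (single head, illustrative names)."""

    def __init__(self, dim, prompt_len=16):
        super().__init__()
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_len, dim))  # learnable prompt tokens
        self.gate = nn.Parameter(torch.zeros(1))                         # gating factor, starts at zero

    def forward(self, hidden, backbone_out):
        # Attention of the input tokens over the learnable prompt.
        scores = hidden @ self.prompt.t() / hidden.shape[-1] ** 0.5
        prompt_out = scores.softmax(dim=-1) @ self.prompt
        # With gate == 0 the adapter adds nothing, so the frozen backbone's output is unchanged.
        return backbone_out + self.gate * prompt_out
```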

To tailor a generative Transformer to conditional text generation, the Tractable Transformer (Tracformer) [6] incorporates a sparse encoder that models nearby and long-range context simultaneously. This dual-scale view lets the model answer conditional-probability queries it never encountered during training, and it sets new state-of-the-art results in text modelling, outperforming recent diffusion and autoregressive baselines.
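One generic way to combine nearby and long-range context is a sparse attention mask that mixes a local window with strided global connections, as sketched below. The window and stride values are arbitrary, and this is only a stand-in illustration, not the actual sparse encoder of [6].

```python
import torch

def dual_scale_attention_mask(seq_len, window=4, stride=8):
    """Boolean [seq_len, seq_len] mask: True where attention is allowed.
    Local window = nearby context; strided columns = coarse long-range context.
    Generic illustration, not the Tracformer encoder of [6]."""
    idx = torch.arange(seq_len)
    local = (idx[:, None] - idx[None, :]).abs() <= window
    long_range = (idx[None, :] % stride) == 0
    return local | long_range

mask = dual_scale_attention_mask(16)  # can be passed to a masked self-attention layer
```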

 

Physics-aware models

To build physics-aware models, the authors of [7] embed physical principles directly into the loss used to train machine-learned interatomic potentials. They add two terms: one enforcing a Taylor-series expansion of the potential energy, the other ensuring conservative forces. Together, these terms raise accuracy and robustness when data are sparse, reduce reliance on large datasets, and improve molecular-dynamics simulations.
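For intuition, the conservative-force idea can be expressed as a consistency penalty between the forces a model predicts directly and the negative gradient of its predicted energy, as in the sketch below. The model interface and the toy potential are assumptions for illustration; the paper's actual loss terms, including the Taylor-expansion term, differ in detail [7].

```python
import torch

def conservative_force_penalty(model, positions):
    """Penalise disagreement between directly predicted forces and -dE/dx.
    `model` is assumed to return (scalar energy, per-atom forces); sketch only."""
    positions = positions.clone().requires_grad_(True)
    energy, predicted_forces = model(positions)
    grad_forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
    return ((predicted_forces - grad_forces) ** 2).mean()

class ToyPotential(torch.nn.Module):
    """Harmonic toy potential whose forces are conservative by construction."""
    def __init__(self):
        super().__init__()
        self.k = torch.nn.Parameter(torch.ones(1))

    def forward(self, pos):
        energy = (self.k * pos ** 2).sum()  # scalar potential energy
        forces = -2.0 * self.k * pos        # equals -dE/dpos for this toy model
        return energy, forces

penalty = conservative_force_penalty(ToyPotential(), torch.randn(8, 3))  # ~0 for the toy model
```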



[1] Errica, F., Christiansen, H., Zaverkin, V., Maruyama, T., Niepert, M., & Alesiani, F. (2023). Adaptive message passing: A general framework to mitigate oversmoothing, oversquashing, and underreaching. arXiv preprint arXiv:2312.16560. ICML 2025.

[2] Kolipaka, V., Sinha, A., Mishra, D., Kumar, S., Arun, A., Goel, S., & Kumaraguru, P. (2024). A Cognac shot to forget bad memories: Corrective Unlearning in GNNs. arXiv preprint arXiv:2412.00789. ICML 2025.

[3] Bechler-Speicher, M., Finkelshtein, B., Frasca, F., Müller, L., Tönshoff, J., Siraudin, A., Zaverkin, V., Bronstein, M., Niepert, M., Perozzi, B., Galkin, M., & Morris, C. (2025). Position: Graph learning will lose relevance due to poor benchmarks. arXiv preprint arXiv:2502.14546. ICML 2025.

[4] Gregucci, C., Xiong, B., Hernandez, D., Loconte, L., Minervini, P., Staab, S., & Vergari, A. (2024). Is Complex Query Answering Really Complex? arXiv preprint arXiv:2410.12537. ICML 2025.

[5] Diep, N. T., Nguyen, H., Nguyen, C., Le, M., Nguyen, D. M., Sonntag, D., Niepert, M., & Ho, N. (2025). On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation. arXiv preprint arXiv:2502.03029. ICML 2025.

[6] Liu, A., Liu, X., Zhao, D., Niepert, M., Liang, Y., & Broeck, G. V. D. (2025). Tractable Transformers for Flexible Conditional Generation. arXiv preprint arXiv:2502.07616. ICML 2025.

[7] Takamoto, M., Zaverkin, V., & Niepert, M. (2024). Physics-Informed Weakly Supervised Learning for Interatomic Potentials. arXiv preprint arXiv:2408.05215. ICML 2025.
