The never-ending list of papers to read
Why use proper reference management software when you can use Post-its or a markdown file?
- 2026-04-09: Sum of Gaussian vectors and large sets (2026) by A. Song. (via Y. Laguel) Every subgaussian vector is the sum of three Gaussians, shown through an analysis of convex bodies.
- 2026-04-09: Log-Concave Sampling (2026) by S. Chewi. (via Y. Laguel) A book on the complexity theory of log-concave sampling.
- 2026-04-08: Lipschitz minimization and the Goldstein modulus (2024) by S. Kong, A. Lewis. (via Y. Laguel) Goldstein methods minimize Lipschitz functions using a subgradient built from nearby gradients; this paper introduces a new modulus, based on those subgradients, that measures slope.
- 2026-04-04: Muon Dynamics as a Spectral Wasserstein Flow (2026) by G. Peyré. Muon seen as a particle system.
- 2026-04-04: Bilevel Optimization: Recent Algorithmic & Theoretical Advances, and Emerging Applications in Training LLMs (????) by M. Hong. BO with a glimpse of LLM training.
- 2026-04-04: Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models (2026) by D. Montero. A blog post taking the HJB point of view to study diffusion models.
- 2026-04-04: The adjoint state method for parametric definable optimization without smoothness or uniqueness (2026) by J. Bolte, E. Pauwels, C. Traoré.
- 2026-03-25: Angular Steering: Behavior Control via Rotation in Activation Space (2025) by H. Vu, T. M. Nguyen. An alternative to difference-of-means steering.
- 2026-02-12: A Fully First-Order Method for Stochastic Bilevel Optimization (2023) by J. Kwon, D. Kwon, S. Wright, R. Nowak. A constrained point of view for solving bilevel optimization.
- 2026-02-11: Convergence Rates for Stochastic Proximal and Projection Estimators (2026) by D. Morales, P. Pérez-Aros, A. Vilches.
- 2026-02-06: Model Organisms for Emergent Misalignment (2025) by E. Turner, A. Soligo, M. Taylor, S. Rajamanoharan, N. Nanda. (Mis)alignment with a single LoRA.
- 2026-02-06: Statistical Learning Theory in Lean 4: Empirical Processes from Scratch (2026) by Y. Zhang, J. D. Lee, F. Liu. Basic learning theory results in Lean.
- 2026-01-09: Efficient Transferable Optimal Transport via Min-Sliced Transport Plans (2025) by X. Liu, E. Akbari, R. D. Martin, N. NaderiAlizadeh, S. Kolouri.
- 2026-01-09: LapSum – One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection (2025) by L. Struski, M. B. Bednarczyk, I. T. Podolak, J. Tabor.
- 2026-01-08: Large-Scale Methods for Distributionally Robust Optimization (2020) by D. Levy, Y. Carmon, J. C. Duchi, A. Sidford.
- 2025-12-19: Token Sample Complexity of Attention (2025) by L. Bohbot, C. Letrouit, G. Peyré, F.-X. Vialard. Open questions: deep? i.i.d. → Markov? exchangeable?
- 2025-12-18: Towards Evaluating the Robustness of Neural Networks (2016) by N. Carlini, D. Wagner.
- 2025-12-15: Stochastic Model-Based Minimization of Weakly Convex Functions (2018) by D. Davis, D. Drusvyatskiy.
- 2025-12-09: To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models (2025) by A. Hedström, S. I. Amoukou, T. Bewley, S. Mishra, M. Veloso.
- 2025-12-05: Open Problems in Mechanistic Interpretability (2025) by L. Sharkey et al.
- 2025-12-05: Machine Unlearning under Overparameterization (2025) by J. L. Block, A. Mokhtari, S. Shakkottai.
- 2025-12-04: Training Neural Networks at Any Scale (2025) by T. Pethick, K. Antonakopoulos, A. Silveti-Falls, L. C. Vankadara, V. Cevher.
- 2025-12-04: Machine Unlearning via Information Theoretic Regularization (2025) by S. Xu, T. Strohmer.
- 2025-11-14: The Principles of Diffusion Models (2025) by C. Lai, Y. Song, D. Kim, Y. Mitsufuji, S. Ermon.
- 2025-11-13: Implicit Neural Representations with Periodic Activation Functions (2020) by V. Sitzmann, J. N. P. Martel, A. W. Bergman, D. B. Lindell, G. Wetzstein.
- 2025-11-06: Inexact Gradient Projection and Fast Data Driven Compressed Sensing (2017) by M. Golbabaee, M. E. Davies.
- 2025-11-04: On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains (2023) by Y. Li, Z. Yu, G. Chen, Q. Lin.
- 2025-11-03: Nonasymptotic Convergence Rates for Plug-and-Play Methods With MMSE Denoisers (2025) by H. Pritchard, R. Parhi.
- 2025-10-22: Optimal Zeroth-Order Bilevel Optimization (2025) by A. Aghasi, J. Kwon, S. Ghadimi.
- 2025-09-29: Optimal Multi-Distribution Learning (2023) by Z. Zhang, W. Zhan, Y. Chen, S. S. Du, J. D. Lee.
- 2025-09-25: Stochastic Bilevel Optimization with Heavy-Tailed Noise (2025) by Z. Liu, L. Luo.
- 2025-09-15: LoRA Training in the NTK Regime has No Spurious Local Minima (2024) by U. Jang, J. D. Lee, E. K. Ryu.
- 2025-09-15: LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won’t Fail) (2025) by J. Kim, J. Kim, E. K. Ryu.
- 2025-09-08: Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression (2025) by J. Kim, D. Meunier, A. Gretton, T. Suzuki, Z. Li.
- 2025-09-08: Distributionally Robust Optimization (2024) by D. Kuhn, S. Shafiee, W. Wiesemann.
- 2025-09-01: Gradient Descent Follows the Regularization Path for General Losses (2020) by Z. Ji, M. Dudík, R. E. Schapire, M. Telgarsky.
- 2025-07-30: On the Lower Bound of Minimizing Polyak-Łojasiewicz Functions (2023) by P. Yue, C. Fang, Z. Lin.
- 2025-07-30: BLUR: A Bi-Level Optimization Approach for LLM Unlearning (2025) by H. Reisizadeh, J. Jia, Z. Bu, B. Vinzamuri, A. Ramakrishna, K. Chang, V. Cevher, S. Liu, M. Hong.
- 2025-07-30: A Generalization Theory for Zero-Shot Prediction (2025) by R. Mehta, Z. Harchaoui.
- 2025-07-30: Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (2024) by the Anthropic team.
- 2025-06-11: Flowing Datasets with Wasserstein over Wasserstein Gradient Flows (2025) by C. Bonet, C. Vauthier, A. Korba.
- 2025-05-23: Learning with Local Search MCMC Layers (2025) by G. Vivier-Ardisson, M. Blondel, A. Parmentier.
- 2025-05-23: Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning (2023) by A. Dieuleveut, G. Fort, E. Moulines, H. Wai.
- 2025-05-21: On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions (2017) by F. Bach.
- 2025-05-21: Convergence beyond the Over-Parameterized Regime using Rayleigh Quotients (2022) by D. A. R. Robin, K. Scaman, M. Lelarge.
- 2025-05-21: Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018) by S. S. Du, X. Zhai, B. Poczos, A. Singh.
- 2025-05-21: DFWLayer: Differentiable Frank-Wolfe Optimization Layer (2023) by Z. Liu, L. Liu, X. Wang, P. Zhao.