One Piece of Math a Day

This is a mirror of Samuel Vaiter's daily series, available on Twitter and Bluesky. One piece of math = one slide = one post every (working) day. All content is available under the CC BY-SA license.

January 2025

A one-line proof of the Sum of Two Squares Theorem

2025-01-13

The Heath-Brown/Zagier proof of Fermat’s sum of two squares theorem relies on the fact that the cardinality of a finite set and the number of fixed points of an involution on it have the same parity. [ref]

thumbnail
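
The parity argument above can be checked numerically. The sketch below assumes the standard form of Zagier's involution on the set S = {(x, y, z) positive integers : x² + 4yz = p}; the function names `solutions` and `zagier` are illustrative and not taken from the original slide.

```python
def solutions(p):
    """All positive integer triples (x, y, z) with x^2 + 4*y*z == p."""
    sols = []
    for x in range(1, p):
        rest = p - x * x
        if rest <= 0 or rest % 4:
            continue
        m = rest // 4
        sols += [(x, y, m // y) for y in range(1, m + 1) if m % y == 0]
    return sols


def zagier(x, y, z):
    """Zagier's involution on the solution set (p prime, p = 1 mod 4)."""
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)


S = solutions(13)
fixed = [s for s in S if zagier(*s) == s]
print(len(S), fixed)
```

Since the involution has a single fixed point, the set has odd cardinality, so the swap (x, y, z) ↦ (x, z, y) must also have a fixed point, which gives p = x² + (2y)².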

Typical complexities when analyzing algorithms

2025-01-10

The (asymptotic) complexity of an algorithm typically measures the time or space it requires as the input size grows. It is measured against a model of computation, typically a Turing machine or a variation of it.

thumbnail

Tropical geometry

2025-01-09

Tropical geometry studies algebraic varieties through the "tropical semiring", where addition is min (or max) and multiplication is ordinary addition. It transforms polynomial equations into piecewise linear structures and is linked to dynamic programming. [ref]

thumbnail
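
To illustrate the link with dynamic programming mentioned above, here is a minimal sketch of matrix multiplication in the min-plus semiring. The function name `minplus_matmul` and the small weight matrix are made up for the example.

```python
INF = float("inf")


def minplus_matmul(A, B):
    """Matrix product where (+, *) is replaced by (min, +)."""
    inner = len(B)
    return [
        [min(A[i][k] + B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Edge weights of a small directed graph (INF = no edge, 0 on the diagonal).
W = [
    [0, 4, 1],
    [INF, 0, INF],
    [INF, 2, 0],
]
# Entry (i, j) of the tropical square is the cheapest path using at most 2 edges.
W2 = minplus_matmul(W, W)
print(W2[0][1])
```

Repeated tropical squaring of the weight matrix is one classical way to compute all-pairs shortest paths.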

Brownian motion

2025-01-08

Brownian motion is the fundamental martingale that arises in almost all areas of mathematics with a stochastic aspect, with applications ranging from biology to statistical mechanics and diffusion models. [ref]

thumbnail

Nadaraya—Watson kernel estimator

2025-01-07

The Nadaraya-Watson estimator is a linear local averaging estimator relying on a pointwise nonnegative kernel. Most of the time, a box or Gaussian kernel is used. [ref]

thumbnail
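
As a rough illustration of the local averaging described above, the following sketch implements the estimator in 1D with a Gaussian kernel. The bandwidth parameter `h` and the function name are assumptions made for the example.

```python
import math


def nadaraya_watson(x, xs, ys, h=1.0):
    """Kernel-weighted average of ys, with Gaussian weights centred at x."""
    weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Two samples placed symmetrically around the query point get equal weights.
print(nadaraya_watson(0.0, [-1.0, 1.0], [0.0, 2.0]))
```

The prediction is linear in the observations `ys`, which is what "linear estimator" refers to.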

Partly smooth function (in the convex case)

2025-01-06

The concept of partly smooth functions (Lewis '02) unifies many nonsmooth functions, capturing their geometry: smooth along the manifold and sharp normal to it. It allows fine sensitivity analysis results. In the convex case, it generalizes sparsity.

thumbnail

Inversion in the complex plane

2025-01-03

Complex transformations lead to fun visual effects. Inversion is a transformation that maps a point to its reflection through a circle. It is conformal, meaning it preserves angles. [ref]

thumbnail

Basic Game of Life demo

2025-01-02

The Game of Life is a cellular automaton devised by John Conway in 1970. It consists of a grid of cells that evolve according to simple deterministic rules. Despite the simplicity of the rules, the Game of Life can exhibit complex behavior. [ref]

thumbnail
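
The rules can be written in a few lines. This is a minimal sketch on an unbounded grid, with live cells stored as a set of coordinates; the representation and the name `life_step` are choices made for the example, not taken from the original demo.

```python
from collections import Counter


def life_step(alive):
    """One generation of Conway's Game of Life on an unbounded grid."""
    # Count, for every cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for x, y in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbours, survival with 2 or 3.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in alive)}


blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))
```

The "blinker" pattern used here oscillates with period 2.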

Smoothstep functions

2025-01-01

Smoothstep functions are a family of functions that interpolate between 0 and 1 smoothly. They are used in computer graphics to create smooth transitions between colors or shapes. [ref]

thumbnail
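
A minimal sketch of the most common member of the family, the cubic 3t² − 2t³ with clamping. The parameter names `edge0` and `edge1` follow the usual shader convention and are assumptions here.

```python
def smoothstep(x, edge0=0.0, edge1=1.0):
    """Cubic Hermite interpolation between 0 and 1, clamped outside the edges."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


print([smoothstep(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)])
```

The derivative vanishes at both edges, which is what makes the transition look smooth.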

December 2024

Mandelbrot zoom

2024-12-31

The Mandelbrot set is a fractal defined by iterating the complex map z² + c. Zooming in reveals patterns and self-similarity at all scales. [ref]

thumbnail
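
Pictures of the set are usually drawn with an escape-time count. The sketch below is one simple way to do it; the iteration cap `max_iter` and the escape radius 2 are the usual conventions, assumed here.

```python
def escape_time(c, max_iter=100):
    """Number of iterations of z -> z*z + c before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter  # never escaped: c is treated as inside the set


print(escape_time(0j), escape_time(1 + 0j))
```

Colouring each pixel by its escape time over a shrinking window of the complex plane produces the zoom.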

Sierpinski triangle

2024-12-30

The Sierpinski triangle is a fractal with the property of self-similarity. It is constructed by iteratively removing the central triangle of each triangle in a sequence of triangles. [ref]

thumbnail

Cyclide

2024-12-27

A cyclide is a surface that is a generalization of the torus and can be defined as the envelope of a family of spheres.

thumbnail

Whitney umbrella

2024-12-26

The equation x² − y²z = 0 describes the Whitney umbrella. It is a ruled surface with a pinch point singularity at the origin.

thumbnail

Fermat’s Christmas theorem

2024-12-25

Fermat's theorem on sums of two squares is a result of number theory that was first described in a letter of Fermat to Mersenne on December 25, 1640. It was first proved a century later by Euler in 1749. [ref]

thumbnail

Moebius strip

2024-12-24

The Möbius strip is a one-sided surface with a single boundary curve, formed by twisting a strip of paper by 180° and joining its ends.

thumbnail

Cayley singular cubic

2024-12-23

The algebraic surface (x+y+z)² + xyz = 0 is one of Cayley's singular cubics. Cayley classified them into 23 cases. [ref]

thumbnail

Spirograph

2024-12-20

A spirograph is a parametric curve generated by tracing a point fixed on a circle as it rolls along the inside or outside of another circle. The resulting pattern depends on the radii of the circles and the point's position. [ref]

thumbnail

Implicit bias of gradient descent on OLS

2024-12-19

When optimization problems have multiple minima, algorithms favor specific solutions due to their implicit bias. For ordinary least squares (OLS), gradient descent initialized at zero converges to the minimal norm solution among all possible solutions. [ref]

thumbnail
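
A toy illustration of this bias, assuming a single equation a·w = b in two unknowns, so that there is a whole line of minimizers. The function name, step size and data are made up for the example.

```python
def gd_least_squares(a, b, w0, lr=0.1, n_iter=200):
    """Gradient descent on 0.5 * (a.w - b)^2 starting from w0."""
    w = list(w0)
    for _ in range(n_iter):
        residual = sum(ai * wi for ai, wi in zip(a, w)) - b
        w = [wi - lr * residual * ai for wi, ai in zip(w, a)]
    return w


a, b = [1.0, 2.0], 5.0
print(gd_least_squares(a, b, [0.0, 0.0]))   # close to the minimal norm solution (1, 2)
print(gd_least_squares(a, b, [2.0, -1.0]))  # another minimizer, biased by the start
```

The iterates only move along the direction of `a`, so the component of the initial point orthogonal to `a` is never changed; starting at zero therefore selects the minimal norm solution.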

Linearly inducible orderings

2024-12-18

Linearly inducible orderings of n points in d-dimensional space are the orderings obtained by projecting the points onto a reference vector. Their number is typically smaller than the number of permutations n! (except when d ≥ n-1). [ref]

thumbnail

An increasing continuous singular function

2024-12-17

There exists a strictly increasing, continuous function f:[0,1]→[0,1] whose derivative is 0 almost everywhere. [ref]

thumbnail

Gauss—Lucas theorem

2024-12-16

The Gauss-Lucas theorem states that the roots of the derivative of a non-constant polynomial with complex coefficients always lie within the convex hull of the original polynomial's roots.

thumbnail

Automatic differentiation: forward mode

2024-12-13

Automatic differentiation in forward mode computes derivatives by breaking down functions into elementary operations and propagating derivatives alongside values. It’s efficient for functions with fewer inputs than outputs and for Jacobian-vector products, using for instance dual numbers.

thumbnail
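
A minimal sketch of forward mode with dual numbers, supporting only addition, multiplication and sine. The class `Dual` and the helpers `sin` and `derivative` are illustrative names, not a real library API.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    val: float        # value of the expression
    tan: float = 0.0  # derivative (tangent) carried alongside the value

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule applied to the tangent part.
        return Dual(self.val * other.val, self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Promote a plain number to a constant dual number."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def derivative(f, x):
    """Evaluate f'(x) by seeding the tangent with 1."""
    return f(Dual(x, 1.0)).tan


print(derivative(lambda x: x * x + 3 * x, 2.0))
```

One pass gives the derivative with respect to one input direction, which is why the cost scales with the number of inputs.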

Sinkhorn—Knopp algorithm

2024-12-12

The Sinkhorn-Knopp algorithm is an iterative method for approximating entropic optimal transport solutions. It balances two dual vectors through iterative rescaling, leading to a matrix that respects the marginals. Thanks to its parallel nature, it’s easy to implement on GPU for ML.

thumbnail
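
The alternating rescaling can be sketched in pure Python as follows. The regularization strength `eps`, the fixed iteration count and the function name are assumptions for the example; practical implementations are vectorized and often work in the log domain.

```python
import math


def sinkhorn(a, b, C, eps, n_iter=200):
    """Entropic OT plan between histograms a and b for the cost matrix C."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, to match the prescribed marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


P = sinkhorn([0.5, 0.5], [0.3, 0.7], [[0.0, 1.0], [1.0, 0.0]], eps=1.0)
print([sum(row) for row in P])
```

Each update is a matrix-vector product followed by an elementwise division, which is why the method parallelizes well.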

Perlin's noise

2024-12-11

Perlin’s noise generates smooth textures by assigning random gradient vectors to grid points and blending values through smooth interpolation. It is used in graphics, terrain and texture generation. [ref]

thumbnail

Newton's method for root finding

2024-12-10

Newton's method is an iterative root-finding algorithm that uses the derivative of a function to approximate a root. It converges quadratically near a simple root, but may fail to converge, or converge to a different root, if the initial guess is not close enough.

thumbnail
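
A minimal sketch of the iteration, assuming the derivative is supplied by the caller. The tolerance, the iteration cap and the function name are arbitrary choices for the example.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's iteration x <- x - f(x)/f'(x) starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# Approximate sqrt(2) as the positive root of x^2 - 2.
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
```

Starting from `x0=-1.0` instead would converge to the negative root, which illustrates the dependence on the initial guess.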

Kuratowski's theorem

2024-12-09

Kuratowski's theorem states that a graph is planar (can be drawn on a plane without edge crossings) iff it does not contain a subgraph that is a subdivision of either K₅ or K₃,₃. [ref]

thumbnail

Zonotopes and Zonoids

2024-12-06

A zonoid is a convex body that is a limit of zonotopes (polytopes formed by the Minkowski sum of line segments). Bourgain et al. gave a precise bound on the number of segments required to approximate a zonoid by a zonotope in the Hausdorff metric to a given precision. [ref]

thumbnail

Morley's theorem

2024-12-05

Morley's theorem (1899) states that the intersections of the adjacent angle trisectors of any triangle define an equilateral triangle. [ref]

thumbnail

A one line proof of Cauchy—Schwarz

2024-12-04

The classical proof of Cauchy-Schwarz via the discriminant of a quadratic polynomial is boring! The inequality can be proved using only the squared norm of a sum. [ref]

thumbnail
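
One common way to write such a short argument is sketched below, for a real inner product and nonzero vectors x and y; it may differ in detail from the proof on the original slide.

```latex
\text{Let } u = \frac{x}{\|x\|},\quad v = \frac{y}{\|y\|}. \text{ Then}
\qquad
0 \le \|u \pm v\|^2 = \|u\|^2 \pm 2\langle u, v\rangle + \|v\|^2 = 2 \pm 2\langle u, v\rangle
\;\Longrightarrow\;
|\langle u, v\rangle| \le 1
\;\Longrightarrow\;
|\langle x, y\rangle| \le \|x\|\,\|y\|.
```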

Some image datasets in supervised learning

2024-12-03

There are a lot of different datasets in computer vision, with various dimensions, numbers of classes, text annotations or different kinds of structural information (e.g., bounding boxes). Some popular general datasets are MNIST, CIFAR10/100, Pascal VOC, ImageNet and COCO.

thumbnail

Birkhoff contraction theorem

2024-12-02

Birkhoff's contraction theorem states that nonnegative linear mappings (i.e., mappings that send the cone into itself) are contractions with respect to the Hilbert metric. In particular, it allows one to use the Banach fixed point theorem. [ref]

thumbnail

November 2024

Bigram model

2024-11-29

A bigram model is a language model that predicts the next token based only on the previous one. It is an example of a discrete Markov chain, and was in fact one of the first motivations of Markov himself. [ref]

thumbnail
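
A minimal sketch of fitting such a model by counting, assuming whitespace tokenization and no smoothing. The name `fit_bigram` and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def fit_bigram(tokens):
    """Estimate P(next | previous) from consecutive token pairs."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


model = fit_bigram("a b a c a b".split())
print(model["a"])
```

Each row of the resulting table is one row of the transition matrix of the underlying Markov chain.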

Two Fenchel dual problems

2024-11-28

When trying to compute a dual of a composite problem involving two functions and two linear operators (e.g., Total Variation regularization of inverse problems), it is sometimes useful to consider either of the operators as the dual operator.

thumbnail

Shannon’s entropy

2024-11-27

Shannon's entropy measures the uncertainty or information content in a probability distribution. It's an important concept in data compression and communication introduced in the seminal paper “A Mathematical Theory of Communication”. [ref]

thumbnail
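
For a discrete distribution, the entropy in bits can be computed as in this short sketch. The base-2 logarithm and the convention 0·log 0 = 0 are the usual choices, assumed here.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]), shannon_entropy([0.25] * 4))
```

A fair coin carries one bit of uncertainty, and a deterministic outcome carries none.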

Banach—Steinhaus theorem

2024-11-26

The Banach-Steinhaus theorem, or Uniform Boundedness Principle, is a key result in functional analysis. It states that for a family of continuous linear operators on a Banach space, pointwise boundedness implies uniform boundedness. [ref]

thumbnail

Hilbert metric

2024-11-25

The Hilbert projective (constant on rays) metric is a way to measure distances within a convex cone in a vector space. It is an important tool in hyperbolic geometry and for Perron-Frobenius theory. [ref]

thumbnail

Mersenne primes & Lucas—Lehmer test

2024-11-22

Mersenne primes are of the form Mₙ = 2ⁿ - 1, where n itself is a prime number. The Lucas-Lehmer test is an efficient method to check if Mₙ is prime. It iterates a sequence mod Mₙ to determine primality as used by the GIMPS project. [ref]

thumbnail
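
The test fits in a few lines. This sketch assumes the exponent `p` passed in is an odd prime (the case p = 2 is handled separately) and uses the standard sequence s₀ = 4, s ↦ s² − 2.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p prime)."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])
```

The exponent 11 is rejected because 2¹¹ − 1 = 2047 = 23 × 89.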

Bubble sort

2024-11-21

Bubble Sort is an adaptive sorting algorithm that compares adjacent items, and swaps them if they're in the wrong order. It is easy to understand and implement, but inefficient in both the worst and average cases. [ref]

thumbnail
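
A minimal sketch, including the early exit that makes the algorithm adaptive: a pass without swaps means the list is already sorted. The function returns a sorted copy, which is a choice made for the example.

```python
def bubble_sort(items):
    """Return a sorted copy of items using bubble sort."""
    a = list(items)
    n = len(a)
    while n > 1:
        swapped = False
        for i in range(n - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break  # already sorted: stop early
        n -= 1  # the largest remaining item is now in place
    return a


print(bubble_sort([5, 1, 4, 2, 8]))
```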

Tokenizer

2024-11-20

Tokenization is the process of breaking text into smaller units - tokens - which can be words, subwords, or characters. It's a first step in NLP, turning raw text into tokens that are then mapped to embeddings. The classical algorithm is Byte-Pair Encoding at the subword level. [ref]

thumbnail

Lorenz attractor

2024-11-19

The Lorenz system is a model that describes chaotic behavior in atmospheric convection. Its study reveals how small changes in initial conditions can lead to different outcomes. [ref]

thumbnail

Nets, Covering & Packing Numbers

2024-11-18

Covering numbers can be thought of as a way to quantify the “compactness” of a set in a complete metric space. They are also closely related to packing numbers. [ref]

thumbnail

(Girard)-Hutchinson estimator

2024-11-15

Hutchinson's estimator is a stochastic method for approximating the trace of large square matrices. It requires only matrix-vector products, which makes it possible to estimate the trace of implicitly defined matrices. It is useful in several settings (cross-validation, trace of the Hessian, etc). [ref]

thumbnail
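
A minimal sketch with Rademacher (±1) probe vectors, where the matrix is only accessed through a `matvec` callback. The names, the sample count and the fixed seed are assumptions for the example.

```python
import random


def hutchinson_trace(matvec, n, n_samples=1000, seed=0):
    """Estimate trace(A) as the average of z^T A z over random sign vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Az = matvec(z)
        total += sum(zi * yi for zi, yi in zip(z, Az))
    return total / n_samples


# A = [[2, 1], [1, 3]] given only through its action on a vector.
estimate = hutchinson_trace(lambda z: [2 * z[0] + z[1], z[0] + 3 * z[1]], n=2)
print(estimate)  # fluctuates around the true trace, 5
```

The estimator is unbiased because E[z zᵀ] is the identity for such probe vectors.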

Graph Isomorphism

2024-11-14

The graph isomorphism problem asks the following question: given two finite graphs, are they isomorphic? It is not known whether it is in P, NP-complete or in another class (for the moment, it defines its own complexity class GI). [ref] (in French)

thumbnail

Cumulative and quantile functions

2024-11-13

The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a value. It characterizes a distribution. The quantile function (generalized inverse of the CDF) returns the value for a given cumulative probability.

thumbnail

Universal Approximation Theorem

2024-11-12

The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer, using a non-polynomial activation function, can approximate any continuous function on a compact set, given enough neurons. [ref]

thumbnail

Prim’s algorithm

2024-11-11

Prim's algorithm is a greedy method for finding the minimum spanning tree of a weighted, connected graph. It starts with any node and repeatedly adds the smallest edge that connects a visited node to an unvisited node. [ref]

thumbnail
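
A minimal sketch using a binary heap, assuming the graph is given as a dictionary of neighbour-to-weight dictionaries. This representation and the function name are choices made for the example.

```python
import heapq


def prim(graph, start):
    """Return the edges (u, v, weight) of a minimum spanning tree."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree


graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1},
}
print(prim(graph, "a"))
```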

What implies what? Probability version

2024-11-08

Convergence of random variables (and associated probability measures) comes in several modes: almost sure, Lp, in probability, but also in Wasserstein distance or Total Variation in the space of measures. [ref]

thumbnail

Harmonic function

2024-11-07

Harmonic functions solve Laplace's equation Δu = 0 and are infinitely differentiable (and analytic). Unless constant, they exhibit no local maxima or minima within the domain, achieving extrema only on the boundary. Under Dirichlet boundary conditions, the solution of Laplace's equation is uniquely determined. [ref]

thumbnail

Convolution and Fourier

2024-11-06

Convolution theorem: the Fourier transform of the convolution of two functions (under suitable assumptions) is the product of the Fourier transforms of these two functions. [ref]

thumbnail

Dual numbers

2024-11-05

Dual numbers correspond to the extension of the real line by a nonzero nilpotent element ε (with ε² = 0). They can be thought of as a universal linearization of functions, or as the forward mode of automatic differentiation. [ref]

thumbnail
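
The arithmetic behind this can be written out in a few lines; the identities below are the standard ones, stated here for a polynomial (or analytic) function f.

```latex
\varepsilon \neq 0,\quad \varepsilon^2 = 0,
\qquad
(a + b\varepsilon)(c + d\varepsilon) = ac + (ad + bc)\,\varepsilon,
\qquad
f(a + b\varepsilon) = f(a) + b\,f'(a)\,\varepsilon .
```

For instance, with f(x) = x², one gets (a + ε)² = a² + 2aε, so the coefficient of ε is exactly f'(a) = 2a.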

Convergence of iterative differentiation

2024-11-04

Convergence of iterates does not imply convergence of the derivatives. Nevertheless, Gilbert (1994) proposed a theorem allowing the interchange of limit and derivative under a strong assumption on the spectrum of the derivatives. [ref]

thumbnail

Leader-follower games

2024-11-01

Leader-follower games, also known as Stackelberg games, are models in game theory where one player (the leader) makes a decision first, and the other player (the follower) responds, considering the leader’s action. This is one of the first instances of bilevel optimization.

thumbnail

October 2024

Matrix mortality problem

2024-10-31

The Matrix Mortality Problem asks if a given set of square matrices can multiply to the zero matrix after a finite sequence of multiplications of elements. It is undecidable for matrices of size 3x3 or larger. [ref]

thumbnail

Boolean satisfiability (SAT)

2024-10-30

The SAT problem asks whether a logical formula, composed of variables and the connectives AND, OR, NOT, can be satisfied by some assignment of True or False to the variables. SAT is NP-complete, as is 3-SAT (clauses of three literals), while 2-SAT is in P! [ref]

thumbnail

Krein—Milman theorem

2024-10-29

The Krein-Milman theorem states that any compact convex subset of a locally convex topological vector space is the closed convex hull of its extreme points. In particular, the set of extreme points of a nonempty compact convex set is nonempty. [ref]

thumbnail

Stone-Weierstrass theorem

2024-10-28

The Stone-Weierstrass theorem states that any continuous function on a compact Hausdorff space can be uniformly approximated by elements of a subalgebra, provided the subalgebra separates points and contains constants. [ref]

thumbnail

Nesterov Accelerated Gradient Descent

2024-10-25

The Nesterov Accelerated Gradient (NAG) algorithm refines gradient descent by using an extrapolation step before computing the gradient. It leads to faster convergence for smooth convex functions, achieving the optimal rate of O(1/k^2). [ref]

thumbnail

Pinhole camera model

2024-10-24

The pinhole camera model illustrates the concept of perspective projection. Light rays from a 3D scene pass through an (infinitely) small aperture, the pinhole, and project onto a 2D surface, creating an inverted image on the focal plane. [ref]

Twitter Post

thumbnail

Łojasiewicz inequality

2024-10-23

The Łojasiewicz inequality provides a way to control how close points are to the zeros of a real analytic function based on the value of the function itself. Extensions of this result to semialgebraic or o-minimal functions exist. [ref]

Twitter Post

thumbnail

0x5F3759DF

2024-10-22

The fast inverse square root trick from Quake III is an algorithm to quickly approximate 1/√x, crucial for 3D graphics (normalisation). It uses bit-level manipulation and Newton's method for refinement. [ref]

Twitter Post

thumbnail
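
The trick can be reproduced in Python by reinterpreting the bits of a 32-bit float through the `struct` module. This is a sketch of the well-known routine with a single Newton step, for positive inputs only; the function name is illustrative.

```python
import struct


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for x > 0 using the 0x5F3759DF bit trick."""
    # Reinterpret the float32 bits of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a float32: the initial guess.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton iteration on f(y) = 1/y^2 - x.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # close to 0.5
```

The integer shift and subtraction act on the exponent and mantissa at once, giving a first guess that one Newton step refines to a fraction of a percent.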

Monge problem: existence? uniqueness?

2024-10-21

A Monge map, i.e., a solution to optimal transport Monge problems, may not always exist, be unique, or be symmetric with respect to the source and target distributions. It was one of the motivations to introduce the Kantorovich relaxation. [ref]

Twitter Post

thumbnail

Attouch's theorem

2024-10-18

Attouch's theorem relates the (epi-)convergence of convex lsc functions to the convergence of their subdifferentials, seen as graphs in the product of the space and its dual. [ref]

Twitter Post

thumbnail

Brenier's theorem

2024-10-17

Brenier's theorem states that the optimal transport map between two probability measures for quadratic cost is the gradient of a convex function. Moreover, it is uniquely defined up to a Lebesgue negligible set. [ref]

Twitter Post

thumbnail

Weierstrass function

2024-10-16

The Weierstrass function is a famous example of a continuous function that is nowhere differentiable. It defies intuition by being continuous everywhere but having no tangent at any point. It was introduced by Karl Weierstrass in 1872.

Twitter Post

thumbnail

Proximal operator

2024-10-15

The proximal operator generalizes projection in convex optimization. It converts minimisers to fixed points. It is at the core of nonsmooth splitting methods and was first introduced by Jean-Jacques Moreau in 1965. [ref]

Twitter Post

thumbnail
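
A classical closed-form instance is the proximal operator of λ|·| in 1D, known as soft-thresholding. The sketch below assumes this particular function; the name `prox_abs` is illustrative.

```python
import math


def prox_abs(x, lam):
    """Proximal operator of lam * |.|: argmin_u 0.5*(u - x)^2 + lam*|u|."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


print(prox_abs(3.0, 1.0), prox_abs(-0.5, 1.0), prox_abs(-3.0, 1.0))
```

The minimizer 0 of |·| is a fixed point of the operator, in line with the correspondence between minimizers and fixed points.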

Berry—Esseen Theorem (i.i.d version)

2024-10-14

The Berry-Esseen theorem quantifies how fast the distribution of the sum of independent random variables converges to a normal distribution, as described by the Central Limit Theorem (CLT). It provides an upper bound depending on skewness and sample size. [ref]

Twitter Post

thumbnail

Bilevel Optimization: optimistic vs pessimistic

2024-10-11

Bilevel optimization problems with multiple inner solutions come typically in two flavors: optimistic and pessimistic. Optimistic assumes the inner problem selects the best solution for the outer objective, while pessimistic assumes the worst-case solution is chosen.

Twitter Post

thumbnail

Geometry — according to Klein

2024-10-10

Klein geometry, part of the Erlangen Program introduced by F. Klein, studies geometry via transformation groups. A Klein geometry is defined as a space X=G/H, where G is a Lie group acting transitively on X and H is the stabilizer of a point. This unifies classical geometries under symmetry. [ref]

Twitter Post

thumbnail

Fenchel conjugate in 1D

2024-10-09

The Fenchel conjugate f*(y) is the maximum vertical gap between the line yx and the graph of f. This maximum occurs where the line is tangent to the curve, meaning f'(x) = y. It captures the largest gap at that tangent point.

Twitter Post

thumbnail
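
A short worked example of the tangency condition, for the quadratic f(x) = x²/2 (chosen here for illustration):

```latex
f^*(y) = \sup_{x \in \mathbb{R}} \big( xy - f(x) \big)
       = \sup_{x \in \mathbb{R}} \Big( xy - \tfrac{1}{2}x^2 \Big),
\qquad
\frac{d}{dx}\Big( xy - \tfrac{1}{2}x^2 \Big) = y - x = 0 \iff x = y = f'(x),
\qquad
f^*(y) = y \cdot y - \tfrac{1}{2}y^2 = \tfrac{1}{2}y^2 .
```

So this f is its own conjugate.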

Gradient conjecture of R. Thom

2024-10-08

The Thom gradient conjecture states that a trajectory of the gradient flow of a real-analytic function that converges does so in such a way that its secants also converge. It was proved by Kurdyka, Mostowski, and Parusiński in 2000. [ref]

Twitter Post

thumbnail

Johnson–Lindenstrauss lemma (1984)

2024-10-07

The Johnson–Lindenstrauss Lemma states that a set of high-dimensional points can be mapped into a much lower-dimensional space while approximately preserving pairwise distances. This is useful for dimensionality reduction, clustering, etc. [ref]

Twitter Post

thumbnail

Stein's lemma

2024-10-04

Stein's Lemma states that for a standard normal variable X, E[Xg(X)] = E[g’(X)] for any absolutely continuous g (differentiable a.e.) such that E[|g’(X)|] < ∞. It is a central result for characterizing (close-to) Gaussian data. [ref]

Twitter Post

thumbnail

Erdős–Rényi(–Gilbert) graph models

2024-10-03

Erdős–Rényi–Gilbert graph models are two random graph models introduced in 1959 that share many properties. In particular, G(n, p) exhibits a sharp threshold for connectedness at p ≈ ln n / n. [ref]

Twitter Post

GitHub File

thumbnail

Law(s) of Large Numbers

2024-10-02

The law of large numbers states that the empirical mean of an integrable random sample converges towards its mean in probability (weak LLN) and almost surely (strong LLN). [ref]

Twitter Post

GitHub File

thumbnail

Tarski—Seidenberg: logical statement

2024-10-01

The Tarski-Seidenberg theorem in logical form states that the first-order theory of the real numbers (as an ordered field) admits quantifier elimination. This means any formula with quantifiers can be converted into an equivalent quantifier-free formula. [ref]

Twitter Post

thumbnail

September 2024

Davis—Kahan sin(θ) inequality

2024-09-30

The Davis-Kahan sin(θ) theorem bounds the difference between the subspaces spanned by eigenvectors of two symmetric matrices, based on the difference between the matrices. It quantifies how small perturbations in a matrix affect its eigenvectors. [ref]

Twitter Post

thumbnail

Grid search

2024-09-27

Grid search is the most popular method for hyperparameter optimization in machine learning. Given a performance metric, it simply “exhaustively” evaluates this metric on a discretisation of the hyperparameter space. It suffers from the curse of dimensionality.

Twitter Post

thumbnail

Determinant vs Permanent

2024-09-26

The determinant and permanent of a matrix may look similar, but they differ fundamentally: the determinant includes the signatures of permutations, while the permanent uses only addition. One is easy to compute, the other is expected to be very hard. [ref]

Twitter Post

thumbnail
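
To make the similarity concrete, both quantities are computed below from the same sum over permutations, the only difference being the sign. This brute-force sketch takes factorial time and is meant for tiny matrices only; the helper names are illustrative.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Signature of a permutation, computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def determinant(A):
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def permanent(A):
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))


A = [[1, 2], [3, 4]]
print(determinant(A), permanent(A))
```

The determinant can instead be computed in polynomial time by Gaussian elimination, whereas no comparable shortcut is known for the permanent.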

Chordal slope lemma

2024-09-25

If you pick three points in increasing order along the graph of a convex function, the slope of the line between the first two points is always less than or equal to the slope of the line between the last two points (chordal slope lemma).

Twitter Post

thumbnail

Finite-difference in 1D (forward)

2024-09-24

Finite difference approximation is a numerical method to compute derivatives by discretization. The forward difference illustrated here is a first-order approximation: its error is of the order of the step size. [ref]

Twitter Post

thumbnail
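
The first-order behaviour can be observed numerically: dividing the step by 10 divides the error by roughly 10. A minimal sketch, with the exponential at 0 chosen as test function:

```python
import math


def forward_diff(f, x, h):
    """Forward finite-difference approximation of f'(x) with step h."""
    return (f(x + h) - f(x)) / h


# The exact derivative of exp at 0 is 1.
err_coarse = abs(forward_diff(math.exp, 0.0, 1e-2) - 1.0)
err_fine = abs(forward_diff(math.exp, 0.0, 1e-3) - 1.0)
print(err_coarse / err_fine)  # roughly 10
```

For very small steps, floating-point cancellation in `f(x + h) - f(x)` eventually dominates, so the step cannot be reduced indefinitely.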

Barycenters in the Wasserstein space

2024-09-23

Barycenters in the Wasserstein space were studied by Agueh & Carlier in 2011. They define a meaningful geometric notion of mean for probability distributions, and are used in practice, for instance, for histogram transfer (Rabin et al. ’15). [ref]

Twitter Post

GitHub File

thumbnail

Tarski—Seidenberg: geometric statement

2024-09-20

The Tarski-Seidenberg theorem claims that semialgebraic sets over 𝐑 are stable under projection. [ref]

Twitter Post

thumbnail

Semialgebraic set

2024-09-19

Semialgebraic sets can be expressed as finite unions of sets defined by polynomial equalities and inequalities. [ref]

Twitter Post

thumbnail

Banach fixed point & Picard’s iterations

2024-09-18

The Banach fixed point theorem guarantees the existence and uniqueness of a fixed point of a contraction mapping on a complete metric space. It is at the root of the fixed point method.

Twitter Post

thumbnail
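
Picard's iteration simply applies the map repeatedly. A minimal sketch on the real line, using the cosine as an example of a map that is a contraction near its fixed point; the stopping rule and names are choices made for the example.

```python
import math


def picard(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- T(x) until two successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed point iteration did not converge")


x_star = picard(math.cos, 1.0)
print(x_star, math.cos(x_star))
```

The convergence is linear, with a rate given by the contraction constant.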

Dual and polar cones

2024-09-17

The dual cone and the polar cone are sets in the dual of a vector space. They generalize the notion of orthogonal subspace.

Twitter Post

thumbnail

Voronoi diagram

2024-09-16

A Voronoi diagram partitions the space into proximity cells according to a set of sites. Each cell is the portion of the space whose points are closer to a given site than to any other site. [ref]

Twitter Post

GitHub File

thumbnail

Hamming distance

2024-09-13

Hamming distance is fundamental in coding theory: it corresponds to the number of different characters between two words of the same length. [ref]

Twitter Post

thumbnail

The rendering equation

2024-09-12

The rendering equation is the fundamental recursive integral equation of light transport proposed by Kajiya in 1986, at the heart of (ray) path-tracing and particle tracing. Numerous sampling methods have been proposed to solve it numerically. [ref]

Twitter Post

thumbnail

Random graph models

2024-09-11

Latent position models encompass popular random models of graphs such as SBM, Erdos-Renyi, etc. They are based on a latent space and a connectivity kernel, and strongly linked to graphons in the dense case. [ref]

Twitter Post

thumbnail

Clarke Generalized derivative

2024-09-10

The Clarke derivative generalizes the notion of convex subdifferential to nonconvex locally Lipschitz functions. Thanks to Rademacher’s theorem, it can be defined as the convex hull of the limits of gradients taken along sequences of differentiability points converging to the point of interest. [ref]

Twitter Post

thumbnail

Complex step approximation

2024-09-09

Complex step approximation is a numerical method to approximate the derivative from a single function evaluation using complex arithmetic. It is a kind of “poor man's” automatic differentiation. [ref]

Twitter Post

thumbnail
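
A minimal sketch of the formula f'(x) ≈ Im f(x + ih) / h, assuming f is analytic and implemented with operations that accept complex inputs. The step `h=1e-20` and the test function are choices made for the example.

```python
import cmath


def complex_step(f, x, h=1e-20):
    """Approximate f'(x) from a single evaluation of f at x + i*h."""
    return f(complex(x, h)).imag / h


def f(z):
    # Example function: exp(z) * z^2, whose derivative at 1 is 3e.
    return cmath.exp(z) * z * z


print(complex_step(f, 1.0), 3 * cmath.e)
```

Unlike finite differences, no subtraction of nearby values occurs, so the step can be taken extremely small without cancellation error.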

Stochastic Block Model

2024-09-06

The stochastic block model is a popular generative model for graphs proposed by Holland, Laskey and Leinhardt in 1983 to model communities. [ref]

Twitter Post

GitHub File

thumbnail

Thomae's popcorn & Dirichlet functions

2024-09-05

Thomae’s popcorn function is a (1-periodic) function that is discontinuous at every rational but continuous at every irrational, and that is nowhere differentiable. The Dirichlet function is nowhere continuous on ℝ. [ref]

Twitter Post

GitHub File

thumbnail

Weisfeiler-Leman test

2024-09-04

(1-dimensional) Weisfeiler-Leman is a heuristic for the graph isomorphism problem. It is a color refinement algorithm akin to a basic message passing scheme. [ref]

Twitter Post

thumbnail

Talagrand's inequality

2024-09-03

Talagrand's inequality is a probabilistic isoperimetric inequality that allows one to derive concentration inequalities around the median. This is an instance of "concentration of measure", part of the work that earned him the Abel Prize in 2024. [ref]

Twitter Post

thumbnail

Representation of closed set

2024-09-02

Every closed subset of 𝐑ⁿ is the zero set of an infinitely differentiable function from 𝐑ⁿ to 𝐑 (due to Whitney).

Twitter Post

thumbnail