
Quantifying Knowledge Evolution With Thermodynamics: A Data-Driven Study of Scientific Concepts
We develop a data-driven framework for analyzing how scientific concepts evolve through their empirical in-text frequency distributions in large text corpora. For each concept, the observed distribution is paired with a maximum entropy equilibrium reference, which takes a generalized Boltzmann form determined by two measurable statistical moments. Using data from more than 500,000 physics papers (about 13,000 concepts, 2000–2018), we reconstruct the temporal trajectories of the associated MaxEnt parameters and entropy measures, and we identify two characteristic regimes of concept dynamics, stable and driven, separated by a transition point near criticality. Departures from equilibrium are quantified using a residual-information measure that captures how much structure a concept exhibits beyond its equilibrium baseline. To analyze temporal change, we adapt the Hatano–Sasa and Esposito–Van den Broeck decomposition to discrete time and separate maintenance-like contributions from externally driven reorganization. The proposed efficiency indicators describe how concepts sustain or reorganize their informational structure under a finite representational capacity. Together, these elements provide a unified and empirically grounded description of concept evolution in scientific communication, based on equilibrium references, nonequilibrium structure, and informational work.
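The pairing of an observed distribution with a maximum entropy reference can be illustrated with a minimal sketch, under assumptions that the abstract does not fix. The two constrained moments are assumed here to be the mean of the in-text frequency k and the mean of log k, the residual-information measure is assumed to be the Kullback–Leibler divergence from the observed distribution to its MaxEnt reference, and the counts are invented for illustration. The names `fit_maxent`, `entropy`, and `kl_divergence` are hypothetical helpers, not the paper's code.

```python
import numpy as np


def fit_maxent(features, targets, iters=200, tol=1e-9):
    """Fit p(k) proportional to exp(-features @ lam) so that E_p[features] = targets.

    Minimizes the convex dual log Z(lam) + lam . targets with a damped
    Newton iteration (backtracking line search).
    """
    lam = np.zeros(features.shape[1])

    def dual(lam):
        logits = -features @ lam
        shift = logits.max()  # stabilizes the log-sum-exp
        log_z = shift + np.log(np.exp(logits - shift).sum())
        return log_z + lam @ targets, np.exp(logits - log_z)

    value, p = dual(lam)
    for _ in range(iters):
        mean = p @ features
        grad = targets - mean  # gradient of the dual
        if np.abs(grad).max() < tol:
            break
        centered = features - mean
        hess = (centered * p[:, None]).T @ centered  # covariance of features under p
        step = np.linalg.solve(hess, grad)
        t = 1.0
        new_value, new_p = dual(lam - t * step)
        # Halve the step until the dual decreases sufficiently (Armijo rule).
        while new_value > value - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
            new_value, new_p = dual(lam - t * step)
        lam, value, p = lam - t * step, new_value, new_p
    return p, lam


def entropy(dist):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    nz = dist > 0
    return -np.sum(dist[nz] * np.log(dist[nz]))


def kl_divergence(q, p):
    """D(q || p) in nats; assumes p > 0 wherever q > 0."""
    nz = q > 0
    return np.sum(q[nz] * np.log(q[nz] / p[nz]))


# Illustrative (made-up) counts: number of papers mentioning a concept k times.
k = np.arange(1, 10)
counts = np.array([40, 25, 12, 8, 6, 4, 3, 1, 1], dtype=float)
q = counts / counts.sum()

# Assumed constraint functions: k and log k.
features = np.column_stack([k, np.log(k)])
p, lam = fit_maxent(features, q @ features)

residual = kl_divergence(q, p)
print("MaxEnt parameters:", lam)
print("Residual information (nats):", residual)
print("Entropy gap H(p) - H(q):", entropy(p) - entropy(q))
```

Because the reference matches the constrained moments of the observed distribution, the divergence computed this way equals the entropy gap H(p) - H(q), so under these assumptions it measures how much structure the observed distribution carries beyond its equilibrium baseline.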





