The electronic band structure and crystal structure are the two complementary identifiers of solid-state materials. Although convenient instruments and reconstruction algorithms have made large empirical crystal-structure databases possible, extracting the quasiparticle dispersion (closely related to band structure) from photoemission band mapping data is currently limited by the available computational methods. To cope with the growing size and scale of photoemission data, here we develop a pipeline for band-structure reconstruction that leverages theoretical calculations and includes probabilistic machine learning together with the associated data processing, optimization, and evaluation methods. The pipeline reconstructs all 14 valence bands of a semiconductor and shows excellent performance on benchmarks and other materials datasets. The reconstruction uncovers previously inaccessible momentum-space structural information on both global and local scales while opening a path toward integration with materials science databases. Our approach illustrates the potential of combining machine learning and domain knowledge for scalable feature extraction in multidimensional data.
Full story: Xian et al., Nature Comp. Sci., advance online publication.
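The abstract above does not spell out the probabilistic model, so the following is only a minimal illustrative sketch and not the authors' pipeline. It treats reconstructing a single band along one momentum cut as maximum a posteriori (MAP) inference in a chain-structured Markov random field, solved exactly by max-sum (Viterbi) dynamic programming. Bright photoemission pixels are rewarded, abrupt energy jumps between neighbouring momentum points are penalized, and a theoretical band (for example from DFT) enters as a soft prior. The function name `reconstruct_band`, the quadratic penalty terms, the weights, and the synthetic data are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def reconstruct_band(
    intensity: Sequence[Sequence[float]],
    energies: Sequence[float],
    theory_band: Sequence[float],
    smooth_weight: float = 1.0,   # hypothetical weight of the smoothness term
    prior_weight: float = 0.1,    # hypothetical weight of the theory prior
) -> List[float]:
    """Illustrative MAP band on a 1D chain MRF, solved exactly by max-sum (Viterbi).

    intensity[i][j]: photoemission intensity at momentum index i, energy index j.
    energies[j]:     energy grid values.
    theory_band[i]:  assumed theoretical band energy at momentum index i.
    """
    n_k, n_e = len(intensity), len(energies)

    def unary(i: int, j: int) -> float:
        # Reward bright pixels; penalize distance from the theoretical band.
        return intensity[i][j] - prior_weight * (energies[j] - theory_band[i]) ** 2

    def pairwise(j_prev: int, j: int) -> float:
        # Penalize energy jumps between neighbouring momentum points.
        return -smooth_weight * (energies[j] - energies[j_prev]) ** 2

    # score[j]: best total score over momentum points 0..i ending at energy index j.
    score = [unary(0, j) for j in range(n_e)]
    backptr: List[List[int]] = []
    for i in range(1, n_k):
        new_score, ptrs = [], []
        for j in range(n_e):
            best_prev = max(range(n_e), key=lambda jp: score[jp] + pairwise(jp, j))
            new_score.append(score[best_prev] + pairwise(best_prev, j) + unary(i, j))
            ptrs.append(best_prev)
        score = new_score
        backptr.append(ptrs)

    # Trace the optimal energy path back from the best final state.
    j = max(range(n_e), key=lambda jj: score[jj])
    path = [j]
    for ptrs in reversed(backptr):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return [energies[j] for j in path]


# Synthetic example: a dispersing band with no signal at the third momentum point.
energies = [-2.0, -1.5, -1.0, -0.5, 0.0]
intensity = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],  # missing signal; smoothness and prior fill the gap
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
theory = [-1.4, -1.1, -0.9, -0.6]  # hypothetical, slightly offset theory band
band = reconstruct_band(intensity, energies, theory)
print(band)  # [-1.5, -1.0, -1.0, -0.5]
```

Max-sum is exact here only because a single momentum cut forms a chain. A band sampled on a 2D momentum grid gives a loopy graph, where approximate inference would be needed, and the real pipeline may weight, initialize, or optimize its terms quite differently.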