
Nhat Ho
Brief Biography
I am currently an Assistant Professor of Statistics and Data Sciences at the University of Texas at Austin. I am also a core member of the Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning (IFML). Before moving to Austin, I was a postdoctoral fellow in the Electrical Engineering and Computer Science (EECS) Department at the University of California, Berkeley, where I was very fortunate to be mentored by Professor Michael I. Jordan and Professor Martin J. Wainwright. Before that, I received my PhD in 2017 from the Department of Statistics at the University of Michigan, Ann Arbor, where I was very fortunate to be advised by Professor Long Nguyen and Professor Ya'acov Ritov.
Research Interests
My research centers on four important aspects of complex and large-scale models and data.
For the first aspect (1), we use insights from statistical machine learning modeling and theory, as well as Hamilton-Jacobi partial differential equations (PDEs), to understand deep learning and complex machine learning models. Examples of our work (T.1, T.2) include using mixture and hierarchical models to reduce redundancy in the Transformer, a recent state-of-the-art deep learning architecture for language and computer vision applications. We also use the Fourier Integral Theorem and its generalized version (T.3), a beautiful result in mathematics, to improve the interpretability and performance of the Transformer. The Fourier Integral Theorem is also used in our other works to build estimators for other machine learning and statistics applications (T.4). Finally, we develop a Bayesian deconvolution model (T.5) to understand Convolutional Neural Networks.
For the second aspect (2), we focus on improving the scalability of optimal transport and mitigating its curse of dimensionality in deep learning applications such as deep generative models and domain adaptation. To address the curse of dimensionality, we propose several new variants of sliced optimal transport (OT.1, OT.2) that not only circumvent the curse of dimensionality of optimal transport but also improve the sampling scheme and training procedure of sliced optimal transport by selecting the most important directions. To address scalability, we propose new minibatch frameworks (OT.3, OT.4) that mitigate the misspecified matching issues of current minibatch optimal transport methods in the literature. Furthermore, we develop several optimization algorithms with near-optimal computational complexities (OT.5, OT.6, OT.7) for approximating optimal transport and its variants.
For the third aspect (3), we study the interplay and trade-offs among the instability, statistical accuracy, and computational efficiency of optimization and sampling algorithms (O.1) for parameter estimation in statistical machine learning models.
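To make the sliced optimal transport idea from the second aspect more concrete, here is a minimal sketch of the standard Monte Carlo estimate of the sliced Wasserstein distance. It is not an implementation of the variants in OT.1 or OT.2: it samples projection directions uniformly at random rather than selecting the most important ones, and it assumes two point clouds with the same number of equally weighted points. The function names are hypothetical. Each random direction reduces the problem to one-dimensional optimal transport, which is solved exactly by sorting, so the cost per direction is O(n log n) in the number of points rather than growing with the difficulty of high-dimensional matching.

```python
import math
import random


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance raised to the power p between two empirical 1D
    measures with the same number of equally weighted atoms.
    In 1D the optimal coupling matches sorted samples in order."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)


def random_direction(dim, rng):
    """Uniformly distributed unit vector: a normalized standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, num_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between point
    clouds X and Y (sequences of equal-length tuples, same number of points).
    Directions are sampled uniformly; no direction selection is performed."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(dim, rng)
        # Project every point onto theta, reducing the problem to 1D OT.
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(px, py, p)
    # Average the 1D costs over directions, then take the p-th root.
    return (total / num_projections) ** (1.0 / p)
```

For example, the one-dimensional clouds [(0.0,), (1.0,)] and [(2.0,), (3.0,)] differ by a pure shift of 2, so every projection yields the same cost and sliced_wasserstein returns 2.0. The uniform sampling of directions is exactly the part that direction-selection variants aim to improve, since many random directions can carry little information about the difference between the two distributions.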
Building on these insights from the third aspect, we provide rigorous characterizations of the statistical behavior of the Expectation-Maximization (EM) algorithm (O.2, O.3) for mixture models and of factorized gradient descent for a class of low-rank matrix factorization problems (O.4). Finally, in recent work, we propose an exponential step-size schedule for gradient descent (O.5) and show that this algorithm achieves the optimal linear computational complexity for parameter estimation in statistical machine learning models.
For the fourth aspect (4), our work includes studying the statistical and geometric behaviors of latent variables in mixture and hierarchical models via tools from optimal transport, quantization theory, and algebraic geometry. For example, we demonstrate that the convergence rates of maximum likelihood estimation (MLE) for finite mixture models and finite mixtures of experts are determined by the solvability of certain systems of polynomial equations, one of the key problems in algebraic geometry (M.1, M.2, M.3, M.4). These theories on the convergence rates of the MLE also lead to a novel model selection procedure (M.5) for finite and infinite mixture models.
Apart from these topics, we also study Bayesian inference and asymptotics from new perspectives. For example, we use diffusion processes to establish posterior convergence rates of parameters in statistical models (E.1) and employ the Fourier Integral Theorem to establish the posterior consistency of Bayesian nonparametric models (E.2).
Codes
The official GitHub link for the code of research papers from our Data Science and Machine Learning (DSML) Lab is: https://github.com/UTAustinDataScienceGroup.
Editorial Boards of Journals
Area Chairs of Conferences in Machine Learning
Recent News
Selected Publications on Theory (Hierarchical and Mixture Models, (Non)Convex Optimization, Optimal Transport, (Approximate) Bayesian Inference, etc.)
Selected Publications on Methods and Applications (Generative Models, Optimal Transport, Transformer, Convolutional Neural Networks, etc.)
