Nhat Ho


Assistant Professor
Department of Statistics and Data Sciences
The University of Texas at Austin

Other affiliations:
Core member, Machine Learning Laboratory
Senior personnel, Institute for Foundations of Machine Learning (IFML)

Email: minhnhat@utexas.edu
Office: WEL 5.242, 105 E 24th Street Austin, TX 78712

Brief Biography


I am currently an Assistant Professor of Statistics and Data Sciences at the University of Texas at Austin. I am also a core member of the Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning (IFML). Before coming to Austin, I was a postdoctoral fellow in the Electrical Engineering and Computer Science (EECS) Department at the University of California, Berkeley, where I was very fortunate to be mentored by Professor Michael I. Jordan and Professor Martin J. Wainwright. Going further back, I finished my Ph.D. in 2017 at the Department of Statistics, University of Michigan, Ann Arbor, where I was very fortunate to be advised by Professor Long Nguyen and Professor Ya'acov Ritov.

Research Interests


My research centers on four important aspects of complex and large-scale models and data:

  • (1) Heterogeneity of complex data, including Mixture and Hierarchical Models, Bayesian Nonparametrics, etc.;

  • (2) Interpretability, Efficiency, Scalability, and Robustness of deep learning and complex machine learning models, including Transformer architectures, Deep Generative Models, Convolutional Neural Networks, etc.;

  • (3) Scalability of Optimal Transport for machine learning and deep learning applications;

  • (4) Stability and Optimality of optimization and sampling algorithms for solving complex statistical machine learning models.

--- For the first aspect (1), examples of our work include the statistical and geometric behaviors of latent variables in sparse and high-dimensional mixture and hierarchical models, studied via tools from optimal transport, quantization theory, and algebraic geometry. For example, we demonstrate that the convergence rates of maximum likelihood estimation for finite mixture models and input-independent gating mixtures of experts are determined by the solvability of a system of polynomial equations, one of the key problems in algebraic geometry (M.1, M.2, M.3, M.4). These theories on the convergence rates of the MLE also lead to a novel model selection procedure (M.5) for finite and infinite mixture models.

Recently, we provided a comprehensive theory for the long-standing open problem of parameter and expert estimation in softmax gating Gaussian mixtures of experts (M.6), a class of conditional mixtures widely used in machine learning and deep learning to scale up large neural network architectures. Our theory relies on defining novel Voronoi-based losses among parameters, which precisely capture the intrinsic interaction (via partial differential equations with respect to model parameters) between the softmax gating function and the expert functions. In a subsequent work (M.7), we also established general theories for softmax gating mixtures of experts under the least-squares loss. Furthermore, we carried these insights into understanding several other important variants of softmax gating mixtures of experts currently used to scale up Transformers and large language models, including top-K sparse mixtures of experts (M.8) and dense-to-sparse (equivalently, temperature softmax) mixtures of experts (M.9), as well as variants used in other machine learning tasks (M.10, M.11).
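As background on this model class, here is a minimal numerical sketch of the standard softmax-gated mixture of experts: a gating network turns each input into a softmax distribution over experts, and the layer output is the gate-weighted combination of the expert outputs. This illustrates the generic formulation only, not the estimators analyzed in (M.6)-(M.11); all names and shapes are illustrative.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_gated_moe(x, W_gate, experts):
    """Dense softmax-gated MoE: every expert sees every input, and the
    outputs are mixed with softmax gating weights."""
    gate = softmax(x @ W_gate)                             # (n, K) mixture weights
    outputs = np.stack([f(x) for f in experts], axis=-1)   # (n, d_out, K)
    return np.einsum('nk,ndk->nd', gate, outputs)          # gate-weighted sum

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
W_gate = rng.normal(size=(3, 2))                            # gating scores for K=2 experts
# Two linear experts, each with its own weight matrix (3 -> 4)
experts = [lambda x, A=rng.normal(size=(3, 4)): x @ A for _ in range(2)]

y = softmax_gated_moe(x, W_gate, experts)
print(y.shape)  # (5, 4)
```

Sparse (top-K) variants keep only the K largest gating weights per input and renormalize, which is what makes the architecture cheap to scale.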

From the methodology and application sides, we recently developed a novel, effective training scheme for sparse mixtures of experts via expert competition for scaling up large-scale AI models (M.12), and utilized mixtures of experts with a Laplace gating function in Transformers, a recent state-of-the-art deep learning architecture for language and computer vision applications, to build large multimodal models for the multimodal data appearing in electronic health records (M.13).

--- For the second aspect (2), we utilize insights from statistical machine learning models and theories, as well as Hamilton-Jacobi partial differential equations (PDEs), to understand deep learning and complex machine learning models. Examples of our work include using mixture and hierarchical models (T.1, T.2) to reduce redundancy in Transformers, and interpreting Transformers through primal-dual frameworks from support vector regression (T.3). Furthermore, we utilize the Fourier Integral Theorem and its generalized version (T.4), a beautiful result in mathematics, to improve the interpretability and performance of Transformers. The Fourier Integral Theorem is also used in our other works to build estimators for other machine learning and statistics applications (T.5). Finally, we develop a Bayesian deconvolution model (T.6) to understand Convolutional Neural Networks, and provide a complete theory for the neural collapse phenomenon in deep linear neural networks (T.7).
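Since the Fourier Integral Theorem recurs above, it may help to state its classical form (the textbook statement, not the generalized version of (T.4)). For a suitably integrable function $f:\mathbb{R}^d\to\mathbb{R}$,

```latex
f(x) \;=\; \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}
  \cos\!\big(s^{\top}(x-y)\big)\, f(y)\,\mathrm{d}y\,\mathrm{d}s,
```

and truncating the outer integral to $[-R,R]^d$ (using $\int_{-R}^{R}\cos(s\,t)\,\mathrm{d}s = 2\sin(R\,t)/t$ coordinate-wise) yields the sinc-kernel approximation

```latex
f_R(x) \;=\; \frac{1}{\pi^d}\int_{\mathbb{R}^d}
  \prod_{j=1}^{d}\frac{\sin\!\big(R\,(x_j-y_j)\big)}{x_j-y_j}\, f(y)\,\mathrm{d}y .
```

Replacing $f(y)\,\mathrm{d}y$ with an empirical measure turns $f_R$ into a kernel-type estimator whose only tuning parameter is the truncation level $R$, which is the basic mechanism behind such estimators.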

--- For the third aspect (3), we focus on improving the scalability of optimal transport and mitigating its curse of dimensionality in deep learning applications such as deep generative models, domain adaptation, etc. For the curse of dimensionality, we propose several new variants of sliced optimal transport (OT.1, OT.2, OT.3, OT.4, OT.5) that not only circumvent the curse of dimensionality of optimal transport but also improve the sampling scheme and training procedure of sliced optimal transport so that the most informative projection directions are selected. For scalability, we propose new minibatch frameworks (OT.6, OT.7) to alleviate the misspecified matching issues of current minibatch optimal transport methods in the literature. Furthermore, we develop several optimization algorithms with near-optimal computational complexities (OT.8, OT.9, OT.10) for approximating optimal transport and its variants.
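To make the sliced idea concrete, here is a minimal Monte Carlo sketch of the standard sliced 2-Wasserstein distance (the baseline that the variants in (OT.1)-(OT.5) improve upon, not any specific method from those papers): project both samples onto random directions on the unit sphere and average the one-dimensional Wasserstein distances, which have a closed form via sorted projections for equal-size samples.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between
    two equal-size samples X, Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions uniform on the unit sphere (normalized Gaussians)
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # 1-D W2 between empirical measures = L2 distance of sorted samples
    px = np.sort(X @ theta.T, axis=0)   # (n, n_projections)
    py = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = rng.normal(size=(500, 3)) + 2.0     # same shape, shifted mean

print(sliced_wasserstein(X, X))          # 0 for identical samples
print(sliced_wasserstein(X, Y))          # positive for the shifted copy
```

Each projection costs only a sort, O(n log n), versus the much heavier linear program of full optimal transport; the "important directions" line of work replaces the uniform random `theta` with informative ones.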

--- For the fourth aspect (4), we study the interplay and trade-offs among instability, statistical accuracy, and computational efficiency of optimization and sampling algorithms (O.1) for parameter estimation in statistical machine learning models. Based on these insights, we provide rigorous characterizations of the statistical behavior of the Expectation-Maximization (EM) algorithm (O.2, O.3) for solving mixture models and of factorized gradient descent for a class of low-rank matrix factorization problems (O.4). Finally, in recent work, we propose an exponential step-size schedule for gradient descent (O.5) and demonstrate that it attains the optimal linear computational complexity for parameter estimation in statistical machine learning models.
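For readers less familiar with EM, here is a minimal sketch of the vanilla algorithm for a 1-D two-component Gaussian mixture, the kind of model whose statistical behavior is analyzed in (O.2, O.3); this is the textbook iteration, with an illustrative quantile-based initialization, not the analyses from those papers.

```python
import numpy as np

def em_gaussian_mixture(x, K=2, n_iter=100):
    """Vanilla EM for a 1-D Gaussian mixture with K components:
    alternate posterior responsibilities (E-step) with closed-form
    weight/mean/variance updates (M-step)."""
    w = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k * N(x_i | mu_k, var_k)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted moment updates
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 400), rng.normal(3, 1, 600)])
w, mu, var = em_gaussian_mixture(x)
print(np.sort(mu))  # component means recovered near -2 and 3
```

The instability/accuracy trade-off studied in (O.1)-(O.3) concerns exactly this iteration: how fast the sequence of M-step estimates contracts toward the truth, and how that rate degrades when components overlap or the model is over-specified.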

--- Apart from these topics, we also study Bayesian inference and asymptotics from new perspectives. For example, we utilize diffusion processes to establish posterior convergence rates of parameters in statistical models (E.1), and employ the Fourier Integral Theorem to establish the posterior consistency of Bayesian nonparametric models (E.2).

Codes


The official GitHub link for code from the research papers of our Data Science and Machine Learning (DSML) Lab is: https://github.com/UT-Austin-Data-Science-Group.

Editorial Boards of Journals


Area Chairs of Conferences in Machine Learning and Artificial Intelligence


Media Coverage


Data Science, Machine Learning, Statistics, and Artificial Intelligence have become very important fields in Vietnam in recent years. However, as these fields are still very young in Vietnam, the younger Vietnamese generation often faces challenges in equipping themselves with enough information, knowledge, and skills to pursue careers in them. For this reason, several leading newspapers and shows in Vietnam have covered my path and story of becoming a professor at a leading US university, as well as my opinions about these fields, to inspire and inform young people in Vietnam who would like to pursue careers in Data Science, Machine Learning, Statistics, and Artificial Intelligence, including:

Recent News


  • [02/2024] The paper " On the computational and statistical complexity of over-parameterized matrix sensing " , coauthored with Jiacheng Zhuo, Jeongyeol Kwon, Constantine Caramanis was accepted to Journal of Machine Learning Research (JMLR)

  • [02/2024] The paper " On integral theorems and their statistical properties " , coauthored with Stephen Walker was accepted to Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

  • [02/2024] One paper (1) was accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  • [01/2024] I am glad to serve as an Area Chair of the International Conference on Machine Learning (ICML), 2024

  • [01/2024] Six papers (1, 2, 3, 4, 5, 6) were accepted to International Conference on Learning Representations (ICLR), 2024

  • [01/2024] Two papers (1, 2) were accepted to International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  • [12/2023] The paper " A diffusion process perspective on the posterior contraction rates for parameters ", coauthored with Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan was accepted to SIAM Journal on Mathematics of Data Science (SIMODS)

  • [09/2023] Six papers (1, 2, 3, 4, 5, 6) were accepted to Conference on Neural Information Processing Systems (NeurIPS), 2023

  • [08/2023] I am glad to serve as an Area Chair of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  • [04/2023] Four papers (1, 2, 3, 4) were accepted to International Conference on Machine Learning (ICML), 2023

  • [04/2023] The paper " A Bayesian perspective of convolutional neural networks through a deconvolutional generative model " , coauthored with Tan Nguyen, Ankit Patel, Richard Baraniuk, Anima Anandkumar, Michael I. Jordan was accepted to Journal of Machine Learning Research (JMLR)

  • [01/2023] I am glad to serve as an Associate Editor of the Electronic Journal of Statistics, a top journal in Statistics and Data Science

  • [01/2023] Three papers (1, 2, 3) were accepted to International Conference on Learning Representations (ICLR) and International Conference on Artificial Intelligence and Statistics (AISTATS)

  • [11/2022] The paper " Instability, computational efficiency, and statistical accuracy " , coauthored with Raaz Dwivedi, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan, and Bin Yu was accepted to Journal of Machine Learning Research (JMLR) subject to minor revision

  • [09/2022] Six papers (1, 2, 3, 4, 5, 6) were accepted to Conference on Neural Information Processing Systems (NeurIPS), 2022

  • [09/2022] The paper " Bayesian consistency with the supremum metric", coauthored with Stephen Walker, was accepted to Statistica Sinica

  • [08/2022] I am glad to serve as an Area Chair of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

  • [08/2022] The paper " Convergence rates for Gaussian mixtures of experts " , coauthored with Chiao-Yu Yang and Michael I. Jordan, was accepted to Journal of Machine Learning Research (JMLR)

  • [05/2022] Six papers (1, 2, 3, 4, 5, 6) were accepted to International Conference on Machine Learning (ICML), 2022

  • [05/2022] The paper " On the efficiency of entropic regularized algorithms for optimal transport " ,coauthored with Tianyi Lin and Michael I. Jordan, was accepted to Journal of Machine Learning Research (JMLR), 2022

  • [02/2022] The paper " On the complexity of approximating multi-marginal optimal transport" ,coauthored with Tianyi Lin, Marco Cuturi, and Michael I. Jordan, was accepted to Journal of Machine Learning Research (JMLR), 2022

  • [01/2022] Four papers (1, 2, 3, 4) were accepted to International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

Selected Publications on Theory (Hierarchical and Mixture Models, Bayesian Nonparametrics, Optimal Transport, Deep Learning, (Approximate) Bayesian Inference, (Non)-Convex Optimization, etc.)

(* = equal contribution)
(** = alphabetical order)
( = co-last author)


Selected Publications on Method and Application (Optimal Transport, Transformer, Deep Generative Models, 3D Deep Learning, Convolutional Neural Networks, etc.)