Nhat Ho

Assistant Professor
Department of Statistics and Data Sciences
The University of Texas at Austin

Other affiliations:
Core member, Machine Learning Laboratory
Senior personnel, Institute for Foundations of Machine Learning (IFML)

Email: minhnhat@utexas.edu
Office: WEL 5.242, 105 E 24th Street, Austin, TX 78712

Brief Biography

I am currently an Assistant Professor of Statistics and Data Sciences at the University of Texas at Austin. I am also a core member of the Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning (IFML). Before coming to Austin, I was a postdoctoral fellow in the Electrical Engineering and Computer Science (EECS) Department at the University of California, Berkeley, where I was very fortunate to be mentored by Professor Michael I. Jordan and Professor Martin J. Wainwright. Going further back, I received my Ph.D. in 2017 from the Department of Statistics at the University of Michigan, Ann Arbor, where I was very fortunate to be advised by Professor Long Nguyen and Professor Ya'acov Ritov.

Research Interests

My research focuses on four important aspects of complex, large-scale models and data:

(1) Heterogeneity of complex data, including Mixture and Hierarchical Models, Bayesian Nonparametrics, etc.;
(2) Interpretability, Efficiency, Scalability, and Robustness of deep learning and complex machine learning models, including Transformer architectures, Deep Generative Models, Convolutional Neural Networks, etc.;
(3) Scalability and Efficiency of Optimal Transport for machine learning and deep learning applications;
(4) Stability, Optimality, and Robustness of optimization and sampling algorithms for solving complex statistical machine learning models.

--- For the first aspect (1), our work includes studying the statistical and geometric behavior of latent variables in sparse, high-dimensional mixture and hierarchical models via tools from optimal transport, quantization theory, and algebraic geometry. For example, we demonstrate that the convergence rates of the maximum likelihood estimator (MLE) for finite mixture models and input-independent gating mixtures of experts are determined by the solvability of certain systems of polynomial equations, a key problem in algebraic geometry (M.1, M.2, M.3, M.4). These convergence rates of the MLE also lead to a novel model selection procedure for finite and infinite mixture models (M.5).

Recently, we provided a comprehensive theory for the long-standing open problem of parameter and expert estimation in softmax gating Gaussian mixtures of experts (M.6), a class of conditional mixtures widely used in machine learning and deep learning to scale up large neural network architectures. Our theory relies on novel Voronoi-based losses among parameters, which precisely capture the intrinsic interaction (via partial differential equations with respect to the model parameters) between the softmax gating function and the expert functions. In a subsequent work (M.7), we established general theories for softmax gating mixtures of experts with the least-squares loss function. Furthermore, we carried these insights over to several other important variants of softmax gating mixtures of experts that are currently used to scale up Transformers and Large Language Models, including top-K sparse mixtures of experts (M.8) and dense-to-sparse (equivalently, temperature softmax) mixtures of experts (M.9), as well as to variants used in other machine learning tasks (M.10, M.11).
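For readers less familiar with this model class, the conditional density of a softmax gating Gaussian mixture of experts can be written down in a few lines. The Python sketch below is illustrative only: it assumes a scalar output y, affine gating scores beta0_i + beta1_i . x, and Gaussian experts N(a_i . x + b_i, var_i), and all function names are hypothetical rather than taken from the papers above. The optional top_k and temperature arguments mimic top-K sparse gating and temperature softmax gating, respectively.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of the scores divided by a temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def moe_density(y, x, gates, experts, top_k=None, temperature=1.0):
    """Conditional density p(y | x) of a (hypothetical) softmax gating Gaussian MoE.

    gates:   list of (beta0, beta1) pairs; gating score of expert i is beta0 + beta1 . x
    experts: list of (a, b, var) triples; expert i is N(a . x + b, var)
    top_k:   if given, keep only the top_k highest-scoring experts (sparse gating)
    """
    scores = [b0 + dot(b1, x) for b0, b1 in gates]
    # Rank experts by gating score; sparse gating keeps only the best top_k.
    active = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        active = active[:top_k]
    # Softmax is renormalized over the active experts only.
    weights = softmax([scores[i] for i in active], temperature)
    density = 0.0
    for w, i in zip(weights, active):
        a, b, var = experts[i]
        density += w * gaussian_pdf(y, dot(a, x) + b, var)
    return density
```

Setting top_k=1 turns the gate into a hard choice of the highest-scoring expert, the sparsest case of top-K gating; with top_k=None every expert contributes with a softmax weight.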

On the methodology and application side, we recently developed an effective competition-based training scheme for sparse mixtures of experts to scale up large AI models (M.12). We also used mixtures of experts with a Laplace gating function in the Transformer, a state-of-the-art deep learning architecture for language and computer vision applications, to develop a large multimodal model for the multimodal data found in electronic health records (M.13).

--- For the second aspect (2), we draw on insights from statistical machine learning models and theory, as well as from the Hamilton-Jacobi partial differential equation (PDE), to understand deep learning and complex machine learning models. Examples of our work include using mixture and hierarchical models (T.1, T.2) to reduce redundancy in Transformers and interpreting Transformers through primal-dual frameworks from support vector regression (T.3). We also use the Fourier Integral Theorem, a beautiful result in mathematics, and its generalized version (T.4) to improve the interpretability and performance of Transformers. The Fourier Integral Theorem is also used in our other works to build estimators for machine learning and statistics applications (T.5). Finally, we develop a Bayesian deconvolution model (T.6) to understand Convolutional Neural Networks, and we provide a complete theory of the neural collapse phenomenon in deep linear neural networks (T.7).
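To make the estimator idea concrete: the Fourier Integral Theorem states that, for a suitably regular density f on R^d, f(y) = lim_{R -> infinity} (1/pi^d) ∫ prod_j [sin(R(y_j - x_j)) / (y_j - x_j)] f(x) dx. Replacing the integral by an average over i.i.d. samples X_1, ..., X_n gives a kernel-type density estimator in which the radius R plays the role of an inverse bandwidth. The Python sketch below implements only this basic Monte Carlo estimator; the function names and the choice of R are assumptions for illustration, not the implementation used in T.5.

```python
import math


def sinc_term(u, radius):
    """sin(R * u) / u, using its limiting value R at u = 0."""
    if u == 0.0:
        return radius
    return math.sin(radius * u) / u


def fourier_density(y, samples, radius):
    """Hypothetical Fourier-integral-theorem density estimate at point y.

    f_hat(y) = 1 / (n * pi^d) * sum_i prod_j sin(R (y_j - X_ij)) / (y_j - X_ij)
    y: list of d coordinates; samples: list of n points, each a list of d coordinates.
    """
    d = len(y)
    total = 0.0
    for x in samples:
        # Product of one-dimensional sinc-type kernels across coordinates.
        prod = 1.0
        for yj, xj in zip(y, x):
            prod *= sinc_term(yj - xj, radius)
        total += prod
    return total / (len(samples) * math.pi ** d)
```

Because the sinc-type kernel takes negative values, this estimate can be negative at some points, unlike a Gaussian kernel density estimate.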

Recently, we also established guarantees for several interesting phenomena in training deep learning models, including Neural Collapse (T.8, T.9) and Posterior Collapse (T.10).

--- For the third aspect (3), we focus on improving the scalability and efficiency of optimal transport, and on mitigating its curse of dimensionality, in deep learning applications such as deep generative models and domain adaptation. To address efficiency and the curse of dimensionality, we propose several new variants of sliced optimal transport (OT.1, OT.2, OT.3, OT.4, OT.5) that not only circumvent the curse of dimensionality of optimal transport but also improve the sampling scheme and training procedure of sliced optimal transport so that it selects the most important projecting directions. For scalability, we propose new minibatch frameworks (OT.6, OT.7) that alleviate the misspecified matching issue of existing minibatch optimal transport approaches. We also develop several optimization algorithms with near-optimal computational complexity (OT.8, OT.9, OT.10) for approximating optimal transport and its variants.
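The sliced variants above build on the basic sliced Wasserstein distance, SW_p^p(mu, nu) = E_{theta ~ Unif(S^{d-1})} W_p^p(theta#mu, theta#nu), which averages one-dimensional Wasserstein distances between projections of the two distributions onto random directions. In one dimension, optimal transport between two empirical measures with the same number of equally weighted atoms reduces to matching sorted samples, which is what makes slicing cheap. The Python sketch below is only the plain Monte Carlo estimator with uniformly random directions, assuming equal sample sizes; it does not implement the improved direction-sampling schemes of OT.1 to OT.5, and the function names are hypothetical.

```python
import math
import random


def random_direction(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def wasserstein_1d_pow(xs, ys, p):
    """W_p^p between two 1D empirical measures of equal size: sort and match."""
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between the empirical measures of X and Y."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project every point onto the random direction theta.
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in X]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d_pow(proj_x, proj_y, p)
    return (total / n_projections) ** (1.0 / p)
```

Each direction costs one projection and one sort, so with L directions and n points in d dimensions the estimator runs in O(L n (d + log n)) time, rather than requiring a full d-dimensional transport problem to be solved.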

On the application side, we have proposed using (sliced) optimal transport and its variants to build large pretrained models for medical images (OT.11), as well as for audio-text retrieval (OT.12), cortical surface reconstruction (OT.13), molecular property prediction (OT.14), and shape correspondence learning (OT.15).

--- For the fourth aspect (4), we study the interplay and trade-offs among instability, statistical accuracy, and computational efficiency of optimization and sampling algorithms (O.1) for parameter estimation in statistical machine learning models. Building on these insights, we rigorously characterize the statistical behavior of the Expectation-Maximization (EM) algorithm for mixture models (O.2, O.3) and of factorized gradient descent for a class of low-rank matrix factorization problems (O.4). Finally, in recent work, we propose an exponentially increasing step-size schedule for gradient descent (O.5) and demonstrate that this algorithm attains the optimal linear computational complexity for parameter estimation in statistical machine learning models.
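A toy illustration of why an increasing step size can help: on the flat objective f(theta) = theta^4, whose curvature vanishes at the minimizer (loosely mimicking the singular landscapes of over-specified models), gradient descent with a fixed step size converges only polynomially. Multiplying the step size by a constant factor alpha > 1 at every iteration, i.e. eta_t = eta0 * alpha^t, keeps the effective step eta_t * theta_t^2 near a stable fixed point (for 1 < alpha < 4 in this toy problem), so the iterates shrink geometrically. The objective, alpha, and eta0 below are assumptions chosen for illustration; this is not the statistical setting analyzed in O.5.

```python
def gradient_descent(grad, theta0, eta0, alpha, n_iters):
    """Gradient descent with step size eta_t = eta0 * alpha**t.

    alpha = 1 recovers a fixed step size; alpha > 1 gives an
    exponentially increasing schedule.
    """
    theta = theta0
    eta = eta0
    for _ in range(n_iters):
        theta -= eta * grad(theta)
        eta *= alpha  # grow the step size geometrically
    return theta


def grad_quartic(theta):
    """Gradient of the flat toy objective f(theta) = theta**4."""
    return 4.0 * theta ** 3


fixed = gradient_descent(grad_quartic, theta0=1.0, eta0=0.1, alpha=1.0, n_iters=60)
growing = gradient_descent(grad_quartic, theta0=1.0, eta0=0.1, alpha=1.5, n_iters=60)
print(f"fixed step:       |theta_60| = {abs(fixed):.2e}")
print(f"exponential step: |theta_60| = {abs(growing):.2e}")
```

With the fixed step, theta_t decays roughly like 1/sqrt(t), whereas under the geometric schedule it decays roughly like alpha^(-t/2) in this example.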

Recently, we proposed a novel robust criterion for distributionally robust optimization by combining insights from Bayesian nonparametric (e.g., Dirichlet Process) theory and recent decision-theoretic models of smooth ambiguity-averse preferences (O.6).

--- Apart from these topics, we also study Bayesian inference and asymptotics from new perspectives. For example, we use diffusion processes to establish posterior convergence rates of parameters in statistical models (E.1), and we employ the Fourier Integral Theorem to establish posterior consistency of Bayesian nonparametric models (E.2).

Codes

The official GitHub page for the code accompanying research papers from our Data Science and Machine Learning (DSML) Lab is: https://github.com/UT-Austin-Data-Science-Group.

Editorial Boards of Journals

Electronic Journal of Statistics

Area Chairs of Conferences in Machine Learning and Artificial Intelligence

International Conference on Machine Learning (ICML)

International Conference on Artificial Intelligence and Statistics (AISTATS)

Media Coverage

Data Science, Machine Learning, Statistics, and Artificial Intelligence have become very important fields in Vietnam. However, because these fields are still young in Vietnam, the young Vietnamese generation often struggles to equip themselves with enough information, knowledge, and skills to pursue careers in them. For this reason, several leading newspapers and shows in Vietnam have covered my path to becoming a professor at a leading US university, as well as my opinions about these fields, to inspire and inform young people in Vietnam who would like to pursue careers in Data Science, Machine Learning, Statistics, and Artificial Intelligence, including:

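Thanh Niên Newspaper (in Vietnamese): Giáo sư 34 tuổi giúp người trẻ học tiến sĩ tại các trường hàng đầu thế giới [A 34-year-old professor helps young people pursue PhDs at the world's top universities]
Thanh Niên Newspaper (in Vietnamese): Ngành nghề của tương lai: Nhiều cơ hội việc làm về khoa học dữ liệu [Careers of the future: Many job opportunities in data science]
Sài Gòn Giải Phóng Newspaper (in Vietnamese): Báo Xuân Giáp Thìn "Những Cánh Thiên Di của Trí Tuệ Việt" [Giáp Thìn Lunar New Year issue, "The Migrating Wings of Vietnamese Intellect"]; Outstanding Vietnamese conquering pinnacle of knowledge overseas
Thanh Niên Newspaper (in Vietnamese): Việc làm lĩnh vực trí tuệ nhân tạo: Cơ hội việc làm rộng mở [Jobs in artificial intelligence: Wide-open employment opportunities]
Tuổi Trẻ Newspaper (in Vietnamese): Muốn trở thành kỹ sư trí tuệ nhân tạo cần học tốt những môn nào? [Which subjects should you master to become an AI engineer?]
Thanh Niên Newspaper (in Vietnamese): Trang bị kỹ năng sử dụng AI để có thêm nhiều cơ hội trong cuộc sống [Equip yourself with AI skills for more opportunities in life]; Muốn sử dụng AI, bắt đầu từ đâu? [Want to use AI? Where to start?]; Muốn sử dụng AI, bắt đầu từ đâu?: Những người trẻ không... lỗi nhịp [Want to use AI? Where to start?: Young people who are not... out of step]; Muốn sử dụng AI, bắt đầu từ đâu?: Nhờ AI, khởi nghiệp đột phá [Want to use AI? Where to start?: Breakthrough startups thanks to AI]
Tiền Phong Newspaper (in Vietnamese): Chàng trai Bạc Liêu chia sẻ hành trình trở thành giáo sư tại Mỹ về AI và Khoa học công nghệ [A young man from Bạc Liêu shares his journey to becoming a professor of AI and science and technology in the US]
VnExpress Newspaper (in Vietnamese): Từ học sinh tỉnh lẻ thành giáo sư đại học Mỹ [From provincial student to US university professor]
Thanh Niên Newspaper (in Vietnamese): Người trẻ thiếu trầm trọng kỹ năng sống [Young people severely lack life skills]; Người trẻ thiếu trầm trọng kỹ năng sống: Nguy cơ trở thành tội phạm [Young people severely lack life skills: The risk of turning to crime]; Người trẻ thiếu trầm trọng kỹ năng sống: 5 cách để tự lấp đầy 'lỗ hổng' [Young people severely lack life skills: 5 ways to fill the 'gaps' yourself]
Sài Gòn Giải Phóng Newspaper (in Vietnamese): Đi vào vùng chưa từng khám phá [Venturing into unexplored territory]
Giáo Dục Việt Nam Newspaper (in Vietnamese): Bài học kinh nghiệm nhìn từ cách huy động nguồn lực tài chính ở một số ĐH của Mỹ [Lessons learned from how some US universities mobilize financial resources]
VTV1 Television Show "Cất Cánh" ["Taking Off"] (in Vietnamese): Tiếng gọi quê hương [The call of the homeland]
VTV1 Television Show "Toàn Cảnh Báo Xuân" ["Spring Press Panorama"] (in Vietnamese): Tự hào Trí Tuệ Việt [Pride in Vietnamese Intellect]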

Selected Publications on Theory (Hierarchical and Mixture Models, Bayesian Nonparametrics, Optimal Transport, Deep Learning, (Approximate) Bayesian Inference, (Non)-Convex Optimization, etc.)

(* = equal contribution)
(** = alphabetical order )
(† = co-last author )

Instability, computational efficiency, and statistical accuracy. Accepted with minor revision at Journal of Machine Learning Research (JMLR), 2023.
Nhat Ho*, Raaz Dwivedi*, Koulik Khamaru*, Martin J. Wainwright, Michael I. Jordan, Bin Yu.

Demystifying softmax gating in Gaussian mixture of experts. Advances in NeurIPS, 2023 (Spotlight).
Huy Nguyen, Tin Nguyen, Nhat Ho.

Is temperature sample efficient for softmax Gaussian mixture of experts? Proceedings of the ICML, 2024.
Huy Nguyen, Pedram Akbarian, Nhat Ho.

Bayesian nonparametrics meets data-driven robust optimization. Under review.
Nicola Bariletto, Nhat Ho.

Sigmoid gating is more sample efficient than softmax gating in mixture of experts. Under review.
Huy Nguyen, Nhat Ho†, Alessandro Rinaldo†.

Borrowing strength in distributionally robust optimization via hierarchical Dirichlet processes. Under review.
Nicola Bariletto, Khai Nguyen, Nhat Ho.

Statistical advantages of perturbing cosine router in sparse mixture of experts. Under review.
Huy Nguyen, Pedram Akbarian*, Trang Pham*, Trang Nguyen*, Shujian Zhang, Nhat Ho.

On least square estimation in softmax gating mixture of experts. Proceedings of the ICML, 2024.
Huy Nguyen, Nhat Ho†, Alessandro Rinaldo†.

Statistical perspective of top-K sparse softmax gating mixture of experts. ICLR, 2024.
Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho.

Multivariate smoothing via the Fourier integral theorem and Fourier kernel. Under revision.
Nhat Ho**, Stephen G. Walker**.

A diffusion process perspective on the posterior contraction rates for parameters. SIAM Journal on Mathematics of Data Science (SIMODS), 2023.
Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan.

Towards convergence rates for parameter estimation in Gaussian-gated mixture of experts. AISTATS, 2024.
Huy Nguyen, Tin Nguyen, Khai Nguyen, Nhat Ho.

Minimax optimal rate for parameter estimation in multivariate deviated models. Advances in NeurIPS, 2023.
Dat Do*, Huy Nguyen*, Khai Nguyen, Nhat Ho.

Neural collapse for cross-entropy class-imbalanced learning with unconstrained ReLU feature model. Proceedings of the ICML, 2024.
Hien Dang, Tho Tran, Tan Nguyen†, Nhat Ho†.

A general theory for softmax gating multinomial logistic mixture of experts. Proceedings of the ICML, 2024.
Huy Nguyen, Pedram Akbarian, Tin Nguyen, Nhat Ho.

Neural collapse in deep linear network: from balanced to imbalanced data. Proceedings of the ICML, 2023.
Hien Dang*, Tho Tran*, Hung Tran, Tan Nguyen†, Nhat Ho†.

Beyond vanilla variational autoencoders: Detecting posterior collapse in conditional and hierarchical variational autoencoders. ICLR, 2024.
Hien Dang, Tho Tran, Tan Nguyen†, Nhat Ho†.

Bayesian consistency with the supremum metric. Statistica Sinica, 2022.
Nhat Ho**, Stephen G. Walker**.

An exponentially increasing step-size for parameter estimation in statistical models. Under review.
Nhat Ho**, Tongzheng Ren**, Purnamrita Sarkar**, Sujay Sanghavi**, Rachel Ward**.

On integral theorems and their statistical properties. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2024.
Nhat Ho**, Stephen G. Walker**.

On excess mass behavior in Gaussian mixture models with Orlicz Wasserstein distances. Proceedings of the ICML, 2023.
Aritra Guha, Nhat Ho, Long Nguyen.

Beyond black box densities: Parameter learning for the deviated components. Advances in NeurIPS, 2022.
Dat Do*, Nhat Ho*, Long Nguyen.

Refined convergence rates for maximum likelihood estimation under finite mixture models. Proceedings of the ICML, 2022 (Long Presentation).
Tudor Manole, Nhat Ho.

Entropic Gromov-Wasserstein between Gaussian distributions. Proceedings of the ICML, 2022.
Khang Le*, Dung Le*, Huy Nguyen*, Dat Do, Tung Pham, Nhat Ho.

Towards statistical and computational complexities of Polyak step size gradient descent. AISTATS, 2022.
Tongzheng Ren*, Fuheng Cui*, Alexia Atsidakou*, Sujay Sanghavi, Nhat Ho.

On the minimax optimality of the EM algorithm for learning two-component mixed linear regression. AISTATS, 2021.
Jeong Y. Kwon, Nhat Ho, Constantine Caramanis.

Beyond EM algorithm on over-specified two-component location-scale Gaussian mixtures. Under review.
Tongzheng Ren*, Fuheng Cui*, Sujay Sanghavi, Nhat Ho.

On the efficiency of entropic regularized algorithms for optimal transport. Journal of Machine Learning Research (JMLR), 2022.
Tianyi Lin, Nhat Ho, Michael I. Jordan.

Convergence rates for Gaussian mixtures of experts. Journal of Machine Learning Research (JMLR), 2022.
Nhat Ho, Chiao-Yu Yang, Michael I. Jordan.

On the computational and statistical complexity of over-parameterized matrix sensing. Journal of Machine Learning Research (JMLR), 2024.
Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis.

Projection robust Wasserstein distance and Riemannian optimization. Advances in NeurIPS, 2020 (Spotlight).
Tianyi Lin*, Chenyou Fan*, Nhat Ho, Marco Cuturi, Michael I. Jordan.

Improving computational complexity in statistical models with second-order information. Proceedings of the ICML, 2024.
Tongzheng Ren, Jiacheng Zhuo, Sujay Sanghavi, Nhat Ho.

On posterior contraction of parameters and interpretability in Bayesian mixture modeling. Bernoulli, 27(4), 2159-2188, 2021.
Aritra Guha, Nhat Ho, XuanLong Nguyen.

Singularity, misspecification, and the convergence rate of EM. Annals of Statistics, 48(6), 3161-3182, 2020.
Raaz Dwivedi*, Nhat Ho*, Koulik Khamaru*, Martin J. Wainwright, Michael I. Jordan, Bin Yu.

Singularity structures and impacts on parameter estimation behavior in finite mixtures of distributions. SIAM Journal on Mathematics of Data Science (SIMODS), 1(4), 730–758, 2019.
Nhat Ho, XuanLong Nguyen.

Convergence rates of parameter estimation for some weakly identifiable finite mixtures. Annals of Statistics, 44(6), 2726-2755, 2016.
Nhat Ho, XuanLong Nguyen.

Selected Publications on Method and Application (Optimal Transport, Transformer, Deep Generative Models, 3D Deep Learning, Convolutional Neural Networks, etc.)

A Bayesian perspective of convolutional neural networks through a deconvolutional generative model. Journal of Machine Learning Research (JMLR), 2023.
Nhat Ho*, Tan Nguyen*, Ankit Patel, Anima Anandkumar, Michael I. Jordan, Richard Baraniuk.

Quasi-Monte Carlo for 3D sliced Wasserstein. ICLR, 2024 (Spotlight).
Khai Nguyen, Nicola Bariletto, Nhat Ho.

Sliced Wasserstein estimation with control variates. ICLR, 2024.
Khai Nguyen, Nhat Ho.

Sliced Wasserstein with random-path projecting directions. Proceedings of the ICML, 2024.
Khai Nguyen, Shujian Zhang, Tam Le, Nhat Ho.

FuseMoE: Mixture-of-experts Transformers for fleximodal fusion. Under review.
Xing Han, Huy Nguyen, Carl William Harris, Nhat Ho†, Suchi Saria†.

Hierarchical hybrid sliced Wasserstein: A scalable metric for heterogeneous joint distributions. Under review.
Khai Nguyen, Nhat Ho.

CompeteSMoE - Effective training of sparse mixture of experts via competition. Under review.
Quang Pham, Truong Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina Sartipi, Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven Hoi†, Nhat Ho†.

Mixture of experts meets prompt-based continual learning. Under review.
Minh Le, An Nguyen*, Huy Nguyen*, Trang Nguyen*, Trang Pham*, Linh Van Ngo, Nhat Ho.

Marginal fairness sliced Wasserstein barycenter. Under review.
Khai Nguyen, Hai Nguyen, Nhat Ho.

Backdoor attack in prompt-based continual learning. Under review.
Trang Nguyen, Anh Tran, Nhat Ho.

Structure-aware E(3)-invariant molecular conformer aggregation networks. Proceedings of the ICML, 2024.
Duy Minh Ho Nguyen, Nina Lukashina, Tai Nguyen, An Thai Le, TrungTin Nguyen, Nhat Ho, Jan Peters, Daniel Sonntag, Viktor Zaverkin, Mathias Niepert.

Integrating efficient optimal transport and functional maps for unsupervised shape correspondence learning. Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Thanh Tung Le, Khai Nguyen, Shanlin Sun, Nhat Ho, Xiaohui Xie.

Energy-based sliced Wasserstein distance. Advances in NeurIPS, 2023.
Khai Nguyen, Nhat Ho.

Markovian sliced Wasserstein distances: Beyond independent projections. Advances in NeurIPS, 2023.
Khai Nguyen, Tongzheng Ren, Nhat Ho.

Revisiting projected Wasserstein metric on images: from vectorization to convolution. Advances in NeurIPS, 2022.
Khai Nguyen, Nhat Ho.

FourierFormer: Transformer meets generalized Fourier integral attentions. Advances in NeurIPS, 2022.
Tan Nguyen*, Minh Pham*, Tam Nguyen, Khai Nguyen, Stanley Osher, Nhat Ho.

Improving Transformers with probabilistic attention keys. Proceedings of the ICML, 2022.
Tam Nguyen*, Tan Nguyen*, Dung Le, Khuong Nguyen, Anh Tran, Richard Baraniuk, Nhat Ho†, Stanley Osher†.

Amortized projection optimization for sliced Wasserstein generative models. Advances in NeurIPS, 2022.
Khai Nguyen, Nhat Ho.

Hierarchical sliced Wasserstein distance. ICLR, 2023.
Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, Nhat Ho.

Designing robust transformers using robust kernel density estimation. Advances in NeurIPS, 2023.
Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho.

A primal-dual framework for transformers and neural networks. ICLR, 2023 (Spotlight).
Tan Minh Nguyen, Tam Minh Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard Baraniuk, Stanley Osher.

Self-attention amortized distributional projection optimization for sliced Wasserstein point-clouds reconstruction. Proceedings of the ICML, 2023.
Khai Nguyen*, Dang Nguyen*, Nhat Ho.

Revisiting over-smoothing and over-squashing using Ollivier's Ricci curvature. Proceedings of the ICML, 2023.
Khang Nguyen, Tan Nguyen, Nong Hieu, Vinh Nguyen, Nhat Ho†, Stanley Osher†.

Improving Transformer with an admixture of attention heads. Advances in NeurIPS, 2022.
Tam Nguyen*, Tan Nguyen*, Hai Do, Khai Nguyen, Vishwanath Saragadam, Minh Pham, Khuong Nguyen, Stanley Osher†, Nhat Ho†.

Generative models from the multivariate Fourier integral theorem. Under review.
Nhat Ho**, Stephen G. Walker**.

Probabilistic best subset selection via gradient-based optimization. Under review.
Mingzhang Yin, Nhat Ho, Bowei Yan, Xiaoning Qian, Mingyuan Zhou.