Deep Learning and Optimization Seminar


Organizing Committee:

Contact & Discussion: You are welcome to join our Google group and our Slack.


Coming Up

Date (yyyy-mm-dd hh:mm) Presenter Topic or Paper Materials


Past Events

Date Presenter Topic or Paper Materials
2024-01-04 16:00-17:00 Frederik Kunstner, UBC Why does Adam work so well (especially on Transformers): it's not because of stochasticity abstract, paper, video.
2023-12-07 10:00-11:00 Xinran GU, Tsinghua University Understanding and Improving Local Gradient Methods for Distributed Deep Learning abstract, video.
2023-11-22 15:00-16:00 Jiayuan YE, NUS Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks abstract, paper, video.
2023-11-14 19:00-20:00 Maksym ANDRIUSHCHENKO, EPFL Why Do We Need Weight Decay in Modern Deep Learning abstract, paper, video.
2023-11-10 13:00-14:00 Yao FU, University of Edinburgh On Compression Theory for LLMs abstract, video.
2023-11-09 10:00-11:00 Hao LIU, UC Berkeley Large context window with blockwise parallel transformers and ring attention abstract, paper1, paper2, video.
2023-11-01 15:00-16:00 Tianyu PANG, Sea AI Lab On Evaluating Adversarial Robustness of Large Vision-Language Models abstract, slides, paper1, paper2, paper3, video.
2023-10-24 11:00-12:00 Xiaoqiang LIN, NUS and Zhaoxuan WU, NUS Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers abstract, paper, website, video.
2023-10-18 13:00-14:00 Dacheng LI, UC Berkeley Sequence level partition for long-context LLMs training and automatic parallelism abstract, paper1, paper2, video.
2023-07-20 10:00-11:00 Libin ZHU, UCSD Spikes in the training loss of SGD, catapults and feature learning abstract, slides, video.
2023-07-17 16:00-17:00 Fanghui LIU, EPFL The role of over-parameterization in machine learning - a function space perspective abstract, slides, video.
2023-07-11 19:00-20:00 Tongtian ZHU, ZJU Decentralize to Generalize? On the Asymptotic Equivalence of Decentralized SGD and Average-direction SAM abstract, slides, video, paper.
2023-06-28 16:00-17:00 Francesco CROCE, University of Tübingen How to Quickly Obtain Models Robust to Multiple and Different Threats, and Their Advantages abstract, video.
2023-06-19 15:00-16:00 Binhang YUAN, HKUST Accommodating LLM Training over Decentralized Computational Resources abstract, slides, video.
2023-06-07 19:00-20:00 Bohang ZHANG, Peking University Rethinking the expressive power of GNNs via graph biconnectivity abstract, slides, video, paper.
2023-05-31 15:30-16:30 Jiaxin SHI, Google DeepMind Sequence Modeling with Multiresolution Convolutional Memory abstract, paper.
2023-04-28 09:00-10:00 Ziming LIU, MIT Physics of deep learning: Understanding grokking via the lens of physics abstract, slides.
2023-04-24 19:30-20:30 Dingfan CHEN, CISPA Privacy-preserving Generative Modeling abstract, slides.
2023-04-21 16:00-17:00 Jialin LIU, DAMO Academy (US) Towards Constituting Mathematical Structures for Learning to Optimize abstract, slides, paper.
2023-04-12 17:00-18:00 Maksym ANDRIUSHCHENKO, EPFL SGD with large step sizes learns sparse features abstract, video, paper.
2022-12-01 16:00-17:30 Ligeng ZHU, MIT Algorithm-System Co-Design for TinyML abstract, paper, slides, website, demo, code.