KAIST GSAI Spring 2022

AI599: Special Topics in Machine Learning: Deep Learning and Real-world Applications

Deep learning is now an integral part of the systems and tools people use every day, and is therefore no longer a concern of academic research alone. In this course, you will get a front-row view of practical issues in the research and development of deep learning systems from leading experts and researchers. Major course activities include:

  • Reading Response: You'll read and discuss important papers and articles in the field. Each week, there will be 1-2 reading assignments, for which you'll write a short response.
  • Topic Presentation: Once during the semester, you'll lead the class by summarizing the readings and spurring the in-class discussion.
  • In-class Activities: Each class will feature activities that will help you understand core concepts introduced in the course.

Course Staff

Instructors:
    Prof. Minsuk Chang
    Prof. Dongyoon Han
    Prof. Sangwoo Lee

TAs:
     Sunghyun Park
     Dongmin Choi

Staff Mailing List:
     dl_ai599@navercorp.com
     note: this is a group email address that includes the instructors and the TAs.

Time & Location

When: 10:30am-1:15pm, Fridays
Where:

Links

Course Website: https://ai599.github.io/spring-2022/
Submission & Grading: KLMS
Discussion Forum: TBD

Updates

  • 3/18: Lecture slides will be posted on KLMS before each class (unless they contain restricted content). Recordings of the classes are currently unavailable.
  • 3/4: First day of class!
  • 3/3: Extra enrollment is closed, but spaces may open up if others unregister. If you want to be waitlisted, please fill in this survey. We will enroll waitlisted students on a first-come, first-served basis as spaces open up.
  • 3/2: You may "audit" or "sit in" on this class, but you still have to submit reading responses and actively participate in class activities. If you're interested, please send an email to dl_ai599@navercorp.com.
  • 3/2: We are accepting extra enrollments, but spaces are limited to a total of 46 students. If you're interested in taking this class, please send an email to dl_ai599@navercorp.com and fill in this survey. Current headcount: 46/46
  • 2/28: Welcome to the deep learning and real-world applications class! We're still finalizing the schedule and the reading list. Stay tuned!

Schedule

Note: "response 1" and "response 2" mark the readings for Session 1 and Session 2 of each week; a reading response is required for one of the two articles in each session.

Week 1 (3/4): Introduction & Course Overview + AI research in industry (speaker: 하정우)
    Reading: please read the updated course syllabus, and ask any questions you might have.

Week 2 (3/11): Representation learning in computer vision
  Session 1: Learning representations with evolved model architectures (speaker: 한동윤)
    (1) [response 1] Tan, Mingxing, and Quoc Le. "EfficientNetV2: Smaller Models and Faster Training." ICML 2021.
    (2) [response 1] Liu, Ze, et al. "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows." ICCV 2021.
  Session 2: Practical scenarios and applications in computer vision (speaker: 유영준)
    (1) [response 2] An, Xiang, et al. "Partial FC: Training 10 Million Identities on a Single Machine." ICCV 2021.
    (2) [response 2] Sculley, David, et al. "Hidden technical debt in machine learning systems." NeurIPS 2015.

Week 3 (3/18): Towards reliable machine learning
  Session 1: Definition and real examples of shortcut learning (speaker: 전상혁)
    (1) [response 1] Brendel, et al. "Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet." ICLR 2019.
    (2) [response 1] Geirhos, et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness." ICLR 2019.
  Session 2: Attempts to mitigate shortcut learning (speaker: 전상혁)
    (1) [response 2] Madry, et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
    (2) [response 2] Ganin, et al. "Domain-Adversarial Training of Neural Networks." JMLR 2016.

Week 4 (3/25): Multimodal representation learning
  Session 1: Multimodal deep learning (speaker: 김진화)
    (1) [response 1] Kim, Jin-Hwa, et al. "Bilinear Attention Networks." NeurIPS 2018.
    (2) [response 1] Anderson, Peter, et al. "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering." CVPR 2018.
  Session 2: Vision-and-Language Pre-training (speaker: 김원재)
    (1) [response 2] Lu, Jiasen, et al. "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks." NeurIPS 2019.
    (2) [response 2] Kim, Wonjae, et al. "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision." ICML 2021.

Week 5 (4/1): Noisy Labeling + Practical scenarios and applications in computer vision
  Session 1 (speaker: 송환준)
    (1) [response 1] Han, Bo, et al. "Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels." NeurIPS 2018.
    (2) [response 1] Li, Junnan, et al. "DivideMix: Learning with Noisy Labels as Semi-supervised Learning." ICLR 2020.
  Session 2 (speaker: 위동윤)
    (1) [response 2] Feichtenhofer, Christoph, et al. "SlowFast Networks for Video Recognition." ICCV 2019.
    (2) [response 2] Wang, Xiaolong, et al. "Non-local Neural Networks." CVPR 2018.

Week 6 (4/8): Practical scenarios and applications in computer vision
  Session 1 (speaker: 백영민)
    (1) [response 1] Kittenplon, Yair, et al. "Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer." arXiv 2022.
    (2) [response 1] Baek, Youngmin, et al. "Character Region Awareness for Text Detection." CVPR 2019.
  Session 2 (speaker: 이바도)
    (1) [response 2] Cha, Junbum, et al. "Few-shot Compositional Font Generation with Dual Memory." ECCV 2020.
    (2) [response 2] Park, Song, et al. "Few-shot Font Generation with Localized Style Representations and Factorization." AAAI 2021.

Week 7 (4/15): Generative models
  Session 1 (speaker: 김윤지)
    (1) [response 1] Ji, Xu, et al. "Invariant Information Clustering for Unsupervised Image Classification and Segmentation." ICCV 2019.
    (2) [response 1] Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels." ECCV 2020.
  Session 2 (speaker: 김준호)
    (1) [response 2] Kang, Minguk, and Jaesik Park. "ContraGAN: Contrastive Learning for Conditional Image Generation." NeurIPS 2020.
    (2) [response 2] Liu, Bingchen, et al. "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis." ICLR 2021.

Week 8 (4/22): No class (midterm exams)

Week 9 (4/29): Voice synthesis and applications
  Session 1 (speaker: 송은우)
    (1) [response 1] Shen, Jonathan, et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions." ICASSP 2018.
    (2) [response 1] Ren, Yi, et al. "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech." ICLR 2021.
  Session 2 (speaker: 황민제)
    (1) [response 2] Kumar, Kundan, et al. "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis." NeurIPS 2019.
    (2) [response 2] Yamamoto, Ryuichi, et al. "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram." ICASSP 2020.

Week 10 (5/6): Speech recognition and applications
  Session 1 (speaker: 김한규)
    (1) [response 1] Hsu, Wei-Ning, et al. "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
    (2) [response 1] Chung, Yu-An, et al. "W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training." arXiv 2021.
  Session 2 (speaker: 정남규)
    (1) [response 2] Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech Recognition." Interspeech 2020.
    (2) [response 2] Han, Wei, et al. "ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context." Interspeech 2020.

Week 11 (5/13): AutoML and Practical MLOps
  Session 1 (speaker: 김지훈)
    (1) [response 1] Real, Esteban, et al. "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch." ICML 2020.
    (2) [response 1] Falkner, Stefan, et al. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale." ICML 2018.
  Session 2 (speaker: 서동필): no reading this week

Week 12 (5/20): NLP, Dialogues, and QA
  Session 1 (speaker: 이상우)
    (1) [response 1] Devlin, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." NAACL 2019.
    (2) [response 1] Raffel, et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer." JMLR 2020.
  Session 2 (speaker: 김성동)
    (1) [response 2] Roller, Stephen, et al. "Recipes for building an open-domain chatbot." EACL 2021.
    (2) [response 2] Lewis, Patrick, et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.

Week 13 (5/27): Hyperscale LM & NLP applications
  Session 1 (speaker: 이기창)
    (1) [response 1] Brown, et al. "Language Models are Few-Shot Learners." NeurIPS 2020.
    (2) [response 1] Rae, et al. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher." arXiv 2021.
  Session 2 (speaker: 유강민)
    (1) [response 2] Lester, Brian, et al. "The Power of Scale for Parameter-Efficient Prompt Tuning." EMNLP 2021.
    (2) [response 2] Li, Xiang Lisa, and Percy Liang. "Prefix-Tuning: Optimizing Continuous Prompts for Generation." arXiv 2021.

Week 14 (6/3): Human-centric NLP
  Session 1 (speaker: 이화란)
    (1) [response 1] Dinan, Emily, et al. "Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation." EMNLP 2020.
    (2) [response 1] Perez, Ethan, et al. "Red Teaming Language Models with Language Models." arXiv 2022.
  Session 2 (speakers: 정준영, 이민아)
    (1) [response 2] Chung, John Joon Young, et al. "TaleBrush: Sketching Stories with Generative Pretrained Language Models." CHI 2022.
    (2) [response 2] Lee, Mina, Percy Liang, and Qian Yang. "CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities." CHI 2022.

Week 15 (6/10): Large-scale user modeling and its applications
  Session 1 (speaker: 곽하녹)
    (1) [response 1] Shin, et al. "Scaling Law for Recommendation Models: Towards General-purpose User Representations." arXiv 2021.
    (2) [response 1] Shin, et al. "One4all user representation for recommender systems in e-commerce." arXiv 2021.
  Session 2 (speaker: 정지수)
    (1) [response 2] Hsieh, et al. "Collaborative Metric Learning." WWW 2017.
    (2) [response 2] Kim, Boseop, et al. "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers." EMNLP 2021.

Week 16 (6/17): No class (final exams)

Topics (tentative)

Major topics include:
  • Representation Learning
  • Reliable ML
  • Voice and Speech
  • NLP
  • MLOps
  • Recommendation systems

Grading

  • Attendance: 20%
  • Reading responses: 40%
  • Topic presentation: 20%
  • Class participation: 10%
  • Quizzes: 10%
Late policy: Your three lowest reading response grades will be dropped. No late submissions are accepted for reading responses. Each quiz score will be normalized identically, based on the 5 questions asked during the lessons.

Prerequisites

There are no official course prerequisites, but the assignments involve a lot of reading. Research experience in machine learning is useful, but not required.