Course Information

Course Title: Deep Learning for Computer Vision: Seminal Work

Course ID: 263-5904-00S

Lecturers: Dr. Zuria Bauer

Teaching Assistants: Alexander Veicht, Philipp Lindenberger, Paper Supervisors

Venue: Mo 16-18h, CAB G 57

Course Description:

This seminar covers seminal papers on deep learning for computer vision. Students present and discuss the papers and gain an understanding of the most influential research in this area, both past and present. The objectives of this seminar are twofold. First, it aims to provide a solid understanding of key contributions to the field of deep learning for vision, including a historical perspective as well as recent work. Second, students learn to critically read and analyse original research papers and judge their impact, as well as to give a scientific presentation and lead a discussion on their topic.

Each student chooses one paper from the provided collection to present during the course of the seminar. The students will be supported in the preparation of their presentation by the seminar assistants.

Important Information

Recommended Textbooks for Reference
Image Processing
  • Class Material: Class material will be posted on Moodle. Assignments are also submitted via Moodle.
  • Assignments: Each student will present one paper. Everybody is encouraged to read each paper before it is presented and to engage in the discussion following the presentation. To foster interesting discussions, each paper will also be assigned to a "critic" who studies the paper and briefly presents a summary of its weaknesses.
  • Grading: Each student will be graded on their presentation (70%) and their participation in the discussions assigned to them (20%). The remaining 10% is a participation grade for asking questions about papers they are not assigned to.
  • Attendance: Attendance is required to pass the course (3 absences allowed). The class is held in person unless otherwise stated.

Paper List:

Paper Title Supervisor
ImageNet Classification with Deep Convolutional Neural Networks
Deep Residual Learning for Image Recognition
Attention Is All You Need
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
RoFormer: Enhanced Transformer with Rotary Position Embedding
U-Net: Convolutional Networks for Biomedical Image Segmentation
Vision Transformers for Dense Prediction
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Deep Image Prior
Emerging Properties in Self-Supervised Vision Transformers (DINO)
Learning Transferable Visual Models From Natural Language Supervision
You Only Look Once: Unified, Real-Time Object Detection
DUSt3R: Geometric 3D Vision Made Easy
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Denoising Diffusion Probabilistic Models
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
3D Gaussian Splatting for Real-Time Radiance Field Rendering
SuperPoint: Self-Supervised Interest Point Detection and Description
NetVLAD: CNN architecture for weakly supervised place recognition
OpenVLA: An Open-Source Vision-Language-Action Model
Visual Instruction Tuning (LLaVA)