Computer Vision (CS-GY 6643)
Fall 2024
An important goal of signal processing and artificial intelligence (AI) is to equip computers with the capability of interpreting visual inputs. Computer vision is an area that deals with the construction of explicit, meaningful measurements and descriptions of physical objects from images. It includes many techniques from image processing, pattern recognition, geometric modeling, cognitive processing, and machine and deep learning. This course introduces students to the fundamental concepts and techniques in image processing and computer vision.
This is a graduate-level course requiring working knowledge of linear algebra and data structures, and proficiency in programming (Python). Advanced undergraduates may enroll with permission from the instructor.
Course information:
Time: Thursdays 11am-1:30pm
Place: 2 MetroTech Center Room 801
Slack channel: nyucomputervision.slack.com
Course team:
Grading breakdown: Individual homework 18%, Individual programming projects 45%, In-class midterm 15%, Final group project 22%
Online Discussion: Preferred course communication will be via Slack, so please join our workspace using this link: nyucomputervision.slack.com. All questions should be posted to Slack (not sent via email). We prefer that lecture or homework questions are asked publicly, since they will often help your classmates. Slack also supports private questions through direct messages for things relevant only to you.
Python and Jupyter: Demos and labs in this class use Python, run through Jupyter notebooks. Jupyter lets you create and edit documents with live Python code and rich comments and images. We suggest that students run their Jupyter notebooks via Google Colaboratory, and we will share them via Colab.
Assignments: Individual homework and programming projects must be turned in to Brightspace by the specified deadline (11:59pm of the due date). Programming projects should be turned in as evaluated Jupyter notebooks. Do not clear the output before turning in. For written problem sets, we encourage using LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you write solutions by hand, scan and upload them as a PDF. Discussion is allowed on homework, but solutions and code must be written independently. See the syllabus for policies. We have a zero-tolerance policy for copied code or solutions: any student with duplicate or very similar material will receive a zero on the offending assignment.
Late policy: Each full hour that a project or homework is late incurs a penalty of 1% of the total allotted grade (partial hours are rounded down). For example, a project or homework that is 11 hours 45 minutes late will have a maximum possible score of 89%.
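As a rough illustration of the late-policy arithmetic, the sketch below computes the maximum possible score for a given lateness. The function name `max_score_percent` is hypothetical, and two details are assumptions rather than stated policy: the penalty is treated as percentage points off a 100% maximum, and the result is floored at 0%.

```python
from datetime import timedelta


def max_score_percent(lateness: timedelta) -> int:
    """Maximum achievable score (in percent) for a submission that is `lateness` late.

    Illustrative only: assumes 1 percentage point per full hour late
    and that the score cannot drop below 0 (not stated in the policy).
    """
    if lateness <= timedelta(0):
        return 100  # on time or early: no penalty
    # Whole hours late, rounded down (timedelta // timedelta yields an int).
    hours_late = lateness // timedelta(hours=1)
    return max(0, 100 - hours_late)


# The example from the policy: 11 hours 45 minutes late.
print(max_score_percent(timedelta(hours=11, minutes=45)))
```

Under these assumptions, a submission that is less than one hour late is not penalized, since the lateness is rounded down to zero full hours.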
Textbooks: Computer Vision (2nd edition) by Szeliski will accompany the lectures, and specific chapters from this book are listed under the reading materials in the schedule below. The textbook is freely available digitally at https://szeliski.org/Book/download.php.
Tutorials: Linear algebra, Google Colab, Python
Schedule
Date | Topic | Material | Homework/Projects |
---|---|---|---|
September 5, 2024 | Intro and survey of topics | Lecture 1 slides, Python tutorial, Google Colab Setup | Homework 0 out: Join Slack nyucomputervision.slack.com (Due September 11, 11:59pm) |
September 12, 2024 | Image formation and filtering | Lecture 2 slides, Colab for Lecture 2, Szeliski 2.2, 2.3, 3.1, 3.2 | Homework 1 out, Project 1 out |
September 19, 2024 | Non-linear filtering and edge detection | Lecture 3 slides, Colab for Lecture 3, Szeliski 3.3, 7.2 | Homework 1 due, September 19, 10:59am |
September 26, 2024 | Image recognition, Feature detection and matching | Lecture 4 slides, Colab for Lecture 4, Szeliski 6, 7 | Final project team selection due, September 22, 10:59am |
October 3, 2024 | Contour tracking and Hough transforms | Lecture 5 slides, Colab for Lecture 5, Szeliski 7.3, 7.4 | Project 1 due, October 4, 10:59am |
October 10, 2024 | Image alignment | Lecture 6 slides, Colab for Lecture 6, Szeliski 8.1, 8.2 | Homework 2 out, Final project proposal due, October 11, 11:59pm |
October 17, 2024 | Segmentation | Lecture 7 slides, Colab for Lecture 7, Szeliski 7.5 | Homework 2 due, October 18, 11:59pm |
October 24, 2024 | Midterm exam (in class) | Midterm Exam, Solution | |
October 31, 2024 | Machine learning, optimization | Lecture 8 slides, Colab for Lecture 8 | Project 2 out (Due Nov. 18, 11:59am) |
November 7, 2024 | Convolutional Neural Nets | Lecture 9 slides, Deep-STORM, 3D CNNs | Homework 3 out, Nov. 8 (Due Nov. 21, 10:59am) |
November 14, 2024 | Learning based recognition | Lecture 10 slides, GradCAM, GradCAM Colab | Project 2 due, November 18, 11:59am |
November 21, 2024 | Motion models, depth estimation and optical flow | Lecture 11 slides, RAFT Paper Presentation | Homework 3 due, November 21, 10:59am |
November 28, 2024 | THANKSGIVING BREAK (No class) | | |
December 5, 2024 | Learning based segmentation | Lecture 12 slides | Project 3 due, December 6, 11:59pm |
December 12, 2024 | Final project presentations | Presentation Info | Video submission due December 12, 08:00am; Write-up and code due December 19, 11:59pm |
Essential reads
Textbooks:
Szeliski, R. (2022). Computer vision: algorithms and applications. Springer Nature.
Sonka, M., Hlavac, V., & Boyle, R. (2015). Image processing, analysis, and machine vision (4th ed.).
Forsyth, D. A., & Ponce, J. (2012). Computer vision: a modern approach.
Papers
(Under construction)