**E27: Computer Vision**
**ENGR 27/CPSC 72**
**Spring 2025**
**[Matt Zucker](http://mzucker.github.io/swarthmore)**

| Lecture:          | Tue/Thu 11:20 AM-12:35 PM, Singer 346 |
|-------------------|---------------------------------------|
| Office/Lab Hours: | Fri 10:30-noon and by appointment     |
| Wizard session:   | TBA                                   |
| Discussion forum: |                                       |

# Overview

This class is about applying mathematical theory to endow computers with the ability to understand and interact with the real world through physical imaging sensors. Although there will be plenty of programming, coding is not the main focus. **If you are looking for a class about programming particular APIs (e.g. OpenCV, TensorFlow), you may be disappointed!**

The course is divided into three broad areas of investigation:

* *Appearance based methods (~4.5 weeks)* including filtering, morphological operators, convolutions, frequency domain methods, edge and feature detection, correlation, and template tracking.
* *3D geometry based methods (~4.5 weeks)* including multiple view geometry, stereo and structured light, and structure from motion.
* *Probabilistic and learning based methods (~5 weeks)* such as classification, object recognition, and clustering.

# Resources

* Textbook: [Richard Szeliski, *Computer Vision: Algorithms and Applications, 2nd Edition* Springer 2010-22.](http://szeliski.org/Book/)
* [OpenCV Reference manual](https://docs.opencv.org/master/)
* [Numpy Reference manual](https://numpy.org/doc/stable/)
* [Linear algebra refresher](../linalg-reintroduction.pdf)
* [Python/numpy tutorial from Stanford's CS 231n](https://cs231n.github.io/python-numpy-tutorial/)

# Requirements

**Prerequisites:** Either [ENGR 019](https://catalog.swarthmore.edu/preview_course.php?catoid=30&coid=97706) or [ENGR 021](https://catalog.swarthmore.edu/preview_course.php?catoid=30&coid=100069) or permission of the instructor. MATH 027 or MATH 028 is recommended.

**Skills:** In practice, I expect you to be conversant in elementary programming concepts in Python, including basic loops, functions, and array processing. You should also be comfortable with the process of converting a set of mathematical equations into a working Python program. I also expect you to be comfortable with [linear algebra concepts](../linalg-reintroduction.pdf) such as solving linear systems, matrix inverses, rank and eigenvalues/vectors. We will also be using related geometric concepts such as the dot product and vector norms as well as rotations and translations.

**Time:** I expect students to spend approximately 8 hours per week on this class (4 classes × 8 hours per class + [8 hours for paid student work](https://www.swarthmore.edu/student-employment/employment-faqs) = 40 hours). Although this figure will vary from individual to individual and week to week, you should plan to commit several hours outside of class to homework, reading, and projects each week.

**Discussion forum**: We will use [this EdStem forum](https://edstem.org/us/courses/74585/discussion/) throughout the semester to communicate course announcements and answer questions. Please use Ed (instead of just emailing me) for all course-related communications -- this allows students to see common problems and to engage in discussions about course material.

**Wizards**: The course Wizards are Quentin Adolphe and Hojune Kim. The class will have a weekly Wizard session on Sundays from 7-9 PM in Singer 221.

# Assignments

Homework consisting of math, short answer questions, and small programming exercises will be assigned roughly weekly. Typically, homework will be assigned on Tuesday, and be due online before the start of class the following Tuesday.

We will also have regular quizzes to be completed each week. Each quiz will generally be based on the homework due the previous week, and potentially course material prior to that as well.

There will be several regularly scheduled projects as well as an open-ended final project. Ideally, projects will be completed by pairs of students, but I will allow a larger group if we have an odd number of students.

Grading will follow approximately the divisions shown below:

* Homeworks: 35%
* Weekly quizzes: 20%
* Regular projects: 35%
* Final project: 10%

Homework is graded on a ✓+, ✓, ✓- scale. A grade of ✓+ indicates no notable errors, ✓ indicates a sound understanding of course material, and ✓- indicates that you should spend more time reviewing the material on the assignment, attend office hours, and/or attend Wizard sessions to shore up your understanding.

Each student may miss one homework or one quiz with no penalty; if a student completes all homeworks or quizzes, I will drop the lowest grade in that category.

Although we will not have a final exam, I may administer one final quiz during the exam period.

## Collaboration and attribution

* Feel free to collaborate with your classmates on homework; however, you must submit your own work. Duplicating others’ assignments verbatim (especially code!) is prohibited.
* If you do discuss homework with your classmates, I expect you to disclose any such collaboration clearly in your submitted work. Err on the side of caution – it’s the best way to avoid awkward conversations about suspicious similarities between assignments.
* Cite any external sources used, including the textbook, internet, discussions with other professors, etc.
* Do not consult ChatGPT or any other generative AI systems to complete any graded work for this course.
* Collaboration or communication with other students about quizzes is prohibited, as is consulting outside resources beyond the textbook, your class notes, and the course website.
* Aside from raising technical and procedural questions on the course discussion forum, do not collaborate on projects with others besides your partner.
* Do not post homework, quiz, or project solutions online. Questions or answers that discuss solutions too closely will be deleted.

Aside from the course-specific policies above, you are expected to understand and abide by the college's [policy on academic misconduct](https://www.swarthmore.edu/student-handbook/academic-policies#academic_misconduct).

## Late policy, extensions, etc.

Homework will generally be assigned on Tuesday, and due at the start of class the following Tuesday. Homework assignments may be turned in up to a week late for half credit. Students get one free late homework turn-in without penalty. Quizzes will be graded under the same guidelines (half-credit for up to one week late, first late quiz not penalized).

Projects may be turned in up to a week late for half credit. There are no penalty-free late project submissions.

I will be liberal in granting extensions for homeworks and projects *when requested in advance*; however, extensions for quizzes will only be granted for exceptional circumstances. Extensions for assignments are much less likely to be granted retroactively. The more notice and communication I have from you about your academic needs, the more help I can provide!

# Accommodations

If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services (Parrish 113W) via [email](mailto:studentdisabilityservices@swarthmore.edu) to arrange an appointment to discuss your needs. As appropriate, the Office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website. You are also welcome to contact me privately to discuss your academic needs.
However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.

***Even outside the context of accommodations for disabilities, if there is something I can do to facilitate your learning, please do not hesitate to contact me.***

# Software

You will need an up-to-date installation of Python 3 along with OpenCV 4. OpenCV is available through multiple distribution channels. See [this document](install_opencv.html) for detailed installation instructions. OpenCV is also installed on the CS cluster machines.

# Schedule

!!! Warning
    Please note that topics and dates are subject to change. Check this page frequently for updates, assignments, readings, etc.

Jan 21, 2025: Introduction
  Topics:
  * Introduction
  * Syllabus review
  * Image formation

  Reading/resources:
  * Chapter 1
  * Sections 2.1, 2.3
  * [Linear algebra basics](../linalg-reintroduction.pdf)
  * [Installing OpenCV](install_opencv.html)

Jan 23, 2025: Background subtraction
  Topics:
  * Image representations
  * Thresholding & color segmentation

  Reading/resources:
  * Sections 3.1, 3.2, 3.3

Jan 28, 2025: Morphological operators and filtering
  Topics:
  * Morphological operators
  * Adaptive thresholding

  Reading/resources:
  * Section 3.2

Jan 30, 2025: Filtering & image derivatives
  Topics:
  * Convolution
  * Cross-correlation
  * Image gradients

  Reading/resources:
  * Sections 3.2, 7.2

  Assignments:
  * [Project 1](project1.html)

(Jan 30, 2025): P1: Background subtraction

Feb 4, 2025: Edge detection
  Topics:
  * Canny edge detector

  Reading/resources:
  * Section 7.2

Feb 6, 2025: Frequency domain
  Topics:
  * Fourier transform

  Reading/resources:
  * Sections 3.4, 3.5

Feb 11, 2025: Frequency domain, cont'd.
  Topics:
  * Fourier transform

Feb 13, 2025: Feature detection & description
  Topics:
  * Feature detection
  * Feature matching
  * Random sample consensus (RANSAC)

  Reading/resources:
  * Section 7.1
  * [FAST: Rosten et al. 2006](../papers/rosten2006fast.pdf)
  * [BRIEF: Calonder et al. 2010](../papers/calonder2010brief.pdf)
  * [ORB: Rublee et al. 2011](../papers/rublee2011orb.pdf)
  * [Whiteboard from today](feature_detectors_and_descriptors.pdf)

Feb 18, 2025: Template matching & template tracking
  Topics:
  * Squared distances
  * Normalized cross-correlation
  * Review: ordinary least squares
  * Kanade-Lucas-Tomasi tracker

  Reading/resources:
  * Section 9.1

(Feb 18, 2025): P2: Pyramid blending & hybrid images

Feb 20, 2025: Projective algebra fundamentals
  Topics:
  * Points in 2D
  * Lines in 2D
  * Homogeneous coordinates

  Reading/resources:
  * Sections 2.1, 3.6.1
  * [Application: unprojecting text](https://mzucker.github.io/2016/10/11/unprojecting-text-with-ellipses.html)

Feb 25, 2025: Homographies and homogeneous least squares
  Topics:
  * Homographies
  * Homogeneous least squares

Feb 27, 2025: 3D geometric fundamentals
  Topics:
  * Homography recap

  Reading/resources:
  * Section 11.1

Mar 4, 2025: Geometric image formation
  Topics:
  * Image formation redux
  * Intrinsic & extrinsic parameters

Mar 6, 2025: Camera calibration
  Topics:
  * Triangulation
  * Camera calibration
  * Algebraic vs geometric error

  Reading/resources:
  * Section 11.1

(Mar 6, 2025): P3: Image stitching with homographies

(Mar 11, 2025): Spring break

(Mar 13, 2025): Spring break

Mar 18, 2025: 3D imaging
  Topics:
  * Stereo vision
  * Structured light

  Reading/resources:
  * Chapter 12
  * [Technical description of Kinect calibration](http://wiki.ros.org/kinect_calibration/technical)
  * [Lens distortion](https://www.edmundoptics.com/knowledge-center/application-notes/imaging/distortion/)

Mar 20, 2025: Epipolar geometry
  Topics:
  * Essential matrix
  * Fundamental matrix

  Reading/resources:
  * Section 12.1

Mar 25, 2025: Structure from Motion
  Topics:
  * Structure from Motion
  * Affine cameras
  * Singular Value Decomposition (SVD)

  Reading/resources:
  * Chapter 11

Mar 27, 2025: Affine SFM
  Topics:
  * Affine SFM

Apr 1, 2025: Face recognition / ML ethics
  Topics:
  * Face recognition
  * Ethics in machine learning

Apr 3, 2025: Machine Learning basics
  Topics:
  * Binary classification
  * Linear classifiers/single-layer perceptron

  Reading/resources:
  * Section 5.1

Apr 8, 2025: Neural networks
  Topics:
  * Multi-layer perceptron

  Reading/resources:
  * Section 5.3
  * MNIST dataset
    * [original website](http://yann.lecun.com/exdb/mnist/)
    * [on Wikipedia](https://en.wikipedia.org/wiki/MNIST_database)
  * [Neural network MNIST demo](https://github.swarthmore.edu/e27-spring2025/nnet_demo)

Apr 10, 2025: Neural networks cont'd
  Topics:
  * Backpropagation of error

Apr 15, 2025: Misc topics in neural nets / ML
  Topics:
  * Dataset practicalities
  * ML software packages: Keras
  * Nearest neighbor classification

  Reading/resources:
  * Section 5.2

Apr 17, 2025: Unsupervised learning & PCA
  Topics:
  * Unsupervised learning
  * Principal component analysis

Apr 22, 2025: Clustering & *k*-means
  Topics:
  * Clustering & *k*-means
  * Visual bag of words

  Reading/resources:
  * [Varma & Zisserman paper](../papers/varma05textons.pdf)

Apr 24, 2025: Deep learning
  Topics:
  * Modern activation functions
  * Convolutional networks & weight tying
  * Data augmentation
  * Dropout

  Reading/resources:
  * Sections 5.3, 5.4

Apr 29, 2025: Application: ResNet & GANs
  Topics:
  * ResNet
  * ResNet application - fruit fly classification
  * GANs

  Reading/resources:
  * Section 5.5
  * Datasets: [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html), [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/analysis/#bbox_div), [ImageNet](http://www.image-net.org/challenges/LSVRC/index)
  * [VGG in TensorFlow](http://www.cs.toronto.edu/~frossard/post/vgg16/)
  * [*Deep Residual Learning for Image Recognition*, He et al. 2015](../papers/he2015resnet.pdf)
  * GANs: [Goodfellow et al. 2014](https://papers.nips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf), [2016](https://arxiv.org/pdf/1606.03498.pdf), [Isola et al. 2017](https://arxiv.org/pdf/1611.07004.pdf)
  * [pix2pix online examples](https://affinelayer.com/pixsrv/)

May 1, 2025: Transfer learning & manifold learning
  Topics:
  * Transfer learning
  * FaceNet
  * Feature visualization

  Resources:
  * [Transfer learning in Keras](https://keras.io/guides/transfer_learning/)
  * [Schroff et al. 2015 - FaceNet](https://arxiv.org/abs/1503.03832)
  * [Deep dream video](https://vimeo.com/132700334)
  * [Olah et al. 2017 - Feature Visualization](https://distill.pub/2017/feature-visualization/)

(May 2, 2025): Classes end