Michal Valko : Graphs in Machine Learning - Fall 2019

Graphs in Machine Learning - Fall 2019 - MVA - ENS Paris-Saclay

News

  • 07.10.2019: This year we move to 100% Python for the labs (and discountinue Matlab and VM).
  • 01.10.2019: Piazza fixed - you can use any school email address together with the class code.
  • 30.09.2019: Check piazza to get a bonus grade!
  • 30.09.2019: Signup and communication managed via piazza
  • 27.09.2019: The course will be presented on Friday, 27. 9. 2019.
  • 27.09.2019: First class will start on 01. 10. 2019 at 13h30.
old news

Administrivia

  • Time: Wednesday afternoon
  • Place: ENS Paris-Saclay (different lecture halls)
  • 7 or 8 lectures and 3 recitations (TD)
  • Validation: grades from TD (40%) + class project (60%)
  • Research: projects, internships (stages) and PhD. thesis at SequeL and elsewhere possible
  • Piazza: Registration (with your school email) and online class discussion on piazza.
  • TA: Omar Darwiche Dominques
  • course description at MVA at ENS Paris-Saclay
  • MVA tags: content: #apprentissage, type: #méthodologique #théorique, validation: #projet #td

Main topics

  • spectral graph theory, graph Laplacians
  • semi-supervised graph-based learning
  • manifold learning
  • graphs from flat data - graph as a non-parametric basis
  • online learning with graphs
  • real world graphs scalability and approximations
  • graph neural networks (new in 2019)
  • social networks and recommender systems applications
  • large graph analysis, learning, and mining
  • vision applications (e.g., face recognition)

Important: Don't take this class ...

  • ... if you don't have time to do reports. While 5-10% students finish their 3 assignments during the 2-hour long recitations, about 20% students find that they spend 3 times longer time doing homework reports than for other classes in the master program.
  • ... if you expect your instructor to reply to your emails or not willing to read this webpage and other instructions. Piazza is the place for all communication.
  • ... if you believe that extra extensions beyond the rules below would be granted or if you cannot deliver the project report on time.

Intro

The graphs come handy whenever we deal with relations between the objects. This course, focused on learning, will present methods involving two main sources of graphs in ML: 1) graphs coming from networks, e.g., social, biological, technology, etc. and 2) graphs coming from flat (often vision) data, where a graph serves as a useful nonparametric basis and is an effective data representation for such tasks as spectral clustering, manifold or semi-supervised learning. We will also discuss online decision-making on graphs, suitable for recommender systems or online advertising. Finally, we will always discuss the scalability of all approaches and learn how to address huge graphs in practice. The lectures will show not only how but mostly why things work. The students will learn relevant topics from spectral graph theory, learning theory, bandit theory, graph neural networks, necessary mathematical concepts and the concrete graph-based approaches for typical machine learning problems. The practical sessions will provide hands-on experience on interesting applications (e.g., online face recognizer) and state-of-the-art graphs processing tools.

Organization

The course will feature 11 sessions, 8 lectures and 3 recitations (TD), each of them 2 hours long. There may be a special session with guest lectures. There may be also an extra homework with extra credit. The evaluation is be based on reports from TD and from the projects. Several project topics will be proposed but the students will be able to come up with their own and they will be able to work in groups of 2-3 people. The best reference for this course are the slides from the lecture which are made to be comprehensive and there is no recommended textbook. The material we cover is mostly based on research papers, some of which very recent. The course will be in English.

Recitations and homeworks (TDs)

Bring your own laptop to the practical sessions. Each of the 3 practical sessions are followed by a graded report. The assignments are posted on piazza. You are welcome to discuss with your peers (in which case indicate the people you have discussed with in your report), but the reports should be written individually to avoid a penalty.

Class projects

The main part the of the grade comes from the projects. The students are encouraged to come up with the topic of their interest related to the course and start working on it as soon as possible. In this case, please e-mail the lecturer with a short description of the project for the approval. Some project proposals will be given. Additional project proposals will be presented on 5.11.2019. Deadline for deciding on the project is 26.11.2019, but the recommended date for picking up the projects is on 19.11.2019. The deadline for submitting 5-10 page reports in NeurIPS format is 07.01.2020. The planned time for 15+5 minutes will be from 11. 1. 2020 over Skype/Hangout. Students can work in pairs of 2 and exceptionally 3. Very detailed instructions are given on the dedicated page for the class projects.

Registration, Communication, and Questions

We will be using piazza for the enrollment and online class discussion. Use your full name and your school e-mail when registering. The access code will be given out during the class. Piazza is the place of questions regarding lectures, homeworks, and logistics. Posting questions to piazza makes the whole class benefit from the answers and enables students to answer questions too. However, refrain from posting the solutions to the homeworks. Please use piazza also for public or private communication with the instructors of any kind. E-mails that should be posted to Pizza to the instructors will not be answered or will be answered late by a canned response "please post this question to piazza".

Late policy

You will have 4 late days without penalty to be used across the entire course. You can use them for any deadline (homework, project assignment, project report delivery). After those late days are used, you will be penalized according to the following policy: (1) full credit at the midnight on the due date, Paris time (2) half credit for the next 48 hours; (3) zero credit after that. We encourage the students not to use these late dates except in the exceptional circumstances. All the deadlines are strict and we ask students to avoid demanding the extensions. If you have serious reasons that prevent you meeting the deadlines, please use the formal procedures of your school.

Prerequisites

linear algebra, basic statistics, others tools needed will be covered in the lectures

Syllabus

Session 0 - Overview 27.09.2019 - Amphi Curie

  • Short presentation of the course

Session 1 - Lecture 1 01.10.2019, Salle Condorcet 13h30-15h30

  • Introduction to graphs in Machine Learning
  • Motivation and overview of the successful approaches
  • Applications (Recommender Systems, Semi-Supervised Learning, …)
  • Submodularity for influence maximization
  • Google Pagerank
  • Graphs as data-dependent approach
  • Online SSL for Face Recognition: from raw pixels to analysis
  • Erdős number project
  • Random graphs models – real-world graph models
  • Social network modeling, Small world phenomena, Advertising
  • Graph theory refresher
  • Data-graph constructions

Session 2 – Lecture 2 08.10.2019, Salle Condorcet 13h30-15h30

  • Available public graphs datasets
  • Spectral algebra
  • Graph Laplacian and its properties
  • Random walks and the Laplacians
  • Geometry of the data and connectivity
  • Spectral clustering

Session 3 – TD 1 15.10.2019, Salle Condorcet 13h30-15h30

  • Graph construction and spectral clustering
  • k-NN and epsilon graphs, comparison
  • Understand connectivity vs. compactness paradigm in clustering
  • Spectral clustering on simple image segmentation problem

Session 4 – Lecture 3 22.10.2019, Salle Condorcet 13h30-15h30

  • Introduction to manifold learning
  • Manifold learning with Laplacian Eigenmaps
  • Recommendations with graph distances
  • Resistance networks
  • Semi-Supervised Learning
  • SSL why and when it helps
  • SSL with self-training
  • Semi-Supervised SVMs
  • Graph-based semi-supervised learning
  • Gaussian random fields and harmonic solution

Session 5 - Lecture 4 29.10.2019, Salle Condorcet 13h30-15h30, Deadline: TD1 report

  • Regularization of harmonic solution
  • Soft-harmonic solution
  • Inductive and transductive semi-supervised learning
  • Manifold regularization
  • Max-margin graph cuts
  • Theory of Laplacian-based manifold methods
  • Transductive learning stability based bounds

Session 6 - Lecture 5 and class project proposals 05.11.2019, Salle Condorcet 13h30-15h30,

  • Online learning with graphs
  • Online clustering
  • Online semi-supervised learning
  • What to do when graphs grow?
  • Online incremental k-centers
  • Examples of applications of online SSL
  • Analysis of online SSL
  • Analysis of quantization error
  • When does graph-based SSL provably help?
  • Scalability
  • Scaling harmonic functions to millions of samples
  • Numerical eigenfunctions

Session 7 – TD 2 12.11.2019, Salle Condorcet 13h30-15h30

  • Semi Supervised Learning and Harmonic Function Solution
  • Face Recognizer with HFS
  • Report experiments of one's own face
  • Suggest improvements

Session 8 - Lecture 6 19.11.2019, Salle Condorcet 13h30-15h30, Soft deadline: project assignment

  • Large-scale sparsification
  • Combinatorial sparsification
  • Spectral sparsification
  • GraphLab abstraction
  • Decision-making with graphs
  • Online Semi-Supervised Learning
  • Big graph data learning and analysis
  • Large-scale data processing tools: GraphLab
  • Large-scale label propagation
  • Large-scale graph construction
  • Graph bandits
  • Online decision-making on graphs
  • Smoothness of rewards (preferences) on a given graphs
  • Online movie recommendation on graphs
  • Observability graphs
  • Improving learning rates using side information
  • Online influence maximization

Session 9 - Lecture 7 26.11.2019, 14h00 - 16h00, Salle Condorcet 13h30-15h30, Deadlines: TD2 report, project assignment

Session 10 – TD 3 06.12.2019, Salle Condorcet 12h00-14h00

  • Graph Nets

No class 10.12.2019, Michal and Omar will be at NeurIPS 2019.
You can check out the papers that your instructor(s) are presenting at NeurIPS 2019 during that week:
  • Planning in entropy-regularized Markov decision processes and games
  • Multiagent evaluation under incomplete information
  • Exact sampling of determinantal point processes with sublinear time preprocessing
  • On two ways to use determinantal point processes for Monte Carlo integration

10.12.2019: Deadline: TD3 report

07.01.2020: Deadline: project report

from 13.01.2020 project presentations over Skype/Hangouts

Course page from previous years:

mv