Michal Valko : Graphs in Machine Learning - Spring 2015

Graphs in Machine Learning - Spring 2015 - MVA - ENS Cachan

News

  • 09.04.2015: Sign up for a presentation after reading the instructions.
  • 02.03.2015: Class project proposals available.
  • 24.02.2015: Course website update with administrative information about the class projects.
  • 23.02.2015: For TP 2, bring your own laptop and video according to instructions.
  • 12.02.2015: ALTeGraD course of Michalis Vazirgiannis starts on 04.03.2015
  • 18.12.2014: First lecture on 13. 1. 2014
  • 18.12.2014: Course presented at MVA: slides
old news

Administrivia

  • Time: Tuesdays 11h-13h
  • Place: ENS Cachan - Bldg. Cournot, 1. floor
  • 8 lectures and 3 recitations (TP)
  • Validation: grades from TD + class project
  • Research: projects, interships (stages) and PhD. thesis at SequeL and elsewhere possible
  • TA: Daniele.Calandriello#inria.fr
  • course description at MVA at ENS Cachan

Main topics

  • spectral graph theory, graph Laplacians
  • semi-supervised graph-based learning
  • manifold learning
  • graphs from flat data - graph as a non-parametric basis
  • online learning with graphs
  • real world graphs scalability and approximations
  • random graphs models
  • social networks and recommender systems applications
  • large graph analysis, learning, and mining
  • vision applications (e.g., face recognition)

Intro

The graphs are useful models whenever we deal with relations between the objects. This course, focused on learning, will present methods involving two main sources of graphs in ML: 1) graphs coming from networks, e.g., social, biological, computer, … and 2) graphs coming from flat (often vision) data, where a graph serves as a useful nonparametric basis and is an effective data representation for such tasks as: spectral clustering, manifold or semi-supervised learning. The lectures will show not only how but mostly why things work. The students will learn topics from spectral graph theory, learning theory, relevant mathematical concepts and the concrete graph-based approaches for typical machine learning problems. The practical sessions will provide hands-on experience on interesting applications (e.g., online face recognizer) and state-of-the-art graphs processing tools (e.g., GraphLab).

Organization

The course will feature 11 sessions, 8 lectures and 3 recitations (TP), each of them 2 hours long. There may be a special session with guest lectures and internship proposals. The evaluation will be based on reports from TP and from the projects. Several project topics will be proposed but the students will be able to come with their own and they will be able to work in groups of 2-3 people. The projects will be presented during a special day at the very end. The course will be in English.

Class projects

The main part the of the grade comes from the projects. The students are encouraged to come up with the topic of their interest related to the course and start working on it as soon as possible. In this case, please e-mail the lecturer with a short description of the project for the approval. Additional project proposals will be presented on 03.03.2015. Deadline for deciding on the project is 17.03.2015, but the recommended date for picking up the projects is on 10.03.2015. The deadline for submitting 5-10 page reports in NIPS format is 11.04.2015. The planned time for 20 minutes in-class presentations is 13.04.2015. Students can work in pairs of 2 and exceptionally 3.

Prerequisites

linear algebra, basis statistics, others tools needed will be covered in the lectures

Syllabus

Session 1 - Lecture 1 13.01.2015, C103

  • Introduction to graphs in Machine Learning
  • Motivation and overview of the successful approaches
  • Applications (Recommender Systems, Semi-Supervised Learning, …)
  • Submodularity for influence maximization
  • Google Pagerank
  • Graphs as data-dependent approach
  • Online SSL for Face Recognition: from raw pixels to analysis
  • Erdős number project
  • Random graphs models – real-world graph models
  • Social network modeling, Small world phenomena, Advertising

Session 2 – Lecture 2 20.01.2015, C103

  • Graph theory refresher
  • Data-graph constructions
  • Available public graphs datasets
  • Spectral algebra
  • Graph Laplacian and its properties
  • Random walks and the Laplacians

Session 3 – Lecture 3 27.01.2015, C103

  • Recommendations with graph distances
  • Resistance networks
  • Geometry of the data and connectivity
  • Spectral clustering
  • Introduction to manifold learning

Session 4 – TP 1 03.02.2015, C109+C103

  • Graph construction and spectral clustering
  • k-NN and epsilon graphs, comparison
  • Understand connectivity vs. compactness paradigm in clustering
  • Spectral clustering on simple image segmentation problem

Session 5 - Lecture 4 10.02.2015, C103

  • Manifold learning with Laplacian Eigenmaps
  • Semi-Supervised Learning
  • SSL why and when it helps
  • SSL with self-training
  • Semi-Supervised SVMs
  • Graph-based semi-supervised learning
  • Gaussian random fields and harmonic solution
  • Regularization of harmonic solution
  • Soft-harmonic solution

Session 6 - Lecture 5 17.02.2015, C103

  • Inductive and transductive semi-supervised learning
  • Manifold regularization
  • Theory of Laplacian-based manifold methods
  • Transductive learning stability based bounds
  • SSL Learnability
  • Online learning with graphs
  • Online clustering
  • Online semi-supervised learning
  • What to do when graphs grow?
  • Online incremental k-centers

Session 7 – TP 2 24.02.2015, C109+103+C123

  • Semi Supervised Learning and Harmonic Function Solution
  • Face Recognizer with HFS
  • Online Semi-Supervised Learning
  • Report experiments of one's own face
  • Suggest improvements

Session 8 - Lecture 6 and class project proposals 03.03.2015, C103

  • Analysis of online SSL
  • Analysis of quantization error
  • When does graph-based SSL provably help?

Session 9 - TP 3 10.03.2015, C109+103+C123

  • Big graph data learning and analysis
  • Large-scale data processing tools: GraphLab
  • Large-scale label propagation
  • Large-scale graph construction
  • Large-scale sparsification

Session 10 - Lecture 7 17.03.2015, C103

  • Scalability
  • Scaling harmonic functions to millions of samples
  • Numerical eigenfunctions
  • Graph bandits
  • Online decision-making on graphs
  • Smoothness of rewards (preferences) on a given graphs
  • Online movie recommendation on graphs

Session 11 – Lecture 8 24.03.2015, C103

  • Observability graphs
  • Improving the learning rates using side information
  • Combinatorial sparsification
  • Spectral sparsification
mv