Machine Learning with a 2020 Vision

GOAL

A group of machine learning experts from the Hebrew University and the Technion, led by Prof. Shai Shalev-Shwartz (HUJI) and Prof. Shie Mannor (Technion), will develop advanced learning algorithms aimed at meeting the requirements of future applications, in particular intelligent agents and distributed learning. The research team has identified, and will explore, five fundamental technology gaps that are believed to be essential to the next generation of machine learning applications and that can benefit from, and enhance, novel computer platforms and architectures:

Distributed Learning. Most current learning algorithms involve a centralized agent, which collects data and uses it to deduce prediction rules (“the intelligent behavior”). However, this approach is quickly becoming inadequate, due to the exponential growth in both the available data (e.g., images, texts, sensor data) and computational resources (data centers, clouds, millions of available cores, smartphones). The purpose of this project is to build novel frameworks for a “decentralized intelligence”. The researchers will define a distributed learning model and will develop novel learning algorithms that tackle the unique features and constraints of decentralized learning, such as the lack of a centralized authority, resilience to component failures, dynamic network topology, asynchrony, power restrictions, communication limitations, and privacy.
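
To make the setting concrete, the following is a minimal illustrative Python sketch of one simple decentralized scheme, gossip averaging combined with local gradient steps. The ring topology, synthetic data, and least-squares model are assumptions made for illustration only; they are not the framework to be developed.

    import numpy as np

    # Illustrative sketch: decentralized least-squares regression via gossip
    # averaging. Each node holds a private shard of data and exchanges
    # parameters only with its neighbors; there is no central coordinator.
    rng = np.random.default_rng(0)
    n_nodes, n_features = 5, 3
    w_true = rng.normal(size=n_features)

    # Private local data per node (synthetic, for illustration only).
    shards = []
    for _ in range(n_nodes):
        X = rng.normal(size=(20, n_features))
        y = X @ w_true + 0.01 * rng.normal(size=20)
        shards.append((X, y))

    # Assumed ring topology: each node talks only to its two neighbors.
    neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

    w = [np.zeros(n_features) for _ in range(n_nodes)]
    lr = 0.05
    for _ in range(200):
        # 1) Local gradient step on each node's private data.
        for i, (X, y) in enumerate(shards):
            grad = X.T @ (X @ w[i] - y) / len(y)
            w[i] = w[i] - lr * grad
        # 2) Gossip step: average parameters with neighbors (uniform weights).
        w = [np.mean([w[i]] + [w[j] for j in neighbors[i]], axis=0)
             for i in range(n_nodes)]

    print("max error across nodes:", max(np.linalg.norm(wi - w_true) for wi in w))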

Light supervision. In many real-world applications we are constrained in the amount of information we can receive for each individual training example. The lack of full information on each example may stem from various causes, such as partial supervision, latent variables, inherent noise, privacy-protected data, or constraints on the type of interaction with the data source. For example, suppose that we would like an intelligent smartphone to learn to classify our phone calls, say into work-related vs. personal. While the phone can easily record all our calls, it does not receive the “labeling” of which calls are work-related and which are personal. It may receive labels for a few examples, or may even ask the user to label specific calls.
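
The phone-call example corresponds to pool-based active learning. The following minimal Python sketch shows one standard baseline, uncertainty sampling with a logistic-regression model (scikit-learn); the simulated oracle and synthetic data are illustrative assumptions, not the algorithms proposed here.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Illustrative sketch: pool-based active learning with uncertainty
    # sampling. The learner sees all examples unlabeled and repeatedly asks a
    # (simulated) oracle to label only the example it is least certain about.
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(500, 2))
    y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # hidden oracle labels

    # Seed set: one labeled example per class.
    labeled = [int(np.argmax(y_pool == 0)), int(np.argmax(y_pool == 1))]
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

    for _ in range(20):  # label budget: 20 oracle queries
        clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
        # Query the point whose predicted probability is closest to 0.5.
        proba = clf.predict_proba(X_pool[unlabeled])[:, 1]
        query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
        labeled.append(query)   # ask the oracle for this single label
        unlabeled.remove(query)

    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    print("accuracy with only", len(labeled), "labels:", clf.score(X_pool, y_pool))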

Online Learning. In static learning problems, there is a clear distinction between a learning phase, in which the learning algorithm trains the model on a batch of training examples, and a prediction phase, in which the learned model is applied to new examples. However, in many real-world problems the learner must continue to learn and improve all the time. This is especially true in dynamic environments; for example, in security applications the environment constantly tries to find weaknesses in the current model. In addition, in next-generation applications of machine learning, new examples keep arriving continuously, and it may even be impossible to store them all.
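
A classical formalization of this setting is prediction with expert advice. The following minimal Python sketch of the Hedge (multiplicative-weights) algorithm illustrates how learning and prediction are interleaved in every round; the random losses and the "secretly better" expert are illustrative assumptions.

    import numpy as np

    # Illustrative sketch: online learning with expert advice via the Hedge
    # (multiplicative weights) algorithm. There is no separate training phase;
    # the learner predicts, suffers a loss, and updates in every round.
    rng = np.random.default_rng(0)
    n_experts, T = 10, 1000
    eta = np.sqrt(np.log(n_experts) / T)  # standard learning-rate choice

    weights = np.ones(n_experts)
    cum_loss, cum_expert = 0.0, np.zeros(n_experts)
    for t in range(T):
        losses = rng.uniform(size=n_experts)  # adversarial losses in [0, 1]
        losses[3] *= 0.5                      # assumption: expert 3 is better
        p = weights / weights.sum()
        cum_loss += p @ losses                # learner's expected loss
        cum_expert += losses
        weights *= np.exp(-eta * losses)      # multiplicative update

    print(f"regret after {T} rounds: {cum_loss - cum_expert.min():.1f}")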

Learning sequences. In well-structured learning problems, there is a simple way to represent objects as vectors. For example, an image can be represented as a matrix of pixels or as a bag of image descriptors. Things are more involved when the objects we would like to classify are sequences, such as video, audio, text streams, or Internet traffic. Sequences are common to all modalities of sensory data. New methods and learning models are required to deal effectively with the structure and prediction of sequential data streams.
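
As a point of reference, the simplest model of sequential structure is a first-order Markov model over symbols. The following minimal Python sketch illustrates the flavor of next-symbol prediction; the symbolic stream is an illustrative assumption, and the methods to be developed aim well beyond this baseline.

    from collections import defaultdict

    # Illustrative sketch: next-symbol prediction with a first-order Markov
    # model. A sequence is modeled through its transition structure rather
    # than as a fixed-length vector.
    def train_markov(sequence):
        """Count symbol-to-symbol transitions."""
        counts = defaultdict(lambda: defaultdict(int))
        for prev, curr in zip(sequence, sequence[1:]):
            counts[prev][curr] += 1
        return counts

    def predict_next(counts, symbol):
        """Return the most frequent successor of `symbol`, or None."""
        successors = counts.get(symbol)
        return max(successors, key=successors.get) if successors else None

    stream = list("abcabcabxabcabcaby")  # illustrative symbolic stream
    model = train_markov(stream)
    print(predict_next(model, "b"))      # -> 'c', the dominant pattern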

Process and sensing-acting learning. The most challenging learning scenario, in which much progress has been made in recent years, is the integration of continuous multi-sensory signals, such as audio and video, into sequences of decisions, actions, and long-term planning in a continuously changing stochastic environment. This type of learning setting dominates applications like robotics and autonomous intelligent agents, and is most crucial for natural human-machine interaction. We will develop novel paradigms that combine information and control theory with online and semi-supervised learning algorithms to tackle the very challenging setting of decision making and planning in partially observed environments.
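
A minimal baseline for the sensing-acting loop is tabular Q-learning in a small stochastic environment. The following Python sketch is purely illustrative: a fully observed toy chain, not the partially observed settings targeted here.

    import numpy as np

    # Illustrative sketch: tabular Q-learning on a tiny stochastic chain.
    # The agent senses its state, acts, observes a reward, and updates its
    # value estimates online.
    rng = np.random.default_rng(0)
    n_states, n_actions = 5, 2            # chain of 5 states; actions: left/right
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def step(s, a):
        """Stochastic transition: the intended move succeeds 80% of the time."""
        move = (1 if a == 1 else -1) if rng.uniform() < 0.8 else rng.choice([-1, 1])
        s2 = int(np.clip(s + move, 0, n_states - 1))
        return s2, (1.0 if s2 == n_states - 1 else 0.0)  # reward at right end

    s = 0
    for _ in range(20000):
        a = rng.integers(n_actions) if rng.uniform() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # TD update
        s = 0 if s2 == n_states - 1 else s2  # restart the episode at the goal

    print("greedy policy (0=left, 1=right):", Q.argmax(axis=1))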

The main goal of this proposal is to tackle the aforementioned technology gaps by providing novel algorithms for the learning scenarios described above. For each of these central machine learning problems we will investigate special accelerators that can enhance performance.

The main outcomes of this research are formal models, algorithms, and software libraries. Ideally, one would like to build a proof of concept by applying these technologies to a real-world application; however, this is contingent on the availability of real-life data.

During the first year we will develop and analyze a first round of algorithms for the aforementioned projects. In particular, we will build a framework for distributed learning (analogous to PAC learning), and we will develop efficient algorithms for active learning, for online learning with smart experts, for finding meaningful patterns in sequences, and for long-term planning. In subsequent years we will implement and experiment with the algorithms, identify practical deficiencies, and develop improved versions of the algorithms.

STATUS
The team has recently developed a first algorithm for active learning (available on arXiv: http://arxiv.org/abs/1208.3561) and is in the process of developing an algorithm for distributed primal-dual optimization.
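
For reference, the following is a minimal single-machine Python sketch of the SDCA primal-dual coordinate update for ridge regression, in the spirit of the Shalev-Shwartz and Zhang papers listed below. The distributed variant under development, which would partition the dual variables across machines, is not shown, and the data here are synthetic.

    import numpy as np

    # Illustrative single-machine sketch: SDCA for ridge regression,
    # min (1/n) sum_i (1/2)(w.x_i - y_i)^2 + (lam/2)||w||^2.
    rng = np.random.default_rng(0)
    n, d, lam = 200, 5, 0.1
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    alpha = np.zeros(n)            # one dual variable per example
    w = X.T @ alpha / (lam * n)    # primal-dual relation: w = (1/(lam*n)) sum_i alpha_i x_i

    for _ in range(10 * n):        # ~10 passes, one random dual coordinate at a time
        i = rng.integers(n)
        # Closed-form dual coordinate update for the squared loss.
        delta = (y[i] - X[i] @ w - alpha[i]) / (1 + X[i] @ X[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)

    print("primal objective:", 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w)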

PEOPLE

Prof. Shai Shalev-Shwartz, HUJI CSE
Prof. Shie Mannor, Technion EE
Dr. Koby Crammer, Technion EE
Prof. Ran El-Yaniv, Technion CS
Dr. Gal Elidan, HUJI Statistics
Dr. Elad Hazan, Technion IE&M
Dr. Amir Globerson, HUJI CSE
Prof. Naftali Tishby, HUJI CSE
Prof. Michael Schapira, HUJI CSE
Prof. Ami Wiesel, HUJI CSE
Alon Gonen (HUJI)
Alon Vinnikov (HUJI)

PUBLICATIONS

Shie Mannor ➭

  • T. Mann and S. Mannor, “The Advantage of Planning with Options”, Reinforcement Learning and Decision Making (RLDM) 2013
  • M. Harel, K. Crammer, R. El-Yaniv, S. Mannor, “Concept Drift Detection Through Resampling”, ICML 2014
  • O. Anava, E. Hazan, S. Mannor and O. Shamir, “Online Learning for Time Series Prediction”, COLT 2013

Shai Shalev-Shwartz ➭

  • E. Hazan, S. Kale and S. Shalev-Shwartz, “Near-Optimal Algorithms for Online Matrix Prediction”, COLT 2012
  • S. Shalev-Shwartz and S. Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2014
  • S. Shalev-Shwartz and T. Zhang, “Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization”, ICML 2014
  • A. Gonen, S. Sabato, S. Shalev-Shwartz, “Efficient Active Learning of Halfspaces: an Aggressive Approach”, ICML 2013
  • R. Livni, S. Shalev-Shwartz, “Vanishing Component Analysis”, ICML 2013
  • R. Livni, S. Shalev-Shwartz and O. Shamir, “An Algorithm for Training Polynomial Networks”, arXiv preprint 1304.7045 (to be submitted)
  • S. Shalev-Shwartz and A. Vinnikov, “K-means Recovers ICA Filters when Independent Components are Sparse”, ICML 2014
  • S. Shalev-Shwartz, A. Gonen, D. Rosenbaum, Y. Eldar, “The Sample Complexity of Subspace Learning with Partial Information”, submitted
  • R. Livni, O. Shamir and S. Shalev-Shwartz, “A Provably Efficient Algorithm for Training Deep Networks”, preprint
  • S. Shalev-Shwartz and T. Zhang, “Accelerated Mini-Batch Stochastic Dual Coordinate Ascent”, NIPS 2013
  • S. Shalev-Shwartz and T. Zhang, “Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization”, Journal of Machine Learning Research, 14(Feb):567-599, 2013

Koby Crammer ➭

  • M. Harel, K. Crammer, R. El-Yaniv, S. Mannor, “Concept Drift Detection Through Resampling”, ICML 2014
  • A. Noy, K. Crammer, “Robust Algorithms via PAC-Bayes and Laplace Distributions”, book chapter in Alexey Chervonenkis Festschrift, Springer (to appear)
  • K. Crammer, “Doubly Aggressive Selective Sampling Algorithms for Classification”, AISTATS 2014
  • A. Noy, K. Crammer, “Robust Forward Algorithms via PAC-Bayes and Laplace Distributions”, AISTATS 2014
  • E. Moroshko, K. Crammer, “Selective Sampling with Drift”, AISTATS 2014
  • Y. Seldin, P. Bartlett, Y. Abbasi-Yadkori and K. Crammer, “Prediction with Limited Advice and Multiarmed Bandits with Paid Observations”, ICML 2014

Ran El-Yaniv ➭

  • M. Harel, K. Crammer, R. El-Yaniv, S. Mannor, “Concept Drift Detection Through Resampling”, ICML 2014
  • R. El-Yaniv, Y. Wiener, “On the Version Space Compression Set Size and its Applications”, to appear in Measures of Complexity (contribution to A. Chervonenkis Festschrift), Springer 2014
  • Y. Wiener and R. El-Yaniv, “Pointwise Tracking the Optimal Regression Function”, NIPS 2012

Gal Elidan ➭

  • G. Elidan, E. Eban, G. Rothschild, A. Mizrahi, and I. Nelken, “Dynamic Copula Networks for Modeling Real-valued Time Series”, AISTATS 2013
  • O. Meshi, E. Eban, G. Elidan and A. Globerson, “Learning Max-Margin Tree Predictors”, Uncertainty in Artificial Intelligence (UAI) 2013
  • G. Elidan, Y. Tenzer, “Speedy Model Selection (SMS) for Copula Models”, UAI 2013

Elad Hazan ➭

  • E. Hazan and T. Koren, “Linear Regression with Limited Observation”, ICML 2012
  • E. Hazan, S. Kale and S. Shalev-Shwartz, “Near-Optimal Algorithms for Online Matrix Prediction”, COLT 2012
  • E. Hazan and S. Kakade, “Calibration is Computationally Hard”, COLT 2012
  • E. Hazan and S. Kale, “Projection-free Online Learning”, ICML 2012
  • D. Garber and E. Hazan, “Playing Non-linear Games with Linear Oracles”, FOCS 2013
  • O. Dekel and E. Hazan, “Better Rates for Any Adversarial Deterministic MDP”, ICML 2013
  • O. Anava, E. Hazan, S. Mannor and O. Shamir, “Online Learning for Time Series Prediction”, COLT 2013
  • D. Garber and E. Hazan, “Universal MMSE Filtering With Logarithmic Adaptive Regret”, IEEE Transactions on Signal Processing, 61(7):1595-1604, 2013

Amir Globerson ➭

  • A. Globerson, “Inferning with High Girth Graphical Models”, ICML 2014 (accepted for publication)
  • O. Meshi, E. Eban, G. Elidan and A. Globerson, “Learning Max-Margin Tree Predictors”, Uncertainty in Artificial Intelligence (UAI) 2013

Naftali Tishby ➭

  • N. Jacoby, P. Keller, B. Repp, and N. Tishby, “Parameter estimation of linear sensorimotor synchronization models: phase correction, period correction and ensemble synchronization”, to appear in a special issue of Timing and Time Perception (Brill, UK)
  • P. Ortega, D. Braun, N. Tishby, “Monte Carlo Methods for Exact & Efficient Solution of the Generalized Optimality Equations”, ICRA 2014; journal version submitted to JMLR
  • N. Jacoby and N. Tishby, “Gaussian Embedding and Information Geometry”, NIPS 2014
  • M. Moshkovich and N. Tishby, “Control your Predictions”, ISIT 2014 (June 2014)
  • S. Sabato, N. Srebro and N. Tishby, “Distribution-Dependent Sample Complexity of Large Margin Learning”, Journal of Machine Learning Research, 14(Jul):2119-2149, 2013
  • S. Sabato and N. Tishby, “Multi-Instance Learning with Any Hypothesis Class”, Journal of Machine Learning Research, 13(Oct):2999-3039, 2012

Michael Schapira ➭

Ami Wiesel ➭