Shai Ben-David - University of Waterloo

Is Learning Possible without Prior Knowledge?
Dec 8, 2011 10:00 - 10:40

Ingo Steinwart - University of Stuttgart

Statistical Analysis of SVMs
Dec 8, 2011 10:40 - 11:20

Since their invention by Vladimir Vapnik and his co-workers in the early nineties, SVMs have attracted a great deal of research activity from various communities. While early research mostly focused on generalization bounds, the last decade has witnessed a shift towards oracle inequalities and learning rates. In this talk I will discuss some of these later developments, in particular for least squares and quantile regression, binary classification, and anomaly detection.

Volodya Vovk - Royal Holloway, University of London

Kernel Ridge Regression
Dec 8, 2011 11:20 - 12:00

Kernel ridge regression (KRR) is a simplified version of support vector regression. The main formula of KRR is identical to a formula in kriging, a Bayesian method widely used in geostatistics. But KRR has performance guarantees that have nothing to do with the assumptions needed for kriging. I will discuss two kinds of such performance guarantees: those not requiring any stochastic assumptions and those depending only on the iid assumption.
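
To make the "main formula" concrete: KRR solves (K + lambda*I) alpha = y and predicts f(x) = sum_i alpha_i k(x_i, x). A minimal NumPy sketch, with an RBF kernel and illustrative parameter values of my own choosing (not from the talk):

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3, gamma=10.0):
    """Solve (K + lam*I) alpha = y: the main formula of KRR (and of kriging)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=10.0):
    """f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

With a small regularizer the fit follows the training data closely; the regularization term plays the role of the nugget effect in kriging.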

Manfred Opper - TU Berlin

Assessing the Quality of Approximate Inference for Bayesian Kernel Methods
Dec 8, 2011 13:30 - 14:10

Models with Gaussian process priors over latent functions can be understood as Bayesian versions of kernel machines. Unfortunately, except for the case of regression with Gaussian noise, these models do not allow for exact inference. Efficient approximation techniques such as the expectation propagation (EP) algorithm have been developed to overcome this problem. Empirical comparisons with extensive Monte Carlo inference on a variety of benchmark data sets for Gaussian process classifiers have shown that EP can yield excellent approximations. However, such a positive result may not hold in general. In this talk we show how the error of the EP approximation for Gaussian process models can be expressed analytically in terms of a series expansion. Low order terms of the expansion can be used to get a practical estimate of the quality of EP.
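
For contrast with the approximate-inference setting above, here is the one case where exact inference is available in closed form: GP regression with Gaussian noise. A minimal sketch (RBF kernel; all parameter values are illustrative):

```python
import numpy as np

def gp_posterior(X, y, x_new, gamma=10.0, noise=0.1):
    """Exact GP posterior mean and variance for Gaussian-noise regression.
    It is this closed form that breaks down for non-Gaussian likelihoods
    (e.g. classification), which is where EP comes in."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    K = k(X, X) + noise ** 2 * np.eye(len(X))
    K_s = k(x_new, X)
    mean = K_s @ np.linalg.solve(K, y)
    # prior variance k(x, x) = 1 for the RBF kernel
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, var
```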

Joint work with Ulrich Paquet (Microsoft Research Cambridge) and Ole Winther (DTU Copenhagen).

Andreas Christmann - Universität Bayreuth

On Stability Properties of Support Vector Machines
Dec 8, 2011 14:10 - 14:50

Support Vector Machines (SVMs) play an important role in modern statistical learning theory. The original SVM approach by Boser, Guyon and Vapnik (1992) was derived from the generalized portrait algorithm invented by Vapnik and Lerner (1963). The books by Vapnik (1982, 1995, 1998) and later on by Cristianini and Shawe-Taylor (2000) and Schölkopf and Smola (2002) had a large impact on the development of SVMs and their success in various fields of application. It is well-known that SVMs have nice numerical properties and that they are the solution of a well-defined mathematical problem in Hadamard's sense.

The talk will briefly summarize some properties of SVMs which show that SVMs additionally have nice properties from the viewpoint of statistical stability with respect to the unknown underlying distribution.

Ulrike von Luxburg - University of Hamburg

Random Walk Distances on Graphs
Dec 8, 2011 14:50 - 15:30

László Györfi - Budapest University of Technology and Economics

Nonparametric Sequential Prediction of Stationary Time Series
Dec 8, 2011 16:00 - 16:40

We present simple procedures for the prediction of a real valued time series with side information. For squared loss, the prediction algorithms are based on a machine learning combination of several simple predictors. We show that if the sequence is a realization of a stationary and ergodic random process then the average of squared errors converges, almost surely, to that of the optimum, given by the Bayes predictor. We offer an analog result for the prediction of stationary gaussian processes, and show an open problem. These prediction strategies have some consequences for 0−1 loss (pattern recognition problem for time series).
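
One standard way to realize "a machine learning combination of several simple predictors" is exponential weighting; a toy sketch under squared loss (the learning rate and data below are my own illustrative choices, not necessarily the combination used in the talk):

```python
import math

def ew_combine(experts, outcomes, eta=0.5):
    """Exponentially weighted average forecaster for squared loss.
    experts[t] lists the experts' predictions at time t."""
    k = len(experts[0])
    w = [1.0] * k
    preds = []
    for t, outcome in enumerate(outcomes):
        total = sum(w)
        # combined prediction: weighted average of the expert predictions
        preds.append(sum(wi * e for wi, e in zip(w, experts[t])) / total)
        # downweight each expert by its incurred squared loss
        w = [wi * math.exp(-eta * (e - outcome) ** 2)
             for wi, e in zip(w, experts[t])]
    return preds
```

The weights concentrate on the expert with the smallest cumulative loss, which is the mechanism behind the almost-sure convergence results of this kind.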

Peter Bühlmann - ETH Zürich

High-dimensional Causal Inference
Dec 8, 2011 16:40 - 17:20

Understanding cause-effect relationships between variables is of interest in many fields of science. It is desirable to extract causal information from observational data obtained by observing a system of interest without subjecting it to interventions (i.e. without randomized experiments). When assuming no or little information about (causal) influence diagrams, the problem in its full generality is ill-posed. However, we will discuss how sparse graphical modeling and intervention calculus can be used for quantifying useful bounds for causal effects, even for the high-dimensional, sparse case where the number of variables can greatly exceed sample size. Besides methodology, computation and theory, we will illustrate validation of the method with gene intervention experiments in yeast (Saccharomyces cerevisiae) and arabidopsis (Arabidopsis thaliana).

Leon Bottou - Microsoft Research

About the origins of the VC lemma
Dec 8, 2011 17:20 - 18:00

Klaus-Robert Müller - TU Berlin

15 years of Kernel-based Learning
Dec 8, 2011 18:00 - 19:00

Bernhard Schölkopf - MPI for Intelligent Systems

Inference of Cause and Effect
Dec 9, 2011 10:00 - 10:40

Alexandre Tsybakov - Université Paris 6

Optimal Exponential Bounds for the Accuracy of Classification
Dec 9, 2011 10:40 - 11:20

Bob Williamson - NICTA

Theory of Loss Functions
Dec 9, 2011 11:20 - 12:00

The decision-theoretic approach to statistics and machine learning is built upon the idea of a loss function which measures the accuracy of a prediction. In most work (including in Vapnik’s books) really only three loss functions are considered. But there are many other possibilities. In the talk I will focus on proper losses for probability estimation, and present some old and new results that demonstrate the richness of loss functions and the significance of their study.
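
A proper loss is one whose expected value is minimized by reporting the true probability. A quick numerical check for the log loss, one of the standard examples (the grid search below is purely illustrative):

```python
import math

def expected_log_loss(p, q):
    """Expected log loss of predicting probability q when the true
    positive-class probability is p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

p = 0.3
qs = [i / 100 for i in range(1, 100)]
# propriety: the expected loss is minimized by reporting the true probability
best_q = min(qs, key=lambda q: expected_log_loss(p, q))
```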

Alex Smola - Yahoo! Research

The Mean Trick
Dec 9, 2011 13:30 - 14:10

In this talk I will give an overview of the mean trick, that is, the use of Hilbert space embeddings for expectation operators. It allows one to unify a large number of techniques ranging from two-sample tests to visualization, feature extraction, and graphical models.
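
As a small illustration of the mean trick in the two-sample setting: embed each sample as the mean of its kernel features and compare the embeddings in the RKHS norm (the kernel choice and data below are mine):

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF kernel matrix between two 1-d samples."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def mmd2(x, y, gamma=0.5):
    """Squared MMD: the squared RKHS distance between the two
    empirical kernel mean embeddings."""
    return (rbf(x, x, gamma).mean()
            + rbf(y, y, gamma).mean()
            - 2 * rbf(x, y, gamma).mean())
```

Because the statistic is a squared norm of a difference of mean embeddings, it is nonnegative and vanishes when the two samples coincide.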

Vladimir Vapnik - NEC Laboratories America, Inc.

Dec 9, 2011 14:10 - 15:30

Larry Jackel - North-C Technologies, Inc.

Machine Learning Applications at Bell Labs: Before and After the Arrival of Vladimir Vapnik
Dec 9, 2011 19:00 - 19:40

Olivier Chapelle - Yahoo! Research

Click Modeling for Display Advertising
Dec 10, 2011 10:40 - 11:20

Naftali Tishby - The Hebrew University

Kernel Information Bottleneck
Dec 10, 2011 10:00 - 10:40

A fundamental problem of learning theory is finding simple functions that capture the relevant information in empirical data with respect to a hypothesis class or parametric distributions. Such functions were termed "minimal sufficient statistics" in parametric inference and are known to exist, with fixed dimensionality, only for distribution families of exponential form. A principled information-theoretic generalization of minimal sufficient statistics was proposed by the information bottleneck method (IB), based on the data processing inequality for mutual information: extract variables that minimize the mutual information between the sample and the statistics, while constraining the mutual information between the statistics and the relevant variables (e.g., the distribution parameters). This optimization problem is in general non-convex, and its optimal solutions can be obtained by an alternating projections algorithm only locally. The IB problem was shown, however, to be efficiently globally solvable for the special case of multivariate Gaussian variables (GIB). In this case it provides an information-theoretic generalization of Canonical Correlation Analysis (CCA) and establishes interesting connections between CCA, channels with side information, and approximate minimal sufficient statistics with a continuous tradeoff between accuracy and complexity. In this talk I will describe a recent extension of the GIB, using Vapnik's kernel trick, that makes the IB practical for any data to which kernels can be applied. This new version of the IB corresponds to an information-theoretic kernel CCA, and makes the IB algorithm and the systematic calculation of information curves - the optimal tradeoff between complexity and accuracy of empirical data - completely practical even for very large data sets.

Based on joint work with Nori Jacoby.
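
In symbols, the bottleneck optimization described in the abstract trades compression against relevance (notation is mine; T denotes the extracted statistics):

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y),
\qquad \text{with } T \text{--} X \text{--} Y \text{ a Markov chain},
```

where sweeping the tradeoff parameter beta >= 0 traces out the information curve, the optimal accuracy-complexity tradeoff mentioned above.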

Masashi Sugiyama - Tokyo Institute of Technology

Density Ratio Estimation: A New Versatile Tool for Machine Learning
Dec 10, 2011 11:20 - 12:00

In statistical machine learning, avoiding density estimation is essential, since it is often more difficult than solving the target machine learning problem itself. This is often referred to as "Vapnik's principle", and the support vector machine is one of the successful examples of this principle. Following this spirit, we recently introduced a new machine learning framework based on the ratio of two probability density functions. This density-ratio formulation includes various important machine learning tasks such as non-stationarity adaptation, outlier detection, feature selection, clustering, and conditional density estimation. By directly estimating the density ratio without going through density estimation, all the above tasks can be solved effectively and efficiently in a unified manner.
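
One concrete instance of direct density-ratio estimation is a least-squares fit of r(x) = p(x)/q(x) in a kernel basis, in the spirit of uLSIF; a toy 1-d sketch with illustrative parameter choices of my own:

```python
import numpy as np

def ulsif(x_num, x_den, centers, gamma=1.0, lam=0.01):
    """Fit r(x) = sum_l theta_l exp(-gamma (x - c_l)^2) by least squares,
    without ever estimating the two densities separately."""
    def phi(x):
        return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)
    Pn, Pd = phi(x_num), phi(x_den)
    H = Pd.T @ Pd / len(x_den)   # second moment of the basis under the denominator
    h = Pn.mean(axis=0)          # mean of the basis under the numerator
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: np.maximum(phi(x) @ theta, 0.0)
```

The fitted ratio is large where numerator samples are relatively dense, which is exactly the quantity needed for tasks such as covariate-shift adaptation and outlier detection.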

In this talk, I give an overview of recent advances in the theory, algorithms, and applications of density ratio estimation.

Koji Tsuda - National Institute of Advanced Industrial Science and Technology

Fast Graph Search with Succinct Trees
Dec 10, 2011 13:30 - 14:10

In the last 10-15 years there has been a great increase of interest in space-efficient (succinct) data structures that are compressed down to the information-theoretic lower bound. Compared to naive pointer-based data structures, memory usage can be up to 20-30 fold smaller. I will briefly present the basics of succinct data structures and our recent work on indexing 25 million chemical graphs for in-memory similarity search.
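
The key primitive behind most succinct tree and graph indexes is rank on a bitvector. A toy sketch (not itself succinct, but structurally faithful: a real implementation adds a second level of counts and popcount tables to make rank constant-time):

```python
class RankBitVector:
    """Bitvector with fast rank support via precomputed block counts."""

    def __init__(self, bits, block=8):
        self.bits = bits
        self.block = block
        # counts[j] = number of 1-bits before block j
        self.counts = [0]
        for i in range(0, len(bits), block):
            self.counts.append(self.counts[-1] + sum(bits[i:i + block]))

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]."""
        b, r = divmod(i, self.block)
        return self.counts[b] + sum(self.bits[b * self.block: b * self.block + r])
```

Rank (together with its inverse, select) is what lets a succinct tree answer navigation queries such as "j-th child" directly on a compressed bit sequence.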

Gunnar Rätsch - Friedrich Miescher Laboratory

Dec 10, 2011 14:10 - 14:50

Olivier Bousquet - Google

Shattering and Compression
Dec 10, 2011 14:50 - 15:30

Andre Elisseeff - Nhumi Technologies

Two Statistical Challenges in Medical Applications
Dec 10, 2011 16:00 - 16:30

Application developers in medical informatics are regularly faced with uncertainty. Medical data is noisy and requires statistical tricks to extract relevant information. Data visualization can also be uncertain: it sometimes relies on statistics to decide what the users want to see. This presentation will address some of the tricks used in medical software to handle such noisy situations. We will introduce two applications where noise, bias and uncertainty are currently handled by simple statistical tests and where machine learning approaches could also be applied.

Joaquin Quiñonero Candela - Microsoft Research

Click Prediction in Computational Advertising
Dec 10, 2011 16:30 - 17:00

Mingmin Chi - Fudan University

Chinese Stock Mining via Topic Models
Dec 10, 2011 17:00 - 17:30

Currently, there are more than 2,000 stocks in the Chinese stock market, and usually about four new ones go public each week. Officially, stocks are divided into sectors based on four systems: China's Securities Regulatory Commission, e.g., forestry, fishery and agricultural industries; Concepts, e.g., new energy, internet of things, etc.; Regions, e.g., Shanghai, Beijing, Tibet, etc.; and Industry, e.g., real estate, steel & iron, auto, etc. In addition, a large number of financial and related political news articles are released every day. Different pieces of news can be connected to the related sectors or stocks. However, there are usually no explicit words or terms that point to the sectors or stocks. Automatically identifying the related stocks from this large volume of news articles is a highly challenging task for retail and institutional investors. In this talk, we propose to use topic models to automatically generate the "topics" (or sectors) that are implicitly related to the associated stocks. Preliminary results are shown in the experiments, and two new topic models are also given for further investigation.

Matthew Blaschko - École Centrale Paris

Ranking and Structured Output Prediction
Dec 10, 2011 17:30 - 18:00