Shai Ben-David – University of Waterloo
http://www.cs.uwaterloo.ca/~shai/


Is Learning Possible without Prior Knowledge? Dec 8, 2011 10:00 – 10:40




Ingo Steinwart – University of Stuttgart
http://www.isa.uni-stuttgart.de/Steinwart/index.t?lang=en&/Steinwart/=


Statistical Analysis of SVMs Dec 8, 2011 10:40 – 11:20
Since their invention by Vladimir Vapnik and his co-workers in the early nineties, SVMs have attracted a great deal of research activity from various communities. While at the beginning this research mostly focused on generalization bounds, the last decade witnessed a shift towards oracle inequalities and learning rates. In this talk I will discuss some of the latter developments, in particular with regard to least squares and quantile regression, binary classification, and anomaly detection.




Volodya Vovk – Royal Holloway, University of London
http://www.vovk.net


Kernel Ridge Regression Dec 8, 2011 11:20 – 12:00
Kernel ridge regression (KRR) is a simplified version of support vector regression.
The main formula of KRR is identical to a formula in kriging, a Bayesian method widely used in geostatistics.
But KRR has performance guarantees that have nothing to do with the assumptions needed for kriging.
I will discuss two kinds of such performance guarantees: those not requiring any stochastic assumptions
and those depending only on the iid assumption.
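The "main formula of KRR" mentioned in the abstract can be sketched in a few lines. This is a minimal illustration only, not material from the talk; the Gaussian kernel and the parameter values are my own choices:

```python
import numpy as np

def krr_predict(X_train, y_train, X_test, sigma=1.0, lam=0.1):
    """Kernel ridge regression: y_hat(x) = k(x)^T (K + lam*I)^{-1} y.

    The same formula appears as the kriging predictor in geostatistics,
    where it is derived from a Gaussian-process model instead.
    """
    def rbf(A, B):
        # Gaussian kernel matrix from pairwise squared distances.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    K = rbf(X_train, X_train)                               # n x n Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf(X_test, X_train) @ alpha                     # test predictions

# Fit a noisy sine curve and predict at two new points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
print(krr_predict(X, y, np.array([[0.0], [1.5]])))
```

The point of the abstract is that this formula can be analyzed without the Gaussian model kriging starts from: the guarantees attach to the predictor itself, under no stochastic assumptions or under iid data only.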




Manfred Opper – TU Berlin
http://www.ki.tu-berlin.de/menue/methoden_der_kuenstlichen_intelligenz/parameter/en/


Assessing the Quality of Approximate Inference for Bayesian Kernel Methods Dec 8, 2011 13:30 – 14:10
Models with Gaussian process priors over latent functions can be understood as Bayesian versions of kernel
machines. Unfortunately, except for the case of regression with Gaussian noise, these models do not allow for exact inference.
Efficient approximation techniques such as the expectation propagation (EP) algorithm have been developed to overcome this problem.
Empirical comparisons with extensive Monte Carlo inference on a variety of benchmark data sets for Gaussian process classifiers
have shown that EP can yield excellent approximations. However, such a positive result may not hold in general. In this talk we
show how the error of the EP approximation for Gaussian process models can be expressed analytically in terms of a series expansion.
Low order terms of the expansion can be used to get a practical estimate of the quality of EP.
Joint work with Ulrich Paquet (Microsoft Cambridge) and Ole Winther (Technical University of Denmark, Copenhagen).




Andreas Christmann – Universität Bayreuth
http://www.stoch.uni-bayreuth.de/de/team/Andreas_Christmann/


On Stability Properties of Support Vector Machines Dec 8, 2011 14:10 – 14:50
Support Vector Machines (SVMs) play an important role in modern statistical learning theory. The original SVM approach by Boser, Guyon and Vapnik (1992) was derived from the generalized portrait algorithm invented by Vapnik and Lerner (1963). The books by Vapnik (1982, 1995, 1998) and later on by Cristianini and Shawe-Taylor (2000) and Schölkopf and Smola (2002) had a large impact on the development of SVMs and their success in various fields of application. It is well known that SVMs have nice numerical properties and that they are the solution of a well-defined mathematical problem in Hadamard's sense.
The talk will briefly summarize some properties of SVMs which show that SVMs additionally have nice properties from the viewpoint of statistical stability with respect to the unknown underlying distribution.




Ulrike von Luxburg – University of Hamburg
http://www.informatik.uni-hamburg.de/ML/contents/people/luxburg/teaching/


Random Walk Distances on Graphs Dec 8, 2011 14:50 – 15:30




László Györfi – Budapest University of Technology and Economics
http://www.cs.bme.hu/~gyorfi/indexen.html


Nonparametric Sequential Prediction of Stationary Time Series Dec 8, 2011 16:00 – 16:40
We present simple procedures for the prediction of a real-valued time series with side information. For squared loss, the prediction algorithms are based on a machine learning combination of several simple predictors. We show that if the sequence is a realization of a stationary and ergodic random process, then the average of squared errors converges, almost surely, to that of the optimum, given by the Bayes predictor. We offer an analogous result for the prediction of stationary Gaussian processes, and present an open problem. These prediction strategies have some consequences for 0-1 loss (the pattern recognition problem for time series).
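One standard way to realize such a "machine learning combination" of simple predictors is exponential weighting by past squared loss. The sketch below is an illustrative stand-in, not the procedure of the talk; the learning rate and the two toy experts are my own choices:

```python
import numpy as np

def combine_predictions(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted forecaster: each round, predict the
    weighted mean of the experts, then down-weight each expert
    according to its squared loss on the revealed outcome."""
    n_experts = expert_preds.shape[1]
    w = np.ones(n_experts) / n_experts
    combined = []
    for preds, y_t in zip(expert_preds, outcomes):
        combined.append(w @ preds)                  # aggregate forecast
        w = w * np.exp(-eta * (preds - y_t) ** 2)   # penalize squared loss
        w = w / w.sum()
    return np.array(combined)

# Two experts: a constant-zero predictor and a near-perfect one.
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6, 100))
experts = np.stack([np.zeros(100), y + 0.05 * rng.standard_normal(100)], axis=1)
pred = combine_predictions(experts, y)
print(np.mean((pred - y) ** 2) < np.mean((experts[:, 0] - y) ** 2))
```

The weights quickly concentrate on the better expert, so the combined average squared error approaches the best expert's; the talk's results concern the much stronger statement that suitable combinations reach the Bayes predictor for stationary ergodic processes.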




Peter Bühlmann – ETH Zürich
http://stat.ethz.ch/~buhlmann/


High-dimensional Causal Inference Dec 8, 2011 16:40 – 17:20
Understanding cause-effect relationships between variables is of interest in many fields of science. It is desirable to extract causal information from observational data, obtained by observing a system of interest without subjecting it to interventions (i.e., without randomized experiments). When assuming no or little information about (causal) influence diagrams, the problem in its full generality is ill-posed. However, we will discuss how sparse graphical modeling and intervention calculus can be used for quantifying useful bounds on causal effects, even in the high-dimensional, sparse case where the number of variables can greatly exceed the sample size. Besides methodology, computation and theory, we will illustrate validation of the method with gene intervention experiments in yeast (Saccharomyces cerevisiae) and thale cress (Arabidopsis thaliana).




Léon Bottou – Microsoft Research
http://leon.bottou.org/


About the origins of the VC lemma Dec 8, 2011 17:20 – 18:00




Klaus-Robert Müller – TU Berlin
http://www.ml.tu-berlin.de/menue/members/klaus-robert_mueller/


15 years of Kernel-based Learning Dec 8, 2011 18:00 – 19:00




Bernhard Schölkopf – MPI for Intelligent Systems
http://www.kyb.mpg.de/nc/employee/details/bs.html


Inference of Cause and Effect Dec 9, 2011 10:00 – 10:40




Alexandre Tsybakov – Université Paris 6
http://www.proba.jussieu.fr/~tsybakov/


Optimal Exponential Bounds for the Accuracy of Classification Dec 9, 2011 10:40 – 11:20




Bob Williamson – NICTA
http://users.cecs.anu.edu.au/~williams/


Theory of Loss Functions Dec 9, 2011 11:20 – 12:00
The decision-theoretic approach to statistics and machine learning is built upon
the idea of a loss function, which measures the accuracy of a prediction. In most work (including in
Vapnik’s books) really only three loss functions are considered. But there are many other possibilities.
In the talk I will focus on proper losses for probability estimation, and present some old and new
results that demonstrate the richness of loss functions and the significance of their study.




Alex Smola – Yahoo! Research
http://alex.smola.org


The Mean Trick Dec 9, 2011 13:30 – 14:10
In this talk I will give an overview of the mean trick, that is, the use of Hilbert space embeddings for expectation operators. It allows one to unify a large number of techniques, ranging from two-sample tests to visualization, feature extraction, and graphical models.
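As a concrete instance, the two-sample use of the mean trick compares the Hilbert-space means of two samples via the maximum mean discrepancy (MMD). The biased estimator below is a minimal sketch; the RBF kernel and its width are my own choices:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Squared MMD between samples X and Y under an RBF kernel:
    || mean_k(X) - mean_k(Y) ||_H^2, estimated with the biased V-statistic."""
    def k(A, B):
        # Gaussian kernel matrix from pairwise squared distances.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)))
diff = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 1.0)
print(same, diff)  # the shifted sample pair has a much larger MMD
```

Because the kernel mean embeds the whole distribution, the same distance between mean elements also drives the feature-extraction and graphical-model applications the abstract lists.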




Vladimir Vapnik – NEC Laboratories America, Inc.
http://www.nec-labs.com/research/machine/ml_website/person.php?person=vlad


TBA Dec 9, 2011 14:10 – 15:30




Larry Jackel – North-C Technologies, Inc.
http://www.north-c.com


Machine Learning Applications at Bell Labs: Before and After the Arrival of Vladimir Vapnik Dec 9, 2011 19:00 – 19:40




Olivier Chapelle – Yahoo! Research
http://olivier.chapelle.cc/


Click Modeling for Display Advertising Dec 10, 2011 10:40 – 11:20




Naftali Tishby – The Hebrew University
http://www.cs.huji.ac.il/~tishby/


Kernel Information Bottleneck Dec 10, 2011 10:00 – 10:40
A fundamental problem of learning theory is finding simple functions that capture the relevant information in empirical data with respect to a hypothesis class or parametric distributions. Such functions were termed "minimal sufficient statistics" in parametric inference and are known to exist, with fixed dimensionality, only for distribution families of exponential form. A principled information-theoretic generalization of minimal sufficient statistics was proposed by the information bottleneck (IB) method, based on the data processing inequality for mutual information: extract variables that minimize the mutual information between the sample and the statistics, while constraining the mutual information between the statistics and the relevant variables (e.g., the distribution parameters). This optimization problem is in general non-convex, and its optimal solutions can be obtained by an alternating projections algorithm only locally. The IB problem was shown, however, to be efficiently globally solvable for the special case of multivariate Gaussian variables (GIB). In this case it provides an information-theoretic generalization of Canonical Correlation Analysis (CCA) and establishes interesting connections between CCA, channels with side information, and approximate minimal sufficient statistics with a continuous trade-off between accuracy and complexity. In this talk I will describe a recent extension of the GIB, using Vapnik's kernel trick, that makes the IB applicable to any data for which kernels can be defined. This new version of the IB corresponds to an information-theoretic kernel CCA, and makes the IB algorithm and the systematic calculation of information curves (the optimal trade-off between complexity and accuracy of empirical data) completely practical even for very large data sets.
Based on joint work with Nori Jacoby.




Masashi Sugiyama – Tokyo Institute of Technology
http://sugiyama-www.cs.titech.ac.jp/~sugi/


Density Ratio Estimation: A New Versatile Tool for Machine Learning Dec 10, 2011 11:20 – 12:00
In statistical machine learning, avoiding density estimation is essential, since it is often more difficult than solving the target machine learning problem itself. This is often referred to as "Vapnik's principle", and the support vector machine is one of the successful examples of this principle. Following this spirit, we recently introduced a new machine learning framework based on the ratio of two probability density functions. This density-ratio formulation includes various important machine learning tasks such as non-stationarity adaptation, outlier detection, feature selection, clustering, and conditional density estimation. By directly estimating the density ratio without going through density estimation, all the above tasks can be solved effectively and efficiently in a unified manner.
In this talk, I give an overview of recent advances in the theory, algorithms, and applications of density ratio estimation.
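A minimal sketch of direct density-ratio estimation in this spirit is unconstrained least-squares fitting of the ratio in a kernel basis (a uLSIF-style toy; the kernel width, regularization, and basis centers are my own choices, not the talk's):

```python
import numpy as np

def ratio_weights(X_de, X_nu, centers, sigma=1.0, lam=0.1):
    """Estimate r(x) ~ p_nu(x) / p_de(x) directly, without estimating
    either density: fit r(x) = sum_l theta_l * k(x, c_l) by least squares."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_de = k(X_de, centers)                  # basis on denominator sample
    Phi_nu = k(X_nu, centers)                  # basis on numerator sample
    H = Phi_de.T @ Phi_de / len(X_de)          # second moment under p_de
    h = Phi_nu.mean(axis=0)                    # first moment under p_nu
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return Phi_de @ theta                      # ratio estimates at X_de

rng = np.random.default_rng(0)
X_de = rng.normal(0.0, 1.0, size=(500, 1))     # denominator sample: N(0, 1)
X_nu = rng.normal(0.5, 1.0, size=(500, 1))     # numerator sample: N(0.5, 1)
w = ratio_weights(X_de, X_nu, X_de[:50])
# The true ratio grows with x, so weights should be larger on the right.
print(w[X_de[:, 0] > 1].mean() > w[X_de[:, 0] < -1].mean())
```

The estimated weights can then serve directly as importance weights for covariate-shift adaptation, or as outlier scores, without either density ever being modeled.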




Koji Tsuda – National Institute of Advanced Industrial Science and Technology
http://www.cbrc.jp/~tsuda/


Fast Graph Search with Succinct Trees Dec 10, 2011 13:30 – 14:10
In the last 10-15 years there has been a great increase of interest in space-efficient (succinct) data structures that are compressed down to the information-theoretic lower bound. Compared to naive pointer-based data structures, the memory usage can be up to 20-30 fold smaller. I will briefly present the basics of succinct data structures and our recent work on indexing 25 million chemical graphs for in-memory similarity search.
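To give a flavor of the rank operation that succinct trees and graph indexes are built on, here is a toy bit vector with block-sampled popcounts. This is an illustration only; real succinct structures pack bits into machine words and add further sampling levels to reach O(1) rank with o(n) extra bits:

```python
class RankBitVector:
    """Toy bit vector with precomputed popcounts at 64-bit block
    boundaries, so rank queries only scan within one block."""

    def __init__(self, bits):
        self.bits = bits
        # blocks[j] = number of 1-bits before position 64*j.
        self.blocks = [0]
        for i in range(0, len(bits), 64):
            self.blocks.append(self.blocks[-1] + sum(bits[i:i + 64]))

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]."""
        b = i // 64
        return self.blocks[b] + sum(self.bits[b * 64:i])

bv = RankBitVector([1, 0, 1, 1, 0] * 40)   # 200 bits, 3 ones per 5 bits
print(bv.rank1(100))  # 60 ones among the first 100 bits
```

Rank (together with its inverse, select) is the primitive from which succinct tree navigation, and hence compact graph indexes like the one in this work, are assembled.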




Gunnar Rätsch – Friedrich Miescher Laboratory
http://www.fml.mpg.de/~raetsch


TBA Dec 10, 2011 14:10 – 14:50




Olivier Bousquet – Google
https://plus.google.com/115212009470496496882/about


Shattering and Compression Dec 10, 2011 14:50 – 15:30




André Elisseeff – Nhumi Technologies
http://nhumi.com/en/company/about/


Two Statistical Challenges in Medical Applications Dec 10, 2011 16:00 – 16:30
Application developers in medical informatics are regularly faced with uncertainty. Medical data is noisy and requires statistical tricks to extract relevant information. Data visualization can also be uncertain: it sometimes relies on statistics to decide what the users want to see. This presentation will address some of the tricks used in medical software to handle such noisy situations. We will introduce two applications where noise, bias and uncertainty are currently handled by simple statistical tests, and where machine learning approaches could also be applied.




Joaquin Quiñonero Candela – Microsoft Research
http://research.microsoft.com/en-us/people/joaquinc/


Click Prediction in Computational Advertising Dec 10, 2011 16:30 – 17:00




Mingmin Chi – Fudan University
http://homepage.fudan.edu.cn/mingmin/de


Chinese Stock Mining via Topic Models Dec 10, 2011 17:00 – 17:30
Currently, there are more than 2,000 stocks in the Chinese stock market, and usually about four new ones go public (IPO) each week. Officially, these are divided into different sectors based on four systems: China's Securities Regulatory Commission, e.g., forestry, fishery and agricultural industries; Concepts, e.g., new energy, the internet of things, etc.; Regions, e.g., Shanghai, Beijing, Tibet, etc.; and Industry, e.g., real estate, steel & iron, auto, etc. In addition, a large amount of different kinds of financial and related political news is released every day. Different pieces of news can be connected to the related sectors or stocks. However, there are usually no explicit words or terms that point to the sectors or stocks. How to automatically identify the related stocks from this large amount of news articles is a highly challenging task for retail and institutional investors. In this work, we propose to use topic models to automatically generate the "topics" (or sectors) that are implicitly related to the associated stocks. Preliminary results are shown in the experiments, and two new topic models are also given for further investigation.




Matthew Blaschko – Ecole Centrale Paris
http://vision.mas.ecp.fr/Personnel/blaschko/


Ranking and Structured Output Prediction Dec 10, 2011 17:30 – 18:00



