Call For Papers - ALRA : Active Learning in Real-world Applications - Workshop & Challenge (to be held at ECML-PKDD 2012 in Bristol, UK, on Friday, September 28, 2012)
This workshop aims to offer a meeting opportunity for academic and
industry researchers from the various communities of Computational
Intelligence, Machine Learning, Experimental Design and Data Mining
to discuss new areas of active learning, and to bridge the gap
between data acquisition or experimentation and model building. How
can active sampling, incremental learning and data acquisition
contribute to the design and modeling of highly intelligent machine
learning systems?
Machine learning refers to methods and algorithms that allow a
model to learn a behavior from examples. Active learning covers
methods that select the examples used to build the training dataset
for the predictive model. All such strategies aim to keep the set of
examples as small as possible by selecting the most informative
ones.
Designing active learning algorithms for real-world data raises
specific issues, chiefly scalability and practicality. Methods must
be able to handle high volumes of data, and the process by which an
expert labels new examples must be optimized.
We encourage papers that describe applications of active learning in
real-world settings. The industrial context, the main difficulties
encountered and the original solution developed should be described.
Contributions on the challenge described below, which proposes such
a practical application of active learning, are also welcome.
Associated challenge
As a search engine for places, Nomao collects data from multiple
sources on the web and aggregates them. The deduplication process
consists in detecting which data refer to the same place. Machine
learning is well suited to automating this process, and active
learning is an appropriate way to optimize the creation of the
training dataset.
However, in this case millions of data items must be labeled, so
labeling the training examples one by one and rerunning the model at
each step is impractical. Instead, sets of examples must be proposed
for labeling, which raises specific issues.
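As an illustration of proposing a set of examples rather than a single
one, the sketch below picks a batch by plain uncertainty sampling. It is
not the method used by Nomao or required by the challenge. The function
name select_batch, the batch size and the sample probabilities are all
hypothetical, and the model is assumed to output, for each unlabeled
example, a probability that it is a duplicate.

```python
def select_batch(probabilities, batch_size=100):
    """Return the indices of the examples the model is least sure about.

    `probabilities` holds, for each unlabeled example, the (assumed)
    predicted probability that it is a duplicate. An example is treated
    as most uncertain when this probability is closest to 0.5.
    """
    # Rank example indices by distance from 0.5, smallest first.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    # Keep only the first `batch_size` indices to send to the expert.
    return ranked[:batch_size]


# Hypothetical predicted probabilities for six unlabeled examples.
probs = [0.02, 0.48, 0.91, 0.55, 0.70, 0.10]
batch = select_batch(probs, batch_size=2)
print(batch)
```

One issue with this simple ranking is that the most uncertain examples
may be very similar to one another, so a batch can carry less
information than its size suggests.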
To date, 29,104 examples have been labeled, each characterized by
120 features. This training dataset is available on the Nomao
Challenge page, along with a test set of 1,985 examples.
A larger dataset of 100,000 unlabeled examples will also be provided.
Two active campaigns will then be organized, in which each
participant may ask an expert to label a given number (e.g. 100) of
the unlabeled examples.
Finally, a test campaign will be carried out to evaluate the
proposed approaches: each participant will be asked to label a given
set of examples, and their predictions will be compared to the known
true labels.
Papers that address this challenge are welcome. Their authors will
contribute to the comparison of the proposed solutions and to the
discussions during the workshop. The author of the best results will
receive a free registration for the conference and workshop.
Challenge prize
The author of the best results will receive a free registration for
the ECML-PKDD 2012 conference and the ALRA workshop.
Topics of interest include (but are not limited to)
- Active Learning
- Experimental Design
- Incremental Learning
- Online Learning
- Case Studies of Active Learning
- Active Learning in Experimental Design (use of the learning
machine to guide further data acquisition)
Key dates
- First active campaign: Friday, June 1, 2012
- Second active campaign: Friday, June 8, 2012
- Final test campaign: Friday, June 15, 2012
- Paper submission deadline: Friday, June 29, 2012
- Paper acceptance notification: Friday, July 20, 2012
- Paper camera-ready deadline: Friday, August 3, 2012
- Workshop: Friday, September 28, 2012, Bristol, UK
Program Committee
Mahmoud Abou-Nasr (Ford Motor Company, USA)
Cesare Alippi (Politecnico di Milano, Italy)
Albert Bifet (University of Waikato, Hamilton, New Zealand)
Zalan Bodo (Babeş-Bolyai University, Cluj-Napoca, Romania)
Lehel Csato (Babeş-Bolyai University, Cluj-Napoca, Romania)
Gideon Dror (Academic College of Tel Aviv-Yaffo, Israel)
Hugo Jair Escalante (National Institute of Astrophysics,
Optics and Electronics, Mexico)
Matthieu Geist (IMS Research Group, Supelec, Metz, France)
Liang Lan (Temple University, Philadelphia, USA)
Chris Lovell (University of Southampton, UK)
George Runger (Arizona State University, Tempe, AZ, USA)
Burr Settles (Carnegie Mellon University, USA)
Fabien Torre (INRIA, Lille 1 University, France)
Ming-Hen Tsai (National Taiwan University)
Ioannis Tsamardinos (University of Crete, Greece)
Slobodan Vucetic (Temple University, Philadelphia, USA)
Papers will normally be reviewed by three referees. The review
process is single-blind (reviewer identities unknown to authors) and
there will be no opportunity for author rebuttal. This decision was
made to minimize reviewer workload and to concentrate it in time,
which may ultimately result in better review quality and decisions.
If necessary, a discussion will take place among the reviewers of a
paper until a decision is reached.