

PASCAL Agnostic Learning vs. Prior Knowledge Challenge FAQ

IJCNN 2007

What is the goal of the challenge?
The goal is to provide the best possible predictive models for the five tasks of the challenge, with or without using prior knowledge about the tasks. All tasks are 2-class classification problems.

Are there prizes?

Yes: A cash award and a certificate will be conferred to the winner(s) at IJCNN 2007. There will be several prizes:
- One award for the best overall entry in the agnostic learning track.
- Five awards for the best entries for individual datasets in the prior knowledge track.
- One best paper award. This award has been decided based on the IJCNN submissions and the winner will be revealed at the IJCNN workshop.
In addition, deserving challenge participants who need financial support to attend the workshop may send a request to agnostic@clopinet.com.

Important: All final entries must include results on all 5 datasets to facilitate our book-keeping, even if you specialize on a particular dataset in the prior knowledge track. You may use the sample submission to fill in results for datasets you do not want to work on. You must identify yourself with your real name in your final entries.

What is the schedule of the challenge?
The challenge starts October 1st, 2006 and was originally scheduled to end on March 1st, 2007; it has been extended until August 1st, 2007. See the exact schedule for the intermediate milestones, submission of papers, and the workshop.

Did you publish the intermediate results of March 1st?
Yes. In addition to the on-line feed-back provided on the validation set, we did two intermediate rankings: one in the agnostic track only, for the model selection game, and one in both tracks at the original deadline. To avoid compromising the test set too much, we revealed only participant rankings, not the ranking of all the entries, and did not reveal the performances on the test set. The December 1st ranking and the results of the model selection game are available from the NIPS 2006 workshop web page. The results of the March 1st ranking are available from the IJCNN 2007 workshop web page.

How do I participate in the challenge?
Participation in the challenge is free and open to everyone; see the website of the challenge. It is possible to submit results on the validation set on-line during the development period and get immediate performance feed-back. Any time before the end of the challenge, the competitors can return predictions on the final test set. However, the performances on the test set will not be revealed until the challenge is over.

Where are the datasets?
The datasets may be downloaded from the workshop page or the website of the challenge.

What are these "two tracks"?

In this challenge we provide two versions of the datasets:
- The "raw data" for the "prior knowledge" track.
- Preprocessed data for the "agnostic learning track".
You may select either version of the data and return prediction results.

Which track should I choose?
If you are not an expert in any of the tasks and you did not enter the previous challenge, you may want to use the agnostic track data first. The data representations are the same as those used in the previous challenge, but the examples and features were re-shuffled. You may then be tempted to improve your performance by taking a look at the raw data.
But, if you are an expert in one of the tasks, you may want to try your proven methods on that data first. Perhaps you will already beat the agnostic track entries by a wide margin!

How do I indicate which track I chose?
Just submit results using the data version you chose. We will figure out automatically which version you used (the data splits are the same, but the patterns are shuffled differently in each set).

Can I make mixed submissions?
You may make mixed submissions, with results on "Agnostic learning track" data for some datasets and on "Prior knowledge track" data for others. In this way, you can enter the challenge even if you do not have domain knowledge for some of the tasks. Mixed entries will count towards the "Prior knowledge track". Your entry will count towards the "Agnostic learning track" only if it uses the data provided for that track.

What kind of prior knowledge can I use in the  "agnostic track"?
None. You may not use prior knowledge if you want your entry to count towards the "agnostic track" competition.
We do not forbid you from making use of such information; we just warn you that this may turn your "agnostic track" entry into a "prior knowledge track" entry:
If you use prior knowledge with some of the preprocessed datasets, you will have to disclose it when we ask you to fill out a fact sheet about your method. We will then decide whether your entry should count as a "prior knowledge track" entry rather than an "agnostic track" entry. Examples of use of prior knowledge that would turn your entry into a "prior knowledge track" entry include:
- the number of "real" features (this would help you remove the purposely added redundant or irrelevant features),
- the number of classes (several problems are multiclass problems turned into 2-class problems),
- the nature of the task (it may help to know, for instance, that in an image the features are correlated according to their placement in a plane).
- the results of the previous challenge.

So, for instance, if you do feature selection in the agnostic track, you must have a means of choosing the number of features that relies only on the data. If you use clustering to improve your performance, you may not set by hand the number of clusters to the number of classes in the underlying multi-class problem; you must have a means of finding the number of clusters from the data. Using the results of past challenges to select the most promising methods is OK. But you may not pick the best hyperparameters unless you obtain them automatically from the data.

How can you say that there is no prior knowledge in the "Agnostic track data" you provide?
It is true that we had to use some knowledge to produce the feature representations we provide in the "agnostic track" for NOVA and HIVA. But you can judge for yourself: the data representations are rather straightforward and make no use of elaborate knowledge. For NOVA, we use a "bag-of-words" representation of text (a small illustrative sketch is given below). For HIVA, we use the presence/absence of substrings of the chemical formulas.
In the case of ADA, GINA, and SYLVA, the data for both tracks has a feature representation, but the preprocessing is a mere agnostic coding that should conserve all the information of the raw data without adding extra information.
The preprocessing is a way for us to disguise the data so that it is hard to use prior knowledge inadvertently.
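For illustration only, here is a minimal Matlab sketch of a bag-of-words coding; the vocabulary and the text are made up and this is not the actual preprocessing we applied:
% Illustration only: a made-up 4-word vocabulary and a sample text.
vocab = {'kernel', 'space', 'launch', 'learning'};
text  = lower('The kernel trick maps the data into a feature space');
words = regexp(text, '\w+', 'match');      % tokenize into words
bow   = double(ismember(vocab, words));    % presence/absence of each vocabulary word
% bow = [1 1 0 0]: 'kernel' and 'space' occur, 'launch' and 'learning' do not.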

How will you prevent "agnostic learning" track participants from using prior knowledge?
We cannot really prevent participants from doing so. The only truly agnostic learning results were obtained as part of the previous challenge, which used the same datasets (but a different data shuffling); then, the participants had no knowledge of the nature of the tasks at hand. This time, the agnostic track people can take a peek at the information provided to the prior knowledge track. We have to rely on the participants' good faith: we expect that they honestly disclose in their fact sheet and paper what they did to the data. Ultimately, if the organizers judge that prior knowledge was used to improve performance on data provided for the agnostic track, the entry will count towards the "prior knowledge" track.

Is it permitted to use extra training data, which is not provided?
Yes, but only if you enter the prior knowledge track. For instance, the GINA task is a handwritten digit recognition task. You may use extra training data if you have some available. HOWEVER, in no event is it permitted to train on test data. Since we have revealed the original source of the data we use in the challenge and those data are publicly available, it is of course possible to get the test patterns to train. This would be considered cheating, would invalidate the entry and disqualify the entrant. Check to make sure you are not using those data.

Can I vary the number of examples to demonstrate the value of my method?
Absolutely, this is a great idea. Sometimes, when enough data is available, there is not much value added by prior knowledge. But with smaller datasets, there is. Report your results in your paper to increase your chances to win the best paper award.

What will happen to me if I cheat?
Probably nothing. A lot of cheaters never get caught, don't they? HOWEVER, if you are one of the top ranking entrants, we will spend quite a bit of effort trying to reproduce your results (and other people will too). If your results cannot be reproduced, this will be highly suspicious and will shed doubt on your integrity... Think about it before you cheat; you may end up not looking so good after all.

How will you proceed to reproduce the results of the winners?
If the methods are published with sufficient detail, we may try to re-code them. If this does not work, we will ask the participants to send us their code. We will ask for both the source code and an executable. For commercially available software released prior to the start of the challenge, we may accept access to the executable only, provided that the date of release can be verified.

Why is there no feature number in the raw data of HIVA and NOVA?
Because the raw data does not come as a data table. The features must still be extracted, either from the chemical structure or from the raw text.

Why are there sometimes more features in the "agnostic" data?
Because some features that are categorical are encoded as a longer binary code, for convenience. In some cases, distractor features have been introduced in the "agnostic" data. See the data documentation for details.
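For illustration only (the exact coding we used may differ), a categorical feature with three possible values can be expanded into three binary indicator columns, as in this Matlab sketch with made-up values:
% Illustration only: expand a categorical feature taking values 1..3 into
% three binary indicator columns.
cat_feat = [1; 3; 2; 3];                        % hypothetical categorical values
num_vals = 3;
binary_code = zeros(length(cat_feat), num_vals);
for i = 1:length(cat_feat)
    binary_code(i, cat_feat(i)) = 1;            % one column per category value
end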

What is the data format?
For the agnostic track, the data sets are in the same format and include 5 files in text format:
dataname.param: Parameters and statistics about the data
dataname_train.data: Training set (a sparse or a regular matrix, patterns in lines, features in columns).
dataname_valid.data: Development test set or "validation set".
dataname_test.data: Test set.
dataname_train.labels: Labels (truth values of the classes) for training examples.

The matrix data formats used are (in all cases, each line represents a pattern):
- For regular matrices: a space delimited file with a new-line character at the end of each line.
- For sparse matrices with binary values: for each line of the matrix, a space delimited list of indices of the non-zero values. A new-line character at the end of each line.
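For illustration, here is a minimal Matlab sketch (independent of the provided sample code) for reading both formats; the file names are just examples:
% Regular (dense) matrix: space-delimited values, one pattern per line.
X_dense = load('gina_train.data');

% Sparse binary matrix: each line lists the indices of the non-zero features.
fid = fopen('nova_train.data', 'r');
rows = {};
tline = fgetl(fid);
while ischar(tline)
    rows{end+1} = sscanf(tline, '%d')';   % indices of the non-zero features
    tline = fgetl(fid);
end
fclose(fid);
% The number of features should be taken from dataname.param; here we simply
% assume the largest index seen in the file.
num_feat = max([rows{:}]);
X_sparse = sparse(length(rows), num_feat);
for i = 1:length(rows)
    X_sparse(i, rows{i}) = 1;
end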

For the prior knowledge track, the files dataname.param and dataname_train.labels are also provided. As additional "prior knowledge", a file dataname_train.mlabels containing the original multi-class labels is provided (THESE SHOULD NOT BE USED AS TRUTH VALUES; the target values are provided by dataname_train.labels). The .data files containing the patterns are in miscellaneous formats, depending on the nature of the data:
ADA: Comma-separated files (ada_train.csv, ada_valid.csv, and ada_test.csv). Each line represents a feature set. The features are described in ada.feat.
GINA: The regular matrix format (gina_train.data, gina_valid.data, and gina_test.data). Each line is a vector of 28x28 image pixels (the lines of the image have been concatenated).
HIVA: The 3D molecular structure is represented in the MDL-SD format, records being separated by $$$$ (hiva_train.sd, hiva_valid.sd, and hiva_test.sd).
NOVA: The data consists of emails, records being separated by $$$$ (nova_train.txt, nova_valid.txt, and nova_test.txt).
SYLVA: The regular matrix format (sylva_train.data, sylva_valid.data, and sylva_test.data). The features are described in sylva.feat.

How should I format and submit my results?
The results on each dataset should be formatted in 6 ASCII files:
dataname_train.resu: +-1 classifier outputs for training examples (mandatory for final submissions).
dataname_valid.resu: +-1 classifier outputs for validation examples (mandatory for development and final submissions).
dataname_test.resu: +-1 classifier outputs for test examples (mandatory for final submissions).
dataname_train.conf: Confidence values for training examples (optional).
dataname_valid.conf: Confidence values  for validation examples (optional).
dataname_test.conf: Confidence values for test examples (optional).

Format for classifier outputs:
- All .resu files should have one +-1 integer value per line indicating the prediction for the various patterns.
- All .conf files should have one positive decimal numeric value per line indicating classification confidence. The confidence values can be the absolute discriminant values. They do not need to be normalized to look like probabilities; optionally, they can be normalized between 0 and 1 to be interpreted as abs(P(y=1|x)-P(y=-1|x)). They will be used to compute ROC curves, the Area Under such Curves (AUC), and other performance metrics such as the negative cross-entropy.
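As an illustration (this is not the official formatting code), a minimal Matlab sketch that writes one pair of result files, assuming 'score' is a vector of real-valued discriminant values produced by your classifier on the validation set:
pred = sign(score);                  % derive +-1 labels from the scores
pred(pred == 0) = 1;                 % sign(0) is 0; force a valid +-1 label

fid = fopen('ada_valid.resu', 'w');  % one +-1 integer value per line
fprintf(fid, '%d\n', pred);
fclose(fid);

fid = fopen('ada_valid.conf', 'w');  % one positive confidence value per line
fprintf(fid, '%g\n', abs(score));    % e.g. the absolute discriminant values
fclose(fid);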

Create a .zip or .tar.gz archive with your files and give the archive the name of your submission. You may want to check the example submission file zarbi.zip. Matlab code is available to help you format the results.
Submit the results on-line. If you encounter any problem, contact the challenge web site administrator.


Is there code to help me read the data and format the results?
Yes: Matlab code is provided for that purpose, see the challenge website. A subset of the full package containing sample code to read the data and format the results can also be downloaded.

Is there a limit to the number of submissions?
You can make as many submissions as you want (though no more than 5 per day, so as not to overload our system). However, only your FIVE last valid submissions in either track will be used for the final ranking. Valid submissions include results on all the datasets.

Why do we need to enter results on all five tasks?
If the participants could select which track and which task they want to compete on, there would not be enough participants to make interesting comparisons for each task/track combination. If you are not an expert in some of the tasks and want to enter the prior knowledge track, you may use the sample submission to fill in results on datasets you do not want to work on, or use the preprocessed data rather than the raw data on those tasks. Perhaps, if you have time, you will be curious to see whether some of your ideas also work on the raw data!

Why are there no multiclass and regression tasks?
We do not want too wide a variety of difficulties in the same challenge, so that the participants can focus on one particular aspect of machine learning. Other challenges are examining regression or multiclass problems; check the links. Note that we provide the multiclass labels, when available, in the prior knowledge track, so you can make use of them if you wish.

Why do you have an "agnostic track" in parallel with the "prior knowledge track"?
We believe it will encourage the participants to push the frontiers in both "agnostic learning" and "prior knowledge" incorporation, one track catching up with the other and vice versa. We are aware that, because the tracks run in parallel, the "agnostic track" does not perform purely agnostic learning. We have the results of the previous challenge as a yardstick for agnostic learning.

Is there code I can use to perform the challenge tasks?
Yes: We provide a Matlab package called CLOP (Challenge Learning Object Package), which is based on the interface of the Spider package developed at the Max Planck Institute for Biological Cybernetics. It contains preprocessing and learning machine "objects", and examples of how to apply them to the challenge data. The models include some of the best performing methods of past challenges.

What is the scoring method?
The competitors in each track will be ranked separately according to the test balanced error rate (BER), that is, the average of the error rate of examples of the positive class and the error rate of examples of the negative class.
The area under the ROC curve (AUC) will also be computed, if the participants provide classification confidences (the .conf files) in addition to class label predictions (the compulsory .resu files). But the relative strength of classifiers will be judged only on the BER. Other statistics may also be computed and will be reported (e.g. performances using other loss functions) but will not be used towards determining the winner.
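For reference, a minimal Matlab sketch of the balanced error rate, where y holds the true +-1 labels and p the predicted +-1 labels (the variable names are just for illustration):
err_pos = mean(p(y == +1) ~= +1);    % error rate on the positive class
err_neg = mean(p(y == -1) ~= -1);    % error rate on the negative class
ber     = (err_pos + err_neg) / 2;   % balanced error rate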

How will you create a global ranking?
The final ranking will be based on the average rank of the participants over all 5 datasets, using for each participant his/her best entry on each dataset. This prevents overweighing the datasets with largest error rates.
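Schematically, assuming a matrix R where R(i,k) is the rank of participant i on dataset k (1 = best BER, computed from his/her best entry on that dataset), the global ranking could be obtained as in this Matlab sketch:
avg_rank = mean(R, 2);                           % average rank over the 5 datasets
[sorted_rank, final_order] = sort(avg_rank);     % smallest average rank wins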

Will the results be published?
Yes, the results of the challenge will be published at the IJCNN 2007 conference. You can submit a paper to that conference and participate in the workshop where the results of the challenge will be presented (please use the category 'Special Competitions' and submit before January 31, 2007).
Since IJCNN 2007 marks the 20th anniversary of the event, a special issue of Neural Networks, the official journal of the INNS, will be published to include selected outstanding papers from the conference.

Can I use an alias or a funky email not to reveal my identity?
To enter the final ranking, we require participants to identify themselves by their real name. You cannot win the competition if you use an alias. However, you may use an alias instead of your real name during the development period, to make development entries that do not include results on test data. You must always provide a valid email. Since the system identifies you by email, please always use the same email. Your email will only be used by the administrators to contact you with information that affects the challenge. It will not be visible to others during the challenge.

Do I need to let you know what my method is?
Disclosing information about your method is optional during the development period. However, to participate in the final ranking, you will have to fill out a fact sheet about your method(s). This is compulsory because we must decide which entry qualifies as truly "agnostic" and we must be able to reproduce the results to verify there was no cheating. We encourage the participants not only to fill out the fact sheets, but also to write a conference paper with more details. A best paper award will distinguish entries with particularly original methods, methods with definite advantages (other than best performance), and good experimental design.

Can I or my group make multiple submissions?
Multiple submissions by the same person (uniquely and properly identified by their real name) are permitted, provided that the following conditions are met:
- For each final submission, results on ALL the data sets are provided.
- At most five final submissions are entered per person and per track (if a larger number of submissions are made, the last 5 fulfilling the criteria of final submissions will be considered for the final ranking and for selecting the winner in each track).

Can I use a robot to make submissions?
Robot submissions are not explicitly forbidden. However, we require that the total number of submissions per 24 hours from the same origin does not exceed 5. Please be courteous; otherwise we run the risk of overloading the server and we will need to take more drastic measures.

Can I make a submission with mixed methods?
Mixed submissions containing results of different methods on the various datasets are permitted. Choosing the methods on the basis of the validation set results is permitted.

What is the difference between a development submission and a final submission?
A final submission consists of classification results on ALL the datasets provided for the five tasks. Partial "development" submissions (including results only on a subset of the data or only on the validation set) may also optionally be entered to get feed-back, but they will not be considered for the final ranking. The organizers will compute validation set scores right away and publish them on-line. The test set results and the competition winner(s) will be disclosed only after the closing deadline.

A development submission may include results on a subset of the datasets. There are no limits on the number of development submissions, except that we request that no more than five submissions per day be made to avoid overloading the system. All final submissions should include classification results on ALL the datasets for the five tasks (that is, training, validation and test set, a total of 15 files) and optionally the corresponding confidence values (15 files). There is a limit of 5 final submissions. If more than 5 submissions fulfilling the criterion of a valid final submission are entered, only the last 5 will be taken into account in the final ranking. Therefore, you may enter final submissions even during development, but only the last five will be used for the final ranking.

Why should I make development submissions?
Development submissions are not mandatory. However, they can help you in a number of ways:
- To get familiar with the submission procedure and make sure everything runs smoothly before the rush of the deadline.
- To evaluate your method on an independent test set and compare the results with those of others.
- To get alerted by email if we make changes or become aware of a problem of general interest to the participants.

Can I attend the workshop if I do not participate in the challenge?
Yes. You can even submit a paper for presentation on the themes of the workshop.

Should I use the models provided for the challenge?
You can use your own model(s).

Why did you split the data into training, validation, and test set?
The validation set that we reserved could rather be called "development test set". It allows participants to assess their performance relative to other participants' performance during the development period. The performances on the test set will remain confidential until the closing deadline. This prevents participants from tuning their method using the test set, but it allows them to get some feed-back during the development period.
The participants are free to do what they want with the training data, including splitting it again to perform cross-validation.

What motivates the proportion of the data split?
The proportions training/validation/test are 10/1/100. The validation set size is purposely small. Hence, using the validation set performance as your performance prediction is probably not a good idea. The training set is ten times larger than the validation set, to encourage participants to devise strategies of cross-validation or other ways of using the training data to make performance predictions. The test set is 100 times larger than the validation set. Thus, the error bar of our estimate of your "generalization performance" based on test data predictions will be approximately an order of magnitude smaller than the validation error bar.
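To see why, note that the standard error of an error rate e estimated on n examples is roughly sqrt(e(1-e)/n), so a 100-fold larger set shrinks the error bar by a factor of sqrt(100) = 10. A minimal Matlab sketch with hypothetical numbers (the actual set sizes vary per dataset):
e         = 0.10;                           % some error rate
n_valid   = 500;                            % hypothetical validation set size
n_test    = 100 * n_valid;                  % test set 100 times larger
bar_valid = sqrt(e * (1 - e) / n_valid);    % about 0.013
bar_test  = sqrt(e * (1 - e) / n_test);     % about 0.0013
ratio     = bar_valid / bar_test;           % = sqrt(100) = 10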

Are the training, validation, and test set distributed differently?
We shuffled the examples randomly before splitting the data. We respected approximately the same proportion of positive and negative examples in each subset. This should ensure that the distributions of examples in the three subsets are similar. 

Is the data split the same in both tracks?
The training/validation/test split is the same in both tracks, but the examples are shuffled differently within each set.

Is it allowed to use the validation and test sets as extra unlabelled training data?
Yes.

Are the results on NOVA and HIVA correctly reported on the  web page?

The datasets NOVA and HIVA are strongly biased: they contain only a small fraction of examples of the positive class. Classifiers that minimize the error rate, not the balanced error rate (BER), will tend to systematically predict the negative class. This may yield a reasonable error rate, but a BER of about 50%. However, the AUC may be very good if the classifier orders the scores in a meaningful way.
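As a concrete illustration with hypothetical class proportions (the actual proportions are given in the data documentation), a classifier that always predicts the negative class gets a low error rate but a BER of 50%:
% Hypothetical proportions: 35 positive and 965 negative examples.
n_pos = 35;  n_neg = 965;
% A classifier that always predicts the negative class:
error_rate = n_pos / (n_pos + n_neg);   % 0.035 -> looks good
err_pos    = 1;                         % 100% error on the positive class
err_neg    = 0;                         %   0% error on the negative class
ber        = (err_pos + err_neg) / 2;   % 0.5 -> useless on the minority class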

Can I get the labels of the validation set to train my classifier?
It has been argued that, by making sufficiently many development submissions, participants could guess the validation set labels and obtain an unfair advantage. One month before the challenge is over, we will make the validation set labels available to the participants so they can use them to make their final submissions.

Will the organizers enter the competition?
The winner of the challenge may not be one of the challenge organizers. However, other workshop organizers who did not participate in the organization of the challenge may enter the competition. The challenge organizers will enter development submissions from time to time to challenge others, under the name "Reference". Reference entries are shown for information only and are not part of the competition.


Can a participant give an arbitrarily hard time to the organizers?

DISCLAIMER: ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". ISABELLE GUYON AND/OR OTHER ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.

Who can I ask for more help?
For all other questions, email agnostic@clopinet.com.

Last updated: September 26, 2006.