
Completed • $250,000 • 173 teams

GE Flight Quest

in partnership with GE
Wed 28 Nov 2012 – Mon 11 Mar 2013 (17 months ago)

Acknowledging two more great competitors

Ben Hamner
Competition Admin
Kaggle Admin
Posts 809
Thanks 357
Joined 31 May '10
From Kaggle

Thank you for your hard work and participation in this competition, and making it a tremendous success!

We at Kaggle wanted to acknowledge two participants who built great models, but who we were regretfully unable to recognize more formally with prize placements and press releases, due to errors with their final submissions.

The first, dxsx, could have received a score of 8.30650. However, all the predicted times in his final submission were in the wrong time zone.

The second, __mtb__, had a model that could have received a score of 8.40080 on the final leaderboard. However, he had selected an old public leaderboard submission as his final entry, and our system calculated his score based on this selection. 

We regret that these two issues prevented dxsx and __mtb__ from being among the prize winners. Issues like these are unfortunate for everyone involved, but we wanted to be sure we publicly acknowledged these two worthwhile contributions. This isn't the first time we've had participants put in a tremendous amount of effort, but make a mistake that leaves them out of the money.

Whenever this happens, we make an effort to improve our tools and processes in order to decrease the likelihood that these issues occur in the future.

 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

First off, a sincere congrats to all of the other competitors. This was a tough competition!

Ben:

dxsx’s error, as unfortunate as it was, is a problem with his model. The error made in my case is with *your* software. I followed your directions exactly:

The final evaluation set is now up on the data page.

You may make a submission on the final evaluation set regardless of whether you've submitted a final model or hash. However, you are only eligible for prize money if you submitted a final model and use the same model to make predictions on this new set.

When you make a submission on this new set, a public score of "0.0000" without errors means the submission was processed correctly.

Please let us know if you find any issues with the data on this thread.

Good luck!

As I have been saying repeatedly over email for the past 8 days, and yesterday on the phone with you and Sorkin, there is no checkbox to check! I scored the final file and submitted it. I received all zeros, just as your directions describe, and in your own words that means the submission 'was processed correctly'. It is completely unambiguous what my final submission is. Any reasonable person understands this.

As competition admins, you need to be able to distinguish between errors in your software and errors in the models. It is clear you are not properly doing this here.

Like I said in my email last night, it is still my expectation that you do the following by 5pm EST this afternoon:

1. The final leaderboard updated - moving me into 4th place and showing my final score of 8.40080

2. The contact person at GE who I can coordinate with to get my $30,000

3. The 'winners page' updated, with me moved into 4th place

Do the right thing!

 
rjm5
Rank 31st
Posts 4
Thanks 1
Joined 14 Sep '12

Wow, Matt, that's hard to believe. As a fellow competitor, I thought the submission process was unclear. I would agree this is a bug with Kaggle's scoring process.

One of the real losers here is GE.

Ben, is there a process for competitors to resolve disputes, after Kaggle declares prize winners?

Thanked by __mtb__
 
ATCGuy
Rank 20th
Posts 3
Thanks 2
Joined 30 Nov '12

I also find all of this hard to believe (not Matt's side of this story but the fact that Kaggle couldn't resolve it).

As a new member of Kaggle who joined for this project, I can certainly say that there have been highs and lows from this competition process. I really enjoy the concept of Kaggle, the competitive nature of the process, and the atmosphere for learning. But it seems to me that Kaggle still has a lot of kinks to work out in its processes and procedures for competitions. I don't know how the other competitions played out, as this is my first, but there were several sets of confusing guidelines and missed deadlines (i.e. promised deadlines that weren't met by Kaggle).

I can understand that these kinds of projects will have hiccups, but if Kaggle and a Kaggle veteran (Matt is very high in the Kaggle rankings) can't work out something like this, I think that is a major flaw in the whole process. What hope would a new member of the community have if something like this were to happen?

Matt - if Kaggle/GE can't resolve this, and I were you, I might reach out to someone like Passur http://www.passur.com/ or Honeywell http://www.honeywell.com and explain the situation. I wouldn't be surprised if someone at those companies would be interested in your algorithms. They already seem to have an interest in flight management systems and aircraft arrival predictions. In addition, it would probably be a great piece of PR for them.

As an aside, I think the fact that the results aren't released limits users' learning experiences. I don't feel like I can learn from mistakes as well as I had hoped would be possible. As a user who doesn't have a strong "big data" background, if I can't learn, there isn't an incentive to stick around and use the site. Due to the limited number of winners, and the fact that most of the top Kaggle members win multiple prizes, most users will continue to come back for the challenge and the learning, not the prizes. Maybe the Kaggle model is just to hook up a small set of "Kaggle Masters" with companies. Well, I guess then it doesn't matter whether your user base grows and learns. My belief is that having a larger set of continually improving users from various fields will provide your customers (i.e. the companies and people that host competitions) with better and more interesting results.

Thanked by __mtb__
 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

rjm5, ATCGuy - I appreciate the support. I would love to hear what the other competitors think about this topic.

It kills me that a bug in Kaggle's own process is what is holding this whole thing up ... 

I should also say that I emailed Ben an hour after the official results were posted. Then I emailed him again in the afternoon. Then I emailed the community manager because Ben never responded. Finally, the first I heard back from Ben was Friday night, telling me the same thing he said above.

Kaggle, here are a few tips for future competitions:

1. Bad news doesn't get better with age

2. Respond to competitors' inquiries

3. *Don't* announce the winners in a joint press release until the competitors have had a chance to dispute the results. You certainly painted yourself into a corner this time.

4. *Don't* offer the competitors money and Kaggle points to 'unofficially' resolve the matter. That was an insult to my integrity and is certainly not fair to the sponsor or the other competitors.

 
rjm5
Rank 31st
Posts 4
Thanks 1
Joined 14 Sep '12

Ben, did you offer money to keep this quiet? Please respond.

 
Konrad Banachewicz
Rank 26th
Posts 116
Thanks 37
Joined 3 Aug '10

FWIW, the data submission process was indeed fairly convoluted (and I suppose there could've been more people who got lost in it along the way). I believe a clarification from the admins on whether or not there was a software glitch on top of it would benefit all participants. The silence is having the opposite effect.

Thanked by __mtb__
 
dxsx
Rank 39th
Posts 3
Thanks 1
Joined 1 Jan '13

Thanks Ben for your acknowledgment. 

First, I was surprised how important it is for Kaggle to strictly follow the competition deadlines rather than have better models (I corrected my submission a few hours after the submission deadline). Not to mention that almost all the deadlines on the Kaggle side were missed in this competition. Anyway, it was my own fault!

It is very sad to hear what happened to _mtb_. It is just unfair for him to lose his position because of a systematic error.

I can understand that creating a perfect competition platform is extremely hard, and I do appreciate your constant efforts to improve the competition setup. Any competition setup has its own issues. As you mentioned, this is not the first time that you have disappointed competitors.

However, I expected Kaggle to be more flexible in handling issues like what happened to _mtb_.

Thanked by __mtb__
 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11
From: Anthony Goldbloom | Kaggle

Dear Matt,

I know it's frustrating to put so much time into a competition and not win due to a very small issue. We've seen it before and it's always such a great source of disappointment for everyone. We have enormous respect for your efforts and how close you've come and hope you'll be back for future competitions, including Flight Quest 2.

We've taken your concerns very seriously and conferred with GE regarding the same, and I believe it will be helpful to review the rationale for our decision, including the history from the log files, for you to better understand our conclusion.

The submission page stated: "Note: You can select up to 1 submission that will be used to calculate your final leaderboard score. If you do not select them, up to 1 entry will be chosen for you based on your most recent submissions. Your final score will not be based on the same exact subset data as the public leaderboard, but rather a different private data subset of your full submission. Your public score is only a rough indication of what your final score might be. You should choose an entry that will most likely be best overall, and not necessarily just on the public subset."

During the course of the competition, you made 23 submissions, of which 3 were selected for leaderboard scoring. First you selected submission 242748 on January 7, 2013 (all times UTC). You unselected that submission on January 24, 2013 and selected submission 250852. You unselected that submission on January 26, 2013 and selected submission 252143. You did not revise that selection at the time of your final submission. It seems logical to conclude that you understood the process of selecting submissions.

Three of the other winners selected their final submission for scoring. The two that did not select the final submission had not previously made a manual selection or had unchecked prior submissions, and the software worked as designed and selected the final submission. One participant (but not a winner) made 3 final submissions and selected one for scoring - which was the case the software for selecting a response was intended to handle. There was no bug in the software.

On April 3, you sent Ben Hamner an email in which you stated "I was looking back at the 'My Submissions' page and I am guessing this might be related to the fact that I didn't hit the checkbox next to the final submission. Any chance that might be correct? Obviously I should have checked that box, but I guess I didn't think it would be necessary since there really is no ambiguity (i.e. I only had one submission for the final set)." As noted above, other participants did not have this same issue.

That being said, no competition is perfect and we always strive to improve our procedures and user interface. We appreciate your input into the areas you found confusing and will incorporate appropriate changes into future competitions. As a company populated with past competitors, we genuinely do sympathize with your frustration and disappointment on this issue.

Lastly, the rules of the competition state:

"By participating in the Competition, each Entrant agrees to release, indemnify and hold harmless GE, Kaggle Inc., and their respective affiliates, subsidiaries, advertising and promotions agencies, as applicable, and each of their respective agents, representatives, officers, directors, shareholders, and employees from and against any injuries, losses, damages, claims, actions and any liability of any kind resulting from or arising out of your participation in or association with the Competition. GE is not responsible for any miscommunications such as technical failures related to computer, telephone, cable, and unavailable network or server connections, related technical failures, or other failures related to hardware, software or virus, or incomplete, late or misdirected Entries. GE reserves the right to cancel, modify or suspend the Competition should any computer virus, bug or other technical difficulty or other causes beyond the control of GE corrupt the administration, security or proper play of the Competition, and to determine winners from among Entries not affected by the corruption, if any, in its sole discretion."

We genuinely wish that this issue had not arisen and that your otherwise impressive efforts could have been recognized as you wished. However, in order to be fair to the other participants who followed all the rules and instructions, we cannot modify the final standings.

I know this is hugely disappointing for you and we sympathize with your situation. We sincerely hope that you will be able to get past your disappointment and participate in future competitions with results you are clearly capable of achieving.

Kind Regards,
Anthony
--
Anthony Goldbloom / @antgoldbloom / Kaggle Founder & CEO
 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

This is a 2-phase competition, so there are 2 separate file types that need to be scored: the public leaderboard file and the final leaderboard file. During the public leaderboard portion of the competition, yes, I used the checkbox to keep track of my best submission. Then, following the directions Ben posted, I scored the final leaderboard file and uploaded it. As Ben's directions describe, I received the all-zeros score, which in his own words means the submission was properly received.

What on earth is a checkbox doing on the submission screen of a two-phase competition? There is no ambiguity: there is only a single *file type* that is valid for the final leaderboard. Why on earth is your software not smart enough to figure that out?

Also - why didn't the default selection logic kick in? Since only final leaderboard files are valid for the final submission, why didn't your software pick the *only* valid option?

Certainly I am not going to be competing in any more Kaggle competitions. I do not trust your company to properly host a competition.

I am done with Kaggle.

 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

Anthony:

Here is my more detailed response. Let's bring some transparency into the equation by keeping the discussion in the forums. I will respond to your email and ask some serious questions that need to be answered. Please answer them here instead of over email.

Submission Issue:

The text you highlighted from the submission page is only valid for a single-phase competition.

The submission page stated: 'Note: You can select up to 1 submission that will be used to calculate your final leaderboard score. If you do not select them, up to 1 entry will be chosen for you based on your most recent submissions. Your final score will not be based on the same exact subset data as the public leaderboard, but rather a different private data subset of your full submission. Your public score is only a rough indication of what your final score might be. You should choose an entry that will most likely be best overall, and not necessarily just on the public subset.'

This is not accurate text for the two-phase submission. There are no subsets; instead there are two separate files: the public leaderboard file and the final leaderboard file. A submission is for exactly one or the other, but not both. During the final submission period of a two-phase competition, this text should not be displayed because it is not accurate. If you don't understand why this text doesn't make sense for a two-phase competition, run it by your employees who are competitors.

Let me outline what I did for the final submission again, because from your email I don't believe you are familiar with how the two-phase competition works (Rich didn't understand it either).

  1. During the public leaderboard portion of the competition, I used the checkboxes so I could keep track of my best public leaderboard submission.
  2. After the final data set was released, I scored it and uploaded my scored file (called final_sub.csv).
  3. Ben's directions state that our final submissions were successfully processed if we received a score of 0.0000. I did, so I believed I had done everything properly (again, keep in mind that the submission-page text quoted above doesn't apply to a two-phase competition).
  4. Again, I submitted exactly one final submission. At this point the checkboxes should not matter - there are no other submissions to disambiguate. By your own explanation of the 'default' selection rules, this should have been the submission selected by the system; see the sketch after this list. This is indeed a software bug. Certainly you are not suggesting that your default logic would select a submission that is not valid for the competition (like selecting a public leaderboard submission for the final submission).
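
For illustration only, here is a rough sketch of the default-selection rule I would expect for a two-phase competition. This is not Kaggle's actual code; all names and fields are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    id: int
    phase: str           # "public" or "final"
    selected: bool       # whether the user ticked the checkbox
    submitted_at: float  # upload timestamp


def pick_final_submission(subs: List[Submission]) -> Optional[Submission]:
    """Choose the entry used for the final leaderboard.

    Only submissions made against the final evaluation set are valid,
    so public-phase entries are filtered out before either the user's
    selection or the most-recent default is applied.
    """
    candidates = [s for s in subs if s.phase == "final"]
    if not candidates:
        return None

    # Honor an explicit selection, but only if it points at a valid entry.
    chosen = [s for s in candidates if s.selected]
    if chosen:
        return max(chosen, key=lambda s: s.submitted_at)

    # Default: fall back to the most recent valid (final-phase) submission.
    return max(candidates, key=lambda s: s.submitted_at)
```

Under a rule like this, a lone final_sub.csv would be picked automatically, and a stale public-leaderboard selection could never end up on the final leaderboard.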

Questions:


Are you, Anthony, actually confused as to what my final submission was?


If you don't believe this is a bug, then why are you going to 'incorporate appropriate changes into future competitions'?


What is the true formula for acquiring Kaggle points?

I know the wiki page has a formula, but it doesn't include the additional points awarded to keep competitors quiet. It seems like you would want to update it to include this, especially now that companies like the NYTimes and American Express are using Kaggle rankings when hiring data scientists.


How often do you offer 'partial prize money' as compensation for competition mishandlings?


Why isn't there a formal and transparent process for handling disputes?


Anthony, it is clear to me what ultimately happened here. You painted yourself into a corner because you announced the results in a press release with GE the same day the rest of the competitors saw the results. I know there is quite a marketing push behind GE's quests, and having to change the results after the press releases and magazine articles were written leaves egg on Kaggle's face. The problem is that the competition portion of your business is built on trust and integrity.

The obvious analogy is to sporting events: if the referee cannot be trusted, then the games don't matter.

That is why I will not compete in any more Kaggle competitions.

__mtb__

P.S. This forum software is terrible. It always seems to mangle the posts.

 
Paweł
Rank 2nd
Posts 72
Thanks 114
Joined 13 Dec '11

Wow. I entered the quest forum after a week and what a surprise to read this post. I think you have a point: if you made only 1 final submission, there is no ambiguity, and it should have been chosen. Kaggle competitions should be closer to real life, in my opinion. Normally, after seeing a sky-high (which is bad) performance, you should be able to correct it. Seeing no performance, as in your case, should give you a hint that something is wrong. By "real life" I mean that such cases should not diminish 3 months of hard work. When competing, I had a nightmarish vision that something like this would happen to us, which is not a good thing.

I kind of didn't like that you dismissed dxsx's performance. In "real life" (not Kaggle) he was better. The fact that your mistake/bug was different from his is circumstantial.

Thanked by __mtb__
 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

Paweł,

Thanks for pointing that out about dxsx - it is certainly not my intention to dismiss his performance. I will post here what I sent to him in a private email a couple of days back:

 dxsx:

To be honest, the mistake you are talking about was keeping me up at night during the last couple of weeks of the competition. I really think Kaggle needs to do something different in these 2-phase competitions to let the competitors know more about their submission. Maybe let them have up to 5 submissions or something, but then show them the score over, say, 25% of the dataset. Obviously you wouldn't be able to overfit, but you could still prevent what happened to you. I bet they will change this in the future. It sucks for both the competitors AND the sponsors to have models not considered because of this.

Sorry, dxsx, if I was dismissive of your hard work! That was not my intention.

edit: formatting

 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

A competitor emailed me a couple of things that I wanted to share. First, he was able to summarize my issue with the submission bug very nicely in a couple of statements and 3 bullet points. I have been trying for 2 weeks to explain this, yet I was not able to.

Clearly the Kaggle platform is not the most bug-free software in the universe. Let me point out these events:

  1. After the competition finished, the results were visible for a few minutes on profile pages = BUG
  2. After the competition finished, the ranking points were visible for a week on profile pages = BUG
  3. Not taking the only proper submission as the default = BUG

So your situation should be treated as one of the bugs. I'm sure that, knowing what they know now, they would correct this in the future. It's costly, but still it is not your fault.

And his other point was that 'you should keep your tone less emotional'. I apologize for that. I am not going to go back and edit old posts or anything - what was said is said - but I will make sure I cut down on the commentary and focus on the facts.

Anthony, I am certainly looking forward to having an open, civil conversation in this forum, not just between you and me, but with all of the competitors. I posted the questions that I feel need to be answered. Please familiarize yourself with the details of the two-phase competitions, and make sure you take a close look at the submission page and understand why the text you quoted does not make sense for a final submission in a two-phase competition.

I am looking forward to your response.

 
__mtb__
Posts 28
Thanks 2
Joined 13 Dec '11

Also, it sounds like similar issues with the submission process for two-phase competitions were encountered in the Fast Iron competition. This is highly relevant:

Like Leustagos says, the two-stage submission process in its current state is really poorly implemented. It was clearly duct-taped onto the current submission system sloppily, and because of that there are huge usability issues with submitting solutions and models. In the Bulldozer competition we had numerous issues with the submission process, and the responses from kaggle were very slow and often not that helpful. I really like kaggle, but I think they have some serious usability issues and problems with their current support process.

-- pjreddie

Here is the original link.

Anthony, according to the competitors (who are also the users), there are numerous bugs with the submission process and submission page for two-phase competitions. I believe your assertion that 'There was no bug in the software.' is incorrect.

 
beluga
Rank 40th
Posts 99
Thanks 66
Joined 5 Oct '11

First of all: Kaggle does an amazing job, and I have to say that this site is probably my favorite place on the web.

I understand that you guys (Kaggle) had an extremely hard competition this time; the dataset was hard to prepare and process correctly, and the prize pool and media attention were also higher than usual.

With all due respect, I think you did not make the best decision after the competition ended. I think the problems with dxsx's and __mtb__'s solutions should have been discovered during the 1.5 months while you were validating the final results.

Even if the problems were discovered later than the official announcement, I think you should have been far more flexible in rewarding the two competitors who created excellent solutions but slipped on a (theirs or yours?) banana peel.

Creating a forum post is basically free but doesn't solve the problem. The e-mail received by __mtb__ is also a mixed message (nit-picking, sympathy, legal shield...).

I understand that you do not want to hurt the prizewinning competitors, who also created amazing solutions and definitely deserve the prize money and the rankings they got. But I cannot believe that you do not see the huge problem with this situation and could not solve it.

P.S.: @__mtb__ I also find your tone sometimes too "emotional", and I am afraid that you will lose most Kagglers' sympathy if you cannot keep calm. (Of course, it is easy for me to say.)

 
Leustagos
Posts 485
Thanks 317
Joined 22 Nov '11

I think __mtb__ is being a little emotional, and I totally understand that, but he is right.

Kaggle must improve (and automate) many steps of two-stage competitions in order to avoid this. And in this case, I think that, at the very least, both users should receive the ranking and the public leaderboard recognition for their feats.

Rules exist to prevent users from cheating and the like, but fair competitors shouldn't be punished when some stretching of them would do no harm. By the way, what is more important: having the best models, or what?

I'm a software architect myself, and in two-stage competitions, allowing an invalid entry to be selected is clearly a violation of a business rule. I deal a lot with software users, and they would call it a bug. If not a bug in the implementation, then a bug in the specification, but a bug nevertheless.

 
John Park
Posts 27
Thanks 16
Joined 19 Aug '12

It is a bit confusing. 

If the purpose of this discussion is to make Kaggle better, here are two suggestions for the final score.

How about calculating it on 5% of the test data instead of 0% of the data? Then the players would know if something is seriously wrong.

Or, show the score only to within 50% accuracy of the public leaderboard score. For example, if the public score is 10, it would show only whether the final score is above 20 or below 20.

Neither method leaks much signal, unless something went wrong, right?

I think what everyone is asking for is something more indicative than 0.0000 as the blind score.
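
To make the first suggestion concrete, here is a purely illustrative sketch of how a small "sanity slice" score could be reported instead of a flat 0.0000. The column names and the RMSE metric are my own assumptions, not the competition's actual scoring code:

```python
import numpy as np
import pandas as pd


def sanity_slice_score(submission: pd.DataFrame,
                       answers: pd.DataFrame,
                       frac: float = 0.05,
                       seed: int = 0) -> float:
    """Score ~5% of a final-phase submission against the hidden answers.

    Reporting an error on a small, fixed slice tells competitors that their
    file was parsed and aligned correctly, while leaking too little of the
    held-out data to overfit against.
    """
    # Both frames are assumed to have "flight_id" and "arrival" columns
    # (hypothetical names for this sketch).
    merged = submission.merge(answers, on="flight_id",
                              suffixes=("_pred", "_true"))

    # Sample a reproducible 5% slice of the rows.
    sample = merged.sample(frac=frac, random_state=seed)

    err = sample["arrival_pred"] - sample["arrival_true"]
    return float(np.sqrt(np.mean(err ** 2)))
```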

 
Alexander Larko
Rank 24th
Posts 86
Thanks 41
Joined 14 May '10

Hi all!

First of all, I want to express my sympathy to _mtb_ and dxsx.

But nevertheless, an unfortunate mistake was made.

I hardly know English, and I find it hard to understand the rules and the description of the data, but on the "Your Submissions" page we all see the "Select?" column.

Whose fault is it that you did not put a check mark?

Ben? Anthony?

And it's not a software bug!!!!

As for dxsx's bug - I am very sorry for what happened.

On Kaggle's software:

I have worked for many years in the computer field and have not yet met a perfect program.

The great Dijkstra once said: testing programs can very effectively demonstrate the presence of bugs, but is hopelessly inadequate to demonstrate their absence.

So we wish our colleagues at Kaggle success in their difficult work.

And in conclusion I want to say:

Dear _mtb_, do not get worked up; life goes on, and there will be many difficulties and challenges in your way. But there will also be joy and good luck!

 

 
B Yang
Posts 255
Thanks 71
Joined 12 Nov '10

I'm not sure if I understand this 100%. Is the following what happened, and are these all the relevant issues?

1. This is a 2-stage competition. During stage 1, __mtb__ made many submissions and selected 3 (what was the purpose of selections for stage 1?).

2. Stage 1 finished and stage 2 started, with a new data set released. Competitors were supposed to make predictions against this new data set, and their scores would determine their final rankings.

3. __mtb__ made only one submission against the new dataset. He did not select it, nor did he change the selections he made during stage 1.

4. In the absence of stage 2 selections, the Kaggle system picked one of his stage 1 selections to determine his final score and rank; presumably, because it was for the stage 1 dataset, it couldn't be scored against the stage 2 answers, or at least it resulted in a very bad score.

 