Kaggle, admins and contestants,
I don’t find it easy to write this post, and maybe I’m the only one who thinks this way (in that case, just ignore me).
In my opinion, in an ideal world the rules for which flights are included in the validation set and which are not would be clear before the start of the competition.
If that isn’t possible, they should at least be clear well before the model submission deadline.
What I see happening now is that we have rules, e.g.:
- no diverted flights
- no redirected flights
- no flights with actual_gate_departure after actual_runway_departure
- etc.
But next to these rules there is also still room for discussion.
I don’t think this is a good development, because:
a. It is an open invitation to everybody who thinks he/she has a chance of winning to argue about flights that are possibly harmful to their predictions. That would make this a challenge of predictive modeling AND public relations instead of predictive modeling only.
b. It takes away the possibility for Kaggle to demonstrate that they work in an unbiased and transparent way, because in theory they could leave out flights (after March 11) in a way that benefits one contestant more than others.
I’m NOT saying that any contestant is thinking this way or that Kaggle or any of the admins are biased; we are just making it harder to prove that we’re not.
I propose we only filter the validation set according to rules which were clear before February 15th
with —