
Completed • $220,000 • 122 teams

Flight Quest 2: Flight Optimization, Main Phase

Thu 26 Sep 2013
– Sat 18 Jan 2014 (11 months ago)

Which files are going to be provided?


Where can we find the RAP files from before the cutoff?

It would be nice if the organizers could give us a pointer to the last RAP data that we are allowed to use in the agent.

Also, is there any place where we could find a list of scheduled take-offs (as known at cutoff time)?

The RAP files are here: http://soostrc.comet.ucar.edu/data/grib/rap/ But I find it very difficult to understand what the data means. It seems to be aimed at specialists.

Alessandro Mariani wrote:

here is attached - thanks for having a look into this!

Is any of my steps above incorrect?

Hi!

Do we have more details about this? I still think this is quite a significant aspect that requires some clarification, please! I'm still catching up on the last 4 weeks of posts, but I couldn't find an answer :(

@Alessandro and admins,

I too wasn't able to recreate actualLandings_20130910_1803.csv

I used the SQL from

https://github.com/benhamner/GEFlightQuest/blob/master/PythonModule/geflight/postgres_ingest/schema.sql#L377

and the training2_flighthistory.csv file.

I found 19 flights landing at 2013-09-10 18:04 (time = 0.001758 in the actuallandings file)

instead of the 23 according to actualLandings_20130910_1803.csv.

I found a total of 7734 flights landing between

2013-09-10 18:03:53.6712

and

2013-09-11 01:03:53.6712

using one of the 63 airports defined.

So my question for the admins is: What is the exact procedure to create actualLandings_20130910_1803.csv from training2_flighthistory.csv?

Thanks in advance
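For anyone who wants to repeat the count described above without a database, here is a minimal Python sketch. It is not the organizers' procedure (that is exactly what is being asked for). It only counts rows whose arrival airport is in a given set and whose actual_runway_arrival falls in the 7-hour window starting at the cutoff. The column name arrival_airport_icao_code, the timestamp layout, and the sample rows are assumptions made up for illustration; check them against the real CSV header.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed column names and timestamp layout; adjust to the real CSV header.
ARRIVAL_AIRPORT = "arrival_airport_icao_code"
ARRIVAL_TIME = "actual_runway_arrival"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def count_landings(rows, airports, cutoff, horizon=timedelta(hours=7)):
    """Count rows that land at one of `airports` in [cutoff, cutoff + horizon)."""
    end = cutoff + horizon
    count = 0
    for row in rows:
        if row.get(ARRIVAL_AIRPORT) not in airports:
            continue
        try:
            landed = datetime.strptime(row.get(ARRIVAL_TIME) or "", TIME_FORMAT)
        except ValueError:
            continue  # missing or unparseable arrival time
        if cutoff <= landed < end:
            count += 1
    return count


# Tiny made-up sample, only to show the call; this is not competition data.
sample = io.StringIO(
    "arrival_airport_icao_code,actual_runway_arrival\n"
    "KCVG,2013-09-10 18:04:00\n"
    "KCVG,2013-09-10 17:59:00\n"
    "KJFK,MISSING\n"
    "EGLL,2013-09-10 19:00:00\n"
)
cutoff = datetime(2013, 9, 10, 18, 3, 53, 671200)
print(count_landings(csv.DictReader(sample), {"KCVG", "KJFK"}, cutoff))
```

On the real file you would pass csv.DictReader(open("training2_flighthistory.csv", newline="")) and the set of 63 airports instead of the sample.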

sparrow wrote:

Hi,

I assume that you are going to provide TestFeatures files (like test2_flighthistory.csv) so that we can calculate "actualTakeoffs", "groundConditions" and "actualLandings". Am I right?

In that case, I see you filter out flights (from flighthistory) whose "published_departure" is after the cutoff time. I think this filter is too aggressive: for flights that depart after the cutoff, "published_departure" and other features such as "scheduled_*" should already be known before the cutoff time. In other words, we should have more information available to calculate "actualTakeoffs". Am I wrong?

I'm bumping this question, because it appears it hasn't been answered: are TestFeatures files going to be included in the final set?

And what about those flights whose published departure is after the cutoff?

Jules wrote:

So my question for the admins is: What is the exact procedure to create actualLandings_20130910_1803.csv from training2_flighthistory.csv?

Admins,

As reported by Alessandro and Jules, there are discrepancies between training2_flighthistory.csv and actualLandings_20130910_1803.csv.

Want proof? On a bash shell, cd to the directory that has training2_flighthistory.csv and run this command:

cat training2_flighthistory.csv | grep KCVG | cut -f18 -d',' |  grep "2013-09-10 18:04" | wc -l

This will output the number of flights that either took off from or landed at KCVG AND have an actual_runway_arrival time of "2013-09-10 18:04". One would expect this to be a non-zero value. Why? Because KCVG appears in the first time slot within actualLandings_20130910_1803.csv.

The command, however returns 0. Maybe I got the time wrong? The exact cutoff time is 18:03:53 after all. So let's try this:

cat training2_flighthistory.csv | grep KCVG | cut -f18 -d',' |  grep "2013-09-10 18:03" | wc -l

Nada.

Maybe this will work?

cat training2_flighthistory.csv | grep KCVG | cut -f18 -d',' |  grep "2013-09-10 18:05" | wc -l

Still nada.

So, what gives?

Curiously enough, there is a flight that lands at KCVG with an actual_runway_departure of 18:04. Makes you wonder...

Please look into it. It is also possible that you are using actualLandings files with incorrect values to score the leaderboard.
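A side note on the one-liners above: grep KCVG matches the code anywhere on the line, and cut -d',' can miscount fields if any value contains a quoted comma. A rough Python sketch that matches on named columns and reports departures and arrivals separately might look like the following. The two airport column names and the sample rows are assumptions made up for illustration; only actual_runway_departure and actual_runway_arrival are taken from this thread.

```python
import csv
import io

# Assumed header names: (airport column, runway time column) per direction.
FIELDS = {
    "departures": ("departure_airport_icao_code", "actual_runway_departure"),
    "arrivals": ("arrival_airport_icao_code", "actual_runway_arrival"),
}


def count_by_direction(rows, airport, time_prefix):
    """Count rows per direction where the airport matches exactly and the
    runway time starts with `time_prefix` (e.g. '2013-09-10 18:04')."""
    counts = {name: 0 for name in FIELDS}
    for row in rows:
        for name, (airport_col, time_col) in FIELDS.items():
            runway_time = row.get(time_col) or ""
            if row.get(airport_col) == airport and runway_time.startswith(time_prefix):
                counts[name] += 1
    return counts


# Made-up rows, only to show the call; this is not competition data.
sample = io.StringIO(
    "departure_airport_icao_code,arrival_airport_icao_code,"
    "actual_runway_departure,actual_runway_arrival\n"
    "KCVG,KORD,2013-09-10 18:04:10,2013-09-10 19:10:00\n"
    "KATL,KCVG,2013-09-10 17:00:00,2013-09-10 18:20:00\n"
)
print(count_by_direction(csv.DictReader(sample), "KCVG", "2013-09-10 18:04"))
```

Splitting the counts this way makes it obvious when a hit comes from the departure side only, which is the situation described above.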

Hi

I guess we have a filtered version of the complete flighthistory table.

training2_flighthistory.csv and training3_flighthistory.csv only include continental domestic flights, i.e. flights whose departure AND arrival airports are both in the continental United States, but actualLandings and actualTakeoffs also include international flights (I guess).

Admins, am I right? Can you provide the complete flighthistory table? We would need all the flights whose departure OR arrival airport is in the continental United States, wouldn't we?

I'm sure this has already been asked and answered several times, but since I can't find it, just to check: the hour of weather data that we'll be provided for the test day, does that hour END at the cutoff time or START at the cutoff time?

Anil Thomas wrote:

Admins,

As reported by Alessandro and Jules, there are discrepancies between training2_flighthistory.csv and actualLandings_20130910_1803.csv.

...

Still waiting for an answer on this. May I point out that this issue was originally raised almost two months ago?

Thanks for pointing out the discrepancy between flighthistory.csv and actualLandings*.csv. We've investigated it and got to the bottom of it.

As José pointed out, the counts do not agree because the flighthistory file provided only contained flights both departing from and arriving at US airports. This was incorrect, and we apologize for the trouble that it has caused. That file should also have contained flights with only one endpoint at a US airport (since international flights departing from or arriving in the US also naturally cause congestion at US airports). The actualLandings*.csv and actualTakeoffs*.csv files were built off the complete dataset (including international flights), as was correct.

To address this, we have uploaded a corrected version of the flighthistory files for both the augmented training data sets and the current leaderboard data set as FQ2_DataRelease_FHWithUSArrivalOrDeparture.zip. If you are estimating actualLandings*.csv and actualTakeoffs*.csv as part of your solution, make sure to use these updated files.

Hi Ben, thanks for this!

training2_flighthistory_USARRDEP.csv now contains new flights, but test2_flighthistory_USARRDEP.csv is the same file as the original provided test2_flighthistory.csv :-/

Corrected test2 is now up on the Data page.

Gentlemen, this is not acceptable. It's inconceivable that training datasets released more than 4 (four) months ago are being changed 1 (one) week before the final deadline. Either the final deadline is going to be extended, or actualLandings and actualTakeoffs for both the current leaderboard and the final one must not contain international flights.

We've spent the last 2 (two) months trying to develop a model for producing actualLandings and actualTakeoffs as close as possible to the real ones, not to mention the effects that these files had on the development of the complete route model. Now we have 1 (one) week to redo/rethink our entire approach. Do you think this is fair? Actually, I think we don't even have enough time to re-calculate 14 (fourteen) days of routes from scratch with our simple computers. 8 (eight) days is less than the time allowed from the release of the final dataset to the end of the competition. I think only a handful of people here have access to big data center-like machines.

I checked the differences between the old and new datasets. As an example, take test2_flighthistory.csv: we're moving from 148017 to 173621 flights. That's 25604 more flights than before: +17.3%! Are you kidding me? Come on, it's almost 1/5 more flights than before! Not to mention the fact that the current datasets are still not complete: as an example, take test2_flighthistoryevents.csv. I see no international flights there...

Ouch. FWIW, I predict takeoffs and landings too, and it takes days to train the models. I thought that was done and I could spend the last few days before the deadline polishing the optimization. Guess not. :(

Is training1_flighthistory.csv from the initial data release also incomplete? If so, will it be updated too? (Yes, you guessed, I use that file.)

I don't have it with me at the moment, but you can easily check whether the source or destination ICAO columns contain 4-letter codes that don't start with K. If they do, there's a high chance that the dataset is complete.
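The check suggested above could be sketched like this in Python. The column names and the sample rows are assumptions for illustration, so adjust them to the actual header. Finding non-K codes is a hint that international flights are present, not proof that the file is complete.

```python
import csv
import io

# Assumed names of the departure and arrival ICAO columns.
ICAO_COLUMNS = ("departure_airport_icao_code", "arrival_airport_icao_code")


def non_k_codes(rows, columns=ICAO_COLUMNS):
    """Return the set of 4-letter codes in `columns` that do not start with K."""
    found = set()
    for row in rows:
        for col in columns:
            code = (row.get(col) or "").strip()
            if len(code) == 4 and code.isalpha() and not code.startswith("K"):
                found.add(code)
    return found


# Made-up rows, only to show the call; this is not competition data.
sample = io.StringIO(
    "departure_airport_icao_code,arrival_airport_icao_code\n"
    "KJFK,KLAX\n"
    "EGLL,KJFK\n"
    "KORD,MISSING\n"
)
print(sorted(non_k_codes(csv.DictReader(sample))))
```

An empty result on the real file would correspond to the "K all the way" case.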

Good point. And wouldn't you know it, it's K all the way. So I'm about to find out what happens when I drop one third of my training data.
