
Completed • $220,000 • 122 teams

Flight Quest 2: Flight Optimization, Main Phase

Thu 26 Sep 2013
– Sat 18 Jan 2014

Which files are going to be provided?


Hello,

Can you please confirm which files are going to be provided for the final evaluation?

I understand that "Configuration.json", "Projection.json", "Airports.csv" (both the easting/northing and latitude/longitude versions), Landing.csv, Taxi.csv, restrictedZones.csv, and turbulentZones_{0}.csv are going to be provided, and also weather_{0}.txt.gz for the first hour.

What about "flights_{0}.csv"? You already provided them, so why do we need "TestFlights.csv"?

I also assume that you will give us the relevant Aggregate Test Features.

Can you please also explain the format of Landing.csv and Taxi.csv?

thanks

It really would be nice to have everything very clearly and precisely described regarding the data. Currently it feels like a baffling labyrinth laid out across various pages and forum threads, with different bits of data released at different times, and the flight simulator source serving as the primary documentation for the format of the data files themselves.

We've listed out the files used and needed by the Simulator on this webpage. Please take a look.

That's extremely helpful, thanks very much!

joycenv wrote:

We've listed out the files used and needed by the Simulator on this webpage. Please take a look.

Thanks for the link joycenv, very useful! 

I think I raised this during phase one, about landing & takeoff events. I'm trying to recreate actualLandings_20130910_1803.csv, which is provided as part of the oneday files.

I'm not sure what I'm doing wrong, but I can't recreate it from training2_flighthistory.csv. I'm short 1,022 flights landing from 18:03 to 01:03 of the following day (+7 hours). And it's not only a shortfall: I see significant variation in the number of flights landing at each airport (mean 16, median 4, min -81, max 135), grouping by airport on actual_runway_arrival.

I do understand that the "actual" data is not provided for the test set, but I can't even recreate the training data from the flighthistory file provided. Basically, I can't get an approximation of the congestion at an airport if I can't reproduce similar counts for each airport.

Can you please have a look at this? Hopefully it's just me doing something wrong, but I've spent enough time on this to have some concerns!

Thanks!

Thanks, indeed this is very helpful.

And I second the question regarding the landing & takeoff events; I also found some inconsistencies.

Please see my question:

https://www.gequest.com/c/flight2-main/forums/t/6115/how-the-time-in-actuallandings-file-is-calculated

Hi admins,

Will TestFlights.csv also be provided for the validation period?

If yes, is it possible to get also a version of TestFlights.csv for Sept 10th 2013?

Thanks!

Alessandro Mariani wrote:

joycenv wrote:

We've listed out the files used and needed by the Simulator on this webpage. Please take a look.

Thanks for the link joycenv, very useful! 

I think I raised this during phase one, about landing & takeoff events. I'm trying to recreate actualLandings_20130910_1803.csv, which is provided as part of the oneday files.

I'm not sure what I'm doing wrong, but I can't recreate it from training2_flighthistory.csv. I'm short 1,022 flights landing from 18:03 to 01:03 of the following day (+7 hours). And it's not only a shortfall: I see significant variation in the number of flights landing at each airport (mean 16, median 4, min -81, max 135), grouping by airport on actual_runway_arrival.

I do understand that the "actual" data is not provided for the test set, but I can't even recreate the training data from the flighthistory file provided. Basically, I can't get an approximation of the congestion at an airport if I can't reproduce similar counts for each airport.

Can you please have a look at this? Hopefully it's just me doing something wrong, but I've spent enough time on this to have some concerns!

Thanks!

Hi Alessandro,

I'm not exactly sure what you are doing differently, but here are a couple things to consider.

  • We use the convention that a "day" is from 9am UTC to 9am UTC. So the "day" Sep 10 runs from Sep 10 9 am UTC to Sep 11 8:59 am UTC.
  • We use scheduled departure information as the time filters because that is information you should have at any time. The actual landing is not known until you land, so that is future information, and is not suitable for time filtering.

Does this information help you? There may be an "edge" effect because this is the last day of the training set (although I'm not sure how), but the days prior to Sep 10 are completely available and usable for building a congestion model.
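The two conventions above can be sketched in Python. This is a minimal illustration, not official competition code; how you parse timestamps out of the CSVs is up to you.

```python
from datetime import date, datetime, timedelta, timezone

def competition_day_window(d: date):
    """A competition "day" runs from 09:00 UTC on date d up to
    (but not including) 09:00 UTC on the following date."""
    start = datetime(d.year, d.month, d.day, 9, 0, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)

def in_day(scheduled_departure: datetime, d: date) -> bool:
    """Filter flights by *scheduled* departure: the actual landing
    time is future information at the cutoff, so it must not be
    used for time filtering."""
    start, end = competition_day_window(d)
    return start <= scheduled_departure < end

# The "day" Sep 10, 2013 runs Sep 10 09:00 UTC .. Sep 11 08:59 UTC.
start, end = competition_day_window(date(2013, 9, 10))
```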

Jules wrote:

Hi admins,

Will TestFlights.csv also be provided for the validation period?

If yes, is it possible to get also a version of TestFlights.csv for Sept 10th 2013?

Jules, TestFlights.csv will always be provided.  I've added the Sept. 10th TestFlights to the zip download.  

joycenv wrote:

Hi Alessandro,

I'm not exactly sure what you are doing differently, but here are a couple things to consider.

  • We use the convention that a "day" is from 9am UTC to 9am UTC. So the "day" Sep 10 runs from Sep 10 9 am UTC to Sep 11 8:59 am UTC.
  • We use scheduled departure information as the time filters because that is information you should have at any time. The actual landing is not known until you land, so that is future information, and is not suitable for time filtering.

Does this information help you? There may be an "edge" effect because this is the last day of the training set (although I'm not sure how), but the days prior to Sep 10 are completely available and usable for building a congestion model.

Thanks Joyce! Unfortunately that doesn't help me... :(

I'll try to make it as simple as possible. If I open your training2_flighthistory.csv file and filter on actual_runway_arrival between 2013/09/10 18:03 (UTC) and 2013/09/11 01:03 (UTC), I get 10,353 flights. If I then filter arrival_airport_icao_code to include only the 63 US airports, I end up with 7,743 flights.

actualLandings_20130910_1803.csv has 8,639 flights landing between the cutoff time and the following 7 hours (2013/09/10 18:03 to 2013/09/11 01:03).

Shouldn't the counts be quite similar? And it's not just the totals: slicing down by airport, the counts are in excess or deficit, which is not a close representation of what happens at each airport.

What am I doing wrong?
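For concreteness, the counting procedure described above amounts to roughly this sketch. The column names `actual_runway_arrival` and `arrival_airport_icao_code` are taken from this thread (check them against the real flighthistory header), and rows are assumed to be already parsed into tz-aware datetimes.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def landing_counts(rows, cutoff, horizon_hours, us_airports):
    """Count landings per arrival airport in [cutoff, cutoff + horizon).

    `rows` is an iterable of dicts with keys 'actual_runway_arrival'
    (a tz-aware datetime, or None if unknown) and
    'arrival_airport_icao_code'. Flights outside the time window or
    outside the given airport set are skipped.
    """
    end = cutoff + timedelta(hours=horizon_hours)
    counts = Counter()
    for row in rows:
        t = row["actual_runway_arrival"]
        if t is None:
            continue
        if cutoff <= t < end and row["arrival_airport_icao_code"] in us_airports:
            counts[row["arrival_airport_icao_code"]] += 1
    return counts
```

Comparing `sum(counts.values())` and the per-airport values against the rows of the actualLandings file is exactly the check being discussed.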

Hi,

I assume that you are going to provide TestFeatures files (like test2_flighthistory.csv) in order to calculate "actualTakeoffs", "groundConditions", and "actualLandings". Am I right?

In that case, I see you filter out flights (from flighthistory) whose "published_departure" is after the cutoff time. I think this filter is too aggressive, since "published_departure", and even other features such as "scheduled_*", should be available as information before the cutoff time for flights scheduled to depart after the cutoff. In other words, we should have more information with which to calculate "actualTakeoffs". Am I wrong?

Alessandro Mariani wrote:

I'll try to make it as simple as possible. If I open your training2_flighthistory.csv file and filter on actual_runway_arrival between 2013/09/10 18:03 (UTC) and 2013/09/11 01:03 (UTC), I get 10,353 flights. If I then filter arrival_airport_icao_code to include only the 63 US airports, I end up with 7,743 flights.

actualLandings_20130910_1803.csv has 8,639 flights landing between the cutoff time and the following 7 hours (2013/09/10 18:03 to 2013/09/11 01:03).

Shouldn't the counts be quite similar? And it's not just the totals: slicing down by airport, the counts are in excess or deficit, which is not a close representation of what happens at each airport.

What am I doing wrong?

Can you upload a list of the flight_history_ids for your 7743 flights? That'll help us narrow down the sources of any discrepancy.

Here it is, attached - thanks for having a look into this!

Are any of my steps above incorrect?

1 Attachment —

I'd like official word on sparrow's assumption at the top of this thread that weather_{0}.txt.gz for the first hour will be provided by Kaggle for the final evaluation. Is this correct?

More generally, I am having doubts about the status of RAP data. There is a wealth of high-resolution information in those files, but is it off limits after the final model submission deadline on December 18? In other words, can I train a model using RAP data up to December 18, but not use new RAP data after December 18?

Obvious example: RAP includes hourly weather predictions produced by a very sophisticated physical model which we can have no hope to match. It seems very likely that route optimization results could be improved by incorporating the last RAP predictions available at each day's cutoff. But we will not be able to use those predictions in the final evaluation, because they do not yet exist on December 18, and Kaggle will not include them in the final evaluation data. Is this correct?

I'm short about 1,000 landings when I recreate the actual landings file, and I carefully followed the Python code. So I think we really need something like the IDs of those 7,743 flights to solve this. Thanks

I haven't tried the exercise myself, but I wonder if it's not just a matter of the flight history file only listing flights between airports on the US mainland, while actual landings include flights originating anywhere (Alaska, Hawaii, Europe...).
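One hypothetical way to test that hypothesis: take the flight IDs present in the actualLandings file but missing from the reconstruction, and tally them by departure airport. A minimal sketch, where `origin_by_id` is an assumed lookup table you would build from the competition files:

```python
from collections import Counter

def missing_by_origin(actual_landing_ids, reconstructed_ids, origin_by_id):
    """Flights present in the actualLandings file but absent from the
    reconstruction, tallied by departure airport. If the hypothesis is
    right, the tally should be dominated by non-mainland origins
    (Alaska, Hawaii, overseas)."""
    missing = set(actual_landing_ids) - set(reconstructed_ids)
    return Counter(origin_by_id.get(fid, "unknown") for fid in missing)
```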

Has anyone confirmed which, if any, weather files (simulator format) will be available? There were indications that the first weather file for each day would be available, but that's not confirmed on the webpage joycenv indicated. Either way, for the rest of the day the simulator will run with actual weather patterns that we would, I assume, have to predict (or ignore, or something). Is that correct?

It seems, then, that weather is the only non-deterministic factor we have in the simulator and files for the final submission... yes?  The other three files listed as "not provided" come with code that can be used to create them, and of course those rely on the actual predicted flight data so that's understandable (and deterministic, if a bit recursive).

The main reason I want to be certain is that it seems the weather could really screw up an otherwise good solution, for example by causing collisions in an algorithm that actually has good collision avoidance.  If, for example, an eastbound and westbound flight encounter a windy day they may collide when there was no reason to expect them to when the flight paths were created.  It also affects concerns in the simulator bugs forum about crossing boundaries of restricted areas during a simulator timeslice... with the unknown airspeed component change, it becomes impossible to be that precise on a final submission, correct?

Or am I missing something? Will we be able to generate the actual weather files used in the final simulator? Thanks for any advice; right now my wind model is my Achilles heel (right alongside the 50 other things that don't work, but at least I think I grok it).

I'm bumping this because I think it's important.

ChipMonkey, collisions between planes are not an issue; the simulator handles one flight at a time, and does so as if it were the only one. But winds can certainly affect the choice of optimal route substantially.

Besides winds, takeoffs and landings after the cutoff are not provided and must be predicted. Or at least congestion must be predicted. That too can be affected by the weather.

It bothers me that we still have no clear statement on which data we can use for the final evaluation. Let me sketch a few possible scenarios:

- Team A decides to rely on the latest RAP data available at each day's cutoff. They lead the public board and fully expect to place well in the final evaluation. Instead, when it's too late to change approach, Kaggle declares that RAP data cannot be used this way. Team A spends the next month telling everybody who will listen how they feel about Kaggle.

- Team B decides not to rely on RAP, since it's not guaranteed to be allowed. They spend most of their time creating a heroic weather predictor which does miracles with low quality METAR input. In the final evaluation, they are beaten by somebody who simply used the latest RAP data available at each day's cutoff. You can guess what Team B spends the next month doing.

- Team C has a bunch of ideas it could try, but is unwilling to spend days coding and testing them if Kaggle is unwilling to spend a few minutes spelling out which data they will be able to rely on for the final evaluation. When somebody else eventually takes the big prize, they go "we could easily have done that too, but it's hard to shoot for a goal before the goal posts have been put down", and spend the next month doing you know what.

This is fun, but I think I'd better stop here...

For each day, you may use the RAP data up to that day's cutoff. Additionally, we will be providing wind data files in the simulator format containing the last hour of wind data prior to the cutoff for each day in the final evaluation set, as we have for the first two data releases.
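Under that rule, selecting the usable data for a given day reduces to picking the newest file timestamp at or before that day's cutoff. A minimal sketch, assuming you have the available RAP file times as tz-aware datetimes:

```python
from datetime import datetime, timezone

def latest_before_cutoff(file_times, cutoff):
    """Given the timestamps of available RAP files, return the newest
    one at or before the day's cutoff -- i.e. the most recent data a
    model may legitimately use for that day -- or None if nothing
    precedes the cutoff."""
    usable = [t for t in file_times if t <= cutoff]
    return max(usable) if usable else None
```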

Thanks, now I can leave Team C. :)

