GE Flight Quest

Finished
Wednesday, November 28, 2012
Monday, March 11, 2013
$250,000 • 173 teams

Welcome to GE Flight Quest - Initial Data Release

Ben Hamner
Competition Admin
Kaggle Admin
Posts 809
Thanks 356
Joined 31 May '10
From Kaggle

Welcome to GE Flight Quest! We are very excited to work with GE and Alaska Airlines to launch this challenge on predicting domestic US flight arrival times.

For this initial training data release, we have posted 14 days of data provided by FlightStats in InitialTrainingSet.zip, which can be downloaded from the data page. These files contain information on each individual flight, the minute-by-minute position tracks for each flight, and weather. The file formats are documented in more detail on the Flight Quest wiki. Most of the wiki pages are editable, so feel free to add detailed descriptions, stats, and insights to the data pages for the individual files. Please let us know if you have any questions about the data.

The competition will kick into full swing on December 18, 2012 with the release of the public leaderboard data set, a two-week test dataset that will be used to form the public leaderboard.

Good luck in developing the best models!

Ben

 
Andy Sloane
Posts 22
Thanks 13
Joined 3 Aug '10

Just to be absolutely clear -- the quantities we are to predict, once the leaderboards open up, are actual_gate_arrival and actual_runway_arrival from flighthistory.csv, correct?

There are a lot of different forms of data available and it isn't totally clear where to start, but I'm reading between the lines with the evaluation criteria and the data we have.

 
Ben Hamner

Andy Sloane wrote:

Just to be absolutely clear -- the quantities we are to predict, once the leaderboards open up, are actual_gate_arrival and actual_runway_arrival from flighthistory.csv, correct?

This is correct - you are predicting actual_gate_arrival and actual_runway_arrival. As this information is missing for some flights, we will provide you with the list of specific flights for which you'll predict this on each test day.
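
As a rough illustration of the point about missing values, here is a minimal sketch that separates flighthistory.csv rows that have both target columns from rows that lack at least one. The two target column names come from this thread; the `flight_history_id` column, the `MISSING` marker, and the timestamp values are assumptions made for the example, so check them against the real file.

```python
import csv
import io

TARGETS = ("actual_gate_arrival", "actual_runway_arrival")
# Assumption: how a missing value is written in the file; adjust to the real data.
MISSING_MARKERS = {"", "MISSING"}


def split_by_target_availability(rows):
    """Separate rows that have both targets from rows missing at least one."""
    complete, incomplete = [], []
    for row in rows:
        has_all = all((row.get(t) or "").strip() not in MISSING_MARKERS for t in TARGETS)
        (complete if has_all else incomplete).append(row)
    return complete, incomplete


# Tiny inline stand-in for flighthistory.csv (made-up values, hypothetical id column).
sample = io.StringIO(
    "flight_history_id,actual_gate_arrival,actual_runway_arrival\n"
    "1,2012-11-12 10:05,2012-11-12 09:58\n"
    "2,MISSING,2012-11-12 11:30\n"
    "3,,\n"
)
complete, incomplete = split_by_target_availability(csv.DictReader(sample))
print(len(complete), "usable rows;", len(incomplete), "rows missing a target")
```

Only the rows in `complete` can serve as training examples for both targets; how to treat the rest is a modelling choice.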

Thanked by Andy Sloane
 
Jacky Lupino
Posts 2
Joined 30 Nov '12

Should we assume that predictions are always made before the flight takes off? More specifically, is it true that we are not expected to provide models that update the prediction dynamically while the plane is in the air?

I couldn't find such a specification in the challenge description.

 
jsink
Posts 2
Joined 1 Dec '12

Didn't the rules or process say something about being given a set of flights that were mid-air (or missing arrival times), and predicting what the arrival times would be, based only on data from that day?

 
Jacky Lupino

Can you please point me to the web link where such a description is available?

 
MFEBen
Posts 8
Joined 27 Feb '12

This is the most imprecisely described and difficult to understand competition. We need to search through tens of pages just to know what data we have. There seem to be duplicate data fields but with different naming conventions. The exact objective is vague. Come on, Kaggle. Do a better job.

 
Ben Hamner

MFEBen wrote:

This is the most imprecisely described and difficult to understand competition. 

We're very aware that this is not an easy competition to "get started on" at this point. Part of this reflects the complexity of the problem and handling large amounts of data from disparate sources.

Thanks for taking an early look and participating so far. This competition was set up on a very aggressive timeline, and we made the decision to launch with as much as we had instead of launching with everything perfected. If you want a more concisely described and easier-to-enter competition, come back in 1-2 weeks once we've had time to go through and edit everything. If you want a chance to take an early stab at the problem, sift through the information we've provided so far and enter now.

The wiki pages are publicly editable, so add to them or improve them if appropriate. If anything in the descriptive pages isn't clear, please say so here (and suggest improvements).

Thanked by Olexiy
 
MFEBen

It is unclear to me how to match weather stations and their corresponding weather observations with the flight position file. Am I just supposed to determine for myself which weather station area (defined by longitude and latitude) the flight is in at any point? If so, the weather station data itself contains numerous stations that have exactly the same areas ascribed to them; 16 and 25 are one example. What is needed, and may exist though I cannot find it, is for the metarreportsid to be included in the asdiposition.csv file.
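
One possible way to do that matching by hand, sketched under assumptions and not confirmed by the organizers, is to assign each aircraft position to the nearest weather station by great-circle distance. The station ids and all coordinates below are made up for illustration; the real station and position files may describe locations differently.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the id of the station closest to (lat, lon).

    `stations` maps a hypothetical station id to its (lat, lon) in degrees.
    """
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))


# Made-up station coordinates, roughly Seattle, Portland, and San Francisco.
stations = {"A": (47.45, -122.31), "B": (45.59, -122.60), "C": (37.62, -122.38)}
# A made-up aircraft position a little north of Portland.
print(nearest_station(46.0, -122.5, stations))
```

A linear scan like this is fine for a few stations; with many positions and stations, a spatial index would be the more practical choice. It also does not resolve the case of several stations sharing the same area, where ties would need a separate rule.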

 
ml_learner
Posts 8
Thanks 1
Joined 4 Dec '12

A clarification question. In the public leaderboard data set, in the file flighthistory.csv, there are some entries which are 'HIDDEN'. So, basically, the other non-hidden and non-missing entries should be used to train our model, and we need to predict the values that have been HIDDEN?

 
Black Magic
Rank 18th
Posts 504
Thanks 58
Joined 18 Nov '11

Is this correct?

Ben, please respond.

ml_learner wrote:

A clarification question. In the public leaderboard data set, in the file flighthistory.csv, there are some entries which are 'HIDDEN'. So, basically, the other non-hidden and non-missing entries should be used to train our model, and we need to predict the values that have been HIDDEN.

 
Ben Hamner

ml_learner wrote:

A clarification question. In the public leaderboard data set, in the file flighthistory.csv, there are some entries which are 'HIDDEN'. So, basically, the other non-hidden and non-missing entries should be used to train our model, and we need to predict the values that have been HIDDEN?

Values that are "HIDDEN" were hidden because they contain information from after the cutoff time. You are not making predictions for all the values that are labeled HIDDEN.

You need to predict the arrival times for the flights in test_flights.csv, and you can use any of the data provided to train your model to make predictions on the PublicLeaderboardSet.
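
To make the distinction concrete, here is a minimal sketch of restricting flighthistory.csv to the flights listed in test_flights.csv while treating "HIDDEN" as unavailable. The file names and the HIDDEN marker come from this thread; the `flight_history_id` and `scheduled_gate_arrival` columns and all values are assumptions for the example.

```python
import csv
import io

HIDDEN = "HIDDEN"


def load_test_rows(history_file, test_ids):
    """Yield flight-history rows for the test flights, with HIDDEN fields set to None."""
    for row in csv.DictReader(history_file):
        if row["flight_history_id"] in test_ids:
            yield {k: (None if v == HIDDEN else v) for k, v in row.items()}


# Inline stand-ins for test_flights.csv and flighthistory.csv (made-up values).
test_flights = io.StringIO("flight_history_id\n101\n103\n")
history = io.StringIO(
    "flight_history_id,scheduled_gate_arrival,actual_gate_arrival\n"
    "101,2012-12-01 10:00,HIDDEN\n"
    "102,2012-12-01 11:00,HIDDEN\n"
    "103,2012-12-01 12:00,HIDDEN\n"
)
test_ids = {r["flight_history_id"] for r in csv.DictReader(test_flights)}
rows = list(load_test_rows(history, test_ids))
print([r["flight_history_id"] for r in rows])
```

In this toy data, flight 102 has a HIDDEN arrival time but is not in the test list, so no prediction is made for it.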

 
