
Completed • $250,000 • 130 teams

Flight Quest 2: Flight Optimization, Milestone Phase

Tue 6 Aug 2013
– Wed 25 Sep 2013 (15 months ago)

Help in understanding the problem please!


I didn't participate in part one so am probably missing some key information that I can't find in the comp description. I am starting this thread to help get others also in this situation up to speed with exactly what it is we are supposed to be doing.

here goes for starters,

1) When I look at the cost function for the route, as far as I can tell it would be best to go as fast as possible in a straight line. What are the added constraints that prevent this from being the best solution. I assume they are things like you can't fly through storms and can't land at the same time as another aircraft. I would expect penalties in the cost function for doing naughty things like this but I don't see any. Can you please give a little more background info or point me in the right direction?

2) You refer to a 'Simulator' in various places in the description but I can't seem to locate any info on what this is and what it will be for. Can you please elaborate or point me in the right direction?

Cheers ;-)

Excellent point! I've added a Basic Structure page to address this. Please post again if you have other questions. Sorry for the oversight.

---

Link above has been corrected and is now working. The Basic Structure page is also available from the other competition pages.

Sali Mali wrote:

1) When I look at the cost function for the route, as far as I can tell it would be best to go as fast as possible in a straight line. What are the added constraints that prevent this from being the best solution. I assume they are things like you can't fly through storms and can't land at the same time as another aircraft. I would expect penalties in the cost function for doing naughty things like this but I don't see any. Can you please give a little more background info or point me in the right direction? 

Each submitted route will be 'flown' through the Flight Simulator. Yes, you could devise a route that flies as fast as possible in a straight line, but that may be very expensive in terms of the fuel burned. Part of the challenge is optimizing fuel efficiency as well as time of flight, among other things.
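As a rough illustration of why "as fast as possible" need not be cheapest, here is a toy sketch of a fuel-versus-time trade-off. Everything in it is an assumption made for illustration: the function name `flight_cost`, the prices, and the cubic fuel-flow term are not taken from the competition's cost function or from the Flight Simulator.

```python
# Toy model only: the coefficients and the cubic speed term are assumptions,
# not the Flight Quest cost function or the simulator's physics.

def flight_cost(speed_kt, distance_nm=1000.0, fuel_price=3.0, time_price=2000.0):
    """Return a made-up total cost (fuel + time) for one straight, level leg."""
    hours = distance_nm / speed_kt
    # Hypothetical fuel flow per hour: a fixed part plus a part growing with speed**3.
    fuel_flow = 500.0 + 8.0e-6 * speed_kt ** 3
    fuel_cost = fuel_price * fuel_flow * hours
    time_cost = time_price * hours
    return fuel_cost + time_cost

# Scan candidate airspeeds and keep the cheapest one.
speeds = range(300, 551, 10)
best_speed = min(speeds, key=flight_cost)
print(best_speed, round(flight_cost(best_speed), 1), round(flight_cost(550), 1))
```

With these made-up numbers the cheapest speed lies inside the scanned range rather than at its top end, which is the point of the reply above: time saved by flying faster is eventually outweighed by the extra fuel burned.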

joycenv wrote:

Excellent point! I've added a Basic Structure page to address this. Please post again if you have other questions. Sorry for the oversight.

Hi joycenv, the link is not working.

I'd like to know the answers to Sali Mali's questions, too. Based on the submission instructions, we will probably predict the following items:

Latitude, Longitude, Altitude, AirSpeed

I imagine the following task for now. Fuel efficiency depends on the speed of the aircraft (basically, the slower the better) and on the altitude (basically, the higher the better). But if it goes higher, it might be hit by the jet stream and take more time to fly. Taking weather information into account, we will predict where it should fly and at what speed, under the constraint that it arrives on time. But I don't know how GE confirms that a proposed flight plan is feasible based on the above items.

Thanks for the further info.

So the 'Flight Simulator' is the calculation engine that calculates the 'cost' of each flight, specified by the cost function - and it is just going to be a black box to us that tells us the cost of each flight; or are we supposed to reverse engineer the simulator to figure out the logic it uses to calculate the costs based on weather info etc, so we can develop an optimisation algorithm that can use this info in its decision process?

For example - in a Genetic Algorithm the process would be..

1) generate a population of random solutions.

2) calculate the cost of each solution by passing the solutions through the simulator

3) combine the best solutions using crossover to come up with a new set of candidate solutions

4) goto 2 and continue until the population costs converge.

5) use the lowest cost solution in the population
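The five steps above can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_cost` is a hypothetical stand-in for the simulator (which had not been released at the time of this thread), a "solution" is just a list of four altitudes, a small mutation is added to the crossover step, and the loop runs for a fixed number of generations instead of testing for convergence.

```python
import random

def toy_cost(plan):
    """Stand-in for the simulator (an assumption): distance from a made-up ideal plan."""
    ideal = [34000.0, 36000.0, 38000.0, 36000.0]  # hypothetical altitude per waypoint
    return sum((a - b) ** 2 for a, b in zip(plan, ideal))

def genetic_search(cost, n_genes=4, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    # 1) generate a population of random solutions
    population = [[rng.uniform(20000, 42000) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2) score every solution with the black-box cost, best first
        population.sort(key=cost)
        parents = population[: pop_size // 4]
        # 3) combine the best solutions by crossover (plus a small mutation)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) + rng.gauss(0, 200) for pair in zip(a, b)]
            children.append(child)
        # 4) go back to step 2 with the new population (parents are kept)
        population = parents + children
    # 5) use the lowest-cost solution in the population
    return min(population, key=cost)

best = genetic_search(toy_cost)
print([round(x) for x in best], round(toy_cost(best)))
```

Each generation calls the cost function once per candidate, so the number of simulator runs is roughly `pop_size * generations`; that budget is what separates this from testing every possible solution.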

But I don't see how this could work in practice in this competition. My algorithm could essentially be..

1) test all possible solutions and use the best.

That is just a brute force search.

So can you please clarify how the simulator is to be used. Can it be a part of our algorithm? If so where are the penalties for how long it takes to calculate the optimal solution? What stops us from just testing the known universe of possible solutions?

If we cannot use it as an integral part of the algorithm itself then how are we supposed to approach this problem without first reverse engineering it?

Hope I'm not being too dumb here :-(

[EDIT: I just saw I posted this at the same time as the above post...I basically have the same question]

So it seems the Flight Simulator code is essentially our cost function  -- albeit a complicated one.  

So that raises all sorts of questions, like the following:

  • Will the Flight Simulator essentially be a "black box" cost function we'll have to use? (e.g. input flight plans, do some magic, get the cost).  
  • Will the simulator documentation provide sufficient detail to allow us to recode / recreate the simulator if we want to, or will some details be withheld?  
  • Will we get the simulator source, or just the executable?
    • If it's the executable, what platform(s) will it run on?  
    • If it's source, what language is it in? 

 I suppose I could wait a week for the release of the simulator to get the answers, but I'm curious now :)

  

Sali Mali wrote:

When I look at the cost function for the route, as far as I can tell it would be best to go as fast as possible in a straight line. What are the added constraints that prevent this from being the best solution

I guess it depends on how 'realistic' the flight simulator is. Since the earth is not flat, you'd be flying in an arc. It takes extra fuel and time to gain altitude, but altitude gives you lower air resistance. Then there are the effects of the wind, the rotation of the earth, avoiding nearby aircraft, the Himalayas, etc.

If the simulator does all this, it could be a contest in itself.
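On the "flying in an arc" point: the shortest path between two points on a sphere is a great-circle arc, and its length can be computed with the haversine formula. The sketch below assumes a spherical earth with a mean radius of about 3440 nautical miles; the function name `great_circle_nm` is made up here, and nothing in it says how the simulator itself measures distance.

```python
import math

def great_circle_nm(lat1, lon1, lat2, lon2, radius_nm=3440.065):
    """Haversine distance in nautical miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_nm * math.asin(math.sqrt(a))

# A quarter of the way around the equator.
print(round(great_circle_nm(0.0, 0.0, 0.0, 90.0), 1))
```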

Not to mention the fact that ATC is not simulated (if it is, praise to those who wrote a wonderful piece of software): what if ATC puts an aircraft on hold at 5000 ft because it is too close to other incoming traffic (and what about wake turbulence, is it included)? It would certainly be more fuel- and time-efficient to fly slightly slower at optimal altitude, say FL340, and avoid the hold right before final.

I mean, there are so many variables - and restrictions, mainly due to FAA regulations - in the game that to me it looks more like an optimization problem of the specific simulator, rather than a general optimization problem.

Will the full set of the simulator's parameters be available to our agents? In other words, do we need to predict/estimate the values of some variables during optimization, or will all values be known?

Will the cost (computed by the simulator) be a deterministic function of the parameters?

Are there constraints on the agents' runtime or the hardware used? If so, what are they?

These are all great questions. We'll be sure to include them in a Q&A section on the Basic Structure page (which can be found from the main comp pages) when we're ready to release the Simulator and leaderboard. Thanks for the input and please continue to post.

Some silly(?) questions:

OK, so we are supposed to predict a time series of the following for each flight:
Latitude, Longitude, Altitude, AirSpeed

A few questions that come to mind:
1) How does the evaluation formula/simulator penalize a solution if the physics of the airplane is not taken into account (smooth tangent of the flight path vs. UFO teleporting, etc.)? The "UFO teleporting" is related to question 2 (that is, what are the limits on changing the airspeed between two time points).

2) What is the time step between the points of the time series (lat, long, alt, airspeed)? If it is fixed at, say, s milliseconds, then naturally the airspeed between point t and point t+1 cannot change by more than the physical limits of a flying airplane (and the weather conditions) allow.

3) If the objective is to land at, say, point E, how does the simulator penalize a solution whose flight path does not end at point E? (I'm thinking of a "perfect solution" that takes weather conditions such as headwind 100% into account and lands at E, versus a "bad solution" that ignores weather conditions, meets, say, a heavy tailwind, and lands far from point E, although the algorithm itself may think that "we" (the simulator) land at point E.)

Christopher Hefele wrote:
  • Will we get the simulator source, or just the executable?

It seems like we'll get the source.  I just saw the following quote on the Open Source and IP Terms thread, and the section I underlined below talks about what the Entrants are permitted to do with the source code.  Let's hope it's heavily commented :)

Angus Christophersen wrote:

License to Code: All code provided to Entrants by GE or Kaggle on behalf of GE is provided to Entrants on a non-exclusive basis for the duration of the Competition and solely for Entrants to participate in the Competition, including rights to read and learn from the source code; compile and execute the source code; and alter the source code locally (including to interact with Entrant’s own code

If we get access to the source code, it brings up questions about what, exactly, the goal is (as Sali Mali pointed out earlier).

I could spend my time reading the simulator source code and gaming its scoring, or I could spend it deriving a model to predict the most accurate paths.

Given that we will be measured by the simulator, I would guess that understanding its scoring methodology will be the key determinant of performance.  Will the "training" simulator differ from the "testing" simulator?  And if so, will the differences be critical enough that the leaderboard isn't useful?

Other concerns about the simulator:

  • If we are scored based on weather conditions, etc, wouldn't the simulator need access to the same data?  Which means that training on data sets not explicitly provided to the simulator (like the NOAA set?) would not be useful.
  • Why bother with the simulator at all?  Why not just pull some "optimal" flight paths, anonymize them, and see how closely our models track those paths?  I understand that multiple paths could be optimal, but a large enough set should even out the errors.  
  • You could even use the simulator to "discover" the optimal paths, and then track how closely our paths match the optimal ones.  This would solve a lot of the gaming issues.
  • Until I have the simulator, I can't really work with the data, because I don't know what data is useful.  Is there any chance of releasing some more explicit information before the leaderboard release?

::putting on my former competitor hat and speaking as a neutral 3rd party to this comp::

Lots of great questions and enthusiasm here, but think folks are getting a bit mired down in the details without even seeing the thing yet.  You will get the simulator, you will spend some time with the data, maybe you'll take it out to dinner and share a nice bottle of wine.  You'll open up to each other and find that deep and soulful connection you always wanted to feel. It's really easy to criticize the minutiae of any model (no relativistic correction for non-inertial reference frames and no geese simulations?!!! This competition is meaningless!) and it's really hard to write a flight simulator.

William Cukierski wrote:

Lots of great questions and enthusiasm here, but think folks are getting a bit mired down in the details without even seeing the thing yet.

Thanks Will, but detail is quite important - there have been numerous Kaggle competitions (dare I say the majority) where there has been too much haste in starting the comp only for the competitors to point out data issues and flaws.

wrt the simulator - I don't think we need to see it before we are told how we can use it.

My solution is going to be an algorithm that tests every possible flight path with the simulator and chooses the best one.  I don't see anything in the details we have been given to date that forbids this - so, as Vik pointed out above, it is pointless even looking at the data or getting excited about this comp until we know if it is going to be the 'real' best algorithm that wins or just the person who is smart enough to 'game the game'.

As Vik also pointed out - what is the point of 'other' data sets being made available?

I agree with Sali Mali that some Kaggle competitions have been very unstable in the description, the data and the rules for a time after they have been launched.  I sympathize with the people setting this competition because the flight of a commercial aircraft is a very complex phenomenon.  (If this competition wasn't so interesting I would put it away for a couple of weeks and hope things were more stable when I come back!) 

I fancy this competition is too big for my humble desktop and my fossil languages so I am making the following observation in case it might be useful to someone else.  The structure of the flight would seem to be amenable to 'Dynamic Programming'.  Further, if the simulator is to be used to evaluate the cost function of answers submitted and if it accepts starting point and destination information and if it is made available for use by competitors then it could be used to evaluate the cost function at intermediate points and, using Dynamic Programming, give an answer that should be the best in the competition.
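The dynamic-programming idea can be sketched like this. It is a toy under strong assumptions: the route is cut into a fixed number of stages, the only decision at each stage is a flight level, and `toy_leg_cost` is a hypothetical stand-in for asking a simulator what one leg costs. It only works if the total cost is a sum of leg costs that each depend on nothing but the leg's two endpoints, which is not something the competition description confirms.

```python
def cheapest_profile(n_stages, levels, leg_cost):
    """Minimum-cost sequence of levels, one per stage, by dynamic programming.

    leg_cost(stage, a, b) is a hypothetical stand-in for the cost of flying
    leg number `stage` from level a to level b.
    """
    # best[l] = (cost so far, path so far) for the cheapest way to be at level l.
    best = {l: (0.0, [l]) for l in levels}
    for stage in range(n_stages - 1):
        nxt = {}
        for b in levels:
            # Cheapest predecessor level a for arriving at level b.
            cost, path = min(
                ((best[a][0] + leg_cost(stage, a, b), best[a][1]) for a in levels),
                key=lambda t: t[0],
            )
            nxt[b] = (cost, path + [b])
        best = nxt
    return min(best.values(), key=lambda t: t[0])

def toy_leg_cost(stage, a, b):
    # Made-up numbers: climbing costs extra, cruising higher is cheaper.
    climb_penalty = 0.002 * max(b - a, 0)
    cruise_cost = 100.0 - 0.001 * b
    return climb_penalty + cruise_cost

cost, path = cheapest_profile(5, [30000, 34000, 38000], toy_leg_cost)
print(path, round(cost, 1))
```

The appeal is the number of cost evaluations: `(n_stages - 1) * len(levels) ** 2` here, against `len(levels) ** n_stages` for trying every profile.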

Am I wrong?

By any standard, this competition is a disaster. Even Admins don't know what is going on. 

"but think folks are getting a bit mired down in the details without even seeing the thing yet"

Really! Is that your excuse for not being able to answer the questions! If you are not ready, why do you launch it!

i can only agree.

Sali Mali wrote:

I didn't participate in part one so am probably missing some key information that I can't find in the comp description. I am starting this thread to help get others also in this situation up to speed with exactly what it is we are supposed to be doing.

Phase 2 just launched, you're ahead of everyone and not behind.

Sali Mali wrote:

My solution is going to be an algorithm that tests every possible flight path with the simulator and chooses the best one.  I don't see anything in the details we have been given to date that forbids this  

other than the sheer computational complexity of doing so?

It has to run and be submitted by the deadline.

Sali Mali wrote:

As Vik also pointed out - what is the point of 'other' data sets being made available?

Weather is all part of one large connected system, so the simulator weather data and the external sources are not independent and may have some interesting connections. For most purposes, you would probably just want whatever weather data the simulator is using.

That said, you may be able to predict weather features that are "coming up" in the simulation dataset which have not "yet" (at any given simulation time) appeared, based on something like the following:
When X appears in the simulation's weather forecast, it tends to be associated with Y in the external dataset, which then tends to be associated with Z in the later simulation dataset.  It's like exposing the middle layer of a neural network to where you can actually read the values, which makes designing algorithms a lot easier. 


That being said, I probably won't use external weather data. 

Per JC36's post, see you in a few weeks!

I was an active pilot years ago (weekend warrior, not commercial), but I have seen first-hand what the 'big guys' have to go through when arriving at an airport: ATC takes a firm hand and can shuffle these planes around based on fuel and weather, create stacked holding patterns, divert planes, handle emergencies, and let VIPs or military cut in line (just to name a few).

Surely we are not going to play ATC with this competition. (get into the pattern, turn into downwind leg, descend, turn to base, descend, get on the ILS or other system, and touch wheels-down... just beyond the numbers... on the correct runway).

I 'assume' that all we have to do is get the plane to a known 'goal post' and that is it.

It would be nice to know the explicit 'goal post' that we are aiming for,  for each flight.

I 'assume' that each goal post  (or more accurately an imaginary keyhole)  is a "Lat/Lon and altitude" somewhere in the sky near the destination airport?

Did I miss something in this pile of data? Thanks in advance...

EDIT: if the admins could post a low-scoring yet complete 'submission file', I think a lot of questions would be answered....?

