powered by kaggle

Completed • $250,000 • 130 teams

Flight Quest 2: Flight Optimization, Milestone Phase

Tue 6 Aug 2013
– Wed 25 Sep 2013 (15 months ago)

Help in understanding the problem please!

« Prev
Topic
» Next
Topic
<123>

Vik Paruchuri wrote:
  • If we are scored based on weather conditions, etc, wouldn't the simulator need access to the same data?  Which means that training on data sets not explicitly provided to the simulator (like the NOAA set?) would not be useful.

The NOAA set contains weather information more relevant to the test set, although you don't have flight data for that.  See https://www.gequest.com/c/flight2/forums/t/5358/warning-on-weather-training for details.

This can be seen as a search problem with a huge search space. Best approach would be to generate a large set of routes (sample routes) using some form of probabilistic distribution sampling of our suggestions (latitude, longitude, altitude, and airspeed) given state signals (use flight_history table to build state signals + additional data like weather). Then you would take the cost function to assign credit to different routes.

Can be done using Monte Carlo sampling of the suggestion search space. A good start to bound/seed the search space would be the actual flight tracks flown.

Let me moderate myself a bit. "Best approach" should be changed to "A possible approach". Additionally, constructing an effective state signal is not a trivial task.

Is this correct?

If you were GE and asked some scientist to model this problem, how would you rank the solutions? Surely you would partition data into two sets: train(public) and test (private). Then, a scientist submits a solution of flight paths. And then what? GE checks that the paths are realistic, that there are no collisions between planes, that no planes go through a hurricane etc. and then calculates the cost function for each flight. Is this the point of the so called "simulator"?  Maybe controversy can be spared if it is renamed to "path_validator" ?


I'm sure we'll learn a lot more when we see the simulator code, but while we're waiting, here's my take on the contest.

It's an intelligent search/optimization problem: a routing problem. And it's well studied. Suppose I was hoping that I'd be the first one to ever think of using Ant Colony Optimization. A quick Google of "aircraft flight routing ant" would be sobering.

For each flight, I must find (and submit before the contest ends) a low cost path to the destination airport. A path consists of a list of waypoints (maximum length TBD). A waypoint defines latitude, longitude, altitude, and airspeed. They are like bread-crumbs along the path.

To win, I need to find paths with lower costs, on average, than any of the rest of you do.

The simulator code is the cost function. I don't care (other than curiosity) how realistic it is. No Coriolis effect? Fine, today the Earth stands still. Doesn't use dew point? Fine, who wants to think about humidity in August anyway?

As soon as we get the simulator, I'll be able to score my own solutions and won't have much need for submitting to the leaderboard.

The organizers have given fair warning that they may, and probably will, modify the simulator while the contest is running. Whatever I come up with, I need to be prepared to make 11th hour adjustments. That alone imposes some speed constraints on my algorithm.

I see 2 basic approaches: on-line learning and off-line learning.

On-line learning

An example would be naive search. Start with some nominal path and define a reasonable range of perturbations to that path, adding waypoints as needed. Methodically construct and score as many paths as time and my computer resources permit. Submit the best path, for each flight, that I find before the deadline.

It would be better, of course, to use a smarter search algorithm, e.g. Particle Swarm Optimizer. And it would be prudent to use some heuristics too, rather than treat this as a black box.

Off-line learning

As an example, I could calculate a lot of features for each flight: distance to airport, angle to airport, time available, weather, flight-specific parameters to the cost function (the simulator), etc.. Then I could train an artificial neural net to output the next waypoint. A path would be formed by repeatedly moving to each waypoint and calling the neural net to create the next one. I could train the weights with a GA.


Correction

So, if I’ve got this right…

When the simulator runs a flight path on my computer, it will use a truncated weather file that doesn’t have updates for the remainder of the flight, (past the cutoff time). Therefore, the path will be scored as if the weather didn’t change during the remainder of the flight. (This is not to say that it will be spatially uniform.) But when the path is scored on the leaderboard, it will use the full weather file, and the weather probably did change some during the course of the flight.

My locally-calculated score will not be the same as the leaderboard score. So, it looks like I will need to use the leaderboard to measure how well I’m doing.

Also, I’ll need to somehow factor in uncertainty in the weather, when optimizing with the simulator on my computer.

Plus, I gather there may be some other confounding factors that haven’t been revealed to us yet.

Thanks, glazed, for your thoughts.

Surely, the simulator (with or without the source code) can't be the cost function? If so, then your 'on-line learning' approach would surely yield near-perfect results. 

If we are at liberty to run the cost function as many times as we like, for an individual flight path, then I expect it to be easy to home in on a global minimum cost. (yes, there might be various local minima, but I seriously doubt these would be difficult to avoid). Then the problem just becomes an exercise in computing power. There would be no need to use weather data, or train any models. Just run the black box cost function with varying flight paths until you get the best answer...

Am I misunderstanding?

Thanks

Dave

Dave wrote:

Thanks, glazed, for your thoughts.

Surely, the simulator (with or without the source code) can't be the cost function? If so, then your 'on-line learning' approach would surely yield near-perfect results. 

If we are at liberty to run the cost function as many times as we like, for an individual flight path, then I expect it to be easy to home in on a global minimum cost. (yes, there might be various local minima, but I seriously doubt these would be difficult to avoid). Then the problem just becomes an exercise in computing power. There would be no need to use weather data, or train any models. Just run the black box cost function with varying flight paths until you get the best answer...

Am I misunderstanding?

Thanks

Dave


Kaggle recently ran a Traveling Santa contest. “All we had to do” was search through the space of all possible permutations of the list of cities and apply a simple cost function to find the best path. Some people have faster computers than others. And yes, size mattered, but there was a lot of technique involved too.

In this contest, the search space of possible paths is also very large. (Theoretically, it could be infinite.) As to how tricky the search will be, we’re just speculating about a piece of software we haven’t seen yet. But since we’re here: I boldly predict that the simulator code will produce a satisfyingly complex fitness landscape…

by the end of the contest.

glazed wrote:

 we’re just speculating about a piece of software we haven’t seen yet. 

We're not speculating about anything. All we want to know is..

'can the simulator be used as in integral part of the solution'

The answer is either yes or no.

Yes, the simulator can be used as an integral part of the solution. We wouldn't be releasing it otherwise.

Ben Hamner wrote:

Yes, the simulator can be used as an integral part of the solution. We wouldn't be releasing it otherwise.

Thanks Ben - its taken a while to get that answer ;-)

This is not explicitly stated in the rules and getting the answer to that question was the whole point of this thread. May I suggest you update the rules to clarify.

The next question is then what stops this becoming just a computing power problem? You are actually giving us the means to find the definitive solution given infinite time and resources. To me the cost function should not be based on who can find the best solution, but also how long it takes to find it. There is nothing smart about an algorithm that just does a brute force search.

Thanks Ben.

Can you confirm that the simulator will provide the *same* cost evaluation function that the scoring system will provide? In other words, we'll have no great need to submit lots of online submission attempts because we will have access to the same functionality locally?

Thanks!

Dave wrote:

Can you confirm that the simulator will provide the *same* cost evaluation function that the scoring system will provide? In other words, we'll have no great need to submit lots of online submission attempts because we will have access to the same functionality locally?

Yes, the Simulator code release will also include the cost function so that you may test locally.

Dave wrote:

Can you confirm that the simulator will provide the *same* cost evaluation function that the scoring system will provide? In other words, we'll have no great need to submit lots of online submission attempts because we will have access to the same functionality locally?

The simulator will include the same cost function. However, it won't include some input data that occurs after the cutoff time on each day that we use to run the simulator on our end (for example, future winds and airport landings).

Hi Ben,

I am trying to build an Agent that will help each pilot reduce his/her costs and stay on schedule ( I believe that is the core  intent  of the challenge ... correct me if I am wrong)

The test file drops each flight 'out of the sky' and says you are here, at this altitude and ground speed, and here is your destination, and here are the costs incurred and penalties of not executing well ( and with no current weather data?)

In the context of this challenge, the Agent must figure out where it came from (the Departure airport), what time it is  (approximately at best) what probable flight number am I, what kind of probable plane am I, what was my probable flight plan, assume sunny skies, and other sundry probabilistic details.

Shouldn't an on-board Agent know about these details as facts? Would you as a pilot, trust an on-board Agent that just woke up and then doesn't know the time, the plane, the flight number, and then must comb through thousands of flights of yore to figure this all out... ?

     PILOT: The Agent rebooted and  says:... hmmm .. 'go to warp factor 3...'

     Co-PILOT: OK Sir, Turning off Agent.  ....  CLICK

I am pretty sure an Agent can figure out the potential Victor-airway, but Jet-airways above the Victor can pretty much self direct these days, with a nod to the VORS ,NAVAIDS, rules of the road, and with the help of ATC and flight plans.

You gave the Agent gigabytes of past history, can you at least give the Agent (via the test data csv) a simple watch?

As always, I could be very wrong about the contents of the massive data provided and the many many unstated intentions of this challenge

Thanks in advance

Hi Karle,

The information about the test flights (origin airport, aircraft type, etc) will be provided with the Simulator code release. The file testFlightsRev2.csv only contains information about the flights needed for calculating the cost function. 

What is time interval between ordinals of a one flight? Or is this time interval varying and determined by path simulator based on delta coordinate and delta speed?

sfin wrote:

What is time interval between ordinals of a one flight? Or is this time interval varying and determined by path simulator based on delta coordinate and delta speed?

You are correct, there is no fixed time interval between ordinals in the submitted flight route. Time between ordinals will depend on the route, airspeed and wind conditions.

joycenv wrote:

sfin wrote:

What is time interval between ordinals of a one flight? Or is this time interval varying and determined by path simulator based on delta coordinate and delta speed?

You are correct, there is no fixed time interval between ordinals in the submitted flight route. Time between ordinals will depend on the route, airspeed and wind conditions.

So how I am penalized (by cost function) if at interval 0 I am at point say p with velocity v and at interval 1 I am at point p+epsilon (consider epsilon vector almost 0,0 longitude,latitude change) and velocity v1 such that v1 >> v. So my coordinates say that I have moved "epsilon" (almost zero time) but velocity v1 says that I must have moved more than zero time. Thus basicalyl this is case of "breaking physics" (accelerating plane's speed from v to v1 in "almost zero time").

Hi Mali 

I too think that the solution should be an algorithm that chooses the best path. But little confused about how should we define 'Best'. Is it that takes less time and thus consumes less fuel? 

Sali Mali wrote:

William Cukierski wrote:

Lots of great questions and enthusiasm here, but think folks are getting a bit mired down in the details without even seeing the thing yet.

Thanks Will, but detail is quite important - there have been numerous Kaggle competitions (dare I say the majority) where there has been too much haste in starting the comp only for the competitors to point out data issues and flaws.

wrt the simulator - I don't think we need to see it before we are told how we can use it.

My solution is going to be to be an algorithm that tests every possible flight path with the simulator and chooses the best one.  I don't see anything in the details we have been given to date that forbids this - so  as Vik pointed out above, it is pointless even looking at the data or getting excited about this comp until we know if it is going to be the 'real' best algorithm that wins or just the person who is smart enough to 'game the game'.

As Vik also pointed out - what is the point of 'other' data sets being made available?

Hi Mali

I too think that the solution should be an algorithm that chooses the best path. But little confused about how should we define 'Best'. Is it that takes less time and thus consumes less fuel?

<123>

Reply

Flag alert Flagging is a way of notifying administrators that this message contents inappropriate or abusive content. Are you sure this forum post qualifies?