Deadlines
November 29, 2012: Competition Launch
December 18, 2012: Leaderboard Activated
February 14, 2013: Model Submission Deadline
March 4, 2013: Final Data Released
March 11, 2013: Final Submission Deadline
Flight Quest Phase 1 Winners
1st Prize: Team Gxav &*
Xavier Conort, Cao Hong, Clifton Phua, Ghim-Eng Yap and Kenny Chua (Singapore)
Team Gxav &* used a mixture of gradient boosting and random forest models to predict gate and runway arrival times. Their average errors of 4.2 minutes for gate arrivals and 3.2 minutes for runway arrivals represent improvements of 40% and 45%, respectively, over the standard industry benchmark estimates. Careful feature selection was key to their success: of the 258 features they painstakingly constructed and optimized, the final models used only 58 for gate arrivals and 84 for runway arrivals.
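As a rough illustration of what a mixture of two models can look like, the sketch below blends two sets of predictions with a single weight chosen on holdout data. It assumes the mixture is a weighted average and that error is measured as mean absolute error in minutes; neither detail is confirmed by the team's description, and the function names and data are made up for the example.

```python
def mean_abs_error(predicted, actual):
    """Average absolute error, in the same units as the inputs (minutes here)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def blend(preds_a, preds_b, weight_a):
    """Weighted average of two models' predictions."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(preds_a, preds_b)]


def best_blend_weight(preds_a, preds_b, actual, steps=20):
    """Grid-search the weight on model A that minimizes holdout error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mean_abs_error(blend(preds_a, preds_b, w), actual),
    )


# Made-up holdout data: minutes until gate arrival (not competition data).
actual = [30.0, 45.0, 60.0, 20.0]
gbm_preds = [34.0, 49.0, 64.0, 24.0]  # hypothetical model A, consistently late
rf_preds = [26.0, 41.0, 56.0, 16.0]   # hypothetical model B, consistently early

weight = best_blend_weight(gbm_preds, rf_preds, actual)
blended_error = mean_abs_error(blend(gbm_preds, rf_preds, weight), actual)
print(weight, blended_error)
```

In this toy data the two models err in opposite directions, so the blend beats either model alone; real models rarely cancel this neatly.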
- Xavier Conort is a French actuary with more than 15 years of working experience in France, Brazil, China and Singapore. He gained experience in machine learning by running Gear Analytics, a Singapore-based consultancy, and by competing on Kaggle, where he is currently the #1 ranked data scientist overall. Xavier earned an MSc in Actuarial Science and Statistics from Paris Tech and Paris VII University and is a Chartered Enterprise Risk Actuary. He recently joined the Institute for Infocomm Research (I2R) in Singapore.
- Hong Cao (Singapore) has been a data analytics scientist at the Institute for Infocomm Research (I2R) of Singapore's Agency for Science, Technology and Research (A*STAR) since 2011. He received his B.Eng. (first-class honors) and Ph.D. degrees from Nanyang Technological University, Singapore, in 2001 and 2011, respectively. His current research interests include mobile data analytics, time series data mining, machine learning and multimedia forensics. His previous work in image forensics received the best paper award at IWDW 2010 and an honorary mention at ISCAS 2010. Recently, he also led teams to win international and local benchmarking challenges such as the Opportunity Activity Recognition Challenge 2011 and the Up-Singapore Hackathon 2012. He currently serves as secretary of the IEEE Signal Processing Society, Singapore Section.
- Clifton Phua (Singapore) has been a Senior Consultant at SAS Institute Pte Ltd since February 2013. He was formerly a data analytics scientist at the Institute for Infocomm Research (I2R) of Singapore's Agency for Science, Technology and Research (A*STAR) for more than 5 years. Clifton has published more than 30 technical papers on various industry applications of data analytics, such as fraud detection and activity recognition. Recently, his teams have won challenges such as the Fraud Detection in Mobile Advertising Competition 2012 and the Up-Singapore Hackathon 2012. Clifton's PhD and bachelor's (first class honors) degrees are from the Clayton School of Information Technology, Monash University, Australia.
- Ghim-Eng Yap (Singapore) is a Principal Investigator with the Institute for Infocomm Research (I2R) in the Agency for Science, Technology and Research (A*STAR), Singapore. He received his bachelor's degree (first class honors) from Nanyang Technological University (NTU) of Singapore, and was awarded the A*STAR Scholarship to complete his PhD at NTU in 2009. During the course of his work, Ghim-Eng has led multiple I2R R&D projects in the areas of recommender systems and privacy-preserving audience measurement, and he is currently the I2R Principal Investigator for Data Privacy. Recently, his team won the First Prize in the Fraud Detection in Mobile Advertising Competition 2012. As a scientist and inventor, Ghim-Eng has published widely in various areas of data analytics, including recommender systems and Bayesian statistics, and he holds an A*STAR patent on a method for privacy-preserving data aggregation.
- Hon Nian "Kenny" Chua (Singapore) has been a data analytics scientist at the Institute for Infocomm Research (I2R) of Singapore's Agency for Science, Technology and Research (A*STAR) since March 2012. Prior to that, he was a postdoctoral fellow at the Roth Lab at the University of Toronto and the Harvard Medical School, where he worked on applications of machine learning in biology and biotechnology. During his PhD he worked on the prediction of protein functions, complexes, and interaction networks, and led a team that won a protein subnetwork prediction challenge at the DREAM2 conference in 2007. He was awarded the A*STAR graduate scholarship in 2003 and the A*STAR postdoctoral fellowship in 2008. He received his bachelor's degree and his PhD from the National University of Singapore.
2nd Prize: Team As High As Honor
Jonathan Peters (Preston, UK) and Paweł Jankiewicz (Warsaw, Poland)
Team As High As Honor used a two-step approach: a generalized linear model encoding the team's intuition about important variables, followed by refinements derived from a random forest model. The team capitalized on the success of the linear model to add the effects of multiple variables and cleanly resolve issues of missing data.
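A two-step scheme of this general shape can be sketched as a linear fit followed by a correction fitted to what the linear model gets wrong. The sketch below is an assumption-laden stand-in, not the team's method: stage 1 is ordinary least squares on one made-up variable, and stage 2 replaces the random forest with a per-airport mean residual so the example stays dependency-free. All names and data are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def fit_residual_correction(groups, residuals):
    """Stage 2 stand-in: mean residual per group (not a random forest)."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, residual in zip(groups, residuals):
        totals[group][0] += residual
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def predict(intercept, slope, correction, x, group):
    """Linear estimate plus the group correction; unseen groups get no correction."""
    return intercept + slope * x + correction.get(group, 0.0)


# Made-up training rows: remaining distance (km), arrival airport, minutes to arrival.
distance_km = [100.0, 200.0, 300.0, 400.0]
airport = ["AAA", "BBB", "AAA", "BBB"]
minutes = [22.0, 28.0, 42.0, 48.0]

intercept, slope = fit_line(distance_km, minutes)
stage1 = [intercept + slope * x for x in distance_km]
residuals = [y - p for y, p in zip(minutes, stage1)]
correction = fit_residual_correction(airport, residuals)
stage2 = [predict(intercept, slope, correction, x, g) for x, g in zip(distance_km, airport)]

stage1_mae = sum(abs(y - p) for y, p in zip(minutes, stage1)) / len(minutes)
stage2_mae = sum(abs(y - p) for y, p in zip(minutes, stage2)) / len(minutes)
print(stage1_mae, stage2_mae)
```

Falling back to the linear estimate for an unseen airport is one simple way a strong first stage can absorb missing information; whether the team handled missing data this way is not stated.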
- Jonathan is an Epidemiologist, specifically a Public Health Analyst and Data Modeler with a particular interest in machine learning. His work for Trilogy SmartLeads involves creating and refining models to predict the probability that an internet automotive lead will convert to a sale.
- Paweł is Reporting & Analysis Team Leader at Raiffeisen Polbank and earned a Masters in Investment Banking from the Warsaw School of Economics. Paweł is 29 years old and has 5 years of experience in the banking industry as a reporting specialist. He began learning programming as a hobby 5 years ago and is currently proficient in Python, R and SQL. He loves competing on Kaggle in his free time (if any). After succeeding in the Flight Quest competition he decided to quit his job at the bank to spend more time on data mining projects.
3rd Prize: Team Taki
Gabor Takacs (Gyor, Hungary)
Team Taki used a six-layer model relying on successive ridge regressions and gradient boosting machines to model both gate and runway arrival times. This approach used 56 features extracted from the raw data, with all but two coming from the test day data.
- Gabor is an Assistant Professor at Szechenyi University, a small university in Hungary, and a part-time consultant at Gravity R&D, a recommender system company. He was the captain of The Ensemble team that finished second in the Netflix Prize competition. Gabor received an MSc in Informatics and a PhD in machine learning from the Budapest University of Technology and Economics. His main research interests are machine learning and artificial intelligence.
4th Prize: Team Sun
Sergey Kozub (Kursk, Russia)
Team Sun’s approach to predicting gate and runway arrival times relied on creating a derived data set with new variables encoding information about the aircraft, airport, airway, gate, hour, and flight path times. Important features used in this model include aircraft GPS position, ASDI flight plans and the direction from which airplanes approached airport runways.
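One way to turn an aircraft's GPS position into an approach-direction feature is to compute the compass bearing from the airport to the aircraft and bucket it into sectors. The sketch below uses the standard initial great-circle bearing formula; the eight-sector bucketing, the function names, and the coordinates are assumptions for illustration, not details from Team Sun's solution.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def approach_sector(bearing_deg, sectors=8):
    """Bucket a bearing into equal compass sectors; sector 0 is centered on north."""
    width = 360.0 / sectors
    return int(((bearing_deg + width / 2.0) % 360.0) // width)


# Hypothetical positions: an aircraft one degree of latitude due south of an airport.
airport_lat, airport_lon = 40.0, -75.0
aircraft_lat, aircraft_lon = 39.0, -75.0

# The direction the aircraft approaches FROM is the bearing airport -> aircraft.
bearing = initial_bearing_deg(airport_lat, airport_lon, aircraft_lat, aircraft_lon)
sector = approach_sector(bearing)
print(bearing, sector)
```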
- Sergey Kozub is an IT developer in financial applications with an MSc in Computers and Networking from the Technical University of Moldova.
5th Prize: Team Jacques Kvam
Jacques Kvam (Livermore, CA, USA)
Jacques Kvam’s approach for predicting runway and gate arrival times used a gradient boosting model with 10,000 trees and a whopping 1,102 features, trained on 260,000 flights. The most significant features included the distance between the final waypoint and the arrival airport. Many weather features were important as well, including temporary vertical visibility and wind speed at the arrival airport.
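A distance feature like the one between the final waypoint and the arrival airport is commonly computed with the haversine formula. The sketch below is a minimal version under stated assumptions: a spherical Earth with a mean radius of 6371 km, and made-up coordinates. How the feature was actually computed in the winning solution is not described.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for a final waypoint and an arrival airport.
final_waypoint = (37.50, -122.00)
arrival_airport = (37.62, -122.38)

distance_to_airport_km = haversine_km(*final_waypoint, *arrival_airport)
print(round(distance_to_airport_km, 1))
```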
- Jacques Kvam is an electronics engineer with an MS in EE from UW-Madison.
Honorable Mention: Team __mtb__
Matt Berseth (Jacksonville, FL, USA)
Team __mtb__ used random forest and gradient boosted models to estimate runway and gate arrival times. The final solution included over 100 different individual models, each focused on a narrow set of features (e.g. wind/weather, flight plan, or the aircraft's current location). These individual models were blended together to generate the final estimates. The training data was created by randomly selecting eight cutoff times for each day in the training period. A separate cross validation data set was used to select hyper-parameters.
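The cutoff sampling and blending steps can be sketched as follows. This is a minimal sketch under assumptions the description does not confirm: cutoffs are drawn uniformly over the whole calendar day at one-second resolution, the blend is an unweighted mean, and the training days shown are placeholders.

```python
import random
from datetime import date, datetime, time, timedelta


def sample_cutoffs(day, n_cutoffs=8, rng=None):
    """Pick n_cutoffs distinct random cutoff times within one calendar day, sorted."""
    rng = rng or random.Random()
    start = datetime.combine(day, time.min)
    seconds = rng.sample(range(24 * 60 * 60), n_cutoffs)
    return sorted(start + timedelta(seconds=s) for s in seconds)


def blend_mean(model_estimates):
    """Blend per-model estimates for one flight with an unweighted mean."""
    return sum(model_estimates) / len(model_estimates)


rng = random.Random(0)  # fixed seed so the sampled cutoffs are reproducible
training_days = [date(2012, 11, 12) + timedelta(days=i) for i in range(3)]  # placeholder days
cutoffs = {day: sample_cutoffs(day, rng=rng) for day in training_days}

# Hypothetical estimates (minutes to arrival) from three narrow models for one flight.
blended = blend_mean([10.0, 12.0, 14.0])
print(len(cutoffs), blended)
```

Each sampled cutoff would then define one training snapshot: only data observed before the cutoff is used to build features for flights still in the air at that moment.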
- Matt Berseth is co-founder and lead data scientist at NLP Logix, a technology company based in Jacksonville, FL. Before NLP Logix, Matt created software solutions in a variety of industries including healthcare, marketing and supply chain / logistics. Matt was first introduced to machine learning while earning his bachelor's and master's degrees from North Dakota State University (go Bison!). His favorite modeling techniques include boosting and neural networks.