Providing Business Value with Data Science at Finnair: The Five-Step Model

Published under Analytics, Business
Author
Eero Lihavainen
Data Scientist

Eero Lihavainen is a full stack Data Scientist who strives to combine strong theoretical foundations with software engineering best practices in his work. When not digging into a dataset, designing a statistical model or deploying data infrastructure in the cloud, he can spend hours tweaking sounds on a synthesizer.

Author
Jukka Toivanen
Senior Data Scientist

Jukka Toivanen is a Data Scientist whose interests lie in using scientific methodology, mathematics/statistics and machine learning to solve practical problems and gain insight from data. In his spare time he likes running, reading books, and making music.

November 23, 2020 · 8 min read

We had the opportunity of improving Finnair's Operational Analytics efforts with Data Science. In this blog post, we'll walk you through the five key steps we took to ensure the best possible outcome.

Airline operations are complex. Keeping flights on schedule, rerouting passengers who missed their connections, and, in general, reacting to unexpected events are just some of the problems Operations Controllers must deal with, while considering factors such as weather conditions, travel restrictions, and crew availability. In managing operations, airlines make numerous decisions every day, the outcomes of which may depend on each other and which can have ripple effects on many flights. For example, delaying a flight to wait for connecting passengers may cause the aircraft to arrive late on its return flight, possibly leading to other passengers missing their connecting flights.

Such effects depend on many factors, such as the passenger makeup of all affected flights, which makes it difficult to weigh their importance in decision making. Consequently, coarse, heuristic models have traditionally been applied, which can simplify the problem but also suffer from biases, leading to suboptimal decisions. In today's competitive market, in order for airlines to provide the best quality of service, it's increasingly important to take full advantage of their data assets in operational decision making.

Recently, we had an opportunity to do Data Science work with Finnair, focusing on their Operational Analytics efforts. Our work resulted in mathematical models and software tools, the use of which significantly improved customer experience, improved workflows in the Operations Control Center, and resulted in remarkable cost savings. In this blog post, we describe the five main steps that contributed to the success of our journey with Finnair.

1. Identifying the Problem

Working at an airline of Finnair's scale is, for a Data Scientist, like a dream come true: the data is plentiful and well organized, and there are seemingly endless optimization problems to solve. Of course, as in any mature industry, many of the fundamental problems are already solved. Thus, when we started our work, the first questions were: what problems can we solve, and what problems should we solve, using the customer's data assets to create as much value as possible in terms of both financial and customer success?

When starting a project, our goal as consultants is to understand the challenges the customer is facing. Thus, it's natural to begin by workshopping with the customer's domain experts. In this phase, focus is key: real-life situations always consist of many interacting variables, and the number of details can be daunting. Filtering out irrelevant characteristics of the problem and striving to simplify a possible solution will pay off in the form of better results, faster.

When we entered Finnair, a high-level list of important business problems had already been collected in internal workshops. Together, we prioritized this list and considered potential solutions in more detail. We quickly identified that the most useful starting point would be a mathematical model of the passengers' journeys end-to-end, to replace less accurate, higher-level indicators that were used previously. This would allow us to paint a clearer picture of upcoming and ongoing flights, leading to easier and more informed decision-making.
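To make the idea of modeling journeys end-to-end more concrete, here is a minimal sketch in Python. It is not the model built for Finnair; the `Leg` structure, the flight numbers, the times, and the 35-minute minimum connection time are all hypothetical and chosen only for illustration. The sketch checks, for a single passenger, which connections would be missed under a given set of flight delays.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Leg:
    """One flight leg of a passenger's journey (hypothetical structure)."""
    flight: str
    departure: datetime
    arrival: datetime


# Assumed minimum time needed to change planes; real values depend on
# the airport and are not given in this post.
MIN_CONNECTION = timedelta(minutes=35)


def missed_connections(journey: list[Leg], delays: dict[str, timedelta]) -> list[str]:
    """Return the flights in `journey` that the passenger would miss.

    `delays` maps a flight number to its delay, which is applied to both
    the departure and the arrival of that flight. Each connection is
    checked independently of the others, which is a simplification.
    """
    missed = []
    for inbound, outbound in zip(journey, journey[1:]):
        arrival = inbound.arrival + delays.get(inbound.flight, timedelta(0))
        departure = outbound.departure + delays.get(outbound.flight, timedelta(0))
        if departure - arrival < MIN_CONNECTION:
            missed.append(outbound.flight)
    return missed


# A two-leg journey with a scheduled 50-minute connection.
journey = [
    Leg("XX101", datetime(2020, 11, 23, 7, 0), datetime(2020, 11, 23, 9, 0)),
    Leg("XX202", datetime(2020, 11, 23, 9, 50), datetime(2020, 11, 23, 13, 0)),
]

on_time = missed_connections(journey, {})
inbound_late = missed_connections(journey, {"XX101": timedelta(minutes=30)})
outbound_held = missed_connections(
    journey,
    {"XX101": timedelta(minutes=30), "XX202": timedelta(minutes=20)},
)
```

In this toy setup, a 30-minute delay on the inbound flight breaks the connection, and holding the outbound flight for 20 minutes restores it at the cost of a later arrival for everyone on that flight. Aggregating this kind of check over all passengers and flights is what makes the trade-off visible to a decision maker.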

2. Getting the First Version Out

After choosing the first problem to tackle, we set out to build a Minimum Viable Product (MVP). In software, an MVP is an application with as few features as possible to satisfy users. The goal is that after release, new features will be added in response to feedback from users. In a Data Science product, an MVP can also involve including simplifying assumptions in a model. The idea is that it's easier to engage stakeholders in a feedback loop when they can interact with the results and validate them against their own domain knowledge. The model can then be made more complex in response to feedback, if it turns out to be inadequate.

When starting to work on the MVP, a lot of subtle technical and data-related aspects need to be clarified with the customer. Typical questions addressed in this phase include which data sources are the right ones for solving the problem, and how the various data sources need to be combined, cleaned and pre-processed for further modeling efforts. Diving into the data, formulating and validating the assumptions of the models, and fleshing out initial solution designs make it clear whether the planned approach will work. From the customer’s point of view, this phase validates that the link between domain knowledge and the data is correctly interpreted and verifies that the results produced by the model seem reasonable. Often, this phase also produces secondary business value as it can, for example, reveal deficiencies in data quality and clarify some aspects of existing business processes.

In our work with Finnair, there was a great deal of domain knowledge to digest, in the first few weeks in particular. Luckily, we were able to work closely together with their helpful and extremely knowledgeable staff, and we had an MVP up and running within two months: a Python application that ran the model periodically, and an R Shiny visualization.

3. Gathering Feedback and Iterative Development of the Model

After the MVP was running, we started gathering feedback from the customer domain experts and stakeholders. From this feedback, it became clear where further development efforts should be concentrated.

Some key insights from this process were:

  • For users to be engaged and provide feedback, the user experience needs to be smooth, and there needs to be a clear feedback channel.

  • In order for expert users to trust a model, they need transparency over what goes in and what comes out.

While iterative development continues after this initial MVP stage, the initial iteration is crucial as it improves the application holistically and makes it more robust for the next phase. For instance, with Finnair, the iterative development led to leaving out some irrelevant features in the user interfaces of the applications while at the same time including other features that were initially not seen as important. End users also spotted several exceptional cases that were not handled correctly by the initial models, which led to improving them further.

4. Evolution into a “Production-Ready” Application

After the application was iterated on and validated through initial feedback, it started to be ready for wider adoption. Because we followed good software design practices from the beginning, the transition from the MVP to a production application was seamless. Some key points we followed to achieve this were:

  • Portability: in Python, for example, using virtual environments or containers ensures that the code works when deployed in different environments (e.g. moving from development to staging).

  • Maintainability: writing modular code makes it easy to make modifications and switch out dependencies when required; following best practices of software version control helps keep track of changes and is useful in debugging.

  • Performance: in data-intensive applications, designing code to be performant from the start helps keep the development feedback loop fast. In near-real-time applications like ours, it is also crucial to ensure that adding complexity to models doesn't degrade performance significantly. Luckily, Data Science-oriented software libraries like Pandas are written with performance in mind, so it's often enough to follow best practices.

  • Customer's needs: it is important to select tools that the customer is invested in, such as reporting tools and programming languages that are commonly used in the organization (e.g., with Finnair, it was deemed better to present results in a reporting tool widely used in the organization instead of the custom tools initially developed).

  • Monitoring and security have to be considered thoroughly.
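The maintainability point above can be illustrated with a minimal Python sketch, assuming a model that reads flight delays from some data source. All names here (`DelaySource`, `InMemoryDelaySource`, `flights_delayed_over`) are hypothetical and not taken from the Finnair project. The logic depends only on a small interface, so the underlying source can be swapped (for example, a database in production and an in-memory stand-in in tests) without touching the logic itself.

```python
from typing import Protocol


class DelaySource(Protocol):
    """Anything that can report current delays in minutes per flight."""

    def current_delays(self) -> dict[str, int]: ...


class InMemoryDelaySource:
    """Stand-in source for tests and local development (hypothetical)."""

    def __init__(self, delays: dict[str, int]) -> None:
        self._delays = dict(delays)

    def current_delays(self) -> dict[str, int]:
        # Return a copy so callers cannot mutate the stored data.
        return dict(self._delays)


def flights_delayed_over(source: DelaySource, threshold_minutes: int) -> list[str]:
    """Return flight numbers delayed by more than the threshold, sorted."""
    return sorted(
        flight
        for flight, minutes in source.current_delays().items()
        if minutes > threshold_minutes
    )


# Illustrative data only; the flight numbers are made up.
source = InMemoryDelaySource({"XX101": 30, "XX202": 5, "XX303": 45})
late_flights = flights_delayed_over(source, threshold_minutes=15)
```

A production implementation of `DelaySource` would only need to provide the same `current_delays` method, which keeps the change local when a dependency is replaced.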

In our project, we reached the "production ready" stage in about five months, which is when the application was introduced to a larger group of users. At this stage, we dropped the custom dashboard built for the MVP and delivered results in Finnair's internal Business Intelligence environment. This had two main benefits:

  1. it was already familiar to the users, so adoption was frictionless, and

  2. managing user access became trivial.

All in all, the transition to a wider user base was surprisingly smooth, although the amount of feedback naturally increased remarkably. Thus, we introduced some custom forms for collecting and storing feedback, which significantly improved the feedback loop.

5. A Suite of Further Applications

At this point, we had a mathematical model of the core business. While still being iteratively developed, it was mature enough to yield more accurate insight than previous solutions. Having selected the initial problem carefully, we knew that the core model created opportunities for a suite of other tools and optimization processes to further improve the efficiency of business operations. Thus, together with Finnair, we planned several new MVPs that would build on this model. There were two motivations for moving forward with multiple applications: on the one hand, we wanted to quickly create more value where it was possible; on the other hand, we wanted to clarify the direction of Finnair’s operational analytics by exploring the application space and validating our ideas.

The suite of applications was adopted into daily operational use smoothly, supported by training sessions for the end users as well as ongoing gathering of feedback. Sharing information about the tools with the organization and quickly incorporating improvement suggestions into the models made it possible to include key people in the development efforts and to extract value from the tools quickly. This resulted in increased customer satisfaction as well as significant cost savings at Finnair. In conclusion, the five-step data science process described in this blog post has proven to be a valuable guideline as Finnair’s operational analytics area continues to evolve.
