High levels of project management maturity are often associated with improved operational outcomes—such as enhanced quality, timely delivery, and cost efficiency. However, this correlation does not always hold true. Many existing project management maturity models lack clarity and consistency; they are typically constructed from extensive lists of project domains, vague categories, ambiguous criteria, and loosely defined practices. These models often suffer from the absence of standardized terminology, fragmented integration of key project areas, and poor usability—frequently overwhelming practitioners with an excessive number of elements to assess during maturity audits.
In light of these limitations, Bayesian networks offer a promising alternative for modeling and analyzing the complexities of project management maturity. They provide a structured, visual, and data-informed approach to understanding causal relationships among variables. By leveraging causal reasoning, Bayesian networks help explain how multiple factors contribute to a particular outcome. Their adaptability makes them well-suited to the current landscape shaped by widespread IT adoption and the growing influence of machine learning.
A Bayesian network is a probabilistic graphical model that uses a Directed Acyclic Graph (DAG) to represent a set of variables and the causal dependencies between them. In this framework, nodes represent random variables, while directed edges (arcs) denote the influence of one variable (the cause) on another (the effect). Each connection reflects a conditional dependency, and the strength of these relationships is quantified using probability distributions.
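As a minimal construction sketch, the DAG and its probability tables can be written down with the open-source pgmpy library (class names vary slightly across versions); every node name and probability below is invented for illustration:

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

# Toy DAG: one maturity driver (cause) influencing a cost outcome (effect).
model = BayesianNetwork([("RiskMaturity", "CostOverrun")])

# Prior distribution over the cause node.
cpd_maturity = TabularCPD(
    variable="RiskMaturity", variable_card=2,
    values=[[0.6], [0.4]],
    state_names={"RiskMaturity": ["low", "high"]},
)

# Conditional distribution of the effect given the cause (the CPT).
cpd_overrun = TabularCPD(
    variable="CostOverrun", variable_card=2,
    values=[[0.3, 0.8],   # P(no overrun | low, high maturity)
            [0.7, 0.2]],  # P(overrun    | low, high maturity)
    evidence=["RiskMaturity"], evidence_card=[2],
    state_names={"CostOverrun": ["no", "yes"],
                 "RiskMaturity": ["low", "high"]},
)

model.add_cpds(cpd_maturity, cpd_overrun)
assert model.check_model()  # raises if a CPT is inconsistent with the DAG
```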
Bayesian networks support two fundamental types of inference, both illustrated in the sketch after this list:
- Backward (diagnostic) inference – used to identify the most likely cause of an observed effect.
- Forward (predictive) inference – used to estimate the likelihood of an outcome based on known or assumed causes.
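Continuing the toy model above, both directions amount to posterior queries under different evidence; a sketch with pgmpy's variable-elimination engine:

```python
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)

# Forward (predictive): given low maturity, how likely is an overrun?
print(infer.query(["CostOverrun"], evidence={"RiskMaturity": "low"}))

# Backward (diagnostic): an overrun was observed; which maturity state
# most plausibly explains it?
print(infer.query(["RiskMaturity"], evidence={"CostOverrun": "yes"}))
```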
A critical strength of Bayesian networks lies in their ability to incorporate both expert judgment and empirical data. A robust network is built on accessible, well-structured data, sufficient in volume and quality for the issue at hand, and presented in a clear, visual, and interpretable format. This enables organizations to model project management maturity using a cohesive, data-driven causal structure, ultimately enhancing both understanding and decision-making.
To assess cost overrun risks linked to low project management (PM) maturity, the proposed method combines two complementary data sources within a Bayesian Network (BN) framework: expert knowledge and historical project data. Experts structure the BN by identifying input, intermediate, and outcome nodes. In parallel, a relevant historical dataset is used for parameter learning, yielding each node's Conditional Probability Table (CPT).
It is essential to differentiate between two modeling layers: semantic modeling and Bayesian (causal) modeling. Semantic modeling refers to the conceptual representation of project management maturity, while Bayesian modeling encompasses the systematic steps required to construct a causal BN structure. The methodology follows a seven-step process, starting with the definition of a streamlined framework for evaluating PM maturity.
Step 1: Define a Simplified PM Maturity Framework
Traditional Project Management Maturity Models (PMMMs) often involve exhaustive audits with over 150 criteria, which, while comprehensive, are impractical for correlational analysis: the sheer number of criteria overwhelms the volume of available audit data. Instead, a more efficient two-tiered model is proposed:
- A macro-level model with a reduced number of key maturity indicators to help consultants quickly identify major weaknesses.
- A detailed-level model, informed by expert input, for use when more granular analysis is needed.
This simplification allows for easier integration into causal models while maintaining diagnostic value.
To structure PM maturity, project features are organized into a three-layer classification system:
- Layer 1: Project Domains – Aligned with PMBOK® areas:
  - Social (S): Covers communication, HR, and integration management; emphasizes team dynamics and the project manager's leadership.
  - Contract (C): Encompasses scope and risk management.
  - Results (R): Relates to cost, schedule, and quality management.
  - Interface (I): Includes procurement and stakeholder management.
- Layer 2: Project Phase Timing – Determines when a PM practice occurs:
  - Prepare (P): Before execution.
  - Monitor (M): During the project.
  - Valorize (V): At project closure or evaluation.
- Layer 3: Execution Characteristics – Describes how PM activities are conducted:
  - Activity Granularity (A): Level of task detail or time frame.
  - Resource Involvement (R): Who is involved and what tools are required.
  - Frequency (F): Timing and repetition of activities (e.g., daily, cyclical, or milestone-based).
These categories are used to build a tagged matrix of PM maturity criteria, creating a standardized nomenclature for assessing each PM task.
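One lightweight realization of this tagged matrix is a set of records keyed by the three layers; in the Python sketch below, every task name and tag value is hypothetical:

```python
# Layer 1: domain (S/C/R/I); Layer 2: phase (P/M/V);
# Layer 3: execution tags (A = granularity, R = resources, F = frequency).
criteria_matrix = [
    {"task": "Risk register review", "domain": "C", "phase": "M",
     "A": "work package", "R": "PM + risk owner", "F": "weekly"},
    {"task": "Lessons-learned workshop", "domain": "S", "phase": "V",
     "A": "whole project", "R": "full team", "F": "once at closure"},
]
```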
Step 2: Define Measurement Scales for BN Input Nodes
Maturity is typically assessed on a five-level performance scale, where advancing from level n to n+1 means that all best practices at level n are fully implemented and implementation of the next level is underway. The five levels, whose labels also serve as the states of the BN input nodes (see the encoding sketch after this list), are:
- Level 1: Absent/Discovery – PM processes are undefined or applied inconsistently and informally.
- Level 2: Definition and Implementation – The project manager and team define and establish the key PM practices to be used throughout the project.
- Level 3: Measurement and Analysis – Performance data is collected from various sources (e.g., monitoring and forecasting systems), enabling the team to analyze results and make data-driven adjustments.
- Level 4: Managing Gaps and Interdependencies – The team becomes capable of handling complexity and uncertainty, especially when managing interrelated tasks across concurrent projects.
- Level 5: Capitalization and Continuous Improvement – The final level focuses on institutionalizing learning: the team adapts quickly to change, manages complexity, and shares improvements across the organization.
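A minimal encoding sketch for these input nodes, assuming pandas (the level labels and scores are invented):

```python
import pandas as pd

MATURITY_LEVELS = ["L1_absent", "L2_defined", "L3_measured",
                   "L4_managed", "L5_improving"]

# Hypothetical audit scores for one maturity indicator, one row per project.
scores = pd.Series(["L2_defined", "L4_managed", "L1_absent"])
scores = pd.Categorical(scores, categories=MATURITY_LEVELS, ordered=True)
```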
Step 3: Identify Drift Factors
This step links project maturity levels to performance deviations (drift). Drift factors represent recurring, high-impact causes of performance shortfalls such as cost overruns or delays. The identification process includes:
- Drift Cause Identification – Experts compile a set of potential causes of deviation, based on project data.
- Cause Selection – Non-recurring or low-impact causes are filtered out to ensure predictive reliability.
- Cause Matching – Relationships between selected drift factors and PM maturity model (PMMM) components are established as tuples, forming the basis for Bayesian modeling (see the sketch after this list).
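A sketch of such tuples, with hypothetical drift factors matched to PMMM components tagged using the Step 1 nomenclature:

```python
# (drift factor, (domain, phase, execution characteristic)) tuples.
cause_matching = [
    ("late_supplier_delivery", ("I", "M", "F")),  # procurement monitoring cadence
    ("scope_creep",            ("C", "P", "A")),  # scope definition granularity
    ("rework_from_defects",    ("R", "M", "R")),  # quality control resourcing
]
```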
Step 4: Define Aggregation Rules
Experts define how each maturity level correlates with the selected drift factors. Their insights are used to construct the conditional probability tables (CPTs) for the Bayesian Network (BN), effectively translating domain knowledge into mathematical rules.
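One way such rules can be materialized is as CPT columns, one per maturity state; a sketch assuming pgmpy, with probabilities standing in for actual elicited judgment:

```python
from pgmpy.factors.discrete import TabularCPD

# Hypothetical rule: the better the scope-management maturity, the less
# likely the 'scope_creep' drift factor is to be active.
cpd_scope_creep = TabularCPD(
    variable="scope_creep", variable_card=2,
    values=[[0.35, 0.70, 0.90],   # P(inactive | low, medium, high maturity)
            [0.65, 0.30, 0.10]],  # P(active   | low, medium, high maturity)
    evidence=["scope_maturity"], evidence_card=[3],
    state_names={"scope_creep": ["inactive", "active"],
                 "scope_maturity": ["low", "medium", "high"]},
)
```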
Step 5: Prepare and Structure the Dataset
Since Bayesian Networks are data-driven, data preparation is critical. This involves cleaning the dataset by removing invalid values, standardizing formats, and handling outliers. Each row should represent a unique project instance, and the dataset must be large enough relative to the number of variables (columns) for learning to be meaningful.
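A minimal preparation sketch with pandas; the file and column names are hypothetical:

```python
import pandas as pd

# One row per project, one column per BN variable.
df = pd.read_csv("project_audits.csv")

df = df.drop_duplicates()                    # keep each project instance once
df = df.dropna(subset=["cost_overrun_pct"])  # drop rows missing the outcome
df["domain"] = df["domain"].str.upper().str.strip()  # standardize formats

# Clip extreme outliers so a few projects do not dominate the learned CPTs.
low, high = df["cost_overrun_pct"].quantile([0.01, 0.99])
df["cost_overrun_pct"] = df["cost_overrun_pct"].clip(low, high)
```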
Step 6: Define the Target Node
The target node in the BN represents a measurable project performance issue (e.g., budget overrun or schedule delay). This variable should be specific enough to inform PM decision-making but generalized enough to avoid data fragmentation. The number of its possible states must balance interpretability with statistical reliability, typically optimized using metrics such as the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
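Continuing the preparation sketch, the target can be discretized into the four overrun bands reported later in the use case; whether four states is the right granularity is exactly what AIC/BIC comparisons across candidate binnings would decide:

```python
import numpy as np
import pandas as pd

# Bands: <1%, 1-10%, 10-100%, >100% of budget (column name is hypothetical).
bins = [-np.inf, 1, 10, 100, np.inf]
labels = ["<1%", "1-10%", "10-100%", ">100%"]
df["overrun_band"] = pd.cut(df["cost_overrun_pct"], bins=bins, labels=labels)
```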
Step 7: Apply Learning and Inference Algorithms
Once the network structure is defined, it is trained using historical project data, with additional validation against a separate test set. The expectation–maximization (EM) algorithm is employed for parameter learning due to its robustness in handling incomplete data. BN models serve two core purposes:
- Diagnostic analysis – Identifying the most likely causes (drift factors) of current or future project failures and mapping them to key maturity criteria.
- Predictive analysis – Estimating the likelihood of future performance issues based on assumed changes in PM maturity levels.
Bayesian Networks rely on two types of algorithms:
- Structure and parameter learning algorithms – Define the network’s architecture and CPTs.
- Inference algorithms – Propagate evidence throughout the network for analysis and prediction.
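A sketch of this step, assuming pgmpy's ExpectationMaximization estimator and reusing the hypothetical node names from the earlier sketches; train_df and test_df stand for the prepared training and hold-out datasets:

```python
from pgmpy.estimators import ExpectationMaximization
from pgmpy.inference import VariableElimination

# Parameter learning: EM fills in the CPTs even when some cells are missing.
em = ExpectationMaximization(model, train_df)
model.add_cpds(*em.get_parameters())

infer = VariableElimination(model)

# Predictive: overrun-band distribution if scope maturity were raised to 'high'.
print(infer.query(["overrun_band"], evidence={"scope_maturity": "high"}))

# Diagnostic: which drift-factor state best explains an observed large overrun?
print(infer.query(["scope_creep"], evidence={"overrun_band": ">100%"}))
```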
Hyperparameter tuning in BNs is inherently iterative, requiring trial-and-error testing to find an optimal balance between model complexity and predictive accuracy.
BN Validation and Quality Metrics
To ensure a valid and effective BN, several evaluation criteria should be considered:
- Semantic coherence and interpretability – Results must be understandable and actionable for PM experts.
- Completeness and coverage – Nodes and states should accurately reflect key PM dimensions, supported by sufficient data.
- Relevance and granularity – The target variable should have meaningful, precise outcomes.
- Model conciseness – The structure must avoid combinatorial explosion, ensuring computational efficiency.
- Learning quality – The CPTs must be well-defined and complete. Metrics such as AIC and BIC can help assess whether the model will generalize to new data (a scoring sketch follows this list).
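As a scoring sketch, assuming pgmpy's BicScore (under its convention, higher scores are better, since model complexity enters as a penalty on the log-likelihood):

```python
from pgmpy.estimators import BicScore

# Compare candidate structures on the same dataset; prefer the higher score.
print(BicScore(train_df).score(model))
```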
A well-constructed BN supports more informed project decisions by linking PM practices to measurable performance outcomes through data-driven, causal modeling.
Use Case
This model is especially relevant for consulting firms working with large-scale engineering projects in sectors such as oil and gas. These firms are often brought in when project performance has deviated significantly from expectations. Their clients expect:
- A clear diagnosis of the issues causing underperformance,
- Immediate recommendations for corrective actions, and
- A forecast of the likely outcomes if those improvements are implemented.
With access to detailed long-term data—such as deviation dates, financial impacts, root causes, and implemented corrective actions—consultants can use Bayesian networks to create predictive models. These models not only explain why issues occurred but also help anticipate future risks and guide decision-making to prevent recurrence.
Applied to an audited project, the BN model predicted a 15% probability of a cost overrun of less than 1% of expenses, a 49% probability of an overrun between 1% and 10%, a 21% probability of an overrun between 10% and 100%, and a 15% probability of an overrun exceeding 100% of the budget. Training and testing accuracy fell within the 90% to 96% range, with a relative error between training and test data of less than 6%.
The model thus predicts the probability of entering a cost-overrun situation based on the maturity levels evaluated in the audited project.