Optimising Data Modelling: Leveraging Power BI

Optimising Data Modelling: Leveraging Power BI

Transforming Raw Data into Strategic Insights: A Comprehensive Case Study

Summary

In today's competitive business landscape, effective data modelling is crucial. This article explores how Power BI can transform data analysis in the upstream sector of the oil and gas industry through a real-world case study. Learn about the complexities of data modelling and how a well-structured model can simplify analysis, leading to smarter data-supported decisions in production activities.

Introduction

In the business world, clear data insights are key to staying competitive. But how can companies achieve this? The answer lies in business intelligence (BI) tools.

Data modelling is at the heart of the BI process. It involves structuring and organising data in a way that makes it accessible and useful for analysis. Through a well-designed data model, it is possible to simplify the analysis of complex data, facilitating informed decision-making.

This article explores data modelling in Power BI using a real-world case study from the oil and gas industry. It highlights the complexities and challenges of data modelling and demonstrates how a well-structured data model can transform data analysis, leading to more informed and strategic decisions.

The Challenge of Data Modelling

If your role involves data analysis, you’ve likely spent considerable time using Excel pivot tables and formulas. You might also be familiar with the engines in Power BI and Excel Data Model. These tools combine the full power of a database with the DAX (Data Analysis Expressions) language. However, power must be used wisely and responsibly!

I have often observed individuals struggling to calculate the figures they need to address specific situations or problems. Formulas can sometimes become overly complex, making them difficult to manage, or worse, leading to incorrect results. For example, on one occasion, a complex formula issue was traced back to an incorrect relationship in the data model. Once the model was corrected, the formula not only worked correctly but also became much simpler to write.

Is data modelling simple? Not necessarily. It can be complex and challenging, requiring significant effort to master and the ability to visualise the model across different scenarios. Think of it like navigating a ship through stormy seas —every decision must be carefully considered, with a deep understanding of the environment and potential obstacles.

In the oil and gas industry, where our case study is based, data modelling proved crucial for analysing vast amounts of production and injection data. By structuring the data model correctly, we were able to transform raw data into actionable insights, transforming chaos into conceptual order and aiding in more informed decision-making.

Data modelling can often be complex, challenging, and intellectually stimulating. However, one thing is certain: it is never boring! Mastering data modelling is essential if you want to enhance your analytical capabilities. It enables you to focus on making the right decisions rather than getting bogged down in complex formulas, ultimately leading to more efficient and accurate analyses. Wouldn’t you agree?

What is a Data Model?

In simple terms, a data model is a set of tables linked by relationships based on common keys between the tables. Even a single-table model is a data model, but relationships make the model much more powerful and interesting to analyse.

In this context, we’re referring to a relational or tabular data model, the type that Power BI works with. There are different types of data models, but this article will focus strictly on the relational type.

Case Study: Analysis of a Development Plan

Context of the Project

To understand the complexity and impact of this case study, it's important to first set the stage with some context. As the newly appointed Engineer Manager, I joined the Company at a pivotal moment, when the process of acquiring the exploitation rights of 11 onshore fields from Petrobras in Brazil, was in its final phase. This acquisition was a strategic move, as Petrobras had decided to shift its focus towards offshore developments, leaving behind mature yet still lucrative fields.

Disclaimer: the production and injection figures presented in this article have been deliberately altered to protect proprietary company information. These numbers are purely illustrative and do not represent the actual data. This ensures that the focus remains on the methods and techniques discussed, rather than on specific operational details.

Evaluating the Development Plan

Evaluating a development and production plan in the industry is an inherently iterative and highly technical process. It involves transdisciplinary collaboration among engineers, geologists, economists, and various other experts. Given the volatile nature of market conditions and fluctuating oil prices, our long-term strategy needed to be highly adaptable, requiring a flexible approach and a readiness to revise plans as circumstances evolved.

Key Factors

  • Real-time data: essential for making informed decisions and for the continuous optimisation of both development and production plans. Having access to up-to-date information enabled us to respond swiftly to changing conditions.
  • Collaboration: vital for ensuring the success of the project. Effective collaboration between technical and financial teams facilitated seamless integration of diverse expertise, fostering innovative solutions and strategic alignment.

The Source Data Set

Understanding the source data set is crucial for the success of our development plan. The starting point consisted of production (O: oil, G: gas, W: water) and injection (WI: water and SI: steam) fluid forecasts provided by the Reservoir Development team, with the figures being monthly averages. The dataset included five original tables:

  1. OIL PRODRATE Table: contains the monthly average O production rate measured in barrels per day (bpd).
  2. GAS PRODRATE Table: includes the monthly average G production rate measured in thousand standard cubic feet per day (MSCFD).
  3. WATER PRODRATE Table: details the monthly average W production rate measured in barrels per day (bpd).
  4. WATER INJRATE Table: records the water injection rate measured in barrels per day (bpd).
  5. STEAM INJRATE Table: documents the steam injection rate measured in tonnes per day (ton/d).

The first three tables, corresponding to the producing wells, differed only in the metrics, meaning each table contained the metric associated with a single production fluid (O, G, or W). These metrics, which are the numbers to be aggregated and processed, are located at each intersection of a row corresponding to a Well TAG and a column corresponding to a particular month. The data points associated with production are substantial, with 2,032,277 raw production data points spread across three tables, and 307,469 raw injection data points in two tables. This extensive amount of data provides a comprehensive view of production and injection activities, essential for thorough analysis and decision-making.

Figure 1: Illustrates the configuration of the production tables, showcasing the organisation of data for producing wells over time.

The configuration of these tables (not being in a tabular format), combined with the absence of relationships, makes it extremely challenging to develop accurate formulas and obtain meaningful insights. This complexity can hinder our ability to analyse and understand the development plan's impact on various departments of the Company. By implementing a well-structured tabular data model, we can simplify this process, ensuring that our analysis is both efficient and precise.

Clarification: each time "flow rate" is mentioned, it must be interpreted as forecasted values provided by the Reservoir Development team.

Constructing the Initial Data Model

Building an effective data model begins with a clear understanding of the different perspectives from which the information will be analysed. This step is crucial as it ensures that the data model is tailored to meet the diverse needs of all stakeholders, thereby providing meaningful and actionable insights. This foundational step helps in structuring the data model appropriately, making the subsequent analysis more efficient and effective.

The following visual map illustrates the diverse perspectives and analyses that I mapped out when constructing the initial data model for fluid production. This map highlights the key areas of focus, including executive overview, wells, fields, and facilities, ensuring that all critical aspects of production are captured and analysed effectively.

Figure 2: Visual map of fluid production, outlining the key points of view and analyses required for a comprehensive understanding of production data across different categories such as executive overview, wells, fields, and facilities.

For Injection (Water and Steam), the visual maps were analogous to those for production, considering the particularities of each.

Having a full grasp of these perspectives was necessary because it influenced how we organised the data tables and relationships. With this clear understanding, I could ensure that the data model would support all necessary analyses, from high-level executive summaries to detailed well-specific evaluations.

The importance of comprehending the different perspectives in four points:

  1. Tailored Data Organisation. Understanding the viewpoints allows for tailoring the data organisation to meet specific needs. For example, the executive overview requires a high-level aggregation of data, while the analysis of wells demands more granular details. By knowing these requirements upfront, the data model can be designed to efficiently handle both summary and detailed data.
  2. Enhanced Data Relationships. Identifying these viewpoints is essential for accurately defining the relationships between tables. For instance, production and injection data need to be linked to specific wells and dates to provide a comprehensive view. Recognising these connections ensures that the data model supports robust and meaningful analysis.
  3. Improved Analytical Capabilities. A clear understanding of the different analytical perspectives enhances the model's capability to deliver valuable insights. It ensures that the data model is not just a repository of information but a powerful tool for analysis, enabling stakeholders to make informed decisions based on comprehensive and well-structured data.
  4. Efficient Query Performance. Designing the data model with the end analysis in mind helps in optimising query performance. It ensures that the queries run efficiently, providing quick and accurate results, which is particularly important for real-time decision-making scenarios when the stakes are high.

By focusing on these key perspectives, we built an initial data model that was robust, flexible, and capable of addressing the diverse analytical needs of the project. This approach not only simplified the development process but also ensured that the final model was highly effective in delivering the necessary insights.

Separation of Assets and Facts

The distinction between assets and facts leads to a data-modelling technique known as the star schema, where tables are classified into two categories:

  • Dimensions: informational assets like a well or a date, which have their own particular attributes.
  • Facts: events involving some dimensions with metrics to aggregate for insights.

Why star schemas? They are easy to comprehend and use. Dimensions are used to slice and dice the data, whereas numbers are aggregated from fact tables.

For this particular case, the metrics depend on the type of well:

  • Producer wells: have forecasted average flow rates on a monthly basis for each produced fluid (oil, gas, and water).
  • Water injection wells: have forecasted average flow rates on a monthly basis for the treated produced water being injected into the reservoir.
  • Steam injection wells: have forecasted average flow rates on a monthly basis for water vapour being injected into the reservoir.

This is why I decided to have three FACT tables, one for each well category group.

Initial data model tables

Fact Tables with granularity at the month and well levels:

  1. Fluid Production: this table consolidates the oil, gas, and water production rates.
  2. Water Injection: this table records the water injection rates.
  3. Steam Injection: this table tracks the steam injection rates.

Dimension Tables:

  1. Prod Well: contains details about producing wells, including field, category, and well type.
  2. WI Well: has information about water injection wells.
  3. SI Well: includes details about steam injection wells.
  4. Date: covers date attributes to support time-based analyses.

In a tabular data model, fact tables store quantitative data for analysis and are related to dimension tables, but dimensions must not have relationships among them that introduce ambiguity to the model. An ambiguous model presents not only a technical problem but also a logical one.

Establishing relationships between dimension (DIM) and fact (FACT) tables using keys like DateKey and Well TAG enabled the model to unleash its full potential.


Figure 3: A screenshot of the model view, with dimension tables strategically positioned above the fact tables. This layout serves as a visual cue, emphasising how filters cascade down from the dimension tables to the fact tables.

Other observations:

  • Dimension tables tend to be “small,” whereas fact tables are much larger.
  • Production and injection are interrelated from the physical model perspective, but datasets can be treated relatively independently from the data model perspective.
  • The Date Table is the only dimension table related to the three fact tables.
  • The remaining dimension tables are linked only to their respective fact table via the Well TAG.
  • All three fact tables capture the state of a process at regular monthly intervals, and in data warehousing, these types of fact tables are classified as 'Periodic Snapshot Fact Tables.' In contrast, a 'transactional' fact table is the more commonly used type.

Particularities of Snapshot Fact Tables

Computing values on top of snapshot tables hides some complexity because snapshots do not generate additive measures. Additive measures are those that can be summed across all dimensions in the data model. These measures are straightforward to aggregate and are often used in various analyses.

Semi-additive measures can be summed across some dimensions but not all. These measures pose challenges because they cannot be simply added up across certain dimensions like time. Non-additive measures are those that cannot be summed across any dimension. These measures require special calculations or aggregations that do not involve summing up values.

Mixing changes in granularity with semi-additivity can be problematic. The formulas tend to be more difficult to author. Moreover, if you don’t pay attention to the details, performance will suffer. It’s always good to double-check all the numbers before considering them correct.

As an example, the metrics in the Production Fluid Snapshot Table consist of monthly production rates, which are averages rather than quantities. Consequently, these metrics cannot be aggregated over time. Detailed definitions of some base measures, along with explanations, are provided in the Annex 1.

Expanding the Data Model to Cover Downstream Infrastructure

With this version of the data model ensemble, I was able to cover a wide spectrum of critical knowledge for various essential departments, such as executive management, production engineering, cost control, and oil trade, among others. Using Power BI, I developed dashboards and visuals to answer questions clearly and concisely, facilitating better understanding and analysis of the data.

Key analyses performed include:

  • Forecasted Production: estimating the production of oil, gas, and water for each field over time, helping in strategic planning and resource allocation.
  • Production Decline Rate: assessing the decline rate for each field to predict future production levels and manage reserves efficiently.
  • Field Ranking: ranking fields concerning oil production and waterflooding demands to identify the most productive areas and focus investment.
  • API Gravity Variation: predicting changes in the API gravity of oil to understand its quality and market value over time.
  • Wells Distribution: forecasting the allocation of new and returning wells across different fields and years to plan drilling and workover activities, ensuring accurate budgeting.

However, the model does not map the wells to the surface facilities, nor does it map the connections between the different facilities. Consequently, there is no way to model how the production and injection fluids are distributed and flow downstream of the wells. This limitation prevents us from addressing issues related to downstream infrastructure and logistics from the wells.

Concerns that needed to be resolved:

  • How does the development plan impact the existing gathering, processing, and transportation facilities?
  • Would these facilities need revamping?
  • What investments are necessary to debottleneck and optimise the existing facilities?
  • When would the improvements need to be implemented to guarantee the planned production?

To properly respond to these inquiries and gain comprehensive insights, it is necessary to expand the model to explore all concerns related to the transport, separation, and processing demands of the production and injection fluids.

Expanding the Data Model

The first step was to model the configuration of the existing fluid collection, processing facilities, and transport network. This included determining:

  • The production satellite to which each well is assigned.
  • The facility to which the production of each satellite is directed to.
  • The distribution arrangement among surface facilities, depending on the O, G, and W fluids.

The expanded model allowed for a holistic study of the assets. Three new dimension tables were developed and incorporated: Prod Facility, WI Facility, and SI Facility, each having a direct relationship with its respective well table.


Figure 4: A screenshot showcasing the expanded data model, featuring three new dimension tables: Prod Facility, WI Facility, and SI Facility. Each of these new dimension tables is positioned above and has a relationship with its associated well dimension table (e.g., Prod Facility above Prod Well), which in turn has a relationship with the respective fact table. This layout once again reinforces the concept that filters cascade from the higher-level dimension tables, such as Prod Facility, down to the well dimension tables, and ultimately to the fact tables, ensuring a streamlined flow of data filtering.

In case it hasn’t been noticed, it is no longer a star schema but a snowflake schema. A snowflake is a variation of a star schema where a dimension is not linked directly to the fact table. Rather, it is linked through another dimension. I could have avoided snowflakes by denormalising the columns from the farthest tables into the nearer to the fact table. However, in this case, a snowflake schema was fit for purpose.

Advanced Techniques in Data Modelling to Gain Deeper Insights

To unlock the full potential of data analysis, it is crucial to go beyond basic data modelling methodologies and explore advanced techniques. Advanced data modelling techniques can provide deeper insights, enabling more precise and strategic decision-making. In this section, we will delve into some of the sophisticated approaches used to enhance our data model, covering topics such as dynamic segmentation, ABC classification, calculation groups, and advanced visualisations. These techniques not only improve the robustness and flexibility of the data model but also facilitate a comprehensive understanding of complex data relationships and trends. By implementing these advanced strategies, we can extract maximum value from our data and drive more impactful business outcomes.

1.   Dynamic Well Segmentation

When analysing well metrics, it is relatively straightforward to compute metrics such as productive time, total production, or average flow rate for each well. However, a more insightful approach involves clustering producing wells based on forecasted oil flow rates or other relevant metrics. Instead of partitioning flow rates by every unique value (which can be overwhelming due to the large number of different values), grouping flow rates into ranges provides meaningful insights.

This technique, known as segmentation, allows for stratifying data based on a configuration table. In this case, the challenge was to cluster wells dynamically based on the average oil production flow rate, taking into account the current filter context. This means that the segmentation adjusts dynamically according to the period, or filters applied in the report. For instance, a well might belong to one segment for one period and another segment for a different period.

To achieve this dynamic segmentation, a virtual relationship was established using DAX, bypassing the need for physical relationships in the data model. The first step was defining configuration tables (oil and gas), which defined four segments: silver, gold, platinum, and diamond. Using percentiles, median, and average flow rates as reference, the ranges for each segment were determined. A DAX calculation group was then created to dynamically compute the number of wells in each segment, or any other selected measure, considering all current filters.

2.  ABC Classification | Pareto Analysis

The ABC/Pareto analysis is another impactful technique that introduces new ways of classifying entities based on values, grouping the entities together that contribute to a certain percentage of the total. For instance, this method helps identify the most impactful wells on the overall business by categorising them into three classes (A, B, or C), ensuring:

  • Class A wells account for 70% of production
  • Class B wells account for 20% of production
  • Class C wells account for 10% of production

These class limits can be modified as needed, provided they sum to 100% without overlapping. Similar to well segmentation, the ABC classification was implemented dynamically, allowing for adjustments based on the filter context.

3.  Calculation Groups

Adopting calculation groups in Power BI provided numerous advantages, enhancing the efficiency and flexibility of data analysis. Calculation groups centralise and simplify the management of recurring calculations, reducing the number of measures and code redundancy, thus improving model maintainability. For example, rather than creating multiple measures for different time related computations (year-to-date, period-to-date growth, moving averages, etc.), calculation groups allow these transformations to be defined once and applied dynamically.

In this model, calculation groups were defined for various purposes, including:

  • Time Intelligence. Provides a comprehensive range of time-related calculations.
  • Dynamic Segmentation. Enables ABC Classification calculations (ABC class, ABC class value), Segment Growth Value, and related functions.
  • TopN Filter. Dynamically ranks the top N producing wells while also displaying an "Other Wells" row for the remaining wells, based on a user-selected measure. Additionally, it displays the computed values for other measures not used in the ranking.
  • Cumulative Total. Computes running totals, such as the accumulated oil production up to a specified date, based on a user-selected measure.
  • Statistical. Performs statistical evaluations, including determining max value, date of max value, median, quartiles, interquartile range (IQR), and more.
  • # Wells. Calculates the number of new, lost, and returning wells, as well as the count of active wells based on their production status.
  • % of Parent Value. Accurately computes percentages within hierarchies, showing the ratio of a category against the grand total, a subcategory against its category, and a well against its subcategory. The calculation adjusts depending on the hierarchical level.
  • Fluid Measure. Allows for selective replacement of measures in DAX expressions.

However, it is crucial to understand the complexities of calculation groups before implementation, as they can be challenging to manage. Investing significant time to master these powerful tools is essential for achieving their full potential.

4.  Sankey Diagram

The Sankey diagram is a visualisation technique used to represent the flow of values from one set to another. In this context, it was implemented to illustrate the flow of oil, gas, and water from gathering stations (facilities receiving production from wells) through various surface facilities to the final endpoint. Although this information could be presented as a matrix, the Sankey diagram offers a clearer and more visually appealing representation of data flows, capturing interrelationships and magnitudes more effectively.

To use this custom visual from Microsoft AppSource, the data source required specific formatting, which necessitated adjustments to the data model. The aim was to dynamically represent the flow of fluid production based on any changes in the filter context.

5.  Transition Matrix

Wells' classifications can change over time based on their decline rates and production levels. A transition matrix is a powerful modelling technique that uses snapshots to analyse these changes. By creating a calculated table that classifies each production well monthly and establishing the necessary relationships, the transition matrix helps answer questions about the evolution of well classifications over time.

For example, it can track wells classified as "gold" in one period and show how their classification changes in subsequent periods.

6.  Like-for-like Comparison

Like-for-like comparison is an adjusted metric that compares two time periods, restricting the comparison to wells with consistent production statuses. This technique ensures that only wells open and producing in all considered periods are evaluated, preventing skewed results from wells that were closed during some periods. Although this comparison was computed dynamically, it could also be done statically if needed.

By implementing these advanced data modelling techniques, the model becomes more robust and capable of delivering deeper insights, ultimately supporting more informed and strategic decision-making. Moreover, the combined use of different techniques significantly enhances the ability to explore the dataset. For instance, Figure 5 features a Sankey Diagram that represents the oil production rate from the Gathering Stations (ECOs) to the Export Terminal (TERMAP).

The analysis is further enriched by the accompanying matrix, which presents additional statistical metrics, such as minimum, median, and maximum for the selected fluid measure (oil prod rate, in this case) within the analysed time period. The matrix also includes the specific dates on which these extremes are anticipated, providing a comprehensive view of the production dynamics over time.


Figure 5: This page report view combines a Sankey Diagram and a matrix to provide a comprehensive view of oil production flow and its main statistical characteristics. The Sankey Diagram traces the journey of oil from the gathering facilities to the export terminal, illustrating the distribution and movement of production rates across the network. The matrix complements this by offering detailed statistical insights, including the projected minimum, median, and maximum oil production rates over the selected period, along with the specific dates these values are expected to occur. This combined view facilitates valuable insights for debottlenecking analysis.

Conclusion

This case study demonstrates the transformative power of effective data modelling in the oil and gas industry using Power BI. Through meticulous structuring and organisation of data, we were able to convert raw data into actionable insights, significantly enhancing our decision-making capabilities.

The initial challenge of working with complex data sets was addressed by understanding the key analytical perspectives and constructing a robust data model. This foundational work enabled us to create dynamic and interactive dashboards that provided clear and concise answers to critical business questions.

By expanding the data model to include downstream infrastructure, we tackled the limitations of our initial setup and were able to analyse the entire lifecycle of production fluids from the well to the delivery point. This holistic view is essential for making informed decisions about infrastructure investments and operational improvements.

Moreover, the incorporation of advanced data modelling techniques such as dynamic segmentation, ABC classification, and calculation groups further enriched our analytical capabilities. These sophisticated approaches allowed us to gain deeper insights, identify trends, and make more precise and strategic decisions. The use of visualisation tools like the Sankey diagram and transition matrix provided a clearer understanding of data flows and changes over time.

Through this journey, it became evident that mastering data modelling is not merely about handling data but about harnessing its potential to drive business success. The capacity to structure data in meaningful ways, apply advanced analytical techniques, and visualise complex relationships is crucial for any organisation striving to remain competitive in today's data-driven world.

In summary, our experience underscores the importance of investing in data modelling expertise and leveraging powerful BI tools like Power BI. By doing so, organisations can transform their data into a strategic asset, enabling smarter decisions and fostering sustainable growth in the ever-evolving landscape of the oil and gas industry.


Giving Credit Where It's Due: Recommended Reads and Resources

For those looking to explore the concepts and techniques discussed in this article further, the following books and resources have been instrumental in my own professional growth and are highly recommended.

Books:

  • "The Definitive Guide to DAX: Business Intelligence for Microsoft Power BI, SQL Server Analysis Services, and Excel" by Marco Russo and Alberto Ferrari . An essential book for mastering the DAX language and applying it effectively in Power BI and other platforms.
  • "Analyzing Data with Power BI and Power Pivot for Excel" by Marco Russo and Alberto Ferrari. This book is ideal for both beginners and seasoned data modelers, offering practical techniques for shaping data models in Excel and Power BI.
  • "DAX Patterns" by Marco Russo and Alberto Ferrari. A comprehensive guide to applying DAX patterns in real-world scenarios.

Online Resources:


Annex 1: Special Consideration for Production Metrics

In data modelling, particularly when working with snapshot tables, it is crucial to understand the distinction between additive, semi-additive, and non-additive measures. Additive measures, such as total production, can be summed across dimensions without any issues. However, semi-additive measures, like oil production rate, require special handling because they cannot be aggregated simply by summing values across time periods.

This annex explains these concepts using two DAX measures: "Oil Prod" and "Oil Prod Rate." Each measure is broken down to explain the purpose and function of each part of the code, with specific considerations for handling semi-additive measures in snapshot tables.

Oil Prod Measure

This measure calculates the total oil production, in barrels, within the filter context in place. The Oil Prod measure is additive across all dimensions. However, since the fact table 'Fluid Production' stores the oil production rate for each well on a given month, it’s necessary to account for the number of days in those months.

Oil Prod = -- total volume of oil produced in barrels within the given filter context
SUMX(
    'Fluid Production',
    'Fluid Production'[O [bpd]]] * RELATED( 'Date'[DaysInMonth] )
    )        

Breakdown of the Measure.

SUMX: This iterator function processes each visible row in the 'Fluid Production' table (the fact table), based on the current filter context.

  • 'Fluid Production'[O [bpd]]: this column represents the oil production rate in barrels per day (bpd) for each well on a given month.
  • RELATED('Date'[DaysInMonth]): the RELATED function retrieves the number of days in the corresponding month from the 'Date' table. The relationship between the 'Fluid Production' table and the 'Date' table is leveraged to access the correct number of days.
  • Multiplication: for each visible row in the 'Fluid Production' table, the measure multiplies the daily production rate (O [bpd]) by the number of days in the month. This calculation provides the total oil production for that particular well in that month.
  • SUMX final calculation: the SUMX function sums the results of the multiplication across all rows, providing the total oil production for the given filter context.

Considerations: the "Oil Prod" measure is additive, meaning it can be summed across different wells or months without issue. This is because the measure calculates total production quantities, which are straightforward to aggregate.

Oil Prod Rate Measure

The "Oil Prod Rate" measure computes the average daily oil production rate, making it a semi-additive measure. This type of measure requires careful handling because it cannot be aggregated across time without recalculating for the specific context.

Oil Prod Rate = 
/* The average daily oil production rate, in barrels per day,
   within the specified filter context */
VAR _OilProd = [Oil Prod]
VAR _NumberOfDays = 
    CALCULATE(
        SUMX(
            'Date',
            'Date'[DaysInMonth]
        ),
        VALUES( 'Fluid Production'[DateKey] )
    )
VAR _Result =
    DIVIDE(
        _OilProd,
        _NumberOfDays
    )
RETURN
    _Result        

Breakdown of the Measure.

VAR _OilProd: this variable stores the total oil production value calculated by the previously defined "Oil Prod" measure, representing the total production for the filter context.

VAR _NumberOfDays: this variable calculates the total number of days in the period being analysed.

  • CALCULATE: this function modifies the context of the calculation, ensuring it reflects the correct filter context.
  • SUMX: This function iterates over each row in the 'Date' table within the context established by CALCULATE, summing the DaysInMonth for relevant dates.
  • 'Date'[DaysInMonth]: This column in the 'Date' table contains the number of days in each month.
  • VALUES('Fluid Production'[DateKey]): The VALUES function returns a one-column table containing the distinct values from the 'Fluid Production'[DateKey] column. This is crucial in modifying the filter context for the calculation.

How VALUES Works in This Context:

o   Filtering the Date Table: the VALUES('Fluid Production'[DateKey]) function extracts the unique dates from the 'Fluid Production' table based on the current filter context (such as a specific well, month, or year selected in the report).

o   This ensures that the calculation focuses only on those dates relevant to the data in the 'Fluid Production' table, instead of applying the calculation across the entire 'Date' table.

Applying the Filter: The CALCULATE function applies this filter to the 'Date' table, meaning that the SUMX function sums the DaysInMonth only for those dates matching the filtered DateKey values from the 'Fluid Production' table.

Result of the Calculation: the result is a sum of the DaysInMonth values, but only for those months corresponding to the DateKey values found in the filtered 'Fluid Production' table.

This precise calculation ensures that the "Oil Prod Rate" measure accurately reflects the average daily production rate based on actual periods of production data, avoiding an incorrect average that might include days from unrelated time periods.

VAR Result: this variable compute the average daily production rate by dividing the total oil production (`OilProd`) by the total number of days (`_NumberOfDays`).

  • DIVIDE: this division operation ensures that the rate is calculated correctly, handling any potential division by zero errors by returning a blank value if the denominator is zero.
  • RETURN. The RETURN statement outputs the calculated oil production rate (stored in the _Result VAR), which represents the average daily production over the specified period.

Considerations:

  • Semi-Additive Nature: The "Oil Prod Rate" measure is semi-additive, meaning it cannot be aggregated across time periods simply by summing or averaging the values. Instead, it must be recalculated for each specific time period to ensure accuracy.
  • Recalculation Requirement: When working with semi-additive measures like this one, it is crucial to recalculate the measure for each time period or dimension. This ensures that the rate reflects the actual average production per day, rather than an incorrect sum of daily rates.

Importance of Handling Semi-Additive Measures

In snapshot tables, additive measures allow for straightforward aggregation, but semi-additive measures like "Oil Prod Rate" require more nuanced handling. Summing semi-additive measures over time would lead to incorrect results, as it would not account for the varying number of days in each period or the differences in production volumes. By carefully structuring DAX calculations, as demonstrated with "Oil Prod Rate," we ensure that our data model provides accurate and actionable insights, avoiding common pitfalls associated with improper aggregation of semi-additive measures.

Conclusion:  understanding the behaviour of semi-additive measures in snapshot tables is essential for creating accurate and reliable data models. By breaking down the DAX measures "Oil Prod" and "Oil Prod Rate," we can see the importance of recalculating measures for each time period rather than simply summing values. This careful approach ensures that the insights derived from the data model are both accurate and meaningful, providing a solid foundation for decision-making in the context of oil production analysis.


#DataModelling #DataAnalysis #DataStrategy #PowerBI #DataDriven #BusinessIntelligence

Ana María Bisbé York

Consultora Business Intelligence | Autora en ANAYA | Linkedin Learning Trainer | Microsoft Data Platform MVP | Microsoft Partner Power BI | Ganadora Globant Award Techfluencer España 2022

3mo

Excelente artículo, Leandro, teoría, práctica y todo bien detallado. Gracias por compartirlo, Saludos, Ana

Excelente Articulo Lean

Hoy la info esta. El problema es que hacer con ella! Sin duda hay que sumar estas herramientas

Santos Muratore

Consultor de Gestión de Proyectos

3mo

Muy interesante!!! Un fuerte abrazo!!!

To view or add a comment, sign in

Insights from the community

Others also viewed

Explore topics