Probabilistic knowledge-based characterization of conceptual geological models

Júlio Hoffimann, Sandro Rama Fiorini, Breno de Carvalho, Andres Codas, Carlos Raoni, Bianca Zadrozny, Rogerio de Paula, Oksana Popova, Maxim Mityaev, Irina Shishmanidi, Anastasiia Gorelova, Zoya Filippova

IBM Research, Brazil
Gazprom Neft, Russia


The construction of conceptual geological models is an essential task in petroleum exploration, especially during the early stages of investment, when evidence about the subsurface is limited. In this task, geoscientists recreate the most likely geological scenarios that led to potential accumulation of reserves in a target block, based on past experience, historical analogues, and interpreted “signatures” that were left in the data by physical processes. Due to cognitive constraints, this task has traditionally focused on the single most likely conceptual scenario or, at most, a reduced set of scenarios chosen a priori via ad hoc methods, which often leads to improper block valuation and severe financial losses. In this work, we propose a probabilistic framework for reasoning about conceptual geological scenarios that helps domain experts maintain multiple hypotheses throughout the exploration program. The framework is extensible and can be instantiated automatically from simple knowledge templates, a form of “knowledge standard” in the company. We show how the acquired knowledge can be leveraged for uncertainty mitigation using concepts from information theory, and assess the framework qualitatively in a real case study.

1. Introduction 

In petroleum geology, decisions of great economic impact, such as shooting new seismic and drilling pilot wells, need to be made early during exploration stages, based on limited evidence about the subsurface. In order to support these decisions, geologists rely on their prior experience to form a coherent conceptual model of the subsurface (Bowden, 2004), to estimate the value of the asset and their chance of success. In many cases, this conceptualization of the subsurface is achieved via an analytical method in which a holistic view (the conceptual geological model) is decomposed into multiple conceptual components (or submodels), each carrying an important aspect of the buried, and hence unobserved, object (e.g. hydrocarbon reservoir, rock formation). Examples of these conceptual submodels are the sedimentary environment (SE), the structural (ST), the petrophysical (PT), and the fluid (FL) models. Each submodel is still quite complex to characterize, so the literature instead provides distinctive categories (or scenarios), each representing a typical instance or prototype concept. For example, the SE submodel can be categorized into various conceptual scenarios including shallow-marine, braided river, and delta environments. The task of characterizing the conceptual geological model (CGM) is therefore reduced to multiple classification tasks for the constituent submodels, where each submodel is classified into one of its multiple geological scenarios (see Fig. 1). Classifying a submodel into conceptual scenarios involves interpreting “signatures” that are left in partial and/or indirect measurements of the subsurface (e.g. well logs), and which are often insufficient for identifying a single scenario. In practice, at least three issues arise: (1) geologists anchor their investigation in the scenario they know how to justify best (Baddeley et al., 2004; Bond et al., 2007), (2) multiple members of the exploration team disagree on what the most likely scenario is (Bond et al., 2015; Polson and Curtis, 2010), and (3) members specialize in very specific workflows and cannot foresee important connections across different data sources. All these issues lead to inefficiencies in resource exploration. In the first case, the team anchors their evaluation of the asset in the wrong scenario, and the company makes a suboptimal investment. In the second case, too much time is spent trying to reconcile multiple conflicting views, and the decision at the end is still open-ended, without an agreed standardized process.

Finally, in the third case, time-consuming workflows are executed without guarantees that the obtained signatures will mitigate the overall uncertainty about the CGM. In order to support a holistic understanding of the subsurface and minimize the aforementioned biases and disagreements, an expert system must rely on a priori knowledge, which is collectively constructed by the exploration team, their company, and the broader geosciences community. It is reasonable to assume that a purely data-driven system cannot fulfill this goal, especially in early stages of exploration when few signatures are available, and that any effective solution to those issues includes some knowledge-based component.

The literature of knowledge-based systems in geosciences dates back to the expert systems of the late 70s (Hart et al., 1978; Miller, 1810; Biswas et al., 1990; Miller, 1995); however, most past attempts to characterize CGMs have concentrated their features around issues (2) and (3) above, and less so around issue (1). These systems are capable of sophisticated uncertainty quantification and reasoning about conceptual scenarios (Smith and Shafer, 1976), but their implementations are to a great extent limited to a linear form of reasoning, i.e. a sequence of questions is triggered by the system on the basis of previous answers provided by the user, towards the most likely scenario of the CGM. This linear form of reasoning was appropriate for the technology of past decades, with fewer computational resources and a lack of efficient inference algorithms. However, the geosciences community soon noticed that it often constrained experts who wished to investigate paths of evidence that were not offered by the system, or who wished to maintain multiple hypotheses or scenarios of the CGM running in parallel.

The first knowledge-based system in the geosciences that we are aware of is called PROSPECTOR (Hart et al., 1978). With a focus on mineral deposit characterization, the system directed questions to users about the presence or absence of particular types of minerals and attributes deemed relevant for deposit characterization. Users answered these questions using a [−5, −4, …, 0, …, +4, +5] scale that represented strong evidence for presence (+5) or strong evidence for absence (−5) of the attribute. These answers were then propagated through the system using Bayes’ rule and other logical constraints, leading to the most likely scenario of the mineral deposit. PROSPECTOR was developed in the late 70s and inspired various other expert systems in geosciences, including its successor muPETROL (Miller) for sedimentary basin characterization. Another knowledge-based expert system developed after PROSPECTOR, the PLAYMAKER (Biswas et al., 1990) module of the eXpert eXplorer (XX) system, characterizes petroleum plays, prospects, and reservoirs based on a similar workflow. Questions are asked about geological attributes of a body of rock, and geological rules are triggered based on users’ answers.

Each geological rule in PLAYMAKER has an associated belief that is propagated with Dempster-Shafer evidence theory (Smith and Shafer, 1976). The system is capable of propagating beliefs over hierarchical attributes in a knowledge tree and is, perhaps, the first belief-based expert system in the geosciences with this feature. In spite of its advanced evidence theory, PLAYMAKER suffers from the same linear form of reasoning as previous implementations. The necessity to hard-code beliefs for all geological rules is a major limitation of the approach, which can only infer new geological facts based on a pre-established set of questionable causal rules. Similar to PLAYMAKER, GEOPLAY (Miller, 1995) was also developed to address the problem of play and reservoir characterization. Unlike the previous approaches, however, GEOPLAY was implemented on top of a general development tool for expert systems called MAHOGANY. The tool allows experts to assign certainties to rules and to data, as well as to their answers to questions. This feature was deemed useful by the creators of GEOPLAY, who wanted to encode various types of uncertainty in the system, including uncertainty in geophysical measurements, uncertainty in geological rules and heuristics, and uncertainty in geological interpretation. More recently, knowledge-based systems have been developed for various tasks in petroleum exploration such as 3D geological modeling (Janssens-Coron et al., 2010; Shahbazi et al., 2020) and petrographic analysis (Abel et al., 2004; Carbonera et al., 2015). Among the many concerns addressed by the literature, one of the main ones is the challenging process of knowledge acquisition and representation.

For example, the PetroGrapher system (Abel et al., 2004) and other derived works (Carbonera et al., 2015) employ cognitive science theories (e.g. visual chunking) and ontology modeling techniques to orient, standardize, manage, retrieve, and process petrographic information, including visual knowledge. Skjæveland et al. (2018) demonstrated the use of tables as an acquisition tool for extensive ontologies in offshore platform design. Despite these advances, the issue of knowledge acquisition and representation is still relevant (Gil et al., 2018). The laborious process of gathering, structuring and inserting domain knowledge into an expert system is prone to errors, and can lead to conflicting hypotheses upon use of the knowledge, i.e. the system can reach a state of knowledge that is known by experts to be impossible according to physical understanding or additional knowledge not built into the system. 

The more flexible the design and the more complex the inference rules, the more difficult it becomes to calibrate an expert system for practical use. Furthermore, knowledge acquisition is particularly challenging in probabilistic settings, given that this knowledge is hard to externalize and calibrate (O’Hagan et al., 2006). In this work, we propose a probabilistic knowledge-based system that reasons through a collaboratively constructed set of signatures to infer conceptual geological scenarios based on geoscientific knowledge acquired mainly from domain experts. More specifically, our contribution is many-fold:

- A formalization of the CGM characterization problem (section 2).
- A Bayesian framework to address the problem, together with a set of modeling assumptions guided by cognitive constraints inherent in the acquisition of probabilistic geoscientific knowledge (section 3).
- An information-theoretic approach to accelerate uncertainty reduction about CGM scenarios through the recommendation of exploration activities that maximize information gain (section 4).
- A table-based knowledge acquisition tool, namely knowledge templates, that allows experts themselves to build a knowledge base to support the Bayesian framework for CGM characterization (section 5).
- A case study with data from a real exploration program to illustrate how the proposed framework can be used in practice (section 6).

The workflow in Fig. 2 gives an overview of the main elements of the system. We describe these elements in the following sections.

2. Characterization of conceptual geological models







In spite of the dense notation, the probabilistic framework described in this section is very simple. It relies on interconnected Bayesian networks and on features aggregated over a given volume of rock. Extensions of this framework to highly heterogeneous media are not considered in this work, but should be possible with methods from geostatistics literature that model geospatial probabilities. Besides serving for reasoning about CGM scenarios, the framework can be extended to perform knowledge-driven recommendations as described in the next section.


4. Knowledge-driven uncertainty mitigation

In this section, we exploit the probabilistic model introduced in section 3 to mitigate uncertainty about conceptual geological scenarios. Because the model was built with expert knowledge, we can utilize this knowledge to identify which features are most informative at any given exploration stage. We consider an information-theoretic approach in which we estimate the gain of information that a feature would bring to the conceptual understanding of the object under study. The expected information gain of a feature fj is the decrease in entropy from a state without the feature observed (i.e. prior entropy) to any other state with the feature observed. Formally, it is defined as:

IG(f_j) = H(C) - E_{f_j}[ H(C \mid f_j) ] = H(C) - \sum_{v} P(f_j = v) \, H(C \mid f_j = v)

where H(C) is the entropy of the prior distribution over conceptual scenarios and H(C | f_j = v) is the entropy of the posterior distribution after observing the value v of the feature.
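For concreteness, the expected information gain above can be computed in a few lines of code. The following is a minimal sketch, not the system's implementation; the function names and the plain-dictionary representation of distributions over discrete scenarios and feature values are ours.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a dict."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def expected_information_gain(prior, likelihood):
    """Expected information gain of observing a feature f_j.

    prior: dict scenario -> P(C = scenario)
    likelihood: dict scenario -> dict value -> P(f_j = value | C = scenario)
    """
    # Marginal distribution of the feature: P(v) = sum_k P(v | C_k) P(C_k)
    values = {v for dist in likelihood.values() for v in dist}
    marginal = {v: sum(likelihood[k].get(v, 0.0) * prior[k] for k in prior)
                for v in values}
    # Posterior entropy averaged over feature values
    expected_posterior = 0.0
    for v, pv in marginal.items():
        if pv == 0:
            continue
        posterior = {k: likelihood[k].get(v, 0.0) * prior[k] / pv for k in prior}
        expected_posterior += pv * entropy(posterior)
    return entropy(prior) - expected_posterior
```

A feature whose likelihood is identical across scenarios yields zero gain; the more the per-scenario likelihoods differ, the closer the gain approaches the prior entropy.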
5. Probabilistic knowledge acquisition and representation

Our Bayesian framework is supported by a knowledge base (KB) that specifies the domain and probabilistic knowledge necessary for CGM characterization (see Fig. 2). It is structured as a collection of ontologies capturing core domain knowledge, extended with task-specific terminology and probability measures. These were acquired from experts by use of knowledge templates, a knowledge acquisition technique based on input tables. In the following sections, we give a brief description of the knowledge base itself and focus on how knowledge templates are used.


5.1. The knowledge base

Our knowledge base consists of a collection of ontologies capturing general domain concepts and relations. They were specified as lightweight description logic theories (Baader et al., 2003). Probabilistic knowledge in the form of conditional probabilities has been implemented as reified entities linked to domain entities. Punning has been used in cases where probabilities refer to domain types. These ontologies were coded in a Hyperknowledge Base (HKB). HKB is a knowledge graph database for the management of multi-modal knowledge graphs (Moreno et al., 2017). It provides contextualized hypergraphs as its base representation formalism, allowing for the specification and querying of modularized ontologies. Description logic constructs used to build the ontologies were mapped to the hypergraph representation using specific algorithms provided by Hyperknowledge. User data about use cases is also kept in the knowledge graph and structured by the ontology. The data structures needed for Bayesian reasoning and other inference tasks are constructed by custom algorithms that query the KB.

5.2. Knowledge templates

Acquiring geoscientific knowledge from different sources can be expensive and time consuming. We propose a simple acquisition process based on three main steps: vocabulary acquisition, probability acquisition, and model assessment. Vocabulary acquisition aims at acquiring a list of features, submodel scenarios, confidence questions, and exploration activities. Probability acquisition, in turn, aims at acquiring signatures in the form of probabilities for the values of each feature in each submodel scenario. These probabilities are then assessed against a collection of synthetic test cases in the model assessment step. The process is iterative, with numerous feedback cycles. In order to facilitate acquisition, we rely on the notion of knowledge templates. A knowledge template is a tabular structure specifying a pattern of attributes of one or more entities, such that each column specifies an element of the pattern and each line corresponds to a particular instance of the pattern. Tabular representations have been used in the literature to facilitate knowledge acquisition for populating ontologies (Skjæveland et al., 2018), as well as to facilitate the mapping from acquired knowledge to ontology patterns in knowledge bases. They restrict the ways in which attributes and relationships can be specified among entities, guiding specification. Our goal is to have a structure that simplifies the acquisition process to a point in which experts themselves can insert their domain knowledge into the system. In particular, we will see that the tabular format of knowledge templates allows a clear view of the probabilities required for probabilistic reasoning. In our implementation, each knowledge template table is associated with an algorithm that maps each of its rows to a set of ontology entities in the knowledge base. As a brief example of how this mapping is made, consider a system that depends on the simple ontology pattern shown in Fig. 5a. In this ontology, a geological object can be characterized by a variety of properties (e.g. grain size) with a restricted collection of values (e.g. sandstone, conglomerate, etc.). In order to be used (e.g. in CGM inference), this ontology must be extended with a collection of relevant geological properties and values that can be used to describe geological objects.

This task can be carried out by experts using knowledge templates. For that, we develop a knowledge template (Fig. 5b) with columns for properties and values, such that each line corresponds to an instantiation of the pattern in the ontology. After experts fill the template, it is ingested into the KB by an algorithm specific to that template, extending the KB (Fig. 5c). These algorithms also perform template validation and consistency checks, the results of which are reported to the user. Template format and construction are especially relevant. In our experience, they became the experts’ main view of the knowledge base, more so than the underlying ontology. For that reason, in the following subsections, we describe the different types of knowledge templates proposed in this work, and display excerpts from actual templates used in the case study of section 6. We also briefly illustrate an assessment that was performed multiple times during the acquisition of these templates.
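As a hypothetical sketch of the ingestion step, the routine below maps rows of a property/value template like Fig. 5b to ontology assertions and flags invalid or duplicated rows. The triple vocabulary (`rdf:type`, `hasProperty`) and all names are illustrative assumptions, not the actual HKB mapping algorithm.

```python
def ingest_property_template(rows):
    """Map rows of a (property, value) knowledge template to ontology
    assertions, with basic validation. Names here are illustrative.

    rows: list of dicts with keys "property" and "value".
    Returns (triples, errors).
    """
    triples, errors = [], []
    seen = set()
    for i, row in enumerate(rows, start=1):
        prop, value = row.get("property"), row.get("value")
        if not prop or not value:
            errors.append(f"row {i}: missing property or value")
            continue
        if (prop, value) in seen:  # consistency check: no duplicate pairs
            errors.append(f"row {i}: duplicate ({prop}, {value})")
            continue
        seen.add((prop, value))
        triples.append((prop, "rdf:type", "GeologicalProperty"))
        triples.append((value, "hasProperty", prop))
    return triples, errors
```

Reporting the collected errors back to the expert, rather than aborting on the first one, mirrors the feedback cycles of the acquisition process.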

5.2.1. Model template

A model template captures the features (or properties) and scenarios of a conceptual submodel along with the probabilities for the feature values for each scenario as depicted in Table 1. The original model template, from which this excerpt was extracted, comprises 44 properties and 20 different conceptual scenarios. In this tabular structure, the data source column refers to the data from which the property was obtained. In this excerpt, all properties have been obtained from “Core” samples. The columns property comment and value comment are used to store additional comments by experts about the meaning of a property and/or value. The property column refers to the property itself that can be observed within this submodel. The single valued column indicates if the property is single or multivalued. A single-valued property can only assume a single value from the corresponding list of values in the template, whereas a multi-valued property can assume multiple values simultaneously. Finally, the domain column refers to the range of values of the property, which can be shared across different properties. 

For example, the properties “Reservoir colour main” and “Reservoir colour secondary” can assume the same set of values indicated by the “Reservoir colour” domain. All these columns are related to the vocabulary of terms instantiated in the underlying knowledge graph. Besides the columns with vocabulary, the excerpt includes columns highlighted in green, which refer to conceptual scenarios of the submodel. Each green column is filled by experts with probability tokens such as Unlikely and Highly likely that are mapped by the tool to actual probability values. After an empirical evaluation and discussions with experts, we arbitrarily settled on the following tokens and values: Inevitable (0.9999), Highly likely (0.9), Likely (0.7), Possible (0.5), Unlikely (0.3), Highly unlikely (0.1), Impossible (0.0001). These columns with probabilities correspond to Equation (3), and represent the likelihood of a feature fj in a given conceptual scenario Ck. If the property is single-valued, the probabilities of its values are simply normalized to sum up to one. On the other hand, if the property is multi-valued, it is split by the system into multiple binary single-valued properties, one for each of the values.
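The token-to-probability mapping and the single/multi-valued handling can be sketched as follows. The token values are the ones listed above; the function name and data layout are our own, not the tool's actual code.

```python
TOKEN_PROBABILITY = {  # tokens and values from the model template
    "Inevitable": 0.9999, "Highly likely": 0.9, "Likely": 0.7,
    "Possible": 0.5, "Unlikely": 0.3, "Highly unlikely": 0.1,
    "Impossible": 0.0001,
}

def scenario_likelihood(tokens, single_valued=True):
    """Convert probability tokens for a property's values into likelihoods.

    tokens: dict value -> token string, for one scenario column.
    For a single-valued property the values are normalized to sum to one;
    a multi-valued property is split into independent binary properties.
    """
    raw = {value: TOKEN_PROBABILITY[t] for value, t in tokens.items()}
    if single_valued:
        total = sum(raw.values())
        return {v: p / total for v, p in raw.items()}
    # one binary (present/absent) distribution per value
    return {v: {"present": p, "absent": 1.0 - p} for v, p in raw.items()}
```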

5.2.2. Model-model template

A model-model template captures the likelihoods of scenarios of a submodel given scenarios of another submodel. It is used to connect the corresponding Bayesian networks for joint reasoning, as depicted in Table 2. In this excerpt, we see that the structural (ST) submodel has three scenarios and that the sedimentary environment (SE) has four scenarios. The probability that ST is Lithological+Stratigraphic given that SE is Meandering river is specified with the Highly likely probability token. With this specific tabular structure, we assume that a submodel can only have a single submodel as a parent. For multiple parents, a different tabular structure would have to be proposed, but we do not consider this complexity.
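Under the single-parent assumption, connecting two submodels amounts to a conditional probability table. The sketch below (illustrative names and numbers, not the system's code) marginalizes a child submodel's scenarios over its parent's distribution:

```python
def marginalize_child(parent_prior, cpt):
    """Marginal distribution of a child submodel given its single parent.

    parent_prior: dict parent_scenario -> probability
    cpt: dict parent_scenario -> dict child_scenario -> P(child | parent)
    """
    child = {}
    for pk, pp in parent_prior.items():
        for ck, pc in cpt[pk].items():
            # law of total probability: P(child) = sum_p P(child | p) P(p)
            child[ck] = child.get(ck, 0.0) + pp * pc
    return child
```

For instance, with SE scenarios Meandering river (0.6) and Delta (0.4) and a hypothetical CPT for ST, the resulting ST marginal combines the rows of Table 2 weighted by the SE distribution.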

5.2.3. Activity template

An activity template captures activities of data acquisition, data transformation and feature interpretation that can be performed during exploration. Conceptually, we can order these activities as chains such that each activity chain is an end-to-end procedure for feature extraction, as depicted in Table 3 and Table 4. In Table 3, the column Activity refers to the name of the acquisition/processing activity. The columns Execution Mode ID and Execution Mode refer to the different execution modes of the activity, which are variations of the activity with different levels of confidence in the results. In this excerpt, only one execution mode is listed for each activity, but more generally, an activity can be performed more quickly with less accuracy as a trade-off. The column Affected Question contains the list of questions related to the activity that are used to compute the confidence of features in activity chains. Finally, the column Update Answer To contains the answers to the affected questions when the execution mode of the activity is performed. For example, consider the first line of the table, where the execution of 3D Seismic Acquisition with 75% Coverage would lead to SeismicSurvey=3D, SeismicQuality=High, and SeismicCoverage=75%. In Table 4, the column Property Categories indicates which properties from the model templates can be produced by the analysis. The column Related Question contains the list of questions whose answers can be affected by the analysis. The column Required Data Questions indicates which questions need to be answered before the analysis can be performed. For instance, an analysis can only be performed on a data source if the data source is available and satisfies certain criteria. The columns Execution Mode ID, Execution Mode, Affected Questions and Update Answer To have the same meaning as in the previous template.

5.2.4. Question template

A question template captures the questions about the data and interpreted features, and the possible answers that an expert can give, as depicted in Table 5. The column Question ID refers to the unique identifier of the question used in other templates. The column Question contains the actual question shown by the system to the user. The column Answer contains the possible answers to the question, plus a special token “Absent” for when the question is not answered. Finally, the column Confidence refers to a contribution between 0 and 1 to the overall confidence when the question is answered in an activity chain.

5.2.5. Recommendation template

Having a list of activities and their execution modes captured in the activity templates of subsubsection 5.2.3, and a list of questions captured in the question template of subsubsection 5.2.4, we create a list of activity chains to be considered for recommendation by the system, as depicted in Table 6. The column Chain refers to an activity chain, a sequence of identifiers for execution modes of the different activities. The column Comment contains a description or mnemonic for the chain. As explained in section 4, the system recommends activity chains based on information gain. The information gain is the difference between the prior entropy based on available features and answers, and the posterior entropy assuming that a chain has been executed.

To compute the prior entropy from an initial set of available features and answers, the system queries the current confidence of the different features obtained via specific execution modes. This is done by querying the related questions of each activity in a given chain, and averaging the confidence contribution of the available answers. Because each chain ends with an analysis activity, the system can also query the features to which the confidence value should be assigned. With confidence values assigned to each available feature, the system can compute the distribution of CGM scenarios given features with confidence, and consequently compute the entropy of this distribution. To compute the posterior entropy, the system assumes that a given chain has been executed, and collects the corresponding features and answers to the questions. Together with the previous features and answers, the system can compute a new entropy for the posterior distribution of CGM scenarios given features with confidences. The difference in entropy is the information gain of the chain under consideration. This process is repeated for each chain in Table 6, and the chain with maximum information gain is returned as the final recommendation by the system (see Fig. 6).
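The recommendation loop described above reduces to a greedy maximization of information gain over the candidate chains. In this sketch, `posterior_entropy` stands in for the full pipeline that simulates a chain's execution and recomputes the entropy of CGM scenarios; all names are illustrative.

```python
def recommend_chain(chains, prior_entropy, posterior_entropy):
    """Greedy recommendation: pick the activity chain with maximum
    information gain.

    chains: iterable of chain identifiers.
    prior_entropy: entropy of CGM scenarios given current features/answers.
    posterior_entropy: callable chain -> entropy after simulating the chain.
    """
    best_chain, best_gain = None, float("-inf")
    for chain in chains:
        gain = prior_entropy - posterior_entropy(chain)
        if gain > best_gain:
            best_chain, best_gain = chain, gain
    return best_chain, best_gain
```

Cost or time constraints, noted in the conclusions as future work, would turn this loop into a constrained optimization rather than an unconstrained argmax.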

5.3. Knowledge assessment

Knowledge acquisition is an iterative process in which the concepts explained in the previous sections are refined through repeated assessment. This iteration is important to produce probability distributions that actually represent the studied phenomena and to reach an agreement on vocabulary. The assessment performed in this project is divided into two approaches. The first approach is expert-dependent, whereas the second is not. In the first approach, domain experts are presented with prototypical cases and empirically compare the results of the system with their own expectations. In particular, they are presented with a set of features from these prototypical cases, and compare the resulting distribution of CGM scenarios with the distribution they would have expected. In the second approach, we check if the features provided to the system are discriminative of the CGM scenarios. This check consists of two steps. In the first step, the system performs Monte Carlo simulation using the probabilities provided in the model templates of subsubsection 5.2.1 and the model-model templates of subsubsection 5.2.2. This simulation leads to hundreds of synthetic cases with known CGM scenarios. In the second step, the system hides these scenarios and tries to recover each of them as the most likely scenario based on a subset of the features. The result is a confusion matrix like the one shown in Fig. 7 for 1000 cases, where the diagonal entries indicate the cases for which the system was able to recover the correct scenario.
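The expert-independent assessment can be sketched as follows. Here `sample_features` and `classify` stand in for sampling features from the template probabilities and for the Bayesian inference of the most likely scenario, respectively; this is an illustrative skeleton with assumed names, not the actual system code.

```python
import random

def assess_knowledge(scenarios, sample_features, classify, n_cases=1000, seed=0):
    """Monte Carlo assessment sketch: simulate synthetic cases with known
    scenarios, recover them from features, and build a confusion matrix.

    sample_features(scenario, rng) draws features given the true scenario;
    classify(features) returns the most likely scenario.
    """
    rng = random.Random(seed)
    confusion = {s: {t: 0 for t in scenarios} for s in scenarios}
    for _ in range(n_cases):
        true = rng.choice(scenarios)          # hidden ground-truth scenario
        predicted = classify(sample_features(true, rng))
        confusion[true][predicted] += 1       # diagonal = correct recoveries
    return confusion
```

A strongly diagonal matrix indicates discriminative features; heavy off-diagonal mass points the experts at scenario pairs whose template probabilities need refinement.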


6. Case study

In this section, we share results of the proposed framework on a real exploration case provided by the same experts who helped construct the knowledge and recommendation activities. First, we report the reasoning process that was followed during the exploration program, which started in 2012 with the acquisition of an exploration block on the basis of regional information, analogues, and 2D seismic data. Then, we compare this reasoning process qualitatively with the proposed framework by showing the posterior distribution of CGM scenarios at each step (or year) of the exploration program. Finally, we share recommendation results based on a greedy strategy that attempts to maximize information gain without constraints, and discuss how this strategy aligns well with the historical decisions that were made in this exploration case.

6.1. Historical case

Year 2012
From 2D seismic, experts interpreted a structural trap and performed an initial assessment of resources. From nearby analogues (see Fig. 8), experts associated deposits to distributary mouth bars of a fluvial delta, characterized by good reservoir properties and large trap sizes (assessment step 1 in Fig. 9).

Year 2013
An exploration well was drilled and corroborated the expected thickness of the reservoir given the most likely CGM scenario considered in the previous assessment step. The size of the structures identified in the seismic did not change, and by the end of the year the estimated amount of reserves was defended in the state reserves committee (assessment step 2).



Year 2015
3D seismic was acquired in parallel with additional drilling of various production wells. The production wells penetrated a thickness much lower than the expected thickness from the previous steps. The CGM scenario had to be revisited in assessment step 3. Additional interpretation on 3D seismic revealed the presence of channel-like structures, and the most likely scenario was updated to deltaic channel deposits (assessment step 4). The estimated amount of reserves reduced significantly.

6.2. Framework results

Year 2012
Based on the same features derived from regional information, analogues, and 2D seismic data, the PGM with Sedimentary Environment and Trap Type sub-models resulted in large uncertainties (see Fig. 10). The posterior distribution illustrates that one cannot safely discard scenarios that were not considered in the original exploration program, such as transitional sedimentary environments including deltas, estuaries and coastal environments.

Year 2013
Features interpreted in core data, well tests, and well logs became available after the first exploration well was drilled. Based on these additional features, the system reveals that some of the CGM scenarios from the previous step are unlikely. Particularly, the uncertainty is considerably reduced when core data features are taken into account (see Fig. 11). The posterior distribution is concentrated around the actual conceptual scenario (known today to be tide-dominated delta) and the company would have been able to reassess the reserves without additional drilling of production wells in areas of low thickness.


Year 2015
For comparison with the original exploration program, we consider features interpreted in production wells that were drilled based on the incorrect CGM scenario, and features interpreted in the 3D seismic data also acquired during this year. The two sets of features support the tide-dominated delta scenario, and uncertainty is further reduced as depicted in Fig. 12. For this specific exploration case, the application of the proposed framework could have helped experts identify other likely CGM scenarios that were not considered initially. The posterior distribution of CGM scenarios in 2013, when the first exploration well was drilled, could potentially have changed future decisions and forced the exploration team to reassess their original hypotheses, hypotheses which led to money losses in 2015 with the drilling of non-productive wells and the acquisition of 3D seismic data. In Fig. 13, we illustrate the uncertainty reduction throughout the exploration program. We rank the CGM scenarios according to their posterior probabilities assigned by the PGM at each step (or year) of the program, and observe that the corresponding quantiles of the distribution decrease, i.e. there are fewer scenarios with more probability as the years pass. Additionally, we observe that most of the CGM scenarios included in the P90 range (i.e. the set of most likely scenarios that sum up to 90% probability) are preserved at each step when more features become available, which indicates that the set of features is coherent, and that the posterior distribution is approximated reasonably well on the basis of these features.



Going back to the first step of the exploration program in 2012, before sub-optimal decisions were made, the exploration team could have validated their next exploration activity by assessing the information gain of the alternatives, as illustrated in Fig. 14. The activity with the largest information gain at that moment consisted of interpreting 3D seismic data and analyzing core data. Given that 3D seismic was not available and would be expensive to acquire, the next activity with the largest information gain consisted of just analyzing core data. This second activity in the list was the one chosen historically, given the time and cost constraints of the exploration program. Although the framework was able to rank activities in a satisfactory manner, we emphasize that the full potential of this recommendation system can only be achieved with more refined activities, as opposed to macro activities, and with activities that consider quick interpretations with low-to-medium confidence on features.

7. Conclusions

In this work, we present a probabilistic knowledge-based framework for the characterization of conceptual geological models. The framework consists of (1) a probabilistic graphical model, instantiated automatically from knowledge templates, and (2) a recommendation system, designed to mitigate uncertainty via information gain. Based on a real exploration case, we demonstrate the value of the framework qualitatively by depicting the posterior probability of conceptual geological model scenarios throughout the exploration program, and by comparing it with the historical understanding of the subsurface from the viewpoint of domain experts who participated in the exploration case. The framework results are promising and align well with experts' expectations given the knowledge that was acquired. They also highlight the importance of more advanced recommendation strategies that are not solely based on the unconstrained maximization of information gain. Future work could consider more robust methodologies for assessment and refinement of acquired knowledge, more sophisticated probability aggregation methods for composition of multiple conceptual submodels, and efficient optimization algorithms for knowledge-driven recommendations with cost and time constraints. Additionally, extensions to knowledge templates could be proposed to accommodate multiple submodel parents and continuous features, among other limitations of the tabular layout presented in this initial work.

Credit author statement

Júlio Hoffimann: Conceptualization, Methodology, Software, Formal analysis, Investigation, Validation, Writing – original draft, Visualization; Sandro Rama Fiorini: Methodology, Software, Investigation, Validation, Writing – review & editing, Data curation; Breno de Carvalho: Methodology, Software, Formal analysis, Investigation, Validation, Writing – review & editing, Data curation; Andres Codas: Methodology, Software, Investigation, Validation, Writing – review & editing, Data curation; Carlos Raoni: Methodology, Software, Investigation, Validation, Writing – review & editing, Data curation; Bianca Zadrozny: Methodology, Investigation, Validation, Supervision; Rogerio de Paula: Methodology, Project administration; Oksana Popova: Methodology, Investigation, Validation, Data curation; Maxim Mityaev: Methodology, Investigation, Validation, Data curation; Irina Shishmanidi: Methodology, Investigation, Validation, Data curation; Anastasiia Gorelova: Methodology, Investigation, Validation, Data curation; Zoya Filippova: Methodology, Validation, Data curation, Visualization, Project administration.


Acknowledgments

The authors acknowledge the leadership of the Cognitive Geologist project, a research collaboration between IBM Research Brazil and Gazprom Neft.


References

Abel, M., Silva, L.A., De Ros, L.F., Mastella, L.S., Campbell, J.A., Novello, T., 2004. PetroGrapher: managing petrographic data and knowledge using an intelligent database application. Expert Syst. Appl. 26 (1), 9–18.

Baader, F., Calvanese, D., McGuinness, D., Patel-Schneider, P., Nardi, D., et al., 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.

Baddeley, M.C., Curtis, A., Wood, R., 2004. An introduction to prior information derived from probabilistic judgements: elicitation of knowledge, cognitive bias and herding. Geological Society Special Publication.

Biswas, G., Yu, X., Hagins, W., Bezdek, J., Strobel, J., Kendall, C., Cannon, R., 1990. PLAYMAKER: a knowledge-based approach to characterizing hydrocarbon plays. Int. J. Pattern Recognit. Artif. Intell.

Bond, C.E., Gibbs, A.D., Shipton, Z.K., Jones, S., 2007. What do you think this is? "Conceptual uncertainty" in geoscience interpretation. GSA Today.

Bond, C.E., Johnson, G., Ellis, J.F., 2015. Structural model creation: the impact of data type and creative space on geological reasoning and interpretation. Geological Society Special Publication.

Bowden, R.A., 2004. Building confidence in geological models. Geological Society Special Publication.

Boyd, R., et al., 1992. Classification of clastic coastal depositional environments. Sediment. Geol. 80 (3–4).

Carbonera, J.L., Abel, M., Scherer, C.M., 2015. Visual interpretation of events in petroleum exploration: an approach supported by well-founded ontologies. Expert Syst. Appl. 42 (5), 2749–2763.

Cover, T.M., Thomas, J.A., 2005. Elements of Information Theory. Wiley. https://doi.org/10.1002/047174882X.

Gil, Y., Pierce, S.A., Babaie, H., Banerjee, A., Borne, K., Bust, G., Cheatham, M., Ebert-Uphoff, I., Gomes, C., Hill, M., Horel, J., Hsu, L., Kinter, J., Knoblock, C., Krum, D., Kumar, V., Lermusiaux, P., Liu, Y., North, C., Pankratius, V., Peters, S., Plale, B., Pope, A., Ravela, S., Restrepo, J., Ridley, A., Samet, H., Shekhar, S., Skinner, K., Smyth, P., Tikoff, B., Yarmey, L., Zhang, J., 2018. Intelligent systems for geosciences: an essential research agenda. Commun. ACM 62 (1), 76–84.

Hart, P.E., Duda, R.O., Einaudi, M.T., 1978. PROSPECTOR - a computer-based consultation system for mineral exploration. J. Int. Assoc. Math. Geol.

Hoffimann, J., Scheidt, C., Barfod, A., Caers, J., 2017. Stochastic simulation by image quilting of process-based geological models. Comput. Geosci. 106, 18–32. https://doi.org/10.1016/j.cageo.2017.05.012.

Janssens-Coron, E., Pouliot, J., Moulin, B., Rivera, A., 2010. An experimentation of expert systems applied to 3D geological models construction. In: Developments in 3D Geo-Information Sciences. Springer, pp. 71–91.

Journel, A.G., 2002. Combining knowledge from diverse sources: an alternative to traditional data independence hypotheses. Math. Geol. 34 (5), 573–596.

Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Krishnan, S., 2008. The tau model for data redundancy and information combination in earth sciences: theory and application. Math. Geosci. 40 (6), 705–727.

Miller, B.M., 1987. The muPETROL Expert System for Classifying World Sedimentary Basins. US Geological Survey Bulletin 1810.

Miller, B.M., 1995. GEOPLAY, a Knowledge-Based Expert System - a Model for Exploration Play Analysis. US Geological Survey Bulletin.

Moreno, M.F., Brandao, R., Cerqueira, R., 2017. Extending hypermedia conceptual models to support hyperknowledge specifications. Int. J. Semantic Comput. 11 (1), 43–64. https://doi.org/10.1142/S1793351X17400037.

O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J., Oakley, J.E., Rakow, T., 2006. Uncertain Judgements: Eliciting Experts' Probabilities. John Wiley & Sons.

Polson, D., Curtis, A., 2010. Dynamics of uncertainty in geological interpretation. J. Geol. Soc.

Shahbazi, A., Monfared, M.S., Thiruchelvam, V., Ka Fei, T., Babasafari, A.A., 2020. Integration of knowledge-based seismic inversion and sedimentological investigations for heterogeneous reservoir. J. Asian Earth Sci. 202, 104541.

Skjæveland, M.G., Lupp, D.P., Karlsen, L.H., Forssell, H., 2018. Practical ontology pattern instantiation, discovery, and maintenance with reasonable ontology templates. In: Vrandecic, D., Bontcheva, K., Suarez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L.-A., Simperl, E. (Eds.), The Semantic Web – ISWC 2018. Springer International Publishing, Cham, pp. 477–494.

Smith, A.F.M., Shafer, G., 1976. A mathematical theory of evidence. Biometrics.

Tarantola, A., 2004. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM: Society for Industrial and Applied Mathematics.
