Differences
Next revision | Previous revision | ||
pub:projects:pacmel:start [2020/05/01 14:17] – created gjn | pub:projects:pacmel:start [2022/06/26 15:46] (current) – gjn | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ~~NOTOC~~ | + | |
====== PACMEL ====== | ====== PACMEL ====== | ||
Line 10: | Line 10: | ||
* **Partners: | * **Partners: | ||
* **Start time:** 01.04.2019 | * **Start time:** 01.04.2019 | ||
- | * **Duration: | + | |
- | * **www:** [[https:// | + | |
+ | * **www:** [[http:// | ||
+ | * **chistera | ||
</ | </ | ||
===== Motivation ===== | ===== Motivation ===== | ||
- | Nowadays great attention is paid to the Industry 4.0 concept, whose central idea is the exploitation of large amounts of data generated by different kinds of sensors, to enact highly automatized, | + | Nowadays great attention is paid to the Industry 4.0 concept, whose central idea is the exploitation of large amounts of data generated by different kinds of sensors, to enact highly automatized, |
- | ===== Intended results | + | ===== Project structure |
- | The main challenge tackled by the PACMEL project | + | ==== Consortium ==== |
- | interpretation and use of sensor data in factories | + | PACMEL |
- | This challenge is especially important in the classic production plants | + | - AGH, Poland (AGH University of Science |
- | transitioning to Industry 3.0 and later to Industry 4.0. In such facilities, although it is common | + | - UNIBZ, Italy (Free University of Bozen-Bolzano), PI: Diego Calvanese, |
- | to encounter both low-level sensor networks and high-level BP management | + | - UPM, Spain (Universidad Politecnica de Madrid), PI: David Camacho. |
- | there is still a semantic gap between the low-level sensor readings and the high-level BP | + | |
- | models. | + | |
- | Specifically, in PACMEL we will address the following research questions: | + | |
- | - How can we extract relevant events from different dataset formats (unstructured/ | + | ==== Work packages ==== |
- | - How can we combine relevant events to discover complex business processes operating in a networked ecosystem? | + | The workplan included 6 work packages, further partitioned into tasks, briefly described below. |
- | - How can we efficiently support process analysis and modeling for BPM purposes using sensor data-based event logs? | + | |
- | To address these questions, we will combine knowledge extraction techniques with semantic | + | **WP1: Identification of requirements for smart factories, 9 m. (M1-9), Led by: AGH** |
- | technologies, | + | The first objective of this WP was the analysis of case studies with respect to Industry 4.0/Smart Factory to identify the industrial requirements that would guide the development of a general process-aware analytics framework. To this end, an exploratory analysis of the related |
- | a general process-aware analytics framework | + | |
- | domains. | + | |
- | ===== Related | + | **WP2: Knowledge extraction and data mining, 12 months (M4-15), Led by: UPM** |
+ | The main objectives of this WP were the study of dimension reduction for the considered data sets and the selection of the appropriate granulation and abstraction level for the analysis of sensor readings. Data analysis and filtering were used to address existing data quality problems, such as missing values and data redundancy, and to identify sources of unique states of factory machines/ | ||
+ | |||
+ | **WP3: Ontology driven interpretation, | ||
+ | The objective of this WP was to provide a conceptual and technological framework for the interpretation of the data produced as a result of WP2, in terms of the semantically meaningful elements that constitute the knowledge about the domain of interest. Such knowledge, suitably encoded in an ontology, would provide the basis for addressing a number of fundamental problems. Work on extending the ontology-based data access (OBDA) paradigm (with its techniques for efficient automatic query transformation and evaluation) to deal with the type of data at hand was planned. | ||
+ | |||
+ | **WP4: Process-aware analytics framework, 15 months (M9-M23), Led by: UPM** | ||
+ | The main objectives included: (1) the provisioning of model mapping methods and visualization techniques that allow one to relate the interpreted sensor data (provided by methods in WP3) to the process models; (2) implementation of proof-of-concept software tools; and (3) the creation of feasibility studies for possible applications of the developed framework in mining as well as in other domains. | ||
+ | |||
+ | **WP5: Management, 24 months (M1-24), Led by: AGH** | ||
+ | The activities included the organization and coordination of telecons and project meetings. The setup and maintenance of an online work repository, based on a GitLab installation used by AGH, was planned. Participation in the yearly CHIST-ERA meetings was to be guaranteed, along with monitoring of the project progress and reporting. Finally, risk monitoring, management and mitigation techniques would be implemented. | ||
+ | |||
+ | **WP6: Dissemination and exploitation, | ||
+ | The overall goal of the systematic efforts in this WP was to raise awareness and to foster participation in the related scientific communities and among industry stakeholders as well as to disseminate knowledge to research teams beyond the project consortium. Activities in this WP would include: the project website setup and maintenance; | ||
+ | |||
+ | ==== Dependencies ==== | ||
+ | The dependencies between the Work Packages in the project are presented below. | ||
+ | |||
+ | {{ : | ||
+ | ===== Project team ===== | ||
+ | ==== AGH, Poland ==== | ||
+ | * Principal Investigator: | ||
+ | * Main investigator: | ||
+ | * Co-investigators: | ||
+ | * Supporting investigators: | ||
+ | * PhD Students: Maciej Szelążek, Agnieszka Trzcinkowska, | ||
+ | |||
+ | ==== UPM, Spain ==== | ||
+ | * Principal Investigator: | ||
+ | * Main investigator: | ||
+ | * Co-investigators: | ||
+ | * Students: David Montalvo | ||
+ | |||
+ | ==== UNIBZ, Italy ==== | ||
+ | * Principal Investigator: | ||
+ | * Main investigator: | ||
+ | * Co-investigators: | ||
+ | |||
+ | ===== Main results ===== | ||
+ | The project targeted four general main outcomes: | ||
+ | - Improved process mining and knowledge extraction techniques; | ||
+ | - The integration of heterogeneous data sources by means of ontologies, to support semantically enriched search for information about industrial processes and to extend knowledge modelling opportunities; | ||
+ | - Process design support through methods that recommend a process modelling notation according to industrial needs; | ||
+ | - Process monitoring support using time-series aware process mining methodologies that directly use the raw timed data from the industrial domain, and visualization tools for process-aware analytics that expand process exploratory analysis. | ||
+ | |||
+ | ==== Implementation of new advanced scientific methodologies ==== | ||
+ | Thanks to the cooperation with FAMUR and PGG (Polska Grupa Górnicza – Polish Mining Group), AGH developed a complete description of the mining use case, including the specific requirements. We built a repository of sensor data from 5 distinct longwalls in several Polish mines owned by PGG and using equipment from FAMUR. Moreover, we performed an exploratory analysis of the sensor data, which allowed us to reduce its dimensionality. Finally, we built a hierarchical formal model of the longwall shearer, as published in [3]. | ||
+ | |||
+ | In WP2, UPM and AGH designed a new approach to conformance checking by adapting its basic elements to the paradigm of time-based data and time-aware processes. Instead of event logs, time series logs are defined and used. Likewise, WF-nets as the representation of the process model are replaced by Workflow Nets with time series (TSWF-nets). This changes the perspective of conformance checking methods completely, and a new method is therefore proposed based on these ideas. These achievements may open up a new way for researchers to investigate how to adapt other families of process mining techniques, such as discovery and enhancement, | ||
+ | |||
+ | In WP4, UPM developed new model mapping methods based on raw data instead of ontologies (due to the lack of progress in WP3). Deep learning architectures were used to map the raw data to a compressed representation that can be easily visualized. A software tool called DeepVATS was created to explore and visualize cyclic patterns and outliers in raw multivariate time series data, displaying them in a 2D space suitable for a domain expert. This tool was released as an open-source platform in Q2 2022. As part of WP4, the DeepVATS tool was evaluated on different use cases, such as the longwall shearer operation in an underground coal mine. The patterns found with this tool, which are completely data-driven and unsupervised, | ||
+ | |||
+ | In 2020, the work of AGH included the implementation of the formal model for conformance checking of a longwall shearer process. This work was described in a paper published in the Energies journal [3]. The approach uses place-transition Petri nets with inhibitor arcs for modelling purposes. We used event log files collected from the mining use case. One of the main advantages of the approach is that it supports both offline and online analysis of the log data. In the paper we presented a detailed description of the longwall process, an original formal model we developed, and the implementation in the TINA software (http:// | ||
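
The idea of checking a log against a place-transition net with inhibitor arcs can be illustrated with a minimal sketch. It is not the model from [3] and does not use TINA: the toy net, the place and transition names, and the helper functions `enabled`, `fire` and `replay` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """A transition with ordinary input/output arcs and inhibitor arcs."""
    name: str
    inputs: dict = field(default_factory=dict)    # place -> tokens consumed
    outputs: dict = field(default_factory=dict)   # place -> tokens produced
    inhibitors: set = field(default_factory=set)  # places that must be empty


def enabled(marking, t):
    """Enabled if every input place holds enough tokens and every inhibitor place is empty."""
    has_inputs = all(marking.get(p, 0) >= n for p, n in t.inputs.items())
    not_inhibited = all(marking.get(p, 0) == 0 for p in t.inhibitors)
    return has_inputs and not_inhibited


def fire(marking, t):
    """Return the marking reached by firing t; the caller checks enabledness first."""
    new = dict(marking)
    for p, n in t.inputs.items():
        new[p] = new.get(p, 0) - n
    for p, n in t.outputs.items():
        new[p] = new.get(p, 0) + n
    return new


def replay(net, initial_marking, trace):
    """Replay events one by one; return (conforms, index of first deviating event or None)."""
    marking = dict(initial_marking)
    for i, event in enumerate(trace):
        t = net.get(event)
        if t is None or not enabled(marking, t):
            return False, i
        marking = fire(marking, t)
    return True, None


# Hypothetical toy net: a machine may start only while no fault is present.
NET = {
    "start": Transition("start", {"idle": 1}, {"cutting": 1}, {"fault"}),
    "stop": Transition("stop", {"cutting": 1}, {"idle": 1}),
    "fault_on": Transition("fault_on", {}, {"fault": 1}, {"fault"}),
    "fault_off": Transition("fault_off", {"fault": 1}, {}),
}
M0 = {"idle": 1}

print(replay(NET, M0, ["start", "stop", "start"]))  # (True, None)
print(replay(NET, M0, ["fault_on", "start"]))       # (False, 1)
```

Because `replay` consumes one event at a time and only keeps the current marking, the same loop could in principle be fed from a stored log or from a live stream, which matches the offline/online distinction mentioned above.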
+ | |||
+ | In [13] we performed a detailed survey of different explanation mechanisms and of approaches to integrating knowledge into the data mining pipeline, together with their application to real-life industrial use cases. This survey, along with first attempts to combine domain knowledge with visual analytics and data mining models, resulted in the work published in [12], where we present the Immersive Parallel Coordinates Plots (IPCP) system for Virtual Reality (VR). The system provides data science analytics built around a well-known method for visualization of multidimensional datasets in VR. The data science analytics enhancements consist of importance analysis and a number of clustering algorithms that automate work previously done manually by experts. | ||
+ | |||
+ | The results and experience from these works were the foundation for the implementation of the PACMEL framework and the tools that form it: KnAC [17], ClAMP [16] and DeepVATS [21]. | ||
+ | |||
+ | ==== Development of innovative software ==== | ||
+ | The source code of the implementation of the methodology presented in [1] is publicly available on GitHub (https:// | ||
+ | In 2019 UniBZ carried out some foundational work that is relevant for PACMEL. Specifically, | ||
+ | |||
+ | KnAC [17] is a tool for extending expert knowledge with the use of automatic clustering algorithms. It refines expert-based labelling by recommending splits and merges of expert labels, augmented with explanations. The explanations are formulated as rules and can therefore be easily interpreted and incorporated into expert knowledge. It was implemented as open-source software available at https:// | ||
+ | |||
+ | The ClAMP toolkit [16] is a mechanism that complements KnAC and in some cases can be used within it. It aims to help experts in cluster analysis with human-readable rule-based explanations. The developed state-of-the-art explanation mechanism is based on cluster prototypes represented by multidimensional bounding boxes. This makes it possible to represent arbitrarily shaped clusters and to combine the strengths of local explanations with the generality of global ones. The main goal of our work was to provide a method for cluster analysis that is agnostic with respect to the clustering method and classification algorithm and that provides explanations in an executable and human-readable form. | ||
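
How a bounding box can double as a human-readable and executable rule is shown in the following minimal sketch. It is not the ClAMP implementation: it assumes a single axis-aligned box per cluster (a strong simplification of the prototypes described above), and the function names, feature names and data are hypothetical.

```python
def bounding_boxes(points, labels):
    """Return {cluster label: [(low, high) per feature]} as axis-aligned boxes."""
    boxes = {}
    for point, label in zip(points, labels):
        if label not in boxes:
            boxes[label] = [(v, v) for v in point]
        else:
            boxes[label] = [(min(lo, v), max(hi, v))
                            for (lo, hi), v in zip(boxes[label], point)]
    return boxes


def box_to_rule(box, feature_names, label):
    """Render a box as a human-readable IF ... THEN rule."""
    conditions = [f"{lo} <= {name} <= {hi}"
                  for name, (lo, hi) in zip(feature_names, box)]
    return "IF " + " AND ".join(conditions) + f" THEN cluster = {label}"


def matches(box, point):
    """Execute the rule: True if the point lies inside the box."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, point))


# Hypothetical two-feature data with labels from any clustering method.
points = [(1.0, 10.0), (2.0, 12.0), (8.0, 30.0), (9.0, 28.0)]
labels = [0, 0, 1, 1]
features = ["current", "speed"]

boxes = bounding_boxes(points, labels)
for label, box in boxes.items():
    print(box_to_rule(box, features, label))
# IF 1.0 <= current <= 2.0 AND 10.0 <= speed <= 12.0 THEN cluster = 0
# IF 8.0 <= current <= 9.0 AND 28.0 <= speed <= 30.0 THEN cluster = 1
```

The sketch only reads cluster labels, so it does not depend on which clustering algorithm produced them. Covering an arbitrarily shaped cluster would require several boxes per cluster, which this simplified version does not attempt.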
+ | |||
+ | DeepVATS [21] is an open-source tool that supports domain experts in the analysis and understanding of time series, especially long ones, which entail a high information overload. It works by presenting the domain expert with a plot containing the projection of the latent space of a Masked Time Series Autoencoder trained to reconstruct partial views of the input dataset. | ||
+ | The overlap between the goals of DeepVATS and KnAC motivated the integration of both tools into one framework. This module is responsible for implementing the interaction layer between human users and the remaining system modules. DeepVATS provides tools for time-series analytics, dimensionality reduction, clustering and visualization, | ||
+ | |||
+ | ==== New ideas, new knowledge, new interpretative models of complex phenomena ==== | ||
+ | Working towards provisioning to domain experts, we developed a concept of explanation-driven model stacking, presented in [6]. The practical usage of explainability methods in AI (XAI) is nowadays mostly limited to the feature engineering phase of the data mining process. In our work we argue that explainability, as a property of a system, should be used along with other quality metrics such as accuracy, precision and recall in order to deliver better AI models. We developed a method that allows for weighted ML model stacking [7]. The code is made public on GitHub (https:// | ||
+ | In [18] we proposed an approach for evaluating selected imperative and declarative models to decide which model is more appropriate from a practical point of view for process monitoring. We formulated quantitative and qualitative criteria for model comparison. We then performed an analysis using the PACMEL mining case study. We used sensor data, and our approach consists of the following stages: (i) event log creation, (ii) process modelling, and (iii) process mining. We used the selected models in conformance checking tasks with a real event log. Evaluation of the created models indicated that, in the case of longwall mining, the declarative model is more appropriate for practical use. | ||
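
Stage (i), event log creation from sensor data, can be illustrated with a minimal sketch. It is not the procedure from [18]: it assumes a single numeric signal, hypothetical thresholds and state names, and it emits one event each time the discretized state changes.

```python
def discretize(value, thresholds):
    """Map a reading to a state label using ascending (upper_bound, label) pairs."""
    for upper, label in thresholds:
        if value < upper:
            return label
    return thresholds[-1][1]


def sensor_to_event_log(readings, thresholds, case_id):
    """Turn (timestamp, value) readings into events, one per state change."""
    log = []
    previous = None
    for timestamp, value in readings:
        state = discretize(value, thresholds)
        if state != previous:
            log.append({"case": case_id, "activity": state, "timestamp": timestamp})
            previous = state
    return log


# Hypothetical thresholds on a motor-current-like signal.
THRESHOLDS = [(5.0, "idle"), (50.0, "moving"), (float("inf"), "cutting")]
readings = [(0, 1.2), (1, 2.0), (2, 20.0), (3, 75.0), (4, 80.0), (5, 3.0)]

log = sensor_to_event_log(readings, THRESHOLDS, case_id="shift-1")
print([event["activity"] for event in log])
# ['idle', 'moving', 'cutting', 'idle']
```

The resulting list of case, activity and timestamp records has the shape that process modelling and conformance checking tools typically expect. In practice the choice of thresholds and of the case notion would come from domain knowledge.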
+ | |||
+ | ===== Tools ===== | ||
+ | As described above, the three main tools developed during the project are publicly available: | ||
+ | * KnAC: [[https:// | ||
+ | * ClAMP: [[https:// | ||
+ | * DeepVATS: [[https:// | ||
+ | |||
+ | ===== Public benchmark ===== | ||
+ | Representative prototypes of computational procedures with data samples were prepared, tested, stored and finally published in the project GitLab repository: | ||
+ | [[https:// | ||
+ | |||
+ | ===== Papers ===== | ||
+ | * [1] V. Rodriguez-Fernandez, | ||
+ | * [2] D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. SMT-based Verification of Data-Aware Processes: A Model-Theoretic Approach. Mathematical Structures in Computer Science. 2020. 30(3): 271-313 (2020) https:// | ||
+ | * [3] M. Szpyrka, E. Brzychczy, A. Napieraj, J. Korski, G. J. Nalepa, Conformance Checking of a Longwall Shearer Operation Based on Low-Level Events, Energies 2020, 13, 6630. https:// | ||
+ | * [4] M. Szelążek, S. Bobek, A. Gonzalez-Pardo, | ||
+ | * [5] S. Bobek, A. Trzcinkowska, | ||
+ | * [6] S. Bobek and G. J. Nalepa. Augmenting automatic clustering with expert knowledge and explanations. In Computational Science – ICCS 2021: 21st International Conference, Krakow, Poland, June 16–18, 2021, Proceedings, | ||
+ | * [7] S. Bobek, M. Mozolewski, and G. J. Nalepa. Explanation-driven model stacking. In M. Paszynski, D. Kranzlmüller, | ||
+ | * [8] F. Piccialli, F. Giampaolo, E. Prezioso, D. Camacho, and G. Acampora, “Artificial intelligence and healthcare: Forecasting of medical bookings through multi-source time-series fusion,” Information Fusion, vol. 74, pp. 1–16, Oct. 2021 https:// | ||
+ | * [9] H. Liz, M. Sánchez-Montañés, | ||
+ | * [10] J. Huertas-Tato, | ||
+ | * [11] A. I. Torre-Bastida, | ||
+ | * [12] S. Bobek, S. K. Tadeja, Struski, P. Stachura, T. Kipouros, J. Tabor, G. J. Nalepa, and P. O. Kristensson. “Virtual reality-based parallel coordinates plots enhanced with explainable AI and data-science analytics for decision-making processes.” Applied Sciences, 12(1), 2022 https:// | ||
+ | * [13] G. J. Nalepa, S. Bobek, K. Kutt, and M. Atzmueller. “Semantic data mining in ubiquitous sensing: A survey.” Sensors, 21(13), 2021. https:// | ||
+ | * [14] M. Kuk, S. Bobek and G. J. Nalepa, " | ||
+ | * [15] S. Bobek, M. Kuk, J. Brzegowski, E. Brzychczy, and G. J. Nalepa. “KnAC: an approach for enhancing cluster analysis with background knowledge and explanations.” CoRR, abs/ | ||
+ | * [16] S. Bobek, M. Kuk, G. J. Nalepa, " | ||
+ | * [17] S. Bobek, M. Kuk, J. Brzegowski, E. Brzychczy, and G. J. Nalepa. “KnAC: an approach for enhancing cluster analysis with background knowledge and explanations.” Applied Intelligence (submitted, under second round review) | ||
+ | * [18] E. Brzychczy, M. Szpyrka, J. Korski, G. J. Nalepa, Imperative vs. declarative modelling of industrial process. The case study of the longwall shearer operation, 2021. (Submitted to IEEE Access). | ||
+ | * [19] K. Kutt, G. J. Nalepa, Loki – the Semantic Wiki for Collaborative Knowledge Engineering, | ||
+ | * [20] M. Szelążek, S. Bobek, G. J. Nalepa, Semantic Data Mining Based Decision Support for Quality Assessment in Steel Industry, 2022 (Submitted to Expert Systems). | ||
+ | * [21] V. Rodriguez-Fernandez, | ||
+ | * [22] S. Bobek, V. Rodriguez Fernandez, M. Szpyrka, E. Brzychczy, M. Mozolewski, D. Camacho, G. J. Nalepa, Framework for Process-aware Analytics for Industrial Processes Based on Heterogeneous Data Sources and Domain Knowledge, 2022 (Submitted to Engineering Applications of AI) | ||
+ | |||
+ | ===== Project-related | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[http:// | ||
* [[http:// | * [[http:// | ||
* [[https:// | * [[https:// | ||
- | |||
- | {{tag> | ||
Go back to -> [[pub: | Go back to -> [[pub: | ||
+ |
pub/projects/pacmel/start.1588342633.txt.gz · Last modified: 2020/05/01 14:17 by gjn