Special Session Proposal for ISMLG 2023


Institute of Rock Mechanics and Tunnelling, Graz University of Technology, Austria (Dr. Alla Sapronova)

Amberg Technology, Switzerland (Dr. Thomas Dickmann)

Leoben Mountain University, Austria (Dr. Marlene Villeneuve)


Main contact: Assistant Prof. Dr. Alla Sapronova


Topic: Data Quality Assurance and Pre-processing in Geoscience



Discovering “valid, novel, potentially useful, and ultimately understandable patterns in data” through exploratory analysis of datasets is often referred to as data mining. It consists of five main steps: problem identification, data selection, data preprocessing, data analysis, and communication of the results. While the success of data mining depends on the choice of analytical methods, the accuracy of the analysis is sensitive to data quality. The latter is usually addressed during the preprocessing step.

The proposed session will focus on tools and methods for checking and improving the quality of data used in geoscience, including but not limited to the handling of sparse, imbalanced, and mislabeled datasets. The session will also address the problem of rare event detection: among the vast number of collected observations, the events with significant impact and long-lasting consequences are often the least frequent. Improving data quality during preprocessing with standard tools such as noise reduction and outlier removal may negatively affect the detection of rare events. The session will offer an overview of the current state of the art, followed by practical recommendations on how to perform data quality assessment while preserving the information necessary for rare event detection.
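The tension between outlier removal and rare event detection can be illustrated with a minimal sketch. It assumes a synthetic Gaussian background signal and three hypothetical rare-event amplitudes; the function name `zscore_filter` and the threshold of 3 standard deviations are illustrative choices, not methods prescribed by the session.

```python
import random
import statistics


def zscore_filter(values, threshold=3.0):
    """Split values into (kept, removed) using a global z-score rule.

    A point is removed when it lies more than `threshold` population
    standard deviations away from the mean of all values.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    kept, removed = [], []
    for v in values:
        if abs(v - mean) > threshold * std:
            removed.append(v)
        else:
            kept.append(v)
    return kept, removed


# Synthetic data (assumption): routine observations plus a few rare events.
rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in range(1000)]
rare_events = [12.0, 15.0, 18.0]  # hypothetical rare-event amplitudes
signal = background + rare_events

kept, removed = zscore_filter(signal)

# Every rare event ends up in the removed set, i.e. it is lost to later analysis.
rare_lost = [v for v in rare_events if v in removed]
print(f"removed {len(removed)} points, of which {len(rare_lost)} are rare events")
```

In this sketch the filter returns the removed points rather than discarding them silently, so they can be reviewed as candidate rare events before any cleaning decision is made.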


During this special session we propose to present the following:

  1. “An overview of methods and recommended workflows for data quality assurance in geoscience, with special focus on handling data sparsity and uncertainty”;

  2. “Major techniques for rare event detection, followed by a discussion of recommendations on data preprocessing techniques that help to ensure both data quality and novelty in patterns”;

  3. “Methods for handling data imbalance and the accuracy of forecasting machine learning models”;

  4. “Cascading machine learning models for detecting human perception in data labeling and in the interpretation of geological conditions ahead of the tunnel face”;

  5. “Extraction of patterns in data during the preprocessing step by applying correlational analysis to compositional data”;

  6. “Use of time series constructed from temporal changes in the relationship between multiple variables for risk analysis”;

  7. “Use and limitations of various metrics to assess the quality of extremely sparse datasets”.


All the examples will be given using geotechnical data, and more specifically TBM and seismic datasets.
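As an illustration of the imbalance problem raised in item 3, the sketch below shows how plain accuracy can look excellent on an imbalanced dataset while the rare class is never detected. The labels are synthetic (990 routine observations, 10 rare events), and the helper functions are written here for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as such."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for a binary problem (classes 0 and 1)."""
    return (recall(y_true, y_pred, 1) + recall(y_true, y_pred, 0)) / 2


# Synthetic, heavily imbalanced labels (assumption): 1 marks a rare event.
y_true = [0] * 990 + [1] * 10
# A trivial model that always predicts the majority class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rare_recall = recall(y_true, y_pred, positive=1)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, rare-class recall={rare_recall:.2f}, balanced accuracy={bal_acc:.2f}")
```

The trivial model scores high on accuracy but has zero recall on the rare class, which is why class-aware metrics are usually reported alongside accuracy for imbalanced data.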