Publicly accessible documents and publications are valuable sources of insights, announcements, and developments in the domains of ground water, surface water, drinking water, and waste water. However, because these sources are so numerous and heterogeneous, important information is easily overlooked. Moreover, the relevance of individual sources and the requirements for editing and presenting their contents depend heavily on the target audience. For this reason, the Thuringian Water Innovation Cluster needs a semi-automatic monitoring and recommendation system that monitors these sources at regular intervals and gathers and processes their contents. The aim of the “ThWIC Sonar” framework is to develop an integrated system for water documents with a modern architecture that gathers environmental information, processes it taxonomically, and proactively recommends it to different user groups.
"ThWIC Sonar" includes two sub-projects. In the sub-project “Sprachmodell und Ontologie-Einbindung” (language model and ontology integration), an AI system (a so-called language model) is being developed. This system will be able to classify and tag new text documents about the topic of water automatically. This tagging will include links to existing machine-parsable formal vocabularies (ontologies) and knowledge graphs, which makes the automatic extraction of further information possible. During development, special attention is paid to a well-documented and transparent process in the spirit of the FAIR principles. These principles demand that data and processes must be findable, accessible, interoperable and reusable by third parties. Furthermore, the process is developed in a way that is as resource-efficient as possible, makes do with as little training data as possible, and thus minimizes the energy consumption needed for the construction of the model.
In the sub-project “Validierung eines Frameworks zur Integration von Umweltinformationen in ein Informationshub und relevanzbasierte Informationsdistribution” (validation of a framework for integrating environmental information into an information hub and for relevance-based information distribution), AI-based technology is used to develop an intelligent, highly personalized information management system for the cluster actors. To this end, data and information on the topic of water from different data sources will be monitored, consolidated in an information hub (“Sonar-Hub”), and made accessible. Contextual factors derived from user behaviour are methodically linked with the output of the language models and ontologies and fed into a recommendation algorithm. In addition to being semantically linked, water topics are dynamically correlated with each other. The self-learning algorithm adapts to the information needs of different target audiences and to their behaviour on the hub, and thus supplies every user with dynamically updated information selected by relevance.
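How user behaviour and document tags might be combined into a relevance score can be sketched as follows. This is a deliberately simple stand-in, not the project's recommendation algorithm: the interest profile is just a count of tags on documents the user has opened, and the function names, tag strings, and document identifiers are all hypothetical.

```python
from collections import Counter


def update_profile(profile: Counter, clicked_tags: list[str], weight: float = 1.0) -> None:
    """Strengthen the user's interest in every tag of a document they opened."""
    for tag in clicked_tags:
        profile[tag] += weight


def relevance(profile: Counter, doc_tags: set[str]) -> float:
    """Share of the user's accumulated interest that the document's tags cover."""
    total = sum(profile.values())
    if total == 0:
        return 0.0  # no behaviour observed yet, so no personalized signal
    return sum(profile[tag] for tag in doc_tags) / total


def recommend(profile: Counter, documents: dict[str, set[str]], top_k: int = 3) -> list[str]:
    """Return the identifiers of the top_k documents, most relevant first."""
    ranked = sorted(documents, key=lambda doc_id: relevance(profile, documents[doc_id]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    profile: Counter = Counter()
    update_profile(profile, ["drinking_water", "nitrate"])
    update_profile(profile, ["drinking_water"])

    documents = {
        "doc-a": {"waste_water"},
        "doc-b": {"drinking_water"},
        "doc-c": {"nitrate"},
    }
    print(recommend(profile, documents, top_k=2))
```

Each further click shifts the profile, so the ranking changes as the user's behaviour on the hub changes; a production system would additionally need decay of old interests and the topic correlations described above, which this sketch omits.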