Banner Banner

Towards Building Live Open Scientific Knowledge Graphs

Anh Le-Tuan
Carlos Franzreb
Danh Le Phuoc
Sonja Schimmler
Manfred Hauswirth

August 16 , 2022

Due to the large number and heterogeneity of data sources, it becomes increasingly difficult to follow the research output and the scientific discourse. For example, a publication listed on DBLP may be discussed on Twitter and its underlying data set may be used in a different paper published on arXiv. The scientific discourse this publication is involved in is divided among not integrated systems, and for researchers it might be very hard to follow all discourses a publication or data set may be involved in. Also, many of these data sources—DBLP, arXiv, or Twitter, to name a few—are often updated in real-time. These systems are not integrated (silos), and there is no system for users to query the content/data actively or, what would be even more beneficial, in a publish/subscribe fashion, i.e., a system would actively notify researchers of work interesting to them when such work or discussions become available.

In this position paper, we introduce our concept of a live open knowledge graph which can integrate an extensible set of existing or new data sources in a streaming fashion, continuously fetching data from these heterogeneous sources, and interlinking and enriching it on-the-fly. Users can subscribe to continuously query the content/data of their interest and get notified when new content/data becomes available. We also highlight open challenges in realizing a system enabling this concept at scale.