DiploCloud - Scalable Distributed RDF Data Management System

dipLODocusRDF is a system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocusRDF is based on a hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute).
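The hybrid storage idea can be illustrated with a small sketch. This is not dipLODocusRDF's actual data layout; the function `build_hybrid_store`, the toy triples, and the convention that resources start with ":" are all assumptions made for illustration. It only shows the two views side by side: one subgraph ("molecule") per subject, and one compact list of literal values per attribute.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). Hypothetical data: strings that
# start with ":" stand for resources, everything else is a literal value.
TRIPLES = [
    (":alice", ":name", "Alice"),
    (":alice", ":age", 34),
    (":alice", ":knows", ":bob"),
    (":bob", ":name", "Bob"),
    (":bob", ":age", 29),
]


def is_literal(obj):
    """Return True for literal values, False for ':'-prefixed resources."""
    return not (isinstance(obj, str) and obj.startswith(":"))


def build_hybrid_store(triples):
    """Build both views of the same data (illustrative helper, not the real API)."""
    molecules = defaultdict(list)      # graph view: subject -> (predicate, object) pairs
    literal_lists = defaultdict(list)  # vertical view: predicate -> literal values
    for subject, predicate, obj in triples:
        molecules[subject].append((predicate, obj))
        if is_literal(obj):
            literal_lists[predicate].append(obj)
    return dict(molecules), dict(literal_lists)


molecules, literal_lists = build_hybrid_store(TRIPLES)
```

In this sketch, a lookup such as "everything about :alice" reads a single molecule, while an aggregate such as the average of all ":age" values scans one literal list without touching the molecules.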

DiploCloud is a distributed version of dipLODocusRDF. It is an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. The DiploCloud architecture follows that of many modern cloud-based distributed systems (e.g., Google’s BigTable), where one node (the Master) is responsible for interacting with the clients and orchestrating the operations performed by the other nodes (the Workers). DiploCloud has been conceived from the ground up to support distributed data partitioning and co-location schemes in an efficient and flexible way. It adopts an intermediate solution between tuple-partitioning and graph-partitioning by opting for a recurring, fine-grained graph-partitioning technique that takes advantage of molecule templates. These templates capture patterns that recur naturally in the RDF data, and are derived by inspecting both the instance-level (physical) and the schema-level (logical) data.
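As a rough illustration of molecule-level partitioning, the sketch below groups triples into molecules by subject, derives a simplified "template" from each molecule's set of predicates, and places each whole molecule on one worker. Everything here is an assumption for illustration: the names `molecule_template` and `partition`, the toy data, and in particular the hash-based placement, which is a stand-in and not DiploCloud's actual allocation strategy. Real templates also draw on schema-level information, which this sketch omits.

```python
import zlib
from collections import Counter, defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    (":alice", ":name", "Alice"),
    (":alice", ":age", 34),
    (":alice", ":knows", ":bob"),
    (":bob", ":name", "Bob"),
    (":bob", ":age", 29),
    (":carol", ":name", "Carol"),
    (":carol", ":age", 41),
]


def molecule_template(pairs):
    """Simplified template: the sorted set of predicates found in a molecule."""
    return tuple(sorted({predicate for predicate, _ in pairs}))


def partition(triples, n_workers):
    """Assign whole molecules to workers so that a subject's triples stay co-located."""
    molecules = defaultdict(list)
    for subject, predicate, obj in triples:
        molecules[subject].append((predicate, obj))

    workers = [dict() for _ in range(n_workers)]
    templates = Counter()
    for root, pairs in molecules.items():
        templates[molecule_template(pairs)] += 1
        # Assumed placement rule: a stable hash of the molecule's root subject.
        index = zlib.crc32(root.encode("utf-8")) % n_workers
        workers[index][root] = pairs
    return workers, templates


workers, templates = partition(TRIPLES, n_workers=3)
```

Because placement works on molecules rather than on individual triples, a query touching one subject can be answered by a single worker in this sketch, and the template counts show which predicate patterns recur across the data.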

dipLODocusRDF

dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data

DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

Marcin Wylot, PhD
Data Scientist & Machine Learning Engineer

20 years of experience in data processing from A to Z