DiploCloud - Scalable Distributed RDF Data Management System
dipLODocusRDF is a system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocusRDF is based on a hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute).
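To make the hybrid storage idea concrete, here is a minimal sketch that indexes the same triples in two ways: as per-subject "molecules" and as per-attribute lists of literal values. It is an illustration only, not dipLODocusRDF's actual data structures. The toy triples, the `:`-prefix convention for resources, and the names `build_hybrid_store`, `molecules` and `value_lists` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy data: resources are strings starting with ":",
# anything else is treated as a literal value (an assumption of this sketch).
TRIPLES = [
    (":student1", ":name", "Alice"),
    (":student1", ":age", 23),
    (":student1", ":enrolledIn", ":course1"),
    (":student2", ":name", "Bob"),
    (":student2", ":age", 31),
    (":course1", ":title", "Databases"),
]


def is_resource(value):
    """Return True for values that follow the ':'-prefix resource convention."""
    return isinstance(value, str) and value.startswith(":")


def build_hybrid_store(triples):
    """Index triples as per-subject molecules and per-predicate literal lists."""
    molecules = defaultdict(list)    # graph view: subject -> [(predicate, object)]
    value_lists = defaultdict(list)  # vertical view: predicate -> [literal, ...]
    for subject, predicate, obj in triples:
        molecules[subject].append((predicate, obj))
        if not is_resource(obj):
            value_lists[predicate].append(obj)
    return dict(molecules), dict(value_lists)


molecules, value_lists = build_hybrid_store(TRIPLES)

# Transactional-style lookup: everything known about one entity.
student1 = molecules[":student1"]

# Analytics-style aggregate: scan one compact list instead of the whole graph.
average_age = sum(value_lists[":age"]) / len(value_lists[":age"])
```

In this sketch the entity lookup touches a single molecule, while the aggregate reads only the list for one attribute, which mirrors the two access patterns described above.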
DiploCloud is the distributed version of dipLODocusRDF. It is an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. DiploCloud's architecture follows that of many modern cloud-based distributed systems (e.g., Google’s BigTable), where one (Master) node is responsible for interacting with the clients and orchestrating the operations performed by the other (Worker) nodes. DiploCloud has been conceived from the ground up to support distributed data partitioning and co-location schemes in an efficient and flexible way. It adopts an intermediate solution between tuple-partitioning and graph-partitioning by opting for a recurring, fine-grained graph-partitioning technique that takes advantage of molecule templates. These molecule templates capture patterns that recur naturally in the RDF data, by inspecting both the instance-level (physical) and the schema-level (logical) data.
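The partitioning idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not DiploCloud's actual algorithm: a template is approximated as a (subject type, predicate, object type) pattern counted over the data, each molecule is rooted at its subject, and whole molecules are assigned to workers by a CRC32 hash of the root. The toy data and the names `derive_templates`, `worker_for` and `partition` are hypothetical.

```python
import zlib
from collections import defaultdict

# Hypothetical schema-level information: the type of each resource.
TYPES = {":student1": ":Student", ":student2": ":Student", ":course1": ":Course"}

# Hypothetical instance-level data.
TRIPLES = [
    (":student1", ":name", "Alice"),
    (":student1", ":enrolledIn", ":course1"),
    (":student2", ":name", "Bob"),
    (":student2", ":enrolledIn", ":course1"),
    (":course1", ":title", "Databases"),
]


def derive_templates(triples, types):
    """Count recurring (subject type, predicate, object type) patterns."""
    counts = defaultdict(int)
    for subject, predicate, obj in triples:
        pattern = (types.get(subject, "untyped"), predicate, types.get(obj, "literal"))
        counts[pattern] += 1
    return dict(counts)


def worker_for(root, num_workers):
    """Pick a worker for a molecule root with a deterministic hash."""
    return zlib.crc32(root.encode("utf-8")) % num_workers


def partition(triples, num_workers):
    """Master-side step: keep each molecule whole and place it on one worker."""
    workers = [defaultdict(list) for _ in range(num_workers)]
    for subject, predicate, obj in triples:
        workers[worker_for(subject, num_workers)][subject].append((predicate, obj))
    return workers


templates = derive_templates(TRIPLES, TYPES)
workers = partition(TRIPLES, num_workers=2)
```

Because a molecule is never split across workers in this sketch, a query about a single entity can be answered by one worker, while the template counts show which patterns recur often enough to be worth co-locating.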