Linux

Ischemic Stroke Risk Assessment

The goal of this project was to develop an AI backend engine for an intelligent decision support system that assesses ischemic stroke risk. The system was developed to enable preventive interventions for patients at high risk of stroke. In collaboration with a health insurer, we collected historical electronic health records, sociodemographic data, and quality-of-life data to train and evaluate machine learning models. The data was analyzed to find which factors present in the collected datasets had the highest impact on ischemic stroke risk.

Live Contract Trading Data Harvesting

The goal of this project was to build an ETL system collecting and computing data about trades and trading positions for various cryptocurrencies from different trading platforms. Depending on the trading platform, the data was fetched via REST APIs or WebSockets. The data was then normalized, and some elements were aggregated, computed, or estimated. Finally, the normalized data was loaded into a common data model in a relational database (Exasol DB).
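The normalization step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two platforms, their payload field names, and the Trade schema are hypothetical and do not describe the actual sources or the data model used in the project.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """Common trade record shared by all platforms (hypothetical schema)."""
    platform: str
    symbol: str
    price: Decimal
    quantity: Decimal
    side: str  # "buy" or "sell"
    executed_at: datetime


def normalize_platform_a(raw: dict) -> Trade:
    # Hypothetical payload: epoch milliseconds, numbers as strings, "BTC-USD" symbols.
    return Trade(
        platform="platform_a",
        symbol=raw["symbol"].replace("-", "/"),
        price=Decimal(raw["price"]),
        quantity=Decimal(raw["size"]),
        side=raw["side"].lower(),
        executed_at=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
    )


def normalize_platform_b(raw: dict) -> Trade:
    # Hypothetical payload: ISO 8601 timestamps; the sign of the quantity encodes the side.
    qty = Decimal(raw["qty"])
    return Trade(
        platform="platform_b",
        symbol=raw["pair"].upper(),
        price=Decimal(raw["px"]),
        quantity=abs(qty),
        side="buy" if qty > 0 else "sell",
        executed_at=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
    )
```

Keeping one small function per platform means that a new source only needs a new mapping, while the loading code sees a single record type.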

Expert System for Blood Management

The goal of this project was to develop a rule-based decision support system. The solution assisted medical doctors in making fast decisions about blood transfusion. In collaboration with multiple healthcare domain experts and researchers, we built a knowledge base describing the rules for performing blood transfusions. The knowledge was then modelled within a rule-based decision support system. The system was deployed and served as a REST API within a Docker container, which enabled horizontal scalability of the solution.

Predicting Blood Transfusion Needs

The goal of this project was to develop an AI-backed intelligent decision support system assisting medical doctors in making decisions about blood transfusion. In collaboration with multiple hospitals, we collected historical data about blood transfusions, patients, and medical tests relevant to blood transfusion. The data was analyzed to find which factors present in the collected datasets had the highest impact on the decision whether or not a blood transfusion was performed.

Sentiment (Popularity Index) Data Harvesting - Google Trends

The goal of this project was to build an ETL system collecting, normalizing, and aggregating data about the popularity index (search volume) of various keywords from Google Trends. The system had two subsystems: one for pulling historical data in bulk, the other for pulling current data. After the raw data was fetched, it was normalized to align short-term and long-term values. Next, the data was aggregated at different time granularities. Finally, the data was loaded into a relational database (Exasol DB).
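Google Trends scales the values of each requested time window relative to the peak within that window, so values from two separate requests are not directly comparable. One common way to align them, shown here as a sketch and not necessarily the method used in the project, is to rescale one window by the average ratio observed on the dates both windows share. The function name and the date-keyed dictionaries are assumptions made for illustration.

```python
from statistics import mean


def align_window(reference: dict, window: dict) -> dict:
    """Rescale `window` onto the scale of `reference` using their overlapping dates."""
    # Only dates present in both windows with a non-zero value can yield a ratio.
    overlap = [d for d in window if d in reference and window[d] > 0]
    if not overlap:
        raise ValueError("windows must share at least one non-zero date")
    factor = mean(reference[d] / window[d] for d in overlap)
    return {d: value * factor for d, value in window.items()}


# Hypothetical values: the same day scored 100 in one request and 50 in the other.
long_term = {"2021-01-01": 50, "2021-01-02": 100}
short_term = {"2021-01-02": 50, "2021-01-03": 25}
aligned = align_window(long_term, short_term)
```

Here the shared date gives a factor of 2, so the short-term values are doubled before the two series are merged and aggregated.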

SolarData - Simulating the Performance of Photovoltaic Energy Systems

This platform simulates the amount of energy produced by a photovoltaic (PV) energy system. It extracts weather data from GRIB files (a binary format for weather data). Based on this data and geographic coordinates, the system computes how much energy a PV installation is able to produce for particular weather conditions and a particular location. Finally, it generates a graphical report presenting this data in charts. SolarData is used to evaluate current performance and determine the future value of PV generation projects (expressed as the predicted energy yield) and, by extension, influence how PV projects and technologies are perceived in terms of investment risk.
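The kind of computation involved can be sketched with a widely used simplified yield estimate: energy equals irradiation times panel area times module efficiency times a performance ratio. This is an illustrative assumption, not the model implemented in SolarData; the function name, the parameters, and the default performance ratio of 0.8 are hypothetical, and reading GRIB files is left out.

```python
def pv_energy_kwh(irradiance_w_m2, hours_per_step, area_m2, efficiency, performance_ratio=0.8):
    """Estimate energy yield in kWh from a series of irradiance samples in W/m^2."""
    # Integrate irradiance over time to get irradiation in kWh per square metre.
    irradiation_kwh_m2 = sum(irradiance_w_m2) * hours_per_step / 1000
    # Scale by panel area, module efficiency, and system losses (performance ratio).
    return irradiation_kwh_m2 * area_m2 * efficiency * performance_ratio


# Hypothetical hourly irradiance samples for a 10 m^2 installation with 20 % efficient modules.
energy = pv_energy_kwh([0, 500, 1000, 500], hours_per_step=1, area_m2=10, efficiency=0.2, performance_ratio=0.75)
```

With these inputs the irradiation is 2 kWh per square metre, which gives an estimated yield of about 3 kWh.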

Machine Learning-Backed FAQ Chatbot

The goal of this project was to develop an AI backend for a FAQ chatbot. The client had collected a significant number of questions and answers, which were used to train machine learning models. Besides text, the questions had assigned tags, which were used to cluster the questions into topic-based segments. The overall algorithm was trained in two steps: first, we trained a topic classifier based on the assigned tags by applying a TF-IDF transformation and training an XGBoost classifier; second, for each topic, we built Doc2Vec embeddings trained only on the questions of that topic.
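The two-stage structure (pick a topic first, then search only within that topic) can be sketched with the standard library alone. Two substitutions are made to keep the example self-contained, and both are assumptions: a nearest-centroid rule over TF-IDF vectors stands in for the XGBoost classifier, and TF-IDF cosine similarity stands in for the per-topic Doc2Vec embeddings. The class name and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text, idf):
    """Sparse TF-IDF vector for one text; terms unseen in training are ignored."""
    counts = Counter(t for t in text.lower().split() if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


class TwoStageFaq:
    def __init__(self, faq):
        # faq: list of (question, answer, topic_tag) triples
        self.faq = faq
        questions = [q for q, _, _ in faq]
        df = Counter(t for q in questions for t in set(q.lower().split()))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + len(questions)) / (1 + n)) + 1 for t, n in df.items()}
        self.vectors = [vectorize(q, self.idf) for q in questions]
        # Stage 1 model (stand-in for XGBoost): summed TF-IDF centroid per topic.
        self.centroids = defaultdict(lambda: defaultdict(float))
        for vec, (_, _, topic) in zip(self.vectors, faq):
            for term, weight in vec.items():
                self.centroids[topic][term] += weight

    def answer(self, question):
        query = vectorize(question, self.idf)
        # Stage 1: choose the most similar topic.
        topic = max(self.centroids, key=lambda t: cosine(query, self.centroids[t]))
        # Stage 2 (stand-in for Doc2Vec): most similar question within that topic only.
        candidates = [i for i, (_, _, t) in enumerate(self.faq) if t == topic]
        best = max(candidates, key=lambda i: cosine(query, self.vectors[i]))
        return topic, self.faq[best][1]


bot = TwoStageFaq([
    ("how do i reset my password", "Use the reset link.", "account"),
    ("how do i change my email address", "Edit your profile.", "account"),
    ("when will my order ship", "Within two days.", "shipping"),
    ("how do i track my order", "Use the tracking page.", "shipping"),
])
```

Restricting the second stage to a single topic keeps the similarity search small and avoids matching a question against answers from unrelated topics.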

ETL System for Cryptoasset Transaction Data

The goal of this project was to build an ETL system collecting and aggregating data about various cryptoassets from multiple data sources. The data about blocks and transactions of the cryptoassets was fetched via multiple REST APIs. The data was then normalized, and some elements were aggregated and computed. Finally, the normalized data was loaded into a common data model in a relational database (Exasol DB). The system was designed to run for a long time with minimal supervision.

TripleProv - RDF Provenance

TripleProv is an in-memory RDF database capable of storing, tracing, and querying provenance information while processing RDF queries. TripleProv returns an understandable description of the way the results of an RDF query were derived; specifically, it gives a detailed explanation of which pieces of data were combined, and how, to produce the answer to a query. Moreover, with TripleProv you can tailor query execution with provenance information: you can input a provenance specification of the data you want to use to derive the answer.
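The idea of tracing which pieces of data were combined, and of restricting a query to chosen sources, can be illustrated with a toy example. This is a sketch under stated assumptions and does not reflect TripleProv's storage model or API: the quads, the source identifiers, and both functions are hypothetical.

```python
from collections import defaultdict

# Each statement carries the identifier of the source it came from (hypothetical data).
QUADS = [
    ("alice", "worksFor", "acme", "src:hr"),
    ("alice", "worksFor", "acme", "src:web"),
    ("acme", "locatedIn", "zurich", "src:registry"),
    ("bob", "worksFor", "initech", "src:web"),
]


def match(quads, predicate, allowed_sources=None):
    """Yield (subject, object, source) for one predicate, optionally restricted by source."""
    for s, p, o, src in quads:
        if p == predicate and (allowed_sources is None or src in allowed_sources):
            yield s, o, src


def employees_by_city(quads, allowed_sources=None):
    """Join worksFor with locatedIn and record which sources produced each answer."""
    lineage = defaultdict(set)
    locations = list(match(quads, "locatedIn", allowed_sources))
    for person, company, src_work in match(quads, "worksFor", allowed_sources):
        for org, city, src_loc in locations:
            if org == company:
                # One derivation is the pair of sources joined to produce this row.
                lineage[(person, city)].add((src_work, src_loc))
    return dict(lineage)


everything = employees_by_city(QUADS)
trusted_only = employees_by_city(QUADS, allowed_sources={"src:hr", "src:registry"})
```

In this sketch `everything` reports two alternative derivations for the same answer, while `trusted_only` keeps only the derivation built from the sources named in the specification.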

DiploCloud - Scalable Distributed RDF Data Management System

dipLODocusRDF is a system for RDF data processing that efficiently supports both simple transactional queries and complex analytics. dipLODocusRDF is based on a hybrid storage model that considers RDF data both from a graph perspective (by storing RDF subgraphs, or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute). DiploCloud is a distributed version of dipLODocusRDF. It is an efficient and scalable distributed RDF data management system for the cloud.