"GYDRA - Get Your Data Ready for Analysis" Tool
GYDRA (previously TAQIH) is a web-based tool for improving data quality in tabular datasets. It can handle essentially any dataset in tabular format (e.g., comma-separated values (CSV) files). Such files are typically semi-structured and are a common format for health services data exports when no strict common data model is already in place.
GYDRA provides an intuitive, easy-to-use web interface for interactive data quality assessment and improvement processes: exploration and filtering of variables, value harmonization, correlation detection, missing-value imputation, etc. Additional harmonization processes may be incorporated. We are currently starting to define a data/knowledge model for COVID-19, so that there is a target common data model against which to harmonize the input variables (and possibly a set of COVID-19 protocols expressed as rule sets). We also plan to incorporate visual analytics tools tailored to the COVID-19 scenario, since visualizations are more useful when they support a specific data discovery or decision support problem. Again, we aim to develop this within the ongoing COVID-19 initiatives.
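GYDRA's own implementation is not described here, so the following is only a minimal, illustrative sketch (Python standard library) of three of the processes listed above: value harmonization via a synonym table, mean imputation of missing numeric values, and detection of strongly correlated column pairs with Pearson's r. The synonym table `SEX_SYNONYMS`, the function names, and the 0.9 threshold are hypothetical; mean imputation and Pearson correlation are just one simple choice among many possible techniques.

```python
import math
from statistics import mean

# Hypothetical mapping from raw variant spellings to canonical values.
SEX_SYNONYMS = {"m": "male", "male": "male", "h": "male",
                "f": "female", "female": "female", "w": "female"}

def harmonize(values, synonyms):
    """Map each raw value to its canonical form; unknown or empty values become None."""
    out = []
    for v in values:
        key = v.strip().lower() if v is not None else ""
        out.append(synonyms.get(key))
    return out

def impute_mean(values):
    """Replace missing numeric values (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.9):
    """Return (a, b, r) for every column pair whose |r| is at least the threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if not math.isnan(r) and abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

For example, `harmonize(["M", " female", "F", "x"], SEX_SYNONYMS)` yields `['male', 'female', 'female', None]`, and `impute_mean([30, None, 50])` fills the gap with 40. Values left as `None` after harmonization can then be reviewed by the user rather than silently guessed.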
Another relevant characteristic of GYDRA is that it supports large datasets, up to millions of records. It does so through a very similar interface, but in this mode the tool runs as a batch processing pipeline: you prepare all the transformations in advance, because at such sizes they cannot be applied interactively. Some of the processes may rely on good approximations, since certain single calculations (e.g., the average or median value) would otherwise require all the data.
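As an illustration of the kind of approximation meant above (not necessarily the one GYDRA uses), the sketch below makes a single pass over an arbitrarily long stream of numbers. A running mean can be kept exactly with constant memory, whereas an exact median needs all values at once, so here the median is estimated from a fixed-size uniform random sample maintained by reservoir sampling (Algorithm R). The function name `stream_stats` and its defaults are hypothetical.

```python
import random
from statistics import median

def stream_stats(values, sample_size=10_000, seed=0):
    """One pass over an iterable: exact mean plus an approximate median.

    The mean is accumulated exactly. The median is computed over a uniform
    random sample of at most `sample_size` items kept via reservoir sampling,
    so memory stays bounded however long the stream is. If the stream has
    no more than `sample_size` items, the median is exact.
    """
    rng = random.Random(seed)
    reservoir = []
    count = 0
    total = 0.0
    for x in values:
        count += 1
        total += x
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            # Keep the new item with probability sample_size / count,
            # replacing a uniformly chosen slot in the reservoir.
            j = rng.randrange(count)
            if j < sample_size:
                reservoir[j] = x
    if count == 0:
        raise ValueError("empty input")
    return total / count, median(reservoir)
```

In a batch pipeline the input would typically be read lazily, e.g. `stream_stats(float(row["age"]) for row in csv.DictReader(f))` for a hypothetical `age` column, so the full file never has to be held in memory. A larger `sample_size` trades memory for a tighter median estimate.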