Dubito is a library of predictive algorithms for data scientists, applicable to data sets ranging from very small to very large samples. These algorithms are specially designed to account for sensor imprecision, missing data, censored data, uncertain biases, and epistemic uncertainties that conventional methods neglect. Dubito offers analytical solutions for highly heterogeneous data that include both quantitative and qualitative information.
The problem it solves
Dubito is based on the Quiet Doubt philosophy, which holds that uncertainty analysis is too important to be left in the hands of analysts, who sometimes neglect it or do not fully appreciate how critical the consideration of imprecision and uncertainty often is.
Traditional risk assessment strategies are not always up to the challenge, and this deficiency can be serious. For instance, Monte Carlo simulation is very widely used, but it is computationally slow and it does not always yield a complete picture of the uncertainty. As one analyst ruefully noted, “Random sampling is terrible at finding worst-case scenarios, although terrorists are pretty good at it.”
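The point about random sampling and worst cases can be illustrated with a small sketch. It is not part of Dubito; the function names sampled_worst_case and interval_sum are hypothetical, and the setup (twenty inputs, each known only to lie between 0 and 1) is an assumption chosen for illustration. Simple interval arithmetic gives the exact range of the sum, while random sampling of the same inputs stays well short of the true extreme.

```python
import random


def sampled_worst_case(n_inputs, n_trials, seed=0):
    """Largest sum seen when each input is drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    return max(
        sum(rng.random() for _ in range(n_inputs))
        for _ in range(n_trials)
    )


def interval_sum(intervals):
    """Exact bounds on a sum of quantities known only to within intervals."""
    lower = sum(lo for lo, _ in intervals)
    upper = sum(hi for _, hi in intervals)
    return lower, upper


N_INPUTS = 20  # illustrative assumption, not a Dubito setting
bounds = interval_sum([(0.0, 1.0)] * N_INPUTS)
mc_max = sampled_worst_case(N_INPUTS, n_trials=10_000)

print("interval bounds on the sum:", bounds)
print("largest sum found by sampling:", round(mc_max, 2))
```

The sampled maximum clusters near the middle of the range because extreme combinations of all twenty inputs are vanishingly rare under random draws, even though they are entirely possible.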
The predictive algorithms in Dubito are used to make predictions from available data, detect anomalies in data sets in real time, forecast future trends, and serve a host of other uses in risk analysis, strategic and tactical planning, forensic analyses, and vulnerability studies. The methods are also useful in system design and scenario gaming. The list of predictive algorithms currently being implemented in Dubito includes:
When the inputs include epistemic uncertainty (uncertainty arising from imprecise measurements), most of these computational problems are known to be NP-hard in general, but we have developed an array of work-around solutions that make practical calculations highly scalable.
Characterizing uncertainty on the dataset
Dubito provides multiple avenues for assessing and characterizing uncertainty in inputs, including:
Significant Digit Conventions
Natural Language Approximators
Poisson Count Model
These methods ascribe estimated uncertainties to data values even if the data as provided lack any specification or statement about their imprecision. The methods can be combined.
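As a rough illustration of how uncertainty might be ascribed to a bare value, the sketch below shows one common reading of two of the approaches named above. It is not Dubito's implementation; the function names interval_from_significant_digits and poisson_count_interval are hypothetical. The first assumes a reported number is correct to within half a unit of its last written digit. The second assumes a count follows a Poisson distribution and uses the usual normal approximation, with a standard deviation equal to the square root of the count.

```python
import math
from decimal import Decimal


def interval_from_significant_digits(text):
    """Interval implied by the last digit written in a reported number.

    Assumes the value is correct to within half a unit of that digit.
    """
    value = Decimal(text)
    exponent = value.as_tuple().exponent  # power of ten of the last digit
    half_unit = Decimal(1).scaleb(exponent) / 2
    return value - half_unit, value + half_unit


def poisson_count_interval(count, z=1.96):
    """Approximate interval for a count, assuming it is Poisson distributed.

    Uses the normal approximation count +/- z * sqrt(count); this is crude
    for small counts. The lower end is clipped at zero.
    """
    spread = z * math.sqrt(count)
    return max(0.0, count - spread), count + spread


print(interval_from_significant_digits("12.3"))
print(interval_from_significant_digits("1.2e3"))
print(poisson_count_interval(100))
```

Working from the text as written, rather than from a parsed float, matters here: "12.30" and "12.3" denote the same number but imply different precision.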
Dubito has built-in features that detect data problems and calculation errors that would otherwise render conclusions specious. For instance, it automatically checks that measurement dimensions balance and that units, if present, are compatible, so that values can be compared or combined in mathematical operations.
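The idea behind a dimension check can be sketched in a few lines. This is an illustration only, not Dubito's mechanism; the Quantity class and its dims field are hypothetical. Each value carries the exponents of its base dimensions (here length, mass, and time, an assumed choice), addition is refused when the dimensions differ, and multiplication adds the exponents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value tagged with exponents of (length, mass, time)."""
    value: float
    dims: tuple

    def __add__(self, other):
        # Adding quantities only makes sense when dimensions match.
        if self.dims != other.dims:
            raise ValueError(
                f"cannot add dimensions {self.dims} and {other.dims}"
            )
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplying quantities adds the dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)


length = Quantity(2.0, (1, 0, 0))    # a length
duration = Quantity(3.0, (0, 0, 1))  # a time

area = length * length               # dims become (2, 0, 0)
try:
    length + duration                # dimensions do not balance
except ValueError as err:
    print("rejected:", err)
```

This sketch tracks dimensions only. Converting between compatible units, such as metres and feet, would need a scale factor on top of it.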
Dubito also provides protection against overfitting, a grave yet very common methodological error that is rarely noticed in practice. Overfitting typically means statistical predictions should not be believed. The software automatically considers the effects of model uncertainty, which arises from possible doubt that the form of the model was correct and fully described by the analyst in the first place. Dubito takes these and other matters into consideration as it expresses the reliability of output calculations and inferences, to ensure that it does not overstate its conclusions.
Dubito expresses its output in a way that is understandable. It employs a variety of schemes for expressing uncertainty in computed results, in formats that users can interpret correctly despite the wide array of biases and misconceptions that psychometrics has documented in human cognition. It can use natural-language expressions to communicate analytical results clearly to data scientists and less-technical decision makers.
Dubito is modular in design and ready for multiple environments. It can be deployed on the web or as a totally secured, behind-the-firewall solution.