The philosophy behind Dubito: Quiet Doubt

All data is uncertain, although it is easy to neglect this uncertainty. Quiet Doubt acknowledges and tracks this uncertainty through calculations without need for expertise in uncertainty analysis.

Quiet Doubt

 

Quiet Doubt arises from the idea that uncertainty analysis is too important to be left in the hands of analysts, who often lack the time, skill or resources to undertake the proper accounting of uncertainties.  Existing programs and software tools for calculating with uncertainty, even those intended to be simple to use, require analysts to learn about theories or methods of uncertainty propagation to use them effectively.  Software should conduct uncertainty quantification and propagation automatically without the user even knowing it is happening.  Quiet Doubt is a software feature that facilitates the spread of routine uncertainty analysis by making it the responsibility of the software infrastructure rather than a working concern of analysts.  It should be, to the maximum extent possible, an automated process that happens behind the scenes, much as spell checking and correction occurs quietly and unobtrusively as documents are originally typed. The software should interject with warnings only when reducing the implied precision of outputs no longer suffices to indicate the uncertainty of the resulting calculations.  Quiet Doubt involves a wide variety of strategies for automating uncertainty quantification, propagation and reporting.

 

Significant digits

 

To assess uncertainty about inputs without requiring a user to characterize it explicitly, the software can recognize significant digits in all inputs as uncertainty-encoding conventions.  Thus, as data values and parameters are entered into software, they are assumed to be associated with at least as much epistemic uncertainty as would be implied by the missing digits of their decimal representations.  For instance, if a user enters (or the software reads from a file) the number 23.45, the software will interpret this input to be the interval [23.445, 23.455].  Likewise, if an entered value is 1200, it is interpreted as the interval [1150, 1250], because its trailing zeros are taken to be placeholders rather than significant digits.  These minimal uncertainties should be propagated through any calculations the software makes.  Well-designed software will simultaneously recognize mathematical constants such as 3.14159, unit conversion factors, and the 2 in a square as precise mathematical values and not apply the significant digit interpretation to estimate uncertainty about them.
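The significant-digit convention can be illustrated with a minimal Python sketch. The function name implied_interval and its exact flag are hypothetical and are not part of any Dubito interface; the sketch only reproduces the two examples given above, treating trailing zeros of a plain integer as placeholders.

```python
from decimal import Decimal


def implied_interval(text, exact=False):
    """Return (lo, hi) implied by the significant digits of a numeric string.

    Hypothetical helper for illustration only. `exact=True` marks values
    such as mathematical constants that carry no implied uncertainty.
    """
    value = Decimal(text)
    if exact:
        return value, value
    _, digits, exponent = value.as_tuple()
    # Assumption: in a plain integer such as "1200", trailing zeros are
    # placeholders, so the last significant digit is the last nonzero one.
    if text.strip().lstrip("+-").isdigit():
        digits = list(digits)
        while len(digits) > 1 and digits[-1] == 0:
            digits.pop()
            exponent += 1
    # Half a unit in the last significant decimal place.
    half = Decimal(5).scaleb(exponent - 1)
    return value - half, value + half


print(implied_interval("23.45"))  # bounds 23.445 and 23.455
print(implied_interval("1200"))   # bounds 1150 and 1250
```

Reading the input as text rather than as a binary floating-point number matters here, because the number of digits the user typed is lost once the value is converted to a float.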

 

Hedge words

 

Quiet Doubt also understands linguistic hedges known as approximators (e.g., about, around, almost, up to, около, 左右, حدود), which are often used in natural languages to express the uncertainty attending numerical values.  The implications of the approximators for the magnitude of these uncertainties have been quantitatively studied for English expressions.  Research to quantitatively characterize the implications of approximators in other languages is relatively straightforward to conduct using online tools such as Amazon Mechanical Turk and games with a purpose in the sense of von Ahn.
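As a rough illustration of how a hedged expression might be turned into an interval, consider the Python sketch below. The function hedged_interval is hypothetical, and the 10% relative spread is a placeholder chosen only to make the code runnable; it is not one of the empirically studied magnitudes mentioned above. The sketch also assumes that "almost" bounds a value from below and that "up to" refers to a non-negative quantity.

```python
import re

# PLACEHOLDER spread, not an empirical value from the studies cited above.
PLACEHOLDER_SPREAD = 0.10

_PATTERN = re.compile(r"\s*(about|around|almost|up to)\s+(\d+(?:\.\d+)?)\s*")


def hedged_interval(phrase, spread=PLACEHOLDER_SPREAD):
    """Map an English approximator plus a number to a (lo, hi) pair."""
    match = _PATTERN.fullmatch(phrase.lower())
    if match is None:
        raise ValueError(f"no recognized approximator in {phrase!r}")
    hedge, value = match.group(1), float(match.group(2))
    if hedge in ("about", "around"):
        # Symmetric uncertainty around the stated value.
        return (value - spread * value, value + spread * value)
    if hedge == "almost":
        # Assumption: the true value lies somewhat below the stated one.
        return (value - spread * value, value)
    # "up to": assumption that the quantity cannot be negative.
    return (0.0, value)


print(hedged_interval("about 200"))
print(hedged_interval("up to 50"))
```

A real implementation would replace the single placeholder spread with per-hedge, per-language magnitudes drawn from the kind of research described above.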

 

Other techniques, appropriate for particular situations, are available for estimating input uncertainties when they are not fully specified by the analyst.  These methods ascribe estimated uncertainties to data values even when the data as provided lack any specification or statement about their imprecision, and even when the numbers are specified with many apparently significant digits.

 

Robust uncertainty analysis

 

After uncertainties about the inputs are characterized, Quiet Doubt automatically applies robust uncertainty projection algorithms to all the calculations that underlie the analyses requested by a user.  The intent of Quiet Doubt is to invisibly conduct appropriate ancillary uncertainty analyses.  The software then modifies calculation outputs, reducing the number of digits in decimal numbers to reflect the reliability of each value in the face of the uncertainty analysis.  When reducing the implied precision of outputs is no longer sufficient to represent the actual uncertainty associated with the resulting calculations, warning or error messages should appear in addition to or in replacement of the computed numbers.  The point of Quiet Doubt is to be as unobtrusive as possible while ensuring that the software outputs do not mislead the user.
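The propagate-then-reduce-digits idea can be sketched in Python using naive interval arithmetic. This is an illustration under stated assumptions, not the algorithm Quiet Doubt uses: the Interval class and the report function are hypothetical names, only addition and multiplication are shown, and floating-point rounding is not directed outward, so the bounds are not rigorous.

```python
import math
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def report(x):
    """Format x with the finest rounding whose implied interval covers x."""
    mid = (x.lo + x.hi) / 2
    if x.lo == x.hi:
        return repr(mid)
    # Start at the decimal place matching the scale of the interval width,
    # then coarsen until "rounded +/- half a unit" encloses the interval.
    place = math.floor(math.log10(x.hi - x.lo))
    while True:
        unit = 10.0 ** place
        rounded = round(mid / unit) * unit
        if rounded - unit / 2 <= x.lo and x.hi <= rounded + unit / 2:
            break
        place += 1
    if rounded == 0:
        # No digit survives: reducing precision no longer suffices.
        warnings.warn("uncertainty too large to express by reducing digits")
    return str(int(rounded)) if place >= 0 else f"{rounded:.{-place}f}"


# 23.45 times 1200, each read as its significant-digit interval.
product = Interval(23.445, 23.455) * Interval(1150.0, 1250.0)
print(report(product))  # 30000
```

In this sketch the product of 23.45 and 1200 is reported as 30000, with one significant digit, rather than as 28140, because the implied interval [25000, 35000] is the finest decimal rounding that still covers the propagated range. When even the leading digit is lost, the function warns instead of silently printing a number.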

 

Intermediate strategies are also available when requested by users.  For instance, results can be accompanied by summary textual messages or graphical depictions that briefly explain their trustworthiness.  Human perception of risks and uncertainties is well known to be affected by many cognitive biases.  Quiet Doubt employs a variety of schemes for expressing uncertainty in computed results in formats that are most easily understandable by users despite the wide array of biases and misconceptions that psychometrics has documented in human cognition.  For instance, it can use natural-language expressions to clearly communicate analytical results to data scientists and less-technical decision makers.

 

Extensive checking

 

Quiet Doubt automatically makes a variety of other checks that help to ensure the integrity of the analyses.  If an analysis makes an assumption about the underlying statistical distribution of a variable, the software checks that the available data are consistent with that distribution and do not contradict the assumption.  The software can also detect a variety of other data problems and calculation errors.  It automatically checks that measurement dimensions balance and that units, if present, conform and can be compared or combined in the mathematical operations the analyst requests.  As a part of its automated uncertainty analysis, Quiet Doubt also provides protection against model overfitting, a grave yet very common methodological error that is rarely noticed in practice.  The software can likewise automatically consider the effects of model uncertainty and other matters as it expresses the reliability of output calculations and inferences, to ensure that it does not overstate the conclusions.
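One of these checks, dimensional balance, can be sketched in a few lines of Python. The Quantity class and the helper functions metres and seconds are hypothetical names introduced for illustration, and the three-dimension basis (length, mass, time) is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple  # exponents of (length, mass, time)

    def __add__(self, other):
        # Addition is only meaningful for identical dimensions.
        if self.dims != other.dims:
            raise TypeError(
                f"cannot add dimensions {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        # Division subtracts dimension exponents.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)


def metres(v):
    return Quantity(v, (1, 0, 0))


def seconds(v):
    return Quantity(v, (0, 0, 1))


speed = metres(100.0) / seconds(20.0)
print(speed)  # value 5.0 with dimensions length^1 time^-1

try:
    metres(1.0) + seconds(1.0)
except TypeError as err:
    print("rejected:", err)
```

Carrying the dimensions alongside each value lets the software reject an ill-formed operation at the moment it is requested, instead of producing a number that looks plausible but means nothing.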

© 2016 Applied Biomathematics