# Datamining against Healthcare Waste – \$1.6 Trillion Revolutionary Hyperbolic Dirac Net (HDN) based Data Mining Technique for Ferreting out Rogue Claims – Dr. Barry Robson, Ingine, Inc.

DiracSmash, or just SMASH for short, is a Q-UEL application in the sense that it is compatible with Q-UEL. It extracts probabilistic knowledge from CSV files and renders it in the form of Q-UEL tags. DiracSmash builds on techniques developed in The BioIngine.com, DiracMiner, DiracBuilder and other Q-UEL applications to treat sporadic data efficiently, and is being progressively adapted to handle sporadic data such as payment claims data. Note that Q-UEL has a full set of tags for translating codes for diseases, procedures, triggers, complications, management etc. into more readable forms. The typical and main purpose of DiracSmash is twofold, as exemplified by the following use case.

i. “data mining” and construction of potentially huge inference nets to obtain e.g. the probability that a payment will normally be above a certain amount given the input data, when for example a particular patient has obtained a claim for that amount, and

ii. “pattern discovery”, e.g. to help explain this probability by discovering patterns that are associated with cases where the payment is above that amount.

For example, it may build an HDN inference network (analogous to a Bayes Net but not confined to a Directed Acyclic Graph) implying thousands or millions of conditional probabilities. For a special reason discussed below (sporadic data), however, this payment example involves merely 85 odds ratios as positive predictive odds and 85 as the corresponding odds likelihood ratios (analogous to relative risk), each comprising two probabilities, i.e. just 85 x 2 x 2 = 340 probabilities.

################ NET of 85 odds ratios.

################ NETforward (predictive odds) = 2.038 ######################

################ NETbackward (likelihood ratio) = 16.477 ####################

################ NETassoc (ratio of association constants) = 9.098 #########

FORWARD PROBABILITY P('CLM_PMT_AMT':='ge100' | NET) = 0.110

Joint probability ratio forward = 2.03780243432802 should ideally agree with following.

Joint probability ratio backward = 2.03103720070852

Real part = 2.03441981751827 (existential, coherence, extent of agreement).

Imaginary part = 0.00338261680975416 (universal, incoherence, extent of disagreement).
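The real and imaginary parts printed above are numerically consistent with a simple relationship: the real part equals the mean of the forward and backward joint probability ratios, and the imaginary part equals half their difference. The following Python sketch reproduces the printed figures under that assumption. The function name `dual_parts` is hypothetical and is not part of DiracSmash; how SMASH computes these values internally is not stated here.

```python
def dual_parts(forward: float, backward: float) -> tuple[float, float]:
    """Return (real, imaginary) as the mean and half-difference of two ratios.

    Assumption: this mirrors the relationship visible in the printed output;
    it is not taken from the DiracSmash source code.
    """
    real = (forward + backward) / 2.0       # extent of agreement
    imaginary = (forward - backward) / 2.0  # extent of disagreement
    return real, imaginary


# Joint probability ratios copied from the report above.
forward = 2.03780243432802
backward = 2.03103720070852

real, imaginary = dual_parts(forward, backward)
print(f"Real part      = {real:.12f}")
print(f"Imaginary part = {imaginary:.12f}")
```

When the forward and backward ratios agree exactly, the imaginary part vanishes, so its size relative to the real part gives a quick reading of how coherent the net is.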

It can seek to help explain this with many discovered patterns, such as

<Q-UEL-PATFACTORS-3 'HCPCS_CD_32':='97110' Pfwd:=0.00000529 | if:=count:=36 | 'CLM_PMT_AMT':='ge100' 'ICD9_DGNS_CD_1':='V5832' Q-UEL-PATFACTORS-3>

<Q-UEL-PATFACTORS-7 'ICD9_DGNS_CD_1':='V5832' 'ICD9_DGNS_CD_5':='78079' 'ICD9_PRCDR_CD_4':='40390' 'HCPCS_CD_33':='94762' 'HCPCS_CD_35':='94761' Pfwd:=0.00000029 | if:=count:=2 | 'CLM_PMT_AMT':='ge100' 'ICD9_DGNS_CD_1':='V5832' Q-UEL-PATFACTORS-7>
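Pattern tags like the two above can be read back into ordinary data structures for sorting or filtering. The Python sketch below is inferred only from the layout of these two example tags (attribute pairs and `Pfwd` before the first bar, `count` between the bars, target and condition after the second bar); it is not a general Q-UEL parser, and the names `parse_pattern_tag` and `normalize_quotes` are hypothetical.

```python
import re

# Matches the opening tag name, the body, and a closing name (hyphen optional).
TAG_RE = re.compile(r"<(Q-UEL-[A-Z]+-\d+)\s+(.*?)\s+Q-UEL-?[A-Z]+-\d+>", re.S)
# Matches one quoted attribute pair such as 'HCPCS_CD_32':='97110'.
PAIR_RE = re.compile(r"'([^']+)':='([^']*)'")


def normalize_quotes(text: str) -> str:
    """Replace typographic quotes and primes with plain apostrophes."""
    for ch in "\u2018\u2019\u2032":
        text = text.replace(ch, "'")
    return text


def parse_pattern_tag(tag: str) -> dict:
    """Split one pattern tag into its pattern, Pfwd, count and conditions."""
    match = TAG_RE.search(normalize_quotes(tag))
    if match is None:
        raise ValueError("not a recognizable pattern tag")
    name, body = match.groups()
    parts = [part.strip() for part in body.split("|")]
    if len(parts) != 3:
        raise ValueError("expected three bar-separated sections")
    left, middle, right = parts
    pfwd = re.search(r"Pfwd:=([0-9.eE+-]+)", left)
    count = re.search(r"count:=(\d+)", middle)
    if pfwd is None or count is None:
        raise ValueError("missing Pfwd or count")
    return {
        "tag": name,
        "pattern": dict(PAIR_RE.findall(left)),
        "pfwd": float(pfwd.group(1)),
        "count": int(count.group(1)),
        "conditions": dict(PAIR_RE.findall(right)),
    }


example = (
    "<Q-UEL-PATFACTORS-3 'HCPCS_CD_32':='97110' Pfwd:=0.00000529 "
    "| if:=count:=36 | 'CLM_PMT_AMT':='ge100' 'ICD9_DGNS_CD_1':='V5832' "
    "Q-UEL-PATFACTORS-3>"
)
parsed = parse_pattern_tag(example)
print(parsed["pattern"], parsed["pfwd"], parsed["count"])
```

With tags in this form, a list of discovered patterns can be ranked by `count` or `pfwd` before a human reviews them.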

The principles are not confined to the above scenario, nor even to payment data at all. Mining can still be done even if no questions are asked at all. Conversely, there may also be an indefinitely large list of “cases” (“conditions”, “constraints”, “denominators”) such as, say, age, blood pressure 140 etc., and the data mining will apply to these cases considered collectively, i.e. to cases that satisfy all of them. The questions asked may also be of a different nature, such as equal or not equal to a name or code (see below). For example, the DiracSmash process produced a list of all those tags having a predictive risk over 0.1, along with other supporting evidence; this level is adjustable, as is an optional minimum number of required observations, and a test on significant information content. Although, as noted above, SMASH can be run without any guidance, it is almost always given a “hitlist” file. For example

'CLM_PMT_AMT':=>'100'

'ICD9_DGNS_CD_1':='V5832'

# 'ICD9_DGNS_CD_2':='V5861'

# 'ICD9_DGNS_CD_5':='V5869'

means predict and calculate the probability of a payment amount of \$100 or more, considering only cases in which 'ICD9_DGNS_CD_1':='V5832'. Convenient input is X:=value, X:=>value, X:=<value, but a full range of logical comparators, eq, ne, gt, ge, lt, le, is available. Optionally the primary condition, the second line on the list, may also use the range notation. The two entries starting with ‘#’ are simply ignored, and use of this “comment out” feature familiar to programmers allows one to experiment with various conditions and constraints. The first line is special and is called the “target”. Questions asked by the first line can be greater than (gt), less than (lt), greater than or equal to (ge), less than or equal to (le), equal to (eq) or not equal to (ne) for quantitative data, or equal to (eq, here meaning the same as) or not equal to (ne, here meaning different from) specified categorical data. The alternative and more usual input is to use the following, though it is converted to the above notation internally and in reports.

'CLM_PMT_AMT':=>'100' (ge, this value or higher as opposed to lower)

'CLM_PMT_AMT':=<'100' (le, this value or lower as opposed to higher)

'CLM_PMT_AMT':='100' (eq, this value or word as opposed to anything else)
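To make the hitlist conventions concrete, the following Python sketch parses a hitlist in the shorthand notation, skips lines commented out with '#', treats the first entry as the target, and then counts how often the target holds among rows satisfying all remaining conditions. This is a plain empirical frequency on made-up rows, shown only to illustrate the roles of target and conditions; it is not the HDN calculation that SMASH performs. All function names (`parse_hitlist`, `holds`, `conditional_frequency`) are hypothetical.

```python
import operator
import re

OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
       "ge": operator.ge, "lt": operator.lt, "le": operator.le}
# Shorthand from the text: :=> is ge, :=< is le, plain := is eq.
SHORTHAND = {">": "ge", "<": "le", "": "eq"}
LINE_RE = re.compile(r"'([^']+)'\s*:=\s*([<>]?)\s*'([^']*)'")


def parse_hitlist(text):
    """Return (name, comparator, value) entries; the first is the target."""
    for ch in "\u2018\u2019\u2032":  # tolerate typographic quotes
        text = text.replace(ch, "'")
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out entry
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized hitlist line: {line!r}")
        name, symbol, value = match.groups()
        entries.append((name, SHORTHAND[symbol], value))
    return entries


def holds(row, entry):
    """Test one entry against one row (a dict of strings)."""
    name, op, value = entry
    cell = row[name]
    try:
        # Simplification: compare numerically when both sides are numbers.
        # Codes with leading zeros would need string comparison instead.
        return OPS[op](float(cell), float(value))
    except ValueError:
        return OPS[op](cell, value)


def conditional_frequency(rows, entries):
    """Fraction of rows meeting every condition that also meet the target."""
    target, *conditions = entries
    selected = [r for r in rows if all(holds(r, c) for c in conditions)]
    if not selected:
        return None
    return sum(holds(r, target) for r in selected) / len(selected)


hitlist = """'CLM_PMT_AMT':=>'100'
'ICD9_DGNS_CD_1':='V5832'
# 'ICD9_DGNS_CD_2':='V5861'
"""
# Made-up rows for illustration only; not real claims data.
rows = [
    {"CLM_PMT_AMT": "250", "ICD9_DGNS_CD_1": "V5832"},
    {"CLM_PMT_AMT": "40", "ICD9_DGNS_CD_1": "V5832"},
    {"CLM_PMT_AMT": "900", "ICD9_DGNS_CD_1": "4019"},
    {"CLM_PMT_AMT": "100", "ICD9_DGNS_CD_1": "V5832"},
]
entries = parse_hitlist(hitlist)
frequency = conditional_frequency(rows, entries)
print(entries, frequency)
```

Uncommenting a hitlist line simply adds one more condition to the list, which narrows the set of rows over which the target is evaluated.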

Relevant Definitions

Hyperbolic Dirac Net (HDN) – A probabilistic inference-based statistical reasoning algorithm and technology described in some detail below. An HDN may be considered as related to the Bayes Net (BN, see below), but the HDN does not have the severe and unrealistic graph-theoretic constraints that define the traditional BN, and it extends naturally to more general inference using probabilistic semantics and exploiting natural language processing. The HDN approach was developed employing the following.

Q-UEL – Quantum Universal Exchange Language (Q-UEL) is an algebraic notational language derived from the Dirac Notation, the mathematical machinery that defines quantum mechanics and a long and widely accepted standard in physics. Q-UEL was originally proposed as an interoperability language in response [8-13] to a December 2010 Federal report of the President’s Council of Advisors on Science and Technology calling for a Universal Exchange Language (UEL) for healthcare. Q-UEL has from the outset been applied to electronic health records and biomedical data. Its concept endures as a powerful architectural principle, managing the problem of the interchange and merging of medical data and knowledge from a variety of formats and ontologies.

Dirac Notation – The HDN and Q-UEL are both based on the long-used standard in quantum mechanics (QM) called Dirac Notation. “Notation” is generally understood to be an understatement, as it is also an algebra for expressing uncertainty in observations and measurements. The notational and algebraic aspects can also map to use in the everyday world, interpreting it as a probabilistic inference algorithm with semantic applications.