# The BioIngine.com – Deep Learning Comprehensive Statistical Framework – Descriptive to Probabilistic Inference

Healthcare analytics must contend with large data sets, both structured (EHR data) and unstructured. The emerging methods center on three of the approaches discussed below: d (multivariate regression), e (neural net) and f (multivariate probabilistic inference). Ingine is unique in proposing the Hyperbolic Dirac Net (HDN) for probabilistic inference.

The basic premise in engineering The BioIngine.com™ is the acknowledgment that extracting knowledge from data, both structured and unstructured, means confronting very large data sets riddled with high dimensionality and uncertainty.

Generally, in drawing insights from large data sets, complexity scales in the following order.

### a)   Insights around :- “what”

For large data sets, descriptive statistics are adequate to extract a “what” perspective. Descriptive statistics generally deliver a statistical summary of the ecosystem and its probability distribution.

Descriptive statistics : Raw data often takes the form of a massive list, array, or database of labels and numbers. To make sense of the data, we can calculate summary statistics like the mean, median, and interquartile range. We can also visualize the data using graphical devices like histograms, scatterplots, and the empirical cdf. These methods are useful for both communicating and exploring the data to gain insight into its structure, such as whether it might follow a familiar probability distribution.
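As a rough illustration of these summaries, the following Python sketch computes the mean, median, interquartile range and one empirical cdf value using only the standard library. The readings and the helper names `summarize` and `empirical_cdf` are made up for this example and are not part of The BioIngine.com.

```python
import statistics


def summarize(values):
    """Return the mean, median, and interquartile range of a numeric sample."""
    # "inclusive" treats the sample as containing its own minimum and maximum.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


def empirical_cdf(values, x):
    """Fraction of observations that are less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Made-up systolic blood pressure readings (mmHg), for illustration only.
systolic_bp = [118, 122, 125, 130, 131, 135, 140, 152]

summary = summarize(systolic_bp)
print(summary)
print(empirical_cdf(systolic_bp, 130))
```

Different quartile conventions give slightly different interquartile ranges on small samples, so the `method` argument is a choice rather than a fixed rule.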

### b)   Univariate Problem :- “what”

This level assumes some simplicity in the relationships, or in the cumulative effects, between the independent variables (causes) and the dependent variables (outcomes):-

i) Univariate regression (simple independent variables to dependent variables analysis)

### c)    Bivariate Problem :- “what”

Correlation Cluster – shows impact of set of variables or segment analysis.

https://en.wikipedia.org/wiki/Correlation_clustering

From above link :- In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects. For example, given a weighted graph G = (V,E), where the edge weight indicates whether two nodes are similar (positive edge weight) or different (negative edge weight), the task is to find a clustering that either maximizes agreements (sum of positive edge weights within a cluster plus the absolute value of the sum of negative edge weights between clusters) or minimizes disagreements (absolute value of the sum of negative edge weights within a cluster plus the sum of positive edge weights across clusters). Unlike other clustering algorithms this does not require choosing the number of clusters k in advance because the objective, to minimize the sum of weights of the cut edges, is independent of the number of clusters.

http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/

From above link :- Correlation is a bivariate analysis that measures the strengths of association between two variables. In statistics, the value of the correlation coefficient varies between +1 and -1. When the value of the correlation coefficient lies around ± 1, then it is said to be a perfect degree of association between the two variables. As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker. Usually, in statistics, we measure three types of correlations: Pearson correlation, Kendall rank correlation and Spearman correlation.
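To make two of these coefficients concrete, here is a minimal Python sketch of the Pearson and Spearman correlations written from their textbook definitions (Kendall rank correlation is omitted for brevity). The function names and the BMI and blood pressure values are illustrative assumptions, not data from the platform.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up paired observations, for illustration only.
bmi = [22, 25, 28, 31, 35, 40]
systolic_bp = [115, 118, 126, 124, 139, 160]

print(pearson(bmi, systolic_bp))
print(spearman(bmi, systolic_bp))
```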

### d)   Multivariate Analysis (Complexity increases) :- “what”

§ Multiple regression (combining multiple univariate analyses to assess the effect of the independent variables on the outcomes)

§ Multivariate regression – where multiple causes and multiple outcomes exist

### e)   Neural Net :- “what”

The challenges of multivariate analysis discussed above push us toward techniques such as the Neural Net, the next level beyond the multivariate regression statistical approach, in which multiple regression models feed into the next level of clusters, which is again an array of multiple regression models.

The Neural Net method still remains inadequate in depicting “how” the human mind probably operates. In discerning the health ecosystem for diagnostic purposes, the “how”, “why” and “when” interrogatives become imperative to arrive at an accurate diagnosis and to target outcomes effectively. A Neural Net’s learning is “smudged out”. Put a little more precisely: it is hard to interrogate a Neural Net because it is far from easy to see which weights are mixed up in the different pooled contributions, or where they come from.

“So we enter Probabilistic Computations which is as such Combinatorial Explosion Problem”.

### f)    Hyperbolic Dirac Net (Inverse or Dual Bayesian technique): – “how”, “why”, “when” in addition to “what”.

All the above still address the “what” aspect. As complexity increases, the distinction between independent and dependent variables becomes non-deterministic, since it is difficult to establish given the interactions amongst the variables, potentially including cyclic paths of influence in a network of interactions. A very simple example is that obesity causes diabetes, but the converse is also true, and we may suspect a cycle in which obesity causes type 2 diabetes, which in turn causes obesity. In such a situation, what is best treated as “subject” and what as “object” becomes difficult to establish. Existing inference network methods typically assume that the world can be represented by a Directed Acyclic Graph, more like a tree, but the real world is more complex than that: metabolism, neural pathways, road maps, subway maps and concept maps are not unidirectional; they are more interactive, with cyclic routes. Furthermore, discovering the “how” aspect becomes important in diagnosing episodes and establishing correct pathways, while also extracting the severe cases (chronic cases, which are a multivariate problem). Indeterminism also creates an ontology that can be probabilistic, not crisp.

Note: From a Healthcare Analytics perspective, most Accountable Care Organization (ACO) analytics address the above based on the PQRS clinical factors, which are all quantitative. These are barely useful for advancing the ACO toward solving performance-driven or value-driven outcomes, most of which are qualitative.

To conduct HDN inference, bear in mind that obtaining all the combinations of factors by data mining is a “combinatorial explosion” problem, which lies behind the difficulty of Big Data as high-dimensional data.

It applies in any kind of data mining, though it is most clearly apparent when mining structured data, a kind of spreadsheet with many columns, each of which is one of our dimensions. In considering combinations of demographic and clinical factors, say A, B, C, D, E, …, we ideally have to count the number of combinations (A), (A, B), (A, C), … (B, C, E), … and so on. Though sometimes assumptions can be made, you cannot always deduce a combination with many factors from those with fewer, nor vice versa. For N factors A, B, C, D, E, … there are 2^N − 1 possible combinations. So data with 100 columns as factors would imply about

1,000,000,000,000,000,000,000,000,000,000

combinations, each of which we want to observe several times and so count, to obtain probabilities. Finding what we need without knowing in advance exactly what it is distinguishes unsupervised data mining from statistics, in which we traditionally test a hunch, a hypothesis. But worse still, in our spreadsheet the A, B, C, D, E are really to be seen as column headings with, say, about n possible different values in the columns below them, and so roughly we are speaking of potentially needing to count not just, say, males and females but each of n^N different kinds of patient or thing. This results in a truly astronomical number of different things, each to be observed many times. If merely n = 10, then with N = 100 columns n^N is

10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 (that is, 10^100)
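The two counts quoted above can be checked with a few lines of Python, which handles arbitrarily large integers. The function names are illustrative; the formulas are the 2^N − 1 non-empty combinations of N factors and the n^N value combinations of N columns with n values each.

```python
def combination_count(n_factors):
    """Number of non-empty combinations of n_factors distinct factors."""
    return 2 ** n_factors - 1


def value_combination_count(n_values, n_factors):
    """Number of distinct rows when each of n_factors columns takes n_values values."""
    return n_values ** n_factors


# 100 columns treated as present/absent factors: roughly 1.27 x 10^30.
print(f"{combination_count(100):,}")

# 100 columns with 10 possible values each: a 1 followed by 100 zeros.
print(f"{value_combination_count(10, 100):,}")
```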

There is a further implied difficulty, which in a strange way lifts much of the above challenge from the shoulders of researchers and of their computers. In most of the above cases, most of the things we are counting contain many of the factors A, B, C, D, E, etc. Such concurrences of so many things are typically rare, so many of the things we would like to count will never be seen at all, and most of the rest will be seen just 1, 2, or 3 times. Indeed, any reasonably rich patient record with lots of data will probably be unique on this planet. However, most approaches are unable to make proper use of that sparse data, since it seems that it would need to be weighted and taken into account in the balance of evidence according to the information it contains, and it is not evident how. The zeta approach tells us how to do that. In short, the real curse of high dimensionality is in practice not that our computers lack sufficient memory to hold all the different probabilities, but that this is also true for the universe: even in principle we do not have all the data with which to determine probabilities by counting, even if we could count and use them. Note that things that are never observed are, in the usual interpretation of zeta theory and of Q-UEL, assumed to have probability 1. In a purely multiplicative inference net, multiplying by probability 1 will have no effect. Information I = –log(P) for P = 1 means that information I = 0. Most statements of knowledge are, as philosopher Karl Popper argued, assertions awaiting refutation.
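The effect of the default probability of 1 can be sketched in Python as follows. This is only a toy, purely multiplicative net with made-up factor names and probabilities; it does not implement the zeta approach or HDN inference themselves.

```python
import math


def information(probability):
    """Information I = -log(P); zero when P = 1."""
    return -math.log(probability)


def multiplicative_net(known_probabilities, factors):
    """Multiply the probabilities of the factors, defaulting unseen ones to 1."""
    product = 1.0
    for factor in factors:
        # A factor never observed contributes probability 1, i.e. no information.
        product *= known_probabilities.get(factor, 1.0)
    return product


# Hypothetical probabilities for two observed factors; "C" was never observed.
known = {"A": 0.5, "B": 0.4}

print(multiplicative_net(known, ["A", "B", "C"]))
print(information(1.0))
```

Dropping or adding the unobserved factor "C" leaves the product unchanged, which is the sense in which an unrefuted assertion carries no weight either way.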

Nonetheless the general approach in the fields of semantics, knowledge representation, and reasoning from it is to gather all the knowledge that can be got into a kind of vast and ever growing encyclopedia.

In The BioIngine.com™ the native data sets have been transformed into a Semantic Lake or Knowledge Representation Store (KRS) based on the Q-UEL Notational Language, such that they are now amenable to HDN-based inference. Where possible, probabilities are assigned; if not, the default probabilities are again 1.

# The Bioingine.com :- On-boarding PICO – Evidence Based Medicine [Large Data Driven Medicine]

### The BioIngine.com Platform Beta launch on the anvil with below discussed EBM examples for all to Explore !!!

The Bioingine.com Platform is built on Wolfram Enterprise Private Cloud

• using the technology from one of the leading science and tech companies
• using Wolfram Technology, the same technology that is at every Fortune 500 company
• using Wolfram Technology, the same technology that is at every major educational facility in the world
• leveraging the same technology as Wolfram|Alpha, the brains behind Apple’s Siri

Medical Automated Reasoning Programming Language environment [MARPLE]

References:- On PICO Gold Standard

Formulating a researchable question: A critical step for facilitating good clinical research

MARPLE – Question Format Medical Exam / PICO Setting

A good way to use MARPLE/HDNstudent is to set it up like an exam that the student then answers. MARPLE then answers with its own choices, i.e. candidate answers ranked by probability, proposing the most probable as its choice of answer and explaining why (by the knowledge elements successfully used). This can then be compared with the examiner’s intended answer, for which MARPLE’s probability assessment can of course also be seen.

It is already the case that MARPLE is used to test exam questions and it is scary that questions that have been issued by a Medical Licensing Board can turn out to be assigned an incorrect or unreachable answer by the examiner. The reason on inspection is that the question was ambiguous and potentially misleading, even though that may have not been obvious, or simply out of date – progress in science changed the answer and it shows up fast on some new web page (Translational Research for Medicine in action!). Often it is wrong or misleading because there turns out to be a very strong alternative answer.

Formulating the Questions in PICO Format

The modern approach to formulation is the recommendation for medical best practice known as PICO.

• P is the patient, population or problem (primarily, what is the disease/diagnosis Dx?)
• I is the intervention, or something happening that intervenes (what is the proposed therapy Rx: drug, surgery, or lifestyle recommendation?)
• C is some alternative to that intervention, or something happening that can be compared (with what options, including no treatment?). This may also be considered in the context of different compared types of patient: female, diabetic, elderly, Hispanic, etc.
• O is the outcome, i.e. a disease state or set of such that occurs, or fails to occur, or is ideally terminated by the intervention such that health is restored. (That often means the prognosis, but prognosis often implies a more complex scenario on a longer timescale further in the future.)

Put briefly “For P does I as opposed to C have outcome O” is the PICO form.
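One simple way to hold a PICO question in code is a small record with four fields. The Python sketch below is an illustrative assumption about how such a record might look; the class name `PicoQuestion` and its method are hypothetical and are not part of MARPLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """The four PICO components of a clinical question."""
    population: str    # P: patient, population or problem
    intervention: str  # I: proposed therapy or intervening event
    comparison: str    # C: alternative to the intervention
    outcome: str       # O: disease state that occurs or is averted

    def as_sentence(self):
        """Render the question in the brief 'For P does I ...' form."""
        return (
            f"For {self.population}, does {self.intervention} "
            f"as opposed to {self.comparison} have outcome {self.outcome}?"
        )


# Made-up example values, for illustration only.
question = PicoQuestion(
    population="adults with type 2 diabetes",
    intervention="a lifestyle recommendation",
    comparison="no treatment",
    outcome="reduced HbA1c",
)
print(question.as_sentence())
```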

The above kinds of probabilities are not necessarily the same as an essentially statistical analysis by structured data mining would deliver. All of these except C relate to associations, symptoms, Dx, Rx, outcome.  It is C that is difficult. Probably the best interpretation is replacing Rx in associations with no Rx and then various other Rx. If C means say in other kinds of patients, then it is a matter of associations including those.

A second step of quantification is usually required in which probabilities are obtained as measures of scope based on counting. Of particular interest here is the odds ratio.

Two Primary Methods of Asking a Question in The BioIngine

1. Primarily Symbolic and Qualitative. (more unstructured data dependent) [Release 1]

HDN is behind the scenes but focuses mainly on contextual probabilities between statements. HDNstudent is used to address the issue as a multiple choice exam with indefinitely large numbers of candidate answers, in which the expert end-user can formulate PICO questions and candidate answers, or all these can be derived automatically or semi-automatically. Each initial question can be split into a P, I, C, and O question.

2. Primarily Calculative and Quantitative. (more structured – EHR data dependent) [Release 2]

Focus on intrinsic probabilities, the degree of truth associated with each statement by itself. DiracBuilder used after DiracMiner addresses EBM decision measures as special cases of HDN inference. Of particular interest is an entry

<O |  P, I > / <O   |  P, C>

which is the HDN likelihood or HDN relative risk of the outcome O given patient/population/problem P given I as opposed to C, usually seen as a “NOT I”, and

<NOT O  |  P, I> / <NOT O | P, C>

which is the HDN likelihood or HDN relative risk of NOT getting the outcome O given patient/population/problem P given I as opposed to C, usually seen as a “NOT I”. Note though that you get a two-for-one, because we also have <P, I |  O>, the adjoint form, at the same time, since one is the complex conjugate of the other. Note that the ODDS RATIO is the former likelihood ratio over the latter, and hence the HDN odds ratio as it would normally be entered in DiracBuilder is as follows:-

<O | P, I>

<NOT O | P, C>

/<O | P, C>

/<NOT O | P, I>
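For orientation, the classical (single-valued) counterparts of these measures can be computed from simple counts. The Python sketch below assumes the probabilities are estimated by counting outcomes among patients who received I and among those who received C; it does not reproduce the dual values that HDN reports, and the counts are invented.

```python
def relative_risk(outcome_i, total_i, outcome_c, total_c):
    """Classical P(O | P, I) / P(O | P, C) estimated from counts."""
    return (outcome_i / total_i) / (outcome_c / total_c)


def odds_ratio(outcome_i, total_i, outcome_c, total_c):
    """Relative risk of O divided by the relative risk of NOT O."""
    rr_outcome = relative_risk(outcome_i, total_i, outcome_c, total_c)
    rr_no_outcome = relative_risk(
        total_i - outcome_i, total_i, total_c - outcome_c, total_c
    )
    return rr_outcome / rr_no_outcome


# Hypothetical counts: 30 of 100 patients on I and 10 of 100 on C reach outcome O.
print(relative_risk(30, 100, 10, 100))
print(odds_ratio(30, 100, 10, 100))
```

The odds ratio computed this way equals the familiar cross-product of the 2×2 table, here (30 × 90) / (10 × 70).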

• QUALITATIVE / SYMBOLIC

An 84-year-old man in a nursing home has increasing poorly localized lower abdominal pain recurring every 3-4 hours over the past 3 days. He has no nausea or vomiting; the last bowel movement was not recorded. Examination shows a soft abdomen with a palpable, slightly tender, lower left abdominal mass. Hematocrit is 28%. Leukocyte count is 10,000/mm3. Serum amylase activity is within normal limits. Test of the stool for occult blood is positive. What is the diagnosis?

• This is usually addressed by a declared list of multiple choice candidate answers, though the list can be indefinitely large. 30 is not unusual.

• The answers are all assigned probabilities, and the most probable is considered the answer, at least for testing purposes in a medical licensing exam context. These can make use of intrinsic probabilities, but predominantly they are contextual probabilities, depending on the relationships between chains and networks of knowledge elements that link the question to each answer.

• QUANTITATIVE / CALCULATIVE:

“Will my female patient age 50-59 taking diabetes medication and having a body mass index of 30-39 have very high cholesterol if the systolic BP is 130-139 mmHg and HDL is 50-59 mg/dL and non-HDL is 120-129 mg/dL?”

• This forms a preliminary Hyperbolic Dirac Net (inference net) from the query, which may be refined, and intrinsic probabilities are assigned to each statement, e.g. automatically by data mining.

• This question could properly start “What is the probability that…”. The real answers of interest here are not qualitative statements, but the final probabilities.

• Note the “IF”. POPPER extends this to relationships beyond associative or conditional (IF) ones, e.g. verbs of action.

Quantitative Computations :- Odds Ratio and Risk Computations

• Medical Necessity
• Laboratory Testing Principles
• Quality of Diagnosis
• Diagnosis Test Accuracy
• Diagnosis Test
• Sensitivity
• Specificity
• Predictive Values – Employing Bayes Theorem (Positive and Negative Value)
• Coefficient of Variations
• Resolving Power
• Prevalence and Incidence
• Prevalence and Rate
• Relative Risk and Cohort Studies
• Predictive Odds
• Attributable Risk
• Odds Ratio
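Several of the measures listed above follow directly from Bayes' theorem. As a hedged sketch, the Python function below derives the positive and negative predictive values from sensitivity, specificity and prevalence; the function name and the example figures are assumptions made for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive, negative) predictive values via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    false_negative = (1 - sensitivity) * prevalence
    true_negative = specificity * (1 - prevalence)
    ppv = true_positive / (true_positive + false_positive)
    npv = true_negative / (true_negative + false_negative)
    return ppv, npv


# Hypothetical test: 90% sensitive, 80% specific, disease prevalence 10%.
ppv, npv = predictive_values(0.9, 0.8, 0.1)
print(ppv, npv)
```

Even with a fairly sensitive test, the positive predictive value stays low when prevalence is low, which is why prevalence appears in the list alongside sensitivity and specificity.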

### Examples Quantitative / Calculative HDN Queries

In The Bioingine.com Release 1 – we are only dealing with Quantitative / Calculative type questions

The examples discussed in section A below are simple to play with, to appreciate the power of HDN for conducting inference. However, Problems B2 onwards require some deeper understanding of Bayesian and HDN analysis.

<‘Taking BP medication’:=’1’ |  ‘Taking diabetes medication’:= ‘1’>

/<‘Taking BP medication’:=’1’ | ‘Taking diabetes medication’:= ‘0’>

A.   Against Data Set 1.csv (2114 records with 33 variables created for Cardiovascular Risk Studies (Framingham Risk Factor))

B.   Against Data Set2.csv (nearing 700,000 records with 196 variables). Truly a large data set with high dimensionality (many columns of clinical and demographic factors), leading to a combinatorial explosion.

Note: in the examples below, you are forming questions or HDN queries such as

For African Caribbean patients 50-59 years old with a BMI of 50-59 what is the Relative Risk of needing to be on BP medication if there is a family history as opposed to no family history?

IMPORTANT: THE TWO-FOR-ONE EFFECT OF THE DUAL. Calculations report a dual value for any probabilistic value implied by the expression entered. In some cases you may be interested only in the first number in the dual, but the second number is always meaningful and frequently very useful. Notably, we say Relative Risk by itself for brevity, but in fact this is only the first number in the dual that is reported. In general, the form

<’A’:=’1’|’B’:=’1’>

/<’A’:=’1’|’B’:=’0’>

yields the following  dual probabilistic value…

(P(’A’:=’1’|’B’:=’1’) / P(’A’:=’1’|’B’:=’0’),   P(’B’:=’1’|’A’:=’1’) / P(’B’:=’0’|’A’:=’1’)),

where the first ratio is the relative risk RR and the second ratio is the predictive odds PO.
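The two ratios in the dual can be illustrated with classical probabilities obtained by counting rows. The Python sketch below works on a toy list of records with binary columns 'A' and 'B'; the helper names and the counts are assumptions, and the zeta probabilities that the platform also reports are not modeled here.

```python
def conditional_probability(records, target, given):
    """P(target | given) by counting; each argument is a (column, value) pair."""
    matching = [r for r in records if r[given[0]] == given[1]]
    hits = sum(1 for r in matching if r[target[0]] == target[1])
    # Raises ZeroDivisionError if no record matches the condition.
    return hits / len(matching)


def dual_ratio(records, a, b):
    """Return (relative risk of a given b, predictive odds of b given a)."""
    relative_risk = (
        conditional_probability(records, (a, 1), (b, 1))
        / conditional_probability(records, (a, 1), (b, 0))
    )
    predictive_odds = (
        conditional_probability(records, (b, 1), (a, 1))
        / conditional_probability(records, (b, 0), (a, 1))
    )
    return relative_risk, predictive_odds


# Toy data: counts of each (A, B) combination, invented for illustration.
records = (
    [{"A": 1, "B": 1}] * 6
    + [{"A": 0, "B": 1}] * 4
    + [{"A": 1, "B": 0}] * 3
    + [{"A": 0, "B": 0}] * 17
)

rr, po = dual_ratio(records, "A", "B")
print(rr, po)
```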

a.   This inquiry seeking the risk of BP must be translated into a Q-UEL specification as shown below. [All the below Q-UEL queries in red can be copied and entered in the HDN query to get the HDN inference for the pertinent Data Sets.]

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’0’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

b.    The Q-UEL specified query enables Notational Algebra to work while making inference from the giant semantic lake or the knowledge representation store (KRS).

c.    Recall, KRS is the representation of the universe as a Hyperbolic Dirac Net. This was created by the transformation process applied to the uploaded data set to activate the automated statistical studies.

d.    The query works against the KRS and extracts the inference in HDN format, displaying an inverse Bayesian result that includes both classical and zeta probabilities :- Pfwd, Pzfwd & Pbwd, Pzbwd
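Because the relative risk queries in the examples differ only in one varied factor, they lend themselves to being generated. The Python sketch below is a hypothetical helper that assembles query text in the style shown above; the function names are invented, and the use of plain ASCII quotes is an assumption about what the HDN query box accepts.

```python
def bra_ket(outcome, conditions):
    """Format one < outcome | conditions > term from (name, value) pairs."""
    def pair(name, value):
        return f"'{name}':='{value}'"

    left = pair(*outcome)
    right = " and ".join(pair(name, value) for name, value in conditions)
    return f"< {left} | {right} >"


def relative_risk_query(outcome, shared, factor, exposed, reference):
    """Numerator and denominator differ only in the value of one factor."""
    numerator = bra_ket(outcome, [(factor, exposed)] + shared)
    denominator = bra_ket(outcome, [(factor, reference)] + shared)
    return numerator + "\n/" + denominator


# Column names follow the examples in the text; the helper itself is illustrative.
query = relative_risk_query(
    outcome=("Taking BP medication", "1"),
    shared=[("Ethnicity", "African Caribbean"), ("age(years)", "50-59"), ("BMI", "50-59")],
    factor="Family history of BP",
    exposed="1",
    reference="0",
)
print(query)
```

Generating the text this way keeps the shared conditions identical in numerator and denominator, which avoids quoting slips when many similar queries are written by hand.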

A1. Relative Risk – High BP Case

Example: – Study of BP = blood pressure (high) in the population data set considered.

This case is very similar, because high BP and diabetes are each comorbidities with high BMI and hence to some extent with each other. Consequently we just substitute diabetes by BP throughout.

Note: the values entered may be discrete or continuous.

(0) We can in fact test the strength of the above with the following RR, which in effect reads as “What is the relative risk of needing to take BP medication if you are diabetic as opposed to not diabetic?”

<‘Taking BP medication’:=’1’ | ‘Taking diabetes medication’:= ‘1’>

/<‘Taking BP medication’:=’1’ | ‘Taking diabetes medication’:= ‘0’>

The following predictive odds PO make sense and are useful here:-

<‘Taking BP medication’:=’1’ | ‘BMI’:= ’50-59’ >

/<‘Taking BP medication’:=’0’ | ‘BMI’:= ’50-59’ >

and (separately entered)

<‘Taking diabetes medication’:=’1’ | ‘BMI’:= ’50-59’ >

/<‘Taking diabetes medication’:=’0’ | ‘BMI’:= ’50-59’ >

And the odds ratio OR would be a good measure here (as it works in both directions). Note Pfwd = Pbwd theoretically for an odds ratio.

<‘Taking BP medication’:=’1’ | ‘Taking diabetes medication’:= ‘1’>

<‘Taking BP medication’:=’0’ | ‘Taking diabetes medication’:= ‘0’>

/<‘Taking BP medication’:=’1’ | ‘Taking diabetes medication’:= ‘0’>

/<‘Taking BP medication’:=’0’ | ‘Taking diabetes medication’:= ‘1’>

(1)          For African Caribbean patients 50-59 years old with a BMI of 50-59 what is the Relative Risk of needing to be on BP medication if there is a family history as opposed to no family history?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’0’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

(2)          For African Caribbean patients 50-59 years old with a family history of BP what is the Relative Risk of needing to be on BP medication if there is a BMI of 50-59 as opposed to a reasonable BMI of ’20-29’?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’20-29’ >

(3)          For African Caribbean patients with a family history of BP, what is the Relative Risk of needing to be on BP medication if there is an age of 50-59 rather than 40-49?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’40-49’ and ‘BMI’:= ’50-59’>

(4)          For African Caribbean patients 50-59 years old with a family history of BP, what is the Relative Risk of needing to be on BP medication if there is a BMI of 50-59 rather than 40-49?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’40-49’>

(5)          For African Caribbean patients 50-59 years old with a family history of BP, what is the Relative Risk of needing to be on BP medication if there is a BMI of 50-59 rather than 40-49?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’40-49’>

(6)          For African Caribbean patients with a family history of BP, what is the Relative Risk of needing to be on BP medication if there is an age of 50-59 rather than 30-39?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’30-39’ and ‘BMI’:= ’40-49’>

(7)          For African Caribbean patients with a family history of BP, what is the Relative Risk of needing to be on BP medication if there is an age of 50-59 rather than 20-29?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’20-29’ and ‘BMI’:= ’40-49’>

(8)          For patients with a family history of BP age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on BP medication if they are African Caribbean rather than Caucasian?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘Caucasian’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’>

(9)          For patients with a family history of BP age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on BP medication if they are African Caribbean rather than Asian?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘Asian’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’>

(10)       For patients with a family history of BP age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on BP medication if they are African Caribbean rather than Hispanic?

< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking BP medication’:=’1’ | ‘Family history of BP’:=’1’ and ‘Ethnicity’:=‘Hispanic’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’>

A2. Relative Risk – Diabetes Case

Against Data Set1.csv

Type 2 diabetes is implied here.

(11)       For African Caribbean patients 50-59 years old with a BMI of 50-59 what is the Relative Risk of needing to be on diabetes medication if there is a family history as opposed to no family history?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’0’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

(12)       For African Caribbean patients 50-59 years old with a family history of diabetes what is the Relative Risk of needing to be on diabetes medication if there is a BMI of 50-59 as opposed to a reasonable BMI of ’20-29’?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’20-29’ >

(13)       For African Caribbean patients with a family history of diabetes, what is the Relative Risk of needing to be on diabetes medication if there is an age of 50-59 rather than 40-49?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’40-49’ and ‘BMI’:= ’50-59’>

(14)       For African Caribbean patients 50-59 years old with a family history of diabetes, what is the Relative Risk of needing to be on diabetes medication if there is a BMI of 50-59 rather than 40-49?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’40-49’>

(15)       For African Caribbean patients 50-59 years old with a family history of diabetes, what is the Relative Risk of needing to be on diabetes medication if there is a BMI of 50-59 rather than 40-49?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’40-49’>

(16)       For African Caribbean patients with a family history of diabetes, what is the Relative Risk of needing to be on diabetes medication if there is an age of 50-59 rather than 30-39?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’30-39’ and ‘BMI’:= ’40-49’>

(17)       For African Caribbean patients with a family history of diabetes, what is the Relative Risk of needing to be on diabetes medication if there is an age of 50-59 rather than 20-29?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’20-29’ and ‘BMI’:= ’40-49’>

A3. Relative Risk – Cholesterol Case

Against Data Set1.csv

(18)       For African Caribbean patients 50-59 years old with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is a family history as opposed to no family history?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

(19)       For African Caribbean patients 50-59 years old with a fat% of 40-49, with a family history of cholesterol, what is the Relative Risk of needing to be on cholesterol medication if there is a BMI of 50-59 as opposed to a reasonable BMI of ’20-29’?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years)’:=’50-59’ and ‘BMI’:= ’20-29’ >

(20)       For African Caribbean patients with a family history of cholesterol, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is an age of 50-59 rather than 40-49?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’40-49’ and ‘BMI’:= ’50-59’>

(21)       For African Caribbean patients with a family history of cholesterol, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is an age of 50-59 rather than 40-49?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’40-49’ >

(22)       For African Caribbean patients with a family history of cholesterol, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is an age of 50-59 rather than 40-49?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’40-49’ >

(23)       For African Caribbean patients with a family history of cholesterol, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is an age of 50-59 rather than 30-39?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’30-39’ and ‘BMI’:= ’40-49’ >

(24)       For African Caribbean patients with a family history of cholesterol, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if there is an age of 50-59 rather than 20-29?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’20-29’ and ‘BMI’:= ’40-49’ >

(25)       For patients with a family history of cholesterol age 50-59 and BMI of 50-59, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if they are African Caribbean rather than Caucasian?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘Caucasian’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

(26)       For patients with a family history of cholesterol age 50-59 and BMI of 50-59, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if they are African Caribbean rather than Asian?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘Asian’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

(27)       For patients with a family history of cholesterol age 50-59 and BMI of 50-59, with a fat% of 40-49, what is the Relative Risk of needing to be on cholesterol medication if they are African Caribbean rather than Hispanic?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:=‘Hispanic’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

(28)       For ‘African Caribbean’ patients with a family history of cholesterol age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on cholesterol medication if they have fat% 40-49 rather than 30-39?

< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘40-49’ and ‘Ethnicity’:= ‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking cholesterol medication’:=‘1’ | ‘Fat(%)’:=‘30-39’ and ‘Ethnicity’:= ‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘Caucasian’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’>

(29)       For patients with a family history of diabetes age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on diabetes medication if they are African Caribbean rather than Asian?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and  ‘Ethnicity’:=‘Asian’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’>

(30)       For patients with a family history of diabetes age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on diabetes medication if they are African Caribbean rather than Hispanic?

< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘African Caribbean’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

/< ‘Taking diabetes medication’:=’1’ | ‘Family history of diabetes’:=’1’ and ‘Ethnicity’:=‘Hispanic’ and ‘age(years):=’50-59’ and ‘BMI’:= ’50-59’ >

(31)       For patients with a family history of diabetes, age 50-59 and BMI of 50-59, what is the Relative Risk of needing to be on diabetes medication if they are African Caribbean rather than Caucasian?
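The relative-risk questions above all share one shape: a conditional probability for the group of interest divided by the same conditional probability for a reference group. The following is a minimal sketch of that ratio, assuming plain relative-frequency estimates over a list of patient records. The function names and the toy records are invented for illustration; they are not taken from Data Set1.csv, and this is not how the platform itself evaluates Q-UEL tags.

```python
from typing import Dict, List

Record = Dict[str, str]


def conditional_probability(records: List[Record], target: Record, given: Record) -> float:
    """Estimate P(target | given) as a simple relative frequency."""
    matching = [r for r in records if all(r.get(k) == v for k, v in given.items())]
    if not matching:
        raise ValueError("no records satisfy the conditioning attributes")
    hits = sum(1 for r in matching if all(r.get(k) == v for k, v in target.items()))
    return hits / len(matching)


def relative_risk(records: List[Record], target: Record, exposed: Record, reference: Record) -> float:
    """RR = P(target | exposed) / P(target | reference)."""
    denominator = conditional_probability(records, target, reference)
    if denominator == 0:
        raise ZeroDivisionError("target never occurs in the reference group")
    return conditional_probability(records, target, exposed) / denominator


def make_records(ethnicity: str, n_total: int, n_medicated: int) -> List[Record]:
    """Build toy rows that share family history, age band and BMI band."""
    return [
        {
            "Ethnicity": ethnicity,
            "Family history of diabetes": "1",
            "age(years)": "50-59",
            "BMI": "50-59",
            "Taking diabetes medication": "1" if i < n_medicated else "0",
        }
        for i in range(n_total)
    ]


# Invented toy data (hypothetical counts), used only to exercise the functions.
records = make_records("African Caribbean", 4, 3) + make_records("Caucasian", 4, 2)

shared = {"Family history of diabetes": "1", "age(years)": "50-59", "BMI": "50-59"}
rr = relative_risk(
    records,
    target={"Taking diabetes medication": "1"},
    exposed={**shared, "Ethnicity": "African Caribbean"},
    reference={**shared, "Ethnicity": "Caucasian"},
)
print(f"Relative risk: {rr:.2f}")
```

With real data, sparse strata (few or no records matching every attribute) make such raw frequency ratios unstable, which is one reason a probabilistic treatment is preferred over simple counting.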

# The BioIngine.com Platform Beta Release 1.0 on the Anvil

The BioIngine.com™

Ingine; Inc™, The BioIngine.com™, DiracIngine™, MARPLE™ are all Ingine Inc © and Trademark Protected; also The BioIngine.com is Patent Pending IP belonging to Ingine; Inc™.

High Performance Cloud based Cognitive Computing Platform

The below figure depicts the healthcare analytics challenge as the order of complexity is scaled.

1. Introduction Beta Release 1.0

It is our pleasure to introduce the startup venture Ingine, Inc., which brings to market The BioIngine.com™ Cognitive Computing Platform for the Healthcare market. The platform delivers a Medical Automated Reasoning Programming Language Environment (MARPLE) capability based on mathematics borrowed from several disciplines, notably from the late Prof. Paul A. M. Dirac’s Quantum Mechanics.

The BioIngine.com™ is a High Performance Cloud Computing Platform delivering Healthcare Large-Data Analytics capability derived from an ensemble of bio-statistical computations. The automated bio-statistical reasoning is a combination of “deterministic” and “probabilistic” methods employed against both structured and unstructured large data sets, leading into Cognitive Reasoning.

The BioIngine.com™ delivers Medical Automated Reasoning based on a Medical Automated Reasoning Programming Language Environment (MARPLE) capability, so better achieving 2nd order semantic interoperability in the Healthcare ecosystem (see Appendix Notes).

The BioIngine.com™ is the result of several years of effort with Dr. Barry Robson, former Chief Scientific Officer, IBM Global Healthcare, Pharmaceutical and Life Science. His research has been in developing a quantum-math-driven exchange and inference language that achieves semantic interoperability while also enabling a Clinical Decision Support System that is inherently Evidence Based Medicine (EBM). The solution, besides enabling EBM, also delivers knowledge graphs for Public Health surveys, including those sought by epidemiologists. Based on Dr. Robson’s experience in the biopharmaceutical industry and pioneering efforts in bioinformatics, this has the data-mining-driven potential to advance pathways planning from clinical data to pharmacogenomics.

The BioIngine.com™ brings the machinery of Quantum Mechanics to Healthcare analytics, delivering a comprehensive data science experience that covers both Patient Health and Population Health (Epidemiology) analytics, driven by a range of bio-statistical methods from descriptive to inferential statistics, leading into evidence driven medical reasoning.

The BioIngine.com™ transforms the large clinical data sets generated by interoperability architectures, such as a Health Information Exchange (HIE), into a “semantic lake” representing the Health ecosystem that is more amenable to bio-statistical reasoning and knowledge representation. This capability delivers the evidence-based knowledge needed for a Clinical Decision Support System, better achieving Clinical Efficacy by helping to reduce medical errors.

The BioIngine.com™ platform, working against large clinical data sets or residing within a large Patient Health Information Exchange (HIE), creates opportunity for Clinical Efficacy, while also facilitating the better achievement of the “Efficiencies in the Healthcare Management” that an Accountable Care Organization (ACO) seeks.

Our endeavors have resulted in the development of revolutionary Data Science to deliver Health Knowledge by Probabilistic Inference. The solution developed addresses critical areas, both scientific and technical, notably the healthcare interoperability challenges of delivering semantically relevant knowledge at both the patient health (clinical) level and the public health (Accountable Care Organization) level.

2. Why The BioIngine.com™?

The basic premise in engineering The BioIngine.com™ is in acknowledging the fact that in solving knowledge extraction from the large data sets (both structured and unstructured), one is confronted by very large data sets riddled by high-dimensionality and uncertainty.

Generally in solving insights from the large data sets the order in complexity is scaled as follows:-

A. Insights around :- “what”

For large data sets, descriptive statistics are adequate to extract a “what” perspective. Descriptive statistics generally delivers statistical summary of the ecosystem and the probabilistic distribution.
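As a small illustration of the “what” perspective, the sketch below computes a few summary statistics with Python’s standard `statistics` module. The BMI values are invented toy numbers, not drawn from any data set discussed here.

```python
import statistics

# Invented toy BMI values, used only to illustrate the summary statistics.
bmi = [22.4, 27.1, 31.5, 24.8, 35.2, 29.9, 41.0, 26.3]

mean_bmi = statistics.mean(bmi)
median_bmi = statistics.median(bmi)

# quantiles(..., n=4) returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(bmi, n=4)
iqr = q3 - q1  # interquartile range

print(f"mean={mean_bmi:.2f} median={median_bmi:.2f} IQR={iqr:.2f}")
```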

B. Univariate Problem :- “what”

Considering some simplicity in the relationships among the variables, or in the cumulative effects between the independent variables (causing) and the dependent variables (outcomes):-

a) Univariate regression (simple independent variables to dependent variables analysis)

b) Correlation Cluster – shows impact of set of variables or segment analysis.

https://en.wikipedia.org/wiki/Correlation_clustering
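Item a) above, univariate regression, fits one dependent variable against one independent variable. A minimal sketch using the closed-form least-squares solution follows; the function name `fit_line` and the toy values are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy values: independent variable (BMI) and dependent variable (systolic BP).
bmi = [20.0, 25.0, 30.0, 35.0]
systolic = [110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(bmi, systolic)
print(f"systolic ~ {slope:.2f} * BMI + {intercept:.2f}")
```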

C. Multivariate Analysis (Complexity increases) :- “what”

a) Multiple regression (considering multiple univariate to analyze the effect of the independent variables on the outcomes)

b) Multivariate regression – where multiple causes and multiple outcomes exists

All the above still address the “what” aspect. As complexity increases, the notion of independent and dependent variables becomes non-deterministic, since it is difficult to establish given the interactions amongst the variables, potentially including cyclic paths of influence in a network of interactions. A very simple example is that obesity causes diabetes, but the converse is also true, and we may suspect that obesity causes type 2 diabetes, which in turn causes obesity, and so on. In such a situation, what is best treated as “subject” and what as “object” becomes difficult to establish. Existing inference network methods typically assume that the world can be represented by a Directed Acyclic Graph, more like a tree, but the real world is more complex than that: metabolism, neural pathways, road maps, subway maps, and concept maps are not unidirectional; they are more interactive, with cyclic routes. Furthermore, discovering the “how” aspect becomes important in diagnosing episodes and establishing correct pathways, while also extracting the severe cases (chronic cases, which are a multivariate problem). Indeterminism also creates an ontology that can be probabilistic, not crisp.
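To make the point about cyclic influence concrete, the sketch below checks whether a small directed graph of influences contains a cycle, which is exactly the condition that rules out a Directed Acyclic Graph representation. The graph contents and the function name `has_cycle` are illustrative assumptions; the algorithm is a standard depth-first search.

```python
from typing import Dict, List

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph contains at least one cycle."""
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # edge back into the current path: a cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))


# The mutual influence described in the text: not representable as a DAG.
mutual_influence = {"obesity": ["type 2 diabetes"], "type 2 diabetes": ["obesity"]}

# A one-directional version of the same relationship: a valid DAG.
one_way = {"obesity": ["type 2 diabetes"], "type 2 diabetes": []}

print(has_cycle(mutual_influence), has_cycle(one_way))
```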

Most ACO analytics address the above based on the PQRS clinical factors, which are all quantitative. This is barely useful for advancing the ACO toward performance-driven or value-driven outcomes, most of which are qualitative.

D. Neural Net :- “what”

https://www.wolfram.com/language/11/neural-networks/?product=mathematica

The challenges of multivariate analysis discussed above push us into techniques such as the Neural Net, which is the next level beyond the Multivariate Regression statistical approach: multiple regression models feed into the next level of clusters, which is again an array of multiple regression models.

The Neural Net method still remains inadequate in exposing “how” the human mind is probably organized in discerning the health ecosystem for diagnostic purposes, for which “how”, “why”, “when”, etc. become imperative to arrive at an accurate diagnosis and to target outcomes efficiently. Its learning is “smudged out”. Put a little more precisely: it is hard to interrogate a Neural Net because it is far from easy to see which weights are mixed up in different pooled contributions, or where they come from.

“So we enter Probabilistic Computations, which is as such a Combinatorial Explosion Problem.”

E. Hyperbolic Dirac Net (Inverse or Dual Bayesian technique): – “how”, “why”, “when” in addition to “what”.

Note:- Beta Release 1.0 only addresses HDN transformation and inference query against the structured data sets and Features A, B and E. However, as a non-packaged solution C and D features can still be explored.

Release 2.0 will deliver full A.I driven reasoning capability MARPLE working against both structured and unstructured data sets. Furthermore, it will be designed to be customized for EBM driven “Point Of Care” and “Care Planning” productized user experience.

The BioIngine.com™ offers a comprehensive bio-statistical reasoning experience in the application of the data science discussed above, blending descriptive and inferential statistical studies.

Given the challenge of analyzing large data sets, both structured (EHR data) and unstructured, the emerging Healthcare analytics center on methods D and E discussed above; Ingine Inc is unique in its Hyperbolic Dirac Net proposition.

# BioIngine.com :- High Performance Cloud Computing Platform

Non-Hypothesis driven Unsupervised Machine Learning Platform delivering Medical Automated Reasoning Programming Language Environment (MARPLE)

Evidence Based Medicine Decision Process is based on PICO

From above link “Using medical evidence to effectively guide medical practice is an important skill for all physicians to learn. The purpose of this article is to understand how to ask and evaluate questions of diagnosis, and then apply this knowledge to the new diagnostic test of CT colonography to demonstrate its applicability. Sackett and colleagues1 have developed a step-wise approach to answering questions of diagnosis:”

Uncertainties in the Healthcare Ecosystem

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146626/

### BioIngine.com Platform

The BioIngine.com Platform is a High Performance Cloud Computing Platform delivering both probabilistic and deterministic computations, combining HDN Inferential Statistics and Descriptive Statistics.

The bio-statistical reasoning algorithms have been implemented in the Wolfram Language, a knowledge-based, unified symbolic programming language. As such, a symbolic language has good synergy in implementing Dirac Notational Algebra.

The Bioingine.com; brings the Quantum Mechanics machinery to Healthcare analytics; delivering a comprehensive data science experience that covers both Patient Health and Public Health analytics driven by a range of bio-statistical methods from descriptive to inferential statistics, leading into evidence driven medical reasoning.

The Bioingine.com transforms the large clinical data sets generated by interoperability architectures, such as in Health Information Exchange (HIE) into semantic lake representing the Health ecosystem that is more amenable to bio-statistical reasoning and knowledge representation. This capability delivers evidence based knowledge needed for Clinical Decision Support System better achieving Clinical Efficacy by helping to reduce medical errors.

### Algorithm based on Hyperbolic Dirac Net (HDN)

An HDN is a dualization procedure performed on a given inference net that consists of a pair of split-complex number factorizations of the joint probability and its dual (adjoint, reverse direction of conditionality). Hyperbolic Dirac Net is derived from Dirac Notational Algebra that forms the mechanism to define Quantum Mechanics.

A Hyperbolic Dirac Net (HDN) is a truly Bayesian model and a probabilistic general graph model that includes cause and effect as players of equal importance. It is taken from the mathematics of Nobel Laureate Paul A. M. Dirac, which has been standard notation and algebra in physics for some 70 years. It includes but goes beyond the Bayes Net, which is seen as a special and (arguably) usually misleading case. In tune with nature, the HDN does not constrain interactions and may contain cyclic paths in the graphs representing the probabilistic relationships between all things (states, events, observations, measurements etc.). In the larger picture, HDNs define a probabilistic semantics and so are not confined to conditional relationships, and they can evolve under logical, grammatical, definitional and other relationships. The HDN is also, in its larger context, a model of the nature of natural language and of the human reasoning based on it, one that takes account of uncertainty.

Explanation: An HDN is an inference net, but it is best explained by showing that it stands in sharp contrast to the current notion of an inference net that, for historical reasons, is today often taken as meaning the same thing as a Bayes Net. “A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.” [https://en.wikipedia.org/wiki/Bayesian_network]. In practice, such nets have little to do with Bayes, or with Bayes’ rule (law, theorem or equation), which allows verification that the probabilities used are consistent with each other and with all other probabilities that can be derived from the data. Most importantly, in reality, all things interact in the manner of a general graph, and a DAG is in general a poor model of reality since it consequently may miss key interactions.
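The sketch below is one way to picture the dual (forward and backward) bookkeeping with split-complex numbers a + hb, where h² = 1. It rests on an assumption about convention: the forward probability P(A|B) and the backward probability P(B|A) are carried on the two idempotent components (1 − h)/2 and (1 + h)/2, so that multiplying two such numbers multiplies the forward parts together and the backward parts together. The class, helper names and toy joint distribution are all illustrative; this is not the platform’s implementation. The example also shows the consistency check that Bayes’ rule affords: around a closed cycle A|B, B|C, C|A the forward and backward products agree when all probabilities come from one coherent joint distribution.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SplitComplex:
    """Number of the form re + h*hy with h*h = +1."""
    re: float
    hy: float

    def __mul__(self, other: "SplitComplex") -> "SplitComplex":
        return SplitComplex(
            self.re * other.re + self.hy * other.hy,
            self.re * other.hy + self.hy * other.re,
        )

    @property
    def fwd(self) -> float:  # coefficient of the idempotent (1 - h)/2
        return self.re - self.hy

    @property
    def bwd(self) -> float:  # coefficient of the idempotent (1 + h)/2
        return self.re + self.hy


def dual(fwd: float, bwd: float) -> SplitComplex:
    """Pack a forward and a backward probability into one split-complex value."""
    return SplitComplex((fwd + bwd) / 2, (bwd - fwd) / 2)


# Invented toy joint distribution over three binary variables (a, b, c); sums to 1.
weights = [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30]
joint = dict(zip(product((0, 1), repeat=3), weights))
index = {"a": 0, "b": 1, "c": 2}


def prob(**fixed: int) -> float:
    """Marginal probability that the named variables take the given values."""
    return sum(w for k, w in joint.items() if all(k[index[n]] == v for n, v in fixed.items()))


def braket(x: str, y: str) -> SplitComplex:
    """Dual value for <x|y>: forward P(x=1|y=1), backward P(y=1|x=1)."""
    p_xy = prob(**{x: 1, y: 1})
    return dual(p_xy / prob(**{y: 1}), p_xy / prob(**{x: 1}))


cycle = braket("a", "b") * braket("b", "c") * braket("c", "a")
print(f"forward={cycle.fwd:.6f} backward={cycle.bwd:.6f} h-part={cycle.hy:.2e}")
```

For an open chain the two components generally differ; their ratio is the ratio of the end-point marginals, which is the information a one-directional net discards.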

### DiracMiner

DiracMiner is a machine learning based biostatistical algorithm that transforms large data sets, such as millions of patient records, into a Semantic Lake as defined by HDN driven computations that are a mix of number theory (Riemann Zeta) and information theory (Dual Bayesian or HDN).

The HDN – Semantic Lake, represents the health-ecosystem as captured in Knowledge Representation Store (KRS) consisting of Billions of Tags (Q-UEL Tags).

### DiracBuilder

DiracBuilder sends an HDN query to the KRS to seek an HDN probabilistic inference or estimate. The query contains the HDN that the user would like to have, and DiracBuilder helps find the most similar dual net by looking at which of the billions of Q-UEL tags and joint probabilities are available.

### High Performance Cloud Computing

The Bioingine.com Platform performs probabilistic computations against the billions of Q-UEL tags, employing an extended in-memory processing technique. Creating the billions of Q-UEL tags and querying against them is a combinatorial explosion problem.
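A rough sense of why this is a combinatorial explosion can be had by counting. The sketch below assumes, purely for illustration, that each tag corresponds to a conjunction of distinct `attribute:=value` pairs, that every attribute has the same number of possible values, and that conjunctions up to a given order are considered. The function name and the parameter values are hypothetical, and the count says nothing about how many tags the platform actually stores.

```python
from math import comb


def count_conjunctions(n_attributes: int, values_per_attribute: int, max_order: int) -> int:
    """Number of distinct attribute:=value conjunctions of order 1..max_order."""
    return sum(
        comb(n_attributes, k) * values_per_attribute ** k
        for k in range(1, max_order + 1)
    )


# Growth with the number of attributes, for a fixed number of values and orders.
for n in (5, 10, 20, 40):
    total = count_conjunctions(n_attributes=n, values_per_attribute=5, max_order=5)
    print(f"{n:>2} attributes -> {total:,} possible conjunctions")
```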

The Bioingine platform working against large clinical data sets or while residing within the large Patient Health Information Exchange (HIE) works in creating opportunity for Clinical Efficacy and also facilitates in the better achievement of “Efficiencies in the Healthcare Management” that ACO seeks.

Our endeavors have resulted in the development of revolutionary Data Science to deliver Health Knowledge by Probabilistic Inference. The solution developed addresses critical areas both scientific and technical, notably the healthcare interoperability challenges of delivering semantically relevant knowledge both at patient health (clinical) and public health level (Accountable Care Organization).

### Multivariate Cognitive Inference from Uncertainty

Solving high-dimensional multivariate inference involving variable factors in excess of four, representing the high dimensionality characteristic of the healthcare domain.

EBM Diagnostic Risk Factors and Calculating Predictive Odds

Q-UEL tags of form

< A Pfwd:=x |  assoc:=y | B Pbwd:=z >

where, say, A = disease and B = cause, drug, or diagnostic prediction of disease, are designed to imply the following, given the numbers x, y, and z.

P(A|B) = x

K(A; B) = P(A,B) / (P(A)P(B))   = y

P(B|A) = z

From which we can calculate the following….

P(A) = P(A|B)/K(A;B)

P(B) = P(B|A)/K(A;B)

P( NOT A) = 1 – P(A)

P(NOT B) = 1 – P(B)

P(A, B) = P(A|B)P(B) = P(B|A) P(A)

P(NOT A, B) = P(B) – P(A, B)

P(A, NOT B) = P(A) – P(A, B)

P(NOT A, NOT B) = 1 – P(A, B) – P(NOT A, B) – P(A, NOT B)

P(NOT A | B)  = 1  – P(A|B)

P(NOT B | A) = 1 –  P(B|A)

P(A | NOT B) =  P(A, NOT B)/P(NOT B)

P(B | NOT A) =  P(NOT A, B)/P(NOT A)

Positive Predictive Value P+ = P(A | B)

Negative Predictive Value P- = P(NOT A | NOT B)

Sensitivity = P(B | A)

Specificity = P(NOT B | NOT A)

Accuracy A = P(A, B) + P(NOT A, NOT B)

Predictive odds PO = P(A | B) / P(NOT A | B)

Relative Risk RR = Positive likelihood ratio  LR+ =  P(A | B) / P(A | NOT B)

Negative likelihood ratio LR- = P(NOT A | B) / P(NOT A | NOT B)

Odds ratio OR = P(A, B)P(NOT A, NOT B)  /  (  P(NOT A,  B)P(A, NOT B) )

Absolute risk reduction ARR = P(A | NOT B) – P(A | B) (where A is disease and B is drug etc).

Number  Needed to Treat NNT = +1 / ARR if ARR > 0 (giving positive result)

Number Needed to Harm NNH = -1 / ARR if ARR < 0 (giving positive result)
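The derivations above can be sketched as a short function that starts from the three numbers carried by a tag, x = P(A|B), y = K(A;B) and z = P(B|A), and recovers the 2x2 joint probabilities and several of the listed measures. The function name, the dictionary keys and the toy input values are assumptions made for this illustration; the relative risk here follows the expression given in the list, P(A | B) / P(A | NOT B).

```python
import math


def derive_measures(x: float, y: float, z: float) -> dict:
    """Derive 2x2 quantities from x = P(A|B), y = K(A;B), z = P(B|A)."""
    p_a = x / y                    # P(A) = P(A|B) / K(A;B)
    p_b = z / y                    # P(B) = P(B|A) / K(A;B)
    p_ab = x * p_b                 # P(A, B) = P(A|B) P(B)
    p_nota_b = p_b - p_ab          # P(NOT A, B)
    p_a_notb = p_a - p_ab          # P(A, NOT B)
    p_nota_notb = 1 - p_ab - p_nota_b - p_a_notb

    cells = (p_ab, p_nota_b, p_a_notb, p_nota_notb)
    if any(c < -1e-12 or c > 1 + 1e-12 for c in cells):
        raise ValueError("x, y, z do not describe a valid joint distribution")

    # The ratios below assume non-degenerate cells (no zero denominators).
    p_a_given_notb = p_a_notb / (1 - p_b)
    return {
        "P(A)": p_a,
        "P(B)": p_b,
        "PPV": x,
        "NPV": p_nota_notb / (1 - p_b),
        "sensitivity": z,
        "specificity": p_nota_notb / (1 - p_a),
        "predictive_odds": x / (1 - x),
        "relative_risk": x / p_a_given_notb,
        "odds_ratio": (p_ab * p_nota_notb) / (p_nota_b * p_a_notb),
    }


# Toy input: consistent with P(A) = 0.2, P(B) = 0.25, P(A, B) = 0.1.
measures = derive_measures(x=0.4, y=2.0, z=0.5)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```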

Example:-

BP = blood pressure (high)

This case is very similar, because high BP and diabetes are each comorbidities with high BMI and hence to some extent with each other.  Consequently we just substitute diabetes by BP throughout.

(0) We can in fact test the strength of the above with the following RR, which in effect reads as “What is the relative risk of needing to take BP medication if you are diabetic as opposed to not diabetic?”

<‘Taking BP  medication’:=’1’  |  ‘Taking diabetes medication’:= ‘1’>

/<‘Taking BP  medication’:=’1’  | ‘Taking diabetes medication’:= ‘0’>

The following predictive odds  PO make sense and are useful here:-

<‘Taking BP  medication’:=’1’  |  ‘BMI’:= ’50-59’  >

/<‘Taking BP  medication’:=’0’  |  ‘BMI’:= ’50-59’  >

and (separately entered)

<‘Taking diabetes medication’:=’1’  |  ‘BMI’:= ’50-59’  >

/<‘Taking diabetes  medication’:=’0’  |  ‘BMI’:= ’50-59’  >

And the odds ratio OR would be a good measure here (as it works in both directions). Note Pfwd = Pbwd theoretically for an odds ratio.

<‘Taking BP  medication’:=’1’  | ‘Taking diabetes medication’:= ‘1’>

<‘Taking BP  medication’:=’0’  | ‘Taking diabetes medication’:= ‘0’>

/<‘Taking BP  medication’:=’1’  | ‘Taking diabetes medication’:= ‘0’>

/<‘Taking BP  medication’:=’0’  | ‘Taking diabetes medication’:= ‘1’>
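The remark that the odds ratio “works in both directions” can be checked numerically. The sketch below computes the odds ratio three ways from a 2x2 table of counts: directly from the cell counts, from the forward conditionals (BP medication given diabetes medication, as in the four tags above), and from the backward conditionals (diabetes medication given BP medication). The counts and function names are invented for illustration.

```python
import math

# Invented toy counts: n11 = on both medications, n10 = BP medication only,
# n01 = diabetes medication only, n00 = on neither.
n11, n10, n01, n00 = 30, 20, 10, 40


def odds_ratio_from_counts(n11: int, n10: int, n01: int, n00: int) -> float:
    return (n11 * n00) / (n10 * n01)


def odds_ratio_forward(n11: int, n10: int, n01: int, n00: int) -> float:
    """Built from P(BP | diabetes), matching the four tags in the text."""
    p_bp1_d1 = n11 / (n11 + n01)
    p_bp0_d0 = n00 / (n00 + n10)
    p_bp1_d0 = n10 / (n10 + n00)
    p_bp0_d1 = n01 / (n01 + n11)
    return (p_bp1_d1 * p_bp0_d0) / (p_bp1_d0 * p_bp0_d1)


def odds_ratio_backward(n11: int, n10: int, n01: int, n00: int) -> float:
    """Built from P(diabetes | BP), the reverse direction of conditionality."""
    p_d1_bp1 = n11 / (n11 + n10)
    p_d0_bp0 = n00 / (n00 + n01)
    p_d1_bp0 = n01 / (n01 + n00)
    p_d0_bp1 = n10 / (n10 + n11)
    return (p_d1_bp1 * p_d0_bp0) / (p_d1_bp0 * p_d0_bp1)


direct = odds_ratio_from_counts(n11, n10, n01, n00)
forward = odds_ratio_forward(n11, n10, n01, n00)
backward = odds_ratio_backward(n11, n10, n01, n00)
print(direct, forward, backward)
print(math.isclose(forward, backward))
```

The marginal totals cancel in both directions, which is why the same value is obtained whichever variable is conditioned on.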

# Value Added Partners Invited – BioIngine.com; Cognitive Computing Platform democratizing Medical Knowledge at Point of Care.

Commoditization of Data Science and unleashing Democratized Medical Knowledge.

The mission of Ingine Inc as a startup is to bring advancement in data science as applicable to medical knowledge extraction from large data sets.

In particular, the following differentiators make Ingine Inc a candidate startup in the hope of advancing science in difficult-to-solve areas, driven by decades of research by Dr. Barry Robson.

1. Introducing the Hyperbolic Dirac Net (HDN), a machinery created by borrowing from Quantum Mechanics to advance data mining and deep learning beyond what Bayesian methods could deliver, against the backdrop of very large data sets riddled with uncertainty and high dimensionality. Most importantly, the HDN based non-hypothesis approach allows us to create a learning system workbench that is also amenable to research and discovery related efforts based on deep learning techniques.
2. Create large data driven evidence based medicine (EBM). This means creating scientifically curated medical knowledge having gone through a process akin to systematic review.
3. Integrate Patient centric studies with epidemiological studies to achieve a comprehensive framework to advance integrated large data driven bio-statistical approach which addresses both systemic and also functional concerns. This means blending both descriptive and inferential (HDN) statistical approaches.
4. Introduce a comprehensive notational and symbolic programming framework that allows us to create a unified mathematical framework to deliver both probabilistic and deterministic methods of reasoning which allows us to create varieties of cognitive experience from large sets of data riddled with uncertainty.
5. Use all of the above in creating a Point of Care platform experience that delivers EBM in a PICO format as followed by the industry as a gold standard.

PICO is employed as a framework to create an EBM driven diagnosis process, drawing on both qualitative and quantitative methods, that better achieves systematic review, while the medical exam setting is used as a specification to define the template for enacting the EBM process. This is based on the caveat that for a system to qualify as an expert system in the medical area, it should also be able to pass medical exams based on the knowledge the learning system has acquired, knowledge that is scientifically curated by both automated machine learning and manual intervention efforts.

As part of the overall architecture, which employs some ingenious design techniques such as non-predicated, non-hypothesis driven and schema-less design, a semantic lake (a tag driven knowledge repository) is created, from which the cognitive experience is created employing inferential statistics. Furthermore, the capability can be delivered as a cloud computing platform where parallelization, in-memory processing, high performance computing (HPC) and elastic scaling are addressed.

# Precision Medicine: With the new program from the White House also come redundant grant funding and waste – How do all these escape notice in high science areas?

The recently announced Precision Medicine initiative is a fantastic mission to bring research institutions countrywide together to collaborate and holistically solve civilization’s most complex and pressing problem, cancer, employing genomics while engaging science in an integrative, multidisciplinary approach.

The Precision Medicine mission is grand and certainly requires much attention and focus, and many new tools are now available for medical research, such as complex algorithms in the areas of cognitive science (data mining, deep learning, etc.), big data processing, and cloud computing. Even so, we also need efforts to arrest redundant spending and grants.

Such waste in the name of precision medicine: what an irony.

### The White House Hosts a Precision Medicine Initiative Summit

Grand Initiative Redundant Research Grants for Same Methods

\$1,399,997 :- Study Description: We propose to develop Bayesian double-robust causal inference methods that are accurate, vigorous, and efficient for evaluating the clinical effectiveness of ATSs, utilizing electronic health records and registry studies, through working closely with our stakeholder advisory panel. The proposed “PCATS” R package will allow easy application of our methods without requiring R programming skills. We will assess clinical effectiveness of the expert-recommended ATSs for the pJIA patient population using a multicenter new-patient registry study design. The study outcomes are clinical responses and the health-related quality of life after a year of treatment.

\$832,703 :- Bayesian statistical approach in contrary try to use present as well as historical trial data in a combined framework and can provide better precision for CER. Bayesian methods also flexible in capturing subjecting prior opinion about multiple treatment options and tend to be robust. Despite these advantages, the Bayesian method for CER is underused and underdeveloped (see PCORI Methodology Report, pg. 64, 2013). The primary reasons being a lack of understanding about the role, the lack of methodological development, and the unavailability of easy-to-use software to design and conduct such analysis.

\$839,943 :- We propose to use a method of analysis called Bayes method, in which data on the frequency of a disease in a population is combined with data taken from an individual patient (for example, the result of a diagnostic test) to calculate the chance that the patient has the disease given his or her test result. Clinicians currently use Bayes method when screening patients for disease, but we believe the utility of this methodology extends far beyond its current use.

\$535,277 :- Specific Aims:

1. To encourage Bayesian analysis of HTE:
• To develop recommendations on how to study HTE using Bayesian statistical models
• To develop a user-friendly, free, validated software for Bayesian methods for HTE analysis

2. To develop recommendations about the choice of treatment effect scale for the assessment of HTE in PCOR. The main products of this study will be:

• recommendations or guidance on how to do Bayesian analysis of HTE in PCOR
• software to do the Bayesian methods
• recommendations or guidance on choosing appropriate treatment effect scale for HTE analysis in PCOR, and
• demonstration of our products using data from large comparative effectiveness trials.

# Probabilistic Modeling, Predictive Analytics & Intelligent Design from Multiple Medical Knowledge Sources

Bioingine.com; Probabilistic Modeling and Predictive Analytics Platform for A.I driven Deep Learning to discover Pathways from Clinical Data to Suggested Ontology for Pharmacogenomics; achieving Personalization and Driving Precision Medicine.

Data Integration in the Life Sciences: 11th International Conference, DILS 2015, Los Angeles, CA, USA, July 9-10, 2015, Proceedings

The Feature Diagram from the book above:-

#### Combining Multiple knowledge Sources and also Ontologies:-

##### [Suggested Ontologies for Pharmacogenomics converging to help find a Pathway]
• Patient Data (HL7, C-CDA)
• Gene Ontology
• ChEBI Ontology

#### Integration of Knowledge for Personalized Medicine:- Pharmacogenomics case-study

Looking Forward: The Case for Intelligent Design (and Infrastructure) in Life Science Biologics R&D Sponsored by: Dassault Systèmes; Alan S. Louie, Ph.D. January 2015

http://gate250.com/tc2/IDC%20Biologics%20White%20Paper.pdf

# Clinical Decisions and Empirical Dilemma :- Priori Knowledge independent of Experience vs Posterior Knowledge dependent on Experience

Leon Festinger, an American social psychologist, is credited with developing the idea of Cognitive Dissonance.

As such evidence in Evidence Based Medicine is sought and built around empirical evidence, as experienced and commonly observed, the dilemma is that this empirical evidence has the hazard of being fraught with cognitive dissonance.

Bioingine.com employs an algorithmic approach based on the Hyperbolic Dirac Net (HDN) that allows inference nets that are a general graph (GC), including cyclic paths, thus surpassing the limitation of the Bayes Net, which is by definition a Directed Acyclic Graph (DAG).
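The structural difference between a DAG and a general graph can be sketched in a few lines of Python. This is only an illustration of the acyclicity constraint, not the Hyperbolic Dirac Net itself; the function name `has_cycle` and the clinical variable names are assumptions made for the example.

```python
from typing import Dict, List


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph contains at least one cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    color = {node: WHITE for node in nodes}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in graph.get(node, []):
            if color[nxt] == GREY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


# Hypothetical clinical variables, chosen only for illustration.
dag = {"obesity": ["diabetes"], "diabetes": ["neuropathy"]}
cyclic = {
    "obesity": ["diabetes"],
    "diabetes": ["inactivity"],
    "inactivity": ["obesity"],
}

print(has_cycle(dag))     # False: acceptable as a Bayes Net structure
print(has_cycle(cyclic))  # True: a general graph that a Bayes Net would reject
```

A Bayes Net library would typically refuse the second structure, whereas an inference net over a general graph has to accept it and handle the feedback loop explicitly.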

The Bioingine.com approach thus more fundamentally reflects the nature of probabilistic knowledge in the real world, which has the potential to take account of the interaction between all things without limitation. Ironically, it makes far more explicit use of Bayes' rule than a Bayes Net does. It also allows more elaborate relationships than mere conditional dependencies, as a probabilistic semantics analogous to natural human language but with a more detailed sense of probability.
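As a reminder of what using Bayes' rule in both directions means, the sketch below computes P(A|B) and P(B|A) from the same co-occurrence counts and checks that both routes recover the joint probability. The counts and the helper name `both_directions` are hypothetical, and this is plain Bayes' rule rather than the platform's own algorithm.

```python
from fractions import Fraction


def both_directions(n_ab: int, n_a: int, n_b: int, n_total: int):
    """Return P(A|B), P(B|A), P(A), P(B) from simple co-occurrence counts."""
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    p_a_given_b = Fraction(n_ab, n_b)
    p_b_given_a = Fraction(n_ab, n_a)
    return p_a_given_b, p_b_given_a, p_a, p_b


# Hypothetical counts: 1000 records, 200 with condition A,
# 50 with finding B, and 40 with both.
p_a_b, p_b_a, p_a, p_b = both_directions(n_ab=40, n_a=200, n_b=50, n_total=1000)

# Bayes' rule: P(A|B) P(B) = P(B|A) P(A) = P(A, B)
assert p_a_b * p_b == p_b_a * p_a == Fraction(40, 1000)

print(p_a_b, p_b_a)  # 4/5 1/5
```

The two conditional probabilities differ sharply (4/5 versus 1/5), which is why a net that keeps both directions carries more information than one that stores only a single direction per edge.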

To identify the things and relationships that are important, and to provide the required probabilities, Bioingine.com scouts large, complex data, both structured data and unstructured textual information. It treats initial raw extracted knowledge as potentially erroneous or ambiguous prior knowledge, and validated and curated knowledge as posterior knowledge. It also enables knowledge extracted from authoritative scientific texts to be refined into an intuitive canonical “deep structure” mental-algebraic form that Bioingine.com can more readily manipulate.

Empiricism carries the hazard of introducing cognitive dissonance.
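One generic way to picture the move from prior to posterior knowledge is a conjugate Beta-Binomial update, sketched below. This is a textbook Bayesian device used here as an assumption for illustration; the class `BetaBelief`, the uniform prior, and the review counts are hypothetical and are not taken from the Bioingine.com implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about the probability that an extracted statement is correct."""

    alpha: float  # pseudo-count of supporting evidence
    beta: float   # pseudo-count of contradicting evidence

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, confirmations: int, refutations: int) -> "BetaBelief":
        """Conjugate Beta-Binomial update: add observed counts to pseudo-counts."""
        return BetaBelief(self.alpha + confirmations, self.beta + refutations)


# Raw extracted statement: we assume a uniform, uninformative prior.
prior = BetaBelief(alpha=1.0, beta=1.0)

# Hypothetical curation outcome: 8 reviews confirm the statement, 2 refute it.
posterior = prior.update(confirmations=8, refutations=2)

print(prior.mean, posterior.mean)  # 0.5 0.75
```

The curated belief is both shifted (0.5 to 0.75) and backed by more evidence, which is the sense in which validated knowledge plays the role of a posterior.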

### The rationale for making medicine more science-based

https://www.painscience.com/articles/ebm-vs-sbm.php

### A priori and a posteriori

The Latin phrases a priori (“from the earlier”) and a posteriori (“from the latter”) are philosophical terms of art popularized by Immanuel Kant’s Critique of Pure Reason (first published in 1781, second edition in 1787), one of the most influential works in the history of philosophy. However, in their Latin forms they appear in Latin translations of Euclid’s Elements, of about 300 BC, a work widely considered during the early European modern period as the model for precise thinking.

These terms are used with respect to reasoning (epistemology) to distinguish necessary conclusions from first premises (i.e., what must come before sense observation) from conclusions based on sense observation (which must follow it). Thus, the two kinds of knowledge, justification, or argument may be glossed: a priori knowledge is independent of experience, whereas a posteriori knowledge is dependent on experience or empirical evidence.

There are many points of view on these two types of knowledge, and their relationship is one of the oldest problems in modern philosophy.

The terms a priori and a posteriori are primarily used as adjectives to modify the noun “knowledge” (for example, “a priori knowledge”). However, “a priori” is sometimes used to modify other nouns, such as “truth”. Philosophers also may use “apriority” and “aprioricity” as nouns to refer (approximately) to the quality of being “a priori”.

https://en.wikipedia.org/wiki/A_priori_and_a_posteriori

Although definitions and use of the terms have varied in the history of philosophy, they have consistently labeled two separate epistemological notions. See also the related distinctions: deductive/inductive, analytic/synthetic, necessary/contingent.

# Semantic Data Lake Delivering Tacit Knowledge – Evidence based Clinical Decision Support

Can the complexity be removed and tacit knowledge delivered from the plethora of medical information available in the world?

“Let Doctors be Doctors”

The Semantic Data Lake becomes the Book of Knowledge, ascertained by correlation and causation and resulting in Weighted Evidence.

Characteristics of Bioingine.com Cognitive Computing Platform

• Architecture style moves from event-driven to semantics-driven
• Paradigm shift in defining system behavior: it is no longer predicated and deterministic (Non Predicated Design)
• Design is “systemic”, in contrast to techniques such as object-oriented design, development and assembly of components
• Such a system is better studied probabilistically
• Design is context driven, where the boundary diminishes between context and concept
• System capability is probabilistically programmed by machine learning based on A.I, NLP and algorithms driven by an ensemble of Math
• Design based on semantic mining and engineering takes precedence over complex event processing (CEP). CEP and Event Driven Architecture (EDA) are part of the predicated system design. A business rules engine may be overkill.
• Ontology is created, driven by both information theory and number theory

– Algebra – relationship amongst variables

– Calculus – rate of change in a variable and its impact on the others

– Vector Space – study of the states of the variables

Bioingine.com algorithm design driven by Probabilistic Ontology

• Probabilistic Ontology characterizes the ecosystem’s behavior
• A complex system’s semantic representation evolves generatively
• The system is better represented by semantic multiples, which overcomes the barrier of the Triple Store (RDF)
• Humans interact with the system by employing knowledge inference techniques
• Inductive knowledge precedes knowledge by deduction
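To make the contrast with a triple store concrete, here is a minimal sketch of one possible reading of a “semantic multiple”: a statement with more than three elements that also carries a probability. The `WeightedStatement` class, its alternating term/relator layout, the example terms, and the probability value are all assumptions made for illustration; the source does not specify the actual data structure.

```python
from dataclasses import dataclass
from typing import Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), as in RDF


@dataclass(frozen=True)
class WeightedStatement:
    """Hypothetical n-ary statement: term, relator, term, ... plus a probability."""

    terms: Tuple[str, ...]
    probability: float

    def __post_init__(self) -> None:
        if len(self.terms) < 3 or len(self.terms) % 2 == 0:
            raise ValueError("expected term, relator, term, [relator, term, ...]")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")

    def as_triples(self) -> Tuple[Triple, ...]:
        """Flatten into chained triples; the shared context and weight are lost."""
        t = self.terms
        return tuple((t[i], t[i + 1], t[i + 2]) for i in range(0, len(t) - 2, 2))


stmt = WeightedStatement(
    terms=("patient", "takes", "warfarin", "with variant", "CYP2C9*3"),
    probability=0.8,  # made-up value for the example
)

for triple in stmt.as_triples():
    print(triple)
```

Flattening yields two independent triples, and neither records that they belong to one statement or that the statement as a whole carries a weight of 0.8. Plain RDF needs extra machinery such as reification to recover that, which is presumably the barrier the bullet above refers to.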

Bioingine.com is a Probabilistic Computing Machine

• The system’s behavior is better modeled by employing probability, statistics and vector calculus (statistics based on HDN, an advancement on the Bayes Net in which the acyclic restriction of the DAG is overcome)
• Generally, the system is characterized by high dimensionality in its data set (variability), in addition to volume and velocity
• Most computing is in-memory

BioIngine.com is designed based on mathematics borrowed from several disciplines, notably from Paul A M Dirac’s quantum mechanics. The approach overcomes many of the inadequacies of the Bayes Net, which is based on the directed acyclic graph (DAG). Like knowledge relationships in the real world, and as was required for quantum mechanics, our approaches are neither unidirectional nor do they avoid cycles.

Bioingine.com Features –

• Bi-directional Bayesian probability for knowledge inference and biostatistics (hyperbolic complex).
• Built upon medical ontology (in fact, this is discovered by machine learning and AI techniques).
• Can be both hypothesis-driven and non-hypothesis-driven.
• Quantum probabilities transformed to classical ones, integrating vector space, Bayesian knowledge inference, and the Riemann zeta function to deal with sparse data, all driven by the overarching Hyperbolic Dirac Net.
• Builds into web semantics employing NLP. (Integrates both System Dynamics and Systems Thinking.)

Framework of Bioingine – Dirac-Ingine Algorithm Ensemble of Math