INFERENTIAL STATISTICS

The BioIngine.com – Deep Learning Comprehensive Statistical Framework – Descriptive to Probabilistic Inference


Given the challenge of analyzing large data sets, both structured (EHR data) and unstructured, emerging healthcare analytics centers on the methods discussed below: d (multivariate regression), e (neural net) and f (multivariate probabilistic inference). Ingine is unique in proposing the Hyperbolic Dirac Net for probabilistic inference.

The basic premise in engineering The BioIngine.com™ is the acknowledgment that extracting knowledge from large data sets (both structured and unstructured) means confronting high dimensionality and uncertainty.

In general, the complexity of extracting insights from large data sets scales in the following order.

a)   Insights around :- “what”

For large data sets, descriptive statistics are adequate to extract a “what” perspective. Descriptive statistics generally deliver a statistical summary of the ecosystem and its probability distribution.

Descriptive statistics : Raw data often takes the form of a massive list, array, or database of labels and numbers. To make sense of the data, we can calculate summary statistics like the mean, median, and interquartile range. We can also visualize the data using graphical devices like histograms, scatterplots, and the empirical cdf. These methods are useful for both communicating and exploring the data to gain insight into its structure, such as whether it might follow a familiar probability distribution. 
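As a minimal sketch of the summary statistics named above, the following uses only the Python standard library. The function names and the sample readings are made up for illustration and do not come from The BioIngine.com.

```python
import statistics

def describe(values):
    """Return the mean, median, and interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

def empirical_cdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)

    def cdf(x):
        return sum(1 for v in ordered if v <= x) / len(ordered)

    return cdf

# Hypothetical sample: eight systolic blood pressure readings (invented numbers).
readings = [118, 122, 125, 130, 131, 135, 140, 152]
summary = describe(readings)
F = empirical_cdf(readings)
print(summary["mean"], summary["median"], F(130))
```

The empirical cdf is the piece that hints at whether the data might follow a familiar distribution, since its shape can be compared against a candidate cdf.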

b)   Univariate Problem :- “what”

Assuming a simple relationship, or simple cumulative effects, between the independent variables (causes) and the dependent variables (outcomes):-

i) Univariate regression (simple independent variables to dependent variables analysis)
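A univariate regression can be sketched as an ordinary least squares line fit. This is a generic textbook formulation, assumed here for illustration; the names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```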

c)    Bivariate Problem :- “what”

Correlation cluster: shows the impact of a set of variables, or supports segment analysis.

https://en.wikipedia.org/wiki/Correlation_clustering

From above link :- In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects. For example, given a weighted graph G = (V,E), where the edge weight indicates whether two nodes are similar (positive edge weight) or different (negative edge weight), the task is to find a clustering that either maximizes agreements (sum of positive edge weights within a cluster plus the absolute value of the sum of negative edge weights between clusters) or minimizes disagreements (absolute value of the sum of negative edge weights within a cluster plus the sum of positive edge weights across clusters). Unlike other clustering algorithms this does not require choosing the number of clusters k in advance because the objective, to minimize the sum of weights of the cut edges, is independent of the number of clusters.
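The “disagreements” objective in the quoted definition can be written down directly. The sketch below only scores a given clustering; it does not search for the best one, and the graph and function name are invented for illustration.

```python
def disagreement_cost(edges, cluster_of):
    """Sum the weights of edges that disagree with a clustering.

    edges: dict mapping a node pair (u, v) to a signed weight,
           positive for similar nodes and negative for different nodes.
    cluster_of: dict mapping each node to its cluster label.
    """
    cost = 0.0
    for (u, v), weight in edges.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if same_cluster and weight < 0:
            cost += abs(weight)   # different nodes placed together
        elif not same_cluster and weight > 0:
            cost += weight        # similar nodes split apart
    return cost

# Hypothetical signed graph on four nodes.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): -1.0, ("c", "d"): 2.0}

together = {"a": 0, "b": 0, "c": 0, "d": 0}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}
print(disagreement_cost(edges, together), disagreement_cost(edges, singletons))
```

Note that the two candidate clusterings use different numbers of clusters and are still scored by the same function, which is the point the quoted passage makes about not fixing k in advance.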

http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/

From above link :- Correlation is a bivariate analysis that measures the strengths of association between two variables. In statistics, the value of the correlation coefficient varies between +1 and -1. When the value of the correlation coefficient lies around ± 1, then it is said to be a perfect degree of association between the two variables. As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker. Usually, in statistics, we measure three types of correlations: Pearson correlation, Kendall rank correlation and Spearman correlation.
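The three coefficients named in the quotation can be compared on a small example. This is a simplified sketch: the rank helper and the Kendall function assume there are no tied values, the Kendall variant shown is tau-a, and the data are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ranks(values):
    """Ranks 1..n by ascending value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs (no ties)."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Invented data: y rises with x, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
r, rho, tau = pearson(xs, ys), spearman(xs, ys), kendall_tau(xs, ys)
print(r, rho, tau)
```

On monotone but nonlinear data like this, the two rank-based coefficients reach 1 while Pearson stays slightly below it, which is one practical reason to report more than one of them.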

d)   Multivariate Analysis (Complexity increases) :- “what”

§ Multiple regression (combining multiple univariate analyses to assess the effect of the independent variables on the outcomes)

§ Multivariate regression – where multiple causes and multiple outcomes exist

https://www.researchgate.net/publication/51046127_Introduction_to_Multivariate_Regression_Analysis
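Multiple regression extends the line fit to several independent variables. The sketch below solves the least squares normal equations in plain Python; it is a generic formulation with hypothetical names and invented data, and a numerical library would normally be preferred for real work.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Least squares for y = b0 + b1*x1 + ... + bk*xk via (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Invented data generated from y = 1 + 2*x1 - 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple(rows, ys)
print(coefficients)
```

For the multivariate case with several outcomes, an ordinary least squares fit can be obtained by calling `fit_multiple` once per outcome column, since the coefficient estimates for each outcome do not depend on the other outcomes.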

 e)   Neural Net :- “what”

https://www.wolfram.com/language/11/neural-networks/?product=mathematica

The challenges of multivariate analysis discussed above push us toward techniques such as the neural net, the next level beyond the multivariate regression approach, in which multiple regression models feed into a next level of clusters that is again an array of multiple regression models.

The neural net method still remains inadequate for depicting “how” the human mind probably operates. In discerning the health ecosystem for diagnostic purposes, the “how”, “why” and “when” interrogatives become imperative for arriving at an accurate diagnosis and targeting outcomes effectively. A neural net's learning is “smudged out”. Put a little more precisely: it is hard to interrogate a neural net because it is far from easy to see which weights are mixed up in the different pooled contributions, or where they come from.

“So we enter probabilistic computation, which is in itself a combinatorial explosion problem”.
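The picture of regression models feeding further regression models can be sketched as a forward pass through a tiny network of logistic units. The weights below are hand-picked, hypothetical values rather than trained ones, and no particular framework is implied.

```python
import math

def logistic(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def regression_unit(weights, bias, inputs):
    """One logistic-regression-like unit: weighted sum, then squashing."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(layers, inputs):
    """Feed inputs through successive layers; each layer is a list of (weights, bias)."""
    activations = inputs
    for layer in layers:
        activations = [regression_unit(w, b, activations) for w, b in layer]
    return activations

# Hypothetical network: two hidden units feeding one output unit.
layers = [
    [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)],
    [([1.0, 1.0], -1.0)],
]
output = forward(layers, [1.0, 1.0])
print(output)
```

Even in this two-layer toy, the output depends on the inputs only through the pooled hidden activations, which illustrates why tracing a result back to individual weights is awkward.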

f)    Hyperbolic Dirac Net (Inverse or Dual Bayesian technique): – “how”, “why”, “when” in addition to “what”.

All the above still address only the “what” aspect. When the complexity increases, the notions of independent and dependent variables become non-deterministic, since they are difficult to establish given the interactions amongst the variables, potentially including cyclic paths of influence in a network of interactions. A very simple example is that obesity causes diabetes, but the converse is also true, and we may suspect a cycle in which obesity causes type 2 diabetes, which in turn causes obesity. In such a situation, what is best treated as “subject” and what is best treated as “object” becomes difficult to establish. Existing inference network methods typically assume that the world can be represented by a Directed Acyclic Graph, more like a tree, but the real world is more complex than that: metabolism, neural pathways, road maps, subway maps and concept maps are not unidirectional; they are more interactive, with cyclic routes. Furthermore, discovering the “how” aspect becomes important in diagnosing episodes and establishing correct pathways, while also extracting the severe cases (chronic cases, which are a multivariate problem). Indeterminism also creates an ontology that can be probabilistic, not crisp.

Note: From a healthcare analytics perspective, most Accountable Care Organization (ACO) analytics address the above based on the PQRS clinical factors, which are all quantitative. These are barely useful for advancing the ACO toward performance-driven or value-driven outcomes, most of which are qualitative.

To conduct HDN inference, bear in mind that obtaining all the combinations of factors by data mining is a “combinatorial explosion” problem, which lies behind the difficulty of Big Data as high-dimensional data.
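Whether a network of influences satisfies the acyclic assumption can be checked mechanically. The sketch below is a standard depth-first cycle test on a directed graph; the two example graphs are invented, and this is not part of the HDN method itself.

```python
def has_cycle(graph):
    """Return True if a directed graph (node -> list of successors) has a cycle."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for successor in graph.get(node, []):
            if state.get(successor, UNVISITED) == IN_PROGRESS:
                return True  # reached a node still on the current path
            if state.get(successor, UNVISITED) == UNVISITED and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNVISITED and visit(node) for node in graph)

# Hypothetical influence networks.
tree_like = {"age": ["obesity"], "obesity": ["diabetes"], "diabetes": []}
with_feedback = {"obesity": ["diabetes"], "diabetes": ["obesity"]}
print(has_cycle(tree_like), has_cycle(with_feedback))
```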

It applies in any kind of data mining, though it is most clearly apparent when mining structured data, a kind of spreadsheet with many columns, each of which is one of our dimensions. In considering combinations of demographic and clinical factors, say A, B, C, D, E, ..., we ideally have to count the number of combinations (A), (A, B), (A, C), ... (B, C, E), ... and so on. Though sometimes assumptions can be made, you cannot always deduce a combination with many factors from those with fewer, nor vice versa. For N factors A, B, C, D, E, ..., there are 2^N - 1 possible combinations. So data with 100 columns as factors would imply about

1,000,000,000,000,000,000,000,000,000,000 

combinations, each of which we want to observe several times and so count, to obtain probabilities. Finding what we need without knowing in advance exactly what it is distinguishes unsupervised data mining from statistics, in which we traditionally test a hunch, a hypothesis. But worse still, in our spreadsheet the A, B, C, D, E are really to be seen as column headings, with say about n possible different values in the columns below them, and so roughly we are speaking of potentially needing to count not just, say, males and females but each of n^N different kinds of patient or thing. This results in a truly astronomic number of different things, each to be observed many times. If merely n = 10 and N = 100, then n^N is

10^100, that is, a 1 followed by 100 zeros.
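The two counts above are easy to check, because Python integers have arbitrary precision. The function names here are hypothetical, and the enumeration is only practical for a handful of factors.

```python
from itertools import combinations

def count_factor_combinations(num_factors):
    """Number of non-empty subsets of N factors: 2^N - 1."""
    return 2 ** num_factors - 1

def count_value_patterns(num_factors, values_per_factor):
    """Number of distinct rows when each of N columns takes one of n values: n^N."""
    return values_per_factor ** num_factors

def enumerate_combinations(factors):
    """Yield every non-empty combination of the given factors."""
    for size in range(1, len(factors) + 1):
        yield from combinations(factors, size)

small = list(enumerate_combinations(["A", "B", "C"]))
print(len(small))                      # (A), (B), (C), (A,B), (A,C), (B,C), (A,B,C)
print(count_factor_combinations(100))  # roughly 1.27 x 10^30
print(count_value_patterns(100, 10) == 10 ** 100)
```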

There is a further implied difficulty, which in a strange way lifts much of the above challenge from the shoulders of researchers and of their computers. In most of the above cases, most of the things we are counting contain many of the factors A, B, C, D, E, etc. Such concurrences of so many things are typically rare, so many of the things we would like to count will never be seen at all, and most of the rest will be seen just 1, 2, or 3 times. Indeed, any reasonably rich patient record with lots of data will probably be unique on this planet. However, most approaches are unable to make proper use of that sparse data, since it would need to be weighted and taken into account in the balance of evidence according to the information it contains, and it is not evident how. The zeta approach tells us how to do that. In short, the real curse of high dimensionality is in practice not that our computers lack sufficient memory to hold all the different probabilities, but that this is also true for the universe: even in principle we do not have all the data with which to determine probabilities by counting, even if we could count and use them. Note that things that are never observed are, in the usual interpretation of zeta theory and of Q-UEL, assumed to have probability 1. In a purely multiplicative inference net, multiplying by probability 1 has no effect. Information I = -log(P) means that for P = 1 the information is I = 0. Most statements of knowledge are, as philosopher Karl Popper argued, assertions awaiting refutation.
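The effect of the default probability of 1 can be shown with a toy multiplicative net. This is only an illustration of the arithmetic described above, with invented probabilities and hypothetical function names; it is not an implementation of zeta theory, Q-UEL, or the HDN.

```python
import math

def information(probability):
    """Information I = -log(P), in natural-log units."""
    return -math.log(probability)

def multiplicative_estimate(known, factors, default=1.0):
    """Multiply the probabilities of the factors, using the default when none is known."""
    product = 1.0
    for factor in factors:
        product *= known.get(factor, default)
    return product

# Hypothetical probabilities; factor "C" has never been observed.
known = {"A": 0.5, "B": 0.25}
estimate = multiplicative_estimate(known, ["A", "B", "C"])
print(estimate, information(estimate))
```

Because the unobserved factor contributes a multiplier of 1, which corresponds to zero information, the result is the same as if it had been left out of the net.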

Nonetheless the general approach in the fields of semantics, knowledge representation, and reasoning from it is to gather all the knowledge that can be got into a kind of vast and ever growing encyclopedia. 

In The BioIngine.com™ the native data sets have been transformed into a Semantic Lake or Knowledge Representation Store (KRS) based on the Q-UEL notational language, such that they are now amenable to HDN-based inference. Where possible, probabilities are assigned; if not, the default probabilities are again 1.
