From Dr. Barry Robson’s notes:-
Is Data Analysis Particularly Difficult in Biomedicine?
Looking for a single strand of evidence among billions of possible semantic combinations using machine learning
Of all disciplines, it almost seems that clinical genomics, proteomics, and their kin are particularly hard on the data-analytic part of science. Is modern molecular medicine really so unlucky? Certainly, the recent explosion of biological and medical data of high dimensionality (many parameters) has challenged available data-analytic methods.
In principle, one might point out that a recurring theme in the investigation of bottlenecks to development of 21st century information technology relates to the same issues of complexity and very high dimensionality of the data to be transformed into knowledge, whether for scientific, business, governmental, or military decision support. After all, the mathematical difficulties are general, and absolutely any kind of record or statistical spreadsheet of many parameters (e.g., in medicine: age, height, weight, blood pressure, polymorphism at locus Y649B, etc.) could, a priori, imply many patterns, associations, correlations, or eigensolutions to multivariate analysis, expert system statements, or rules, such as |Height:=6ft, Weight:=210 lbs> or more obviously |Gender:=male, Pregnant:=no>. The notation |observation> is the physicists’ ket notation that forms part of a more elaborate “calculus” of observation. It is mainly used here for all such rule-like entities, and they will generally be referred to as “rules”.
As discussed, there are systems that are so complex that they have many complicated rules not reducible to, and not deducible from, simpler rules (at least, not until the future time when we can run a lavish simulation based on physical first principles).
Medicine seems, on the whole, to be such a system. It is an applied area of biology, which is itself classically notorious as a nonreducible discipline.
In other words, nonreducibility may be intrinsically a more common problem for complex interacting systems of which human life is one of our more extreme examples. Certainly there is no guarantee that all aspects of complex diseases such as cardiovascular disease are reducible into independently acting components that we can simply “add up” or deduce from pairwise metrics of distance or similarity.
At the end of the day, however, it may be that such arguments are an illusion and that there is no special scientific case for a mathematical difficulty in biomedicine. Data from many other fields may be similarly intrinsically difficult to data mine. It may simply be that healthcare is peppered with everyday personal impact, life-and-death situations, public outcries, fevered electoral debates, trillion-dollar expenditures, and epidemiological concerns that push society to ask deeper and more challenging questions within the biomedical domain than are routinely asked in other domains.
Large Number of Possible Rules Extractable a Priori from All Types of High-Dimensional Data
For discovery of relationships between N parameters, there are almost always x^N potential basic rules, where x is some positive constant greater than unity that is characteristic of the method of data representation and study. For a typical rectangular data input like a spreadsheet of N columns, there are
X = 2^N – N – 1
tag rules for which evidence must be established. For example, a record with 100 variables (N = 100, taking x = 2) gives:
2^100 – 100 – 1 = 1.267650600228229401496703205275 × 10^30
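The count above can be checked with a short Python sketch. It assumes one reading of the formula that the notes do not state explicitly: a candidate rule is any combination of two or more columns, so the empty set and the N single columns are excluded from the 2^N subsets. The function names and the four example column names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def count_rules(n_columns: int) -> int:
    """Closed-form count 2^N - N - 1 quoted in the text (exact integer arithmetic)."""
    return 2 ** n_columns - n_columns - 1


def enumerate_rules(columns):
    """List every combination of two or more columns.

    Assumption: a candidate rule is a subset of columns with at least two
    members; this is an interpretation, not something the notes spell out.
    """
    rules = []
    for size in range(2, len(columns) + 1):
        rules.extend(combinations(columns, size))
    return rules


# Hypothetical column names for a small record.
columns = ["age", "height", "weight", "blood_pressure"]
rules = enumerate_rules(columns)

# Brute-force enumeration agrees with the closed form for small N.
assert len(rules) == count_rules(len(columns))

# For N = 100 enumeration is infeasible, but the closed form is immediate.
print(count_rules(100))
```

Brute-force enumeration is only practical for small N. For a record of 100 variables the closed form still evaluates instantly, which is the point of the passage: the candidate rules are easy to count and impossible to examine exhaustively.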