A model of reality (or simply knowledge, or a theory) within a domain is essentially a “compression” of facts that allows us to calculate in an abstract world and predict in the real world. A classic example is the ability to explain the motion of a wide range of objects in a wide range of situations with just Newton’s three laws of motion.
Science strives for theories consisting of foundational axioms plus rules for producing new facts from them, allowing us to make predictions with pen and paper or computers, verify those predictions in the physical world via experiment, and put them to use for practical applications. Typically, as new facts are discovered that an established theory cannot explain, exceptions and special rules are grafted onto the theory in order to keep making predictions, which decreases the level of “compression” in our knowledge. Then at some point, new great minds emerge and help usher in a radical new theory that sweeps away the exceptions and special rules, resulting in a new round of “compression” in our knowledge. Examples of such upheavals in relatively modern times include relativity, modern quantum mechanics, plate tectonics, and genetics.
Two approaches to models
The power of a model/theory in a given domain is determined by the level of “compression” present in it. At one end, if all we had were facts (e.g. data), there would be zero “compression”, and therefore no viable theory and no powerful knowledge. At the other end, a handful of equations that explain everything in a domain would make for highly powerful knowledge. There is a continuum between these two extremes of “data”-based knowledge and “mathematics”-based knowledge. I am using the word “mathematics” in a broad sense that also includes computation. Physics, for example, has striven for mathematical theories of reality, with great success so far. Biology and chemistry, though not yet amenable to a broad mathematical treatment, have achieved enough “compression” of the facts in their fields to allow practitioners to make accurate predictions in the real world. Such fields are more empirical and data-driven than they are mathematics-driven. That said, particle physics has seen the discovery of a zoo of new particles in relatively modern times, making it look more like biology, ripe for a round of “compression”. Wolfgang Pauli remarked, “… Had I foreseen this, I would have gone into botany”.
We’ve grappled with these two ways of thinking about knowledge for at least a few thousand years. It is the dichotomy between the Pythagorean (aka rational) view that mathematics is inherent to nature, with data arising out of it, and the Aristotelian (aka empirical) view that data is inherent to nature, and mathematics is a way of understanding that data. Personally, I see no reason for nature’s mathematics to exactly match what our limited human brains are capable of conceiving, unless we make some metaphysical assumptions. Our models can only approximate reality to varying degrees of accuracy, much in the way that the shape of a given leaf may not be capturable analytically with the mathematics we’ve invented so far. However, it can be approximated to an arbitrary degree of precision with, say, a cubic spline whose parameters are optimized with real data. In rare cases, we may get lucky and find that a leaf’s shape closely matches a known equation. However, a data-driven approach seems more general and more practically applicable.
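To make the spline remark concrete, here is a minimal sketch (in Python, with NumPy and SciPy) of fitting a smoothing cubic spline to sampled boundary points. The leaf-like sample points and the smoothing factor are made-up illustrative choices, not measurements of a real leaf.

```python
# Minimal sketch: approximate a (made-up) leaf-like outline with a cubic spline
# whose coefficients are fit to the sampled points.
import numpy as np
from scipy.interpolate import splev, splprep

# Hypothetical (x, y) samples along a leaf-like boundary; measurements of an
# actual leaf would go here instead.
t = np.linspace(0.0, 2.0 * np.pi, 60)
x = np.cos(t) * (1.0 + 0.3 * np.sin(3.0 * t))
y = 2.0 * np.sin(t) * (1.0 + 0.1 * np.cos(5.0 * t))

# Fit a parametric cubic spline (k=3); the smoothing factor `s` trades off
# fidelity to the samples against smoothness of the curve.
tck, u = splprep([x, y], k=3, s=0.01)

# Evaluate the fitted spline densely to get a smooth approximation of the shape.
u_fine = np.linspace(0.0, 1.0, 500)
x_fit, y_fit = splev(u_fine, tck)
print(f"spline fit uses {len(tck[1][0])} coefficients per coordinate")
```

The point is only that the fit is driven by the data, not by a closed-form equation for the shape.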
It is interesting to ask what we are trying to accomplish with our efforts to create and use theories in various domains. One purpose is utilitarian - we want applications. Another purpose is epistemological - we want a deeper understanding of the world. Clearly, epistemological hunger is better satiated by a mathematics-driven theory than by a data-driven theory. On the other hand, if our purpose is utilitarian, we need not care whether a theory is mathematics-driven or data-driven. All we care about is whether it works in practice.
Representations in AI
When AI started in the 1950s, there was little data or compute power available. Given all the grand successes in the hard sciences up to then, researchers naturally took to employing mathematical theories to attack AI problems. Unfortunately, in the real world, input data are unstructured and arise from stochastic processes governed by a long list of known, unknown, and unknowable variables. Until machine learning came along, AI built on mathematical theories succeeded only partially at best. AI is an applied field. While one cannot rule out an epistemological purpose for AI - we want to understand intelligence by creating an artificial one - a utilitarian purpose drives almost all work in AI today. In a utilitarian world, there is no use for elegant mathematics if it doesn’t ultimately lead to a system that can demonstrably solve a non-trivial problem on real-world data. In the fundamental sciences, we close the loop with carefully designed experiments that test the predictive power of a proposed theory in novel, real-world situations. In AI, the loop is closed by evaluating the accuracy of a proposed method or model on novel, real-world data. Modern AI is undoubtedly data-driven, its success having come from a confluence of favorable factors: gargantuan amounts of freely available real-world data, a methodology for building function approximators from that data (i.e. neural networks), large amounts of compute power, and open-source software frameworks.
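As a toy illustration of the “function approximator” idea, here is a minimal sketch (in Python, with PyTorch) that fits a small neural network to noisy samples of a simple function. The architecture, the synthetic data, and the hyperparameters are illustrative assumptions, not a prescription.

```python
# Minimal sketch: a small neural network as a generic function approximator,
# fit to noisy samples of sin(x) by gradient descent.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Noisy samples of an underlying function we pretend not to "know".
x = torch.linspace(-3.0, 3.0, 256).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

# A small multilayer perceptron; sizes are arbitrary illustrative choices.
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.4f}")
```

Nothing in this recipe is specific to the sine function; with far more data and compute, the same loop underlies the much larger models discussed below.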
The road ahead for AI theory and practice
Ever since the success of modern deep neural networks on classic AI problems, scientists have found broader uses for them across diverse disciplines. Just as software ate the world, neural networks have started eating science. But despite all their successes, neural networks are clearly not the “final answer” to our utilitarian or epistemological needs. Even as a utilitarian solution to many of humanity’s problems, neural networks leave a lot to be desired. They are like plastic: convenient, energy-intensive, inelegant, and bad for the environment. As an AI community, we’re also acting like the proverbial guy searching for his lost keys under a lamp post, because that is the only place with any light. The only data available to us in voluminous amounts is open web data, so we’re focused on modeling it and trying to apply the resulting models wherever we can.
As Alexei Efros put it in a great talk recently, “… really stupid algorithms with lots of data gets you this unreasonable effectiveness …”, in a bit of a blow to researcher egos. Mapping massive amounts of data to a lower-dimensional space with good regularization, rather than fancy algorithms, seems to be all it takes for good performance (a toy sketch follows below). Academic researchers can use this golden opportunity to go back to pursuing the next big, bold ideas that deliver a much higher level of “compression”, and to work in situations where data is truly scarce (no cheating allowed by leveraging large pre-trained models for so-called “low-shot learning”!). In many ways the current situation is similar to that of a couple of decades ago, when a handful of academics ignored the mainstream AI community, which had given up on neural networks, and continued developing them in relative quiet until they achieved massive breakthroughs. In the meantime, industry researchers can tinker with paradigms that continue to be data- and compute-hungry - we have copious amounts of both - and entrepreneurs can seek good applications for them. We’re far behind in real applications for AI, especially “generative” AI, despite lots of technology having been developed. Lots of exciting work ahead, whether you are in academia or industry!
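To make the dimensionality-reduction remark above concrete, here is a toy sketch (in Python, with PyTorch) of compressing synthetic high-dimensional data into a lower-dimensional space with a linear autoencoder regularized by weight decay. The dimensions, the synthetic data, and the hyperparameters are all made up for illustration.

```python
# Toy sketch: map data to a lower-dimensional space with regularization.
# The 20-dimensional synthetic data secretly lies near a 3-dimensional subspace.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "high-dimensional" data with low intrinsic dimension plus noise.
latent = torch.randn(1024, 3)
mixing = torch.randn(3, 20)
data = latent @ mixing + 0.05 * torch.randn(1024, 20)

encoder = nn.Linear(20, 3)   # compress to 3 dimensions
decoder = nn.Linear(3, 20)   # reconstruct the original 20 dimensions
params = list(encoder.parameters()) + list(decoder.parameters())
# weight_decay plays the role of "good regularization" in this sketch.
optimizer = torch.optim.Adam(params, lr=1e-2, weight_decay=1e-4)

for step in range(1000):
    optimizer.zero_grad()
    reconstruction = decoder(encoder(data))
    loss = nn.functional.mse_loss(reconstruction, data)
    loss.backward()
    optimizer.step()

print(f"reconstruction error after compression to 3 dims: {loss.item():.4f}")
```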