Classical statistics, as taught in university undergraduate and even graduate courses, starts with descriptive statistics, moves on to distribution fitting, and continues all the way to complex multivariate analysis. Along the way it covers hypothesis testing, correlation, regression, factor analysis and principal component analysis.

Statistics assumes a lot of a priori knowledge about the data and its properties, and it leaves little room for trial and error or tinkering.

Machine Learning in the new age looks at a wide array of techniques and algorithms that learn from the data themselves. Deep Learning, Supervised Learning and Reinforcement Learning cover very interesting algorithms that learn on their own from a wide array of data. So data becomes the input and the model becomes the output. This happens without direct human intervention (except in supervised learning, where humans provide the labels). This is the real beauty of ML over conventional statistics. Although new age ML (covering CNNs, Deep Learning and Reinforcement Learning) draws a lot from statistics, cognitive biology, neuroscience, mathematics and control theory, most ML applications are very new and have a large technical and business impact.

In Reinforcement Learning, classical optimization functions are used, and the behaviorism that Skinner developed in psychology comes into play in terms of “reward and punishment”. The behavior of an RL algorithm is shaped in the same way a child’s behaviour is shaped by parents. Dynamic Programming from classical optimization (Operations Research) is used along with Bellman’s optimality conditions and the MDP (Markov Decision Process) framework.
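To make the link between dynamic programming, Bellman’s optimality condition and an MDP concrete, here is a minimal sketch of value iteration. The three-state MDP, its action names (`stay`, `go`), the rewards and the discount factor are all made up for illustration; none of them come from a real problem.

```python
# Hypothetical 3-state MDP, invented purely for illustration.
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}
GAMMA = 0.9  # assumed discount factor


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=GAMMA):
    """Pick, in every state, the action with the highest expected value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(values, policy)
```

The “reward” of 1.0 for reaching state 2 is the only feedback in this toy example; the values of the earlier states are derived from it by repeatedly applying the Bellman update.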

RL ensures that you can start “learning” with minimal domain or problem knowledge. The algorithm has the power to learn and arrive at its own parameters, driven by error signals and reward optimization. Algorithms like Temporal Difference Learning, Deep Q-Learning and Actor-Critic methods (A3C) give RL truly domain-independent ways to learn in many new domains without the need for domain knowledge.
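As a rough illustration of learning from reward alone, the sketch below runs tabular Q-learning, a temporal difference method, on a tiny corridor. The corridor environment, the hyperparameters (`alpha`, `gamma`, `epsilon`) and the function names are assumptions chosen for this example, and a table stands in for the neural network that a deep variant would use.

```python
import random

N_STATES = 5           # states 0..4; state 4 is the rewarding terminal state
LEFT, RIGHT = 0, 1
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic corridor: move one cell, clipped at the left wall."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Temporal difference target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
policy = ["left" if row[LEFT] > row[RIGHT] else "right" for row in q_table[:GOAL]]
print(policy)
```

The agent is never told that the goal lies to the right. It only sees the reward at the end of the corridor, and the preference for moving right emerges from the updates.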

The ML tribe (a collection of AI scientists, data analysts, ML practitioners, students, professors and industry professionals) is significantly different from old-school statistics in many ways. Statistics assumes a lot of knowledge about the system. Statistical thinking is in many ways top-down, a priori thinking. ML thinking (ML being a broad umbrella of algorithms in RL and Deep Learning) is inherently a posteriori, does not assume much and is bottom-up. As Richard Dawkins puts it, “The Darwinian thinking is mindless, purposeless bottom-up processes involving R&D, Trial and Error and Tinkering all the way”. ML resembles our own biological evolution. In the same way as biological organisms, ML algorithms are also evolving. The big advantage is that the evolution of ML algorithms is much faster than slow, gradual biological evolution.

ML works a lot like biological processes seen elsewhere in nature. ML does not necessarily try to optimize in the classical optimization sense (finding the best possible solution in a large solution space). Instead, ML follows a process of sophisticated tinkering: it finds one sub-optimal solution and then moves ahead to a better one. This process ensures continuity in learning and makes learning in many ways autonomous.
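One simple way to picture this kind of tinkering is stochastic hill climbing: try a small random change, keep it only if it improves things, and repeat. This is an illustrative stand-in rather than how any particular ML algorithm is trained, and the objective function, step size and starting point below are hypothetical.

```python
import random


def objective(x):
    # Hypothetical cost function with its minimum at x = 3.
    return (x - 3.0) ** 2


def tinker(start=0.0, steps=2000, step_size=0.5, seed=0):
    """Move from one sub-optimal solution to a better one by trial and error."""
    rng = random.Random(seed)
    current, cost = start, objective(start)
    history = [cost]
    for _ in range(steps):
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_cost = objective(candidate)
        if candidate_cost < cost:      # keep the change only if it helps
            current, cost = candidate, candidate_cost
        history.append(cost)
    return current, history


best_x, history = tinker()
print(round(best_x, 2), round(history[-1], 4))
```

The search never looks at the whole solution space. It only compares the current solution with a nearby variation, so the cost recorded in `history` never increases from one step to the next.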

Statistics used to need a lot of careful sampling, and sometimes meticulously planned data cleaning would precede a rigorous statistical analysis. ML works with the existing data and tries to draw inferences from it.

One family of ML algorithms, Bayesian inference, combines basic Bayesian probability with state-space generators like Monte Carlo simulation, so that you can create simulated data where data is non-existent or inaccurate. In this way ML algorithms build a kind of robustness against data quality problems.
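A minimal sketch of this idea, assuming a simple success/failure process: Bayes’ rule turns a uniform prior and a handful of observations into a Beta posterior, and Monte Carlo sampling from that posterior generates simulated outcomes for trials that were never observed. The observation counts (7 successes, 3 failures), the prior and the function names are hypothetical.

```python
import random


def posterior_samples(successes, failures, prior_a=1.0, prior_b=1.0,
                      n=20000, seed=0):
    """Draw success rates from the Beta posterior of a Beta-Binomial model."""
    rng = random.Random(seed)
    a, b = prior_a + successes, prior_b + failures
    return [rng.betavariate(a, b) for _ in range(n)]


def simulate_future(rates, trials, seed=1):
    """For each sampled rate, simulate the success count in unseen trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(trials)) for p in rates]


# Assumed observations: 7 successes and 3 failures.
rates = posterior_samples(successes=7, failures=3)
simulated = simulate_future(rates, trials=10)

posterior_mean = sum(rates) / len(rates)
predicted_mean = sum(simulated) / len(simulated)
print(round(posterior_mean, 3), round(predicted_mean, 2))
```

With a uniform Beta(1, 1) prior the exact posterior is Beta(8, 4), whose mean is 8/12, so the sampled mean should land close to 0.667. The simulated counts carry both the randomness of the process and the uncertainty about the underlying rate.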