Making statistical thinking more productive

Ann Inst Stat Math (2010) 62:3–9
DOI 10.1007/s10463-009-0238-0

Hirotugu Akaike

Received: 19 June 2008 / Revised: 17 January 2009 / Published online: 15 May 2009
© The Institute of Statistical Mathematics, Tokyo 2009

Abstract The use of the concept of plausibility is proposed for the comparison of psychological or physical images of an object with extremely complex structure. This concept helps the process of developing a new image of an object that is not captured by the past observational data. It is argued that this process is the essential aspect of statistical thinking developed by original thinkers in various fields of scientific research with the aid of a model. The use of plausibility helps this process of thinking. A practical example is given by the analysis of golf swing motion.

Keywords Likelihood · Information criterion · Plausibility · Verbally defined model · Golf swing motion

This paper is a completely revised version of a paper presented at the International Symposium on Skill Science 2007, held at Keio University, Tokyo, on September 18, 2007, under the title "Statistical Thinking and Golf Swing Analysis".

H. Akaike, 26-17 Inarimae, Tsukuba 305-0061, Japan. e-mail: hiroakaike2001@ybb.ne.jp

1 What is statistical thinking?

When we make a decision without complete knowledge of the subject we make some inference on the possible outcome of the decision. The decision in this case cannot be reached by deductive handling of available information and contains inherent uncertainty. There is a need for a logic or methodology for the handling of this uncertainty. Peirce discussed three types of uncertainty: probability, verisimilitude or likelihood, and plausibility (Buchler 1955, pp. 166–167). Peirce defined plausibility as the representation of a very low level of certainty of a theory, as illustrated by his statement "if it be highly plausible, justify us in seriously inclining toward belief in it." The use of probability and likelihood has been well developed in the conventional theory of statistics but the concept of plausibility or reasonableness of a statement about an object is left undeveloped.

Peirce also discussed abduction, the act of developing a hypothesis (Buchler 1955, pp. 151–156). The evaluation of plausibility is closely connected to this activity of abduction. It is important to recognize the fact that a theory of very low level of certainty is usually handled as a hypothesis. A hypothesis is originally developed through a psychological representation of the state of an object. A typical example of the use of psychical entities in mathematical investigations is given by the testimonial from Professor Einstein presented in Hadamard (1996, pp. 142–143). There it is mentioned that in the case of Einstein these entities are of visual and some of muscular type and the play with these elements is aimed to be analogous to certain logical connections one is searching for. This example clearly shows the importance of modeling by physical or psychological representation of an object as the starting point of a scientific investigation. However, there is always some kind of uncertainty in the handling of this type of model and this suggests the necessity of a methodology for the effective reduction of the uncertainty.

Statistical thinking was originally developed to construct an image for the handling of an object with extremely complex structure. A typical example is the handling of a nation by its ruler. Without a proper image of the collection of people it is impossible to decide on the proper choice of governmental action. Obviously in this case there is no unique choice of the image and some kind of subjective judgment is necessary.
In this connection it is known that Adolphe Quetelet developed the concept of Social Physics to handle the dynamics of a country. The subjective nature of the image in these cases is quite clear. Also the image of the distribution of molecular velocities adopted by Maxwell is obviously a conceptual image. Boltzmann further developed the use of the distribution for the handling of entropy and this finally led to the introduction of the concept of the quantum by Max Planck. These are well-known historical events in physics. The justification of the use of such an image is provided through the confirmation of the predicted results. These examples clearly demonstrate the subjective nature of the image developed for the handling of scientific problems.

With these facts in mind plausibility is defined here as the psychological measure of acceptability or reasonableness of a model represented by an image of an object and is intended to help the subjective process of comparison of such models. Statistical thinking is effectively developed by successively improving the image through the process of increasing the plausibility by proper use of available information. The image may be psychological or physical and is eventually expressed by a verbal representation to make it communicable to others. Although the model is originally defined subjectively it becomes useful for others through the accumulation of successful examples of its predictive use. Eventually this provides an objectivity or inter-subjectivity to the model. It must be emphasized here that there is no mathematical or mechanical representation of an object when an original investigation starts, particularly for an object with extremely complicated structure.

1.1 Critical review of the enigmatic basis of the theoretical statistics developed by Fisher

It must be recognized that the monumental paper of Fisher (1922) on the foundations of theoretical statistics is based on the assumption of the knowledge of the functional form of the distribution. Fisher clearly recognized this basic aspect of the theory and handled it as the problem of specification. It is mentioned that the problems of specification are "entirely a matter for the practical statistician" and that "Evidently these are considerations the nature of which may change greatly during the work of a single generation" (Fisher 1922, p. 314). The brilliant results due to the introduction of the concept of likelihood are based on the assumption of the knowledge of the true distribution that produced the present observations. However, the nature of the concept of true distribution is left untouched. This shows the enigmatic character of the basis of his theory of theoretical statistics.

The introduction of AIC (An Information Criterion) by Akaike (1973), which is often called the Akaike Information Criterion, produced a solution of this enigma. The Information Criterion $I(g : f)$ that measures the deviation of a model specified by the probability distribution $f$ from the true distribution $g$ is defined by the formula

$$I(g : f) = E \log g(X) - E \log f(X), \tag{1}$$

where $E$ denotes the expectation with respect to the true distribution $g$ of $X$. The criterion is a measure of the deviation of the model $f$ from the true model $g$, or the best possible model for the handling of the present problem. Here the following relation shows the significant characteristic of the log likelihood:

$$I(g : f_1) - I(g : f_2) = -E(\log f_1(X) - \log f_2(X)). \tag{2}$$

The term $E \log g(X)$ cancels in the difference, so this formula clarifies the intersubjective nature of the log likelihood as a measurement of the closeness to the truth that is useful even when the truth $g$ is unknown.
This observation provides a basis for the use of likelihood $f(x)$ as the measure of plausibility of a probabilistic model $f$ based on the observation $x$. In this connection it should be noted that Boltzmann (1878) introduced a quantity equal to $I(g : f)$ defined with $g$ equal to the uniform distribution and $f$ the present molecular distribution as a representation of the thermodynamic entropy of a gas.

2 Necessity of the reasoning for modeling

Although the introduction of the Information Criterion $I(g : f)$ provided a reasonable measure for the comparison of models it does not eliminate the role of the statistician as the provider of a new model. The modeling in this case starts by developing an image of the object suitable for a particular application. To realize the approximation to the best possible model successive refinement of the model or image is required. This is actually the process of improvement of plausibility with the help of available information.
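Relation (2) of Section 1.1 can be checked numerically. The sketch below is an illustration added here, not part of the original paper. It assumes a true distribution g = N(0, 1) and two hypothetical candidate models f1 = N(0.2, 1) and f2 = N(1, 1), all chosen only for the example. Because the three distributions are normal with a common variance, I(g : f) has the closed form (mean difference)^2 / 2, which the code compares with the sample average of the log likelihood difference. The function names are likewise illustrative.

```python
import math
import random


def log_normal_pdf(x, mu, sigma=1.0):
    """Log density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def exact_information(mu_g, mu_f, sigma=1.0):
    """I(g : f) for two normals sharing sigma: (mu_g - mu_f)^2 / (2 sigma^2)."""
    return (mu_g - mu_f) ** 2 / (2.0 * sigma ** 2)


def estimated_difference(xs, mu_1, mu_2):
    """Sample estimate of I(g:f1) - I(g:f2) = -E(log f1(X) - log f2(X)).

    Only the observations xs and the two models are used; the true
    distribution g never appears in the computation.
    """
    diffs = [log_normal_pdf(x, mu_1) - log_normal_pdf(x, mu_2) for x in xs]
    return -sum(diffs) / len(diffs)


# Assumed setting for the illustration: g = N(0, 1), f1 = N(0.2, 1), f2 = N(1, 1).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]

exact = exact_information(0.0, 0.2) - exact_information(0.0, 1.0)
estimate = estimated_difference(xs, 0.2, 1.0)
print(f"exact difference: {exact:.3f}")
print(f"estimate from observations only: {estimate:.3f}")
```

A negative difference indicates that f1 is closer to g than f2. The estimate is obtained without any knowledge of g beyond the observations, which is the point made by relation (2).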

The plausibility of a model can be checked by the information supplied by the Informational Data Set (IDS), defined in Akaike (1997) as the set of available objective and empirical knowledge and observational data to be used for the construction of the model. The plausibility is then evaluated by confirming both the logical consistency and predictive efficiency of the model with the information provided by the IDS. The content of the IDS may be expanded with the development of modeling, for example, by adding the result of practical application of the model. Systematic effort for the increase of plausibility realizes the refinement of the model even without specification of the model as a probability distribution. This is the common sense approach usually adopted in the modeling of an object with significant complexity.

3 Actual process of statistical analysis

What is usually called statistical analysis is not an analysis of the meaning of a given set of data but simply a representation by some coordinate system arbitrarily chosen for the convenience of representation. Popular examples are the principal component and factor analyses. Results of application of these methods cannot provide practical meaning by themselves. It is the process of interpretation of the result by the user that makes it practically useful.

The actual process of analysis of an object with unknown complex structure is the process of trial and error of modeling by the use of available knowledge. In this process the inference on the connections between the elements of the IDS is necessary to reach a practically useful model. The construction of such a model is realized by the process of synthesis of the information supplied by the IDS. The synthesis aims at reaching a model of the object that allows the use of the information for a particular purpose. This process requires the recognition of the nature of the environment where the application of the model is intended.
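The remark that principal component analysis is only a representation in another coordinate system can be illustrated with a small sketch. It is not from the original paper. It uses synthetic two-dimensional data and illustrative function names, and it relies on the standard result that the principal axes of a 2-D covariance matrix are obtained by a rotation through the angle 0.5 * atan2(2 s_xy, s_xx - s_yy). The rotated coordinates are uncorrelated, yet the original points are recovered exactly by rotating back, so the method adds no meaning of its own.

```python
import math
import random


def principal_axes_2d(points):
    """Rotate centred 2-D points onto their principal axes.

    Returns (centre, theta, scores); theta is the rotation angle that makes
    the two new coordinates uncorrelated.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    scores = [(x * c + y * s, -x * s + y * c) for x, y in centred]
    return (mx, my), theta, scores


def reconstruct(centre, theta, scores):
    """Invert the rotation and recover the original points from the scores."""
    c, s = math.cos(theta), math.sin(theta)
    return [(u * c - v * s + centre[0], u * s + v * c + centre[1])
            for u, v in scores]


# Synthetic correlated data, assumed only for this illustration.
rng = random.Random(1)
points = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    points.append((x, 0.8 * x + 0.3 * rng.gauss(0.0, 1.0)))

centre, theta, scores = principal_axes_2d(points)
recovered = reconstruct(centre, theta, scores)

cov_uv = sum(u * v for u, v in scores) / len(scores)
var_u = sum(u * u for u, _ in scores) / len(scores)
var_v = sum(v * v for _, v in scores) / len(scores)
max_error = max(abs(px - rx) + abs(py - ry)
                for (px, py), (rx, ry) in zip(points, recovered))
print(f"covariance of the new coordinates: {cov_uv:.2e}")
print(f"largest reconstruction error: {max_error:.2e}")
```

The interpretation of the new axes, for example deciding what the first component stands for in a given application, remains with the user.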
3.1 Verbal representation of a concept

Usually further synthesis of synthesized concepts is necessary to reach a practically useful concept or a model. A concept obtained as the result of such synthesis usually requires a complex representation. This causes difficulty for its application. This is particularly so in the case of human use of a synthesized concept for the performance of a skill. Simplified representation of a concept by proper use of language is necessary to solve this difficulty. The importance of this process can be seen by the example of the analysis of bodily movement. The use of language in this situation can be characterized as the use of language for action and clearly shows the necessity of simplicity of the representation to assure the performance in practical applications. To represent an object with a complex structure by a simple image or notation is historically the basic function of statistical thinking.

The conventional methods of statistics are restricted to the handling of a given set of observations of some object. A typical example is the fitting of a probabilistic description to a set of observations. Another example is the use of an arbitrary mathematical structure for the representation of observations, as used in image processing and data mining. These methods are useful for the extraction of some characteristic features of a given set of observations by the user but cannot provide the information necessary for the possible modification of the structure of the object. In the case of the golf swing analysis to be discussed in the following section, conventional methods can provide mechanical representations of currently common styles of swing motion but cannot suggest the possibility of a new style of motion.

4 Golf swing analysis

The characteristic feature of the actual process of modeling can be seen by the example of golf swing analysis. An example of statistical modeling of the golf swing motion is given in Akaike (2001). However, the main part of the modeling in that paper takes the form of a mechanical combination of two types of almost independent components of motion obtained from the information supplied by the IDS. This type of modeling eliminates the possibility of introduction of an entirely new type of motion. In this case systematic application of plausibility for the evaluation of tentatively proposed models or images by the use of available information allows the realization of an entirely new type of swing motion. A clear definition of the objective of the swing motion is the prerequisite. The final objective of the golf swing motion is to carry the ball to the desired target position. However, in the present paper attention is restricted to the basic problem, the realization of a powerful straight shot with sufficiently high trajectory. This demands the modeling of the motion by effective use of the biomechanical knowledge of the human body along with the use of empirical knowledge.

4.1 Dominant elements of the body and their connections

The main parts of the body related to the swing motion are arms, legs, shoulders, pelvis and spine.
There are several important connections within these parts of the body and also the very important connection between the body and the earth. This last connection is basic for the realization of powerful and reliable motion of the body.

4.2 Fight against preoccupation

The plausibility of a model generally increases with the reduction of ambiguity of the model, though the plausibility of a model must finally be checked by the result of its application. One particularly important reduction of ambiguity of the golf swing motion was attained through a serendipitous experience. When the author tried to check the swing motion of the arms, while lying face up on the bed, a curious characteristic of the movement of the arms became suddenly clear. When the left arm was swung to the extreme right and then pulled back to the extreme left, the right arm, connected to this horizontal movement of the left arm by the hands, showed a vertical movement of going up and coming down. This was a serendipitous moment (Merton and Barber 2004) that shattered the popular image of swing motion that is composed of symmetric movements of swinging back and forward.

The next step of the modeling is realized by adopting the simple image of swinging back by the left side of the body and swinging down by the right side. This image leads to the swing motion realized by the back swing motion by the right turn of the lumbar and the down swing motion realized by the left turn of the lumbar. This is an extremely condensed description of the swing motion. However, the plausibility of the image can be confirmed by actually swinging the club with this image. For the reduction of the ambiguity of the image a firm connection of the arms to the grip of the handle of the club is the prerequisite. This is established by the adoption of the magic grip.

4.3 The magic grip

Holding the handle of the club by the palms of both hands, with the right hand in front of the left hand, and then extending the elbows realizes a grip of hands that places the index finger of the left hand over the little finger of the right hand. The common grip in the golf swing is composed of the grip by the last three fingers of both hands. However, this allows flexible motion of the wrists. By adding the action of pulling the middle fingers of both hands back to the outside, a very tight grip of the handle and a full extension of the forearms of both sides is established. To characterize the grip thus obtained it is called the magic grip. With the establishment of this tight structure the extension of elbows also induces a tight connection of legs to the earth. Trial with a full swing motion confirms the expected performance of this magic grip and the plausibility of the model of the swing motion is increased. The expected performance of the swing motion has been confirmed by hitting balls.
This simple description clarifies the structure of the swing motion that realizes the crux image of the movement of the club head generated by the back and down swing motions. This swing is thus simply called the crux swing. Details of the actual process of construction and performance of the crux swing are described in the author's weblog http://ameblo.jp/linear/ (in Japanese).

5 Boltzmann's view of models

Boltzmann (1902) states "Models in the mathematical, physical and mechanical sciences are of the greatest importance. Long ago philosophy perceived the essence of our process of thought to lie in the fact that we attach to the various real objects around us particular physical attributes, our concepts, and by means of these try to represent the objects to our minds." He further notes that J. C. Maxwell and other scientists and philosophers have brought such views into intimate relation with the whole body of mathematical and physical theory. He further mentions that on this view our thoughts stand to things in the same way as models to the objects they represent. He further notes that the essence of the process is the attachment of our concept having a definite content to each thing, but without implying complete similarity between thing and thought. This is exactly the use of an image as a model of an object. Taking into account the originality of the work of Boltzmann in thermodynamics this provides a strong support of our view of modeling.

6 Concluding discussion

The discussion developed in this paper has shown the necessity and utility of statistical analysis based on the use of plausibility for the evaluation of verbally defined models. The systematic process of formation of necessary concepts through the analysis of available knowledge with verification by experiments actually produced an efficient new model of golf swing motion. Taking into account the extreme complexity of the motion of the body this result demonstrates the potential of the systematic application of plausibility in the construction of a model. It is hoped that the extension of statistical analysis obtained by the use of plausibility as defined in the present paper will find wide application in various areas where the subjects of investigation do not easily allow representation by mathematical formulas. Hopefully this will eventually establish the status of statistical thinking as the science of creative thinking.

Acknowledgments The author would like to thank Professor Masaki Suwa, Chukyo University, for helpful discussions on the subject of the scientific approach to the enhancement of the skill of sports. He is also grateful to Professor Genshiro Kitagawa for helpful suggestions for the improvement of readability of the present paper.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov, F. Csaki (Eds.), Proceedings of the second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Akaike, H. (1997). On the role of statistical reasoning in the process of identification. In Y. Sawaragi, S. Sagara (Eds.), SYSID '97 11th IFAC symposium on system identification (Vol. 1, pp. 1–8). International Federation of Automatic Control. New York: Pergamon Press.
Akaike, H. (2001). Golf swing analysis: An experiment on the use of verbal analysis in statistical reasoning. Annals of the Institute of Statistical Mathematics, 53, 1–10.
Boltzmann, L. (1878). Weitere Bemerkungen über einige Probleme der mechanischen Wärmetheorie. Wissenschaftliche Abhandlungen von Ludwig Boltzmann, Band II (pp. 250–288). New York: Chelsea Publishing Company.
Boltzmann, L. (1902). Model. In B. McGuinness (Ed.), Encyclopaedia Britannica (10th ed.). English translation included in Ludwig Boltzmann: Theoretical physics and philosophical problems (1974). Dordrecht: D. Reidel Publishing Company.
Buchler, J. (Ed.). (1955). Philosophical writings of Peirce. New York: Dover.
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, 222, 309–368.
Floyd, R. T., Thompson, C. W. (1998). Manual of structural kinesiology (13th ed.). Singapore: WCB/McGraw-Hill.
Hadamard, J. (1996). The mathematician's mind. USA: Princeton University Press, Princeton Science Library.
Merton, R. K., Barber, E. (2004). The travels and adventures of serendipity. Princeton: Princeton University Press.