Data analysis
Luís Nunes
What will we talk about? Time-series analysis and forecast; Multivariate data analysis.
[Figure: Greenland temperature reconstruction (Kobashi et al. 2011); mean T (°C) versus year, from -2100 to 2100]
What is data analysis
The process of analysing data sampled in space and time. May include the use of statistics, or not!
Non-statistical tools: graphs; visual inspection tools; mechanistic models; nature-based methods.
Statistical methods: less common approaches; traditional approaches.
Graphs
Non-statistical tools Graphs: traditional www.goldensoftware.com
Non-statistical tools. Graphs: new styles. We will come back to this later. Krzywinski M, et al. (2009) Circos: An information aesthetic for comparative genomics. Genome Res 19(9):1639-1645.
Visual inspection
Non-statistical tools. Visual inspection tools: e.g. with the help of some empirical evidence, such as repetitive patterns
repetitive patterns: Fibonacci's Rabbits Fibonacci investigated (in 1202) how fast rabbits could breed in ideal circumstances. Suppose a pair of rabbits, one male, one female, are put in a field. Rabbits are able to mate at the age of one month so that at the end of its second month a female can produce another pair of rabbits. Suppose that our rabbits never die and that the female always produces one new pair (one male, one female) every month from the second month on. How many pairs will there be in one year? The number of pairs of rabbits in the field at the start of each month is 1, 1, 2, 3, 5, 8, 13, 21, 34,...
Repetitive patterns: The Golden Ratio. There's an interesting property of Fibonacci numbers, defined by f(n) = f(n-1) + f(n-2) if n > 2: if one takes the ratio of two successive numbers of the Fibonacci series (1, 1, 2, 3, 5, 8, 13, ...), the result is: 1/1 = 1; 2/1 = 2; 3/2 = 1.5; 5/3 = 1.666...; 8/5 = 1.6; 13/8 = 1.625. The ratio converges quickly to 1.618. Why is it important?
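The recurrence and the converging ratio can be checked with a few lines of Python. This is a minimal sketch; the function name `fibonacci_pairs` and the choice of 12 months are illustrative assumptions, not part of the original slides.

```python
import math


def fibonacci_pairs(months):
    """Number of rabbit pairs at the start of each month: f(n) = f(n-1) + f(n-2)."""
    pairs = []
    current, following = 1, 1
    for _ in range(months):
        pairs.append(current)
        current, following = following, current + following
    return pairs


pairs = fibonacci_pairs(12)
# Ratio of each term to the one before it.
ratios = [pairs[i + 1] / pairs[i] for i in range(len(pairs) - 1)]
golden_ratio = (1 + math.sqrt(5)) / 2

for value in ratios:
    print(round(value, 4))
print("golden ratio:", round(golden_ratio, 4))
```

The printed ratios oscillate around the golden ratio and settle close to it after only a handful of terms.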
The Golden Ratio. Many of nature's patterns follow the Golden Ratio (1.618), or integral multiples or submultiples of it. Draw two squares of side 1 side by side; add another of side 2 on top of these; then another of side 3 beside them; and so on, following the Fibonacci sequence. Add a quarter circle inside each square, connecting them in sequence. The distance between the center of the square and the spiral, r, increases from one square to the next by a factor of about 1.618...
The Golden Ratio. Many of nature's patterns follow the Golden Ratio (1.618), or integral multiples or submultiples of it. www.tierkreis.nu
The Golden Ratio. Many of nature's patterns follow the Golden Ratio (1.618), or integral multiples or submultiples of it. Golden ratios:
The distance from the hip to the knee, and from the knee to the ankle.
The distance from the top of the head to the nose, and from the nose to the chin.
The distance from the shoulder joint to the elbow, and from the elbow to the fingertips.
The distances between the wrist, the knuckles, the first and second joints of the fingers, and the fingertips.
http://www.kjmaclean.com/
The Golden Ratio has inspired artists and architects throughout time; maybe it can also inspire scientists... It has been hypothesized that the Fibonacci relations found in nature may originate from fractal structures and quasiperiodic crystals starting at a molecular level (Gardiner, 2012). http://www.zengardner.com/ Gardiner, J., 2012. Fibonacci, quasicrystals and the beauty of flowers. Plant Signal. Behav. 7, 1721-3.
Non-statistical tools Visual inspection tools: e.g. Technical analysis
Non-statistical tools Visual inspection tools: e.g. Technical analysis (market analysis)
Mechanistic models
Mechanistic models. "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful" (Box, G. E. P., and Draper, N. R. (1987), Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, NY). A mechanistic model has the following advantages:
1. It contributes to our scientific understanding of the phenomenon under study.
2. It usually provides a better basis for extrapolation (at least to conditions worthy of further experimental investigation if not through the entire range of all input variables).
3. It tends to be parsimonious (i.e., frugal) in the use of parameters and to provide better estimates of the response. (op. cit.)
Non-mechanistic models: statistical and nature-based. We will briefly review some nature-based approaches and leave the statistical methods for later
Nature-based models Cellular automaton
Non-mechanistic models: nature-based, e.g.: cellular automaton; neural networks; ant colonies, genetic algorithms, etc.
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) A cellular automaton is a model of a system of "cell" objects, used to model physical systems and to perform parallel computations, with the following characteristics:
The cells live on a grid (1D, 2D, or 3D);
Each cell has a state. The number of possible states is typically finite. The simplest example has the two possibilities of 1 and 0 ("on" and "off", or "alive" and "dead");
Each cell has a neighborhood, which can be defined in many ways, but is typically a list of adjacent cells.
Barone, D. (2003). Sociedades artificiais, Bookman, Artmed Editora. S.A., Porto Alegre, Brasil.
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) A cellular automaton (CA) is defined by:
A grid, $L$, where each element is designated a "cell";
A finite index of neighbours, $N$, such that $|N| = n$;
A finite set of states, $S$ (e.g., "on", "off", or "alive", "dead");
A transition function, $f: S^n \to S$.
The 4-tuple $(L, S, N, f)$ is a cellular automaton. A CA is fully characterized by i) its geometry, ii) its cells' states, iii) its transition function.
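The four ingredients of the definition map directly onto code. The sketch below is a hypothetical one-dimensional example: the names `ca_step`, `offsets` and `xor_rule`, and the XOR transition itself, are illustrative assumptions and do not come from the slides.

```python
def ca_step(cells, offsets, transition, wrap=True):
    """Apply one synchronous update of a 1D cellular automaton.

    cells      -- the grid L, a list of states
    offsets    -- the neighbourhood N, as positions relative to each cell
    transition -- the function f, mapping a tuple of n states to one state
    wrap       -- if True the grid is a ring; otherwise cells outside the
                  grid are read as state 0 (an assumption of this sketch)
    """
    size = len(cells)
    new_cells = []
    for i in range(size):
        neighbours = []
        for offset in offsets:
            j = i + offset
            if wrap:
                neighbours.append(cells[j % size])
            elif 0 <= j < size:
                neighbours.append(cells[j])
            else:
                neighbours.append(0)
        new_cells.append(transition(tuple(neighbours)))
    return new_cells


def xor_rule(neighbours):
    """Hypothetical transition f: S^2 -> S with S = {0, 1}."""
    left, right = neighbours
    return left ^ right


cells = [0, 0, 1, 0, 0]
for _ in range(3):
    print(cells)
    cells = ca_step(cells, offsets=(-1, 1), transition=xor_rule)
```

Changing only `offsets` or `transition` gives a different automaton on the same grid, which is the point of the 4-tuple description.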
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) A cellular automaton (CA) is defined by: a grid, L, where each element is designated a "cell". [Figure: example grid geometries: 1D, 2D, polar]
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) A cellular automaton (CA) is defined by: a finite index of neighbours, which can be arbitrary
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) A cellular automaton (CA) is defined by: a finite index of neighbours (type of frontier).
Closed: the automaton is reflected on the boundary.
Open: the automaton crosses the boundary.
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Examples. Biological models (Game of Life):
States: alive = 1, dead = 0.
Function: a dead cell becomes alive at the next generation if exactly 3 of its 8 neighbors are alive; a live cell remains alive at the next generation if either 2 or 3 of its 8 neighbors are alive, but otherwise it dies.
Neighborhood: 8-cell Moore neighborhood.
Grid: the initial grid is defined by the modeller.
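The Game of Life rule above can be written out directly. This is a minimal pure-Python sketch; the function name `life_step` and the `wrap` flag are assumptions made for illustration. With `wrap=True` the grid behaves as a torus, and with `wrap=False` cells outside the grid are simply treated as dead, which is a simplification and not the reflecting boundary described earlier.

```python
def life_step(grid, wrap=True):
    """Return the next generation of a Game of Life grid (lists of 0/1)."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells in the 8-cell Moore neighborhood.
            alive = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if wrap:
                        rr, cc = rr % rows, cc % cols
                    elif not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    alive += grid[rr][cc]
            if grid[r][c] == 1:
                # A live cell survives with 2 or 3 live neighbors.
                new_grid[r][c] = 1 if alive in (2, 3) else 0
            else:
                # A dead cell becomes alive with exactly 3 live neighbors.
                new_grid[r][c] = 1 if alive == 3 else 0
    return new_grid


# A "blinker": three live cells in a row, which flips orientation each step.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in life_step(blinker):
    print(row)
```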
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Biological models (Game of Life): lifeforms. [Figure: initial grid (random); the grid some time afterwards]
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Biological models (Game of Life): Hands-on
Run program CavGb (Cav.exe) (available here: http://www.rennard.org/alife/english/acgb.html)
Choose "0-Vie" in the window on the top left
Click on the button "Rand" at the bottom
Click on the arrow buttons to see the evolution of Life
Uncheck "tore" to close the boundaries (e.g., an island)
Alter dimensions to X to reproduce microbiological growth
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Neural activity (Brian's Brain): resemblance with how neurons in the brain behave.
States: ready = 0, firing = 1, refractory = 2.
Function: a cell fires only if it is in the ready (0) state and exactly 2 of its neighbors are firing (1); upon firing, a cell changes to the refractory state (2) for one time step and then reverts to the ready state (0).
Neighborhood: 8-cell Moore neighborhood.
Grid: the initial grid is defined by the modeller.
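The three-state rule can be sketched in the same way. The function name `brians_brain_step` and the `wrap` flag are illustrative assumptions; with `wrap=False`, cells outside the grid are treated as not firing.

```python
READY, FIRING, REFRACTORY = 0, 1, 2


def brians_brain_step(grid, wrap=True):
    """Return the next state of a Brian's Brain grid (lists of 0/1/2)."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[READY] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            state = grid[r][c]
            if state == FIRING:
                # A firing cell always becomes refractory.
                new_grid[r][c] = REFRACTORY
            elif state == REFRACTORY:
                # A refractory cell reverts to ready after one time step.
                new_grid[r][c] = READY
            else:
                # A ready cell fires if exactly 2 Moore neighbors are firing.
                firing = 0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if dr == 0 and dc == 0:
                            continue
                        rr, cc = r + dr, c + dc
                        if wrap:
                            rr, cc = rr % rows, cc % cols
                        elif not (0 <= rr < rows and 0 <= cc < cols):
                            continue
                        if grid[rr][cc] == FIRING:
                            firing += 1
                new_grid[r][c] = FIRING if firing == 2 else READY
    return new_grid


start = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
for row in brians_brain_step(start):
    print(row)
```

Because every firing cell is forced into the refractory state, activity cannot stay in place; it can only spread, which produces the moving wavefronts seen in the simulation.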
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Neural activity (Brian's Brain): resemblance with how neurons in the brain behave. [Figure: initial grid (random); the grid some time afterwards]
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Neural activity (Brian's Brain): Hands-on
Run program CavGb (Cav.exe) (available here: http://www.rennard.org/alife/english/acgb.html)
Choose "Braian's Brain" in the window on the top left
Click on the button "Rand" at the bottom
Click on the arrow buttons to see the process of neurotransmission
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Chemical reaction (Zhabotinsky reaction): non-equilibrium thermodynamics, resulting in the establishment of a nonlinear chemical oscillator.
States: ready = 0, firing = 1, refractory = 2.
Function: a ready cell requires exactly 2 firing neighbors to get turned on; a firing cell keeps firing if it has exactly 2 firing neighbors; when a cell leaves the firing state it goes into a sequence of refractory states, staying in each one for one time step.
Neighborhood: 8-cell Moore neighborhood.
Grid: the initial grid is defined by the modeller.
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Chemical reaction (Zhabotinsky reaction): non-equilibrium thermodynamics, resulting in the establishment of a nonlinear chemical oscillator https://www.youtube.com/watch?v=3jaqrrnkfho
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Chemical reaction (Zhabotinsky reaction): non-equilibrium thermodynamics, resulting in the establishment of a nonlinear chemical oscillator. [Figure: initial grid (random); the grid some time afterwards]
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) Chemical reaction (Zhabotinsky reaction): Hands-on
Run program CavGb (Cav.exe) (available here: http://www.rennard.org/alife/english/acgb.html)
Choose "RhaiZha" in the window on the top left
Click on the button "Rand" at the bottom
Click on the arrow buttons to see the evolution of the Zhabotinsky reaction
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) For more information on cellular automata (CA) see:
Schiff, J. E. (Ed.) (2008). Cellular Automata: A Discrete View of the World. Wiley-Interscience, 280 p. (in chapters here: http://psoup.math.wisc.edu/491/)
Barone, D. (2003). Sociedades artificiais, Bookman, Artmed Editora. S.A., Porto Alegre, Brasil.
CavGB's user guide, provided in the software directory
Cellab's user guide (http://www.fourmilab.ch/cellab/manual/)
See description of rules here: http://www.mirekw.com/ca/rullex_gene.html#belzhab
Non-mechanistic models: Cellular automaton (Stanisław Ulam and John von Neumann, 1940) CA programs are available here:
Mcell: http://www.mirekw.com/ca/download.html
Cellab (runs only on old computers): http://www.fourmilab.ch/cellab/
CavGB: http://www.rennard.org/alife/english/acgb.html
Nature-based models Neural networks
Non-mechanistic models: neural networks. An artificial neuron is a computational model inspired by natural neurons. Artificial neural networks basically consist of inputs (like synapses), which are multiplied by weights (the strength of the respective signals) and then combined by a mathematical function that determines the activation of the neuron. Another function (which may be the identity) computes the output of the artificial neuron. http://natureofcode.com/book/chapter-10-neural-networks/
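A single artificial neuron can be sketched in a few lines. The sigmoid activation, the `bias` term and the function names below are assumptions chosen for illustration; the slides do not prescribe a particular activation function.

```python
import math


def sigmoid(z):
    """One common activation function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(a):
    """Output function that passes the activation through unchanged."""
    return a


def neuron_output(inputs, weights, bias=0.0, activation=sigmoid, output=identity):
    """Weight the inputs, compute the activation, then the output."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return output(activation(weighted_sum))


# Two inputs with hypothetical weights.
print(neuron_output([1.0, 0.5], [0.8, -0.4]))
```

Here the weighted sum is 0.8 * 1.0 - 0.4 * 0.5 = 0.6, which the sigmoid maps to a value between 0.5 and 1.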
Non-mechanistic models: neural networks. An artificial neuron is a computational model inspired by natural neurons. [Figure: input vector Xj; computation of weights; computation of activation function; activation function; computation of output vector; output vector Xi]
Non-mechanistic models: neural networks. An artificial neuron is a computational model inspired by natural neurons. [Figure: input layer of neurons; hidden layer of neurons (one or more); output layer of neurons]
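The layered arrangement (input, hidden, output) amounts to feeding the outputs of one layer of neurons into the next. The sketch below shows only the forward pass, not training; the sigmoid activation, the function names and all weight values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer; weights holds one row per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


def network_forward(inputs, layers):
    """Pass the input vector through each (weights, biases) layer in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical network: 3 inputs, a hidden layer of 2 neurons, 1 output neuron.
hidden = ([[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]], [0.0, 0.1])
output = ([[1.0, -1.0]], [0.0])
print(network_forward([1.0, 2.0, 3.0], [hidden, output]))
```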
Non-mechanistic models: neural networks. Reproduction of complex interdependencies (relation between metals in soils). Hands-on
Open file "Fujian Data - metals in soils.xlsx" in SPSS
Use SPSS to calculate the Spearman correlation coefficient between Fe, Mo, Cu, Se, Cu, Ni
Use SPSS to produce a neural network model relating As with {Fe, Mo, Cu, Se, Cu, Ni}
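For readers without SPSS, the Spearman coefficient can be computed by ranking each variable and taking the Pearson correlation of the ranks. This is a minimal sketch; the function names are illustrative, and the concentration values are made-up placeholders, not values from the Fujian data set.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only, for illustration.
fe = [2.1, 3.4, 1.8, 4.0, 2.9]
cu = [10.0, 15.5, 9.0, 30.0, 12.0]
print(spearman(fe, cu))
```

Because only the ranks matter, the coefficient captures any monotonic relation between two metals, not just a linear one.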
Neural networks
Neural networks: Results
Neural networks. When the model is ready, export it to an XML file. Afterwards you can import it in the Analysis Simulations to estimate As for another dataset using the model just produced
Nature-based models Ant colonies et al.
Non-mechanistic models: Ant colonies et al. https://www.youtube.com/watch?v=d58nlnlkb0i
Non-mechanistic models: Genetic algorithm https://www.youtube.com/watch?v=ejxfty4li6i
Non-mechanistic models: Application example https://www.youtube.com/watch?v=u2t77mqmjiy
Non-mechanistic models: Application example #2 https://www.youtube.com/watch?v=7mnsy86tefw
Statistical models Less common: fractals
Statistical methods. Less common: Fractals. Please read the text on my web page (in Portuguese): http://w3.ualg.pt/~lnunes/pessoal/disciplinas/modelacao-texto.htm
Statistical methods. Less common: Fractals, deterministic. [Figure: Cantor's dust; Sierpinski's carpet; Sierpinski's cube; Koch's island]
Statistical methods. Less common: Fractals, deterministic. Cantor's dust: $N_1 = 2$, $r_1 = 1/3$; $N_2 = 4$, $r_2 = 1/9$. $D = \log N / \log(1/r) = \log 2 / \log 3 = 0.631 < 1$!
Statistical methods. Less common: Fractals, deterministic. Sierpinski's carpet: $N_1 = 8$, $r_1 = 1/3$; $N_2 = 64$, $r_2 = 1/9$. $D = \log N / \log(1/r) = \log 8 / \log 3 = 1.893 < 2$!
Statistical methods. Less common: Fractals, deterministic. Sierpinski's cube: divide each cube into 27 cubes; remove the central ones. $N_1 = 20$, $r_1 = 1/3$; $N_2 = 400$, $r_2 = 1/9$. $D = \log N / \log(1/r) = \log 20 / \log 3 = 2.727 < 3$!
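Statistical methods. Less common: Fractals, deterministic. Koch's island: $P_1 = 3$, $r_1 = 1$; $P_2 = 12$, $r_2 = 1/3$ (each step multiplies the number of sides by 4 and divides their length by 3). $D = \log 4 / \log 3 = 1.262 < 2$!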
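The dimensions quoted for the deterministic fractals all follow from the similarity dimension, D = log N / log(1/r), where N is the number of copies produced at each step and r is the scale of each copy. A minimal sketch to check the arithmetic (the function name is an illustrative assumption):

```python
import math


def similarity_dimension(copies, scale):
    """D = log(N) / log(1/r) for a self-similar fractal."""
    return math.log(copies) / math.log(1.0 / scale)


fractals = {
    "Cantor's dust": (2, 1 / 3),
    "Sierpinski's carpet": (8, 1 / 3),
    "Sierpinski's cube": (20, 1 / 3),
    # Each side of Koch's island becomes 4 sides of one third the length.
    "Koch's island": (4, 1 / 3),
}

for name, (copies, scale) in fractals.items():
    print(name, round(similarity_dimension(copies, scale), 3))
```

Using the second iteration instead (for example N = 4 and r = 1/9 for Cantor's dust) gives the same D, since both logarithms double.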
Statistical methods Less common: Fractals deterministic
Statistical methods. Less common: Fractals, statistical. $D = 2 - H$, where $H$ is the Hurst exponent, a well-known measure of the long-term memory of a time series.
0.5 < H < 1: data with long-term positive autocorrelation;
0 < H < 0.5: data switching between high and low values;
H = 0.5: may indicate completely uncorrelated data.
Statistical methods. Less common: Fractals, statistical. Hands-on
Load file "Kobashi time series.csv" into Gretl and calculate the Hurst exponent.
Calculate the fractal dimension of the time series (Greenland temperature for the last 4000 years)
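Outside Gretl, one common way to estimate the Hurst exponent is rescaled-range (R/S) analysis: the slope of log(R/S) against log(window size). The sketch below is a simplified version under stated assumptions: the function names and window sizes are illustrative, the series are synthetic (not the Kobashi data), and the result is not guaranteed to match Gretl's output exactly.

```python
import math
import random


def rescaled_range(chunk):
    """Return R/S for one window, or None if the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for value in chunk:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    return (highest - lowest) / std if std > 0 else None


def hurst_exponent(series, window_sizes):
    """Least-squares slope of log(mean R/S) against log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs:
                values.append(rs)
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_n)
    return num / den


# Synthetic series: uncorrelated noise and its running sum (a random walk).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
walk, total = [], 0.0
for step in noise:
    total += step
    walk.append(total)

windows = [8, 16, 32, 64, 128, 256, 512]
h_noise = hurst_exponent(noise, windows)
h_walk = hurst_exponent(walk, windows)
print("noise: H =", round(h_noise, 2), "D =", round(2 - h_noise, 2))
print("walk:  H =", round(h_walk, 2), "D =", round(2 - h_walk, 2))
```

On short series the R/S estimate for uncorrelated noise tends to sit somewhat above 0.5, so small departures from 0.5 should be interpreted with care.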
Statistical methods Less common: Fractals https://www.youtube.com/watch?v=xwwytts06tu