Estimating the Shape of Covert Networks

Size: px
Start display at page:

Download "Estimating the Shape of Covert Networks"

Transcription

1 Estimating the Shape of Covert Networks -Matthew Dombroski -Paul Fischbeck -Kathleen Carley Center for Computational Analysis of Social and Organizational Systems Matthew J. Dombroski Porter Hall Pittsburgh, PA Tel: Fax: This work was supported by the NSF IGERT in CASOS 23 1

2 Agenda Problem Statement Model Building Relationship of Interest (ROI) and Aggregation Network Properties Empirical Data Transformation Dyadic Dependency Models Comparison of updating models Command and Control Policy Analysis Covert network destabilization Comparison of Decision Performance Metrics versus Aggregate Network Metrics Conclusion and Future Research June 23 CMU - Matthew Dombroski - EPP, CASOS 2

3 Problem Statement and Motivation Command and Control may be able to make better decisions by determining Who is in organization Who is important in organization How information flows through the organization Important applications in improving Destabilization of covert and illicit organizations Communication in relief and military operations with uncertain communication structure Most organizations have an unknown communication structure Data available are often disparate and incomplete High degree of uncertainty in the data May not be able to fully characterize its structure June 23 CMU - Matthew Dombroski - EPP, CASOS 3

4 Proposed Approach Goal: Using a predefined relationship (dyad), construct a model that infers relationships and builds a prediction of the network quickly using limited data Limited Generate Input Baseline Data Networks aggregation Incorporate Inferences from Network Properties gap filling Generate Overall Network EVPI Policy Analysis Policy question: improved decisions? June 23 CMU - Matthew Dombroski - EPP, CASOS 4

5 Context Determines The Relationship of Interest (ROI) Clearly define the relationship that is of interest Context dependent Covert organizations (conspiring to commit an illicit act) Military operations (request for resources or assistance) Limits set of empirical data available for model Input data arrives according to Poisson process Quantify and distinguish input data informing ROI Reliability Strength of data towards relationship June 23 CMU - Matthew Dombroski - EPP, CASOS 5

6 Aggregation by Bayes Rule If L ij is the event that persons i and j share the ROI And I ij is the event that certain data is observed about the relationship between persons i and j P ( I ) Then Define: ij Lij and P( I ij Lij ) P( L ij I ij ) = P( I ij L ij P( I ij ) P( L ij L ij ) + ) P( L P( I ij ij ) L ij ) P( L ij ) Network priors can be assigned a priori June 23 CMU - Matthew Dombroski - EPP, CASOS 6

7 Incorporating Inference Using Empirical Data (cont.) Before constructing inference model Transform empirical frequency networks into probabilities of ROI Assume frequency of interaction indicates strength of relationship Transforming Empirical Network Data Define ROI Threshold number of interactions required to attain relationship of interest (x max ) Define marginal increase of each additional interaction to probability of ROI Transforming Empirical Network Data (λ) Transform interaction frequency into probabilities P( ROI) = (1 e (1 e λx June 23 CMU - Matthew Dombroski - EPP, CASOS 7 λ x max ) )

8 Network Transformation June 23 CMU - Matthew Dombroski - EPP, CASOS 8

9 Network Transformation June 23 CMU - Matthew Dombroski - EPP, CASOS 9

10 Network Properties Triad Closure Property (Skvoretz, 199) j If then occurs with a probability greater than chance i k Adjacency Property Corollary of triad closure that is modeled i k i If occurs with probability greater than chance then j j k and occur with a probability greater than chance i k June 23 CMU - Matthew Dombroski - EPP, CASOS 1

11 .8 Incorporating Inference Using Empirical Data (cont.) Probability of ROI in Adjacent Relationship.7.6 Average Count of Conversations on Adjacent Relationship Count of Conversations on Reference Relationship Probability of ROI in Reference Relationship 8 th Perc. 6 th Perc. 4 th Perc. 2 th Perc. Adjacency Property Density of Network Density of Network Plot shows percentile contours for adjacent relationships in fraternity network data Model fit using neural network of 9 nodes and 3 layers (percent current error on 4 training data points is.9%) June 23 CMU - Matthew Dombroski - EPP, CASOS 11

12 Several Possible Implementations of Model Three inference models constructed: Model 1-Direct update dyads using incoming information and Bayes Rule only (control model) Model 2-Model 1 plus indirect inferencing only on immediately adjacent relationships using inference model Model 3-Model 2 plus indirect inferencing on all other relationships using inference model June 23 CMU - Matthew Dombroski - EPP, CASOS 12

13 Performance of Models Compared Using Simulation Sample data (2 of 58 nodes), perform an update, calculate performance, another update Constant uniform prior (.2) Reconstruct by varying: Input data accuracy Proportion of information supporting existence of ROI (.5) Performance Metric Absolute Error (sum of prediction probability minus actual probability) Absolute Error = ( N i> j j= 1 abs( p ij A ij )) June 23 CMU - Matthew Dombroski - EPP, CASOS 13

14 .35.3 Absolute Error Model1 Model2 Model3 Absolute Error Observations.15 Basecase.1.5 Absolute Error Observations.24 Random Conditionals Observations Sensor Reliable Conditionals (.8,.1) June 23 CMU - Matthew Dombroski - EPP, CASOS 14

15 .35.3 Absolute Error Model1 Model2 Model3 Absolute Error Observations.25 Basecase Absolute Error Observations P(observing dyad supporting information).2 P(observing non dyad supporting information) Observations.2 P(observing dyad supporting information).8 P(observing non dyad supporting information) June 23 CMU - Matthew Dombroski - EPP, CASOS 15

16 Conclusions from Simulations Inference model results are mixed Under conditions of balance between positive and negative tie input data and random or moderately accurate conditionals Performance on Absolute Error is relatively good for inference models After about 4 updates, base model outperforms inference models Under conditions of imbalance Base model outperforms inference models for nearly all updates Inference over/under predicts the network Under conditions of accurate conditionals Models are indistinguishable June 23 CMU - Matthew Dombroski - EPP, CASOS 16

17 Policy Analysis What does this mean for command and control decisions? C2 Terrorist Example C2 must make a decision of who to remove from network of conspirators using models Get Network prediction after x updates Estimate the n most critical individuals to remove from network Remove them from network Assess how many fragments the network is broken into Compare the number of fragments across different model implementations June 23 CMU - Matthew Dombroski - EPP, CASOS 17

18 Policy Analysis Scenario-Covert Networks (cont.) Remove 7 out of 2 people Network Prediction-Terrorist Example 12 Calculate expected number of fragments the network is broken into 1 Number of Fragments Fragments total perfect information 12 EVPI-avg(Model1) 4.9 EVPI-avg(Model2) 5.4 EVPI-avg(Model3) 4.9 lower 95% min average max upper 95% 2 On average, model 2 approximates perfect information closest to within 6.6 total fragments of perfect information 7.1 for models 1 and 3 model1 model2 model3 model1 model2 model3 model1 model2 model3 total isolates subgroups Network Fragments June 23 CMU - Matthew Dombroski - EPP, CASOS 18

19 A Comparison of Simulation Performance Metrics to Decision Metrics.3.25 Absolute Error Terrorism Example-Med Density Graph Observations Mid Density Network Number of Fragments lower 95% min average max upper 95% 2 model 1 model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 total isolates subgroups Network Fragments June 23 CMU - Matthew Dombroski - EPP, CASOS 19

20 Network Match Performance Versus Decision Performance Conduct a paired t-test for each network prediction for absolute error and fragments Null Hypotheses: E(absolute error 1 - absolute error 2 ) =, E(fragments 1 fragments 2 ) = E(absolute error 1 - absolute error 3 ) =, E(fragments 1 - fragments 3 ) = E(absolute error 2 - absolute error 3 ) =, E(fragments 2 fragments 3 ) = Hypothesis Mean Value Standard Deviation T-value Dof Significance Level -(Abs 1 Abs 2 ) >.1 Frag 1 Frag <.1 -(Abs 1 Abs 3 ) >.1 Frag 1 Frag < S <.5 -(Abs 2 Abs 3 ) <.1 Frag 2 Frag <.1 June 23 CMU - Matthew Dombroski - EPP, CASOS 2

21 Conclusions Good/bad performance on simulation metric does not imply good/bad performance in C2 decision space Absolute Error recommends 3 over 2, but 2 = 1 and 3 = 1 Fragments recommends 2, 3, then 1 Decision metrics may predict performance better than aggregate network metrics Fragments is a better measure of the actual decision we will make Absolute error is one step abstracted from the decision C2 decisions can benefit from a model Care must be taken in evaluating performance of such models Slight improvement in the number of fragments generated Errors when too many updates occur June 23 CMU - Matthew Dombroski - EPP, CASOS 21

22 Future Work Explore the relationship between other C2 decision metrics and aggregate network metrics Improving Communication for Military Operations Network prediction indicates where communication is weak Add dyads to force individuals to communicate better Expected number of individuals to learn a new fact Vaccination Strategies Against a Contagious Disease Develop more rigorous model using triad closure Characterize several different types of models for different types of networks Work relationships Friend relationships June 23 CMU - Matthew Dombroski - EPP, CASOS 22

23 Questions and Comments Acknowledgements: This research has been supported, in part, by National Science Foundation under the IGERT program for training and research in CASOS and the ONR. Additional support was also provided by the Center for Computational Analysis of Social and Organizational Systems and the Department of Engineering and Public Policy at Carnegie Mellon. The views and conclusions contained in this document are those of the author and should not be interpreted as representing official policies, either expressed or implied, of the Office of Naval Research grant number , the National Science Foundation, or the U.S. government. June 23 CMU - Matthew Dombroski - EPP, CASOS 23

24 Incorporating Inference Using Empirical Data 7 Average Count of Conversations on Adjacent Relationship Adjacency Property Density of Network Density of Network Count of Conversations on Reference Relationship Bernard and Killworth (1976) fraternity brother network Observed number of interactions between 58 individuals over a week Average interactions between individuals-1.89, min-, max-51 June 23 CMU - Matthew Dombroski - EPP, CASOS 24

25 .35.3 Absolute Error Model1 Model2 Model3.5.2 Absolute Error Observations.1 Basecase Observations.2 Absolute Error Informed Prior Observations Misinformed Prior June 23 CMU - Matthew Dombroski - EPP, CASOS 25

26 .35.3 Absolute Error Model1 Model2 Model Absolute Error Observations.15 Basecase Absolute Error Observations.1 Mid Density Network.5 Model1 Model2 Model Observations Low Density Network June 23 CMU - Matthew Dombroski - EPP, CASOS 26

27 Expected Number Infected Policy Analysis Scenario-Contagious Diseases Innoculate max 7 out of 2 people Calculate expected number of remaining people to contract disease Assume SIR model of disease through network (Newman, 22) Assume probability of transmission (p) p 1. perfect information 2.35 E (model 1) 6.3 E (model 2) 5.19 E (model 3) 5.14 model 1 model 2 min average model 3 model 1 model 2 model 3 model 1 model 2 Model 3 approaches perfect information closest model model 1 to within 1.9 persons when p is.1 to within 2.79 persons when p is 1 model 2 model 3 Probability of Transmission model 1 model 2 model 3 model 1 model 2 model 3 June 23 CMU - Matthew Dombroski - EPP, CASOS 27

28 Expected Number Communicated to model 1 Policy Analysis Scenario 3-Relief Organizations Add maximum of 3 relationships Calculate expected number of individuals to learn new information assuming transmission probabiilty p number of of added dyads relationships transmission probability perfect information E(model 1) E(model 2) E(model 3) Model 2 approximates perfect information closest model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 To within.9 people with 1 new relationship and p = To 1. within.14 people with 2. 2 new relationships and 3. p =.7 N umber of N ew R elationships and Transmission Probability model 1 June 23 CMU - Matthew Dombroski - EPP, CASOS 28 model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 min avg max

Bargaining, Information Networks and Interstate

Bargaining, Information Networks and Interstate Bargaining, Information Networks and Interstate Conflict Erik Gartzke Oliver Westerwinter UC, San Diego Department of Political Sciene egartzke@ucsd.edu European University Institute Department of Political

More information

COGNITIVE SOCIAL STRUCTURES. Presented by Jing Ma

COGNITIVE SOCIAL STRUCTURES. Presented by Jing Ma COGNITIVE SOCIAL STRUCTURES Presented by Jing Ma 301087280 Agenda 2 v Introduction v Definition of Cognitive Social Structure v Aggregations v An Empirical Example v Conclusion Introduction 3 v Bernard,

More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference Introduction to Bayesian Inference p. 1/2 Introduction to Bayesian Inference September 15th, 2010 Reading: Hoff Chapter 1-2 Introduction to Bayesian Inference p. 2/2 Probability: Measurement of Uncertainty

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information

Chapter 7: Simple linear regression

Chapter 7: Simple linear regression The absolute movement of the ground and buildings during an earthquake is small even in major earthquakes. The damage that a building suffers depends not upon its displacement, but upon the acceleration.

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

(1) Introduction to Bayesian statistics

(1) Introduction to Bayesian statistics Spring, 2018 A motivating example Student 1 will write down a number and then flip a coin If the flip is heads, they will honestly tell student 2 if the number is even or odd If the flip is tails, they

More information

Bayesian Social Learning with Random Decision Making in Sequential Systems

Bayesian Social Learning with Random Decision Making in Sequential Systems Bayesian Social Learning with Random Decision Making in Sequential Systems Yunlong Wang supervised by Petar M. Djurić Department of Electrical and Computer Engineering Stony Brook University Stony Brook,

More information

Math , Fall 2014 TuTh 12:30pm - 1:45pm MTH 0303 Dr. M. Machedon. Office: Math Office Hour: Tuesdays and

Math , Fall 2014 TuTh 12:30pm - 1:45pm MTH 0303 Dr. M. Machedon. Office: Math Office Hour: Tuesdays and Math 411 0201, Fall 2014 TuTh 12:30pm - 1:45pm MTH 0303 Dr. M. Machedon. Office: Math 3311. Email mxm@math.umd.edu Office Hour: Tuesdays and Thursdays 2-3 Textbook: Advanced Calculus, Second Edition, by

More information

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

More information

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks Osman Yağan Department of ECE Carnegie Mellon University Joint work with Y. Zhuang and V. Gligor (CMU) Alex

More information

Application of Indirect Race/ Ethnicity Data in Quality Metric Analyses

Application of Indirect Race/ Ethnicity Data in Quality Metric Analyses Background The fifteen wholly-owned health plans under WellPoint, Inc. (WellPoint) historically did not collect data in regard to the race/ethnicity of it members. In order to overcome this lack of data

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Application of Monte Carlo Simulation to Multi-Area Reliability Calculations. The NARP Model

Application of Monte Carlo Simulation to Multi-Area Reliability Calculations. The NARP Model Application of Monte Carlo Simulation to Multi-Area Reliability Calculations The NARP Model Any power system reliability model using Monte Carlo simulation consists of at least the following steps: 1.

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Class 19. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 19. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 19 Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science Copyright 2017 by D.B. Rowe 1 Agenda: Recap Chapter 8.3-8.4 Lecture Chapter 8.5 Go over Exam. Problem Solving

More information

Data Exploration vis Local Two-Sample Testing

Data Exploration vis Local Two-Sample Testing Data Exploration vis Local Two-Sample Testing 0 20 40 60 80 100 40 20 0 20 40 Freeman, Kim, and Lee (2017) Astrostatistics at Carnegie Mellon CMU Astrostatistics Network Graph 2017 (not including collaborations

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Agent-Based Methods for Dynamic Social Networks. Duke University

Agent-Based Methods for Dynamic Social Networks. Duke University Agent-Based Methods for Dynamic Social Networks Eric Vance Institute of Statistics & Decision Sciences Duke University STA 395 Talk October 24, 2005 Outline Introduction Social Network Models Agent-based

More information

In matrix algebra notation, a linear model is written as

In matrix algebra notation, a linear model is written as DM3 Calculation of health disparity Indices Using Data Mining and the SAS Bridge to ESRI Mussie Tesfamicael, University of Louisville, Louisville, KY Abstract Socioeconomic indices are strongly believed

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

A Generative Model for Dynamic Contextual Friendship Networks. July 2006 CMU-ML

A Generative Model for Dynamic Contextual Friendship Networks. July 2006 CMU-ML A Generative Model for Dynamic Contextual Friendship Networks Alice Zheng Anna Goldenberg July 2006 CMU-ML-06-107 A Generative Model for Dynamic Contextual Friendship Networks Alice X. Zheng Anna Goldenberg

More information

Bayesian Inference. Definitions from Probability: Naive Bayes Classifiers: Advantages and Disadvantages of Naive Bayes Classifiers:

Bayesian Inference. Definitions from Probability: Naive Bayes Classifiers: Advantages and Disadvantages of Naive Bayes Classifiers: Bayesian Inference The purpose of this document is to review belief networks and naive Bayes classifiers. Definitions from Probability: Belief networks: Naive Bayes Classifiers: Advantages and Disadvantages

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

STAT FINAL EXAM

STAT FINAL EXAM STAT101 2013 FINAL EXAM This exam is 2 hours long. It is closed book but you can use an A-4 size cheat sheet. There are 10 questions. Questions are not of equal weight. You may need a calculator for some

More information

Introduction to Statistics for Traffic Crash Reconstruction

Introduction to Statistics for Traffic Crash Reconstruction Introduction to Statistics for Traffic Crash Reconstruction Jeremy Daily Jackson Hole Scientific Investigations, Inc. c 2003 www.jhscientific.com Why Use and Learn Statistics? 1. We already do when ranging

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

How is the Statistical Power of Hypothesis Tests affected by Dose Uncertainty?

How is the Statistical Power of Hypothesis Tests affected by Dose Uncertainty? How is the Statistical Power of Hypothesis Tests affected by Dose Uncertainty? by Eduard Hofer Workshop on Uncertainties in Radiation Dosimetry and their Impact on Risk Analysis, Washington, DC, May 2009

More information

Fundamental Statistical Concepts and Methods Needed in a Test-and-Evaluator s Toolkit. Air Academy Associates

Fundamental Statistical Concepts and Methods Needed in a Test-and-Evaluator s Toolkit. Air Academy Associates Fundamental Statistical Concepts and Methods Needed in a Test-and-Evaluator s Toolkit Mark Kiemele Air Academy Associates mkiemele@airacad.com ITEA 010 Symposium Glendale, AZ 13 September 010 INTRODUCTIONS

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,

More information

tossing a coin selecting a card from a deck measuring the commuting time on a particular morning

tossing a coin selecting a card from a deck measuring the commuting time on a particular morning 2 Probability Experiment An experiment or random variable is any activity whose outcome is unknown or random upfront: tossing a coin selecting a card from a deck measuring the commuting time on a particular

More information

CSCI 3210: Computational Game Theory. Cascading Behavior in Networks Ref: [AGT] Ch 24

CSCI 3210: Computational Game Theory. Cascading Behavior in Networks Ref: [AGT] Ch 24 CSCI 3210: Computational Game Theory Cascading Behavior in Networks Ref: [AGT] Ch 24 Mohammad T. Irfan Email: mirfan@bowdoin.edu Web: www.bowdoin.edu/~mirfan Course Website: www.bowdoin.edu/~mirfan/csci-3210.html

More information

Modeling Relational Event Dynamics

Modeling Relational Event Dynamics Modeling Relational Event Dynamics Carter T. Butts12 University of California, Irvine Department of Sociology 2 Institute for Mathematical Behavioral Sciences 1 This work was supported by ONR award N00014-08-1-1015,

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4

More information

Hypothesis Testing. ) the hypothesis that suggests no change from previous experience

Hypothesis Testing. ) the hypothesis that suggests no change from previous experience Hypothesis Testing Definitions Hypothesis a claim about something Null hypothesis ( H 0 ) the hypothesis that suggests no change from previous experience Alternative hypothesis ( H 1 ) the hypothesis that

More information

Evaluation Strategies

Evaluation Strategies Evaluation Intrinsic Evaluation Comparison with an ideal output: Challenges: Requires a large testing set Intrinsic subjectivity of some discourse related judgments Hard to find corpora for training/testing

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Time varying networks and the weakness of strong ties

Time varying networks and the weakness of strong ties Supplementary Materials Time varying networks and the weakness of strong ties M. Karsai, N. Perra and A. Vespignani 1 Measures of egocentric network evolutions by directed communications In the main text

More information

Simple Counter-terrorism Decision

Simple Counter-terrorism Decision A Comparative Analysis of PRA and Intelligent Adversary Methods for Counterterrorism Risk Management Greg Parnell US Military Academy Jason R. W. Merrick Virginia Commonwealth University Simple Counter-terrorism

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

Directionally Sensitive Multivariate Statistical Process Control Methods

Directionally Sensitive Multivariate Statistical Process Control Methods Directionally Sensitive Multivariate Statistical Process Control Methods Ronald D. Fricker, Jr. Naval Postgraduate School October 5, 2005 Abstract In this paper we develop two directionally sensitive statistical

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

Generalization to a zero-data task: an empirical study

Generalization to a zero-data task: an empirical study Generalization to a zero-data task: an empirical study Université de Montréal 20/03/2007 Introduction Introduction and motivation What is a zero-data task? task for which no training data are available

More information

Geography 1103: Spatial Thinking

Geography 1103: Spatial Thinking Geography 1103: Spatial Thinking Lecture: T\TH 8:00-9:15 am (McEniry 401) Lab: Wed 2:00-4:30 pm (McEniry 420) Instructor: Dr. Elizabeth C. Delmelle Email: edelmell@uncc.edu Office: McEniry 419 Phone: 704-687-5932

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

More information

Learning Outbreak Regions in Bayesian Spatial Scan Statistics

Learning Outbreak Regions in Bayesian Spatial Scan Statistics Maxim Makatchev Daniel B. Neill Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213 USA maxim.makatchev@cs.cmu.edu neill@cs.cmu.edu Abstract The problem of anomaly detection for biosurveillance

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

Concordia University. Department of Economics. ECONOMICS 221: Statistical Methods I. Fall Term Office Hours:

Concordia University. Department of Economics. ECONOMICS 221: Statistical Methods I. Fall Term Office Hours: Concordia University Department of Economics ECONOMICS 221: Statistical Methods I Fall Term 1999 Section: Name: Office Address: Class Hours: Office Hours: Telephone: General Description The general purpose

More information

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<= A frequency distribution is a kind of probability distribution. It gives the frequency or relative frequency at which given values have been observed among the data collected. For example, for age, Frequency

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Lecture 2. Conditional Probability

Lecture 2. Conditional Probability Math 408 - Mathematical Statistics Lecture 2. Conditional Probability January 18, 2013 Konstantin Zuev (USC) Math 408, Lecture 2 January 18, 2013 1 / 9 Agenda Motivation and Definition Properties of Conditional

More information

Bayesian Learning Features of Bayesian learning methods:

Bayesian Learning Features of Bayesian learning methods: Bayesian Learning Features of Bayesian learning methods: Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. This provides a more

More information

Structure Learning: the good, the bad, the ugly

Structure Learning: the good, the bad, the ugly Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 Structure Learning: the good, the bad, the ugly Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 29 th, 2006 1 Understanding the uniform

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test 1 Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test Learning Objectives After this section, you should be able to DESCRIBE the relationship between the significance level of a test, P(Type

More information

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018) GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018) Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec Presented by: Jesse Bettencourt and Harris Chan March 9, 2018 University

More information

Introduction to Artificial Intelligence. Unit # 11

Introduction to Artificial Intelligence. Unit # 11 Introduction to Artificial Intelligence Unit # 11 1 Course Outline Overview of Artificial Intelligence State Space Representation Search Techniques Machine Learning Logic Probabilistic Reasoning/Bayesian

More information

Graph Detection and Estimation Theory

Graph Detection and Estimation Theory Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and

More information

Specification and estimation of exponential random graph models for social (and other) networks

Specification and estimation of exponential random graph models for social (and other) networks Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for

More information

Association Between Variables Measured at the Ordinal Level

Association Between Variables Measured at the Ordinal Level Last week: Examining associations. Is the association significant? (chi square test) Strength of the association (with at least one variable nominal) maximum difference approach chi/cramer s v/lambda Nature

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!!  (preferred!)!! Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive

More information

Probability Based Learning

Probability Based Learning Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic

More information

Additional file A8: Describing uncertainty in predicted PfPR 2-10, PfEIR and PfR c

Additional file A8: Describing uncertainty in predicted PfPR 2-10, PfEIR and PfR c Additional file A8: Describing uncertainty in predicted PfPR 2-10, PfEIR and PfR c A central motivation in our choice of model-based geostatistics (MBG) as an appropriate framework for mapping malaria

More information

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables N-1 Experiments Suffice to Determine the Causal Relations Among N Variables Frederick Eberhardt Clark Glymour 1 Richard Scheines Carnegie Mellon University Abstract By combining experimental interventions

More information

Learning a probabalistic model of rainfall using graphical models

Learning a probabalistic model of rainfall using graphical models Learning a probabalistic model of rainfall using graphical models Byoungkoo Lee Computational Biology Carnegie Mellon University Pittsburgh, PA 15213 byounko@andrew.cmu.edu Jacob Joseph Computational Biology

More information

Improving Biosurveillance System Performance. Ronald D. Fricker, Jr. Virginia Tech July 7, 2015

Improving Biosurveillance System Performance. Ronald D. Fricker, Jr. Virginia Tech July 7, 2015 Improving Biosurveillance System Performance Ronald D. Fricker, Jr. Virginia Tech July 7, 2015 What is Biosurveillance? The term biosurveillance means the process of active datagathering of biosphere data

More information

Optimizing Robotic Team Performance with Probabilistic Model Checking

Optimizing Robotic Team Performance with Probabilistic Model Checking Optimizing Robotic Team Performance with Probabilistic Model Checking Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 Sagar Chaki, Joseph Giampapa, David Kyle (presenting),

More information

Enhancing the Geospatial Validity of Meta- Analysis to Support Ecosystem Service Benefit Transfer

Enhancing the Geospatial Validity of Meta- Analysis to Support Ecosystem Service Benefit Transfer Enhancing the Geospatial Validity of Meta- Analysis to Support Ecosystem Service Benefit Transfer Robert J. Johnston Clark University, USA Elena Besedin Abt Associates, Inc. Ryan Stapler Abt Associates,

More information

Chapter 9: Association Between Variables Measured at the Ordinal Level

Chapter 9: Association Between Variables Measured at the Ordinal Level Chapter 9: Association Between Variables Measured at the Ordinal Level After this week s class: SHOULD BE ABLE TO COMPLETE ALL APLIA ASSIGNMENTS DUE END OF THIS WEEK: APLIA ASSIGNMENTS 5-7 DUE: Friday

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Econ 325: Introduction to Empirical Economics Lecture 2 Probability Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 3-1 3.1 Definition Random Experiment a process leading to an uncertain

More information

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation August 01, 2012 Disclaimer: This presentation reflects the views of the author and should not be construed to represent the views or policies of the U.S. Food and Drug Administration Introduction We describe

More information

U.S. - Canadian Border Traffic Prediction

U.S. - Canadian Border Traffic Prediction Western Washington University Western CEDAR WWU Honors Program Senior Projects WWU Graduate and Undergraduate Scholarship 12-14-2017 U.S. - Canadian Border Traffic Prediction Colin Middleton Western Washington

More information

Detection and Estimation Final Project Report: Modeling and the K-Sigma Algorithm for Radiation Detection

Detection and Estimation Final Project Report: Modeling and the K-Sigma Algorithm for Radiation Detection Detection and Estimation Final Project Report: Modeling and the K-Sigma Algorithm for Radiation Detection Sijie Xiong, 151004243 Department of Electrical and Computer Engineering Rutgers, the State University

More information

Outline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual

Outline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual Outline: Ensemble Learning We will describe and investigate algorithms to Ensemble Learning Lecture 10, DD2431 Machine Learning A. Maki, J. Sullivan October 2014 train weak classifiers/regressors and how

More information

Boosting: Algorithms and Applications

Boosting: Algorithms and Applications Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition

More information

Statistical Methods for Astronomy

Statistical Methods for Astronomy Statistical Methods for Astronomy Probability (Lecture 1) Statistics (Lecture 2) Why do we need statistics? Useful Statistics Definitions Error Analysis Probability distributions Error Propagation Binomial

More information

Modeling Strategic Information Sharing in Indian Villages

Modeling Strategic Information Sharing in Indian Villages Modeling Strategic Information Sharing in Indian Villages Jeff Jacobs jjacobs3@stanford.edu Arun Chandrasekhar arungc@stanford.edu Emily Breza Columbia University ebreza@columbia.edu December 3, 203 Matthew

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Statistics for IT Managers

Statistics for IT Managers Statistics for IT Managers 95-796, Fall 2012 Module 2: Hypothesis Testing and Statistical Inference (5 lectures) Reading: Statistics for Business and Economics, Ch. 5-7 Confidence intervals Given the sample

More information

Novel spectrum sensing schemes for Cognitive Radio Networks

Novel spectrum sensing schemes for Cognitive Radio Networks Novel spectrum sensing schemes for Cognitive Radio Networks Cantabria University Santander, May, 2015 Supélec, SCEE Rennes, France 1 The Advanced Signal Processing Group http://gtas.unican.es The Advanced

More information

Interact with Strangers

Interact with Strangers Interact with Strangers RATE: Recommendation-aware Trust Evaluation in Online Social Networks Wenjun Jiang 1, 2, Jie Wu 2, and Guojun Wang 1 1. School of Information Science and Engineering, Central South

More information

Power Laws & Rich Get Richer

Power Laws & Rich Get Richer Power Laws & Rich Get Richer CMSC 498J: Social Media Computing Department of Computer Science University of Maryland Spring 2015 Hadi Amiri hadi@umd.edu Lecture Topics Popularity as a Network Phenomenon

More information

Week 2: Probability: Counting, Sets, and Bayes

Week 2: Probability: Counting, Sets, and Bayes Statistical Methods APPM 4570/5570, STAT 4000/5000 21 Probability Introduction to EDA Week 2: Probability: Counting, Sets, and Bayes Random variable Random variable is a measurable quantity whose outcome

More information

On service level measures in stochastic inventory control

On service level measures in stochastic inventory control On service level measures in stochastic inventory control Dr. Roberto Rossi The University of Edinburgh Business School, The University of Edinburgh, UK roberto.rossi@ed.ac.uk Friday, June the 21th, 2013

More information

A Posteriori Corrections to Classification Methods.

A Posteriori Corrections to Classification Methods. A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk

More information

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Computer Science and Engineering University of California, San Diego Sensitive Structured Data Medical Records Search Logs

More information

Empirical Risk Minimization, Model Selection, and Model Assessment

Empirical Risk Minimization, Model Selection, and Model Assessment Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,

More information

Introduction to statistical analysis of Social Networks

Introduction to statistical analysis of Social Networks The Social Statistics Discipline Area, School of Social Sciences Introduction to statistical analysis of Social Networks Mitchell Centre for Network Analysis Johan Koskinen http://www.ccsr.ac.uk/staff/jk.htm!

More information

Lecture 9: Bayesian Learning

Lecture 9: Bayesian Learning Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information