Using Entropy-Related Measures in Categorical Data Visualization
1 Using Entropy-Related Measures in Categorical Data Visualization
- Jamal Alsakran, The University of Jordan
- Xiaoke Huang and Ye Zhao, Kent State University
- Jing Yang, UNC Charlotte
- Karl Fast, Kent State University
2 Categorical Datasets
- Generated in a large variety of applications, e.g., health and social studies, bank transactions, online shopping records, and taxonomy classifications
- Contain a series of categorical dimensions (variables)
3 Categorical Discreteness
- Values of a dimension comprise a set of discrete categories
- Mushroom dataset: 8,124 records and 23 categorical dimensions

classes  cap-shape  cap-surface  cap-color  bruises  odor  gill-attachment  gill-spacing  gill-size  gill-color
c        c          c            c          c        c     c                c             c          c
p        x          s            n          t        p     f                c             n          k
e        x          s            y          t        a     f                c             b          k
e        b          s            w          t        l     f                c             b          n
p        x          y            w          t        p     f                c             n          n
4 Challenges
- Multidimensional visualization methods are often undermined when directly applied to categorical datasets
- The limited number of categories creates overlapping elements and visual clutter
- The lack of an inherent order (in contrast to numeric variables) confounds the visualization design
5 Categorical Data Visualization
- Sieve diagram and mosaic display
- Contingency Wheel
- Parallel Sets
- Mapping to numbers
6 Our Work
- Investigate the use of entropy-related measures in visualizing multidimensional categorical data
- Show how entropy-related measures can help users understand and navigate categorical data
- Employ these measures in managing and ordering dimensions within the parallel sets visualization
- Conduct user studies on real-world data
7 Entropy and Related Measures
- Consider a categorical variable as a discrete random variable X with a probability distribution
- Entropy: measures the diversity of one dimension
- Joint entropy: measures the diversity of two variables
- Mutual information: measures the variables' mutual dependence
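The formulas on this slide did not survive transcription. As a minimal sketch, the standard textbook definitions are H(X) = -sum_x p(x) log p(x), H(X,Y) = -sum_{x,y} p(x,y) log p(x,y), and I(X;Y) = H(X) + H(Y) - H(X,Y). The code below estimates them from observed category frequencies. The base-2 logarithm and the function names are assumptions for illustration, not taken from the slides; the sample values are the `classes` and `odor` columns of the four mushroom rows shown on slide 3.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Entropy of the paired variable (X, Y), estimated from co-occurrences."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


# Columns taken from the four sample mushroom records on slide 3.
classes = ["p", "e", "e", "p"]
odor = ["p", "a", "l", "p"]

h_classes = entropy(classes)
h_odor = entropy(odor)
h_joint = joint_entropy(classes, odor)
mi = mutual_information(classes, odor)
```

In this tiny sample the odor value determines the class, so the mutual information equals the full entropy of `classes`.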
8 Use of Entropy
- Chen and Janicke proposed an information-theoretic framework for visualization
- Pargnostics: pixel-based entropy used for order optimization of coordinates
- We use entropy and mutual information in categorical data visualization
9 Visualize Data Facts
- Mushroom dataset
- Size: the number of categories
- Color: entropy
10 Navigation Guide: Scatter Plot Matrix
- Joint entropy matrix
- High joint entropy indicates diversely distributed data records in a scatter plot
- Low joint entropy reveals many overlaps
11 Navigation Guide: Scatter Plot Matrix
- Mutual information matrix
- Large mutual information indicates high dependency between two dimensions
- Small mutual information reveals less dependency
12 Dimension Management on Parallel Sets
- Use entropy-related measures to help users manage dimension spacing, ordering, and filtering
- Ribbon colors are defined by mushroom classes
- Green: edible
- Blue: poisonous
13 Filtering and Spacing
- Remove low-diversity dimensions by setting an entropy threshold
- Arrange the space between neighboring coordinates according to their joint entropy
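The two steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say how joint entropy maps to spacing, so the sketch assumes gaps proportional to joint entropy. The names `filter_dimensions` and `axis_gaps`, the threshold of 0.5 bits, and the total width of 700 are hypothetical; the columns come from the four sample rows on slide 3.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Entropy of the paired variable (X, Y)."""
    return entropy(list(zip(xs, ys)))


def filter_dimensions(table, threshold):
    """Keep the dimensions whose entropy reaches the threshold."""
    return [name for name, column in table.items() if entropy(column) >= threshold]


def axis_gaps(table, order, total_width):
    """Split total_width among neighboring axes in proportion to joint entropy.

    Proportional spacing is an assumption made for this sketch.
    """
    weights = [joint_entropy(table[a], table[b]) for a, b in zip(order, order[1:])]
    if not weights:
        return []
    total = sum(weights)
    if total == 0:
        # All pairs are constant: fall back to equal spacing.
        return [total_width / len(weights)] * len(weights)
    return [total_width * w / total for w in weights]


# Four sample records from slide 3 (a subset of the columns).
table = {
    "classes": ["p", "e", "e", "p"],
    "gill-attachment": ["f", "f", "f", "f"],  # constant, so entropy is 0
    "odor": ["p", "a", "l", "p"],
    "cap-surface": ["s", "s", "s", "y"],
}

kept = filter_dimensions(table, threshold=0.5)
gaps = axis_gaps(table, kept, total_width=700)
```

Here the constant `gill-attachment` column is filtered out, and the wider gap goes to the pair of axes with the more diverse joint distribution.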
14 Sorting Categories over Coordinates
- Unlike numerical dimensions, no inherent order exists for categorical variables (reading order? alphabetical order?)
- We use the pairwise joint probability distribution to find an optimal sequence
- Goal: reduce ribbon intersections
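The slides do not spell out the sorting procedure, so the following is only a stand-in to make the goal concrete: it counts ribbon crossings between two neighboring axes, weighting each crossing by the product of the two ribbons' record counts, and searches all category orders of the right axis by brute force. The weighting, the exhaustive search, the function names, and the joint counts are all assumptions for illustration; exhaustive search is only feasible for a handful of categories.

```python
from itertools import combinations, permutations


def weighted_crossings(counts, left_order, right_order):
    """Sum of count products over every pair of ribbons that cross.

    counts maps (left_category, right_category) -> number of records.
    Two ribbons cross when their endpoints appear in opposite order
    on the two axes.
    """
    lpos = {c: i for i, c in enumerate(left_order)}
    rpos = {c: i for i, c in enumerate(right_order)}
    total = 0
    for ((a1, b1), w1), ((a2, b2), w2) in combinations(counts.items(), 2):
        if (lpos[a1] - lpos[a2]) * (rpos[b1] - rpos[b2]) < 0:
            total += w1 * w2
    return total


def best_right_order(counts, left_order, right_categories):
    """Brute-force the right-axis order with the fewest weighted crossings."""
    return min(
        permutations(right_categories),
        key=lambda order: weighted_crossings(counts, left_order, order),
    )


# Hypothetical joint counts between two neighboring dimensions.
counts = {("a", "w"): 4, ("b", "u"): 3, ("c", "v"): 2, ("a", "u"): 1}
left_order = ["a", "b", "c"]

alphabetical_cost = weighted_crossings(counts, left_order, ["u", "v", "w"])
best_order = best_right_order(counts, left_order, ["u", "v", "w"])
best_cost = weighted_crossings(counts, left_order, best_order)
```

With these toy counts, moving category `w` to the top of the right axis removes every crossing that the alphabetical order produces.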
15 Sorting Categories over Coordinates
- Using the reading order of the coordinates and of the categories over them
- After sorting the categories of neighboring coordinates
16 Optimal Ordering of Multiple Coordinates
- For parallel coordinates, many existing approaches reduce line crossings between neighboring coordinates
- Using line crossings as the cost function between every pair of dimensions, global cost minimization is achieved by a graph-theory-based method [32]
- However, reducing crossings does not necessarily lead to more effective insight discovery
- Ribbon crossings depend on the sequence of categories over the axes (reading order? alphabetical order?)
17 Our Method
- We use mutual information as the cost function
- Benefit: the cost does not depend on the sequence of categories over the axes
- Globally maximize the sum of mutual information over a series of dimensions
- The optimal ordering is created by solving a Hamiltonian path formulation of the Traveling Salesman Problem
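To make the objective concrete, the sketch below picks the axis order that maximizes the sum of mutual information between neighboring dimensions. It uses exhaustive search over all orderings as a stand-in, since the slides do not name the specific Hamiltonian path or TSP solver used; this is factorial in the number of dimensions and only practical for small cases. The dimension names and mutual information values are hypothetical.

```python
from itertools import permutations


def path_score(order, mi):
    """Sum of mutual information between each pair of neighboring axes."""
    return sum(mi[frozenset(pair)] for pair in zip(order, order[1:]))


def best_axis_order(dimensions, mi):
    """Exhaustively search for the ordering with the largest path score.

    A path and its reverse score the same; max() keeps the first one found.
    """
    return max(permutations(dimensions), key=lambda order: path_score(order, mi))


# Hypothetical symmetric mutual information values (in bits).
mi = {
    frozenset(("A", "B")): 0.875,
    frozenset(("B", "C")): 0.75,
    frozenset(("C", "D")): 0.625,
    frozenset(("A", "C")): 0.125,
    frozenset(("A", "D")): 0.25,
    frozenset(("B", "D")): 0.375,
}

order = best_axis_order(["A", "B", "C", "D"], mi)
score = path_score(order, mi)
```

Because mutual information is computed from the joint distribution alone, the score is unchanged by any reordering of categories within an axis, which is the benefit the slide points out.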
18
- C2: Optimized by ribbon crossings with alphabetical category sequence
- C3: Optimized by mutual information with alphabetical category sequence
- C4: Optimized by mutual information with optimized category sequence
19 User Studies
- Assess user performance on insight discovery with different ordering approaches
- Design specific tasks for users to complete in a limited time period
- Apply statistical analysis to the results
20 Mushroom Data
- 11 participants received training and 10 minutes of practice before the test
- Each participant was given 90 seconds to find as many mushroom characteristics as possible that are (T1) all-edible; (T2) all-poisonous; (T3) mostly-edible; (T4) mostly-poisonous
- Each participant was given a score by comparing their findings with the ground truth
21 Results
- Average percentage of user findings over ground truth on each task
22 Results
- Total performance of user findings using different visualizations
23 Results
- Total error rate of user findings using different visualizations
24 Statistical Test
- We applied the Friedman test of variance by ranks (a non-parametric statistical test)
- Statistically significant differences were found:
- Between C1 and C4 (p-value = 0.011)
- Between C2 and C4 (p-value = 0.035)
- Between C3 and C4 (p-value = 0.007)
25 Congressional Voting Records
- Green: Democrat
- Red: Republican
- C1: Using the reading order
- C2: Using the optimized order
26 Congressional Voting Records
- The leftmost dimension is the vote on education-spending
- Green: nay
- Red: yea
- C3: Using the reading order
- C4: Using the optimized order
27 User Study of Voting Dataset
- 35 participants were given 2 minutes to complete the tasks
- Using C1 and C2, for each bill: (T1) which party voted more for yea? (T2) which party voted more for nay?
- Using C3 and C4, for each bill: (T3) which group of congressmen voted more for yea? (T4) which group of congressmen voted more for nay?
28 Results
- We graded each participant's answers: 1 point if the answer was correct, -1 point if the answer was incorrect, and 0 points if they said it was hard to identify
- The average score using C1 was 11.5
- The average score using C2 was 20.1
- The average score using C3 was 13.2
- The average score using C4 was 18.0
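The grading rule above can be sketched as a small function. The bill names and answers below are hypothetical and are not data from the study; `None` is used here, as an assumption, to represent "hard to identify".

```python
def grade(answers, ground_truth):
    """Score one participant: +1 per correct answer, -1 per incorrect answer,
    and 0 when the participant said it was hard to identify (None)."""
    score = 0
    for bill, correct in ground_truth.items():
        given = answers.get(bill)
        if given is None:
            continue  # "hard to identify" earns 0 points
        score += 1 if given == correct else -1
    return score


# Hypothetical ground truth and answers for one task.
ground_truth = {"bill-1": "Democrat", "bill-2": "Republican", "bill-3": "Democrat"}
answers = {"bill-1": "Democrat", "bill-2": "Democrat", "bill-3": None}

participant_score = grade(answers, ground_truth)
```

Penalizing wrong answers while leaving abstentions at zero rewards a visualization that lets users recognize when a pattern is not clearly visible.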
29 Statistical Test
- One-way analysis of variance (ANOVA) was used to compare the effect of using different visualizations
- One test was performed for C1 and C2: F = 17.1 and p-value =
- Another test was performed for C3 and C4: F = 5.64 and p-value = 0.02
30 Conclusion
- Utilize measures from information theory to enhance the visualization of high-dimensional categorical data
- Support users in browsing data facts among dimensions, determining starting points for data analysis, and testing and tuning parameters for visual reasoning
31 Thanks! This work is partially supported by US NSF IIS , IIS , and Google Faculty Research Award
More information