ESRI User Conference 2017 Space Time Pattern Mining Analysis of Matric Pass Rates in Cape Town Schools Dr Arulsivanathan Naidoo Statistics South Africa 18 October 2017
Choose one of the following
Leadership
Stability
Adventurous
Obsessed with Sex and Liquor
Outline: Statistics and Geography; Stats SA dissemination; Analysis of matric pass rates in Cape Town schools; Spatial Statistics
Spatial Statistics: Moran autocorrelation (clustering of data); Getis-Ord hot spot analysis; Anselin outlier analysis; create space time cube from aggregated points; create space time cube for fixed locations; emerging hot spot analysis; visualization in 2D and 3D; space time cube explorer; ordinary least squares regression; geographically weighted regression
Thanks and Appreciation: First UN World Data Forum, Jan 2017; Linda Peters; Esri team in Johannesburg; Dr Lauren Scott; Esri website for videos and PDFs; Lauren Bennett
CRUISE: Centre for Regional and Urban Innovation and Statistical Exploration. Vision of Dr Pali Lehohla that statistics should merge with geography. Masters programme at Stellenbosch University.
                     Data Set A   Data Set B
Sample size          15 000       15 000
Mean                 100          100
Standard Deviation   20           20
Median               100.35       100.92
As a young man, my fondest dream was to become a geographer. However, while working in the Customs Office, I thought deeply about the matter and concluded that it was far too difficult a subject. With some reluctance, I then turned to physics as an alternative. Quotation widely attributed to Albert Einstein
Vector data. Geometry: points (houses), lines (streets), area polygons. Shapefiles. Attributes: household data (type of roof, number of rooms, piped water, electricity); person data (age, gender, employment). Census data. GIS allows you to merge all this data for analysis.
Piped water, 2001 and 2011. We can ask the question WHERE?
Esri makes interpretation easy for non-statisticians
Hot spot analysis: youth unemployment, Census 2011
Moran autocorrelation indicates that the pass rates are clustered: high schools in Cape Town
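The idea behind the Moran statistic can be sketched in a few lines. This is a minimal illustration, not the ArcGIS implementation: the pass rates are made up, and the neighbourhood (schools along a line, each adjacent to the next) is a hypothetical stand-in for a real spatial weights matrix.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def chain_weights(n):
    """Hypothetical neighbourhood: location i is adjacent to i - 1 and i + 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


# Made-up pass rates (not the Cape Town data).
clustered = [90, 88, 85, 50, 45, 40]     # similar values sit next to each other
alternating = [90, 40, 88, 45, 85, 50]   # high and low values interleaved

w = chain_weights(6)
i_clustered = morans_i(clustered, w)      # positive: clustering
i_alternating = morans_i(alternating, w)  # negative: dispersion
```

A positive index means neighbouring locations tend to have similar values; a negative index means they tend to differ. Judging significance needs a z-score or permutation test, which is left out here.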
Getis-Ord Hot Spot Analysis
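A hot spot is a location whose neighbourhood total is unusually high compared with the overall mean. The sketch below computes the Getis-Ord Gi* z-score under simplifying assumptions: binary weights, each location's neighbourhood containing itself and its immediate neighbours along a line, and made-up values. It is not the ArcGIS tool.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-score per location; weights[i][i] must be included (the 'star')."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        num = sum(w[j] * values[j] for j in range(n)) - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w * sum_w) / (n - 1))
        z_scores.append(num / den)
    return z_scores


# Made-up pass rates along a line of ten schools (not the Cape Town data).
rates = [92, 90, 91, 60, 62, 58, 61, 35, 30, 33]
# Hypothetical neighbourhood: each school, plus the school on either side.
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(10)]
           for i in range(10)]

z = getis_ord_gi_star(rates, weights)
hot = [i for i, score in enumerate(z) if score > 1.96]    # 95% confidence
cold = [i for i, score in enumerate(z) if score < -1.96]
```

High positive z-scores mark hot spots, and strongly negative ones mark cold spots.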
Anselin Outlier Analysis. Example: 100 shacks in an informal settlement; 99 have an income of R1500 or less; 1 household has an income of R15000.
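The shack example can be turned into a small calculation of Anselin's Local Moran's I. Two assumptions are made for illustration: all 99 low incomes are set to exactly R1500, and the households sit along a line with the household on either side as neighbours. Significance testing is omitted, so the labels only show which quadrant each household falls in.

```python
def local_morans_i(values, neighbours):
    """Return (index, cluster/outlier label) per location.

    neighbours[i] is the list of indices adjacent to location i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / (n - 1)
    results = []
    for i in range(n):
        # Average deviation of the neighbours (row-standardised spatial lag).
        lag = sum(dev[j] for j in neighbours[i]) / len(neighbours[i])
        index = dev[i] * lag / variance
        if dev[i] >= 0 and lag >= 0:
            kind = "High-High"
        elif dev[i] < 0 and lag < 0:
            kind = "Low-Low"
        elif dev[i] >= 0:
            kind = "High-Low"   # high value surrounded by low values
        else:
            kind = "Low-High"   # low value next to a high value
        results.append((index, kind))
    return results


incomes = [1500.0] * 100        # assumption: every low income is exactly R1500
incomes[50] = 15000.0           # the one household with R15000
neighbours = [[j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)]

results = local_morans_i(incomes, neighbours)
outlier_index, outlier_kind = results[50]
```

A negative local index with a High-Low label is the pattern that flags the R15000 household as an outlier among its low-income neighbours.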
Anselin Outlier Analysis
What is this school doing successfully to get a high pass rate?
Waldo Tobler's First Law of Geography: "everything is related to everything else, but near things are more related than distant things." Adding a new dimension to spatial statistics: Time. Everything is related to everything else, but near and RECENT things are more related than distant things. The Swedish geographer Torsten Hägerstrand introduced time geography in the mid-1960s. He first introduced the space time aquarium or space time cube.
Table of Data for analysis
Space Time Cube
Space Time Cube Multi dimensional cube for advanced statistical analysis
Emerging Hot spot analysis
Create Space Time Cube By Aggregating Points. The output from this tool is a netCDF representation of your input points. The Input Features should be points. This tool requires projected data to accurately measure distances.
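Conceptually, aggregating points into a space time cube means counting points per (column, row, time step) bin. The sketch below shows only that binning step, using made-up projected coordinates in metres and years as time. The function name and parameters are hypothetical, and unlike the ArcGIS tool it does not write a netCDF file.

```python
from collections import Counter


def aggregate_to_cube(points, cell_size, time_step, origin):
    """Count points per (col, row, step) bin.

    points: iterable of (x, y, t) with x, y in projected units and t numeric.
    origin: (x0, y0, t0) corner of the cube.
    """
    x0, y0, t0 = origin
    cube = Counter()
    for x, y, t in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        step = int((t - t0) // time_step)
        cube[(col, row, step)] += 1
    return cube


# Made-up incident points: (easting in m, northing in m, year).
points = [
    (120.0, 80.0, 2008),
    (480.0, 450.0, 2008),
    (1350.0, 200.0, 2008),
    (100.0, 90.0, 2009),
]
cube = aggregate_to_cube(points, cell_size=500.0, time_step=1,
                         origin=(0.0, 0.0, 2008))
```

Because the bins are defined by distances, the coordinates must be in a projected system, which matches the requirement stated on the slide.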
Stats SA and many government departments cannot disclose unit record data. We disseminate the data at aggregate levels: school pass rates, clinic patients, police station house burglary. Create Space Time Cube From Defined Locations.
Emerging Hot Spot Analysis. You would use the netCDF file as input to other tools, such as the Emerging Hot Spot Analysis tool. The trend analysis performed on the aggregated count data and summary field values is based on the Mann-Kendall statistic. Ten time-step intervals are required by the Mann-Kendall statistic.
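The Mann-Kendall statistic compares every pair of time steps and counts how often the later value is higher or lower than the earlier one. The sketch below is a minimal version that assumes no tied values (the variance has no tie correction) and uses a made-up series of ten yearly pass rates.

```python
import math


def mann_kendall(series):
    """Return (S, z) for a time series, oldest value first. No tie correction."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # +1 if rising, -1 if falling, 0 if equal
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Made-up pass rates for ten time steps (not the Cape Town data).
rising = [61, 63, 64, 66, 69, 70, 72, 75, 77, 80]
s_up, z_up = mann_kendall(rising)
s_down, z_down = mann_kendall(list(reversed(rising)))
```

A large positive z indicates an upward trend and a large negative z a downward trend; values near zero indicate no monotonic trend.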
Emerging hot spot analysis for matric pass rates in Cape Town schools
Visualization of Emerging hot spot analysis in 3d
Linking 2d and 3d side by side
Visualization using the Time Cube Explorer
New hot spot: the last time step is a statistically significant hot spot. Consecutive hot spot: the last two time steps are statistically significant hot spots.
If at least 90% of the time steps are statistically significant hot spots: attribute values increasing, intensifying hot spot; attribute values with no pattern, persistent hot spot; attribute values decreasing, diminishing hot spot.
If less than 90% of the time steps are statistically significant hot spots: no time step was a cold spot, sporadic hot spot; at least one time step was a cold spot, oscillating hot spot.
Historical hot spot: at least 90% of the time steps are statistically significant hot spots, but the last time step is not a hot spot.
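The hot spot categories above can be written as a small decision function. This is a sketch of the rules as stated on the slides, not the ArcGIS tool's logic. The order of the checks is an assumption, and so is treating "new" and "consecutive" as cases where the only hot time steps form an unbroken run at the end of the series.

```python
def classify_hot_spot(steps, trend="none"):
    """Classify one location.

    steps: list of 'hot', 'cold' or 'none' per time step, oldest first.
    trend: 'up', 'down' or 'none' (trend of the attribute values).
    """
    hot = [s == "hot" for s in steps]
    if not any(hot):
        return "no hot spot pattern"
    share = sum(hot) / len(steps)
    if share >= 0.9:
        if not hot[-1]:
            return "historical hot spot"
        if trend == "up":
            return "intensifying hot spot"
        if trend == "down":
            return "diminishing hot spot"
        return "persistent hot spot"
    # Length of the unbroken run of hot steps at the end of the series.
    run = 0
    for is_hot in reversed(hot):
        if not is_hot:
            break
        run += 1
    if run == sum(hot):          # every hot step belongs to the final run
        return "new hot spot" if run == 1 else "consecutive hot spot"
    if "cold" in steps:
        return "oscillating hot spot"
    return "sporadic hot spot"


example = classify_hot_spot(["none"] * 8 + ["hot", "hot"])
```

With ten time steps, the example location is hot only in the last two steps, so it is labelled a consecutive hot spot.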
Ask the question WHY? Regression Model
Dr Snow's explanation of cholera deaths in London Locations of water pu
Regression Model
Pass = 58.92 + 0.13 Employment + 0.12 Telephone + 0.09 Computer
Variable    Estimate   t-statistic   p-value       VIF
Intercept   58.92      20.48         <0.0001 ***
Computer    0.09       2.07          0.041 ***     4.96
Employed    0.13       2.31          0.020 ***     1.82
Telephone   0.12       2.88          0.004 ***     4.89
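To show how the fitted equation is read, the sketch below plugs hypothetical values into the model. The coefficients and VIF values come from the slide. The input values, the assumption that each explanatory variable is a percentage for the school's area, and the VIF threshold of 7.5 (a rule of thumb, assumed here) are illustrative only.

```python
# Coefficients and VIF values as reported on the slide.
INTERCEPT = 58.92
COEFFICIENTS = {"employed": 0.13, "telephone": 0.12, "computer": 0.09}
VIF = {"computer": 4.96, "employed": 1.82, "telephone": 4.89}

VIF_THRESHOLD = 7.5  # assumed rule-of-thumb cut-off for redundant variables


def predict_pass_rate(employed, telephone, computer):
    """Predicted pass rate from the fitted OLS equation."""
    return (INTERCEPT
            + COEFFICIENTS["employed"] * employed
            + COEFFICIENTS["telephone"] * telephone
            + COEFFICIENTS["computer"] * computer)


# Hypothetical area: 50% employed, 40% with a telephone, 30% with a computer.
predicted = predict_pass_rate(employed=50, telephone=40, computer=30)

# Variables whose VIF suggests redundancy with the other variables.
redundant = [name for name, vif in VIF.items() if vif > VIF_THRESHOLD]
```

Each coefficient is the change in the predicted pass rate for a one-unit increase in that variable, holding the others fixed. None of the reported VIF values exceeds the assumed threshold.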
Simpsons Paradox
Is your data stationary? Ages of first-year students; ages of university staff
GWR: local regression with fewer points. Try to picture a helicopter casting a shadow over the points
Geographically Weighted Regression Pass =58.92 +0.13 Employment
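The helicopter-shadow picture corresponds to a kernel: points under the shadow get high weight and distant points get almost none, so each location gets its own intercept and slope. The sketch below fits a local one-variable regression with a Gaussian kernel. The data, the bandwidth and the function name are made up for illustration; this is not the ArcGIS GWR tool.

```python
import math


def local_fit(target, observations, bandwidth):
    """Weighted least squares for one predictor, centred on `target`.

    observations: list of (x_coord, y_coord, employment, pass_rate).
    Returns (intercept, slope) for the location `target`.
    """
    tx, ty = target
    sw = swx = swy = swxx = swxy = 0.0
    for px, py, x, y in observations:
        dist2 = (px - tx) ** 2 + (py - ty) ** 2
        w = math.exp(-0.5 * dist2 / bandwidth ** 2)   # Gaussian kernel weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    x_bar = swx / sw
    y_bar = swy / sw
    slope = (swxy - sw * x_bar * y_bar) / (swxx - sw * x_bar * x_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: two distant groups of schools with different relationships.
employment = [20, 40, 60, 80]
west = [(0, 0), (1, 0), (0, 1), (1, 1)]
east = [(100, 0), (101, 0), (100, 1), (101, 1)]
observations = (
    [(px, py, e, 60 + 0.1 * e) for (px, py), e in zip(west, employment)]
    + [(px, py, e, 40 + 0.5 * e) for (px, py), e in zip(east, employment)]
)

west_intercept, west_slope = local_fit((0.5, 0.5), observations, bandwidth=5.0)
east_intercept, east_slope = local_fit((100.5, 0.5), observations, bandwidth=5.0)
```

A single global regression would report one slope for all eight schools. The local fits recover a slope of about 0.1 in the west and 0.5 in the east, which is the kind of non-stationarity GWR is meant to reveal.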
Challenge to Esri spatial developers: GWR in three dimensions, from shadow to balloon
The South Africa I know, the Home I understand
Income, age & level of education
[Figure: "No income" chart; vertical axis 10% to 80%; horizontal axis Age, 20 to 80+; series: No schooling, Grade 11, Grade 12, Diploma, Bachelors degree]
Percentage of workers in each age group who are skilled (managers, professionals, technicians), 1994 and 2014
[Figure: chart by population group (Black African, Coloured, Indian, White) and age group (15-24, 25-34, 35-44, 45-54, 55-64); axis 0% to 80%]
The percentage of workers in skilled occupations increased in all age and all race groups, except for black Africans aged 25-34, where it decreased. There were much weaker gains in the black African group for all ages.
Closure of nursing colleges, teacher training colleges, technical training
Congratulate ESRI on their initiative to introduce the diploma, which will contribute to an increase in the number of skilled people.
Conclusion: Point pattern, 2 dimensions. Space time cube: point pattern, 3 dimensions. Statistics and Geography must work together. The technology of GIS (expanding and improving tools for analysis). Visualization. Evidence-based decision making. Stats SA has made all its data free and we also assist you in the analysis. Next step is to explore the Insights.
Thank You aruln@statssa.gov.za 0828801684 The Science of WHERE