vizlib: Using The Seven Stages of Visualization to Explore Population Trends and Processes in Local Authority Research Robert Radburn, Jason Dykes, Jo Wood gicentre, School of Informatics, City University London, EC1V 0HB Telephone: +44 (0)20 7040 0212, Fax: +44 (0)20 7040 8584 E: {sbbd476 jad7 jwo} @soi.city.ac.uk, W: http://www.soi.city.ac.uk/~ {sbbd476 jad7 jwo} KEYWORDS: libraries, local authority, exploratory data analysis, visualization, Processing 1. Introduction We use data visualization to explore patterns in a large data set of library loans maintained by Leicestershire County Council (LCC). This work has resulted in hypotheses and insights that may influence policy. Such an approach is increasingly accessible to local authority researchers through high-level languages and toolkits including Processing (Fry and Reas, 2007), Prefuse (Heer et al., 2005) and ProtoVis (Heer & Bostock, 2009). Fry (2008) proposes a seven-stage process for manipulating and making sense of data from its acquisition through its visualization and ultimately to its interpretation. His model (Figure 1) is not rigid, but draws attention to the interdependencies between the various elements of the visual data analysis process. It may be useful as a framework for data visualization in local authorities. Figure 1: The Seven Stages of Visualization (Fry, 2004) The model structured our analysis and we use it here to describe data visualization with Fry s open set of software tools Processing (Fry, 2008) 1. This software sketchbook is being used in a variety of scientific domains to rapidly explore data with flexible interactive visualization prototypes (e.g. Meyer et al., 2009; Slingsby et al., 2009; Wood et al., in press). Before embarking on the seven stages of visualization, questions must be considered that are answerable with the data in hand: the more specific you can make your question, the more specific and clear the visual result will be (Fry, 2008.) We used the Leicestershire Library Services (LLS) TALIS database to address three questions pertinent to LLS: How does performance vary across the 54 libraries in Leicestershire? In which areas are the best customers living? Can the area you live in contribute to predictions of usage? 1 http://processing.org
2. The Seven Stages of Data Visualization Stage 1. ACQUIRE Obtain the Data The TALIS database contains a rich spatio-temporal record of every item loaned from a Leicestershire library. Data were obtained for 435,000 active library users over a two-year period. Stage 2. PARSE Structure the Data Fry s approach diverts effort from database design into analytical sketching (Fry, 2008). Flat text files were produced from TALIS with a consistent structure for visual prototyping. Stage 3. FILTER Identify the Data of Interest Data quality checks where undertaken and personal data removed from records. Population weighted centroids and OAC (Vickers et al., 2005; Vickers and Rees, 2007) codes were added to output areas (OAs) - the geography used to consider the home locations of library members. Stage 4. MINE Methods to Uncover Patterns in the Data The recency / frequency marketing technique used in LCC to segment records into quintiles was applied to structure the data (Radburn et al., 2007). Members of each library were categorized into one of twenty-five recency / frequency (RF) combinations. Signed chi statistics were calculated for individual libraries and OAs in line with the research questions to relate observed numbers with values expected according to regional population weighted averages. Figure 2: Sketch produced with LLS marketing manager outlining ideas on how the data could be depicted and interrogated. Note RF matrices (bottom left).
Stage 5. REPRESENT Visual Model for the Data Processing is designed for small-scale visualization that can be rapidly modified as the process of visual enquiry progresses. It encourages an exploratory approach to data visualization as code that links graphical methods can be quickly configured and applied to text files that have been parsed, filtered and mined (Radburn et al., 2009). Understanding of the context in which LLS use their data was developed through sketching with paper and pencil (Figure 2). The ideas generated were rapidly incorporated into interactive Processing data sketches that loaded flat files generated as a result of the parsing, filtering and mining of the TALIS data (Figures 3-7). 3. Findings Various graphical techniques were developed in Processing to depict the filtered and mined data and combined with dynamic behaviours to interactively explore the three research questions: (i) In which areas are the best customers living? Spider plots LLS wanted to link library and customer (see Figure 2, top centre). Lines linking libraries with the home locations of particular groups of their users give an indication of how the different facilities compete. For example, many of the most frequent and recent users of Market Harborough library come from nearby villages containing libraries (Figure 3). Figure 3: Locations of Market Harborough library best users and competing village libraries. OAs are coloured with diverging RdBu scheme showing variation from expected membership levels given the regional proportions (red higher than expected, blue lower). Library locations are shown using British National Grid with symbol sizes representing number of registered users.
Figure 4: Spatial treemap of Leicestershire OAs with diverging RdBu scheme showing variation from expected membership (red higher than expected, blue lower). Library locations are shown using British National Grid with symbol sizes representing number of registered users. Users of Thurmaston library are highlighted. (ii) Can the area you live in contribute to predictions of usage? Spatial Treemaps These non-occluding space-filling layouts that preserve aspects of underlying geography (Wood and Dykes, 2008) may reveal subtle patterns in the data. Local variations in library performance became evident through these data rich views. These include lower numbers of members than expected in Thurmaston and subsequent consideration of possible explanations, more sophisticated modelling and local service provision (Figure 4).
Figure 5: RF plots for 54 Leicestershire libraries - fixed size spatial treemap where circle size represents number of registered users per library. Top - numbers of users with scaled global sequential YlOrBn scheme. Bottom - signed-chi statistic with global diverging RdBu scheme (red for higher than expected, blue for lower than expected).
(iii) How does performance vary across the 54 libraries in Leicestershire? RF / CHI matrix plots: A matrix showing users in recency (by column) / frequency (by row) quintiles for each library (Figure 5) met LLS needs (see Figure 2, bottom left). The most recent and frequent users are located at the top right of these RF plots (Novos, 2004). Considering RF plots for 54 libraries concurrently was deemed useful by LLS particularly with animated transitions between spatial and spatially ordered views (Radburn et al., 2009). Concurrently visualizing signed chi statistics (Figure 5, bottom) reveals high numbers of low recency users in some larger libraries (Melton, Coalville, Loughborough, Wigston) but not others (e.g. Hinckley). Loughborough has more lapsed frequent users than expected in red. Stage 6. REFINE Improve the Representation of the Data Feedback on the visualization process was very positive with the LLS Marketing Manager commenting on how large complex data can be fully interrogated with different views. The speed with which new avenues of analysis could be explored was impressive, vindicating the use of Processing as a technology. Functionality developed through iterative refinement included: Quartile plots concentric distances travelled by the 25%, 50% and 75% closest users (Figure 6) reveal that despite the long-legged spiders most citizens use their local library. Standard Ellipse summarizing point distributions and indicating the directional pattern of usage. Figure 7 suggests that road and river networks affect spatial usage patterns. Although Birstall and Thurmaston libraries are in close proximity, there is little overlap between best users. This is useful information for LLS when deciding on opening hours as opening one library may not ensure recent / frequent usage by those in the neighbouring catchment. Stage 7. INTERACT Allow Users to Control What They See and How They See It Our visualization was an iterative process that involved data users and their expertise continually to drive the analysis. High levels of control were achieved through discussion and rapid development. Processing provided considerable scope for developing innovative and interactive graphics quickly and flexibly through access to low level functionality through high-level commands with little hindrance (Dykes, 2005). All three authors were able to develop and share Processing code during the visualization process. Interactive software controls were provided at all stages in all prototypes enabling view and focus to be changed through direct manipulation so that the graphical representations introduced here could be accessed (see Figure 1). 4. Conclusion The processes required to manipulate data from acquisition through to visualization are significant. Using Processing to apply Fry s visualization model enabled us to iteratively and efficiently ask a huge number of new questions of a large underused data set through interactive graphical means. This has produced some useful hypotheses. But how do we know what to do with the provisional and partial answers that result? We show that visualization has potential in the exploration of local authority data holdings. If visualization of this type, and indeed such large data sets, is to be used effectively to influence and develop policy in organizations that hold them then a focus on an eight stage of visualization may be important: act. This is a non-trivial stage with significant social as well as analytical and technical elements that require consideration. Acknowledgments This work was supported by an ESRC UPTAP User Fellowship RES-163-27-0017 and Leicestershire County Council. Usage data kindly provided by Leicestershire Library Service.
Figure 6: Quartile plots for all 54 libraries with geographic locations. Grey symbols are sized by borrower population. Red circles show distances travelled by 25, 50 and 75% of best users. Figure 7: Home locations of best users from neighbouring Birstall and Thurmaston libraries with standard ellipse and 1:50,000 LandRanger. Crown Copyright/database right 2009. An Ordnance Survey/EDINA supplied service.
References Dykes, J. (2005). Facilitating Interaction for Geovisualization. In Exploring Geovisualization. Amsterdam: Elsevier, pp. 265-291. Fry, B. (2004). Computational Information Design, Massachusetts Institute of Technology. Fry, B. (2008). Visualizing Data. Sebastopol, CA: O'Reilly, 382 pp. Fry, B., & Reas, C. (2007). Processing: A Programming Handbook for Visual Designers and Artists. Cambridge, MA: The MIT Press, 736 pp. Heer, J. and Bostock. M. (2009). ProtoVis: A Graphical Toolkit for Visualization. IEEE Transactions on Visualization and Computer Graphics, 15(6), 1121-1128. Meyer, M., Munzner, T. & Pfister, H. (2009). MizBee: A Multiscale Synteny Browser. IEEE Transactions on Visualization and Computer Graphics, 15(6), 897-904. Novos, J. (2004). Drilling down: Turning Customer data into profits with a spreadsheet, booklocker.com, 196 pp. Heer, J., Card, S.K & Landay, J.A. (2005). Prefuse: a toolkit for interactive information visualization, Proceedings of the SIGCHI Conference on Human factors in Computing Systems, 421-430. ACM New York, NY, USA. Radburn, R., Thomas, N., Forster, P., and Pye, S. (2007). Beyond the comfort zone. Public Library Journal, 22(3), 20-23. Radburn. R., Dykes, J. and Wood. J. (2009). vizlib: Developing Capacity for Exploratory Data Analysis in Local Government Visualization of Library Customer Behaviour, Proceedings of GISRUK, 1-3 April, Durham UK. Slingsby, A., Dykes, J. & Wood, J. (2009). Configuring Hierarchical Layouts to Address Research Questions. IEEE Transactions on Visualization and Computer Graphics, 15(6), 977-984. Vickers, D., Rees, P., and Birkin, M. (2005). Creating the National Classification of Census Output Areas: Data, Methods and Results, Working paper 03/3, School of Geography, University of Leeds. Vickers, D. & Rees, P. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A, 170(2), 379-404. Wood, J., and Dykes, J. (2008). Spatially Ordered Treemaps, IEEE Transactions on Visualization and Computer Graphics, 14(6), 1348-1355. Wood, J., Dykes, J. & Slingsby, A., (in press) Visualization of Origins, Destinations and Flows with OD Maps. The Cartographic Journal. Biographies Robert Radburn was an ESRC UPTAP research fellow at the gicentre, City University London developing capacity for visual exploratory analysis in local government, and is currently Research and Intelligence Team Leader at Leicestershire County Council. Dr. Jason Dykes is a Senior Lecturer at the gicentre, City University London undertaking applied and theoretical research in, around and between information visualization, interactive analytical cartography and human-centred design. Dr. Jo Wood is a Reader in geographic information at the gicentre, City University London with research interests in geovisualization, terrain modelling and object oriented programming for spatial sciences.