Analyzing Behavioral Similarity Measures in Linguistic and Non-linguistic Conceptualization of Spatial Information and the Question of Individual Differences

Alexander Klippel and Chris Weaver
GeoVISTA Center, Department of Geography, The Pennsylvania State University, PA, USA
[klippel,cweaver]@psu.edu

Similarity measures are central to human cognition (Tversky, 1977) and have long been a central aspect of modeling spatial knowledge (Tobler, 1970). Several approaches have been proposed to determine, from a cognitive perspective, the similarity of geographic entities, such as elements of the landscape, and geographic events, such as a tornado crossing a state: geometric, feature-based, alignment-based, and transformational (Goldstone, 1999). Among these approaches, the geometric approach has gained widespread attention (Gärdenfors, 2000; Ahlqvist, 2004; Schwering & Raubal, 2005; McIntosh & Yuan, 2005). In all disciplines participating in the endeavor of cognitive science, geometric approaches are used to model similarity and conceptual knowledge by applying distance concepts such as network distance or Euclidean distance. Similarity is inversely proportional to distance: the smaller the distance between two entities, the more similar they are. Similarity has gained widespread interest in the spatial sciences as a necessity for formally characterizing categorization (Schwering, 2008). However, behavioral studies on the similarity of spatio-temporal (geographic) events have not gained the same attention as formal treatments of similarity. This is partly due to the difficulties with perceived similarity that have long been debated in psychology (for an overview see Goldstone, 1999).

Establishing similarity is a prerequisite for conceptualization. Conceptualization aims to abstract from the rich, continuous, and overwhelming environmental data input into the human cognitive system in order to achieve a less detailed, coarser representation of spatio-temporal information. Clark termed this cognitive ability the 007 principle: "In general, evolved creatures will neither store nor process information in costly ways when they can use the structure of the environment and their operations upon it as a convenient stand-in for the information-processing operations concerned. That is, know only as much as you need to know to get the job done." (Clark, 1989, p. 64)

Given this brief background, our interest lies in the development of a research methodology that tries to reveal conceptualization processes by assessing behavioral similarity ratings. Furthermore, our interest lies in the relationship between the result of a conceptualization process, here called conceptual structure (Jackendoff, 1997), as an internal representation, and its different externalizations, i.e. the use of different modalities to communicate conceptual structures. The focus in the latter case lies on linguistic and graphic representations.

Figure 1. Schematic of potential influences on conceptualization processes such as different languages, conceptualization in linguistic and non-linguistic tasks, but also individual differences. (Note: We do not really think that conceptualization processes are as separable as the boxes might imply.)

Figure 1 shows a schema of the broader research question on which we are focusing. We have been interested in the relationship between the conceptualization of spatial information in linguistic and non-linguistic tasks, as well as in the differences potentially induced by different languages. Equally important, we have recently started to address individual differences. Individual differences are often sacrificed in favor of reporting statistically significant group differences, and the major focus has been on sex-related differences. In several of our recent experiments that addressed questions of similarity, of either spatial knowledge (particularly directional knowledge) or the conceptualization of geographic events (represented as topologically changing relations between two spatially extended entities), we found differences between participants that cut across sex. These differences can be characterized as participants who chose a coarse granularity in rating similarities, meaning that they created few groups, and participants who chose a fine granularity, meaning that they created many groups. Additionally, subgroups of participants can be identified who make similar choices in rating the similarity of the given stimuli, while others place boundaries between groups differently. We illustrate this question in the following by looking into various experiments that we have conducted at the interface of different languages and different experimental setups, such as linguistic and non-linguistic representations (Klippel & Montello, 2007; Klippel, Worboys, & Duckham, 2008).
Hence, this contribution primarily targets research methodology and exemplifies the approach by looking into behavioral similarity ratings. As we are generally interested in the relationship between the internal conceptual structure of environmental knowledge, for both static and dynamic phenomena, and how this knowledge is externalized in different modalities, we reimplemented and improved a research toolkit for conducting experiments (Knauff, Rauh, & Renz, 1997; Klippel et al., 2008). The methodology and toolkit allow for the collection of data on environmental knowledge, both static and dynamic. The toolkit itself is a digital version of the classic card sorting technique. It additionally collects data such as linguistic labels and/or sketch drawings for each group that a participant creates.

The usual analysis of grouping data is straightforward. The grouping by each participant results in a similarity matrix that encodes similarity in a binary fashion: 0 if two stimuli are not in the same group, 1 if two stimuli are in the same group. Summing the similarity matrices of all participants results in an overall similarity matrix that can be subjected to various clustering algorithms.

Beyond this, we are developing interactive visualization software tools for the exploration and analysis of individual differences in grouping data. Using the Improvise interactive visualization environment (Weaver, 2004), we have so far built two tools that display similarity matrix data sets in visual matrix and graph form, respectively.

The first visual tool, shown in Figure 2, displays two diagonal matrices. Cells in the upper left matrix encode, in text and color, the number of times that participants put the icon for that row and the icon for that column in the same group. This matrix can optionally be filtered to show counts for an arbitrary subset of participants, interactively selected from individual miniature icon co-occurrence matrices along the bottom and right sides of the lower right matrix. Cells in the lower right matrix contain miniature binary difference matrices for each row-column pair of participants. Each of these matrices is accompanied by a binary similarity measure, calculated as a simple pseudo-Levenshtein distance of 1 - (cells different / cells total).

Figure 2. Analysis of individual differences of similarity ratings (i.e. grouping behavior). Patterns emerged on the basis of actual grouping behavior of participants.
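The matrix analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit's or Improvise's actual code: the function names, the participant labels, and the four-icon data set are hypothetical, and the similarity measure is assumed to count "cells" over unordered icon pairs (one triangle of the matrix, without the diagonal), which the text does not specify.

```python
from itertools import combinations


def cooccurrence_matrix(groups, icons):
    """Binary similarity matrix for one participant.

    Cell [i][j] is 1 if icons i and j were placed in the same group, else 0.
    """
    index = {icon: i for i, icon in enumerate(icons)}
    n = len(icons)
    matrix = [[0] * n for _ in range(n)]
    for group in groups:
        for a in group:
            for b in group:
                matrix[index[a]][index[b]] = 1
    return matrix


def overall_matrix(matrices):
    """Cell-wise sum of the binary matrices of the selected participants."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def participant_similarity(m1, m2):
    """1 - (cells different / cells total).

    Assumption: cells are the unordered icon pairs, diagonal excluded.
    """
    pairs = list(combinations(range(len(m1)), 2))
    different = sum(1 for i, j in pairs if m1[i][j] != m2[i][j])
    return 1 - different / len(pairs)


# Hypothetical data: three participants sorting four icons.
icons = ["A", "B", "C", "D"]
groupings = {
    "P1": [{"A", "B"}, {"C", "D"}],
    "P2": [{"A", "B"}, {"C"}, {"D"}],
    "P3": [{"A", "B", "C", "D"}],
}
matrices = {p: cooccurrence_matrix(g, icons) for p, g in groupings.items()}

# Overall matrix for all participants, and for a filtered subset.
total = overall_matrix(list(matrices.values()))
subset_total = overall_matrix([matrices[p] for p in ("P1", "P2")])

# Pairwise similarity between two participants.
sim_p1_p2 = participant_similarity(matrices["P1"], matrices["P2"])
```

Passing only a subset of the per-participant matrices to `overall_matrix` mirrors the optional filtering of the upper left matrix. The summed matrix could then be handed to a clustering routine of the analyst's choice.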

The second visual tool, shown in Figure 3, displays a graph consisting of a node for each participant plus a node for each unique grouping of icons produced by at least one participant. Edges connect each participant to their respective groups. Bubble-like packs encompass the grouping nodes of each participant. The graph supports interactive dragging of any structural element as well as toggling of an iterative spring-based layout algorithm, allowing the experimenter to tease apart even complex grouping relationships.

In this case, the graph tells (at least) four stories about individual differences in icon categorization by participants. First, some participants categorize icons into groups that are different from those of any other participant. These individualistic categorizations consist variously of few (#23, with 2), moderate (#33 and #38, each with 6), or many (#34 with 9 and #27 with 10) groups. Second, participants sometimes exhibit only minor variations in categorization; for instance, participant #37 has two groups in common with #36 but splits the third group into two. Third, it is common for multiple participants to categorize icons into complex overlapping sets of groups, such as the pair of groups common to #21, #28, and #35, and the pair of groups common to #28 and #35 but not #21 (who categorized the eight icons in this latter pair of groups quite differently). Fourth, there are cases in which a large number of participants exhibit individual differences but have many groups in common, such as the darkest area with nine groups of eight icons in the midst of #20, #21, #22, #24, #26, #28, #29, #30, #32, and #39.

Figure 3. Analysis of subgroups created by participants and individual boundaries. This step of analysis is a first step in analyzing the relationships between grouping behavior and linguistic and graphic representations created for groups.
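The structure underlying such a participant-group graph can be sketched as a mapping from each unique icon grouping to the participants who produced it. The following Python sketch is only an illustration of that bookkeeping, not the Improvise tool itself: the function names, participant labels, and icon data are hypothetical, and layout and interaction are not modeled.

```python
from collections import defaultdict


def build_grouping_graph(groupings):
    """Map each unique group (a frozenset of icons) to its producers.

    Each (participant, group) pair corresponds to one edge of the graph.
    """
    producers = defaultdict(set)
    for participant, groups in groupings.items():
        for group in groups:
            producers[frozenset(group)].add(participant)
    return dict(producers)


def individualists(groupings, producers):
    """Participants whose groups were produced by no other participant."""
    return sorted(
        p
        for p, groups in groupings.items()
        if all(producers[frozenset(g)] == {p} for g in groups)
    )


def shared_groups(producers):
    """Groups produced by more than one participant."""
    return {g: ps for g, ps in producers.items() if len(ps) > 1}


# Hypothetical data: P2 shares two groups with P1 but splits the third,
# while P3 groups the icons in a way nobody else does.
groupings = {
    "P1": [{"a", "b"}, {"c", "d"}, {"e", "f"}],
    "P2": [{"a", "b"}, {"c", "d"}, {"e"}, {"f"}],
    "P3": [{"a", "c", "e"}, {"b", "d", "f"}],
}

producers = build_grouping_graph(groupings)
loners = individualists(groupings, producers)
common = shared_groups(producers)
groups_per_participant = {p: len(g) for p, g in groupings.items()}
```

In this sketch, `groups_per_participant` gives a simple view of coarse versus fine granularity, `loners` corresponds to the individualistic categorizations, and `common` lists the group nodes that connect several participants.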
The versions of the visual tools shown here are little more than mockups, built as a part of our initial exploration into potential applications of visual analytics techniques to experimentation in spatial conceptualization. In particular, we anticipate substantial improvements to the visual representation, interaction, and analytic capability of the graph tool in future versions. The text nodes that visually represent icon groupings could be replaced with the icons themselves, thereby providing a way to examine the breakdown of spatial categories within and between participants more directly. Similarly, the linguistic labels given to each icon group could be used to label the graph edges that connect participant nodes to icon grouping nodes, resulting in a small cloud of labels for each group. Finally, the addition of filtering capability would allow the experimenter to iteratively select arbitrary subsets of participants, icons, groups, and labels for display in the graph, enabling the exploration of idiosyncratic patterns of spatial conceptualization in terms of both the primary grouping data and any available secondary demographic data describing the participants themselves. Interactive multi-dimensional filtering capability is well established for data sets from a variety of domains (Weaver, 2008), and appears readily adaptable for use with matrix-based similarity data of the sort considered here.

References

Ahlqvist, O. (2004). A parametrized representation of uncertain conceptual spaces. Transactions in GIS, 8(4), 493-514.

Gärdenfors, P. (2000). Conceptual Spaces. The Geometry of Thought. Cambridge, MA: The MIT Press.

Goldstone, R. (1999). Similarity. In R. A. Wilson & F. C. Keil (Eds.), The MIT encyclopedia of the cognitive sciences (pp. 763-765). Cambridge, MA: MIT Press.

Jackendoff, R. (1997). The architecture of the language faculty. Cambridge, MA: MIT Press.

Klippel, A., & Montello, D. R. (2007). Linguistic and nonlinguistic turn direction concepts. In S. Winter, B. Kuipers, M. Duckham, & L. Kulik (Eds.), Spatial Information Theory. 9th International Conference, COSIT 2007, Melbourne, Australia, September 19-23, 2007, Proceedings (pp. 354-372). Berlin: Springer.

Klippel, A., Worboys, M., & Duckham, M. (2008). Identifying factors of geographic event conceptualisation. International Journal of Geographical Information Science, 22(2), 183-204.

Knauff, M., Rauh, R., & Renz, J. (1997). A cognitive assessment of topological spatial relations: Results from an empirical investigation. In S. C. Hirtle & A. U. Frank (Eds.), Spatial information theory: A theoretical basis for GIS (Lecture Notes in Computer Science, Vol. 1329, pp. 193-206). Berlin: Springer.

McIntosh, J., & Yuan, M. (2005). Assessing similarity of geographic processes and events. Transactions in GIS, 9(2), 223-245.

Schwering, A. (2008). Approaches to semantic similarity measurement for geo-spatial data: A survey. Transactions in GIS, 12(1), 2-29.

Schwering, A., & Raubal, M. (2005). Measuring semantic similarity between geospatial conceptual regions. In A. Rodríguez, I. Cruz, S. Levashkin, & M. J. Egenhofer (Eds.), GeoSpatial Semantics: First International Conference, GeoS 2005, Mexico City, Mexico, November 29-30, 2005, Proceedings. Berlin: Springer.

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234-240.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Weaver, C. (2004). Building highly-coordinated visualizations in Improvise. In Proceedings of the IEEE Symposium on Information Visualization 2004, Austin, TX, October 2004.

Weaver, C. (2008). Multidimensional visual analysis using cross-filtered views. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology 2008, Columbus, OH, October 2008.