Address: P.O. Box 4 (Yliopistonkatu 5), University of Helsinki, Finland,

Size: px

Start display at page:

Download "Address: P.O. Box 4 (Yliopistonkatu 5), University of Helsinki, Finland,"

Norah Daniel
5 years ago
Views:

1 Logics with Aggregate Operators Lauri Hella University of Helsinki and Leonid Libkin University of Toronto and Bell Labs and Juha Nurmonen University of Helsinki and Limsoon Wong KRDL, Singapore Name: Lauri Hella Aliation: Department of Mathematics Address: P.O. Box 4 (Yliopistonkatu 5), University of Helsinki, Finland, lauri.hella@helsinki.. Supported in part by grants and from the Academy of Finland. Name: Leonid Libkin Aliation: Department of Computer Science Address: 6 King's College Road, University of Toronto, Toronto, Ontario, M5S 3H5, Canada. libkin@cs.toronto.edu. Name: Juha Nurmonen Aliation: Department of Mathematics Address: P.O. Box 4 (Yliopistonkatu 5), University of Helsinki, Finland, juha.nurmonen@helsinki.. Supported in part by grant from the Academy of Finland. This work was initiated while supported in part by EPSRC grant GR/K Name: Limsoon Wong Aliation: Kent Ridge Digital Labs Address: 21 Heng Mui Keng Terrace, Singapore , limsoon@krdl.org.sg. Supported in part by the Singapore National Science and Technology Board. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or direct commercial advantage and that copies show this notice on the rst page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specic permission and/or a fee. Permissions may be requested from Publications Dept, ACM Inc., 1515 Broadway, New York, NY USA, fax +1 (212) , or permissions@acm.org.

2 2 L. Hella, L. Libkin, J. Nurmonen, L. Wong We study adding aggregate operators, such as summing up elements of a column of a relation, to logics with counting mechanisms. The primary motivation comes from database applications, where aggregate operators are present in all real life query languages. Unlike other features of query languages, aggregates are not adequately captured by the existing logical formalisms. Consequently, all previous approaches to analyzing the expressive power of aggregation were only capable of producing partial results, depending on the allowed class of aggregate and arithmetic operations. We consider a powerful counting logic, and extend it with the set of all aggregate operators. We show that the resulting logic satises analogs of Hanf's and Gaifman's theorems, meaning that it can only express local properties. We consider a database query language that expresses all the standard aggregates found in commercial query languages, and show how it can be translated into the aggregate logic, thereby providing a number of expressivity bounds, that do not depend on a particular class of arithmetic functions, and that subsume all those previously known. We consider a restricted aggregate logic that gives us a tighter capture of database languages, and also use it to show that some questions on expressivity of aggregation cannot be answered without resolving some deep problems in complexity theory. Categories and Subject Descriptors: H.2.3 [Database management]: Query languages; F.4.1 [Mathematical logic and formal languages]: Model theory Additional Key Words and Phrases: Database, Relational Calculus, Aggregation, Expressive Power, Locality 1. INTRODUCTION First-order logic over nite structures plays a fundamental role in several computer science applications, perhaps most notably, in database theory. The standard theoretical query languages { relational algebra and calculus { that are the backbone for the commercial query languages, have precisely the power of rst-order logic. However, while this power is sucient for writing many useful queries, in practice one often nds that it is quite limited for two reasons. Firstly, in rst-order logic, one cannot do xpoint computation (for example, one cannot compute the transitive closure of a graph). Secondly, one cannot express nontrivial counting properties (for example, one cannot compare the cardinalities of two sets). From the practical point of view, xpoint computation, although sometimes desirable, is of less importance in the database context than counting. Indeed, in the de-facto standard of the commercial database world, SQL, a limited recursive construct has only been proposed for the latest language standard (SQL3). At the same time, constructs such as cardinality of a relation or the average value of a column, known as aggregate functions, are present in any commercial implementation of SQL (they belong to what is called the entry level SQL92, which is supported by all systems).

3 Logics with Aggregate Operators 3 On the theory side, however, xpoint extensions of rst-order logic and corresponding query languages are much better studied than their counting counterparts. A standard xpoint extension considered in the database literature is the query language datalog, and practically every aspect of it { expressive power, optimization, adding negation, implementation techniques { was the subject of numerous papers. For the study of expressive power of query languages, which will interest us most in this paper, a very nice result of [28] showed that the innitary logic with nitely many variables, L! 1!, has a 0-1 law over nite structures. As many xpoint logics can be embedded into it, this result gives many expressivity bounds for datalog-like languages. In the presence of an order relation, it is again a classical result that various xpoint extensions of rst-order logic capture familiar complexity classes such as PTIME and PSPACE [22; 43]. See [1; 12; 23] for an overview. For extensions with counting and aggregate operators, much less is known, especially in terms of expressive power of languages. In an early paper [26] it was shown how to extend both relational algebra and calculus with aggregate constructs, but the resulting language did not correspond naturally to any reasonable logic. It is known how to integrate aggregation into datalog-like languages (both recursive and nonrecursive) [39; 42], and various aspects of such aggregate languages were studied (e.g., query optimization [35], handling constraints involving aggregation [40], query containment and rewriting [8; 18], interaction with functional programming constructs [6]). A powerful counting extension of rst-order logic, L C, was introduced in [31], but the counting operators in that logic are quite dierent from aggregate operators. At this point, let us give an example of a typical aggregate query that would be supported by all commercial versions of SQL, and use it to explain problems that arise when one attempts to analyze expressiveness of the language. Suppose we have two database relations: a relation R1 with attributes \employee" and \department", and a relation R2 with attributes \employee" and \salary". Suppose we want to nd the average salary for each department that pays total salary at least $10 6. In SQL, this is done as follows. SELECT R1.Dept, AVG(R2.Salary) FROM R1, R2 WHERE R1.Employee = R2.Employee GROUPBY R1.Dept HAVING SUM(R2.Salary) > Relations R1 and R2 separate the information about departments and salaries. This query joins them to put together departments, employees, and salaries, and then performs an aggregation over the salary column, for each department in the database, followed by selecting some of the resulting tuples. While the features of the language given by the SELECT, FROM and WHERE clauses are well-known

4 4 L. Hella, L. Libkin, J. Nurmonen, L. Wong to be rst-order, other features used in this example pose a problem. (1) We permit computation of aggregate operators such as AVG and SUM over the entire column of a relation. This form of counting is rather dierent from the counting quantiers or terms (see, e.g, [13; 25; 38]), normally supported by logical formalisms. (2) The GROUPBY clause creates an intermediate structure which is a set of sets { for each department, it groups together its employees. Again, this does not get captured adequately by existing logical formalisms. This shows why it is hard to capture aggregation in query languages by a logic whose expressive power is easy to analyze. Still, some partial results exist. For example, [36] gives some bounds based on the estimates on the largest number a query can produce; clearly such bounds are not robust and do not withstand adding arithmetic operations. In [9] it is shown that the transitive closure of a graph is not expressible in an aggregate extension of rst-order logic if DLOGSPACE 6= NLOGSPACE. In [33] this is proved without any complexity assumptions; a generalization of [33] to many other queries is given in [11]. One problem with the proofs of [33; 11] is that they are very \syntactic" { they work for a particular presentation of the language, and rely heavily on complicated syntactic rewritings of queries, rather than on the semantic properties of those. An attempt to remedy this was made in [30] which considered a sublanguage that only permits aggregation over columns of natural numbers, returning natural numbers as well (for example, AVG is not allowed). Then [30] gave a somewhat complicated encoding of the language in rst-order logic with counting quantiers, for which expressivity bounds are known [30; 37]. The encoding of [30] was extended to aggregation over rational numbers [34]; it did allow more aggregates (e.g., AVG) and more arithmetic, at the expense of a very unpleasant and complicated encoding procedure. Thus, rst-order logic with counting quantiers is inadequate as a logic for expressing aggregate query languages. It also brings up an analogy with the development of datalog-like languages and L! 1!, and raises the following question: Can we nd a powerful logic into which aggregate queries can be easily embedded, and whose properties can be analyzed so that bounds for query languages can be derived? Our main goal is to give the positive answer to this question. To do so, we combine an innitary counting logic from [31] with an elegant framework of [17] for adding aggregation. As the numerical domain, we choose the set of rational numbers Q, although other domains (e.g., Z; R) can be chosen. While [17] gives a nice framework for modeling aggregation, it provides neither expressivity bounds nor techniques for proving them in a logic with aggregate operators. On the other

5 Logics with Aggregate Operators 5 hand, [31] presents a number of techniques for proving expressivity bounds, but the logic there does not have aggregate operators. We thus combine the two, which results in a logic L aggr. It denes every arithmetic operation and every aggregate function. We then show that it has very nice behavior: its formulae satisfy analogs of Hanf's [14; 19] and Gaifman's [16] theorems, meaning that it can only express local properties. In particular, properties such as connectivity of graphs cannot be expressed. We then consider a theoretical language RL aggr, similar to those dened in [5; 33], and explain how it models all the features of SQL. Next, we show an embedding of RL aggr into L aggr, which is much simpler than those previously considered for rst-order with counting [30; 34]. This implies that the behavior of aggregate queries is local over a large class of inputs, no matter what family of aggregate and arithmetic operations the language possesses. Not only is this result much stronger than all previous results on expressiveness of aggregation, its proof is also much cleaner than those that appeared in the literature. Furthermore, we believe that logics with aggregation are interesting on their own right, as they give a rather disciplined approach to modeling aggregation and can be used to study other aspects of it. Organization. We give notations, including two-sorted structures and a formal denition of aggregates in Section 2. In Section 3 we give the denition of the aggregate logic L aggr. In Section 4 we explain the locality theorems of Hanf and Gaifman and prove that L aggr satises analogs of both of them, thus showing that it cannot express properties such as the connectivity of a graph. In Section 5, we dene an aggregate query language NRL aggr, on nested relations, that models both aggregation and grouping features of SQL. We show, using standard techniques, that queries from at relations to at relations in this language can be expressed in a simpler language called RL aggr, that does not use nested relations even as intermediate structures, and then we give a translation of RL aggr into L aggr. This shows that NRL aggr queries over at databases that do not contain numbers are local. In Section 6 we consider a simpler logic L aggr and show that it captures the language RL aggr. We also show that some basic questions about expressive power of RL aggr cannot be answered without resolving some deep problems in complexity theory, under the assumption that input databases are allowed to contain numbers. Extended abstract of this paper appeared in the Proceedings of the 14th IEEE Symposium on Logic in Computer Science, pp , July NOTATION Most logics we consider here are two-sorted, and they are dened on two-sorted structures, with one sort being numerical. We shall assume, throughout the paper, that the numerical sort

6 6 L. Hella, L. Libkin, J. Nurmonen, L. Wong is interpreted as Q, the set of rational numbers. A two sorted relational signature is a nite collection fr 1 (n 1 ; J 1 ); : : : ; R l (n l ; J l )g where R i s are relation names, n i s are their arities, and J i f1; : : : ; n i g is the set of indices for the rst sort. For example, fr(3; f1; 2g)g is a signature that consists of a single ternary relation so that in each tuple (a; b; c) in R, a; b are of the rst sort and c is of the second sort. We let U be an innite set, disjoint from Q, to be interpreted as the domain of the rst sort. A structure of signature (or -structure) is A = ha; Q; R A; : : : ; RA 1 l i, where A U is the universe of the rst sort for A, and R A i Q n i dom(i; k), where dom(i; k) = A if k 2 J k=1 i and dom(i; k) = Q if k 62 J i. We shall always assume that A is nite. For any set X and any tuple (x 1 ; : : : ; x n ) 2 X n, the multiset (bag) consisting of the components of this tuple is denoted by fjx 1 ; : : : ; x n jg. Here n is the cardinality of fjx 1 ; : : : ; x n jg. We let fjxjg n denote the set of all such n-element multisets over X. Now, following [17], we dene an aggregate function as a collection F = ff 0 ; f 1 ; f 2 ; : : : ; f! g of functions, where f n : fjqjg n! Q, and f! 2 Q. Each function f n shows how the aggregate function behaves on an n-element input multiset of rational numbers, and the value f! is the result when the input is innite. We shall identify the function f 0 with the constant it produces on the empty bag fjjg. Examples include the aggregates P and Q : where and P = fs0 ; s 1 ; : : : ; s! g and Q = fp 0 ; p 1 ; : : : ; p! g s 0 = 0; s n (fjq 1 ; : : : ; q n jg) = q 1 + : : : + q n ; n > 0; p 0 = 1; p n (fjq 1 ; : : : ; q n jg) = q 1 : : : q n ; n > 0: (We assume s! = p! = 0.) Standard database languages use other aggregates as well; in fact, the standard ones for SQL are the following: P (also called TOTAL, which adds up all the elements of a bag), MIN and MAX, dened as the minimum (maximum) element of the input bag, COUNT, which returns the cardinality of a bag (that is, its ith function is the constant i), AVG, which returns the average value of a bag (that is, its ith function is s i =i for i > 0). 3. AN AGGREGATE LOGIC Assume that we are given two signatures on Q : one, denoted by, of functions and predicates, and one, denoted by, of aggregates. In addition we assume that there is a constant symbol c q for each q 2 Q. We now dene an aggregate logic L aggr (; ), on two-sorted structures. We

7 Logics with Aggregate Operators 7 do it, similarly to [31], in two steps. We rst dene a larger logic L aggr (; ) and then put a restriction on its formulae. We dene terms and formulae of the two-sorted logic L aggr (; ), over two-sorted structures, by simultaneous induction. Every variable of the ith sort is a term of the ith sort, i = 1; 2. Every constant c q 2 Q is a term of the second sort. Given a pair (n; J) with J f1; : : : ; ng, we say that an n-tuple of terms ~t = (t 1 ; : : : ; t n ) is of type (n; J), written ~t : (n; J), if t i is a rst-sort term for i 2 J and a second-sort term for i 62 J. For a formula '(~x), we write ' : (n; J) and say that its type is (n; J) if ~x = (x 1 ; : : : ; x n ) and i 2 J i x i is of the rst sort. Now for each R i (n i ; J i ) in, and ~t : (n i ; J i ), we let R i (~t) be a formula. Formulae are then closed under innitary disjunctions W and conjunctions V, negation :, and quantiers over both rst-sort domain A and second-sort domain Q. That is, if ' is a formula, then :'; 9x'; 8x'; 9q'; 8q' are formulae, where x is a rst-sort variable and q is a second-sort variable. Furthermore, if ' i ; i 2 I, is a (nite or innite) collection of formulae, then W i2i ' i and V i2i ' i are formulae. If t 1 ; : : : ; t n are second-sort terms, and f an n-ary function symbol from, then f(t 1 ; : : : ; t n ) is a second-sort term. For an n-ary predicate symbol P from, P (t 1 ; : : : ; t n ) is a formula. If t 1 ; t 2 are terms of the same sort, then t 1 = t 2 is a formula. Next, we add counting and aggregation to the logic. For any formula '(~x; ~y) with ~y being variables of the rst sort, we let t(~x) = #~y:'(~x; ~y) be a second-sort term. Let F be an aggregate from. Let '(~x; ~y) be a formula, and t(~x; ~y) a second-sort term, with no restrictions on the sorts of ~x; ~y. Then Aggr F ~y:('; t) is a second-sort term with free variables ~x. Remark Using innitary connectives is a convenient technical device, as it will make some translations easier, and the logic more expressive; however, it is possible to avoid using them. We prefer to work with an innitary logic here, so that we can use known results from [21; 31]. Furthermore, the fact that this logic is not eective (and even has uncountably many formulas) is not important as we consider inexpressibility results for it, which would then apply to any weaker logic. We now discuss the semantics. A tuple ~a = (a 1 ; : : : ; a n ) is of type (n; J) if a i 2 U for i 2 J and a i 2 Q for i 62 J. For every two-sorted -structure A, a formula '(~x) or a term t(~x) of type (n; J) in the language of, and a tuple ~a over A [ Q of type (n; J), we dene the value t A (~a) of the term t on ~a in A and the relation A j= '(~a). The denition is standard, with only the case of counting terms and aggregation requiring explanation. For t(~x) = #~y:'(~x; ~y), the value of t(~a) in A is the (nite) number of ~ b over A such that A j= '(~a; ~ b). Let s(~x) = Aggr F ~y:('(~x; ~y); t(~x; ~y)), and let ~a be of the same type as ~x. Dene '(~a; A) = f ~ b j A j= '(~a; ~ b)g. Let t('(~a; A)) be the multiset fjt A (~a; ~ b) j ~ b 2 '(~a; A)jg. (This is a multiset since t

8 8 L. Hella, L. Libkin, J. Nurmonen, L. Wong may produce identical values on several (~a; ~ b).) Let n be the cardinality of this multiset. Then the value s A (~a) is dened to be f n (t('(~a; A))), where f n is the nth component of F. If the set '(~a; A) is innite, the value of s A (~a) is f!. This concludes the denition of L aggr (; ). Next, we dene the notion of a rank of formulae and terms, rk(') and rk(t). For a variable or constant t, rk(t) = 0. For each relation name R i 2 and terms t 1 ; : : : ; t n, we let rk(r i (t 1 ; : : : ; t n )) = max i rk(t i ) and similarly rk(t = s) = maxfrk(t); rk(s)g. For any formula ' P (t 1 ; : : : ; t n ) with P 2, we have rk(') = max i rk(t i ), and similarly for a term f(~t) with f from. We then have rk( W ' i ) = rk( V ' i ) = sup i rk(' i ) and rk(:') = rk('). We let rk(9x') = rk(') + 1, for quantication over the rst sort, and rk(9q') = rk(') for quantication over the second sort. For counting and aggregate terms, rk(#~y:') = rk(')+ j~y j, and rk(aggr F ~y:('; t)) = max(rk('); rk(t)) + k, where k is the number of rst-sort variables in ~y. Definition 3.1. The formulae and terms of L aggr (; ) are precisely the formulae and terms of L aggr (; ) that have nite rank. If there is no restriction on the signature (that is, all functions and predicates are allowed), we write All. Thus, L aggr (All; All) is the aggregate logic in which every function, predicate, and aggregate function on Q is available. 2 Examples. First, counting terms are denable with P : #~y:'(~x; ~y) is equivalent to Aggr ~y:('(~x; ~y); c 1 ), where c 1 is the symbol for constant 1. Next, we show how to express the example from the introduction in L aggr (f<g; f P ; AVGg). The signature has two relations: R 1 (2; f1; 2g) and R 2 (2; f1g), as only the last attribute of R 2 { salary { is numerical. The query is now expressed as a L aggr formula '(x; q) with two free variables, one of the rst sort, one of the second sort, as follows: (9y9z: R 1 (x; y) ^ R 2 (y; z)) ^ (q = Aggr AVG z: (9y:R 1 (x; y) ^ R 2 (y; z); z)) ^ (Aggr z: (9y:R 1 (x; y) ^ R 2 (y; z); z) > c 10 6): 4. AGGREGATE LOGIC: EXPRESSIVE POWER In this section we deal with the expressive power of the aggregate logic. Our main goal is to show that it satises a very strong locality property. Locality properties were introduced in model theory by Hanf [19] and Gaifman [16], and recently, following [14], they were a subject of renewed attention (see, e.g., [11; 30; 31; 33; 37] and references therein). Intuitively, those properties say that the behavior of logical formulae depends on the structure of small neighborhoods. They imply strong expressivity bounds for queries denable by logical formulae.

9 Logics with Aggregate Operators 9 (See Subsection 4.2.) As there are several ways to dene locality, we want to establish the strongest property. The relationship between various notions of locality was investigated in [21; 30], and it was shown that the one based on Hanf's theorem implies the one based on Gaifman's theorem, which in turn implies other locality properties based on degrees of structures. Thus, our goal is to show (precise denition will be given a bit later in this section): Expressiveness of L aggr (All; All): Over rst-sort structures, formulae of L aggr (All; All) are Hanf-local. It is known that many properties requiring xpoint computation, such as the connectivity and acyclicity tests for graphs, or computing the transitive closure, violate some form of locality. Thus, as a corollary, we shall see that adding unlimited arithmetic and aggregation to rst-order logic does not enable it to express those properties. We start by showing how to embed L aggr (All; All) into a simpler logic L C that does not have aggregate operations. We then review the main notions of locality used in nite-model theory, and prove the strongest of them, Hanf-locality, of L C. 4.1 Logic L C Definition 4.1. The logic L C is dened to be L aggr (;; ;); that is, aggregate terms or numerical functions and predicates are not allowed. A slightly weaker version of this logic was studied in [31]. That logic, denoted by L 1!(C), was dened as L C over one-sorted structures (although the formulae are two sorted) and the set N of natural numbers as the numerical domain. For two logics we write L 1 4 L 2 if L 2 is at least as powerful as L 1. If for every formula in L 1 there is an equivalent one in L 2 of the same or smaller rank, we write L 1 4 rk L 2. We use L 1 t L 2 if L 1 4 L 2 and L 2 4 L 1, and likewise for L 1 t rk L 2. The main result that we show here is the following: Theorem 4.2. L aggr (All; All) t rk an equivalent L C formula of the same rank. L C. That is, for every formula of L aggr (All; All), there is We devote the rest of this subsection to the proof of this theorem. The embedding L C 4 rk L aggr (All; All) is trivial. For the other direction, we need the following lemma, which will be used in the encoding of aggregate terms in order to count tuples over

10 10 L. Hella, L. Libkin, J. Nurmonen, L. Wong both numerical and rst-sort domains. Lemma 4.3. For every formula '(~z; ~y) of L C and every natural number m, there exists an L C formula m ['](~z) such that m ['](~a) holds i the number of tuples ~ b such that '(~a; ~ b) holds is precisely m. Moreover, rk( m [']) = rk(') + k, where k is the number of rst-sort variables in ~y. Proof. To simplify the notation in the proof, we use tuples of rationals ~q = (q 1 ; : : : ; q n ) instead of the ocial notation (c q1 ; : : : ; c qn ) in L C formulae. We assume m > 0 (if m = 0, then m ['](~z) can be taken to be :9~y: '(~z; ~y)). If all variables in ~y are rst-sort, then m ['](~z) is #~y:' = m. If all variables in ~y are second-sort, then m ['](~z) is _ (^ B ~ b2b '(~z; ~ b) ^ ^ ~ b62b :'(~z; ~ b)); where the disjunction is taken over m-element sets B of tuples of rational numbers, of the same length as ~y. Note that the rank of the above formula equals rk('). Now assume that ~y = (~y 1 ; ~y 2 ), where ~y 1 are rst-sort variables and ~y 2 are second-sort variables. Let k be the length of ~y 1 and p the length of ~y 2, k; p > P 0. For a given m > 0, let S m be the set s of all tuples (l 1 ; : : : ; l s ) of positive integers such that l i=1 i = m. Then m ['](~z) is given by (l 1 ;:::;l s)2s m ~a 1 ;:::;~a s2q p ~a 1 ;:::;~a s distinct " 8~y 1 8~y 2 '(~z; ~y 1 ; ~y 2 )! s_ i=1 (~y 2 = ~a i )! ^ s^ i=1 (#~y 1 :'(~z; ~y 1 ;~a i ) = l i ) Clearly, rk( m [']) = rk(') + k. To see correctness, note that if there are m tuples ~ b i = ( ~ b i 1; ~ b i 2) satisfying '(~z; ~ b i ) (divided into two subtuples by sort), then the number of distinct tuples among ~ b i 2, say s, is at most m, and if for each of the s such tuples ~ b i 2, l i is the number of tuples ~c for which (~c; ~ b i 2) is among ~ P s b j s, then l i=1 i = m. Since the converse to this is true as well, and the above formula codes this condition, we conclude the proof of the lemma. 2 To prove Theorem 4.2, we dene for each second-sort term t(~x) a formula t (~x; z) of L C, and for each formula '(~x) of L aggr (All; All) a formula ' (~x) of L C such that A j= ' t (~a; q) i t A (~a) = q, and A j= '(~a) i A j= ' (~a), for any A and ~a a tuple of A of the same type as ~x. This is done by simultaneous induction. Let t(~x) be a second-sort term. Then t (~x; z), where z is a second-sort variable not contained in ~x, is dened inductively as follows: if t(~x) is a variable x i, then t (~x; z) is the formula z = x i ;!#

11 Logics with Aggregate Operators 11 if t(~x) is a constant symbol c q, then t (~x; z) is the formula z = c q ; if t(~x) is f(t 1 ; : : : ; t n ), where f is an n-ary function on Q, then t (~x; z) is the formula _ (q;q 1 ;:::;q n)2q n+1 f (q 1 ;:::;q n)=q (z = c q ^ n^ i=1 t i (~x; c qi )); if t(~x) is #~y:'(~x; ~y), then t (~x; z) is the formula z = #~y: ' (~x; ~y); if t(~x) is Aggr F ~y:('(~x; ~y); s(~x; ~y)), where F is an aggregate function on Q, '(~x; ~y) is a formula, and s(~x; ~y) is a second-sort term, then t (~x; z) is the formula 1 (~x; z) _ 2 (~x; z) _ 3 (~x; z). Here 1 (~x; z) is _ ( ~m;~q)2i (z = c q ) ^ m [ ' ](~x) ^ p^ i=1 mi [ ' ^ s ](~x; c qi ) where I is the set of all pairs of tuples ~m = (m 1 ; : : : ; m p ; m) 2 (N n f0g) p+1, p m, and ~q = (q 1 ; : : : ; q m ; q) 2 Q m+1 such that p m i=1 i = m, f m (fjq 1 ; : : : ; q m jg) = q and the multiplicity of q i in fjq 1 ; : : : ; q m jg is m i for each i = 1; : : : ; m. The formula 2 (~x; z) is dened as (z = c! ) ^ ^ : m [ ' ](~x) m2n where c! is the constant symbol corresponding to f! from F, and m again binds variables ~y. Finally, the formula 3 (~x; z) is dened as (z = c f0 ) ^ :9~y '(~x; ~y): Let then '(~x) be a formula of L aggr (All; All). We may assume without loss of generality that for every occurrence of a relation R i (t 1 ; : : : ; t ni ), each t j is a variable of the appropriate sort. This is because the only rst-sort terms are variables, and for second-sort terms t j1 ; : : : ; t js, we V can introduce fresh variables v j1 ; : : : ; v js of the second sort, and replace R i (~t) by 9v j1 ; : : : ; v js ( v l j l = t jl ^R i (~t 0 )) where ~t 0 is obtained from ~t by replacing each second-sort term t jl by v jl. Note that this transformation into an equivalent formula does not change the rank. We now dene the formula ' (~x) of L C (simultaneously with formulae ' t ) inductively as follows: if '(~x) is a formula R i (~x), where R i (n i ; J i ) 2 and ~t : (n i ; J i ) then ' (~x) = '(~x); if W '(~x) is :(~x), then ' (~x) is : W V (~x); and if '(~x) is ' i (~x) or ' i (~x), then ' (~x) is ' i (~x) or V ' i (~x), respectively;!

12 12 L. Hella, L. Libkin, J. Nurmonen, L. Wong if '(~x) is 9y(~x; y) or 8y(~x; y), then ' (~x) is 9y (~x; y) or 8y (~x; y), respectively; if '(~x) is a formula t(~x) = s(~x) where t and s are rst-sort terms (that is, rst-sort variables), then ' (~x) = '(~x); if W'(~x) is a formula t(~x) = s(~x) where t and s are second-sort terms, then ' (~x) is the formula q2q ( t(~x; c q ) ^ s (~x; c q )); if '(~x) is a formula P (t 1 ; : : : ; t n ) where each t i is a second-sort term and P is an n-ary predicate on Q, then ' (~x) is the formula W (q 1 ;:::;q n)2p (V n i=1 t i (~x; c qi )). Obviously t and ' constructed above are formulae of L C. It is also straightforward to verify by induction that rk( t ) = rk(t) and rk( ' ) = rk('), for every second-sort term t and for every formula ' of L aggr (All; All). It remains to verify that the constructed formulas t and ' have the desired properties. This is proved in the next lemma. Lemma 4.4. If t(~x) is a second-sort term, ~a is a tuple of A of the same type as ~x, and q 2 Q, then t A (~a) = q i A j= t (~a; q). Similarly, if '(~x) is a formula of L aggr (All; All), and ~a is a tuple of A of the same type as ~x, then A j= '(~a) i A j= ' (~a). Proof. Proof is by simultaneous induction. We sketch a few cases; others are straightforward. Let rst t(~x) be a second-sort term, ~a a tuple of A of the same type as ~x, and q 2 Q. The cases where t is a variable, constant symbol, counting term or of the form f(t 1 ; : : : ; t n ) for some f : Q n! Q, are obvious. Let then t(~x) be Aggr F ~y:('(~x; ~y); s(~x; ~y)). Assume rst that '(~a; A) is nite and not empty. Then t A (~a) = q i f m (s('(~a; A))) = q, where m is the cardinality of the multiset M = s('(~a; A)). P Thus there exist distinct rationals q 1 ; : : : ; q s such that the multiplicity of q i in M s is m i > 0, m i=1 i = m, and for each q i, there are exactly m i tuples ~ b such that '(~a; ~ b) holds and t A (~a; ~ b) = q i. But this is exactly what 1 states. If '(~a; A) is innite, f m (s('(~a; A))) = q i q = f!. Since 2 tests for M being innite and z being f!, and 3 tests for M being empty and z being f 0, we conclude the proof of correctness for the case of aggregate terms. Suppose then '(~x) is a formula of L aggr (All; All), and ~a is a tuple of A of the same type as ~x. Again, the rst four cases above are obvious. If '(~x) is a formula t(~x) = s(~x) where t and s are second-sort terms, then A j= '(~a) i for some q 2 Q, t A (~a) = q and s A (~a) = q. Since t A (~a) = q and s A (~a) = q hold i A j= t (~a; q) and A j= s (~a; q), this is equivalent to A j= ' (~a). If '(~x) is a formula P (t 1 ; : : : ; t n ) where each t i is a second-sort term and P is an n-ary predicate on Q, then A j= '(~a) i for some (q 1 ; : : : ; q n ) 2 Q n we have t A i (~a) = q i for each i and (q 1 ; : : : ; q n ) 2 P. But this is exactly what ' states. 2

13 Logics with Aggregate Operators 13 Theorem 4.2 now follows immediately Notions of locality in nite models In this section, we only consider one-sorted nite structures A = ha; R A 1 ; : : : ; R A l i and twosorted structures over signatures that only contain relation symbols of the non-numerical sort (i.e., we assume that J i = f1; : : : ; n i g for every R(n i ; J i ) 2 ). We call such two-sorted structures pure. Note that each one-sorted nite structure A can be extended to pure twosorted structure simply by adding the set Q as the second sort (and interpreting the constant symbols c q in the canonical way). We denote this extension of A by A. Given a nite one-sorted structure A, its Gaifman graph [12; 16; 14] G(A) is dened as ha; Ei where (a; b) is in E i a 6= b and there is a tuple ~c 2 R A i for some i such that both a and b are in ~c. The distance d(a; b) is dened as the length of the shortest path from a to b in G(A); we assume d(a; a) = 0. If ~a = (a 1 ; : : : ; a n ) and ~ b = (b 1 ; : : : ; b m ), then d(~a; ~ b) = min ij d(a i ; b j ). Given ~a over A, its r-sphere S A(~a) is r fb 2 A j d(~a; b) rg. Its r-neighborhood NA r (~a) is dened as a structure in the signature that consists of and n constant symbols: hs A r (~a); RA 1 \ S A r (~a)n 1 ; : : : ; R A l \ S A r (~a)n l ; a 1 ; : : : ; a n i That is, the universe of N A r (~a) is S A r (~a), the interpretation of the -relations is inherited from A, and the n extra constants are the elements of ~a. If A is clear from the context, we write S r (~a) and N r (~a). Given a tuple ~a of elements of A, and d 0, by ntp A d (~a) we denote the isomorphism type of N A(~a). Then d ntpa(~a) = d ntpb(~ d b) means that there is an isomorphism N A(~a)! d NB(~ d b) that sends ~a to ~ b; in this case we will also write ~a A;B ~ d b. If A = B, we write ~a A ~ d b. Given a tuple ~a = (a 1 ; : : : ; a n ) and an element c, we write ~ac for (a 1 ; : : : ; a n ; c). For two -structures A; B, we write A d B if there exists a bijection f : A! B such that ntp A(a) = d ntpb d (f(a)) for every a 2 A. That is, every isomorphism type of a d-neighborhood of a point has equally many realizers in A and B. We write (A;~a) d (B; ~ b) if there is a bijection f : A! B such that ntp A d (~ac) = ntpb d (~ bf(c)) for every c 2 A. Hanf-locality has been previously dened only for nite one-sorted structures. In the following we make a natural extension of its denition to the case of pure two-sorted structures. Definition 4.5. (see [19; 14; 30; 21]) A formula '(~x) on pure two-sorted structures is called Hanf-local if there exist a number d 0 such that for all nite one-sorted structures A and B, (A;~a) d (B; ~ b) implies A j= '(~a) i B j= '( ~ b). The denition for open formulae is from [21]; most previous papers [19; 14; 30; 37] considered its

14 14 L. Hella, L. Libkin, J. Nurmonen, L. Wong restriction to sentences. It is known [14] that A d B implies A r B for r d. It is also known that every (one-sorted) rst-order sentence is Hanf-local and d can be taken to be 3 qr()?1 [14]. This was generalized to various counting logics [37; 21], and the bound was improved to 2 qr()?1? 1 [23; 31]. Definition 4.6. (cf. [30; 31]) A formula '(~x) on pure two-sorted structures is called Gaifman-local if there exists a number r 0 such that, for any nite one-sorted structure A and any ~a; ~ b over A, ~a A r ~ b implies that A j= '(~a) i A j= '( ~ b). Gaifman's theorem [16] implies this notion of locality for rst-order formulae, with a (7 qr(')? 1)=2 bound for r; in [31] a tight bound of 2 qr(')? 1 is established. It is known that connectivity of graphs is not a Hanf-local property [14], and that the transitive closure of a graph is not Gaifman-local [16; 11]. Locality { either Gaifman or Hanf { implies a number of results that describe outputs of local queries by relating degrees of elements in the input and output. We now briey review one such result. With each formula '(x 1 ; : : : ; x n ) in the signature, we associate a query that maps a - structure A into '[A] = f~a 2 A n j A j= '(~a)g. If A is a -structure, and R i is of arity p i, then degree j (R A i ; a) for 1 j p i is the number of tuples ~a in R A i having a in the jth position. In the case of directed graphs, this gives us the usual notions of in- and out-degree. By deg set(a) we mean the set of all degrees realized in A, and deg count(a) stands for the cardinality of deg set(a). Definition 4.7. (see [33; 11; 30]) A query q, that is, a function that maps a -structure A to an m-ary relation on A, m 1, is said to have the bounded number of degrees property, or BNDP, if there exists a function f q : N! N such that deg count(q(a)) f q (k) for every A with deg set(a) f0; 1; : : : ; kg. 2 (Note that in several previous papers this was called the bounded degree property, or BDP, which was confusing as we talk about the number of degrees in the output. We thus decided to change the name.) The intuition behind this property is that if A locally looks simple, then q(a) has a simple structure as well { it cannot realize many dierent degrees. The BNDP is very easy to use for proving expressivity bounds [33]. For example, if A is a successor relation, then deg set(a) = f0; 1g. However, if j A j= n, then deg count(trcl(a)) = n, where TrCl(A) is the transitive closure of A. Thus, the transitive closure query cannot be expressible in any logic that has the BNDP. The relationship between the notions of locality we introduced is the following, when one deals

15 Logics with Aggregate Operators 15 with one-sorted nite structures: Proposition 4.8. a) (see [21]) Every Hanf-local formula is Gaifman-local. b) (see [11]) Every query dened by a Gaifman-local formula has the BNDP. 2 These results are not aected by the transfer to pure two-sorted structures. 4.3 Locality of L C In [37] it was proved that the extension of rst-order logic by all unary generalized quantiers is Hanf-local. The proof was based on bijective Ehrenfeucht-Frasse games [20] which characterize equivalence of structures with respect to all unary quantiers. We now use these games to prove the Hanf-locality of L C. Let A and B be two -structures, ~a 2 A n, and ~ b 2 B n. The r-round bijective game BEF r (A;~a; B; ~ b) is played by two players, called the spoiler and the duplicator. In each round i = 1; : : : ; r, the duplicator selects a bijection f i : A! B, and the spoiler selects an element c i 2 A (if j A j6=j B j, then the spoiler wins). After each round i, these moves determine the relation p i = p 0 [ f(c j ; f j (c j )) j 1 j ig, where p 0 is the initial relation f(a j ; b j ) j 1 j ng between the components of ~a and ~ b. The spoiler wins the game, if for some i, p i is not a partial isomorphism A! B; otherwise the duplicator wins. Before using these games for the Hanf-locality result, we rst mention the following simple observation. Lemma 4.9. Every formula in L C is equivalent to a formula that does not contain any quantiers 9q or 8q over second sort variables q. V W Proof. Clearly 9q'(~x; q) is equivalent to q2q '(~x; c q), and similarly, 8q'(~x; q) is equivalent to q2q '(~x; c q). Hence the claim follows by straightforward induction. 2 We now prove our main technical result of this subsection. Lemma Let A and B be nite one-sorted -structures, ~a 2 A n, ~ b 2 B n, and let A and B be the corresponding pure two-sorted structures. If the duplicator has a winning strategy in BEF r (A;~a; B; ~ b), then for every formula '(x 1 ; : : : ; x n ) in L C, with rk(') r and all free variables of the rst sort, A j= '(~a) if and only if B j= '( ~ b). Proof. Let t(~x; ~z) be a second sort term, and '(~x; ~z) a formula of L C, where ~x are rst sort variables and ~z are second sort variables. Furthermore, let ~c and ~q be tuples of elements of A

16 16 L. Hella, L. Libkin, J. Nurmonen, L. Wong and Q of matching lengths. We prove by simultaneous induction on i r that the following two claims hold whenever the duplicator is playing according to his winning strategy: (a) If rk(t) i and ~c 2 (dom(p r?i)) j~xj, then t A (~c; ~q) = t B (p r?i~c; ~q). (b) If rk(') i and ~c 2 (dom(p r?i)) j~xj, then A j= '(~c; ~q) if and only if B j= '(p r?i~c; ~q). Assume rst that i = 0. If rk(t) = 0, then t is either a second sort variable or a constant symbol. In both cases (a) holds trivially. Similarly, if rk(') = 0, then by Lemma 4.9, we can assume that ' is quantier free. Thus ' is Boolean combination of atomic formulas which are either purely rst-sort formulas or second-sort formulas of the form t = s for terms t, s. The truth of the rst type is preserved by p r since p r is determined by a winning strategy of the duplicator, and so it is a partial isomorphism. For the second type atomic subformulas the truth is preserved obviously since the values of the terms in A and B are the same. Assume then that i > 0 and the claims (a) and (b) hold for all j < i. Let rk(t) i. We can assume without loss of generality that rk(t) = i; otherwise (a) follows directly from the induction hypothesis, since p r?i p r?j for all j < i. Thus, t is of the form #~y: (~x; ~y; ~z), where rk( ) = i? k for k =j ~y j. Consider now the following function g : A k! B k generated by the winning strategy of the duplicator: g(d 1 ; : : : ; d k ) = (f r?i+1(d 1 ); : : : ; f r?i+k(d k )): As each f r?i+j is a bijection A! B, it is easy to see that g is a bijection A k! B k. Moreover, by the induction hypothesis, A j= (~c; ~ d; ~q) if and only if B j= (p r?i+k~c; g( ~ d); ~q) for all ~d = (d 1 ; : : : ; d k ) 2 A k. In other words, g is a bijection between the sets (~c; A ; ~q) and (p r?i~c; B ; ~q), and so t A (~c; ~q) = t B (p r?i~c; ~q). To prove (b), assume (without loss of generality) that rk(') = i. If ' is of the form t = s for second sort terms s and t, the claim follows easily from (a). The cases ' = :, ' = W and ' = V are also straightforward. Assume nally, that ' = 9y (~x; y; ~z). If A j= '(~c; ~q), then A j= (~c; d; ~q) for some d 2 A. The induction hypothesis implies then that B j= (p r?i~c; p r?i+1(d); ~q), whence B j= '(p r?i~c; ~q). The converse implication is proved in the same way. This completes the induction. The lemma follows now from the case i = r of claim (b). 2 We now prove the desired locality result. Theorem Over pure two-sorted structures, every formula of L C without free second sort variables is Hanf-local.

17 Logics with Aggregate Operators 17 Proof. Let '(~x) be a formula of L C, where ~x are rst-sort variables and rk(') = r. Let A and B be nite one-sorted -structures, and let ~a 2 A n and ~ b 2 B n. It was proved in [37] that if (A;~a) d (B; ~ b) for d = 3 r, then the duplicator has a winning strategy in the bijective game BEF r (A;~a; B; ~ b), and hence by Lemma 4.10, A j= '(~a) if and only if B j= '( ~ b). Thus ' is Hanf-local. 2 By Theorem 4.2, we get as a consequence the Hanf-locality of the full aggregate logic L aggr (All; All). Corollary Over pure two-sorted structures, all formulas of L aggr (All; All) without free second-sort variables are Hanf-local. 2 As we said earlier, Hanf-locality is a very strong form of locality that implies others, and consequently it gives us many expressivity bounds. Some of them are listed below. For a general overview of deriving expressiveness results from locality, see [11; 14; 16; 30; 33]. Corollary a) Over pure two-sorted structures, all formulas of L aggr (All; All) without free second-sort variables are Gaifman-local and have the BNDP. b) None of the following can be expressed in L aggr (All; All) over graphs on the universe of the rst sort: transitive closure, deterministic transitive closure, connectivity test, acyclicity test, the same-generation property for nodes in acyclic graphs, testing for balanced k-ary tree, k 1. 2 Thus, despite its enormous counting power, L aggr (All; All) cannot express nonlocal properties, among them most properties requiring xpoint computations. Remark. Note that in all the results proved so far the choice of numerical sort is irrelevant. The proofs go through if Q is replaced by any standard numerical domain, like N; Z; R or C. 5. DATABASE QUERY LANGUAGES AND L aggr The goal of this section is to show how standard SQL features can be coded in L aggr, thereby providing bounds on the expressive power of database queries with aggregation. The coding that we exhibit here is not only more general but also much simpler and more intuitive than that of [30; 34], thanks to the design of L aggr that does not limit available arithmetic operations and makes it easy to code aggregation.

18 18 L. Hella, L. Libkin, J. Nurmonen, L. Wong 5.1 Languages NRL aggr and RL aggr We dene a relational query language RL aggr (; ), which extends standard relational query languages, such as relational algebra and calculus, with aggregation constructs. The language is parameterized by a collection of allowed arithmetic functions and predicates and a collection of allowed aggregates. We assume that the usual arithmetic operations (+,?,, ) and the order < on Q are always in and the summation aggregate ( P ) is always in. The language is dened as a suitable restriction of a nested relational language NRL aggr (; ), in the same way it was done previously [33; 34]. The type system is given by Base := b j Q rt := Base : : : Base ft := rt j frtg t := Base j t : : : t j ftg The base types are b and Q, with the domain of b being an innite set U, disjoint from Q. We use for product types; the semantics of t 1 : : : t n is the cartesian product of the domains of types t 1 ; : : : ; t n. The semantics of ftg is the nite powerset of elements of type t. Types rt (record types) and ft (at types) are used in restrictions that dene RL aggr. A database schema is a list of names of database relations (which may be nested relations) together with their types. We are particularly interested in the case of schemas consisting of at relations, that is, those of types frtg. Such a list of names of relations and their at types naturally corresponds to a two-sorted signature. Indeed, a relation of type t = fb 1 : : : b n g, with each b i being either b or Q, corresponds to R(n; J) where J = fi j b i = bg. We thus identify at schemas and two-sorted signatures. Also, for each relational symbol R(n; J) in a two-sorted signature, we write tp (R) for its type, that is, fb 1 : : : b n g where b i = b for i 2 J and b i = Q for i 62 J. Expressions of the language (over a xed schema ) are shown in Figure 1. We adopt the convention of omitting the explicit type superscripts in these expressions whenever they can be inferred from the context. The set of free variables of an expression e is dened by induction on the structure of e and we often write e(x 1 ; : : : ; x n ) to explicitly indicate that x 1,..., x n are free variables of e. 0, 1, R, and ; t have no free variables. The free variables of (e 1 ; : : : ; e n ) are those of e 1,..., e n. The free variables of if e then e 1 else e 2 are those of e, e 1, and e 2. The free variables of f(e), P (e), i;n e and feg are those of e. The free variables of = (e 1 ; e 2 ) and e 1 [ e 2 are those of e 1 and e 2. The free variable of x is the variable x itself. The free variables of S fe 1 j x 2 e 2 g, P fe 1 j x 2 e 2 g, and Aggr F fe 1 j x 2 e 2 g are the free variables of e 1, excluding x, and those of e 2. In these three

19 constructs, we require that x is not a free variable of e 2. Logics with Aggregate Operators 19 Semantics. For each xed schema and an expression e(x 1 ; : : : ; x n ), the value of e(x 1 ; : : : ; x n ) is dened by induction on the structure of e and with respect to a database (nite -structure) A and a substitution [x 1 := a 1 ; : : : ; x n := a n ] that assigns to each variable x i a value a i of the appropriate type. We write e[x 1 := a 1 ; : : : ; x n := a n ](A) to denote this value; if the context is understood, we often shorten this to e[x 1 := a 1 ; : : : ; x n := a n ] or even just e. The values of 0 and 1 are 0; 1 2 Q. For reason of economy, we use them to code Booleans, letting 0 code \true" and 1 code \false" (any other pair of rationals can be used for that purpose). The value of f(e) is the rational number obtained by applying the function f 2 to the value of e. The value of P (e) is 0 if the predicate in denoted by P holds on the tuple denoted by e; otherwise, it is 1. The value of R is the corresponding relation in A. The value of if e then e 1 else e 2 is that of e 1 if the value of e is 0; otherwise, it is that of e 2. The value of (e 1 ; : : : ; e n ) is the n-ary tuple having the values of e 1,..., e n at positions 1,..., n respectively. The value of i;n e is the value at the i-th position of the n-ary tuple denoted by e. The value of = (e 1 ; e 2 ) is 0 if e 1 and e 2 have the same value; otherwise, it is 1. The value of the variable x is the corresponding a assigned to x in the given substitution. The value of feg is the singleton set containing the value of e. The value of e 1 [ e 2 is the union of the two sets denoted by e 1 and e 2. The value of ; is the empty set. To dene the semantics S S P of, and Aggr F, assume that the value of e 2 is the set fb 1 ; : : : ; b m g. Then the value of fe 1 j x 2 e 2 g is dened to be m[ The value of P fe 1 j x 2 e 2 g is i=1 mx i=1 e 1 [x 1 :=a 1 ; : : : ; x n :=a n ; x:=b i ](A): e 1 [x 1 :=a 1 ; : : : ; x n :=a n ; x:=b i ](A): Finally, the value of Aggr F fe 1 j x 2 e 2 g is f m (fjc 1 ; : : : ; c m jg), where f m is the mth function in F 2, and each c i is the value of e 1 [x 1 :=a 1 ; : : : ; x n :=a n ; x:=b i ](A), i = 1; : : : ; m. Language RL aggr. The at language RL aggr (; ) is obtained from NRL aggr (; ) by imposing the following type restrictions: each relation in is of type frtg; each expression is of type ft (at type); for each rule in Group 2, all t i s are replaced by Base;

20 20 L. Hella, L. Libkin, J. Nurmonen, L. Wong Group 1 0; 1 : Q R 2 R : tp (R) e : Q e 1 : t e 2 : t if e then e 1 else e 2 : t e : Q : : : Q (n times) f(e) : Q P (e) : Q for f : Q n! Q and P Q n from Group 2 e 1 : t 1 ; : : : e n : t n (e 1 ; : : :; e n ) : t 1 : : : t n i n e : t 1 : : : t n e 1 : t e 2 : t i;n e : t i = (e 1 ; e 2 ) : Q x t : t e : t feg : ftg e 1 : ft 1 g e 2 : ft S 2 g fe1 j x t2 2 e 2 g : ft 1 g Group 3 e 1 : ftg e 2 : ftg e 1 [ e 2 : ftg ; t : ftg e 1 : Q e P 2 : ftg fe1 j x t 2 e 2 g : Q F 2 e 1 : Q e 2 : ftg Aggr F fe 1 j x t 2 e 2 g : Q Fig. 1. Expressions of NRL aggr (; ) over signature for each rule in Group 3, all occurrences of t should be replaced by rt (record types). Thus, input databases for RL aggr expressions can be identied with nite -structures, when is a two-sorted signature. Furthermore, for a RL aggr (; ) expression e(x 1 ; : : : ; x n ), all free variables are of record types; thus, we shall write e[x 1 := ~a i ; : : : ; x n := ~a n ](A) for the value of this expression, where ~a i are tuples of the same type as x i, and A is a -structure. We shall now assume, for the rest of the paper, that in -structures, all relations are nite.

logic, cf. [1]. Another area of application is descriptive complexity. It turns out that familiar logics capture complexity classes over classes of (o

logic, cf. [1]. Another area of application is descriptive complexity. It turns out that familiar logics capture complexity classes over classes of (o Notions of locality and their logical characterizations over nite models Lauri Hella Department of Mathematics P.O. Box 4 (Yliopistonkatu 5) 00014 University of Helsinki, Finland Email: lauri.hella@helsinki.fi