Master Thesis Project

Engineering Content-Centric Future Internet Applications
Author: Alain Perkaz
Supervisor: Mauro Caporuscio
Examiner: Andreas Kerren
Reader: Narges Khakpour
Semester: VT 2018
Course Code: 5DV50E
Subject: Computer Science

Abstract

As the number and heterogeneity of devices connected to the Internet grow, the inherent complexity of the applications deployed on it also increases exponentially. Aside from the added complexity burden on developers, this growth also challenges the current architectures for distributed application development and deployment, raising issues such as bandwidth limitations, latency concerns regarding ubiquitous computing, and autonomous service management. To face those challenges, the umbrella term Future Internet is introduced. This thesis provides an overview of the main challenges to consider when developing content-centric Future Internet applications, namely: semantic service annotation, semantic service discovery, and opportunistic service grouping. These applications leverage the available services in a given environment to opportunistically provide content-tailored experiences. The thesis provides a background in the area of content-centric applications and Future Internet, in the form of a literature survey. The outcomes of this survey are leveraged to better understand the domain and to propose an architectural extension for enabling content-centric applications. The architectural extension is presented in a general manner and then grounded to the Future Internet development framework PRIME. The grounding is referred to as I-PRIME and focuses on enabling interest-based content sharing between content-centric application peers. This work aims to extend the knowledge about content-centric applications and semantically annotated services, provide an architectural extension to achieve that goal, and ground it to an existing framework. The outcome is valuable for content-centric system developers and researchers due to the breadth of the literature review, and for the current developers of PRIME because it provides a set of architectural guidelines to follow in order to implement content-centric applications.
Keywords: Future Internet, content-centric applications, semantic annotations, interest-defined communities, PRIME middleware

Preface

This thesis is the final work in the Linnaeus University Master's Software Technology program. It presents the results of the research on Engineering Content-Centric FI Applications. From my point of view, as a Computer Engineering graduate from Spain, the opportunity to conduct a substantial theoretical and systematic thesis research work has helped me to broaden my knowledge of several areas of Computer Science previously unknown to me. I would like to thank my thesis supervisor Mauro Caporuscio and PhD student Mirko D'Angelo for their contribution and guidance in the form of discussions and continuous feedback. I would also like to thank the thesis course examiner Narges Khakpour for her guidance and useful insights into the process to follow during the thesis. Lastly, I would like to thank my family, girlfriend, and close friends for their support during this period of studies.

Contents 1 Introduction Background and Motivations Cloud Computing Fog Computing Edge Computing XaaS Service Discovery Current Status of PRIME Motivating Scenario Social Library Problem Statement Contributions Target group Report Structure Method Scientific approach Method description Literature Review Qualitative Evaluation of I-PRIME Reliability and Validity Literature Review Semantic Annotation of Services RDF OWL-S WSML SAWSDL Service discovery Service matchmaking Service discovery architectures Interest-based dynamic grouping Interest definition Service grouping I-PRIME PRIME Architectural extension of PRIME (I-PRIME) Application-context Resource management Interest definition Groupings Evaluation Running example Application-context ontology Resources Resource groups 5.1.4 Users Interests Interest groups Extension evaluation Result discussion Conclusions and Future Work References A Appendix 1 - Ontologies A.1 Semantic Annotation of Services A.1.1 OWL-S Example A.1.2 WSDL Example A.2 Device ontology example A Appendix 2 - Motivating scenario's context ontology A.1 Application context ontology for Social Library

1 Introduction

The Internet as we know it today has evolved continuously since its creation, radically changing the means of communication and the ways in which commerce is operated globally. From the World Wide Web to two-way video calls, it has shifted the ways people communicate and societies function. The Internet was first conceived as a network that would enable communication between multiple trusted and known hosts, but it has evolved notably over time. Due to the significant adoption of Internet-connected devices (phones, personal computers, tablets...), the initial device homogeneity has shifted towards an extremely heterogeneous environment in which many different devices consume and publish resources, also referred to as services [1]. As the number of connected devices and resources increases, it becomes critical to build systems that enable the autonomic publication, consumption, and retrieval of those resources [2]. As the inherent complexity of systems continues to grow, it is essential to set boundaries to their achievable capabilities. The traditional approaches to network-based computing are not sufficient [3], and new reference approaches should be presented. In this context, the term Future Internet (FI) [4] emerges: a worldwide execution environment connecting large sets of heterogeneous and autonomic devices and resources. In such environments, systems leverage service annotations to fulfil emerging goals and dynamically organise resources based on interests. Although research has already been conducted on these topics, the following areas remain under active investigation: extensible machine-readable annotation of services, dynamic service discovery, architectural approaches for decentralised systems, and interest-focused dynamic service organisations. These concepts are explained in the next section, as they serve to contextualise the problem statement and research questions presented later.
1.1 Background and Motivations

As briefly mentioned before, the Internet has had a great impact on our society. Nowadays, with the rise of data-driven trends such as the Internet of Things (IoT) and Industry 4.0, that impact continues to grow. Among other benefits, those trends will make it possible to increase production capabilities [5], save energy [6], and provide better healthcare [7]. However, although the reach of the impact is clear, the limitations of current approaches must be solved first [3][8]. Along the data-expansion process [9], it is important to note that the Internet has evolved from Web 1.0 to Web 3.0 [10], embracing (semantic) data in applications and business processes. Given this process, it can be stated that the future relies on content-driven solutions. The popularisation of Cloud Computing services is an essential example of this, from Platform as a Service (PaaS) to the increasingly common Software as a Service (SaaS). To manage complexity and gain flexibility [11], developers focus on building content-centric and interaction-centric [12] systems by relying on resources provided by other systems. In the following subsections, a general introduction to multiple FI-related concepts and approaches is provided. The current status of PRIME, the framework upon which the extension is presented, is then introduced and detailed. This description provides a broader architectural view before the extension itself is introduced.

Future Internet refers to a global execution environment populated by a myriad of heterogeneous services (resources, devices, and systems). While the foundations of the Internet are challenged by the growth and expansion of multimedia applications and content [13], communication through it is evolving. From the original human-human range, it will expand to human-machine and machine-machine [14] (also referred to as M2M) communication. The Future Internet has been grounded in four main pillars [15]: Internet by and for the People, Internet of Consensus and Knowledge, Internet of Services, and Internet of Things. Those pillars share a common foundation, the Network Infrastructure. The goals and foundations of each pillar are displayed in Figure 1.1.

Figure 1.1: FI pillars and their foundation.

The first of those pillars is the Internet by and for the People, which shall connect the growing population over time, encouraging the free exchange of ideas. The existing conceptual barrier between information producers and consumers will fade out, and roles such as the prosumer (an entity that both consumes and produces) will emerge [15]. Clients of FI will easily share knowledge and content with each other (both producing and consuming it), dynamically creating virtual communities around that shared knowledge. Semantic annotation of services, knowledge exchange mechanisms, and machine reasoning over knowledge are crucial for this pillar to form. Furthermore, automatic aggregation of existing information and automatic reasoning are required to cope with the existing and future information available on the Web [15].

The second pillar, the Internet of Consensus and Knowledge, emerges from the need for intelligent knowledge sharing mechanisms. As the available information keeps growing, managing it manually becomes infeasible. Semantic annotation of resources addresses this by deriving machine- and human-understandable knowledge from information. Opportunistic run-time aggregation of services [16], based on the available services and the declaration of services of interest, also becomes possible [15].

The Internet of Services is the third pillar upon which FI is grounded. This umbrella term covers several interacting phenomena that will shape how services operate over the Internet, focused on Internet-scale service-oriented computing. It will radically change the way Internet applications are engineered, executed, and operated. New categories of applications, based on on-demand access to computing resources, data, and software capabilities, will emerge. This eases service orchestration and encourages prosumers to share their knowledge. Applications will be centred around the content they provide through self-servicing and opportunistic service mashup. Systems will be fully personalised to their users (preferences, interests, navigation patterns). With a focus on loose coupling, services will be discovered and invoked based on their capabilities. Cloud computing is a clear example of this, as it abstracts where the application is executed and how computing resources are managed, providing greater scalability and flexibility [15].

The fourth and last pillar is the Internet of Things (IoT). IoT implies that many everyday objects will be connected to the Internet and coordinated to achieve goals. IoT will face high levels of heterogeneity at the device level, and semantics will play a central role in managing that diverse environment. To address scalability needs, semantics-dependent protocols must be created [15].

As a closing note, to allow content-driven applications to operate in such environments, it is essential to allow opportunistic run-time aggregation of services [16], based on the available services and the declaration of services of interest.
Concerning service availability, heterogeneous services must be extensively annotated in machine-readable formats. Therefore, concerns such as service annotation, service discovery, resource sharing, and Edge Computing architectures must be analysed.

Cloud Computing

Cloud Computing (CC) is a paradigm that composes the Net Infrastructure layer of FI, alongside Fog Computing and Edge Computing. It is also a business model and a deployment model for applications. It enables access to a pool of shared computing assets (resources) [3], often over the Internet. It emerged as an alternative that reduces the upfront IT investment required of companies by providing easy scaling, fast provisioning, and minimal management of resources. Since Amazon Web Services (AWS) introduced EC2, access to high-capacity networks, low-cost computing components, and the widespread use of hardware virtualisation have led to the broad adoption of CC. Multiple service and deployment models are available, each best suited to different use cases. Cloud providers enable access to their resources in multiple forms [17], the most common ones being: On-Premise, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Backend as a Service (BaaS), and Serverless Computing. These are ordered by their level of abstraction, from the lowest (On-Premise) to the highest (Serverless Computing).

The lowest level of abstraction with regard to Cloud resources is provided by the On-Premise CC model, also referred to as Private Cloud. The Cloud deployment models are briefly described later; here it suffices to note that On-Premise CC enables fine-grained control of all the resources, from the networking, storage, and server layers up to the application level. The whole stack must be managed entirely by the Cloud user.

IaaS provides high-level API access to the low-level Cloud resources. The user of such a service can deploy and manage the whole stack that lies on top of the hypervisor, that is: the operating system, storage, deployed applications, and limited access to network components. The Cloud provider provisions the resources, but the Cloud user is responsible for patching and updating the Operating System (OS) and apps. Billing is typically conducted on a utility computing basis, that is, according to the amount of consumed and allocated resources [18]. Examples of this model are Amazon Web Services (AWS) EC2, Google Cloud Compute Engine, and Azure IaaS.

PaaS provides the next tier of abstraction above IaaS by offering the application development environment to developers. In contrast with IaaS, Cloud providers usually provision and keep up to date the OS, databases, and web servers. Within this type of service, most vendors allow vertical scaling of the resources used by the deployments. Users can focus on the development environment their application requires, without spending resources on maintenance and troubleshooting.

SaaS, which is also referred to as on-demand software [19], provides a very high level of abstraction. Cloud vendors are responsible for the maintenance of the infrastructure and platforms on which the apps rely. The end-user gains access to software functionality, on top of the aforementioned operational benefits of Cloud hosting.
App updates can be rolled out seamlessly to end-users, and applications can scale horizontally to meet demand and host multi-tenant solutions. SaaS is typically priced as a flat fee per user, billed monthly or yearly. Popular SaaS examples are Salesforce, Microsoft Office 365, and Google Apps.

Figure 1.2: Levels of abstraction of the main Cloud service models.

Alongside the previously mentioned service models (Figure 1.2), and with the expansion of the Web as a medium to serve apps, Cloud providers started offering BaaS [20]. The goal of such a service is to reduce the overhead and operational costs of developing, deploying, and maintaining backend services for Web and mobile apps. Features such as authentication, user-database management, asset storage, and transparent scalability are provided. Some examples of popular BaaS solutions are Firebase, Meteor, and Kumulos.

Finally, it is worth mentioning the Serverless service model, which has gained much traction lately. Serverless raises the abstraction level above the CC resources even further by providing a stateless environment where functions are executed in response to events (API calls, updates to a database, message queue events...). Serverless provides virtually unlimited horizontal scalability and eases the deployment and testing of the business logic, as the running environment is taken care of by the Cloud provider. The pricing model is another advantage. Billing is done per function call, allowing extremely fine-grained control over payments and reducing the system cost to zero while the function is not reacting to an event. Examples of this model are AWS Lambda functions, Google Cloud Functions, and Azure Functions.

Regarding ownership and deployment models, Cloud Computing infrastructures can be separated into four main groups: Public Clouds, Private Clouds, Community Clouds, and Hybrid Clouds.

Public Clouds are Cloud infrastructures accessible through open-access network interfaces (such as Internet browsers). Architecture-wise, they do not differ fundamentally from Private Clouds, although security may be approached differently, as Public Cloud vendors offer their services to a public audience over a non-trusted network. To mitigate this, Cloud providers also offer direct-connect options to their privately owned, Internet-like networks, avoiding the need for VPN connections.

Private Clouds are Cloud infrastructures operated only by private organisations.
They run on an organisation's servers, very much like an Intranet [21]. Such an ownership model provides multiple benefits, mainly fine-grained control over deployments, updates, and security. That said, owning such infrastructure requires a high initial investment and pushes organisations to continually re-evaluate their decisions regarding the resources, as security issues must be assessed and countered before they turn into vulnerabilities. Also, the main advantages provided by the economic model of CC (hands-off management, reduced physical footprint, and virtually unlimited scalability) are lost, which has raised criticism of this deployment model [21].

Between the Public and Private Cloud concepts stands the Community Cloud. Acting as a conceptual extension of Private Clouds, in this deployment model multiple organisations with common concerns share the ownership of a Cloud. Only the members of those organisations can access it, and the costs are shared across them. This shared-ownership model provides better economic results than Private Clouds, but worse than Public Clouds.

Hybrid Clouds are the composition of two or more Cloud deployment models (Private, Public, or Community). Such a setup combines the benefits of those models without losing the advantages each one offers separately. To achieve that, the services of the distinct models are combined and composed, blurring the boundaries between the models. Multiple use cases for this type of Cloud model occur, mainly in the enterprise context. A company may need to separate its sensitive data (user roles, personal information, payroll data...) from the operational business logic. The sensitive data can then be stored on a Private Cloud set up in the company's data centre, and the non-sensitive business logic deployed into the Public Cloud, for example as AWS Serverless functions.

Fog Computing

Having laid the conceptual foundations of what CC is and clarified which deployment models it provides, it is crucial to describe other approaches to decentralised computing that make up the Net Infrastructure mentioned above. Fog Computing (also referred to as fogging) is an architectural approach in which a substantial part of the work (computation, storage, communication, and management) is offloaded from the traditional Cloud. By extending the previously presented CC model to the edge of the network and near the end users, a new horizon of opportunities opens for applications and systems. The work mentioned above is offloaded onto edge devices, often referred to as Fog Nodes (client devices or near-client devices), alleviating the CC resource requirements by conducting work near the clients. This approach has been shown to increase efficiency when dealing with IoT architectures [22], as significant amounts of data tend to be generated at the edge of the system, which incurs high bandwidth costs. When Fog Computing is applied to such a scenario by aggregating data before transport, the bandwidth and storage costs of the CC infrastructure are significantly reduced.

Another reason to consider such an architectural approach in the scope of IoT is mobility. In a centralised CC architecture, service mobility is restricted by the latency towards the Cloud providers. When elements often change geographical location, effective node mobility is limited unless the Cloud infrastructure is geographically distributed (and even then, the client-server latency can exceed the requirements). By adding strategically placed near-client nodes, Fog Computing devices can potentially move (e.g. smart city, smart car) without suffering unexpected latency increases (full device mobility is not yet supported). Those extra nodes can also extend the system by adding direct peer-to-peer connections (with lower latency than a centralised CC architecture) and data aggregation capabilities. In Figure 1.3, a high-level smart city reference architecture is exemplified, following both the CC and the Fog Computing approach.

Figure 1.3: Smart City, CC on the left, Fog Computing approach on the right.

Having presented some of the advantages, it is equally important to highlight the challenges that this paradigm needs to face to succeed. The choice of virtualisation approach, the previously mentioned issues with latency, security, and network management constitute a relevant sample [23] of those challenges.

The chosen virtualisation approach, whether container-based or hypervisor-based, will significantly impact the performance of the Fog Nodes. Containers provide lightweight OS-level virtualisation, which eases the provisioning and deployment of solutions and at the same time offers fine-grained security and portability. However, if a container-based solution is selected, OS-level flexibility is lost: a node will not be able to run multiple operating systems. Due to that restriction, hypervisor-based virtualisation is encouraged.

Security represents one of the most significant challenges for Fog Computing, as it has to be taken into account at every layer of the design stack. In contrast with CC, Fog Nodes face new threat vectors that do not exist on an adequately managed Cloud infrastructure [24]. As an example, Fog Nodes are susceptible to man-in-the-middle attacks, privacy issues, and security issues due to authentication at different gateway levels. For mitigation, public key infrastructure (PKI) and homomorphic data encryption are recommended.

Lastly, network management must be addressed by applying software management techniques to networks (e.g. network function virtualisation or software-defined networking). Nonetheless, the integration of such methods is not an easy task and still constitutes an open research area.

Edge Computing

Edge Computing (also referred to as Pure Edge Computing) [25] can be understood as a natural evolution of the approach proposed by Cloud Computing and Fog Computing (a further iteration of the FI Net Infrastructure models), as edge devices become more and more capable. In the context of IoT and FI, Edge refers to the devices that are located close to the sources of data. Those devices can be very heterogeneous, such as SCADA controllers, wind turbines, data aggregation nodes, or temperature sensors, while sharing the characteristic that they are usually located far from the centralised computing nodes (namely the Cloud). The Edge Computing paradigm is based on and composed of multiple technologies and approaches, namely: peer-to-peer networks (as edge devices can connect directly), overlay networks, the everything as a service (XaaS) concept, Grid Computing, and the aforementioned Fog Computing. In comparison with Fog Computing, which performs partial Cloud offloading by aggregation and precomputation, Edge Computing pushes the computing boundaries even further from the central Cloud by offloading onto the Edge devices. This lowers the architectural importance of a central computing core (Cloud providers, in the case of CC applications). It also reduces technological vendor lock-in for Cloud solutions, as the central nodes are much less critical than in CC and Fog Computing.
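The partial offloading by aggregation attributed to Fog Computing above can be illustrated with a minimal sketch. It assumes a fog node that collects raw sensor readings over a time window and forwards only one summary record per sensor to the Cloud. The names `Summary`, `aggregate`, and the sensor identifiers are hypothetical and chosen for illustration only; they are not taken from PRIME or from any specific Fog platform.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """One aggregated record per sensor, sent upstream instead of raw readings."""
    sensor_id: str
    count: int
    minimum: float
    maximum: float
    average: float


def aggregate(readings: dict[str, list[float]]) -> list[Summary]:
    """Collapse each sensor's raw readings into a single summary record."""
    return [
        Summary(sensor_id, len(values), min(values), max(values), mean(values))
        for sensor_id, values in sorted(readings.items())
        if values  # skip sensors that reported nothing in this window
    ]


# Hypothetical readings collected by a fog node during one time window.
window = {
    "temp-1": [21.0, 21.5, 22.0, 21.5],
    "temp-2": [19.0, 19.0],
}
summaries = aggregate(window)
raw_count = sum(len(values) for values in window.values())
print(f"{raw_count} raw readings -> {len(summaries)} records sent to the Cloud")
```

In this toy window, six raw readings are reduced to two upstream records. The saving grows with the sampling rate, at the price of losing the individual samples in the Cloud.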
In Figure 1.4, a high-level Edge Computing example is presented. Elements from Smart Factory, Smart City, and Smart Car interact with each other, each group having its own Edge Gateway and sharing a peer-to-peer communication integration.

Figure 1.4: Edge Computing architecture.

The advantages over more traditional CC solutions are evident, but it is worth mentioning that, until recent years, access to Edge Computing solutions was restricted to big companies. Due to the continuous drop in the cost of sensors and computing power, the growing amounts of machine and environmental data, and the rise of machine learning and data analytics, Edge Computing has been increasingly adopted. The peer-to-peer (vastly decentralised) architectural approach fits well in contexts such as FI content-centric applications. Enabling the services (edge nodes) to connect directly with minimal central dependencies reduces peer-to-peer latency and, by gathering meaningful data, makes it possible to generate contextualised service groups. This becomes extremely useful once the application use case shifts towards near-real-time data streaming, for which centralised CC solutions will eventually present bandwidth and latency limitations.

Peer to Peer Networks

Peer to Peer networks [26], also known by the acronym P2P, are networks composed of a set of computing nodes directly connected to the Internet. They do not rely on a centralised server for routing traffic or communication, which provides substantial scalability benefits. In FI environments where applications are entirely decentralised, this approach proposes direct communication between the distributed nodes. Peers in a P2P network have equal rights and can produce and consume data at the same time (also referred to as prosumers). Even when sharing computing resources, nodes can be heterogeneous devices with widely ranging capabilities, as long as they all access the P2P network over a unified API.

A great example of P2P at scale is Napster. Created in 1999 by Shawn Fanning, it enabled clients to share their music files and access those of their network peers. That said, P2P networks can be applied to a range of use cases, such as instant communication, secure browsing, P2P payments, and more recent ones such as blockchain.

To manage P2P networks over heterogeneous communication technologies and routing configurations, overlay networks are leveraged (FI's common foundation, the network infrastructure). Overlay networks provide an abstracted view of the physical network topology, in which the network nodes are connected by logical links rather than by several physical links. They enable large-scale data sharing, content distribution, efficient searching, and selection of nearby peers, among other features [27]. In the context of FI and highly distributed systems, they provide an abstraction layer over the underlying, ever-changing network topologies. Emergent P2P overlay networks can be divided into two main groups: structured and unstructured.

Structured overlay networks rely on Distributed Hash Tables (DHT) to keep track of the available resources (or data objects). A DHT maps every resource to one unique key, and those keys are distributed deterministically over the overlay network. Scalable storage and retrieval are backed by key/value pairs, and when a retrieval or modification operation is triggered, the request is routed to the peer that holds the appropriate key. Due to the deterministic distribution and key/value mapping, any file can be located in O(log n) overlay hops. However, due to the potentially high peer-to-peer lookup latency (linked to the structural differences between the overlay and the P2P network distribution), performance can be negatively affected. Structured networks also incur a higher computation overhead than unstructured networks for popular resources. Chord [28] is an example of a structured overlay network.
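The deterministic key-to-peer mapping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Chord itself: peers and resource keys are hashed onto the same identifier ring, and each key is assigned to the first peer whose identifier follows it. The sketch resolves the responsible peer from a global, sorted view of the ring, so it does not reproduce the hop-by-hop routing that yields the O(log n) bound. The names `Ring`, `ring_id`, and `ID_BITS` are hypothetical.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # deliberately small identifier space, for readability only


def ring_id(name: str) -> int:
    """Hash a peer name or resource key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


class Ring:
    """Toy DHT: every key is stored on the peer that succeeds it on the ring."""

    def __init__(self, peers):
        # Sorted (identifier, peer) pairs stand in for the overlay's routing state.
        self._nodes = sorted((ring_id(peer), peer) for peer in peers)
        self._ids = [node_id for node_id, _ in self._nodes]
        self.store = {peer: {} for peer in peers}

    def responsible_peer(self, key):
        """First peer whose identifier is >= the key's, wrapping around the ring."""
        index = bisect_left(self._ids, ring_id(key)) % len(self._nodes)
        return self._nodes[index][1]

    def put(self, key, value):
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_peer(key)].get(key)


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
ring.put("song.mp3", "location of the file")
print(ring.responsible_peer("song.mp3"), ring.get("song.mp3"))
```

Because the mapping depends only on the hash values, any peer can compute which node is responsible for a key without flooding the network, which is the property the structured approach trades against its maintenance overhead.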
Unstructured overlay networks are composed of peers that hold no prior knowledge about the network or the distribution of neighbouring peers. Queries are distributed by flooding mechanisms, where a peer responds to such a query by pointing out which of its content matches the request. While this technique is suitable for retrieving popular content within the peer network, it is not well suited to finding rare content. It incurs lower overhead than structured networks for popular content, but the load on each peer scales linearly with the system size and the number of queries. When the system grows significantly and queries accumulate, peers become overwhelmed, and the system can saturate. Gnutella [29] is an example of an unstructured overlay network, which provides distributed file search and download capabilities among peers.

XaaS

In the FI environment, where everything (every resource: business service, content, or peer) can be considered a service, the term XaaS emerges. It is the outcome of the combination of the previously presented four FI pillars: Internet of People, Internet of Knowledge, Internet of Services, and Internet of Things. By wrapping resources in computational abstractions, on-demand interactions are enabled between users and resources, moving the focus away from infrastructure and operations [30]. IoT leverages a myriad of distributed heterogeneous devices abstracted as services for composing applications and bringing value to end-users. This is crucial if the full potential of FI is to be exploited. Data is gradually completed with information (metadata, tags, annotations), and means to deliver that knowledge efficiently are required. The existing services are ubiquitous, and the aforementioned raw knowledge alone is not enough; reasoning over it will bring many benefits. Web services have historically been the key technology for the delivery of services. The main challenges that this paradigm faces align with those of FI, namely: service composition, service design, crowdsourcing-based reputation, and IoT. Establishing seamless discovery and composition over services of different natures is critical, mashing up resources to provide meaningful services to end users, aligned with their preferences and interests. Information completion over services and service discovery are covered in more depth in subsequent sections of the report, along with service annotation and service mash-up (grouping) management.

Service Discovery

When engineering distributed applications in contexts such as FI, and especially when paired with paradigms such as XaaS, it is crucial for resources to be accessible and discoverable automatically. In some aspects, service discovery works similarly to the Domain Name System (DNS), although DNS servers and protocols are tailored more to the routing of static resources than to service discovery in the FI context. As mentioned in previous sections, FI presents a heterogeneous execution environment that enables the emergent composition of applications by opportunistic aggregation of resources [49]. Services must first be properly annotated and their capabilities contextualised. Contextualisation refers to machine-ready data understanding and to resource correlation, which minimises resource management overhead.

Apart from the FI use case, service discovery protocols have been broadly developed by multiple vendors and applied to numerous fields (e.g. [50]): Universal Plug and Play (UPnP, from Microsoft), Bluetooth Service Discovery Protocol (SDP), Kubernetes Service Discovery, and Zigbee Device Discovery. As different service discovery protocols apply different technological approaches and terminologies, a common conceptual classification of components must first be established. In further sections of the report, the features and advantages of multiple service discovery approaches in pervasive environments are analysed using that standard conceptual classification. Figure 1.5 provides a graphical representation of those principal components. The protocol components are briefly described below from the bottom up (from low to high levels of abstraction in the stack).

First and foremost, the services and their attributes must be named. Clients will use that name to conduct further service search operations. Such naming (also referred to as annotation) cannot be undertaken manually at scale, as the manual annotation overhead becomes too large. Template-based approaches are followed to solve this issue, defining an initial format for service names and attributes. Predefined names (Universally Unique Identifier, Bluetooth SDP) can also be used. However, those methods scale poorly when dealing with a genuinely heterogeneous and pervasive resource pool such as in IoT.

Figure 1.5: Service Discovery protocol components.

After the services have been named, an initial communication is to be established. Based on the network and system configurations, different approaches can be taken, namely: unicast, multicast and broadcast. Unicast is the most resource-efficient, although it requires manual configuration (network address registration) of all the peer nodes. Multicast allows the unicast addresses of the available services to be identified dynamically after a few messages. Broadcast has the same advantages as multicast but is often restricted to one network hop.

Once the network protocol has been set, the discovery and registration process takes place. The most common approaches to discovering and registering information are query-based and announcement-based. In the query-based method, a query resolver (often a designated system agent) processes the query request and responds right away. It takes care of the computing work of keeping track of the different service lifecycles, offloading it from the clients. In announcement-based approaches, interested clients listen to broadcasts of discovery and registration information, which notify them when services join, leave, or undergo other relevant changes.

With regards to the discovery infrastructure, two approaches are taken: with dedicated discovery components (directory-based) or without (non-directory-based). The main difference between the two is that dedicated components act as query resolvers by keeping track of the available services and updating that record. When no dedicated components are used, each service becomes its own query resolver and has to respond if a request matches it. The status of those services is often kept as soft state: after a given period, a service needs to refresh its state, or it will become invalid. Maintaining the status as hard state is also a possibility, but as services are not automatically removed from registries after a certain threshold, clients have to query the resolvers more often to validate that their records of available services are up to date.
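The soft-state behaviour described above can be illustrated with a minimal sketch. It is not taken from any of the cited protocols: the class name `SoftStateRegistry`, the `ttl` parameter and the service names are assumptions made for illustration, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateRegistry:
    """Illustrative directory-style resolver that forgets services not refreshed in time."""

    ttl: float  # assumed validity period (seconds) of a registration without refresh
    _expiry: dict = field(default_factory=dict)

    def announce(self, service: str, now: float) -> None:
        # In a soft-state scheme, registering and refreshing are the same operation.
        self._expiry[service] = now + self.ttl

    def lookup(self, now: float) -> list:
        # Drop expired entries lazily, then return the services still considered valid.
        self._expiry = {s: t for s, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


registry = SoftStateRegistry(ttl=30.0)
registry.announce("printer", now=0.0)
registry.announce("scanner", now=0.0)
alive_early = registry.lookup(now=10.0)   # both registrations are still valid

registry.announce("printer", now=20.0)    # only the printer refreshes its state
alive_late = registry.lookup(now=40.0)    # the scanner entry has expired
print(alive_early, alive_late)
```

In a hard-state variant, `lookup` would never drop entries, so a client would have to re-validate each record itself, which is the extra querying cost mentioned above.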

As clients discover and consume services through the resolvers, and as the systems where service discovery operates grow, it is good practice to add scoped discovery. Network topologies, user roles, context information and combinations of these are often employed for that purpose. On top of the scope, service selection options are added. Selection can be either manual (prompting the user) or automated (weighted preferences), potentially allowing for advanced automated matchmaking.

After the scoping and matchmaking approaches are selected, service invocation and usage are defined. Invocation can be provided at different abstraction levels: network address (lowest level, clients must choose the interaction protocol), communication protocol (RPC, XML, HTTP...) and application operations (highest abstraction level, specific to a custom application domain). With regards to service usage, explicit and lease-based options exist. In explicit usage scenarios, the client must actively notify the service once the used resource can be freed again. In lease-based scenarios, a service usage period is negotiated between client and service, and that timeframe can be cancelled or explicitly extended through further negotiation.

Finally, how the service updates its status should be defined. Polling and event-based notification (clients subscribe to updates) are the most common approaches. The choices made at each level of the stack have a direct impact on the choice of technologies and on the suitability of architectures for the system context in mind. As often occurs in software engineering, there is no silver bullet for service discovery that solves all discovery use cases.
In subsequent sections of the report, the different approaches to service discovery in FI scenarios, especially with semantically annotated services, will be examined, and a reference architecture proposed.

Current Status of PRIME

As mentioned before, PRIME [31] has been chosen to contextualise the architectural extension over an existing framework, so that the extension concepts can be applied in other systems. The architectural extension itself is detailed later in the report. The presentation of the current status consists of a review of the general features and a high-level architectural view. After assessing the status, we introduce a conceptual base upon which to propose the architectural extension.

The PRIME middleware, also referred to as the PRIME approach, defines an architectural style based on modelling and programming abstractions to uniformly represent resources and develop FI applications with opportunistic resource aggregation. To cope with the challenges that the FI environment presents, PRIME enables resource abstraction and resource contextualization. We will briefly describe them from a high-level perspective.

Through resource abstraction, PRIME supports the interaction between different types of resources. To provide that uniform abstraction layer, it uses P-REST (Pervasive REST). The nodes that make up PRIME architectures are considered prosumers, both resource consumers and producers. A resource (referred to as a service in the FI context) is anything exposed through the network, be it data or a business process. In contrast with REST, P-REST provides asynchronous message passing capabilities, a distributed DNS, a lookup service and an Observer-pattern-based coordination model. Within P-REST, PRIME resources are identified using an abstract and a concrete URI. The abstract URI (auri) is directly mapped to the application-context ontology and defines the resource type. The concrete URI (curi) represents an instance of the resource type; it is unique per resource instance and is used to access resources unambiguously within a deployed PRIME system.

Once the mechanisms for resource exchange have been set, resource contextualization is tackled. In PRIME, both the application-context ontology and the resource semantics are defined with RDF: subject, predicate and object triples.

At a high level, the architecture of PRIME can be divided into four main sections: the low-level wireless communications, the communication layer, the API programming layer and the user application. The architecture is detailed in the following image (Figure 1.6), and each layer is briefly explained below.

Figure 1.6: High-level view of the current PRIME architecture.

The wireless communication layer sits at the lowest abstraction level and sets the base for the PRIME middleware. It represents the communication-level heterogeneity present in FI environments.

The communication layer encapsulates the heterogeneity of FI environments by providing a distributed asynchronous messaging mechanism. The middleware of choice for that purpose is RabbitMQ, the most popular open-source message broker at the time of writing. Each prosumer instantiates a RabbitMQ broker at runtime, joining an application-wide federation (a cluster-less and loosely coupled deployment mode of nodes). Depending on the resource types that each prosumer provides, multiple messaging channels are generated (one per resource type), and both resource producers and consumers are subscribed to them. Those channels enable point-to-point and point-to-multipoint messaging between prosumers.
It is important to mention that the communications between PRIME resources are conducted over HTTP, following the REST principles. How the RDF matchmaking is accomplished is covered below, as part of the API programming layer.

The API programming layer uses the capabilities provided by the communication layer to enable the aforementioned Observer pattern, where a subject (a prosumer) keeps a list of subscribed observers (also referred to as dependants, and also prosumers) that are notified every time the state of the subject is modified. Prosumers can directly access a set of resources by their abstract URIs and particular resources by their concrete URIs. Those URIs are retrieved using the lookup service, which is fully distributed and based on RDF inference. The inference method used is forward chaining (data-driven), which is better suited than backward chaining to dynamic situations that are likely to change, as new data can trigger new inferences. When nodes enter the system, they are added to a specific messaging channel inside RabbitMQ (the "broadcast" channel), which enables the spread of the lookup query.

On the upper section of the API programming layer lies the PRIME application. A PRIME application is a logical construct built on top of the middleware, where PRIME resources are declared, and which also serves as the entry point. The PRIME resources are the capabilities and services that the application exposes and that other applications will access (e.g. business logic or assets). Those resources are annotated based on the application-context knowledge, stored in an RDF ontology replicated among all the instances.

The end-user application and the application-specific business logic are not bound to the previously mentioned architectural layers, but access their capabilities through a thin interface layer provided by the API programming layer. By doing so, the application-specific code is loosely coupled to the underlying PRIME middleware.

To conclude the explanation of the current status of PRIME, we will briefly describe the PRIME application initialisation flow. This will help relate the different layers of the architecture to the actual system behaviour and expand the context in which the architectural extension is formulated. The initialisation of a PRIME application goes as follows:

1. Instantiate the PRIME app. Fire up the JVM and load the dependencies.
2. Load the application-context ontology and the resource descriptions (ontology descriptions) for each of the resources of the application.
3. Parse every loaded resource description and extract the relevant information, such as auri, curi, QoS and contextual information.
4. Set up the lookup resolver by resource description, with OpenRDF Sesame. It will resolve the SPARQL queries propagated through the system.
5. Instantiate the asynchronous message broker RabbitMQ in federated mode.
6. Subscribe the application to the system-wide broadcast communication channel "broadcast".
7. For each resource type that the application exposes (represented by an auri and mapped to a class inside the application-specific ontology), subscribe the PRIME application to the specific channel. Each channel is identified by the auri defined in the design-time application-context ontology. If the channel does not exist, it is created.
8. Once the resources are subscribed and the RabbitMQ channels created, the application is ready to operate.

It is important to note that the creation and binding of communication channels are conducted at runtime, as the application encourages dynamicity. As soon as an application leaves the system, it is removed from the groups it belongs to.
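The channel-related steps of the flow above (6 to 8), together with the removal of an application when it leaves, can be sketched as follows. This is only an illustrative in-memory model: `InMemoryBroker` and `PrimeApp` are hypothetical names, the `urn:example:` auris are invented, and none of this reflects the actual PRIME or RabbitMQ APIs.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for the federated message broker (assumption, not RabbitMQ's API)."""

    def __init__(self):
        # A channel is created on first use, mirroring "if the channel does not exist, it is created".
        self.channels = defaultdict(list)

    def subscribe(self, channel, app):
        if app not in self.channels[channel]:
            self.channels[channel].append(app)

    def unsubscribe_all(self, app):
        # An application that leaves is removed from every group it belongs to.
        for members in self.channels.values():
            if app in members:
                members.remove(app)

    def publish(self, channel, message):
        # Point-to-multipoint delivery: every current subscriber of the channel is notified.
        for app in list(self.channels[channel]):
            app.inbox.append((channel, message))


class PrimeApp:
    """Hypothetical application holding the auris parsed from its resource descriptions."""

    def __init__(self, name, auris):
        self.name = name
        self.auris = auris
        self.inbox = []

    def start(self, broker):
        broker.subscribe("broadcast", self)  # step 6: system-wide channel
        for auri in self.auris:              # step 7: one channel per resource type
            broker.subscribe(auri, self)

    def leave(self, broker):
        broker.unsubscribe_all(self)


broker = InMemoryBroker()
alice = PrimeApp("alice", ["urn:example:Book"])
bob = PrimeApp("bob", ["urn:example:Book", "urn:example:Movie"])
alice.start(broker)
bob.start(broker)

broker.publish("urn:example:Book", "new book available")    # reaches alice and bob
broker.publish("urn:example:Movie", "new movie available")  # reaches bob only
bob.leave(broker)
broker.publish("broadcast", "lookup query")                 # reaches alice only
```

Keying the channels by auri is what lets subscribers be addressed by resource type rather than by node address in this sketch.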

1.2 Motivating Scenario

After presenting the primary motivations for researching in the area of FI, we introduce a motivating scenario. The scenario serves to exemplify a concrete use case among the myriad of applications conceivable within the FI paradigm. It also serves to draw an evaluation frame upon which to assess the state of PRIME and the quality of the architectural extension that we will introduce, I-PRIME. The proposed motivating scenario covers the main challenges present in content-centric FI applications, briefly mentioned before. That said, the motivating scenario does not aim to provide the ultimate use case covering all possible content-centric applications in FI; rather, it defines a specific scenario that covers the main challenges. The research questions that we introduce in the following "Problem Statement" section aim to provide answers to those cross-cutting concerns that FI applications have to face.

Due to the broad set of potential Future Internet use cases, providing a concrete motivating scenario from which to extract functional requirements and engineer a solution allows us to narrow down from the conceptual to a more technical level of detail. That said, it is important to highlight that we will design the architecture with a strong focus on extensibility, to ease its adaptation to other FI scenarios, and on scalability. Both aspects will be covered in the "Evaluation" section.

Social Library

This section introduces Social Library, an example of a content-centric Future Internet application. Social Library is an open-source, open-access and distributed platform for sharing copyright-free media. It aims to democratise access to learning and public access to content. Instead of keeping a centralised architecture with a server storing all the content, it is built around the idea of collaboration. Users of the system share their copyright-free content through the app and gain access to the content provided by others. It is important to note that access does not require sharing content, even if sharing is recommended for the optimal functioning of the system.

The whole system is constructed around the ideas of content-centricity (applications driven by their contents) and total distribution. As previously mentioned, the system will not store the resources in a centralised manner but will leverage the computing resources available in the client nodes (Social Library nodes). The resources (e.g. books, audio files and movies) must be annotated so that they can be retrieved, shared and managed autonomously.

As the system grows with more clients and presumably more heterogeneous resources, the previously mentioned annotations will be leveraged to allow users to subscribe to selected resources from peers. Ideally, users should first try to access a replicated resource from their near neighbours, as latency is likely to increase otherwise. The annotations will also enable the grouping of users sharing the same interests (e.g. biology videos or horror novels), so that additions and modifications of resources in such a category trigger efficient broadcasts through the group.

To maximise scalability, users shall be capable of belonging to more than one group at the same time, with autonomous life-cycles. By grouping users based on their interests (either by the provided content or by the consumed content), dynamic scalability is achieved, as lower-level interest groupings can potentially be governed by higher-level geographical-region groupings if the size of the system requires it.
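As a rough illustration of the interest-based grouping described above, the following sketch groups users by declared interest tags and computes who should be notified when a resource in a category changes. The function names, users and tags are hypothetical; the actual annotation and grouping mechanisms are the subject of later sections.

```python
from collections import defaultdict


def group_by_interest(interests):
    """Map each interest tag to the set of users who declared it."""
    groups = defaultdict(set)
    for user, tags in interests.items():
        for tag in tags:
            groups[tag].add(user)
    return dict(groups)


def notify_group(groups, tag, publisher):
    """Return the users to notify when `publisher` adds or modifies a resource tagged `tag`."""
    return sorted(groups.get(tag, set()) - {publisher})


# Hypothetical users and interest tags; a user may belong to several groups at once.
interests = {
    "ana": {"biology-videos", "novels"},
    "ben": {"biology-videos"},
    "cho": {"novels"},
}
groups = group_by_interest(interests)
recipients = notify_group(groups, "biology-videos", publisher="ana")
print(recipients)
```

Only the members of the affected group are contacted, which is the source of the "efficient broadcasts" mentioned in the scenario.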

The functional requirements for Social Library are as follows:

R1: Users of Social Library may access the system through personal computers, smartphones or tablets.
R2: Users may join/leave the system dynamically.
R3: System users are autonomous, and the running system-instance has no prior knowledge about them.
R4: The types of resources shared through the system vary over time.
R5: All the users connected to the system must be discoverable and reachable by other system users.
R6: Users can define their interests towards resources.
R7: Users are aggregated into emerging communities based on common interests.
R8: Users within the same community opportunistically interact with each other, exchanging group-relevant information.

After defining the set of functional requirements that Social Library must cover, we have identified some cross-cutting concerns relevant to FI application scenarios and to this scenario in particular. Those concerns are directly linked to content-centric application challenges and lay a foundation for the formulation of the research questions, as we will try to address those concerns through knowledge acquisition and the formulation of a design-level architectural extension (covered in detail in the "I-PRIME" section). The cross-cutting concerns are presented as informal open questions, upon which we will construct the research questions. The architectural extension proposed later will cover them to enable interest-based content sharing.

As the number of services available in applications grows, is it possible to categorise the services and attach knowledge to them automatically? If so, is there any de facto standard approach? To what degree does each service support extensibility (both manual and automatic)?

How can high device mobility be dealt with, especially with regards to discovery? Is it possible to enable the discovery of unforeseen resources at runtime in distributed environments? If so, what are the main approaches to follow?

Aside from application service annotation and management, is it possible to represent the interests of the application users in a machine-readable format? If so, by combining it with the service annotation, is it possible to create communities of users at runtime that enable opportunistic service collaboration in purely distributed computing scenarios?

1.3 Problem Statement

Starting from the premise that Future Internet applications rely heavily on data and that such data can become dynamically available through services, it is safe to assume that everything could potentially be a service (that provides, consumes or processes data) [31]. Several solutions [32] have been proposed to partially cope with the challenges that such an environment presents [33]. However, their coupling with a central Cloud Computing entity is still high, limiting how future-proof such solutions are (with regards to potential bottlenecks and dependence on the private Cloud). By complementing the features of such systems with event-based P2P communication [31], Future Internet applications can be deployed with minimal dependency on private Cloud vendors and with reduced latency (by directly accessing neighbouring resources).

The main focus of this thesis project is not to develop yet another Edge Computing platform from scratch, but to contribute by addressing the importance of enabling content-centric Future Internet applications, providing a solution to annotate services dynamically and allow content-based service collaboration. The PRIME approach [31] will serve as a starting point, with its architecture widely extended to support the features mentioned above. The main reasons for choosing PRIME are the extensive documentation of its architecture and the fact that the framework is still under active development. That said, even if the proposed architectural extension is presented within the current PRIME architecture, the extension concepts are framework-agnostic and thus applicable to other frameworks and architectures. Lastly, the non-functional requirements mentioned in the motivating scenario (scalability and extensibility) will be considered at the extension level, not at the level of the underlying framework. We have named the extension I-PRIME, standing for Interest-enabled PRIME.
To frame and evaluate the impact of the architectural extension of PRIME, we will use the set of functional requirements extracted from the previously introduced "Motivating Scenario". Extracting the research questions from the requirements and context that the scenario presents also allows us to evaluate the validity of the architectural extension. That said, it is important to note that we will design the extension with the support of other motivating scenarios in mind. The results and the compliance of the extension with those requirements will constitute the later "Evaluation" section of the report.

In the following list, we present the research questions. We have elicited them from the active lines of research in FI, with the help of the Social Library motivating scenario. After the list, we attach a brief description to each research question, clarifying its intent and linking relevant resources of interest.

RQ1 How to provide an extensible semantic annotation of services?
RQ2 How to implement a semantic-aware service discovery in Pure Edge Computing environments?
RQ3 How to enable interest-focused service organisation through semantic-based dynamic groupings in PEC scenarios?

The first research question focuses on providing a state-of-the-art analysis of the different approaches to service annotation, focusing on extensibility and machine readability. One approach will be recommended from the pool of options, to be used later in RQ2 and RQ3.

The second research question builds on top of RQ1. Semantic-aware service discovery has already been developed [34][35][36], but a deeper analysis of the alternatives for conducting it in Pure Edge Computing environments is needed [25]. The question focuses on assessing the validity of different architectures for such discovery in Pure Edge Computing environments, covering topics such as distributed data messaging, annotation negotiation and service extensibility.

The third and last research question focuses on semantic-based dynamic grouping, that is, how to enable self-adapting groupings of application services based on their content. Self-adaptive systems are systems that evaluate and change themselves according to a set of metrics. The content of such services is heterogeneous in nature; thus, annotations will need to be merged and modified at runtime. However, this research question focuses on the feature of dynamic grouping. Rule-based groupings have been implemented previously [37], but the generation of groups based on complex interest rules still needs to be addressed. By interests, we refer to the feeling of attention towards something, and by grouping, we mean the opportunistic runtime aggregation of services. The outcome of this research question partially relies on the outcome of RQ1, as the grouping system may differ depending on the annotation mechanism. Complex networks within PEC could provide mechanisms to permit interest-based self-managing communities [38].

1.4 Contributions

The contributions of this project are centred around providing a deeper understanding of the multiple aspects involved in the design and architectural approaches necessary to build content-centric FI applications. Firstly, we will conduct an extensive state-of-the-art review of the main topics.
Secondly, a design-level reference architecture will be presented, covering the areas of the research questions and considering some cross-cutting concerns. Finally, by leveraging the motivating scenario requirements, the reference architecture will be evaluated. In the process, the reference architecture PRIME [31] will be analysed to identify areas of improvement for covering the previously mentioned concerns. By doing so, we will extend that architecture and guide its further development, if content-centric FI apps are to be supported.

To cope with the broadness of the topic and the multiple scenarios where the reference architecture could be applied, we have proposed a specific motivating scenario. However, the set of requirements it provides will also be shared by many content-centric applications running in FI environments, providing a stable conceptual and architectural base to extend. This will ease further extension of the proposed approach and help to contextualise it by using a concrete scenario.

1.5 Target group

The primary target group of this work are researchers interested in expanding their knowledge in the area of content-centric FI applications, as it covers many of the main concerns and challenges shared by such apps. The proposed reference architecture will also provide a general starting point for developers interested in implementing such systems, as it narrows down from the conceptual to a more technical level of detail. It is also worth mentioning that the current researchers and developers involved with PRIME will benefit from the outcomes of this work. As previously mentioned, the PRIME framework is under active development, and adding support for interest-enabled content-centric FI applications is an active concern of its developers [38].

1.6 Report Structure

This thesis report analyses and reviews the existing challenges of content-centric FI applications and proposes a reference architecture extension for such applications over the PRIME framework. We explain the methodology followed in the second section ("Method"). The research questions are analysed and answered from a theoretical standpoint through a literature review, whose results are presented in the third section ("Literature review"). In the fourth section ("I-PRIME"), the current status of PRIME and the proposed architectural extension are detailed. Then, I-PRIME is evaluated against the requirements of Social Library, and the results are discussed in the "Evaluation" section. Finally, the conclusions and future work are outlined, covering all the thesis work conducted.

2 Method

In this chapter, we describe the scientific approach followed to answer the research questions. The method is described, along with an assessment of its reliability and validity.

2.1 Scientific approach

To correctly answer the previously introduced research questions, we have conducted an extensive literature review of the different characteristics of the Future Internet, the implications of content-centric apps and the different layers that compose such applications. Those layers comprise extensible and machine-readable service annotation mechanisms, service discovery in distributed computing environments, and resource grouping based on interest definition. We analysed multiple resources for each of those layers and acquired the relevant knowledge.

To compare the different approaches for each of those layers, namely the relevant frameworks and their main characteristics, we chose a qualitative research approach. To generate a meaningful reference architecture extension and provide a useful view of the areas of interest when building content-centric FI applications, we analysed the state of the art and extracted the relevant cross-domain information.

After proposing the architectural extension and contextualising it on PRIME to cover the areas of interest defined by the research questions, the validity of the extension will be assessed by leveraging the functional requirements of Social Library. This assessment will be qualitative. As mentioned in the introduction, the motivating scenario contains the main challenges in the topic area of the thesis, providing a solid base. As the goal is to determine the main issues and propose an extension, the aforementioned motivating scenario is leveraged.

2.2 Method description

As previously mentioned, the research conducted for this report has been mostly structured as a literature review. This method was chosen taking into account the broadness of the topic and the quantity of available material. Aside from the literature review, the compliance of I-PRIME will be qualitatively assessed against the set of functional requirements of Social Library. The two methods are described in detail in the following subsections. As a reference, we provide a high-level schema of the method workflow in the following image (Figure 2.7).

As described in the image, the thesis method workflow is as follows. First, the thesis domain was chosen with the help of the supervisor, and a motivating scenario was proposed from which to extract the set of functional requirements. After that, the problem statement was defined and the research questions formulated. The literature review was then conducted, covering the areas and issues presented by the research questions. Once the literature review reached the desired level of completion, the reference architecture extension over PRIME was drafted. Finally, the draft was assessed with regards to its fulfilment of the set of functional requirements presented by the motivating scenario. Even if the workflow is presented as a linear process, iteration has been present throughout, reshaping the requirements, research questions, literature review scope and the architectural extension itself.


Figure 2.7: High-level method workflow schema.

2.2.1 Literature Review

As previously mentioned, we acquired the relevant information to answer the presented research questions through the literature review. Due to the time constraints of the thesis and the broadness of the topic, we had to exclude the option of conducting a systematic literature review. Instead, a literature review has been conducted to gain broader knowledge, following a set of guidelines inspired by those for systematic literature reviews. We have chosen this method to provide a framework upon which to conduct the analysis required to answer the research questions and formulate the reference architecture extension over PRIME. The literature review provides a proper means for exploring the reach of each research question, contextualising the aforementioned PRIME extension and identifying the main methodologies.

In the following image (Figure 2.8), we introduce the primary building blocks of the literature review strategy in order. Those blocks are represented sequentially, although the process has been executed iteratively. As the review advanced, we refined the approach and the selection of reference papers.

Figure 2.8: Building blocks of the literature review strategy.

The first and foremost block of the strategy for defining the review protocol was to propose the research questions. Those have been introduced in the previous section, "Problem Statement".

Second, the selection criteria for the review were set. In the case of this thesis, the selection criteria have been constructed iteratively and involved choosing the primary sources for scientific publications (mainly Google Scholar, IEEE, Lnu OneSearch and ScienceDirect) and a set of core papers from which to kickstart the research. The core papers are referenced in the brief explanations below the research questions. Such papers have been chosen by applying snowball techniques to the most relevant results for the key phrases, and by following expert advice (professors and supervisors).

Third, we set the search strategy. We chose a set of relevant key phrases for each research question, namely:

RQ1: "service annotation", "extensible service annotation", "extensible resource annotation", "machine-readable annotation".
RQ2: "service discovery", "service discovery in edge networks", "semantic discovery on distributed computing".
RQ3: "interest definition", "semantic interest definition", "dynamic service composition", "opportunistic service grouping", "distributed service grouping on fully distributed networks", "semantic-based grouping", "dynamic grouping", "service communities".

Once we defined the search strategy, a list of resources was acquired. This part of the literature review has been strongly iterative, as we have taken into account the advice of the thesis supervisor and Linnaeus University professors, new publications and the incremental discovery of resources. It is important to mention that the publication date of the resources has also been taken into account. We considered general views and high-level architectural blueprints regardless of their publication date, but for the framework and technology assessment, papers published before 2000 were excluded. This has been helpful, as it narrows the focus of the research to a set of more current technologies, which are also more likely to be supported by their community and creators.

After selecting the resources, we synthesised them. The abstract and introduction were analysed first, and if the paper seemed relevant, it was then read thoroughly. All the papers read have been managed using Zotero, a bibliography management system, and annotated using Pdf Foxit Reader. Zotero allows the creation of organised collections of research material, which we exploited by separating the resources by topic and relevance.
Zotero also provides a browser plugin which allows for one-click export of references to a collection and automatic annotation of scientific resources. The direct outcomes of the literature review are presented in the "Literature Review" section, covering the technological and conceptual background of each of the aforementioned research questions. Aside from the results included in that section, further acquired knowledge has been applied to the background analysis and design of the architectural extension over PRIME.

Qualitative Evaluation of I-PRIME

Apart from the core method of literature review, we conducted a qualitative evaluation of the architectural extension of PRIME. After setting up the motivating scenario and extracting the theoretical base for answering the research questions, we will present the architectural extension of PRIME. As the extension is proposed at a software design level, a qualitative evaluation approach is preferred over a quantitative one. That said, the cross-cutting concerns of extensibility and scalability will be analysed at the design level, considering the computational complexity properties of the chosen architectural extension components. The functional requirements extracted from Social Library will serve as the evaluation scale for the extension. They cover what an FI content-centric application


will require and help contextualise the extension over a real-world use case. The flow of the evaluation is framed in the next figure (Figure 2.9).

Figure 2.9: Qualitative evaluation of PRIME extension.

2.3 Reliability and Validity

In the case of this report, reliability refers to whether other researchers would get the same research outcomes and reach the same conclusions when executing the same work. Linking back to the previously mentioned breadth of knowledge to cover and the limitations for conducting a systematic literature review, the negative impact on reliability has been mitigated by conducting the literature review under the aforementioned strategy. For the qualitative assessment of the PRIME extension, researchers are most likely to draw the same conclusions, as each of the research questions supported by the extension has been analysed following the key phrases, inclusion/exclusion criteria and sources for scientific publications. However, if other research initiatives consider a radically different set of challenges in the area of content-centric applications, the research questions would differ, and the analysis of the architectural extension would be executed from another perspective, thus providing different results. That said, with the proposed motivating scenario we aim to provide a solid base, not meant to cover all the use cases or systems to be developed in this area. With regard to validity, both internal and external validity have been taken into account. Internal validity concerns whether the drawn results follow from the collected data. The results of the literature review are condensed into the Literature Review section of the report. To minimise the possibility of bias, we have conducted the literature review following a rigid strategy. However, the presented architectural extension over PRIME poses a more significant challenge when it comes to assessing the internal validity.
The architecture aims to bridge the gap encountered between the current version of PRIME and the context that the research questions frame. The extension will be assessed for fulfilment against the requirements of the motivating scenario, but it does not constitute the only possible approach for an extension. That said, concerns such as service discovery or grouping are likely to be relevant to a majority of content-centric systems. External validity refers to whether generalising the results is justified. In the context of this thesis work, this is harder to ensure, as different literature review outcomes could be achieved if different scientific publication search engines or languages (such as Chinese) were to be applied or included. With regard to the external validity of the architectural extension over PRIME, it has been assessed with

the help of the motivating scenario's functional requirements. When implemented, the architectural extension is to be added to existing solutions. This may generate performance issues, but as mentioned when evaluating the cross-cutting concerns, there are no significant limitations at the computational complexity level. This could be verified once a reference implementation is completed.

3 Literature Review

In the following section, a state-of-the-art review will be provided on relevant topics for enabling content-centric FI applications. We have extracted the content by conducting a literature review, detailed in the previous section. The outcome is divided into three main sections, aligned with the proposed research questions, namely: semantic annotation of services, service discovery and interest-based grouping. Each section will first define why the topic is relevant inside the report's context, and then present the current state of the art. Due to the breadth of the available material, it is worth mentioning that this section covers each topic in detail, but does not aim to be an exhaustive analysis of all the existing approaches. Papers executing partial literature surveys have been leveraged as part of the literature review, but the review does not focus on comparing the information presented in them; rather, it extracts the relevant information to better answer the previously defined research questions.

3.1 Semantic Annotation of Services

As the amount of multimedia content and data grows consistently [1], it has become increasingly important to enable its automatic annotation for further usage. Such annotations shall enable capabilities such as automatic indexing, retrieval and content reuse. In the context of FI, it is necessary to highlight that any device connected to the Internet potentially constitutes a distinct data source. In such a case, the services that provide data are to be annotated in a machine-readable format, with a strong focus on extensibility. In the next section, the primary alternatives for service description will be analysed and evaluated regarding their machine and human readability and their extensibility. We will also consider the state-of-the-art analysis of the current approaches and standards followed by researchers.
Extensive research has been conducted in the area of resource annotation in multiple contexts [39][40][41]. However, such work does not consider the challenges of running in PEC scenarios [25]. Moreover, while the machine-readability of annotations is important, human readability and software support must not be left aside. Such annotations must allow human supervision and reasoning over the represented content. Therefore, the readability (also referred to as syntax) of the annotations provided by the different description frameworks will also serve as an evaluation angle. Apart from the machine-readable compliance of those annotations, the ever-growing content of the Internet is continuously pushing the boundaries regarding data formats, contents and existing resources as a whole. This trend makes it infeasible to frame a stable reference architecture for what the annotation of exposed resources over the Web may look like in the future. Instead of choosing a specific technological approach or annotation framework, it is recommended to base such a choice on extensible standards that allow seamless annotation and management of new resources [8], while maintaining backwards compatibility. To enable fully functional service-oriented computing (content-focused FI apps), Semantic Web Services (SWS) [42] description frameworks can be selected as a starting point. They provide means for intelligent agents to conduct meaningful and automated coordination of services on the Web, by describing what a service does and mapping its interfaces to those capabilities [43]. Those frameworks describe

behavioural and functional properties, leveraging logic-based formalisms also referred to as ontologies. It is important to note that the SWS approach is not the only valid starting point when it comes to context representation, as other approaches such as key-value, graphical modelling and logic notations have been explored in the literature [44]. However, ontologies provide the aforementioned formalisation of the conceptualisation, which simpler approaches (e.g. key-value based contextualisations) fail to enable; those approaches thus hold a disadvantaged position in comparison with ontologies when it comes to machine readability and extensibility. As mentioned before, the semantic extension of Web Services is achieved by description frameworks. The report will provide a general overview of the most relevant approaches, namely: RDF, OWL-S (along with RDF), WSML and SAWSDL. No framework choice will be made in this chapter, as a reference architecture will be formulated later in the report. However, we will leverage the information provided in this section for making a choice later.

RDF

Resource Description Framework (RDF) [45] is an abstract-syntax based framework for representing information on the Web. It defines a data model (abstract syntax) that links together all the RDF-based languages. RDF information is modelled as graphs of subject-predicate-object triples that express descriptions of resources. RDF was designed to provide a simple data model, formal semantics and an extensible XML-based syntax, and to allow statements to be made about any given resource. The core of the RDF abstract syntax is the previously mentioned triple. Each triple is composed of a subject (node), a predicate (edge) and an object (node), which altogether create an RDF graph. Such a construct is represented in the following image (Figure 3.10).

Figure 3.10: RDF subject-predicate-object triple.
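As a rough illustration of the triple model described above, the following minimal Python sketch represents an RDF graph as a plain set of subject-predicate-object tuples. It is not an RDF library: the `Triple` type, the `objects_of` helper and the `http://example.org/` IRIs are all assumptions made for this example.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One RDF statement: subject (node), predicate (edge), object (node)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

# Two small graphs, each a set of triples.
graph_a: Set[Triple] = {Triple(EX + "BradPitt", EX + "hasFather", "William Pitt")}
graph_b: Set[Triple] = {Triple(EX + "BradPitt", EX + "hasOccupation", "actor")}

# In this representation, combining graphs is a set union.
merged = graph_a | graph_b


def objects_of(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object linked to `subject` through `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


print(objects_of(merged, EX + "BradPitt", EX + "hasFather"))
```

Modelling the graph as a set keeps the sketch close to the abstract syntax, where a graph is simply a collection of triples and duplicates carry no meaning.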
The nodes that compose those RDF triples (also referred to as RDF statements) can be of three main types: IRIs, literals or blank nodes. IRIs (Internationalized Resource Identifiers) and literals represent resources (such as documents, physical entities or abstract concepts). An RDF triple denotes that a binary relationship (the predicate) exists between the subject and the object. A statement with a blank node describes that a relationship exists for the given subject, but without providing any specific resource. In the following image (Figure 3.11), two examples are provided. The first one describes that the actor Brad Pitt has a father named William Pitt. The second one describes that Brad Pitt has a father, but does not specify which resource or entity that father may be. IRIs are globally unique identifiers (inside a given RDF graph), and they are used to construct RDF vocabularies. RDFS (Resource Description Framework Schema) [46] is one of those vocabularies, which provides a data-modelling vocabulary as an extension of the underlying RDF vocabulary. Its central goal is to provide

means to describe groups of related resources and their relationships.

Figure 3.11: RDF triples with literal and blank nodes.

Those vocabulary extensions enable great expressibility in RDF, as defining custom vocabularies provides expressiveness over custom domains (e.g. time-bounded events). In the specific case of RDFS, it semantically extends RDF with concepts such as classes, resources, properties and datatypes, as well as properties such as range, domain, label and comment. As we previously mentioned, RDF data is expressed as graphs composed of a set of triples. Those graphs are static sets of information, yet mutable (by adding and removing triples). One of the benefits of such a data structure is that multiple data sources (graphs) can be easily combined. By doing so, content from multiple sources can be accessed as one entity, while keeping the inner contents separated. Finally, it is worth mentioning that RDF and its capabilities of semantic extension (via vocabularies) have set the foundations for the development of higher-level frameworks for semantic annotations. RDF has some limitations though, namely the lack of support for: local scope of properties (range restrictions cannot be applied locally only), disjointness of classes (when a resource can belong to one class only), and cardinality restrictions (e.g. when a lecture must have at least one lecturer assigned). Those limitations are a trade-off for its relative simplicity of expressiveness.

OWL-S

OWL-S [47] describes semantic services by leveraging the W3C-defined standard ontology language, OWL [48]. OWL stands for Web Ontology Language, and it was designed for use by applications that need to process information content, rather than just presenting it to humans. OWL is built as a vocabulary extension over the Resource Description Framework (RDF) [45], described in the previous section, and is derived from the DAML+OIL Web Ontology Language [49].
One of the main characteristics of OWL is that it enables greater machine interpretability than other representations, due to the additional vocabulary provided along with formal semantics. OWL itself can be broken into three sub-languages with regard to expressibility: OWL Lite, OWL DL and OWL Full. OWL-S builds on top of the DAML-based DAML-S [50] and was created with the aim of enabling the following tasks: automatic Web service discovery, automatic Web service invocation and automatic Web service composition. However, before looking in more depth at how OWL-S intends to enable the functions above, it is essential to separate the different types of services to consider: atomic or composite. At the interaction level, atomic services enclose one-time-only interaction services (e.g. requesting a postal code for an address), and composite services refer to services composed of multiple atomic services (e.g. e-shop checkout process service, as it may rely on inventory, , and atomic payment services). OWL-S is designed to support both types of services, even if many of the motivation tasks make the most sense in the context of composite services.

To achieve the motivation tasks previously declared, OWL-S provides an ontology covering three types of knowledge per service: what the service offers, how it is used and how a client interacts with it. Firstly, what the service provides to potential clients is defined. By presenting the "service profile" (ServiceProfile ontology class), each service's functionalities are advertised. The primary purpose of the service profile is to enable service discovery, as it provides information such as inputs, outputs, preconditions, effects (IOPEs), service name, service category and extra meta-data about the service provider and context. We have added a graphical example of the ServiceProfile properties and classes below (Figure 3.12).

Figure 3.12: Profile selected properties and classes.

Secondly, how the service is meant to be used is defined. Through the "process model" (ServiceProcess ontology class), the service composition is set. Such a composition can be achieved through choreography or orchestration of the above-mentioned atomic or composite services. Processes can be atomic (actions a service can perform by engaging in a simple interaction), simple (non-grounded simple interaction operations, used as a process abstraction layer) and composite. Composite processes define the workflows required for service composition, using control flow operators (also referred to as ControlConstructs) [8] such as Sequence, Unordered, Choice, If-Then-Else, Iterate, Repeat-Until, Split and Split+Join. It is important to note that what a composite process defines is not what the service will perform, but what behaviour a client can perform on a given service over a set of message-passing interactions. Aside from the control flow, specific data flow (input and output of a given process) and process variables can be declared inside the OWL-S process model.
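The relation between atomic and composite processes described above can be sketched as a small tree structure. The Python below is only an illustrative model, not OWL-S syntax or any OWL-S API: the class names mirror two of the control constructs named in the text, while the process names, inputs and outputs of the checkout example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class AtomicProcess:
    """A single-interaction process with its declared inputs and outputs."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class Sequence:
    """Control construct: components are performed in order."""
    components: List["Process"]


@dataclass
class Choice:
    """Control construct: exactly one component is performed."""
    components: List["Process"]


Process = Union[AtomicProcess, Sequence, Choice]


def atomic_steps(process: Process) -> List[str]:
    """List the names of all atomic processes reachable in a process tree."""
    if isinstance(process, AtomicProcess):
        return [process.name]
    steps: List[str] = []
    for component in process.components:
        steps.extend(atomic_steps(component))
    return steps


# Hypothetical e-shop checkout: check inventory, then pay by one of two means.
checkout = Sequence([
    AtomicProcess("CheckInventory", ["itemId"], ["inStock"]),
    Choice([
        AtomicProcess("PayByCard", ["cardNumber"], ["receipt"]),
        AtomicProcess("PayByInvoice", ["billingAddress"], ["receipt"]),
    ]),
])

print(atomic_steps(checkout))
```

The tree only describes which interactions a client may perform and in what arrangement; it says nothing about how each atomic process is invoked, which is the role of the grounding.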
To provide a high-level understanding of the presented concepts, we provide a general process model schema below (Figure 3.13), highlighting the primary relations between classes and properties. Finally, how to access a given service is specified through groundings. Those groundings are composed of the information details required to conduct the access, such as: which protocol to use, the message exchange format, serialisation approach, transport and address specifications. They compose a mapping from the abstract to the concrete specification of the resources required for interaction. Both ServiceProfile and ProcessModel constitute abstract representations, while ServiceGrounding deals with the concrete level of specification. OWL-S supports arbitrary groundings; however, the existing Web Service Description Language (WSDL) [51]

standard is used.

Figure 3.13: Process model selected properties and classes.

WSDL is an XML-based framework for abstractly declaring a set of network endpoints and operations, and then binding them to concrete network protocols and message formats. One of WSDL's advantages is its extensible nature, which allows maintaining a unified endpoint description regardless of the grounded network protocols and message formats. The relation between OWL-S and WSDL is complementary, as both languages are used to cover distinct aspects of grounding. Atomic processes are mapped to WSDL operations, the OWL-S set of inputs corresponds to the WSDL message concept, and the types (OWL classes) of OWL-S inputs/outputs are assigned to WSDL abstract types. We exemplify this grounding in the figure below (Figure 3.14). With regard to the available tooling and software support, it is worth mentioning that because OWL-S extends the W3C standard ontology language OWL and builds on top of DAML-S, many tools are publicly available for the discovery (e.g. OWLS-MX hybrid matchmaker [52]), composition (e.g. composition planning with OWLS-XPLAN [53]) and development (e.g. OWL-S IDE [54]) of such systems. However, despite the available tooling and community support, the limited service description achievable in practice when using OWL-DL draws some criticism, as only the deterministic aspect of the surroundings can be expressed with OWL-DL, not covering dynamic and time-impacted aspects. An extended example of OWL-S syntax is provided in the appendix section of the report, in the form of a practical example, for further reference.

WSML

Web Service Modelling Language (WSML) [55] provides a set of semantics and formal syntax for the Web Service Modelling Ontology (WSMO) [56]. In the effort of providing semantic annotation information regarding services, WSMO has been of significant impact alongside the above-mentioned OWL-S.
Both approaches aim to enable automatic discovery, execution and composition of services. To adequately explain WSML, related concepts will first be described, such as WSMO and the Web Service Modelling Framework (WSMF). Subsequently, we will introduce the WSML layers and fundamental concepts.

Figure 3.14: Graphical representation of the OWL-S / WSDL grounding.

WSMO introduces a conceptual model composed of four top-level elements: Ontologies (provide the terminology used to describe the relevant aspects of the domain), Web Services (computation entities accessing the domain services), Goals (client desires, to be fulfilled with Web Service executions) and Mediators (elements that solve interoperability problems). By doing so, it provides a conceptual grounding for both Ontologies and Web service descriptions. It is based on the Web Service Modelling Framework (WSMF) [57], which is then extended with a formal language and ontologies. The main design principles upon which WSMO is built are the following: ontology-based, strict decoupling, web compliance (URIs utilised for resource identification), importance of mediation (high heterogeneity must be handled), description over implementation (the executable technologies are decoupled from the semantic descriptions), role separation (between client needs and available services) and description of Services over Web Services [58]. Strongly linked with WSMO, it is relevant to mention WSMF [57]. WSMF is a European initiative to present a complete framework to cover the different aspects of Web services. The primary goal is to achieve a scalable mediation service and maximal service decoupling. It is mainly composed of two projects: WSMO, providing the service ontology for goal, mediator and Web service definition, and Semantic Web-enabled Web Services (SWWS) [59], acting as a description, discovery and mediation framework. When it comes to Web Service description, WSML is based on the same four top-level elements of WSMO. Taking into account that the primary aim of WSML is to assess the applicability of different formalisms for SWS descriptions, it does not restrict itself to existing languages for such description, such as OWL. Instead, it uses formal methods to describe the goal and service semantics.
Depending on the logical expressiveness, multiple variants (also referred to as layers) of WSML

are available: WSML-Core, WSML-DL, WSML-Flight, WSML-Rule and WSML-Full. By having multiple layers, a conscious trade-off can be made between implied complexity (for ontology modelling) and expressiveness. Each variant will be briefly described and a visual representation of the variant stack provided. WSML-Core is the least expressive of all the WSML layers, based on the intersection of Horn Logic and Description Logic. WSML-DL is a variant of the Description Logic that encompasses OWL. WSML-Flight provides a rule language over WSML-Core, based on logic programming, and has similar semantics to Datalog. WSML-Rule further extends WSML-Flight with Logic Programming and meta-modelling elements. Finally, WSML-Full unifies the aforementioned Description Logic and Logic Programming approaches; its semantics currently constitute an open research issue. In the following image (Figure 3.15), those layers are stacked and compared with regard to their underlying paradigms.

Figure 3.15: WSML layering

After describing the bases of the approach and the different layers that WSML offers, how services are annotated will be explained. In this case, the semantic specification is divided into goal, service capability and service interface definition. First, the goal reflects the required WSML service and can be defined by using ontologies. The ontology will provide formal semantics that declare the service parameters and transition rules. The transition rules formulate what should change in the global state (context). The client access rules and groundings are also defined as goals. Even if goals are not entirely fulfilled for a request scenario, clients can match with services that partially align with the requested goals. Second, the capabilities of the service are specified.
The capabilities describe state-based functionalities of the service by scope: precondition (conditions over the provided input), postcondition (service execution results), assumption (global state requirement before execution) and effect (how the execution changes the global state). Shared variables can also be defined across multiple capability scopes (e.g. variables shared between pre/postconditions). Finally, the service interface is framed. It describes the communication patterns through which the service is consumed (choreography, from the requester point of view) and how the functionality of the service is achieved by coordinating multiple service providers (orchestration, from the provider point of view).
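The four capability scopes described above can be pictured with a minimal Python sketch. This is not WSML syntax and does not use any WSMO tooling; the `Capability` class, the `can_execute` helper and the book-selling example are assumptions introduced purely for illustration, with each scope modelled as a plain function over dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass
class Capability:
    """Illustrative stand-in for a capability, one callable per scope."""
    precondition: Callable[[State], bool]   # condition over the provided input
    assumption: Callable[[State], bool]     # condition over the global state before execution
    postcondition: Callable[[State], bool]  # condition over the execution results
    effect: Callable[[State], State]        # how execution changes the global state


def can_execute(cap: Capability, inputs: State, world: State) -> bool:
    """A service may run only if its precondition and assumption both hold."""
    return cap.precondition(inputs) and cap.assumption(world)


# Hypothetical book-selling service.
sell_book = Capability(
    precondition=lambda inputs: "isbn" in inputs,
    assumption=lambda world: world.get("stock", 0) > 0,
    postcondition=lambda outputs: "receipt" in outputs,
    effect=lambda world: {**world, "stock": world["stock"] - 1},
)

world = {"stock": 2}
if can_execute(sell_book, {"isbn": "978-0"}, world):
    world = sell_book.effect(world)
print(world)
```

Separating input-level conditions (precondition, postcondition) from world-level ones (assumption, effect) is the point of the sketch: the former can be checked against a request alone, while the latter need knowledge of the global state.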

With regard to the available tooling and software support, there are multiple open source software tools for developing WSML services. Examples of such tools are WSMO Studio (available as an Eclipse extension), WSML Rule Reasoner, WSML DL Reasoner, WSMO4J (reference API for building SWS compliant with WSMO) and WSMX (execution environment for dynamic matchmaking, selection and service invocation). However, neither WSML composition planners nor fully-fledged service matchmakers are yet fully implemented. We provide an extended example of WSML ontology, web service, goal and mediator syntax in the appendix section of the report, for further reference.

SAWSDL

Semantic Annotations for WSDL and XML Schema (SAWSDL) [60] is the first step taken by the W3C towards SWS technology standardisation. It provides a common ground for the ongoing efforts on SWS frameworks, such as the OWL-S and WSMO mentioned above. However, SAWSDL itself is not a complete technology that allows SWS automation (such as OWL-S or WSMO), but rather one that enables extending WSDL with ontology pointers. As previously mentioned in the report, both OWL-S and WSMO provide semantic descriptions for Web services, but there was a lack of understanding in academia with regard to what precisely a semantic Web service should do [60]. However, it was agreed that providing semantics for those descriptions and building on top of WSDL [61] should be encouraged. Following those premises, in April 2006 W3C organised a group to work on the standardisation of semantic annotations over WSDL. SAWSDL was the output of such work, providing an ontology-independent semantic annotation, as it represents semantic concepts as URIs. Therefore, RDF, OWL-S, and WSMO (or another service semantic annotation framework) can be used along with SAWSDL for annotation.
With regard to the Web service description layers, SAWSDL is located below the Semantic description layer (service semantics such as OWL or WSML) and on the top layer of the Non-semantic descriptions level (above WSDL). While WSDL describes services at a syntactic level (how messages look), SAWSDL allows WSDL elements to specify their semantics. The figure below (Figure 3.16) provides a general view of the Web service specification stack by layers.

Figure 3.16: Service specification stack. Technology examples on the left and stack-level name on the right.

In order to enable the ontology mapping of services, SAWSDL provides a set of syntactical constructs: modelReference, liftingSchemaMapping, loweringSchemaMapping and attrExtensions. Those constructs enable two extension forms: model references and schema mappings. They link to specific semantic concepts and define the data transformations that need to occur between the messages and semantic representations. The model references link semantic concepts with XML elements from WSDL and XML Schema, using URI-identified semantic sections. Regarding the data transformations, two central operations are available: lifting and lowering, which permit the communication between a semantic client and a Web service. The image below (Figure 3.17) exemplifies lowering and lifting SAWSDL operations for a Web service.

Figure 3.17: SAWSDL lifting and lowering flow.

When comparing the standardised grounding provided by SAWSDL between semantic descriptions and WSDL with the "thin" grounding [47] mentioned in the OWL-S section, a clear improvement is achieved. With the "thin" (also referred to as incomplete) grounding, WSDL operations map to OWL-S atomic processes, while properties are linked to input/output values. The main issue of such a simple grounding is that services mapped in such a manner can only be executed as direct invocations, as WSDL does not allow precondition and effect assignments to calls. With regard to the available tooling and software support, the W3C standardisation process requires every specification to be tested under implementation before its full standardisation. Multiple open source tools are available: SAWSDL4J [62] (for service development), Radiant (Eclipse plugin for semantic annotation) or Lumina (Eclipse plugin for service discovery) [61]. Proprietary tools also exist for semantic data mediation, such as the ones developed by IBM.
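To make the lifting and lowering operations described above more concrete, the following Python sketch converts between an XML message and a dictionary keyed by ontology concept URIs. It is a simplification under stated assumptions: in SAWSDL the schema mappings are referenced by URI and are typically written in a transformation language, whereas here both directions are hand-written functions. The `MODEL_REFERENCES` table, the `quote` message and the `http://example.org/ontology#` concepts are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical model references: XML element name -> ontology concept URI.
MODEL_REFERENCES: Dict[str, str] = {
    "price": "http://example.org/ontology#Price",
    "currency": "http://example.org/ontology#Currency",
}


def lift(xml_text: str) -> Dict[str, str]:
    """Lifting: turn an XML message into data keyed by semantic concepts."""
    root = ET.fromstring(xml_text)
    return {
        MODEL_REFERENCES[child.tag]: (child.text or "")
        for child in root
        if child.tag in MODEL_REFERENCES
    }


def lower(semantic: Dict[str, str], root_tag: str = "quote") -> str:
    """Lowering: turn semantic data back into the XML the service expects."""
    tag_for_concept = {uri: tag for tag, uri in MODEL_REFERENCES.items()}
    root = ET.Element(root_tag)
    for concept, value in semantic.items():
        ET.SubElement(root, tag_for_concept[concept]).text = str(value)
    return ET.tostring(root, encoding="unicode")


message = "<quote><price>10</price><currency>EUR</currency></quote>"
semantic_view = lift(message)      # what a semantic client reasons over
round_trip = lower(semantic_view)  # what is sent to the Web service
print(semantic_view)
print(round_trip)
```

In this toy setting the two functions are inverses of each other, which is what allows a semantic client and a plain Web service to exchange the same information in their respective representations.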
Due to the verbosity of examples, readers are recommended to visit the official documentation for further reference on SAWSDL examples.

3.2 Service discovery

After presenting the technologies for semantic annotation of services, we will examine the different approaches to service discovery (also referred to as matchmaking) in the Future Internet context. As mentioned in the background section, no single method fits all the contexts in which service discovery can be applied. Therefore, a domain-specific solution must be introduced. The section will mainly focus on the matchmaking approaches to service discovery, and on semantic Web service discovery architectures.

In the case of this report, the FI context will be used to evaluate and analyse the available options. Future Internet scenarios are focused on automation, hence it is crucial to leverage the aforementioned semantic annotation of services. When compared to traditional Web service coordination, the coordination of SWS must provide a more advanced level of automation, due to the present resource and device heterogeneity. Semantic annotation allows the modelling of applications that discover services fulfilling user goals at runtime (emergent applications) by the aggregation of unforeseen resources. Apart from the automation requirements, FI context applications tend to be deployed on PEC environments [38], where high heterogeneity of devices and communication protocols is prevalent. To cope with that, the discovery logic must be moved towards the upper layers of the system stack. By shifting from network-focused discovery towards higher-level context-oriented discovery, discovery boundaries are expanded beyond administrative domains and network infrastructures. Overlay networks [63] are a proven solution for achieving such abstraction.

Service matchmaking

Similar to the semantic annotation of services, the service discovery for semantically annotated services is accomplished through frameworks. In the following subsections, a general overview of the most relevant ones will be provided, highlighting their capabilities and tradeoffs. We will make the choice of the reference architecture's discovery service approach in later sections of the report, once a concrete motivating scenario has been presented with a set of requirements. The semantic service discovery frameworks to analyse will be OWLS-MX (OWL-S matchmaker), WSMO-MX (WSML matchmaker) and METEOR-S WSDI (SAWSDL matchmaker). It is worth mentioning that multiple types of service matchmaker frameworks exist: non-logic, logic and hybrid.
The previous list has been selected because each framework covers a service semantic annotation framework analysed in the previous section, therefore enabling a better technological match. In the following image (Figure 3.18), a broader view of the available frameworks is provided, giving a high-level categorised selection. In that image, the presented frameworks are (top to bottom): DSD-MM [64], imatcher1 [65], HotBlu [66], OWLS-UDDI [67], SDS [68], GLUE [69], ROWLS [70], imatcher2 [71] and FC-MATCH [72].

Logic-based semantic service matchmaking approaches execute deductive reasoning over service semantics [73]. The semantic descriptions of services are compared at either design time or run-time. The matching levels [74] can be exact, plugin, disjoint and subsume, and their definition varies based on the applied logic theory and service semantics.

Non-logic-based approaches perform the matchmaking without logical inference over service semantics. In this case, the level of matching is extracted from syntactic similarities, graph matching and concept-distance computation over ontologies. The implicit semantics are exploited rather than the explicit ones. An example is the DSD (DIANE Service Descriptions) matchmaker [75], which executes graph matching (returning a degree of match) over object-oriented service description state-sets.

The hybrid matchmaking approach combines the previous two methods. By doing so, the generated matchmaking outcome can outperform the pure counterparts

Figure 3.18: Service matchmaker frameworks by categories. (either logic or non-logic based). An example of this is OWLS-MX [76] that merges text-similarity comparison with logic-based reasoning.

40 Figure 3.18: Service matchmaker frameworks by categories. (either logic or non-logic based). An example of this is OWLS-MX [76] that merges text-similarity comparison with logic-based reasoning. If the text-similarity for a given query exceeds a predefined threshold, the service will be classified as relevant, even if the logic-based reasoning has failed. Semantic process-model matchmaking is also possible, although the creators of SWS formats did not directly enable that. For example, the process model semantics of both OWL-S and WSML have not been formally defined, and SAWSDL does not provide any process model. Intuitive mapping to workflow process models can be applied, even if the generated mapping cannot be entirely complete [77]. In this approach, the expected operational behaviour of a given service is assessed, regarding data and control flow. As a part of the analysis, we will provide more in-depth information about some of the available matchmakers. The primary goal is to provide a general idea about how different approaches tackle matchmaking: OWLS-MX, WSMO-MX [78] and METEOR-S WSDI [79]. OWLS-MX OWLS-MX will be the first semantic matchmaker framework to analyse. It provides matchmaking capabilities for OWL-S services. As previously mentioned, it provides hybrid capabilities by combining logic-based reasoning with token-based syntactic similarity comparison. The following will only be executed once the logicbased fails. The main reason for choosing a hybrid approach is that pure logical reasoning does not match the reality that the Web provides [80]. The reasoning and information gathering are a resource (computation, time) bounded actions, and due to that, it is not possible to infer real optimal choices but to conduct incomplete reasoning. As previously mentioned, OWLS-MX supports logic-based matching with nonlogic information retrieval. It is focused on service Input/Output matching while ignoring the logical service specifications. 
The service matching is provided in degrees: exact, plug-in, subsume, subsumed-by, logical fail, hybrid subsumed-by, nearest-neighbour and fail. Each of those matching degrees is briefly described below.

An exact match occurs when a service S perfectly matches the given request R. That means that the I/O signatures of both are identical.

A plug-in match is achieved when the input of the request is part of a more specific subset of service S's inputs. This match occurs if the OWL input concepts can be mapped to similar WSDL input messages.

A subsume match is similar to a plug-in match. However, it differs in the service output, as the returned service's output is more specific than the one requested by the client.

A subsumed-by match occurs when request R is subsumed by service S, that is, when the service provides a more general data output than the request. To avoid selecting services whose data is too general for a given request, only direct parent relations are considered. That said, depending on the application context and use case, that restriction may be relaxed, also considering the granularity of the underlying ontology.

A logical fail happens when the service cannot match the request in any of the aforementioned matching levels.

A hybrid subsumed-by match complements the previous subsumed-by logic filter with syntactic matching. The syntactic matching is executed following a predefined text similarity comparison approach.

Nearest-neighbour is a purely non-logic match, conducted by checking the syntactic similarities of inputs and outputs between the request and the service. It is only executed once all the previous matching approaches have failed.

The final matching degree is failure. As the name suggests, it is the final status once all the previous filters have failed: service S does not match request R in any of the matching degrees.

When the previous matching levels are sorted according to semantic relevance, the structure shown in Figure 3.19 is obtained.

Figure 3.19: OWLS-MX matching levels according to semantic relevance.
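The fallback from logic-based to syntactic matching described above can be sketched as follows. This is a minimal illustration, not the OWLS-MX implementation: the Jaccard token similarity, the function names and the string labels are assumptions made for the example, and the logic-based reasoner is represented only by its result (`None` standing for a logical fail).

```python
def token_similarity(text_a, text_b):
    """Jaccard similarity over lowercase word tokens (an assumed measure)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hybrid_degree(logic_degree, request_text, service_text, threshold):
    """Return the logic-based degree if there is one, else try the syntactic filter.

    logic_degree is the outcome of a (hypothetical) logic-based filter,
    e.g. "exact" or "plug-in", or None when logical matching failed.
    """
    if logic_degree is not None:
        return logic_degree
    # Only reached after a logical fail: compare the textual descriptions.
    if token_similarity(request_text, service_text) > threshold:
        return "nearest-neighbour"
    return "fail"


if __name__ == "__main__":
    print(hybrid_degree(None, "book flight ticket", "flight ticket booking", 0.4))
```

The threshold is left as a parameter because, as noted in the text, it is the client who sets the syntactic similarity threshold.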

OWLS-MX returns the set of services that fulfil a given request, based on an individual matching level. The client can set the syntactic similarity threshold and the degree of matching that it requires. To accomplish that, every published service and every request is classified by extending a given initial ontology into the matchmaker ontology. Input and output values are mapped and simplified to that initial ontology, and information about the services that use those I/O values is also stored. Once the logical concept mapping fails, the hybrid matchmaking kicks in, applying hybrid subsumed-by and nearest-neighbour matching.

With regards to the performance of this framework, OWLS-MX spends a significant amount of time processing the I/O of new services and registering them into the matchmaker ontology. Also, the query response times are about 10 s for 582 service use cases, which shows that there is room for improvement performance-wise [76]. Regarding the validity of the matching results, there are false positive and false negative cases in both the logic and hybrid approaches [76]. That said, the OWLS-MX hybrid matchmaker for OWL-S has been successfully applied in real-world use cases, which demonstrates the validity of the matchmaker and its community support. An example of this is the eHealth and repatriation planning system Health-SCALLOPS.

WSMO-MX

WSMO-MX [78] is also a hybrid semantic matchmaker, in this case for services written in WSML-MX. WSML-MX is a WSML-Rule variant that enables precondition and postcondition matching for object-oriented descriptions. Instead of following a layered matching approach, several different matching filters are applied recursively and then aggregated into a single evaluation-matching vector. A global goal to fulfil is also specified, which is further used in the matchmaking process. The algorithm executes the request against a local knowledge base, guided by a client-specified configuration file.
WSMO-MX considers the precondition and postcondition states of each published service from its local knowledge base and then compares them in pairs against the given goal. The seven matching levels are equivalence, plug-in, inverse-plug-in, intersection, fuzzy similarity, neutral and disjunction (failure). To explain the different matching levels, the goal description (G) and the service description (S) will be used. At the equivalence level, the preconditions and postconditions of G and S are entirely equal. At the plug-in level, G is a subset of or equal to S for the preconditions, and S is a subset of or equal to G for the postconditions. At the inverse-plug-in level, G is a superset of or equal to S for the preconditions, and S is a superset of or equal to G for the postconditions. At the intersection level, G intersects with S at both preconditions and postconditions. At the fuzzy similarity level, the preconditions and postconditions of G and S are similar. Finally, disjunction means failure to match, caused by the non-intersection of G and S.

Internally, WSMO-MX first executes a parameter matching (derivative comparison of I/O parameters). Then, it executes the semantic matching filters: type matching (the degree of semantic relation in the matchmaker ontology), constraint matching (performed by relative query containment) and relation matching (by recursive name matching). Finally, the syntactic matching filters can be applied. They are optional, as they can be compensative (if one of the previous filters fails) or complementary (to enrich the previously computed matching result).
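The set-based matching levels above can be illustrated with a small sketch. It assumes, purely for illustration, that the preconditions and postconditions of the goal G and the service S can be represented as plain Python sets; WSMO-MX itself reasons over WSML-MX descriptions, and the fuzzy similarity and neutral levels are not covered here. All names are hypothetical.

```python
def matching_level(g_pre, s_pre, g_post, s_post):
    """Classify goal G against service S using the set relations described above."""
    if g_pre == s_pre and g_post == s_post:
        return "equivalence"
    # Plug-in: G within S for preconditions, S within G for postconditions.
    if g_pre <= s_pre and s_post <= g_post:
        return "plug-in"
    # Inverse plug-in: the same relations with the roles of G and S swapped.
    if g_pre >= s_pre and s_post >= g_post:
        return "inverse-plug-in"
    # Intersection: some overlap at both preconditions and postconditions.
    if (g_pre & s_pre) and (g_post & s_post):
        return "intersection"
    return "disjunction"


if __name__ == "__main__":
    # The goal's preconditions are contained in the service's; the service's
    # postconditions are contained in the goal's.
    print(matching_level({"city"}, {"city", "date"}, {"hotel", "price"}, {"hotel"}))
```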

WSMO-MX provides a more fine-grained parametrisation than OWLS-MX, and it also provides faster performance than purely logic-based matching. Properly parametrised pure syntactic matching can keep up with the performance levels of logic-based matching and also outperform it [81].

METEOR-S WSDI

The METEOR-S Web Service Discovery Infrastructure (METEOR-S WSDI) [79] is the last semantic matchmaker to analyse in this section. It is tailored to provide a scalable infrastructure for semantic discovery and publication of Web services. In comparison with the previously analysed OWLS-MX and WSMO-MX, METEOR-S WSDI tackles the challenge of dealing with potentially thousands of Web service registries, at a broader scope. It focuses on Universal Description Discovery and Integration (UDDI) [82], which is not tied to any Web service description format, by adding a semantic description to existing UDDI-defined services. By using domain-specific ontologies, the semantics implied by service description structures are made explicit.

To accomplish the goal of making the service scalable to thousands of unique service registries, the framework provides a decentralised approach based on peer-to-peer networks and registry operator agents. Services are categorised into registries based on domains, and the domains are decoupled from each other, which allows distinct ontology and matchmaking approaches to be followed for each domain.

The high-level architecture of METEOR-S WSDI is divided into four main layers: data layer, communications layer, operator services layer and semantic specifications layer. Clients access the registries through the operator layer, abstracted from the low-level details. The layers are displayed in Figure 3.20. Note that the semantic specifications layer is orthogonal to the rest.

Figure 3.20: METEOR-S WSDI high level layered architecture.

The data layer is composed of the Web service registries.
It is the lowest-level layer in the architecture and is based on UDDI. No change to the original UDDI services is made, and due to that, semantic operations over the services are only available through the previously mentioned operator layer. That said, the real UDDI services are still accessible.

The communication layer provides the infrastructure for the communication between the distributed components. All the components in METEOR-S WSDI are considered peers, and four types exist: operator peers (maintain registries), gateway peers (entry points into the system), auxiliary peers (provide registry ontologies) and client peers (transient exploiters of the provided capabilities).

The operator services layer is responsible for keeping the services provided by operator peers in the registry. It allows clients to access services over a layer of abstraction, hiding the semantic service details. Service discovery from the client side is accomplished with templates, which are communicated to the operators, which in turn translate them to the registry-specific format.

The semantic specifications layer enables the use of semantic metadata over the UDDI services. Semantics are added on top of both Web services and registries using ontologies. This allows fine-grained access and querying over services in a scalable manner, as detailed below.

With regards to the framework implementation, it is worth mentioning that the peer-to-peer network has been implemented using JXTA [83]. JXTA peers connect using pipes and allow runtime pipe binding. By using JXTA, METEOR-S WSDI is kept platform and device independent, which enables broad interoperability.

METEOR-S WSDI enables two options for mapping the constructs (from WSDL to domain-specific ontologies) that serve for service publication and discovery. Those mappings are made using tModels and category bags, and are annotated in UDDI. Manual and semi-automatic mappings are available, the second one leveraging the SAWSDL algorithm mentioned before.

Service discovery architectures

Having presented the central concepts of the service matchmaking approaches and frameworks, it is important to frame them in an architectural context. Systems are usually meant to run in a real-world-like environment, therefore it is essential to analyse higher-level architectures rather than pure matchmaking. As previously mentioned, one of the critical pillars of FI is the decentralisation of resources (by Edge Computing). This brings a whole new set of challenges to be considered, as the resources to annotate and the services to discover are no longer part of a controlled, centralised architectural entity.
To successfully enable and implement service discovery capabilities in systems deployed in such environments, it is crucial to leverage existing Semantic Service Discovery architectures. Such architectures can be broadly categorised as centralised or decentralised, depending on how the service registry storage and peer location are handled. Figure 3.21 exemplifies this categorisation by providing some key features of each category. Each category is explained in depth below.

Figure 3.21: High-level Semantic Service Discovery architecture categories.

On the one hand, centralised Service Discovery architectures [84] are the most common approach. By keeping a centralised service registry and managing the service discovery between peers and super-peers, the implementation is simplified. The lookup time for services matching a query is also low, thanks to the centralised registry. However, by doing so, a single point of failure is introduced. In the case of super-peer failure, the capabilities of the system are significantly reduced, which can only be partially mitigated by applying caching and replication. The potential scalability of such systems is limited when the offer and demand of services grow significantly. The previously mentioned music-sharing system Napster is an example of a centralised P2P service discovery system.

On the other hand, decentralised Service Discovery architectures [84] keep a distributed storage of the service registry. As the registries are distributed among all peers, decentralised architectures can be divided following traditional P2P network topologies: structured, unstructured and hybrid. Figure 3.22 highlights the characteristics of each subcategory.

Figure 3.22: Decentralized Service Discovery architectural subcategories.

Structured Semantic Service Discovery architectures rely heavily on a predefined network structure (topology), even if no central service-registry server is present. When resources are added to such systems, they are not distributed randomly between peers, but placed into specific locations. The system service index is distributed among all peers by means of an overlay, which is exploited for querying. The overlay can be structured following hierarchical or flat organisational approaches. In hierarchical overlays, such as P-Grid [85], peers are organised in domain-oriented groups managed by super-peers (the super-peer is typically assigned at runtime). In flat overlays, such as Pastry [86], queries follow key-based routing, which requires joining and leaving resources to update the registry of the responsible resolver peer.

Unstructured Semantic Service Discovery architectures do not rely on a structured network overlay. When peers join the system, they hold no information about the services or other peers available in the system. Resources are located dynamically, usually following network-flooding techniques. With such techniques, a peer broadcasts an information request to its peers with some range parameters, until a successful match is generated. Flooding allows high network resiliency, but can also generate high loads of traffic depending on the system size and peer access pattern. An example of this approach is PULSE [87], an adaptive live streaming system for unstructured P2P networks.
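The flooding technique used by unstructured architectures can be sketched as a breadth-first search bounded by a time-to-live (TTL). This is only an illustrative model: the adjacency dictionary, the per-peer service sets and the use of a hop count as the "range parameter" are assumptions of the example and do not reproduce PULSE or any other specific system.

```python
from collections import deque


def flood_lookup(neighbours, services, start, wanted, ttl):
    """Flood a request from `start` and return (matching peer or None, messages sent).

    neighbours maps each peer to the peers it can reach directly,
    services maps each peer to the set of services it offers,
    ttl bounds how many hops the request may travel.
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in services.get(peer, ()):
            return peer, messages
        if hops_left == 0:
            continue  # range exhausted on this branch
        for nxt in neighbours.get(peer, ()):
            if nxt not in visited:
                visited.add(nxt)
                messages += 1  # one forwarded request per newly reached peer
                frontier.append((nxt, hops_left - 1))
    return None, messages


if __name__ == "__main__":
    overlay = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    offered = {"d": {"print"}}
    print(flood_lookup(overlay, offered, "a", "print", ttl=2))
```

The message counter makes the trade-off mentioned above visible: a larger TTL finds more distant services but increases the traffic generated by each query.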

Finally, Hybrid Semantic Service Discovery architectures combine the two previous approaches. As an example, flat data-hash tables (for finding rare resources) can be combined with flooding techniques (for finding replicated resources), as in [88]. That said, it is not yet known which hybrid service matchmaking approach scales best for the needs of the Web as a whole. As each system use case will present different variability and specific constraints, the solution needs to be adjusted and evaluated for each specific situation.

3.3 Interest-based dynamic grouping

The final topic we will cover in the literature review section is the interest-based dynamic grouping of services. In order to design content-centric applications, where services are to fulfil user goals at runtime, it is crucial to support automatic aggregation and grouping of services. Static (design-time) service grouping in grid-computing scenarios has been researched [89][90], although interest-based dynamic grouping of semantic services constitutes an open field of research [38].

The interest-based grouping is made possible thanks to the annotation of services and the service discovery capabilities described above. Without those, the grouping would not be feasible, as it relies on inter-service communication and discovery of services. Grouping the services that share the same context brings multiple advantages. Under FI environmental conditions, where services are heterogeneous, dynamic and mobile, grouping allows higher scalability and resiliency, and eases management. It also allows applications to include group-specific features, such as group messaging, scalable contextualised searching by groups, location-awareness at overlay level and so on.

To better analyse the interest-based dynamic grouping solutions, we will analyse the following aspects: interest definition and service grouping.
While a general idea about such topics will be provided, the analysis will be focused on their application in FI environments.

Interest definition

User interest definition has been a topic of research for many years. With the proliferation of e-commerce platforms and folksonomies [91], recommender systems have been trying to increase the user profile depth. The broader (cross-domain) a user profile is, the better the recommendations it allows, and therefore the higher the value it provides to services and businesses.

To annotate user interests accurately, what interest means first needs to be clarified. Interest is defined as "the feeling of wanting to give your attention to something or of wanting to be involved with and to discover more about something" by the Cambridge dictionary [92]. In the context of user interest, in addition to the topic of interest, the levels of interest are often also modelled as vectors. The degree of interest is extracted from either explicit or implicit actions [93]. Some approaches also consider time data as a relevant part of the interest information [94]. In the context of the thesis, and to manage the levels of complexity, interests are defined as a set of topics significant for a given user. User interests can evolve, allowing the coexistence of users with (sub)sets of common interests.

However, providing a cross-domain (interoperable) interest definition is not without challenges. Interoperability requires a standardised representation of those interests, which can be solved by using formal definitions (ontologies). Different approaches use ontologies to model interest [94]. As opposed to other interest definition methods (star-rating, tags), ontologies allow querying interests at the domain level, and not only by syntactic string comparison. Another important reason to choose the ontology-based interest definition approach is that categorisation (grouping) of the interests into classes is often required, in use cases such as interest-based user grouping or graph-like interest representations.

Below, the primary layers supporting user interest definition are introduced. The goal is to illustrate the steps to take to construct interests. First, the different approaches to interest annotation are introduced and second, we explain how those interests can be modelled.

Interest annotation

As previously mentioned, interest can be represented in multiple ways. We will look more closely at the two predominant ones: tag-based [95] and semantic annotation [96].

Tag-based annotation relies on the use of sets of strings to define an interest. Those tags can be user-defined but are often related to the underlying application context. They provide flexible and easy-to-compose multidimensional interest-definition vectors. If N is the number of unique tags in a specific application or domain, a specific interest of a user can be represented as a vector of N dimensions. Tags are generally boolean, meaning that they express either interest or non-interest, without a specific intensity vector or complex time-related data. In Figure 3.23 we propose an example of tag-defined interests for a user.

Figure 3.23: Sara is interested in crusty apple pie (Interest 1) and Spider-man comic books by Stan Lee (Interest 2).

The vectorial representation of those tag-defined interests depends on the number of unique tags in the system. In the use case of Sara, six unique tags are used.
Therefore, the system will represent each interest using six boolean dimensions. Table 3.1 exemplifies such a vectorial representation.

Interest     Pie  Apple  Crusty  Comic Book  Spiderman  Stan Lee
Interest 1   1    1      1       0           0          0
Interest 2   0    0      0       1           1          1

Table 3.1: Vectorial representation of the previously introduced Sara's interests. Note that boolean values are represented using integer notation (true 1 and false 0).

With regard to the namespace of the tags, as previously mentioned, unless limited by the application, it does not directly map to the application context. This does not present an issue when comparing users based on their interests, as vectorial comparisons can be applied and users clustered based on those [97]. However, it hinders the standardisation of terms for interest representation, as different users may represent virtually the same interest with different tags. Available tools like WordNet can be used to conduct such a standardisation process, but the results are not always exact [98]. Aside from that, tags do not contain any hierarchical or relational structure between them. Therefore, by combining tags arbitrarily, nonsensical interests can be defined: Ana can be interested in crusty apple Spider-man, which makes no sense (to a human reader). The tag approach also fails to represent complex interests with logical operators.

After presenting the tag-annotated interests, we now turn to the semantic annotation approach. Semantically annotated interests rely on ontologies and provide contextualised, hierarchical and relational information. By defining all the interests within a system upon the same ontology, one-to-one interest comparison between users is achieved. Another advantage of using ontologies for interest definition is that it is possible to annotate the resources that fulfil the interests using the same ontology as the interests. This provides substantial advantages with regards to resource/interest matching and comparison. Another important benefit is the granularity of the defined interests. When constructing the interests following ontologies, aside from the concepts, the predicate relations between subjects and objects are stored (e.g. when using triples). An example of an ontology for user-interest annotation is the ODP web directory, now replaced by Curlie. It provides a hierarchical view of web pages and has been used as an ontology for representing user interests in [99].
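The two annotation styles discussed so far can be contrasted in a short sketch. The tag vocabulary is the one from Sara's example; the parent relation is a hypothetical, hand-made hierarchy used only to illustrate hierarchical lookups and is not taken from ODP, Curlie or any real ontology.

```python
# Tag-based annotation: one boolean dimension per unique tag in the system.
TAGS = ["Pie", "Apple", "Crusty", "Comic Book", "Spiderman", "Stan Lee"]


def to_vector(interest_tags, vocabulary=TAGS):
    """Encode a set of tags as a 0/1 vector over the vocabulary."""
    return [1 if tag in interest_tags else 0 for tag in vocabulary]


# Semantic annotation: concepts are related, here by a child -> parent map
# (hypothetical hierarchy, for illustration only).
PARENT = {
    "Apple pie": "Pie",
    "Pie": "Dessert",
    "Dessert": "Food",
    "Comic book": "Book",
    "Book": "Publication",
}


def is_a(concept, ancestor, parent=PARENT):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent.get(concept)
    return False


if __name__ == "__main__":
    print(to_vector({"Pie", "Apple", "Crusty"}))        # Sara's Interest 1
    print(to_vector({"Crusty", "Apple", "Spiderman"}))  # a nonsensical but valid tag mix
    print(is_a("Apple pie", "Food"))                    # domain-level query
```

The tag encoding accepts any combination of tags, including meaningless ones, whereas the hierarchy allows a query such as "is this interest a kind of food?" that plain string comparison cannot answer.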
Another option is to construct a domain-specific interest-defining ontology, as done in [94]. The main drawback of semantically annotating interests, either by using existing interest-contextualising ontologies or by creating a new one, is that the interest definition is bound to the underlying ontology (defined at design time), which severely limits the expressiveness of interests. It is worth mentioning that ontologies have also been applied successfully to recommendation systems, providing solutions for cold-start and interest acquisition issues [100]. This shows that recommendations of interests can also be successfully modelled with ontologies.

Interest modelling

Regardless of the form chosen for annotating the user interests, there are different approaches available for modelling those interests. Knowledge-based, behaviour-based and hybrid modelling are the main exponents.

Knowledge-based interest modelling generates static interest models and then matches the users to the nearest model instance. While well-suited for environments that do not change, it fails to scale properly for the FI scenario. In a context where unforeseen resources and users are constantly interacting, an entirely static user-interest categorisation does not fulfil the dynamicity requirement. An example of knowledge-based interest modelling would be: first, designing three interest groups (books, movies, music) and then sorting the users into the aforementioned interest groups.

Behaviour-based interest modelling does the contrary. Instead of creating a set of interest profiles (groups) at design time, the groups are inferred from the user behaviour. Those groups are composed at runtime, and they may not map to concrete knowledge terms (such as movies or books). In the process of behaviour-based interest modelling, users are clustered based on similar behaviours. This approach is used to represent the interests in binary form and to recommend new interests, based on the similarity between cluster neighbours.

Hybrid interest modelling [101] combines the previous two approaches. Instead of a purely design-time grouping definition or a pure extraction from the user behaviour, it merges both. This is achieved by combining the application-domain knowledge (represented with ontologies) with supervised behavioural clustering. By doing so, the process of generating user profiles (user interests) is enhanced by the domain knowledge and remains dynamic.

Service grouping

In order to enable dynamic interest-based service (resource) grouping, once the interests are adequately annotated, it is crucial to define and choose suitable service grouping approaches. Aligned with the first pillar of FI, the Internet by and for the People [15], the dynamic creation of knowledge-sharing virtual communities (groupings) is critical. By leveraging automatic knowledge acquisition and reasoning, higher value is provided to users through intelligent service composition and contextualisation.

Taking into account that FI envisions an environment where resources are ubiquitous, dynamic and decentralised, such groupings are to be accomplished over the network and often as an overlay structure [8]. Those groups remain virtual entities, as the underlying network topology of the system is under constant change.
By doing so, great flexibility is achieved, as groupings can be abstracted from their physical topologies towards goal-defined and coordinated entities.

Groups are often created in line with grouping policies, also referred to as goals. Such goals represent the main intent of the subset of services in the group, which can vary significantly. When engineering content-centric applications, resources are to be grouped following content-related goals (e.g. grouping together nodes that consume the same type of resources). However, resources can also be grouped by applying a set of more arbitrary constraints, such as location-based information, inner properties or life-cycle status. The set of defined goals will most likely vary from application to application, so the aim here is to analyse the main approaches to decentralised service grouping, without focusing on goal definition. That will be developed in the "I-PRIME" section, when the reference architecture for the motivating scenario is presented.

Non-dynamic grouping is quite trivial to achieve when compared to its dynamic counterpart, as resources are categorised at design time and have reduced mobility. When dealing with dynamic grouping scenarios, resources will most likely evolve, by changing location, properties or provided capabilities. The primary challenge in those cases is how to build systems that provide dynamic groupings over unforeseen resources and deal with duplication, automatic group management and group roles. In the analysis of dynamic groupings, two approaches will be discussed in detail: coordinated and biological groupings.

Coordinated grouping

Coordinated (also referred to as supervised) approaches to dynamic service grouping are exemplified by A-3 [37]. A-3 aims to simplify the coordination of highly dynamic and distributed systems through group abstraction. Systems with a high number of components are abstracted and managed as coordinated groups, which simplifies management. Revisions of this approach [102] have been conducted, providing a unified programming model that eases system design.

We will briefly explain the internals of the aforementioned A-3 framework, to provide a more concrete contextualisation of coordinated grouping. This specific framework leaves the grouping policy open so that the system designer can set it. Systems are composed by combining components and connectors to form specific configurations. Such combinations can be either programmatic or declarative, depending on the use case.

Groups are composed of supervisors, supervised components and connectors. Supervisors represent entire groups in the system and send information to the supervised components within the group (by broadcast, multicast or unicast). Supervised components provide relevant information to the group supervisor. Connectors are responsible for the binding between the previous two, in the form of asynchronous messaging. The basic elements of a group are shown in Figure 3.24. The displayed hierarchy is composed of three groups, two of them subordinate to the third. It is important to note that the nodes which supervise the two lower groups in the hierarchy simultaneously take the roles of supervisor and supervised component. A node can also be part of two or more groups at the same time, depending on the use case.
Figure 3.24: Basic elements of an A-3 group, inside a three group hierarchy.

With regards to system architecture, A-3 can be configured in multiple ways: by hierarchical composition (Figure 3.24), by bidirectional hierarchies and by shared supervised components (Figure 3.25). It is important to mention that the architecture of choice will depend on the use case at hand.
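The group elements described above (supervisors, supervised components, and the messaging between them) can be modelled with a small sketch. It does not use or reproduce the A-3 API: the class and method names are hypothetical, and messages are delivered synchronously into in-memory inboxes, whereas A-3 connectors rely on asynchronous messaging.

```python
class Node:
    """A component that can act as supervisor, supervised component, or both."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class Group:
    """A group with one supervisor and any number of supervised components."""

    def __init__(self, name, supervisor):
        self.name = name
        self.supervisor = supervisor
        self.supervised = []

    def join(self, node):
        self.supervised.append(node)

    def broadcast(self, message):
        # Supervisor -> every supervised component of this group.
        for node in self.supervised:
            node.inbox.append((self.name, message))

    def report(self, node, message):
        # Supervised component -> supervisor of this group.
        if node not in self.supervised:
            raise ValueError(f"{node.name} is not supervised in {self.name}")
        self.supervisor.inbox.append((self.name, node.name, message))


if __name__ == "__main__":
    root, a, c = Node("root"), Node("a"), Node("c")
    top = Group("top", supervisor=root)
    top.join(a)                      # a is supervised in the top group...
    low = Group("low", supervisor=a)
    low.join(c)                      # ...and supervises the lower group
    top.broadcast("ping")
    low.report(c, "load=3")
    print(a.inbox)
```

Node `a` plays both roles at once, which mirrors the hierarchical composition described above where the supervisors of the lower groups are themselves supervised in the upper group.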

Figure 3.25: Shared supervised component and bi-directional hierarchies.

In order to provide grouping capabilities, the Java reference implementation of A-3 is composed of two primary layers: the discovery manager and the group manager. The application-specific logic is extracted to a higher abstraction layer.

With regards to the management of the dynamic coordination, A-3 proposes three options: direct access from a supervisor to the underlying low-level groups API (JGroups), direct access from a supervisor to the underlying group configuration through a set of high-level APIs, and specification of the desired configuration at design time. The proposal of those different approaches brings the focus to the challenge of automatic dynamic group management. Even if the first two direct-access solutions enable grouping, declaring the desired configuration at design time is beneficial, as it abstracts the lower-level group management.

Biological grouping

Other approaches are inspired by biological patterns, such as Myconet [103]. Instead of modelling the coordination, the organisation of the system emerges from the components themselves. Components follow a defined behavioural pattern inspired by biological systems, which demonstrate self-organising capabilities.

Myconet applies a fungi-inspired grouping model, generating a super-peer overlay network. By following a biologically inspired model, properties such as self-organisation, emergent adaptation and resilience are introduced. In this case, super-peers are used to reduce the system's network diameter and increase communication efficiency between resources. Following the vegetative growth patterns of the hyphae, peers contain a discrete amount of biomass from which the grouping will grow, selecting interconnected super-peers. This approach applies well to large-scale P2P networks, where a general view of the system is not held and notably dynamic peers are to coexist.
Although this approach has proven to be scalable, robust and able to handle component malfunction, it has been criticised for its lack of proper design approaches [102]. Myconet, for example, arranges the system of nodes following a biomass-based growth pattern. Such growth develops in only one dimension, disregarding community and interest definitions more fine-grained than the pure numeric value (representing the amount of biomass per node).

4 I-PRIME

In this chapter, we present the current status of the PRIME middleware [31] and the proposed architectural extension of PRIME. Once we had analysed the literature regarding the formulated research questions and acquired the appropriate knowledge, we introduced the motivating application scenario (Social Library), from which we extracted the set of functional requirements. To provide a solution for interest-based groupings (based on semantic service annotation and discovery) in content-centric applications, we grounded our architectural extension to an existing middleware for developing FI applications, PRIME. The extension is presented as a set of composable blocks, introduced in the context of the PRIME architecture but applicable to other middlewares. After proposing the architecture extension to support the research questions, the requirements elicited from the motivating scenario will serve as a base point for assessing the validity of the approach, along with a set of cross-cutting concerns.

4.1 PRIME

As mentioned above, PRIME has been chosen to contextualise the architectural extension over an existing framework. The detailed architectural description and active development were factors in the decision in favour of PRIME. However, we want to clarify once more that the main purpose of basing the extension on PRIME is the grounding it provides to an existing architecture, and that specific use cases may benefit from other framework choices, which are out of the scope of this work.

The PRIME approach defines an architectural style based on modelling and programming abstractions to uniformly represent resources and develop FI applications with opportunistic resource aggregation. We have chosen PRIME for the following set of reasons:

- Provides a clear modelling abstraction for handling resources in FI. Separated from the underlying implementation.
- Proven academic usage. Used for multiple courses at Linnaeus University, in Växjö, Sweden.
- Under active development. Direct access and open feedback from the maintainers, enabling proactive iteration over the design extension.

We consider the clear abstraction over resource handling in FI scenarios to be key. Due to the heterogeneity and complexity of the underlying system details, such as networks, defining device capabilities as resources enables abstract application modelling. By doing so, a unified conceptual layer is achieved, easing the uniform development of solutions built on inherently heterogeneous services.

Another reason we considered relevant when choosing PRIME was the active development. The architecture extension could have been grounded to other approaches, but choosing a system under active development and academic usage significantly increases the impact of the contributions. It is important to mention that support for interest-defined grouping is a milestone in PRIME's defined future development roadmap.

Finally, and related to the previous point, the current developers and maintainers of PRIME enabled direct and proactive feedback, assuring that the grounding of the architecture extension is sound with respect to the underlying architectural model. This communication also provided access to more detailed descriptions of the system, which are often not reachable through literature reviews.

4.2 Architectural extension of PRIME (I-PRIME)

After laying down the principles and explaining the high-level architectural design of PRIME, we will formulate an architectural extension. The main reason for this architectural extension is that PRIME provides a middleware to represent services (also referred to as resources) uniformly and to develop FI applications around the opportunistic aggregation of resources. However, when it comes to the creation of content-centric FI applications, the support for interest-focused dynamic service grouping is missing.

The section is structured as follows: first, the high-level modifications over the existing architecture are highlighted, and then those specific modifications are detailed. For the sake of understandability, brief examples are provided for each of the aforementioned extensions. A more complete and contextualised running example is provided in the "Evaluation" section.

As we previously mentioned, PRIME presents a firm base on which to build an extension that copes with those requirements. Based on the proposed modelling abstractions and layered architecture of the middleware, it will be extended to support interest-focused dynamic grouping of services. We will broaden the design by heavily modifying and extending the previously mentioned API programming layer.
The layer in charge of communications will remain untouched, as the service grouping and interest related concerns are kept at the upper layer. The architecture schema is provided in the image below (Figure 4.26), with the additions and modifications to the architecture highlighted in orange.

Figure 4.26: High-level view of the extended PRIME architecture.

As seen in the image above, the architecture will be extended at the API programming layer, by expanding the Prime Application section and by adding the new Grouping section. The reason to propose the architectural extension as a separate module inside the API programming layer is to maximise the portability of the extension. By placing the extension concepts directly below the user application (business logic), the coupling with lower levels is minimal. The concepts embedded inside the Prime Application architectural block are not coupled with it. The reason to place them inside is that the Prime Application represents the main entry point for the framework, so we consider it essential to initialise and include those concepts inside it. When extending the approach to other frameworks, and depending on the framework architecture, those concepts should be placed in the initialisation/entry component.

The Prime Application section will expand to accommodate the interest definition alongside the resources. Conceptually, we have placed the interest definition at the highest level of the API programming layer, as we consider that interests constitute the highest abstraction tier. Similar to resources, interests are tightly bound to the application's context ontology. Constructs such as groupings will be built around them, defined either explicitly or implicitly. How interests are modelled in the PRIME architecture extension will be explained in more detail in the following "Interest definition" subsection. The resource annotation approach will also be modified, along with the addition of a distributed shared storage mechanism for the sake of persistence.

Below the newly modified Prime Application, we added a new architectural sub-layer, the Grouping. The Grouping tier is composed of two central concepts: group creation and group management. This layer manages all the operations related to the grouping of users (Prime Applications) based on goals. Taking into account that the topic of the thesis is to research how to engineer content-centric FI applications, the goals that bring the users together will be the interests.
Interests quantify how the system users feel about resources, and provide insights that are later leveraged to tailor personalised and meaningful resources to users. How grouping is handled will be explained in the following "Grouping over interests" subsection.

It is important to mention that, aside from the resource, interest and grouping related modifications, and in order to provide a solution capable of covering the previously proposed research questions, the handling of application-context information will be modified and a notion of shared storage added, to support the interest and resource definition. This will be discussed in the following subsections.

Application-context

First and foremost, we explain how we extended the application-context handling in PRIME. Recalling from the "Current Status of PRIME", conceptually the application-context is composed of the application-wide knowledge, leveraged to achieve a common ground of understanding and reasoning across the distributed nodes of the application. Such knowledge is stored as an ontology and represents a single source of truth about the context of the application. We acknowledge the capabilities and potential of such knowledge. However, in order to provide enough support for the extended resource management and interest definition, we consider the current approach to application-context handling suboptimal. We propose to keep the format of the ontology (RDF triples), but provide a new resource classification ontology, so that the types and quality of the information later annotated are enriched for the aforementioned purposes. The proposed ontology for resource classification and identification is inspired by the categorisation approach followed in [91], merged with the resource annotation by tags [98]. A graphical representation of the proposed ontology is provided in the image below (Figure 4.27).

It is important to note that the provided ontology covers the context-agnostic application scenario, and that the context-specific ontology data is to be added by the developers and modellers. This is done in the example provided in the "Evaluation" section, where an ontology for the Social Library motivating scenario is provided. Note that the proposed base ontology is rather simple, as we favour extending it with the relevant application-context, while supporting the introduced tag notion.

Figure 4.27: Application-context ontology. The domain-specific section (which varies depending on the use case) is marked in blue.

The main idea behind this design is to provide support for the tag-based annotation of resources and for complex interest definition over tags (further explained in the following subsections). The tags are directly mapped to the common knowledge base (RDF ontology), thus conveying a common ground for reasoning on the application nodes. Another advantage of linking such tags directly to the ontology is that tags can be grouped, and relations between tag-annotated resources extracted, with a simple lookup in the ontology graph. In order to better exemplify the proposed ontology, we introduce a brief example (focusing on the domain-specific section of the ontology).
Given the simplistic application ontology below (Figure 4.28), which provides information about a set of movie genres (horror, action, adventure, epic and historical), the following resource tags are generated: movie, movie-horror, movie-action, movie-adventure, movie-adventure-epic and movie-adventure-historical. Each tag uniquely identifies one ontology concept.

Figure 4.28: Graphical representation of a movie genre ontology. The tag for each class is provided between brackets.

Note that within the ontology above some tags are nested as sub-categories (movie-adventure-epic and movie-adventure-historical are nested below movie-adventure).
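The tag generation described in this example can be illustrated with a minimal sketch. The dictionary encoding of the ontology, the helper names `tag_for` and `parent_tag`, and the assumption that concept names contain no hyphen are all hypothetical choices made for illustration; they are not part of PRIME.

```python
# Hypothetical encoding of the movie genre ontology from the example:
# each concept maps to its parent concept (None for the root).
PARENT = {
    "movie": None,
    "horror": "movie",
    "action": "movie",
    "adventure": "movie",
    "epic": "adventure",
    "historical": "adventure",
}


def tag_for(concept, parent=PARENT):
    """Build a tag by joining the concept names on the path from the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return "-".join(reversed(path))


def parent_tag(tag):
    """Return the tag one level up in the hierarchy, or None for the root tag.

    Assumes concept names themselves contain no hyphen.
    """
    head, sep, _ = tag.rpartition("-")
    return head if sep else None


# One tag per ontology concept, sorted for a stable listing.
TAGS = sorted(tag_for(concept) for concept in PARENT)

if __name__ == "__main__":
    for tag in TAGS:
        print(tag, "<-", parent_tag(tag))
```

Because each tag encodes its path from the root, the nesting of movie-adventure-epic below movie-adventure can be recovered from the tag alone in this sketch; a real deployment would more likely resolve the hierarchy against the RDF ontology itself.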


By having a one-to-one mapping from tags to the hierarchical ontology entities, it is trivial to find common ancestors or successors from tag sets. This provides great flexibility, as tags conceptually keep the semantic meaning and hierarchical level of the mapped ontology classes. That said, it is critical to mention that the descriptiveness of the tags is bounded by the granularity of the ontology. Generic ontologies will provide generic tag sets with low descriptive granularity, and overly specific ontologies will generate overly specific tag sets. Also, considering that tags are grounded to the ontology, the extensibility of tags (and of the ontology) is limited by the capabilities of ontology extension at runtime. This feature is under active research and constitutes a factor to consider.

Resource management

Once the principles for application-wide knowledge sharing have been laid down, we delve into the resource management aspect of the PRIME extension. In order to present the contribution, we follow a similar structure to the previous subsection: recalling the current approach to resource management, pinpointing the shortcomings and introducing our solution.

In the current version of PRIME, a PRIME application can be composed of N resources, and each one of those resources maps directly to one ontology concept (class). Through the auri and curi class properties, those resources are uniquely identified and made available. The number of resources inside one application is not limited, but a resource can only be mapped to one ontology class at a time.

The main issue with the unique mapping is that relevant information granularity is lost. Depending on the context and application complexity, some resources may well relate to more than one concept expressed by the ontology. Such cases cannot be implemented with the current approach, and potentially crucial data is lost.
An illustrative example is provided below with an ontology and a set of annotated resources, following the current (before the extension) approach (Figure 4.29).

Figure 4.29: Simple ontology with current (prior to extension) resource annotation. One ontology concept (linked by auri) per annotated resource.

Directly linked to the mentioned individual mapping, we consider that the resource lookup is currently simplistic. Complex resource lookups concerning multiple parameters (e.g. resources fulfilling or simultaneously related to a set of ontology concepts) are not possible, and there is no means of inferring similarity and relations between ontology concepts. Lookups are conducted only at the auri level, querying all the resources linked to a specific ontology concept. While useful for specific use cases, the approach falls short if user interests over resources are to be expressed as lookup operations.

In order to solve the first challenge, we propose to extend PRIME by allowing resources to be annotated with sets of links to multiple ontology concepts (identified with tags, as previously explained). By doing so, a given resource can be related to multiple concepts represented by the ontology, notably enriching the available information. As an example, below we introduce a simple ontology and an example of resource mapping (Figure 4.30).

Figure 4.30: Simple ontology with extension resource annotation. Multiple ontology concepts (linked by tags) per annotated resource.

Once the resources are annotated with a set of tags (a set of ontology concept links), more complex lookup operations become available. Aside from the syntactic comparison against a set of tags expressed in a lookup, the direct mapping of tags to the hierarchical knowledge base (ontology) allows the request to be broadcast over related nodes in the hierarchical tree of knowledge. How the grouping and lookup of resources by tags are handled will be explained in detail in the following section "Groupings", as we consider that, to fully comprehend how the lookup and tag resolution are handled, the interest definition and grouping sections of the extension must first be introduced.

Interest definition

After presenting how the application-context handling and resource management are accomplished, we present our approach to interest definition. Taking into account that the thesis focuses on engineering FI content-centric applications, we consider it paramount to support user interest definitions over contents (application resources), in order to provide extended capabilities based on user preferences over contents.
Aligned with the previously explained status of PRIME, the current version of the middleware neither supports interest definition nor exploits any application capabilities derived from user interests. For that reason, we propose a novel approach to user interest definition that enables users to express complex interests over contents, denominated Complex Interest (CI).

As presented in the state-of-the-art section, the main options for interest definition are tags and semantic annotations. Recommender systems and online commerce sites build user profiles, grouping multiple interests (over a specific element or ontology class), which are later used for providing custom recommendations. However, the expressiveness of those interests is quite limited, often only expressing whether a user likes or dislikes a specific content. In our case, we provide the means of expression for more complex user-interest definitions, to enable users to express interests over multiple concepts and their operations (considering the concepts as sets): unions, intersections and differences.

Considering that resources are annotated with sets of tags (mapping to ontology concepts), we propose to compose Complex Interests by describing operations over those tags. The operations are semantically grounded, as the mapping to the ontology can be leveraged at the time of operation resolution. In order to do so, we propose a formal language that defines a set of productions. It is important to note that the proposed solution is a first step towards the construction of more complex interest definitions, and that the Complex Interest concept can be easily ported to any other domain that can benefit from the definition of complex interests and maintains a shared knowledge base (so that all the system instances agree on the meaning of the referred concepts).

The reason to choose a formal language to define the interests is to provide a unified syntactical definition of the interest-defining expressions. The expressions can be parsed by a computer with the help of the context-free grammar, and new expressions can be generated automatically in a repeatable manner. This allows consistent extensibility of expressions when compared with a non-formal definition format and, in the specific case of ontology-tag expressions, enables the definition of standard set operations.

Context-free grammar

The formal language is described by a context-free grammar (CFG), a set of productions (recursive rewriting rules) that are used to generate string patterns. The CFG productions of Complex Interest are listed below.

1. <expression> → ( <expression> )
2. <expression> → <expression> ∪ <expression>
3. <expression> → <expression> ∩ <expression>
4. <expression> → <expression> \ <expression>
5. <expression> → tag

It is important to mention that the first production provides recursive support, as multiple binary expressions can be chained with parentheses. The proposed grammar has a single terminal, tag. The terminal tag encompasses all the tags defined by the ontology (as ontology concepts). Therefore, a CI expression is composed of a set of tags and combinatory operations over them (over the resource sets that they represent). The available operations are mapped to a subset of the available binary operations over sets, explained below.

In order to set a common ground for the later examples of CI expressions, we lay out the basics of an example. Provided that a given PRIME system runs over the ontology $O$, $T$ is the set of tags provided by $O$ (its size is the number of concepts in the ontology, $n$): $T = \{T_1, T_2, T_3, T_4, \ldots, T_n\}$. For our examples we propose a hierarchical ontology where the element "E1" (with tag $T_1$) is the root concept and $n = 5$, structured as follows (Figure 4.31). The resource set corresponding to each tag is represented as $ST_x$ ($x$ being the related tag number).
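To make the grammar more tangible, the following is a minimal sketch of a recursive-descent evaluator for CI expressions. Several things in it are assumptions rather than part of the thesis: the grammar as listed does not fix precedence or associativity, so the sketch treats all three operators as equal in precedence and left-associative; tags are assumed to be made of letters, digits, underscores and hyphens; and the resource sets in `RESOURCES_BY_TAG` are invented sample data, not the contents of Figure 4.31.

```python
import re

# A token is a parenthesis, one of the three set operators, or a tag.
TOKEN = re.compile(r"\s*([()∪∩\\]|[A-Za-z0-9_-]+)")

# Each CI operator maps to the corresponding Python set operation.
OPERATORS = {
    "∪": set.union,
    "∩": set.intersection,
    "\\": set.difference,
}


def tokenize(text):
    """Split a CI expression into tokens, rejecting unknown characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, resources_by_tag):
    """Evaluate a CI expression to the set of resources it denotes."""
    tokens = tokenize(text)
    pos = 0

    def operand():
        # Productions 1 and 5: a parenthesised expression or a single tag.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in OPERATORS or token == ")":
            raise ValueError(f"unexpected token {token!r}")
        if token not in resources_by_tag:
            raise ValueError(f"unknown tag {token!r}")
        return set(resources_by_tag[token])

    def expression():
        # Productions 2 to 4, applied left to right (an assumption).
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            operator = OPERATORS[tokens[pos]]
            pos += 1
            value = operator(value, operand())
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


# Invented sample data: the resource set ST_x for each tag T_x.
RESOURCES_BY_TAG = {
    "T1": {"r1", "r2", "r3", "r4"},
    "T2": {"r1", "r2", "r3"},
    "T3": {"r4"},
    "T4": {"r1", "r2"},
    "T5": {"r2", "r3"},
}

if __name__ == "__main__":
    for ci in ["T2 ∪ T3", "T4 ∩ T5", "T2 \\ T5", "(T4 ∪ T5) ∩ T1"]:
        print(ci, "=>", sorted(evaluate(ci, RESOURCES_BY_TAG)))
```

This sketch resolves tags by a plain dictionary lookup. The semantic grounding described in the text, where the ontology is consulted at resolution time, would replace that lookup with a query against the shared knowledge base.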


Figure 4.31: Simple ontology with tags.

The second production covers the union between two sets. The outcome ($S$) of the CI expression $S = T_2 \cup T_3$ is $S = ST_2 \cup ST_3$. A graphical representation of the result is provided below (Figure 4.32).

Figure 4.32: Union operation between sets, Venn diagram. The resulting set is highlighted in blue.

The third production covers the intersection. The outcome of the expression $S = T_4 \cap T_5$ is $S = ST_4 \cap ST_5$. A graphical representation of the result is provided below (Figure 4.33).

Figure 4.33: Intersection operation between sets, Venn diagram. The resulting set is highlighted in blue.


The fourth production covers the relative complement operation. The outcome of the expression $S = T_2 \setminus T_5$ is $S = ST_2 \setminus ST_5$. A graphical representation of the result is provided below (Figure 4.34).

Figure 4.34: Relative complement operation between sets, Venn diagram. The resulting set is highlighted in blue.

Finally, we consider it essential to note that more advanced expressions are available by combining the previously mentioned operations. We will provide examples of such expressions in the "Evaluation" section.

Groupings

Finally, we conclude the presentation of I-PRIME with the proposed solution to grouping. The previously introduced features are paramount, as grouping features are built on top of the new paradigms for application-context management, resource annotation and, especially, the CI interest definition. This subsection directly tackles the third research question: "How to enable interest-focused service organisation through semantic-based dynamic groupings in PEC scenarios?".

As previously said, the thesis topic revolves around the idea of content-centric applications, where content and user preferences are leveraged to provide an enriched user experience over the existing content. The high-level conceptual solution to the research question aims to allow users to dynamically join groups of interests, containing users with shared interests and the resources that fulfil those interests.

In the existing approach of PRIME, resources are only grouped based on their type (using the auri, directly mapped to an ontology concept). While this provides basic lookup capabilities, there is no support for interest-focused service organisation (grouping). Based on the existing RabbitMQ approach for distributed group management, we propose to extend the grouping approach of PRIME, to allow dynamic groups defined by user interests to emerge.
We break down the proposed extension into two central sections, each linked to one of the group types that will be generated. First, the producer groups are introduced, and later the interest-focused groups. It is important to note that both groupings base their operation on the notion of shared storage, for keeping track of the existing groups and system-wide operational data.

Producer groups

The producer groups, as the name suggests, are solely composed of the resource producers of the application. Based on the previously presented idea of a common application-context (by ontology), we propose to separate the strictly resource-related groups from the groups constructed with those resources for other purposes (e.g. interest-based content recommendation). The number of groups in this category will be strictly linked to the ontological concepts present in the ontology and to the available resources.

The central concept is to aggregate the resources based on their tags. Due to the available combinatory options concerning the tag annotations, we first decided to create as many resource groups as there are distinct ontological tags used for the annotation of services. As an example, provided that one PRIME application exposes a resource ($R$) annotated with the tag set $\{T_2, T_3\}$, two distinct resource-producer groups will be generated: group $T_2$ and group $T_3$. This group division approach is chosen in order to be able to group resources that provide partially similar annotations (the shared annotation tag subset). A graphical representation of the mentioned example is attached below (Figure 4.35), along with the supporting ontology.

Figure 4.35: Simple resource producer grouping from tag annotation sets.

It is important to note that we have also considered the potential size of the domain-specific ontologies. Depending on the amount of domain-specific knowledge that those structures contain, the granularity and complexity will increase. This poses a challenge to the approach of creating as many groups as there are distinct tags used for the exposed resource annotation, as the number of groups would grow linearly with the number of ontology concepts. To mitigate this situation, we propose a domain-specific ontology granularity threshold, to be set by the domain experts. The purpose of such a threshold is to set a commonly agreed ontology level, upon which the resources will be grouped.
Resources annotated with tags below the chosen level of granularity (a hierarchical level in the ontology, below the commonly agreed threshold) will be added to producer groups according to the commonly agreed denominator (by recursive ascent). By doing so, a potentially excessive number of groups is avoided (the maximum number of groups is the sum of ontological concepts from the root to the threshold level), and the executed lookups will run against those groups. It is important to mention that even if the resources are grouped according to more general concepts (higher in the ontology), they keep their individual annotation information. Concrete lookup requests (towards lower-level concepts in the ontology) can be executed over general groups, as each participant will evaluate each request. A graphical representation of the mentioned granularity threshold applied to producer group creation is added below (Figure 4.36).

Having covered how the producer groups are constructed with respect to the application-context ontology, and to complete the proposed solution, we explain how the producer group management is conducted. It is important to highlight that this process is agnostic to the selected technology stack.
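The granularity threshold and the recursive ascent described above can be sketched as follows. The ontology shape, the convention that the root sits at level 1, and the function names are assumptions made for illustration; only the idea of abstracting a deep tag such as T8 into the group of its ancestor at the threshold level comes from the text.

```python
# Hypothetical ontology: each tag maps to its parent tag (None for the root).
PARENT = {
    "T1": None,
    "T2": "T1",
    "T3": "T1",
    "T4": "T2",
    "T5": "T2",
    "T8": "T4",
}


def depth(tag, parent=PARENT):
    """Hierarchical level of a tag, counting the root as level 1."""
    level = 1
    while parent[tag] is not None:
        tag = parent[tag]
        level += 1
    return level


def group_tag(tag, threshold, parent=PARENT):
    """Ascend from a tag until it sits at or above the threshold level."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1 (the root level)")
    while depth(tag, parent) > threshold:
        tag = parent[tag]
    return tag


def producer_groups(resources, threshold, parent=PARENT):
    """Map each producer group (named by a tag) to the resources it contains.

    `resources` maps a resource name to its annotated tag set. The original
    annotations are left untouched, so fine-grained lookups remain possible.
    """
    groups = {}
    for name, tags in resources.items():
        for tag in tags:
            groups.setdefault(group_tag(tag, threshold, parent), set()).add(name)
    return groups


if __name__ == "__main__":
    annotated = {"R1": {"T2", "T3"}, "R2": {"T8"}}
    # With the threshold at level 3, T8 (level 4) is abstracted into group T4.
    print(producer_groups(annotated, threshold=3))
```

In this sketch the number of groups can never exceed the number of concepts at or above the threshold level, which mirrors the bound stated in the text.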

Figure 4.36: Complex resource producer grouping from tag annotation sets. The granularity threshold is highlighted in yellow and $T_8$ is abstracted into group $T_4$.

A group is created when no group exists yet for a particular tag. Linked with the granularity threshold concept previously presented, it is essential to clarify that some groups will be created from the tags inferred by recursive ascent. Once a PRIME application joins the system, it checks the grouping for each of its resources. If there is no existing group for a tag, the application creates the group and joins it. If the group already exists, the application joins it automatically. As soon as the PRIME application leaves the system, all of its resources leave the producer groups in which they were participating.

When checking for the existence of a group while a granularity threshold is set, the ontology structure is used. As groups for more granular tags are created at the level of the threshold, recursive ascent is applied over the structure of the ontology, up to the threshold, when looking for groups. If after the ascent there is still no group for the given tag, the group is created. That said, a more detailed example is provided in the "Evaluation" section, for the sake of understandability.

Interest-focused groups

These groups encompass both users and producers around shared interests. First, users define some interest towards a set of resources (using the CI vocabulary presented before), and then users with the same interests are grouped, along with the resources that fulfil those interests. By doing so, users with similar interests are connected, enabling communication between users and access to all the resources that fulfil that specific interest. Every distinct interest will generate a different group, and users will be part of as many groups as interests they define.
As an example, in the context of a music streaming application, users define which music genres (specified in the application-context) they are interested in, and they are then grouped by similar interests. Once grouped, they are able to connect with similar (interest-sharing) peers and find more songs (resources) that fulfil their interests. Those capabilities open up many opportunities for implementing content-centric features in service-composed environments.

As mentioned above, the interest-defined groups are created around user interests, in our case described by Complex Interest expressions. CI expressions use application-context ontology concepts (tags) as terminals, which are reused for the group construction process. We briefly explain how the group creation process is executed, and highlight the importance of CI expressions alongside the previously presented producer groups.

CI expressions are built as operations over sets of resources, referred to by tags. As we previously noted, tags are directly mapped to the hierarchical ontology, allowing for traversal operations (ascent, descent). In the previous section, we presented the producer groups, which contain the application resources grouped by ontology tags. Keeping those elements in mind, we present the group creation and join operations by example.

Group creation occurs when a user defines a unique interest, structured as a CI expression, in the context of the running system. Given an ontology with the concepts (element:tag tuples) $\{(E1 : T_1), (E2 : T_2), (E3 : T_3), (E4 : T_4), (E5 : T_5)\}$, the resources with the annotated tags $R1 = \{T_1, T_2, T_4\}$ and $R2 = \{T_1, T_2\}$, and a user with the interests I1 = T1 T2 and $I2 = T_1 \setminus T_4$, the outcome is the following (Figure 4.37). In the image, the ontology, resources, interests and outcome groups are presented.

Figure 4.37: Interest-defined grouping. On the right side of the groups, the participant resources are declared.

When an already existing interest is instantiated, the new user joins the existing interest-defined group instead of creating a new one. In the context of the previous example, if a new user declares the interest I3 = (T4 ∪ T5) T1, the user will be part of the previously created interest-defined group G1. Even if the expression is syntactically different, by using the information available in the ontology, we know that the outcome set of $T_4 \cup T_5$ is equal to the set of elements of $T_2$.

The resources that belong to those groups are identified by executing lookups in the producer groups. Guided recursive lookup operations are executed, traversing the ontology tree using the tags composing the CI expressions and then operating over the result sets.
With regard to interest comparison, a measure similar to the granularity threshold available in the producer groups could potentially be implemented to allow more flexible interest matching. However, for the proposed architectural extension and considering the scope of the thesis, only equal interests (with the same result set over the ontology) have been considered. A more detailed running example of these groupings will be provided in the "Evaluation" section.
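The create-or-join behaviour for interest-focused groups, with matching restricted to equal interests, can be sketched as below. This is one possible reading, not the I-PRIME implementation: the sketch compares interests by expanding every tag into the leaf concepts beneath it, which is an assumption about what "the same result set over the ontology" means. The ontology shape, the tuple encoding of expressions, the choice of intersection in the sample interests, and the `InterestGroups` class are likewise hypothetical.

```python
# Hypothetical ontology: each tag maps to the list of its child tags.
CHILDREN = {
    "T1": ["T2", "T3"],
    "T2": ["T4", "T5"],
    "T3": [],
    "T4": [],
    "T5": [],
}


def leaves(tag, children=CHILDREN):
    """Expand a tag into the set of leaf concepts in its subtree."""
    below = children[tag]
    if not below:
        return frozenset([tag])
    return frozenset().union(*(leaves(child, children) for child in below))


def canonical(expr, children=CHILDREN):
    """Reduce an expression to a set of leaf concepts.

    An expression is either a tag (str) or a tuple (operator, left, right)
    with operator in {"union", "intersection", "difference"}.
    """
    if isinstance(expr, str):
        return leaves(expr, children)
    operator, left, right = expr
    a, b = canonical(left, children), canonical(right, children)
    if operator == "union":
        return a | b
    if operator == "intersection":
        return a & b
    if operator == "difference":
        return a - b
    raise ValueError(f"unknown operator {operator!r}")


class InterestGroups:
    """Keeps one group of users per distinct (canonical) interest."""

    def __init__(self):
        self.members = {}  # canonical interest -> set of user names

    def declare(self, user, expr):
        """Join the group for this interest, creating it if it is new.

        Returns the group key and whether the group was newly created.
        """
        key = canonical(expr)
        created = key not in self.members
        self.members.setdefault(key, set()).add(user)
        return key, created


if __name__ == "__main__":
    registry = InterestGroups()
    # Assumed sample interests; the operators are chosen for illustration.
    print(registry.declare("alice", ("intersection", "T1", "T2")))
    print(registry.declare("bob", ("intersection", ("union", "T4", "T5"), "T1")))
    print(registry.declare("alice", ("difference", "T1", "T4")))
```

Under these assumptions, the two syntactically different intersection interests reduce to the same leaf set, so the second user joins the group created by the first. A threshold-based relaxation of the kind mentioned in the text would replace the exact key comparison with a similarity measure.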

5 Evaluation

In the following section, we evaluate the validity of the presented architectural extension over PRIME, taking into account its support for the functional requirements of Social Library. Each of the functional requirements of Social Library is analysed in depth, with examples provided. We also discuss the results of the research loop started with the research questions. The section is divided into the following subsections: running example description, architectural extension evaluation and result discussion.

5.1 Running example

Before evaluating I-PRIME, we consider it relevant to set out a solid example. Based on the context provided by the motivating scenario, we specify the details of a running system, from the application-context ontology to resources, user interests and groupings.

5.1.1 Application-context ontology

The application context that we provide is constructed to support the Social Library application. Considering that the domain encompasses media content of different types and genres, we propose the following ontology (Figure 5.38). The ontology code is provided in "Appendix 2" due to its length.

Figure 5.38: Social Library running example ontology. The granularity threshold is marked in yellow.

The image above displays the ontology concepts and the tags. Note that the concepts outside of the domain-specific ontology section (Thing and Resource) do not have tags, and that a granularity threshold has been set on the fourth level of the ontology, marked in yellow. We acknowledge that the proposed ontology is somewhat simplistic, as media-categorising ontologies can be more extensive. Instead of an exhaustive ontology, the idea is to provide a realistic example that suffices to prove the validity of the proposed architectural extension while keeping the complexity manageable.

5.1.2 Resources

In the running instance of Social Library, we consider the following list of available resources (Table 5.2). For each resource, we provide the annotated tag set and provider information.

Resource   Annotation Tags               Provider
R1         {book, thriller, adventure}   Sara
R2         {short}                       Sara
R3         {feature-film, action}        John
R4         {audio, black-comedy}         John
R5         {short, epic, adventure}      Alex
R6         {comedy}                      Alex

Table 5.2: Listing of the available resources, their annotation tags and the provider actors.

The table above presents the six available resources and their providers. As previously mentioned, PRIME applications are prosumers. Thus, the above-identified providers are PRIME applications that expose the listed resources. Aside from providing, they can also consume, and there may be more consumer-only actors not listed above. The producer groups generated from those resources are presented in the next subsection, "Resource groups". It is important to note that the identity (unique string) of the resource providers is later used to identify the actors who compose the different groups.

5.1.3 Resource groups

Applying the approach introduced in the extension to the resources presented above, a set of producer groups is created (Figure 5.39). The granularity threshold is applied when grouping resources annotated with tags below it, such as R1 or R3.

Figure 5.39: Social Library producer groups.

5.1.4 Users

The users of the system are the resource consumers. In the case of Social Library, they are Sara, Alex and Mark. Sara and Alex are also producers, but they count as users because they express interest in specific ontology concepts through Complex Interest expressions. Those interests are later leveraged for creating interest-defined groups, which bring together users with common interests and the resources that fulfil those interests.
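The producer grouping described under "Resource groups" can be sketched as follows. This is a rough illustration, not the thesis ontology: the tree in `PARENT` is a hypothetical stand-in for Figure 5.38, and the sketch assumes that a tag lying below the granularity threshold is grouped under its ancestor at the threshold level. Only the resource table is taken from Table 5.2.

```python
from collections import defaultdict

# Hypothetical ontology fragment as child -> parent links. The real ontology
# (Figure 5.38) is not reproduced here; this shape is assumed for illustration.
PARENT = {
    "Resource": "Thing",
    "Type": "Resource", "Genre": "Resource",
    "text": "Type", "video": "Type", "audio": "Type",
    "action": "Genre", "comedy": "Genre",
    "book": "text", "short": "video", "feature-film": "video",
    "thriller": "action", "adventure": "action", "epic": "action",
    "black-comedy": "comedy",
}

# Resources and their annotation tags, as listed in Table 5.2.
RESOURCES = {
    "R1": {"book", "thriller", "adventure"},
    "R2": {"short"},
    "R3": {"feature-film", "action"},
    "R4": {"audio", "black-comedy"},
    "R5": {"short", "epic", "adventure"},
    "R6": {"comedy"},
}


def depth(concept):
    """Level of a concept in the tree, with the root (Thing) at level 1."""
    level = 1
    while concept in PARENT:
        concept = PARENT[concept]
        level += 1
    return level


def group_tag(tag, threshold=4):
    """Climb from `tag` to its ancestor at the threshold level, if it is deeper."""
    while depth(tag) > threshold:
        tag = PARENT[tag]
    return tag


def producer_groups(resources, threshold=4):
    """Map each threshold-level tag to the resources annotated at or below it."""
    groups = defaultdict(set)
    for name, tags in resources.items():
        for tag in tags:
            groups[group_tag(tag, threshold)].add(name)
    return dict(groups)
```

With this assumed tree, `producer_groups(RESOURCES)` places R1 in the `text` and `action` groups, because `book`, `thriller` and `adventure` sit one level below the threshold. A different ontology shape would yield different groups.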

5.1.5 Interests

As mentioned above, the user interests are defined as CI expressions. The individual interests of the users of the system are displayed in Table 5.3. Note that, for the sake of brevity, each user defines at most two interests in this example. This is unlikely to occur on a real setup, as users tend to define a set of different interests.

User   Interest
Sara   I1 = video ∪ action
Sara   I2 = audio ∩ comedy
Alex   I3 = video ∪ action
Alex   I4 = short \ epic
Mark   I5 = comedy ∩ audio

Table 5.3: Users and their interests.

5.1.6 Interest groups

Based on the CI expressions defining the interests of the system users, a set of interest-defined groups is created: group G1, group G2 and group G3. The groups and their members are described in the table below (Table 5.4).

Name       Users        Resources
group G1   Sara, Alex   R1, R2, R3, R5
group G2   Sara, Mark   R4
group G3   Alex         R2

Table 5.4: Interest group names, users and resources.

The interests I1 and I3 are equivalent, and so are I2 and I5. In the second case, even though the expressions are not syntactically identical, the result sets are equal: applying set theory, if X = A ∩ B and Y = B ∩ A, then X = Y.

5.2 Extension evaluation

This section focuses on the compliance of PRIME with the functional requirements of Social Library. We do not evaluate other aspects of the system, such as the ease of deployment, performance per middleware instance or choice of virtualisation platform (currently Docker is recommended). Despite the indisputable importance of those aspects, the report aims to tackle issues and propose solutions at the design level, in order to provide a broader view that is useful for developers of other middleware approaches who may share a similar problem domain. That said, the cross-cutting concerns of scalability and extensibility of the design are discussed after the functional requirement fulfilment is covered.
We acknowledge the importance of considering other non-functional requirements, such as middleware performance or reliability, which in this research are deferred to the implementation. Being highly dependent on the implementation stack and running environment, further assessment of those non-functional requirements is to be done on future implementation scenarios, discussed in the next section of the report.

With regard to the functional requirements, the first one (R1) states that users should be able to access the system from multiple devices (smartphones, laptops, tablets, etc.). The middleware fulfils this requirement as it is Java-based, and the Java VM is widely supported by those devices. The most used operating systems support it (Windows, Mac OS, Ubuntu, Android, iOS, or Windows Phone), and even many e-book readers and printers can execute the Java VM [104].

The second requirement (R2) states that users may join or leave the system dynamically. PRIME fulfils this by preserving a set of fluid architecture [105] principles: loose coupling, flexibility, dynamism and serendipity. Thanks to flexibility, the number of connected users can change at runtime without disrupting the system. Once a user is connected, the distributed DNS adds the user's record, which is then available for discovery and usage by the lookup service. When the user disconnects, the DNS removes the record, and the lookup service is no longer able to locate it. The architectural fluidity concepts that PRIME is built around are illustrated in the next image (Figure 5.40). In the extension's approach to grouping, such dynamism is not an issue, as the participants can join and leave the system and groups dynamically.

Figure 5.40: Primary concepts of architectural fluidity.

The third requirement (R3) states that the system users are autonomous and that the running system instance has no prior knowledge about them. The middleware achieves this through the dynamism and serendipity of the fluid architecture. PRIME applications are autonomous and standalone, which enables the sharing of new and unforeseen resources at runtime. The sharing of those resources is driven by ontologies. The application context is semantically modelled, and as long as the ontology accommodates flexible resource types by generalisation, the system fulfils the requirement. In the previous example, when the system users (Sara, John, Mark and Alex) join the system along with the provided resources, the running instance has no prior knowledge about them and accommodates them dynamically.

The fourth functional requirement (R4) states that the types of resources shared through the system vary over time. As previously mentioned, in the specific case of PRIME, resources are semantically annotated with tag sets that map to the ontology. The ontology is deployed within every node, and new or modified resources are evaluated at runtime. Leveraging the use case provided above, we exemplify that resource types can vary over time by removing one resource and modifying the annotation tags of an existing one (Figure 5.41). Such operations are routine, whether to add new resources, correct existing information or simply remove resources.

Figure 5.41: New grouping disposition after resource modification. The additions are highlighted in bold and the removals struck through.

The fifth requirement (R5) of Social Library is that all the system users must be discoverable and reachable by other system users. This requirement is directly connected to the third pillar of a fluid architecture, dynamism. Leveraging the semantic annotation of services and service discovery mechanisms, all the system users are made available for discovery through the producer groups and the broadcast channel (using tag queries and RabbitMQ). The lookup service, along with the DNS, keeps track of the available resources, and tag-based lookups are resolved against the producer groups. As previously mentioned, in the case of the PRIME extension, all connected actors (PRIME applications) of the system are by default subscribed to the decentralised "Broadcast" channel, where any member can broadcast, multicast or unicast resource matching requests (to the producer groups) or broadcast to all the system users, fulfilling the R5 requirement.

The sixth requirement (R6) states that users can define their interests in resources. In the case of the PRIME extension, this requirement is fulfilled by the CI vocabulary. Users can compose complex interest expressions, which result in sets of fulfilling resources. Using the CI expressions makes it possible to compare interests and evaluate their similarity. How the users define interests has already been detailed and exemplified in the previous "Interest groups" section.

The seventh functional requirement (R7) that Social Library presents is the following: the system users must be aggregated into emerging communities based on shared interests. The previous version of PRIME was not able to fulfil this requirement, as the client interests were considered neither at the design nor at the modelling phase of PRIME. With the extension, interest-defined grouping has been enabled and operates dynamically. When a set of users declare a common interest, they are grouped, along with the interest-fulfilling resources. By doing so, discovery and interaction within those units are enabled. In the previous use case example, two interest groups emerge with more than one user: group G1 (Sara, Alex) and group G2 (Sara, Mark).
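The emergence of communities in the running example can be reproduced with a short sketch under stated assumptions. The tag-to-resource sets in `LOOKUP` are hypothetical producer-group lookups, chosen so that the results agree with Table 5.4. The sketch reads I1 and I3 as unions and I2 and I5 as intersections, which is the reading consistent with the resources listed in that table. As a simplification, interests are compared by their resolved resource sets rather than by their result sets over the ontology.

```python
from collections import defaultdict

# Assumed producer-group lookups: tag -> resources resolved for that tag.
# These sets are hypothetical; they were chosen to agree with Table 5.4.
LOOKUP = {
    "video": {"R2", "R3", "R5"},
    "action": {"R1", "R3", "R5"},
    "audio": {"R4"},
    "comedy": {"R4", "R6"},
    "short": {"R2", "R5"},
    "epic": {"R5"},
}

# Interests from Table 5.3, written as functions over the lookup table.
INTERESTS = [
    ("Sara", "I1", lambda t: t["video"] | t["action"]),
    ("Sara", "I2", lambda t: t["audio"] & t["comedy"]),
    ("Alex", "I3", lambda t: t["video"] | t["action"]),
    ("Alex", "I4", lambda t: t["short"] - t["epic"]),
    ("Mark", "I5", lambda t: t["comedy"] & t["audio"]),
]


def interest_groups(interests, lookup):
    """Group users whose interests evaluate to the same resource set."""
    groups = defaultdict(list)  # frozenset of resources -> users, in order
    for user, _name, expr in interests:
        groups[frozenset(expr(lookup))].append(user)
    return dict(groups)


def communities(groups):
    """Keep only the groups shared by more than one user."""
    return {res: users for res, users in groups.items() if len(users) > 1}
```

Calling `communities(interest_groups(INTERESTS, LOOKUP))` keeps the two groups shared by several users, corresponding to G1 and G2, and drops the single-user group G3.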
