New data structures to reduce data size and search time


Tsuneo Kuwabara
Department of Information Sciences, Faculty of Science, Kanagawa University, Hiratsuka-shi, Japan
FIT2018 1D-1, No. 2, pp. 1-4
Copyright (c) 2018 by The Institute of Electronics, Information and Communication Engineers, and Information Processing Society of Japan. Permission number: 18TB0078

1. Introduction
Reducing search time is an important problem for databases. Indexing methods have commonly been used to reduce search time; however, they do not reduce the data size. Here, I propose data structures and searching methods that reduce both search time and data size [1],[2],[3]. The proposed methods maintain data normalization and integrity. They are also independent of indexing methods, so the two can be used simultaneously. In this paper, the principles of the proposed data structures are described in Section 2. Some applicable templates and simulation results for the reduction rates are shown in Sections 3 and 4, respectively. The updating methods, which are also important processes, are shown in Section 5 [3],[4]. Section 6 gives the conclusions.

2. Principles
The principles of the proposed data structures are shown in Fig. 1. Table A in Fig. 1 is an example of a conventional data structure, in which the values of Items A and B have many-to-many relations with each other. Tables B, C, and D are examples of the proposed data structures. Multiple values of Item A and Item B that are related to each other in Table A are assigned to the same group in Tables B and C. A group records that every one of its values of Item A is related to every one of its values of Item B, so a group with m values of Item A and n values of Item B replaces m × n rows of Table A with m + n rows. The remaining relations between Item A and Item B, which can be stored in neither Table B nor Table C, are recorded in Table D. In the example shown in Fig. 1, the proposed data structures reduce the total data size by about 25% relative to the conventional data structure. The reduction in total data size in turn reduces search time.
Fig. 1 Principles of the proposed data structures. [Figure: conventional data structure Table A (Item A, Item B), 4000 rows; proposed data structures Table B (Group, Item A) and Table C (Group, Item B), related through the same group, together 3000 rows, plus Table D (Item A, Item B), 4 rows; total 3004 rows, approximately 25% fewer than the conventional data structure in this example.]
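The grouping principle can be illustrated with a small in-memory sketch. It assumes the groups are chosen by hand (how groups are found is not described here), and the function names `decompose` and `related_b` and the sample values are hypothetical. The lookup also shows why Table C can be skipped when a value of Item A belongs to no group.

```python
from itertools import product


def decompose(pairs, groups):
    """Split conventional Table A rows into Tables B, C and D.

    pairs:  set of (item_a, item_b) rows of Table A
    groups: hypothetical, hand-chosen {group_id: (items_a, items_b)}; every
            combination in items_a x items_b must already be a row of Table A.
    """
    table_b, table_c, covered = [], [], set()
    for g, (items_a, items_b) in groups.items():
        block = set(product(items_a, items_b))
        if not block <= pairs:
            raise ValueError(f"group {g!r} does not relate all of its members")
        table_b += [(g, a) for a in sorted(items_a)]  # Group, Item A
        table_c += [(g, b) for b in sorted(items_b)]  # Group, Item B
        covered |= block
    table_d = sorted(pairs - covered)  # relations that no group expresses
    return table_b, table_c, table_d


def related_b(a, table_b, table_c, table_d):
    """Values of Item B related to value a of Item A."""
    groups = {g for g, item_a in table_b if item_a == a}
    result = set()
    if groups:  # Table C is consulted only when a belongs to some group
        result |= {b for g, b in table_c if g in groups}
    result |= {b for item_a, b in table_d if item_a == a}
    return result


pairs = {(a, b) for a in "ab" for b in range(1, 5)} | {("c", 1), ("d", 5), ("d", 6)}
tb, tc, td = decompose(pairs, {"X": ({"a", "b"}, {1, 2, 3, 4})})
print(len(pairs), len(tb) + len(tc) + len(td))  # 11 9
print(related_b("d", tb, tc, td))               # {5, 6}
```

In this toy case the 8 rows relating a and b to 1 through 4 become 2 + 4 grouped rows; the example is much smaller than Fig. 1, so the saving is not the same 25%.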

Moreover, in some cases, some tables can be omitted from the search. For example, when the values of Item B related to value d of Item A are to be checked, only Table D needs to be searched, because value d of Item A is not related to any group in Table B.

3. Applicable Template
Fig. 2 shows another template for a conventional data structure. Table F corresponds to Table A in Fig. 1. The attributes of Item A are recorded in Table E, and those of Item B in Table G. As a specific example, if Item A were merchandise, Item B might be the purchasers of that merchandise. The attributes of Item B related to values of Item A whose attributes have certain values can be found from the data structures in Fig. 2, using subqueries where necessary. In these data structures, the number of rows in Table F is much larger than the numbers of rows in Tables E and G. Consequently, searching Table F may take much longer than searching Tables E and G.

Fig. 2 Template for conventional data structures. [Figure: Table E (Item A: ID of A, attributes a to d), Table F (relation between A and B: ID of A, ID of B), and Table G (Item B: ID of B, attributes 1 to 4); Tables E and G are each in a one-to-many relationship with Table F.]

Fig. 3 shows the proposed data structures corresponding to the conventional data structures of Fig. 2. Table F in Fig. 2 is replaced with Tables H, I, and J in Fig. 3. Values of Item A and Item B that are related to each other in Table F are assigned to the same group in Tables H and I. The relations between Item A and Item B that cannot be recorded in Table H or Table I are recorded in Table J. As explained in Section 2, the combined size of Tables H, I, and J is expected to be smaller than that of Table F. As a result, the search time using the data structures in Fig. 3 is expected to be shorter than that using the data structures in Fig. 2.

4. Simulations of Time and Size Reduction
Experimental results of time and size reduction using the proposed methods have been reported previously in reference [2].
In this paper, the theoretical reduction rate of data size, D, and the reduction rate of expected search time, T, are introduced. These rates are based on the data structures shown in Figs. 2 and 3 and are calculated by comparing the combined size of Tables H, I, and J with the size of Table F. It is further assumed that the number of records in Table H equals that in Table I; this is the worst case for the proposed method in terms of search time. D and T are given by Eqs. (1) and (2), respectively, with the symbols defined after Eq. (2). Because group i relates each of its M_i values of Item A to each of its N_i values of Item B, it replaces M_i N_i rows of Table F with M_i + N_i records, so Table F holds $\sum_{i=1}^{n} M_i N_i + S$ rows:

$$D = 1 - \frac{\sum_{i=1}^{n} (M_i + N_i) + S}{\sum_{i=1}^{n} M_i N_i + S} = 1 - \frac{X + \alpha}{1 + \alpha} \qquad (1)$$

Fig. 3 Template for proposed data structures. [Figure: Tables H, I, and J replace Table F of Fig. 2. Table E (Item A: ID of A, attributes a to d) and Table G (Item B: ID of B, attributes 1 to 4) are unchanged; Table H (group of A: Group, ID of A), Table I (group of B: Group, ID of B), and Table J (relation between A and B: ID of A, ID of B) are new.]
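Eq. (1), and the search-time rate T of Eq. (2) whose symbols are defined just after it, can be evaluated numerically. The sketch below is an illustration, not the paper's simulation code; the function names are hypothetical. It computes D and T both from the raw counts M_i, N_i, S and from the closed forms in X and α, so the two can be checked against each other. As in the text, it assumes Tables H and I hold the same total number of records.

```python
def reduction_rates(m, n, s):
    """Return (D, T) from the raw-count forms of Eqs. (1) and (2).

    m[i], n[i]: records of group i in Table H and Table I (M_i, N_i)
    s: records in Table J (S)
    Assumes sum(m) == sum(n), the worst case used in the text.
    """
    mn = sum(mi * ni for mi, ni in zip(m, n))  # Table F rows covered by groups
    f_rows = mn + s                             # size of Table F
    hij_rows = sum(m) + sum(n) + s              # combined size of Tables H, I, J
    d = 1 - hij_rows / f_rows                   # Eq. (1)
    # Eq. (2), first form: D weighted by the grouped share of Table F, plus the
    # rate for relations outside groups (Tables H and J counted as searched).
    t = d * mn / f_rows + (1 - (sum(m) + s) / f_rows) * s / f_rows
    return d, t


def rates_from_x_alpha(x, alpha):
    """Return (D, T) from the closed forms of Eqs. (1) and (2)."""
    d = 1 - (x + alpha) / (1 + alpha)
    t = (1 - x + alpha - alpha * x / 2) / (1 + alpha) ** 2
    return d, t


if __name__ == "__main__":
    # Case (3) of the text: M_i = N_i = 3 for 10 hypothetical groups, alpha = 2.
    m = n = [3] * 10
    s = 2 * sum(mi * ni for mi, ni in zip(m, n))  # S = alpha * sum(M_i N_i)
    print(reduction_rates(m, n, s))               # roughly (0.111, 0.185)
    print(rates_from_x_alpha(2 / 3, 2))           # same values
```

Both routes give D ≈ 11% and T ≈ 18% for case (3), and D = T = 1 − X ≈ 93% for case (1) with α = 0, matching the figures quoted in this section.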

Fig. 4 Simulated reduction of data size. [Plot: D (%) versus α from 0 to 2 for X = 0.067, 0.2, and 0.67.]
Fig. 5 Simulated reduction of search time. [Plot: T (%) versus α from 0 to 2 for X = 0.067, 0.2, and 0.67.]

$$T = D\,\frac{\sum_{i=1}^{n} M_i N_i}{\sum_{i=1}^{n} M_i N_i + S} + \left(1 - \frac{\sum_{i=1}^{n} M_i + S}{\sum_{i=1}^{n} M_i N_i + S}\right)\frac{S}{\sum_{i=1}^{n} M_i N_i + S} = \frac{1 - X + \alpha - \alpha X/2}{(1 + \alpha)^2} \qquad (2)$$

where

$$X = \frac{\sum_{i=1}^{n} (M_i + N_i)}{\sum_{i=1}^{n} M_i N_i}, \qquad \alpha = \frac{S}{\sum_{i=1}^{n} M_i N_i},$$

with the following symbols:
i: group number
n: the number of groups
M_i: the number of records of group i in Table H
N_i: the number of records of group i in Table I
S: the number of records in Table J

The last equality in Eq. (2) uses the assumption that Tables H and I hold equal numbers of records, so that $\sum_{i=1}^{n} M_i = \frac{1}{2}\sum_{i=1}^{n} (M_i + N_i)$. Here, X is the compression rate achieved by refactoring the grouped relations of Table F into Tables H and I, and α is the ratio of uncompressed data (Table J) to compressed data in the proposed data structures. D and T were calculated numerically from Eqs. (1) and (2), respectively, under the following conditions:
(1) M_i = N_i = 30 for all i. In this case, X ≈ 0.067.
(2) M_i = N_i = 10 for all i. In this case, X = 0.2.
(3) M_i = N_i = 3 for all i. In this case, X ≈ 0.67.
The results for D and T are shown in Figs. 4 and 5, respectively. They show that the proposed methods can effectively reduce both total data size and search time. In case (1) with α = 0, both D and T are approximately 93%. Even in case (3) with α = 2, D is approximately 11% and T is approximately 18%.

5. Data-updating Algorithm
In this section, the algorithm for updating data in the proposed data structures is described. The data structures in Fig. 3 are assumed to have the basic structure shown, and newly inputted data are assumed to be only relations between an ID of Item A and an ID of Item B. Although Tables H and I are logically equivalent in Fig. 3, they cannot be treated equally in practical applications. For example, if Item A represents a piece of merchandise and Item B its purchaser, the data size of Item B in Table I may be much larger than that of Item A in Table H. Here, it is assumed that the data in Table I are larger than those in Table H.
If the relations between Item A and Item B are supplemented, new data may need to be added to Table H or Table I. However, new data can be handled by updating only one of Tables H and I. The algorithm described here handles new data by adding records to Table I only. Dividing Table J into two tables, Table J-1 and Table J-2, allows efficient updates. Table J-1 contains relations between IDs of Item A and Item B that are not related to any group in Tables H and I. In contrast, Table J-2 contains relations between IDs of Items A and B that are related to a group in Table H or Table I. Dividing Table J into Tables J-1 and J-2 makes it easy to determine, by searching only Table J-2, whether a newly inputted record forms a new relation between a group and the inputted ID of Item B.

Fig. 6 Algorithm to update data with the proposed method. [Flowchart of the steps described below.]

Fig. 6 shows the update algorithm for the proposed method. In this algorithm, Table H is searched first for the inputted ID of Item A, based on the assumption that Table H is smaller than Table I. If no group is related to the inputted ID of Item A, the inputted data are recorded in Table J-1. If some groups are related to the inputted ID of Item A, Table J-2 is searched for the IDs of Item A related to the inputted ID of Item B. If the inputted ID of Item A together with the IDs of Item A found in Table J-2 includes all the IDs of Item A related to the group, the relation between the group and the inputted ID of Item B is added to Table I, and the relations between the inputted ID of Item B and all the IDs of Item A related to that group are deleted from Table J-2. In this case, the size of Table J-2 and the total data size are reduced. If some IDs of Item A related to the group, other than the newly inputted one, are not found, the inputted data are recorded in Table J-2. As described above, Table J-2 contains only relations between IDs of Items A and B that are related to groups, and some of its records are deleted from time to time. As a result, the data size of Table J-2, which is searched in this algorithm, is expected to remain relatively small even after new data are inputted. Updating takes more time than with conventional methods, mainly because of the deletion of data from Table J-2.
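The update flow of Fig. 6 can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the patented implementation: Python sets stand in for database tables, the names `Store` and `insert_relation` are hypothetical, Table H is kept fixed, and each ID of Item A is assumed to belong to at most one group (the text does not say how several groups are handled).

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-ins for Tables H, I, J-1 and J-2 (hypothetical layout)."""
    table_h: dict  # group -> set of IDs of Item A (kept fixed in this sketch)
    table_i: set = field(default_factory=set)   # (group, ID of B)
    table_j1: set = field(default_factory=set)  # (ID of A, ID of B); A in no group
    table_j2: set = field(default_factory=set)  # (ID of A, ID of B); A in a group


def insert_relation(store: Store, id_a: str, id_b: str) -> str:
    """Add one (ID of A, ID of B) relation; return the table it ended up in."""
    # Search Table H for the group related to the inputted ID of A.
    # Simplification: each ID of A belongs to at most one group.
    group = next((g for g, ids in store.table_h.items() if id_a in ids), None)
    if group is None:
        store.table_j1.add((id_a, id_b))
        return "J-1"
    members = store.table_h[group]
    # Search Table J-2 for IDs of A already related to the inputted ID of B.
    found = {a for a, b in store.table_j2 if b == id_b}
    if members <= found | {id_a}:
        # The whole group is now related to id_b: one row in Table I replaces
        # the individual rows, which are deleted from Table J-2.
        store.table_i.add((group, id_b))
        store.table_j2 -= {(a, id_b) for a in members}
        return "I"
    store.table_j2.add((id_a, id_b))
    return "J-2"


store = Store(table_h={"G1": {"a1", "a2", "a3"}})
for id_a in ["a9", "a1", "a2", "a3"]:
    print(id_a, insert_relation(store, id_a, "b1"))  # J-1, J-2, J-2, I
print(store.table_i, store.table_j2)                 # {('G1', 'b1')} set()
```

The third input into group G1 completes it for b1, so two rows leave Table J-2 and a single row enters Table I; that deletion step is where the extra update cost noted above arises.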
One way to mitigate this problem is to perform the updates, or only the deletions, for multiple inputs at one time [3],[4], conveniently during idle periods of the search task.

6. Conclusion
New data structures and a method of searching databases have been proposed. With the proposed method, data size and search time can be reduced, while data normalization and integrity are fully maintained. The proposed methods are independent of indexing methods, so both can be used simultaneously.

References
[1] T. Kuwabara: Japanese Patent No. 6269884, Kanagawa University (patentee) (application 2017.5.19, registration 2018.1.12).
[2] T. Kuwabara: New Data Structures to Reduce Searching Time on Databases, IEICE General Conference 2018, D-4-7, p. 28 (2018).
[3] T. Kuwabara: PCT application JP2018/018419, Kanagawa University (applicant) (2018.5).
[4] T. Kuwabara: Japanese Patent Application No. 2018-090308, Kanagawa University (applicant) (2018.5).