New data structures to reduce data size and search time

Tsuneo Kuwabara, Department of Information Sciences, Faculty of Science, Kanagawa University, Hiratsuka-shi, Japan. FIT2018 1D-1, No. 2, pp. 1-4. Copyright (c) 2018 by The Institute of Electronics, Information and Communication Engineers, and Information Processing Society of Japan. Permission number: 18TB0078

1. Introduction

Reducing search time is an important problem for databases. Indexing methods have been commonly used to reduce search time. However, indexing methods do not reduce the data size. Here, I propose data structures and searching methods that reduce both search time and data size [1],[2],[3]. The proposed methods maintain data normalization and integrity. They are also independent of indexing methods, so the two can be used simultaneously.

In this paper, the principles of the proposed data structures are described in Section 2. Some applicable templates and the simulation results of reduction rates are shown in Sections 3 and 4, respectively. The updating methods, which are also important processes, are shown in Section 5 [3],[4]. Section 6 gives the conclusions.

2. Principles

The principles of the proposed data structures are shown in Fig. 1. Table A in Fig. 1 is an example of a conventional data structure. Here, the values of Items A and B have multiple relations with each other. Tables B, C, and D are examples of the proposed data structures. Multiple values of Item A and Item B that are related to each other in Table A are assigned to the same group in Tables B and C. The remaining relations between Item A and Item B, those that cannot be stored in Table B or Table C, are recorded in Table D. In the example shown in Fig. 1, the total size of the data is reduced by about 25%, relative to the size with conventional data structures, by using the proposed data structures. The reduction in total data size in turn reduces the search time.
Fig. 1 Principles of the proposed data structures. Conventional data structure: Table A (Item A, Item B), 4000 rows. Proposed data structures: Table B (Group, Item A; grouped Item A), 4 rows; Table C (Group, Item B; grouped Item B), 2000 rows; Table D (Item A, Item B), 1000 rows; total 3004 rows. Tables B and C are related through the same group. The number of rows of the proposed data structures is approximately 25% lower than that of the conventional data structure in this example.
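The decomposition of Table A into Tables B, C, and D can be illustrated with a minimal Python sketch. The function names `decompose` and `related_b`, the in-memory set representation, and the toy data are hypothetical and are not taken from the paper. The grouping itself is supplied by the caller, because how groups are chosen is not specified at this point, and the row counts of the toy data are not those of Fig. 1.

```python
def decompose(relation, groups):
    """Split a many-to-many relation into group tables plus a remainder.

    relation: set of (a, b) pairs, playing the role of Table A.
    groups:   dict mapping a group name to (set of A values, set of B values).
              The grouping is given by the caller (an assumption of this sketch).
    Returns (table_b, table_c, table_d) as sets of 2-tuples.
    """
    table_b, table_c, covered = set(), set(), set()
    for name, (a_values, b_values) in groups.items():
        pairs = {(a, b) for a in a_values for b in b_values}
        # A group is only valid if every A value is related to every B value.
        if not pairs <= relation:
            raise ValueError(f"group {name!r} implies pairs missing from the relation")
        table_b |= {(name, a) for a in a_values}   # Table B: (Group, Item A)
        table_c |= {(name, b) for b in b_values}   # Table C: (Group, Item B)
        covered |= pairs
    table_d = relation - covered                   # Table D: remaining relations
    return table_b, table_c, table_d


def related_b(a, table_b, table_c, table_d):
    """Return all Item B values related to Item A value `a`."""
    groups_of_a = {g for g, x in table_b if x == a}
    via_groups = {b for g, b in table_c if g in groups_of_a}
    direct = {b for x, b in table_d if x == a}
    return via_groups | direct


# Hypothetical toy data: a, b, c each relate to B values 1..100; d relates to 101..200.
relation = {(a, b) for a in "abc" for b in range(1, 101)}
relation |= {("d", b) for b in range(101, 201)}
groups = {"X": (set("abc"), set(range(1, 101)))}

table_b, table_c, table_d = decompose(relation, groups)
print(len(relation), len(table_b) + len(table_c) + len(table_d))  # 400 rows versus 203 rows
```

In this toy case the three proposed tables hold 3 + 100 + 100 rows in place of 400, and `related_b` recovers exactly the relations of the original table.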

Moreover, in some cases, some tables can be omitted from the search. For example, only Table D needs to be searched when the values of Item B related to value d of Item A are to be checked, because value d of Item A is not related to any group in Table B.

3. Applicable Template

Fig. 2 shows another template for the conventional data structure. Table F corresponds to Table A in Fig. 1. Attributes of Item A are recorded in Table E, and those of Item B in Table G. As a specific example, if Item A were merchandise, then Item B might be the purchasers of the merchandise. The attributes of the Item B values related to any Item A whose attributes have certain values can be found from the data structures in Fig. 2, using subqueries where necessary. In the data structures of Fig. 2, the number of rows in Table F is much higher than the numbers in Table E and Table G. Because of this, the time to search Table F may be much longer than the times to search Tables E and G.

Fig. 2 Template for conventional data structures. Table E (Item A: ID of A, attributes 1 to 4) is related one-to-many to Table F (Relation between A and B: ID of A, ID of B), which is related many-to-one to Table G (Item B: ID of B, attributes a to d).

Fig. 3 shows the proposed data structures corresponding to the conventional data structures shown in Fig. 2. Table F in Fig. 2 is replaced with Table H, Table I, and Table J in Fig. 3. Values of Item A and Item B that are related to each other in Table F are assigned to the same group in Tables H and I. The relations between Item A and Item B that cannot be recorded in Table H or Table I are recorded in Table J. As mentioned in Section 2, the expected collective size of Table H, Table I, and Table J is smaller than that of Table F. As a result, the search time using the data structures in Fig. 3 is expected to be less than that using the data structures in Fig. 2.

4. Simulations of Time and Size Reduction

Experimental results of time and size reduction by using the proposed methods have been previously reported in reference [2].
In this paper, the theoretical reduction rate of data size, D, and the reduction rate of expected search time, T, are introduced. These rates are based on the data structures shown in Fig. 2 and Fig. 3 and are calculated by comparing the collective size of Table H, Table I, and Table J with the size of Table F. Moreover, it is assumed that the number of records in Table H is equal to that in Table I. This is the worst-case assumption for the proposed method in terms of search time. D and T are given by Eq. (1) and Eq. (2), respectively.

\[ D = 1 - \frac{\sum_{i=1}^{n} (M_i + N_i) + S}{\sum_{i=1}^{n} M_i N_i + S} = 1 - \frac{X + \alpha}{1 + \alpha} \tag{1} \]

Fig. 3 Template for proposed data structures. Table F in Fig. 2 is replaced by Table H (Group of A: Group, ID of A), Table I (Group of B: Group, ID of B), and Table J (Relation between A and B: ID of A, ID of B). Table E (Item A: ID of A, attributes 1 to 4) is related one-to-many to Tables H and J, Table G (Item B: ID of B, attributes a to d) is related one-to-many to Tables I and J, and Tables H and I are related many-to-many through Group.
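The two reduction rates can also be evaluated directly from row counts. The following Python sketch is an illustration, not the author's code: the function name `reduction_rates` and the example inputs are hypothetical. D is computed as in Eq. (1). The expression for T is one reading of Eq. (2), in which the grouped part of Table F and the part kept in Table J are weighted by their shares of Table F, under the stated assumption that Tables H and I have the same number of records.

```python
def reduction_rates(M, N, S):
    """Return (D, T) as fractions between 0 and 1.

    M[i], N[i]: number of records of group i in Table H and Table I.
    S:          number of records in Table J.
    The form of T is an assumption of this sketch (a reading of Eq. (2)).
    """
    grouped = sum(m * n for m, n in zip(M, N))   # rows of Table F replaced by groups
    size_f = grouped + S                          # total rows of Table F
    size_hij = sum(M) + sum(N) + S                # total rows of Tables H, I, and J
    D = 1 - size_hij / size_f
    T = D * grouped / size_f + (1 - (sum(M) + S) / size_f) * S / size_f
    return D, T


# Hypothetical inputs: 100 groups of 30 x 30 records and an empty Table J.
D, T = reduction_rates([30] * 100, [30] * 100, 0)
print(f"D = {D:.3f}, T = {T:.3f}")

# Hypothetical inputs: 100 groups of 3 x 3 records, with Table J twice the grouped rows.
D, T = reduction_rates([3] * 100, [3] * 100, 2 * 900)
print(f"D = {D:.3f}, T = {T:.3f}")
```

Working from raw counts rather than from ratios keeps the sketch usable for groups of unequal size.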

Fig. 4 Simulated reduction of data size: D (%) plotted against α from 0 to 2 for X = 0.067, X = 0.2, and X = 0.67.

Fig. 5 Simulated reduction of search time: T (%) plotted against α from 0 to 2 for X = 0.067, X = 0.2, and X = 0.67.

\[ T = D\,\frac{\sum_{i=1}^{n} M_i N_i}{\sum_{i=1}^{n} M_i N_i + S} + \left(1 - \frac{\sum_{i=1}^{n} M_i + S}{\sum_{i=1}^{n} M_i N_i + S}\right)\frac{S}{\sum_{i=1}^{n} M_i N_i + S} = 1 - \frac{X + \alpha + \frac{\alpha X}{2} + \alpha^{2}}{(1 + \alpha)^{2}} \tag{2} \]

where

\[ X = \frac{\sum_{i=1}^{n} (M_i + N_i)}{\sum_{i=1}^{n} M_i N_i}, \qquad \alpha = \frac{S}{\sum_{i=1}^{n} M_i N_i}, \]

with the following symbols.
i: group number
n: the number of groups
M_i: the number of records of group i in Table H
N_i: the number of records of group i in Table I
S: the number of records in Table J

Here, X is the rate of compression achieved for the grouped relations by refactoring Table F into Tables H and I. α is the ratio of uncompressed data (the S records kept in Table J) to compressed data (the records of Table F that are replaced by groups) in the proposed data structures. D and T are numerically calculated by Eq. (1) and Eq. (2), respectively, under the following conditions.
(1) M_i = N_i = 30 for all i. In this case, X ≈ 0.067.
(2) M_i = N_i = 10 for all i. In this case, X = 0.2.
(3) M_i = N_i = 3 for all i. In this case, X ≈ 0.67.

The calculation results for D and T are shown in Fig. 4 and Fig. 5, respectively. Fig. 4 and Fig. 5 show that the proposed methods can effectively reduce the total data size and search times. In case (1) with α = 0, both D and T are approximately 93%. Even in case (3) with α = 2, D is approximately 11% and T is approximately 18%.

5. Data-updating Algorithm

In this section, the algorithm to update data for the proposed data structures is described. Here, the data structures in Fig. 3 are assumed as the basic structure, and newly input data are only relations between the ID of Item A and the ID of Item B. Though Table H and Table I are logically equivalent in Fig. 3, they cannot be treated equally in practical applications. For example, if Item A represents a piece of merchandise and Item B is the purchaser, the data size of Item B in Table I may be much larger than that of Item A in Table H. Here, it is assumed that the data in Table I is larger than that in Table H.
If relations between Item A and Item B are added, new data may need to be added to Table H or Table I. However, it is possible to handle new data by updating only one of Table H and Table I. The algorithm described here handles new data by adding records to Table I only.

Dividing Table J into two tables, Table J-1 and Table J-2, allows efficient updates. Table J-1 contains the relations between the IDs of Item A and Item B that are not related to any group in Table H or Table I. In contrast, Table J-2 contains the relations between IDs of Items A and B that are related to a group in Table H or Table I. By dividing Table J into Tables J-1 and J-2, it becomes easy to examine whether a newly input record forms a new relation between a group and the input ID of Item B by searching only Table J-2.

Fig. 6 shows the update algorithm for the proposed method. In this algorithm, Table H is searched first for the input ID of Item A, based on the assumption that Table H is smaller than Table I. If there is no group related to the input ID of Item A, then the input data are recorded in Table J-1. When there are groups related to the input ID of Item A, Table J-2 is searched for the IDs of Item A related to the input ID of Item B. If all the IDs of Item A in the group, other than the input ID of Item A, are found, then the relation between the group and the input ID of Item B is added to Table I, and the relations between the input ID of Item B and all the IDs of Item A in the group newly recorded in Table I are deleted from Table J-2. In this case, the size of Table J-2 and the total data size are reduced. If some IDs of Item A in the group, other than the newly input ID of Item A, are not found, then the input data are recorded in Table J-2.

Fig. 6 Algorithm to update data with the proposed method (flowchart of the steps described above: input of the ID of A and the ID of B, search of Table H for related groups, recording in Table J-1 when no group exists, search of Table J-2 otherwise, and either recording in Table J-2 or adding the group relation to Table I and deleting the corresponding relations from Table J-2).

As described above, Table J-2 contains the IDs of Items A and B only as related to the groups. Moreover, some records of Table J-2 are sometimes deleted. As a result, the data size of Table J-2, which is searched in this algorithm, is expected to remain relatively small even after new data are input. Updating takes more time than with conventional methods, mainly because of the process of deleting data from Table J-2.
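The update steps of Fig. 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `update`, the in-memory containers, and the IDs are hypothetical. The flowchart speaks of a single group, so checking each group that contains the input ID of A and stopping at the first group that is completed is an assumption of this sketch.

```python
def update(a, b, table_h, table_i, table_j1, table_j2):
    """Process one newly input relation (ID of A, ID of B).

    table_h:  dict mapping a group to the set of IDs of A in it (Table H).
    table_i:  set of (group, ID of B) pairs (Table I).
    table_j1: set of (ID of A, ID of B) pairs whose ID of A is in no group (Table J-1).
    table_j2: set of (ID of A, ID of B) pairs whose ID of A is in a group (Table J-2).
    Returns the name of the table that received the new information.
    """
    # Step 1: search Table H for groups related to the input ID of A.
    groups = [g for g, members in table_h.items() if a in members]
    if not groups:
        table_j1.add((a, b))
        return "J-1"

    # Step 2: search Table J-2 for IDs of A already related to the input ID of B.
    pending = {x for (x, y) in table_j2 if y == b}

    # Step 3: if the input ID of A completes a group for this ID of B,
    # record (group, b) in Table I and delete the now redundant rows from Table J-2.
    for g in groups:
        if table_h[g] <= pending | {a}:
            table_i.add((g, b))
            table_j2.difference_update({(x, b) for x in table_h[g]})
            return "I"

    # Otherwise keep the relation in Table J-2 until the group is complete.
    table_j2.add((a, b))
    return "J-2"


# Hypothetical data: one group g1 containing three IDs of A.
table_h = {"g1": {"a1", "a2", "a3"}}
table_i, table_j1, table_j2 = set(), set(), set()
for a_id in ["a9", "a1", "a2", "a3"]:
    print(a_id, "->", update(a_id, "b1", table_h, table_i, table_j1, table_j2))
```

In this run the first input goes to Table J-1, the next two accumulate in Table J-2, and the last one completes group g1, so a single row is added to Table I and the two pending rows are removed from Table J-2.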
One way to mitigate the longer update time is to process the updates, or only the deletions, for multiple inputs at one time [3],[4], at a convenient moment such as when the search task is idle.

6. Conclusion

New data structures and a method of searching databases are proposed. With the proposed method, the data size and search times can be reduced. In addition, data normalization and integrity can be fully maintained. The proposed methods are independent of indexing methods, so both can be used simultaneously.

References
[1] T. Kuwabara: Japanese Patent No. 6269884, Kanagawa University (patentee) (2017.5.19 application, 2018.1.12 registration).
[2] T. Kuwabara: New Data Structures to Reduce Searching Time on Databases, IEICE General Conference 2018, D-4-7, p. 28 (2018).
[3] T. Kuwabara: PCT application JP2018/018419, Kanagawa University (applicant) (2018.5).
[4] T. Kuwabara: Japanese Patent application No. 2018-090308, Kanagawa University (applicant) (2018.5).