Mining in Hepatitis Data by LISp-Miner and SumatraTT


Petr Aubrecht¹, Martin Kejkula², Petr Křemen¹, Lenka Nováková¹, Jan Rauch², Milan Šimůnek², Olga Štěpánková¹, and Monika Žáková¹

¹ Czech Technical University in Prague, FEE, Prague 6, Czech Republic, {aubrech,kremep1,novakova,step}@fel.cvut.cz
² University of Economics, Prague, W. Churchill Sq. 4, Praha 3, Czech Republic, {kejkula,rauch,simunek}@vse.cz

Abstract. The paper suggests a methodology for searching for temporal patterns and tests it on the problem of differences between hepatitis B and C. To reach this goal, two software systems, LISp-Miner and SumatraTT, are combined: sophisticated data transformations and enhancements are designed and carried out in SumatraTT, while LISp-Miner searches for significant and interesting differences in the resulting datasets. The main results are reviewed in section 4: several examinations are identified whose values differ significantly between the two hepatitis types. This indicates that the suggested general methodology has promising potential when applied to this type of data. A plan for additional data-mining questions to be studied later is presented in the Conclusions.

1 Introduction

This paper presents results of mining the hepatitis data set that was offered as part of the Discovery Challenge at PKDD 2005. We try to contribute to discovering differences in temporal patterns between hepatitis B and C. A similar question has been analyzed in [2]; while the authors of [2] use temporal abstraction, we introduce trend characteristics (see section 2.2), which are calculated during data preprocessing in SumatraTT, and we apply LISp-Miner to find relevant association rules (see section 3).

SumatraTT [8] is a modular system for data preprocessing and data transformations. It offers a number of easy-to-use reusable modules for loading and exporting data in different formats, for the analysis of individual attributes (elementary statistics, contingency tables, etc.) and for the definition of additional derived attributes through sophisticated processing (scripting, integration with SQL databases, etc.). The last property is one of the main advantages of SumatraTT for the considered data-mining task: to characterize the temporal patterns in the measured attributes of patients, it proved necessary to introduce a number of new derived attributes, which were obtained through transformations and aggregations performed by SumatraTT. The individual modules can be combined into a project through an intuitive graphical interface, which automatically creates detailed documentation as a by-product. In this way, the SumatraTT project becomes an efficient communication platform for our team during work on any data-mining task.

LISp-Miner is an academic software system intended to support teaching and research. It consists of six data mining procedures, the machine learning procedure KEX and two procedures for data transformation [5, 7]; see also http://lispminer.vse.cz. All six data mining procedures of the LISp-Miner system are GUHA procedures in the sense of [1]. The input of a GUHA procedure consists of the analyzed data and a simple definition of relevant (i.e. potentially interesting) patterns. The GUHA procedure automatically generates each particular pattern and tests whether it is true in the analyzed data. The output of the procedure consists of all prime patterns. A pattern is prime if it is true in the analyzed data and does not immediately follow from other, simpler output patterns [1]. Each GUHA procedure of the LISp-Miner system mines for a particular type of pattern. The most frequently used procedure is 4ft-Miner, which mines for enhanced association rules [5]. In this paper, we use the procedure SD4ft-Miner [6].

The paper is organized as follows. Section 2 describes how SumatraTT was applied to derive several data matrices suitable for further analysis. The procedure SD4ft-Miner mines for SD4ft-patterns, which are introduced in section 3; results of several applications of SD4ft-Miner are presented in section 4. Some concluding remarks are given in section 5.

2 Transforming Data by SumatraTT

2.1 Data Understanding: Review of Important Properties

The hepatitis source data comes as well-documented CSV files. The data was first loaded from the text files (ilab e030704.csv etc.) into an SQL database. We then prepared several data preprocessing steps. First, the data from the tables ilab and olab (internal and external exams) was merged and from then on considered together. The merged table contains 1060 different types of exams (mainly due to the disparate exam types in olab, which has 845 different exams). To reduce this excessive number, we first decided to omit rare exam types and to take into account only exam types with more than […] occurrences; there are 41 exams of this sort. Moreover, 81 important exam types were included on a specialist's recommendation. In this way, we ended up with 105 exam types.

During data preprocessing we identified some important properties of the dataset that were not explicitly mentioned in earlier articles dedicated to it. After reviewing the summary counts in the patient table of this section, we decided to study the group of patients with exactly one biopsy and without interferon therapy.
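The exam-type selection described in section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SumatraTT project used by the authors: the row layout `(mid, exam, value)`, the function name `select_exam_types` and the parameter `min_occurrences` are hypothetical, and the threshold is left as a parameter because its value is not given in the text.

```python
from collections import Counter


def select_exam_types(ilab_rows, olab_rows, min_occurrences, recommended):
    """Merge both exam tables and keep frequent or recommended exam types.

    Rows are assumed to be (mid, exam, value) tuples (hypothetical layout).
    """
    merged = list(ilab_rows) + list(olab_rows)
    # Count how often each exam type occurs in the merged table.
    counts = Counter(exam for _, exam, _ in merged)
    # Keep exam types with more than `min_occurrences` occurrences ...
    frequent = {exam for exam, n in counts.items() if n > min_occurrences}
    # ... plus recommended exam types that actually occur in the data.
    selected = frequent | (set(recommended) & set(counts))
    kept = [row for row in merged if row[1] in selected]
    return kept, selected


# Toy data, made up for illustration only.
ilab = [(1, "GOT", "35"), (1, "GOT", "40"), (2, "GOT", "38"), (2, "ALB", "4.1")]
olab = [(1, "HCV-RNA", "+"), (2, "RARE", "1")]
kept, selected = select_exam_types(
    ilab, olab, min_occurrences=2, recommended={"HCV-RNA", "MISSING"}
)
print(sorted(selected))
```

Taking the union of the two sets mirrors the text, where the frequent exam types and the recommended ones overlap, so the final count is smaller than the sum of both.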

Patients   Description
771        all
503        with a biopsy
99         > 1 biopsy (all have type C)
123        = 1 biopsy and interferon
74         > 1 biopsy and interferon
[…]        biopsy, no interferon
1          with interferon, no biopsy
[…]        exam before the first biopsy
281        = 1 biopsy, no interferon
[…]        exam in hemat table before the first biopsy
3          […] 1 biopsy, no other exam (MID = 808, 500, 202)

2.2 Temporal Characteristics of the Considered Attributes

The patients are not examined regularly: the period between two examinations ranges from one day to several months. Periods in which a patient is observed frequently alternate with quieter periods. The highest number of all exams for one patient is […] (patient # 321). This irregularity has to be taken into account when choosing the characteristics that describe the temporal properties of the measured values.

To standardize the information provided about individual patients, we concentrate on data collected during a specific, well-defined time interval of fixed length τ for each patient. The interval does not start on the same date for all patients. On the contrary, it is tightly bound to the state of the individual patient: it ends (or begins) at a significant instant that can be easily recognized in the patient's data, e.g. the time of the first biopsy or the time when a specific treatment (e.g. interferon) was introduced. The length of the interval is the same for all patients and is treated as a parameter τ of the project.

The number of measurements for one patient during one year usually ranges between 2 and 10. To make up for this non-uniformity, we use the following trend characteristics of the sequence of time-stamped data: average, number of measurements, gradient (resulting from linear approximation), maximum, minimum, and variance. For further data mining, the results were saved as a matrix with patients in rows and trend characteristics in columns.

The rest of the paper tries to show that derived attributes of this type can reveal interesting dependencies in the considered data. For that purpose we have to fix the significant instant and τ. In the rest of the paper, the significant instant is the time of the first biopsy. Moreover, patients treated with interferon are excluded from the studied dataset. Only patients with measurements during the period τ (see section 2.3) before the first biopsy were selected. In the next step, all exams were filtered according to two requirements: the exam must provide a numeric value (values such as +, > 3, etc. are omitted), and the exam type must be measured at least 10 times for each considered patient. The size of the resulting set is given in section 2.3. The selected data was then analyzed as sequences, and the trend characteristics listed above were calculated. Finally, data about the patients was added (sex, age, type of hepatitis, maximum fibrosis and activity).
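The trend characteristics of section 2.2 can be illustrated with a short Python sketch. It is not the authors' SumatraTT implementation and rests on several assumptions that the text does not settle: τ is expressed in days, the window includes the significant instant itself, the gradient is the ordinary least-squares slope per day, and the variance is the population variance. The function name `trend_characteristics` is hypothetical.

```python
from datetime import date, timedelta
from statistics import fmean, pvariance


def trend_characteristics(measurements, instant, window_days):
    """Summarise one exam type of one patient over the window before `instant`.

    measurements: iterable of (date, numeric value) pairs
    instant:      date of the significant instant (e.g. the first biopsy)
    window_days:  window length tau in days (assumed unit)
    """
    start = instant - timedelta(days=window_days)
    selected = sorted((d, v) for d, v in measurements if start <= d <= instant)
    if not selected:
        return None  # nothing measured in the window: a missing value
    xs = [(d - start).days for d, _ in selected]
    ys = [v for _, v in selected]
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Least-squares slope of the linear approximation y = a + grad * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "avg": mean_y,
        "count": len(ys),
        "grad": sxy / sxx if sxx > 0 else None,  # undefined for a single date
        "max": max(ys),
        "min": min(ys),
        "var": pvariance(ys),  # population variance (an assumption)
    }


# Made-up readings; the first one lies outside the 365-day window.
readings = [
    (date(1998, 5, 1), 30.0),
    (date(2000, 1, 1), 10.0),
    (date(2000, 1, 11), 12.0),
    (date(2000, 1, 21), 14.0),
]
result = trend_characteristics(readings, date(2000, 1, 31), 365)
print(result)
```

One such dictionary per patient and exam type yields the columns of a patient-by-characteristic matrix; returning `None` for an empty window corresponds to a missing value in that matrix.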

The data preprocessing resulted in a data matrix whose rows correspond to particular patients identified by MID. The columns of this data matrix contain various trend characteristics of the considered examinations for the corresponding patients (e.g. ALB avg is the average of the ALB exam results).

2.3 Enhanced Datasets

The following data matrices, describing the behavior of various characteristics before the first biopsy, were prepared for data mining: TRENDS BIO 24 relates to patients who have a history of exams at least τ = 24 months long, TRENDS BIO 12 to patients with an exam history of τ = 12 months (investigated in detail later in the article), and TRENDS BIO 3 to patients with an exam history of τ = 3 months. The size of the resulting datasets increases as τ shrinks (53, 85, and 171). For pilot experiments, the dataset corresponding to 12 months was chosen.

3 SD4ft-patterns

We use the data matrix TRENDS BIO 12 shown in Fig. 1 to introduce the SD4ft-patterns. Each row of TRENDS BIO 12 corresponds to one patient identified by the value of the column MID.

row      Basic attributes                          In-hospital examinations
number   MID   Sex   Age   Type   Fibrosis   Activity   CL avg   CL grad
1        1     M     29    B      2          2          X        X
2        42    M     33    C      […]        […]        […]      -5.5E-7
[…]      […]   M     58    C      1          FALSE      X        X

Fig. 1. Data matrix TRENDS BIO 12

Values of the column Sex come from the table pt e030704.csv. The column Age contains the age of the patient at the time of the first biopsy (bio e030704.csv and pt e030704.csv are used). The columns Type (i.e. hepatitis type), Fibrosis and Activity come from bio e030704.csv and give the values at the time of the first biopsy. The value X in the column CL avg in row 1 means that CL (i.e. chloride, see ilab e030704.csv) was not measured for the patient with MID = 1. The value -5.5E-7 in the column CL grad in row 2 is the gradient of the linear approximation of the time series of CL examinations taken during the 12 months before the first biopsy for the patient with MID = 42. The same holds analogously for the other patients and columns. The data matrix TRENDS BIO 12 has 224 columns with the gradient, average, etc. of specific examinations.

The procedure SD4ft-Miner mines for SD4ft-patterns of the form α ⋈ β : ϕ ≈ ψ / γ.

M/(α ∧ γ)   ψ          ¬ψ
ϕ           a_{α∧γ}    b_{α∧γ}
¬ϕ          c_{α∧γ}    d_{α∧γ}
4ft(ϕ, ψ, M/(α ∧ γ))

M/(β ∧ γ)   ψ          ¬ψ
ϕ           a_{β∧γ}    b_{β∧γ}
¬ϕ          c_{β∧γ}    d_{β∧γ}
4ft(ϕ, ψ, M/(β ∧ γ))

Fig. 2. 4ft-tables 4ft(ϕ, ψ, M/(α ∧ γ)) and 4ft(ϕ, ψ, M/(β ∧ γ))

Here α, β, γ, ϕ, and ψ are Boolean attributes defined from the columns of the analyzed data matrix M. The SD4ft-pattern α ⋈ β : ϕ ≈ ψ / γ means that the subsets of patients satisfying the Boolean conditions α and β differ with respect to the validity of the association rule ϕ ≈ ψ when the condition given by the Boolean attribute γ is satisfied. The measure of difference is defined by the symbol ≈, which is called the SD4ft-quantifier. The association rule ϕ ≈ ψ here means a general relation between the Boolean attributes ϕ and ψ in the sense of [5].

An example of an SD4ft-pattern is

Type(B) ⋈ Type(C) : LDH grad(≥ 0) ⇒^D_{0.4} GOT grad(≥ 0) / Age(30-69).

It means that patients with hepatitis B differ from patients with hepatitis C with respect to the relation between the Boolean attributes LDH grad(≥ 0) (i.e. the value of LDH grad is ≥ 0) and GOT grad(≥ 0) when we consider patients aged 30-69 years. The difference is given by the SD4ft-quantifier ⇒^D_{0.4}. We introduce it using the general notation α, β, γ, ϕ, and ψ.

The SD4ft-quantifier concerns two four-fold contingency tables (i.e. 4ft-tables), 4ft(ϕ, ψ, M/(α ∧ γ)) and 4ft(ϕ, ψ, M/(β ∧ γ)), see Fig. 2. The 4ft-table 4ft(ϕ, ψ, M/(α ∧ γ)) is the contingency table of ϕ and ψ on M/(α ∧ γ). The data matrix M/(α ∧ γ) is the submatrix of M that consists of exactly those rows of M that satisfy α ∧ γ. In other words, M/(α ∧ γ) corresponds to all objects (i.e. rows) from the set defined by α that satisfy the condition γ. We have 4ft(ϕ, ψ, M/(α ∧ γ)) = ⟨a_{α∧γ}, b_{α∧γ}, c_{α∧γ}, d_{α∧γ}⟩, where a_{α∧γ} is the number of rows of M/(α ∧ γ) satisfying both ϕ and ψ, b_{α∧γ} is the number of rows satisfying ϕ but not ψ, and so on. The 4ft-table 4ft(ϕ, ψ, M/(β ∧ γ)) of ϕ and ψ on M/(β ∧ γ) is defined analogously.

The SD4ft-quantifier ⇒^D_{0.4} is defined by the condition

$$\left| \frac{a_{\alpha\wedge\gamma}}{a_{\alpha\wedge\gamma} + b_{\alpha\wedge\gamma}} - \frac{a_{\beta\wedge\gamma}}{a_{\beta\wedge\gamma} + b_{\beta\wedge\gamma}} \right| \geq 0.4$$

This condition means that the difference between the confidence of the classical association rule ϕ → ψ on the data matrix M/(α ∧ γ) and the confidence of this rule on the data matrix M/(β ∧ γ) is at least 0.4. The SD4ft-pattern α ⋈ β : ϕ ⇒^D_{0.4} ψ / γ is true on the data matrix M if the condition

$$\left| \frac{a_{\beta\wedge\gamma}}{a_{\beta\wedge\gamma} + b_{\beta\wedge\gamma}} - \frac{a_{\alpha\wedge\gamma}}{a_{\alpha\wedge\gamma} + b_{\alpha\wedge\gamma}} \right| \geq 0.4$$

is satisfied. The example SD4ft-pattern is verified using the 4ft-tables T_B and T_C, see Fig. 3. Note that the sum of all frequencies in the 4ft-tables T_B and T_C is smaller than 60 because rows with missing values X are omitted.

TRENDS BIO 12 / (Type(B) ∧ Age(30-69))   GOT grad(≥ 0)   ¬GOT grad(≥ 0)
LDH grad(≥ 0)                            11              0
¬LDH grad(≥ 0)                           6               5
T_B = 4ft(LDH grad(≥ 0), GOT grad(≥ 0), TRENDS BIO 12 / (Type(B) ∧ Age(30-69)))

TRENDS BIO 12 / (Type(C) ∧ Age(30-69))   GOT grad(≥ 0)   ¬GOT grad(≥ 0)
LDH grad(≥ 0)                            […]             […]
¬LDH grad(≥ 0)                           0               4
T_C = 4ft(LDH grad(≥ 0), GOT grad(≥ 0), TRENDS BIO 12 / (Type(C) ∧ Age(30-69)))

Fig. 3. 4ft-tables T_B and T_C

It is easy to verify that the condition corresponding to the SD4ft-quantifier ⇒^D_{0.4} is satisfied. We can conclude that the SD4ft-pattern

Type(B) ⋈ Type(C) : LDH grad(≥ 0) ⇒^D_{0.4} GOT grad(≥ 0) / Age(30-69)

is true on the data matrix TRENDS BIO 12. Very informally, we can interpret this SD4ft-pattern as follows: the confidence of the association rule (non-negative LDH gradient) → (non-negative GOT gradient) is at least 0.4 greater for type B than for type C when we consider patients 30-69 years old.

4 SD4ft-Miner Application Results

We solved three different tasks. In the first task we searched for very simple SD4ft-patterns (without the condition γ) of the form Type(B) ⋈ Type(C) : TRUE ≈ ψ, where TRUE is a specially prepared basic Boolean attribute that is identically true and ≈ is a suitable SD4ft-quantifier (see below). Note that the confidence of the association rule TRUE → ψ is equal to the relative frequency of rows of the analyzed data matrix that satisfy ψ. This means that we can use the SD4ft-quantifier ⇒^D_{0.15} ∧ a_α ≥ 10 ∧ a_β ≥ 10, which says that the difference of relative frequencies is at least 0.15, that there are at least 10 patients with type B hepatitis satisfying ψ, and that there are also at least 10 such patients with type C hepatitis.

We use the set of relevant SD4ft-patterns Type(B) ⋈ Type(C) : TRUE ≈ ψ whose succedents are 903 intervals of averages of 22 in-hospital examinations, namely CL, D-BIL, F-CHO, FE, G-GL, G-GTP, GOT, GPT, HBE-AB, HBE-AG, CHE, I-BIL, K, LDH, NA, Oudan, T-BIL, T-CHO, TG, TP, U-UBG, and UN. The 903 intervals are defined by a few parameters chosen so that the resulting intervals are of reasonable size. These patterns were generated and verified in 1 second (PC with 3.06 GHz, 512 MB DDR SDRAM). Due to various optimizations, only 308 verifications were actually performed. The result is 18 true SD4ft-patterns concerning 8 attributes. The strongest pattern for each of these attributes is shown in Table 1. Note that there are 27 patients with hepatitis type B and 33 patients with hepatitis type C.

Table 1. Differences of relative frequencies
Columns: literal; frequency for type B (relative R_B, absolute); frequency for type C (relative R_C, absolute); R_B - R_C.
Literals (one row each): TP avg(… 7), CHE avg(100; …, F CHO avg(45; …, CL avg(102; …, I BIL avg(0.3; …, UN avg(12; …, T BIL avg(0.6; …, G GTP avg(20; …

The difference of relative frequencies can be understood as a difference between the confidences of the association rules TRUE → ψ for types B and C. It is therefore reasonable to ask whether there is a difference stronger than 0.4 between the confidences of association rules ϕ → ψ in which both ϕ and ψ are literals similar to ψ in the first task. We thus searched for SD4ft-patterns (without the condition γ) of the form Type(B) ⋈ Type(C) : ϕ ≈ ψ, where ≈ is the SD4ft-quantifier defined as ⇒^D_{0.4} ∧ a_α ≥ 10 ∧ a_β ≥ 10. Among other things, this quantifier says that the difference of confidences is at least 0.4. We defined a set of more than […] relevant SD4ft-patterns. Due to various optimizations, only […] were generated and verified, in about 2 seconds; see also [5]. There are 27 SD4ft-patterns satisfying the given condition, and all of them have the attribute TP avg in the succedent. We therefore show only the three strongest ones, together with three further patterns that do not contain the attribute TP avg and were found by another run of the SD4ft-Miner procedure; see Table 2.
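The check behind such an SD4ft-quantifier is small enough to sketch in Python. The code below is an illustration, not the SD4ft-Miner implementation: the names `FourFt`, `four_ft_table` and `sd4ft_holds` are hypothetical, the difference of confidences is taken in absolute value (one reading of the definition in section 3), and the counts in the example are made up rather than taken from the hepatitis data.

```python
from typing import NamedTuple, Optional


class FourFt(NamedTuple):
    a: int  # rows satisfying phi and psi
    b: int  # rows satisfying phi and not psi
    c: int  # rows satisfying not phi and psi
    d: int  # rows satisfying neither


def four_ft_table(rows, phi, psi) -> FourFt:
    """Build a 4ft-table; rows where phi or psi is None (missing) are skipped."""
    a = b = c = d = 0
    for row in rows:
        p, s = phi(row), psi(row)
        if p is None or s is None:
            continue
        if p and s:
            a += 1
        elif p:
            b += 1
        elif s:
            c += 1
        else:
            d += 1
    return FourFt(a, b, c, d)


def confidence(t: FourFt) -> Optional[float]:
    """Confidence a / (a + b) of the rule phi -> psi, or None if undefined."""
    return t.a / (t.a + t.b) if (t.a + t.b) > 0 else None


def sd4ft_holds(t_alpha: FourFt, t_beta: FourFt, min_diff: float, min_base: int = 0) -> bool:
    """True if the confidences differ by at least min_diff and both a-counts reach min_base."""
    conf_alpha, conf_beta = confidence(t_alpha), confidence(t_beta)
    if conf_alpha is None or conf_beta is None:
        return False
    return (
        abs(conf_alpha - conf_beta) >= min_diff
        and t_alpha.a >= min_base
        and t_beta.a >= min_base
    )


# Hypothetical 4ft-tables for the two groups (not the paper's values).
t_b = FourFt(a=12, b=3, c=5, d=7)    # confidence 0.8
t_c = FourFt(a=10, b=20, c=2, d=1)   # confidence 1/3
print(sd4ft_holds(t_b, t_c, min_diff=0.4, min_base=10))
```

Setting `phi` to a predicate that always returns `True` reduces the confidence to the relative frequency of ψ among the rows with a known value, which corresponds to the first task above.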
We also tried to find conditions under which the difference of confidences is even stronger. We searched for SD4ft-patterns of the form α ⋈ β : ϕ ≈ ψ / γ, where the condition γ was created from Sex, Age, Fibrosis and Activity. About […] relevant patterns were verified. In total, 76 true SD4ft-patterns with a condition were found in 2 minutes and 23 seconds. Some examples of the strongest and most interesting ones are given in Table 3.

Table 2. Differences concerning pairs of examinations
Columns: rule (antecedent, succedent); type B (Conf_B, support a_B, %); type C (Conf_C, support a_C, %); Conf_B - Conf_C.
Rules (one row each, numeric values lost):
CHE avg(100; 300 …   and   TP avg(5; …
CL avg(102; 106 …    and   TP avg(65; …
K avg ⟨4; 44)        and   TP avg(6; …
(further rules with TP avg in the succedent skipped)
LDH grad ⟨0; 0.5)    and   GPT grad ⟨0; 0.5)
F-CHO grad ⟨0; 0.5)  and   GOT grad ⟨0; 0.5)
CL avg(103; 107 …    and   I BIL avg ⟨0.3; 0.6)

Table 3. Differences concerning pairs of examinations under conditions
Columns: rule (antecedent, succedent); type B (Conf_B, support a_B, %); type C (Conf_C, support a_C, %); Conf_B - Conf_C.
Condition: Age … 40; type B: 8 patients; type C: 26 patients
CHE avg(0; 300 …     and   TP avg(50; …
Condition: Age … 35; type B: 15 patients; type C: 28 patients
K avg ⟨4; 44)        and   TP avg(60; …
Oudan avg(4; 6 …     and   T BIL avg(0.5; …
Oudan avg(4; 6 …     and   TP avg(05; …
Condition: Fibrosis(1,2); type B: 18 patients; type C: 21 patients
LDH grad ⟨0; 0.5)    and   GPT grad ⟨0; 0.5)
F-CHO grad ⟨0; 0.5)  and   GPT grad ⟨0; 0.5)
D BIL avg ⟨0.1; 0.3) and   TP avg(50; …

5 Conclusions and Further Work

We have found several patterns that indicate differences in trend characteristics between hepatitis type B and type C. The process is far from straightforward. First, the original data had to be transformed into a suitable data matrix using SumatraTT, and then the SD4ft-Miner procedure was applied several times. Some strong rules seem to appear, but the results given in Tables 1, 2, and 3 cannot be interpreted without relevant medical knowledge. It will be very interesting to compare our results with those in [2]: several attributes have been identified as important by both approaches, namely T-BIL, CHE, GOT, GPT and TP. However, the considered set of 60 patients is too small: the restrictions applied when creating the data matrix do not take optimal advantage of all the available data.

All the steps of our approach are easy to repeat and modify, and there are many ways to do so. Based on our experience with the present data and the results of the current data mining efforts, we plan to modify the selection criteria used in preprocessing. We believe that the suggested methodology, based on selecting a time window related to some significant instant, could prove useful when studying the influence of interferon therapy. Further analysis should work with a new enhanced data set in which two significant instants are considered: one corresponds to the beginning of the interferon therapy, while the other is set several months later. This setting makes it possible to study changes in time patterns due to the therapy. Moreover, measurements from the table hemat will be included. Results from this new data are currently under investigation.

The project showed that the cooperation of the two tools, SumatraTT and LISp-Miner, is effective and allows a fast cycle of data preprocessing and data mining. The whole process can be easily modified and reused for different data mining tasks (e.g. the influence of interferon) and even for different datasets.

Acknowledgements

The work described here has been supported by the grant 201/05/0325 of the Czech Science Foundation and the research program No. MSM […] Transdisciplinary Research in Biomedical Engineering II of the CTU in Prague.

References

1. Hájek, P., Havránek, T.: Mechanizing Hypothesis Formation (Mathematical Foundations for a General Theory). Springer-Verlag
2. Ho, T. B. et al.: Combining temporal abstraction and data mining to study hepatitis. In: Proceedings of the Discovery Challenge 2004, A Collaborative Effort in Knowledge Discovery from Databases. Prague: University of Economics
3. Kléma, J., Nováková, L., Karel, F., Štěpánková, O.: Trend Analysis in Stulong Data. In: Proceedings of the Discovery Challenge 2004, A Collaborative Effort in Knowledge Discovery from Databases. Prague: University of Economics, 2004
4. Rauch, J., Šimůnek, M. (2000): Mining for 4ft Association Rules. In: Arikawa, S., Morishita (eds.) Discovery Science. Springer-Verlag
5. Rauch, J., Šimůnek, M. (2005): An Alternative Approach to Mining Association Rules. In: Lin, T. Y., Ohsuga, S., Liau, C. J., Tsumoto, S. (eds.) Data Mining: Foundations, Methods, and Applications. Springer-Verlag, 2005 (to appear)
6. Rauch, J., Šimůnek, M. (2005): GUHA Method and Granular Computing. In: Hu, X., Liu, Q., Skowron, A., Lin, T. Y., Yager, R. R., Zang, B. (eds.) Proceedings of IEEE International Conference on Granular Computing. IEEE, 2005
7. Šimůnek, M. (2003): Academic KDD Project LISp-Miner. In: Abraham, A. et al. (eds.) Advances in Soft Computing: Intelligent Systems Design and Applications. Springer, Berlin Heidelberg New York

8. Štěpánková, O., Aubrecht, P., Kouba, Z., Mikšovský, P.: Preprocessing for Data Mining and Decision Support. Kluwer Academic Publishers, Dordrecht, 2003

Alternative Approach to Mining Association Rules

Alternative Approach to Mining Association Rules Alternative Approach to Mining Association Rules Jan Rauch 1, Milan Šimůnek 1 2 1 Faculty of Informatics and Statistics, University of Economics Prague, Czech Republic 2 Institute of Computer Sciences,

More information

Investigating Measures of Association by Graphs and Tables of Critical Frequencies

Investigating Measures of Association by Graphs and Tables of Critical Frequencies Investigating Measures of Association by Graphs Investigating and Tables Measures of Critical of Association Frequencies by Graphs and Tables of Critical Frequencies Martin Ralbovský, Jan Rauch University

More information

Applying Domain Knowledge in Association Rules Mining Process First Experience

Applying Domain Knowledge in Association Rules Mining Process First Experience Applying Domain Knowledge in Association Rules Mining Process First Experience Jan Rauch, Milan Šimůnek Faculty of Informatics and Statistics, University of Economics, Prague nám W. Churchilla 4, 130 67

More information

USING THE AC4FT-MINER PROCEDURE IN THE MEDICAL DOMAIN. Viktor Nekvapil

USING THE AC4FT-MINER PROCEDURE IN THE MEDICAL DOMAIN. Viktor Nekvapil USING THE AC4FT-MINER PROCEDURE IN THE MEDICAL DOMAIN Viktor Nekvapil About the author 2 VŠE: IT 4IQ, 2. semestr Bachelor thesis: Using the Ac4ft-Miner procedure in the medical domain Supervisor: doc.

More information

The GUHA method and its meaning for data mining. Petr Hájek, Martin Holeňa, Jan Rauch

The GUHA method and its meaning for data mining. Petr Hájek, Martin Holeňa, Jan Rauch The GUHA method and its meaning for data mining Petr Hájek, Martin Holeňa, Jan Rauch 1 Introduction. GUHA: a method of exploratory data analysis developed in Prague since mid-sixties of the past century.

More information

A Logical Formulation of the Granular Data Model

A Logical Formulation of the Granular Data Model 2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University

More information

Machine Learning and Association rules. Petr Berka, Jan Rauch University of Economics, Prague {berka

Machine Learning and Association rules. Petr Berka, Jan Rauch University of Economics, Prague {berka Machine Learning and Association rules Petr Berka, Jan Rauch University of Economics, Prague {berka rauch}@vse.cz Tutorial Outline Statistics, machine learning and data mining basic concepts, similarities

More information

High Frequency Rough Set Model based on Database Systems

High Frequency Rough Set Model based on Database Systems High Frequency Rough Set Model based on Database Systems Kartik Vaithyanathan kvaithya@gmail.com T.Y.Lin Department of Computer Science San Jose State University San Jose, CA 94403, USA tylin@cs.sjsu.edu

More information

Classification of Voice Signals through Mining Unique Episodes in Temporal Information Systems: A Rough Set Approach

Classification of Voice Signals through Mining Unique Episodes in Temporal Information Systems: A Rough Set Approach Classification of Voice Signals through Mining Unique Episodes in Temporal Information Systems: A Rough Set Approach Krzysztof Pancerz, Wies law Paja, Mariusz Wrzesień, and Jan Warcho l 1 University of

More information

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA D. Pokrajac Center for Information Science and Technology Temple University Philadelphia, Pennsylvania A. Lazarevic Computer

More information

Feature Selection by Reordering *

Feature Selection by Reordering * Feature Selection by Reordering * Marcel Jirina and Marcel Jirina jr. 2 Institute of Computer Science, Pod vodarenskou vezi 2, 82 07 Prague 8 Liben, Czech Republic marcel@cs.cas.cz 2 Center of Applied

More information

NOMINAL VARIABLE CLUSTERING AND ITS EVALUATION

NOMINAL VARIABLE CLUSTERING AND ITS EVALUATION NOMINAL VARIABLE CLUSTERING AND ITS EVALUATION Hana Řezanková Abstract The paper evaluates clustering of nominal variables using different similarity measures. The created clusters can serve for dimensionality

More information

HYBRID FLOW-SHOP WITH ADJUSTMENT

HYBRID FLOW-SHOP WITH ADJUSTMENT K Y BERNETIKA VOLUM E 47 ( 2011), NUMBER 1, P AGES 50 59 HYBRID FLOW-SHOP WITH ADJUSTMENT Jan Pelikán The subject of this paper is a flow-shop based on a case study aimed at the optimisation of ordering

More information

HOW TO WRITE PROOFS. Dr. Min Ru, University of Houston

HOW TO WRITE PROOFS. Dr. Min Ru, University of Houston HOW TO WRITE PROOFS Dr. Min Ru, University of Houston One of the most difficult things you will attempt in this course is to write proofs. A proof is to give a legal (logical) argument or justification

More information

On Improving the k-means Algorithm to Classify Unclassified Patterns

On Improving the k-means Algorithm to Classify Unclassified Patterns On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,

More information

Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts

Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts Kathrin Bujna 1 and Martin Wistuba 2 1 Paderborn University 2 IBM Research Ireland Abstract.

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Ensembles of classifiers based on approximate reducts

Ensembles of classifiers based on approximate reducts Fundamenta Informaticae 34 (2014) 1 10 1 IOS Press Ensembles of classifiers based on approximate reducts Jakub Wróblewski Polish-Japanese Institute of Information Technology and Institute of Mathematics,

More information

Analysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie

Analysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie Library and Information Services in Astronomy III ASP Conference Series, Vol. 153, 1998 U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez (eds.) Analysis of Evolutionary Trends in Astronomical

More information

Classification Based on Logical Concept Analysis

Classification Based on Logical Concept Analysis Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.

More information

Principal Component Analysis, A Powerful Scoring Technique
