[11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2
|
|
- Harriet Young
- 5 years ago
- Views:
Transcription
1 情報処理学会研究報告 1,a) IT Clustering by Clique Enumeration and Data Cleaning like Method Abstract: Recent development on information technology has made bigdata analysis more familiar in research and industrial areas. However, bigdata has big difficulties on diversity, other than its huge size. Data with much diversity is usually composed of many groups each of those has its original feature, thus data analysis from the global structures usually fails to capture the details of the data. To analyze the data correctly, capturing the group structure is important. Existing clustring algorithms and pattern mining algorithms aim to extract the group structures from the data. However, they usually find huge number of solutions, too few groups, many similar groups, or groups with large biased sizes, and often take long computation time. In this paper, we address the graph clustering problem, and discuss what are good graphs in that we can easily capture the group structures. From the discussion, we define a graph class that is a model of noiseless clarified graph. We then propose a data cleaning like method that modifies the given data graph to a clarified graph without breaking the group structures. We also show some mathematical and algorithmic properties for the graph class and the modifying algorithm. Keywords: data cleaning, data analysis, clique enumeration, pattern mining, clustering 1. Introduction Web , Hitotsubashi, Chiyoda, Tokyo , Japan , Ichiban-cho, Uegahara, Nishinomiya, Hyogo , Japan a) uno@nii.ac.jp 1
2 [11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2
3 2. 2 E V ( V 1)/2 U G U δ δ- G v v u v u v v N(v) v d(v) v N(v) N[v] N(v) {v} w u v K d K (v) K v K min v K d K (v) 3. v u K K v u u v (k ) u v 2 u v k- : k <6 k- k- v u N[v] Nu k k- k- (u, v) E N[v] N[u] k k- (k) k 1 n k- O(n k ) : k S K K K u K v S N[v],N[u] N[u] N[v] k u v K K S k 3
4 k k n C k k n k n C k + n k = O(n k ) 3 n/3 O(n k ) k 1 n k- O(n k ) : [10] O(n ) 2 n k- k k =Θ(n) k 2 4 n n 2 /4+3n/2 k- 2 n/2 : G V = {1,...,n} i j i j = i +1 (i, j) G {2i, 2i 1} 2 n/2 u v N[u] N[v] = n 2 G V V 1 = {1, 3, 5,...,n 1} V 2 = {2, 4, 6,...,n} U 0 = V 2 U i = V 1 \{2i 1} {2i} 1 i n/2 U j G U j U j n/2 U j U j G =(V,E ) G V = n +(n/2+1) n/2 =n 2 /4+3n/2 U j (u, v) G (u, v) N[u] N[v] (1) (u, v) E n 2+n/2 (2) u, v V (u, v) E n 2 (3) u U j \ V v U j n (u, v) E (3) u U j \ V v U j n/2 (u, v) E n- G G G V G G 2 n/2 4. k- k- G =(V,E) G k- E = (u, v) N[v] N[u] k u v u v k- k- 1 γk K (γ +1)k/2 G k- : K 2 u v 2 (γk (γ +1)k )=γk k 2 4
5 N[u] N[v] k u v k- K K K (γ +1)/2γ all graphs clarified graphs 3 K γk (γ 1)k/3 δ- 3(1 δ) < ( γ 1 γ )2 k- K : M K M k- K M =(1 δ)γk(γk 1)/2 K v P d K (v) (γ +1)k/2 Q (γ +1)k/2 d K (v) (2γ +1)k/3 p = P q = Q M pq + p(p 1)/2 q 2M ((γ 1)k/2)p = 6M 2p (γ 1)k/3 (γ 1)k M 6M p( (γ 1)k 2p)+p(p 1) 2 = 6pM (γ 1)k 2p2 + p2 2 p 2 = 3 2 p2 + 12M (γ 1)k p 2(γ 1)k X = 12M (γ 1)k 2(γ 1)k p = X/3 M 3p + X =0 X 2 /6 M M M /M X2 /6 M (X 2 /6 (1 δ)((γ 1)k) 2 /2 X 2 < 3(1 δ)((γ 1)k) 2 X < 3(1 δ)(γ 1)k M /M < 1 X < 3(1 δ)(γ 1)k 12M (γ 1)k 2(γ 1)k < 3(1 δ)(γ 1)k 3(1 δ)γ(γk 1) 1 2 < 3(1 δ)(γ 1)k(γ 1) 3(1 δ)γ 2 k< 3(1 δ)(γ 1) 2 k γ 1 3(1 δ) < ( ) 2 γ 3 K γk (γ 1)k/3 3 5/6-γ >6 1/4 /(6 1/4 1) k- K : δ =5/6 1/6 1/4 < (γ 1)/γ γ >6 1/4 /(6 1/4 1) k- k- k- k- N[u] N[v] 1 0 sim(u, v) k- sim sim- sim- sim Jaccard N[u] N[v] θ v u N[u] N[v] 5
6 θ Algorithm DataPolishing SimilarityGraph (G =(V,E): graph, sim:similarity measure, τ:repeat number) 1. for i := 1 to τ 2. E := {(u, v) sim(u, v) =1} 3. E := E 4. end for 5. output G 5. Algorithm for Fast Data Polishing sim(u, v) O( V 2 ) V V : sim(u, v) =1 N[u] N[v] > 0 N[u] N[v] =0 N[u] N[v] N[u] N[v] = N[u] + N[v] N[u] N[v] N[u] \ N[v] = N[u] N[u] N[v] 2 N[u] N[v] u v w N[w] 3 N[u] N[v] N[u] w v N[w] u v N[u] N[v] w N[u] N[w] w N[w] v N[w] v 1 w v N[u] N[v] for each w N[u] do for each v N[w] s.t.v<u, intersection[v] :=intersection[v]+1 end for N[u] N[v] v v L v 0 1 v L L 0 O( V ) v L L O( L ) Algorithm NeighborIntersection (G =(V,E)) 1. intersection[u] :=0foreachu V 2. for each u V do 3. L := 4. for each w N[u] do 5. for each v N[w] s.t.v<udo 6. if intersection[v] =0then insert v to L 7. intersection[v] :=intersection[v]+1 8. end for 9. end for 6
7 10. for each v L output {v,u}; intersection[v] :=0 11. end for O( u V w N[u] {v N[w] v<u} ) u V w N[u] u u w u N[w] u N[w] {v N[w] v>u} = O( N[w] 2 ) Δ O( Δ2 )=O(Δ 2 V ) Web G i α α/i k c β w N[w] cα/w k + β N[w] 2 (cα/w k + β) 2 3 (cα/w k ) 2 + β 2 O( E )+ 3 (cα/w k ) 2 = O( E + α 2 (1/w 2k ) k = 1 O( E + α 2 log E ) k >1 O( E + α 2 ) α O( V 1/2 ) k =1 O( E log E ) O( E ) 4 G =(V,E) i c β cα/i k + β NeighborIntersection k =1 O( E + α 2 log E ) k >1 O( E + α 2 ) cα/i k + β 6. CREST [1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A. I. Verkamo, Fast discovery of association rules, Advances in Knowledge Discovery and Data Mining, Chapter 12, AAAI Press / The MIT Press (1996). [2] C. Cortes and V. N. Vapnik, Support-Vector Networks, Machine Learning 20 (1995). [3] M. Girvan and M. E. J. Newman, Community Structure in Social and Biological Networks, Proc. Natl. Acad. Sci. USA 99, pp (2002) [4] P. Judea, Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, Proceedings of the 7
8 Second National Conference on Artificial Intelligence, pp (1982). [5] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, (2008). [6] B. Goethals, the FIMI repository, (2003). [7] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, Trawling the Web for Emerging Cyber- Communities, Proceedings of the Eighth International World Wide Web Conference, Toronto, Canada, [8] UCI Machine Learning Repository, [9] J. B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Berkeley Symposium on Mathematical Statistics and Probability, pp (1967). [10] K. Makino and T. Uno, New Algorithms for Enumerating All Maximal Cliques, LNCS 3111 (Proc. SWAT 2004), pp (2004). [11] G. Palla, I. Derènyi, I. Farkas, and T. Vicsek, Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society, Nature 435, pp (2005) [12] T. Uno, T. Asai, Y. Uchida, H. Arimura, An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, LNAI 3245, pp (2004). [13] T.Uno,M.Kiyomi,H.Arimura, LCMver.2:Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets, Proc. IEEE ICDM 04 Workshop FIMI 04 (2004). [14] T. Uno, An Efficient Algorithm for Solving Pseudo Clique Enumeration Problem Algorithmica 56, pp (2010). 8
An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases
An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases Takeaki Uno, Tatsuya Asai 2 3, Yuzo Uchida 2, and Hiroki Arimura 2 National Institute of Informatics, 2--2, Hitotsubashi,
More informationCommunities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices
Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into
More informationChapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining
Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of
More informationAn Analysis on Link Structure Evolution Pattern of Web Communities
DEWS2005 5-C-01 153-8505 4-6-1 E-mail: {imafuji,kitsure}@tkl.iis.u-tokyo.ac.jp 1999 2002 HITS An Analysis on Link Structure Evolution Pattern of Web Communities Noriko IMAFUJI and Masaru KITSUREGAWA Institute
More informationAn Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases
An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases Takeaki Uno 1, Tatsuya Asai 2,,YuzoUchida 2, and Hiroki Arimura 2 1 National Institute of Informatics, 2-1-2, Hitotsubashi,
More informationOn Resolution Limit of the Modularity in Community Detection
The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 492 499 On Resolution imit of the
More informationA Multiobjective GO based Approach to Protein Complex Detection
Available online at www.sciencedirect.com Procedia Technology 4 (2012 ) 555 560 C3IT-2012 A Multiobjective GO based Approach to Protein Complex Detection Sumanta Ray a, Moumita De b, Anirban Mukhopadhyay
More informationOn Minimal Infrequent Itemset Mining
On Minimal Infrequent Itemset Mining David J. Haglin and Anna M. Manning Abstract A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets
More informationRelation between Pareto-Optimal Fuzzy Rules and Pareto-Optimal Fuzzy Rule Sets
Relation between Pareto-Optimal Fuzzy Rules and Pareto-Optimal Fuzzy Rule Sets Hisao Ishibuchi, Isao Kuwajima, and Yusuke Nojima Department of Computer Science and Intelligent Systems, Osaka Prefecture
More informationEncyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen
Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi
More informationDistributed Mining of Frequent Closed Itemsets: Some Preliminary Results
Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Claudio Lucchese Ca Foscari University of Venice clucches@dsi.unive.it Raffaele Perego ISTI-CNR of Pisa perego@isti.cnr.it Salvatore
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationModeling of Growing Networks with Directional Attachment and Communities
Modeling of Growing Networks with Directional Attachment and Communities Masahiro KIMURA, Kazumi SAITO, Naonori UEDA NTT Communication Science Laboratories 2-4 Hikaridai, Seika-cho, Kyoto 619-0237, Japan
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan
More informationMining Maximal Flexible Patterns in a Sequence
Mining Maximal Flexible Patterns in a Sequence Hiroki Arimura 1, Takeaki Uno 2 1 Graduate School of Information Science and Technology, Hokkaido University Kita 14 Nishi 9, Sapporo 060-0814, Japan arim@ist.hokudai.ac.jp
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization
More informationModified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context.
Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context. Murphy Choy Cally Claire Ong Michelle Cheong Abstract The rapid explosion in retail data calls for more effective
More informationAn Extended Branch and Bound Search Algorithm for Finding Top-N Formal Concepts of Documents
An Extended Branch and Bound Search Algorithm for Finding Top-N Formal Concepts of Documents Makoto HARAGUCHI and Yoshiaki OKUBO Division of Computer Science Graduate School of Information Science and
More informationAn Anomaly Detection Method for Spacecraft using Relevance Vector Learning
An Anomaly Detection Method for Spacecraft using Relevance Vector Learning Ryohei Fujimaki 1, Takehisa Yairi 2, and Kazuo Machida 2 1 The Univ. of Tokyo, Aero. and Astronautics 2 The Univ. of Tokyo, RCAST
More informationAn Approach to Classification Based on Fuzzy Association Rules
An Approach to Classification Based on Fuzzy Association Rules Zuoliang Chen, Guoqing Chen School of Economics and Management, Tsinghua University, Beijing 100084, P. R. China Abstract Classification based
More informationPattern-Based Decision Tree Construction
Pattern-Based Decision Tree Construction Dominique Gay, Nazha Selmaoui ERIM - University of New Caledonia BP R4 F-98851 Nouméa cedex, France {dominique.gay, nazha.selmaoui}@univ-nc.nc Jean-François Boulicaut
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European
More informationTransaction Databases, Frequent Itemsets, and Their Condensed Representations
Transaction Databases, Frequent Itemsets, and Their Condensed Representations Taneli Mielikäinen HIIT Basic Research Unit Department of Computer Science University of Helsinki, Finland Abstract. Mining
More informationAn Intersection Inequality for Discrete Distributions and Related Generation Problems
An Intersection Inequality for Discrete Distributions and Related Generation Problems E. Boros 1, K. Elbassioni 1, V. Gurvich 1, L. Khachiyan 2, and K. Makino 3 1 RUTCOR, Rutgers University, 640 Bartholomew
More informationDetecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.
Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on
More informationHard and Fuzzy c-medoids for Asymmetric Networks
16th World Congress of the International Fuzzy Systems Association (IFSA) 9th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT) Hard and Fuzzy c-medoids for Asymmetric Networks
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.
More informationThe network growth model that considered community information
1 1 1 1 CNN The network growth model that considered community information EIJI MIYOSHI, 1 IKUO SUZUKI, 1 MASAHITO YAMAMOTO 1 and MASASHI FURUKAWA 1 There is a study of much network model, but, as for
More informationKnowledge Extraction from DBNs for Images
Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental
More informationLearning latent structure in complex networks
Learning latent structure in complex networks Lars Kai Hansen www.imm.dtu.dk/~lkh Current network research issues: Social Media Neuroinformatics Machine learning Joint work with Morten Mørup, Sune Lehmann
More informationSimultaneous Reliability Evaluation of Generality and Accuracy for Rule Discovery in Databases Einoshin Suzuki
Simultaneous Reliability Evaluation of Generality and Accuracy for Rule Discovery in Databases Einoshin Suzuki Electrical and Computer Engineering, Yokohama National University, 79-5, Toldwadai, Hodogaya,
More informationOverlapping Communities
Overlapping Communities Davide Mottin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides GRAPH
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationCorrelation Preserving Unsupervised Discretization. Outline
Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization
More informationModeling and Estimation from High Dimensional Data TAKASHI WASHIO THE INSTITUTE OF SCIENTIFIC AND INDUSTRIAL RESEARCH OSAKA UNIVERSITY
Modeling and Estimation from High Dimensional Data TAKASHI WASHIO THE INSTITUTE OF SCIENTIFIC AND INDUSTRIAL RESEARCH OSAKA UNIVERSITY Department of Reasoning for Intelligence, Division of Information
More informationText mining and natural language analysis. Jefrey Lijffijt
Text mining and natural language analysis Jefrey Lijffijt PART I: Introduction to Text Mining Why text mining The amount of text published on paper, on the web, and even within companies is inconceivably
More informationBottom-Up Propositionalization
Bottom-Up Propositionalization Stefan Kramer 1 and Eibe Frank 2 1 Institute for Computer Science, Machine Learning Lab University Freiburg, Am Flughafen 17, D-79110 Freiburg/Br. skramer@informatik.uni-freiburg.de
More informationAn Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules
An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules Ying Liu 1 Department of Computer Science, Mathematics and Science, College of Professional
More informationInternational Journal of Scientific & Engineering Research, Volume 7, Issue 2, February ISSN
International Journal of Scientific & Engineering Research, Volume 7, Issue 2, February-2016 9 Automated Methodology for Context Based Semantic Anomaly Identification in Big Data Hema.R 1, Vidhya.V 2,
More informationClassification of Voice Signals through Mining Unique Episodes in Temporal Information Systems: A Rough Set Approach
Classification of Voice Signals through Mining Unique Episodes in Temporal Information Systems: A Rough Set Approach Krzysztof Pancerz, Wies law Paja, Mariusz Wrzesień, and Jan Warcho l 1 University of
More informationDeterministic scale-free networks
Physica A 299 (2001) 559 564 www.elsevier.com/locate/physa Deterministic scale-free networks Albert-Laszlo Barabasi a;, Erzsebet Ravasz a, Tamas Vicsek b a Department of Physics, College of Science, University
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationA Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms T. Vijayakumar 1, V.Nivedhitha 2, K.Deeba 3 and M. Sathya Bama 4 1 Assistant professor / Dept of IT, Dr.N.G.P College of Engineering
More informationarxiv: v1 [physics.soc-ph] 20 Oct 2010
arxiv:00.098v [physics.soc-ph] 20 Oct 200 Spectral methods for the detection of network community structure: A comparative analysis Hua-Wei Shen and Xue-Qi Cheng Institute of Computing Technology, Chinese
More informationarxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000
Topology of evolving networks: local events and universality arxiv:cond-mat/0005085v1 [cond-mat.dis-nn] 4 May 2000 Réka Albert and Albert-László Barabási Department of Physics, University of Notre-Dame,
More informationSampling for Frequent Itemset Mining The Power of PAC learning
Sampling for Frequent Itemset Mining The Power of PAC learning prof. dr Arno Siebes Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht The Papers Today
More informationDiscovery of Frequent Word Sequences in Text. Helena Ahonen-Myka. University of Helsinki
Discovery of Frequent Word Sequences in Text Helena Ahonen-Myka University of Helsinki Department of Computer Science P.O. Box 26 (Teollisuuskatu 23) FIN{00014 University of Helsinki, Finland, helena.ahonen-myka@cs.helsinki.fi
More informationPositive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise
Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Guimei Liu 1,2 Jinyan Li 1 Limsoon Wong 2 Wynne Hsu 2 1 Institute for Infocomm Research, Singapore 2 School
More informationNetwork by Weighted Graph Mining
2012 4th International Conference on Bioinformatics and Biomedical Technology IPCBEE vol.29 (2012) (2012) IACSIT Press, Singapore + Prediction of Protein Function from Protein-Protein Interaction Network
More informationCSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13
CSE 291. Assignment 3 Out: Wed May 23 Due: Wed Jun 13 3.1 Spectral clustering versus k-means Download the rings data set for this problem from the course web site. The data is stored in MATLAB format as
More informationBayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models
Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
More informationMachine Learning on temporal data
Machine Learning on temporal data Classification rees for ime Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan ime Series classification approaches
More informationOn the NP-Completeness of Some Graph Cluster Measures
arxiv:cs/050600v [cs.cc] 9 Jun 005 On the NP-Completeness of Some Graph Cluster Measures Jiří Šíma Institute of Computer Science, Academy of Sciences of the Czech Republic, P. O. Box 5, 807 Prague 8, Czech
More informationA. Pelliccioni (*), R. Cotroneo (*), F. Pungì (*) (*)ISPESL-DIPIA, Via Fontana Candida 1, 00040, Monteporzio Catone (RM), Italy.
Application of Neural Net Models to classify and to forecast the observed precipitation type at the ground using the Artificial Intelligence Competition data set. A. Pelliccioni (*), R. Cotroneo (*), F.
More informationCS224W: Social and Information Network Analysis
CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.
More informationBagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy
and for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy Marina Skurichina, Liudmila I. Kuncheva 2 and Robert P.W. Duin Pattern Recognition Group, Department of Applied Physics,
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationMining Spatial and Spatio-temporal Patterns in Scientific Data
Mining Spatial and Spatio-temporal Patterns in Scientific Data Hui Yang Dept. of Comp. Sci. & Eng. The Ohio State University Columbus, OH, 43210 yanghu@cse.ohio-state.edu Srinivasan Parthasarathy Dept.
More informationNetBox: A Probabilistic Method for Analyzing Market Basket Data
NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato
More informationInfo-Cluster Based Regional Influence Analysis in Social Networks
Info-Cluster Based Regional Influence Analysis in Social Networks Chao Li,2,3, Zhongying Zhao,2,3,JunLuo, and Jianping Fan Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen
More informationModeling High-Dimensional Discrete Data with Multi-Layer Neural Networks
Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks Yoshua Bengio Dept. IRO Université de Montréal Montreal, Qc, Canada, H3C 3J7 bengioy@iro.umontreal.ca Samy Bengio IDIAP CP 592,
More informationA Post-Aggregation Error Record Extraction Based on Naive Bayes for Statistical Survey Enumeration
A Post-Aggregation Error Record Extraction Based on Naive Bayes for Statistical Survey Enumeration Kiyomi Shirakawa, National Statistics Center, Tokyo, JAPAN e-mail: kshirakawa@nstac.go.jp Abstract At
More informationChapter 5-2: Clustering
Chapter 5-2: Clustering Jilles Vreeken Revision 1, November 20 th typo s fixed: dendrogram Revision 2, December 10 th clarified: we do consider a point x as a member of its own ε-neighborhood 12 Nov 2015
More informationClassifier Selection. Nicholas Ver Hoeve Craig Martek Ben Gardner
Classifier Selection Nicholas Ver Hoeve Craig Martek Ben Gardner Classifier Ensembles Assume we have an ensemble of classifiers with a well-chosen feature set. We want to optimize the competence of this
More informationDiscovering Non-Redundant Association Rules using MinMax Approximation Rules
Discovering Non-Redundant Association Rules using MinMax Approximation Rules R. Vijaya Prakash Department Of Informatics Kakatiya University, Warangal, India vijprak@hotmail.com Dr.A. Govardhan Department.
More informationInformation Geometric Structure on Positive Definite Matrices and its Applications
Information Geometric Structure on Positive Definite Matrices and its Applications Atsumi Ohara Osaka University 2010 Feb. 21 at Osaka City University 大阪市立大学数学研究所情報幾何関連分野研究会 2010 情報工学への幾何学的アプローチ 1 Outline
More informationThe Forward-Backward Embedding of Directed Graphs
The Forward-Backward Embedding of Directed Graphs Anonymous authors Paper under double-blind review Abstract We introduce a novel embedding of directed graphs derived from the singular value decomposition
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationClass Prior Estimation from Positive and Unlabeled Data
IEICE Transactions on Information and Systems, vol.e97-d, no.5, pp.1358 1362, 2014. 1 Class Prior Estimation from Positive and Unlabeled Data Marthinus Christoffel du Plessis Tokyo Institute of Technology,
More informationOn the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar
Proceedings of Machine Learning Research vol 73:153-164, 2017 AMBN 2017 On the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar Kei Amii Kyoto University Kyoto
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 6
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationA Generalized Decision Logic in Interval-set-valued Information Tables
A Generalized Decision Logic in Interval-set-valued Information Tables Y.Y. Yao 1 and Qing Liu 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca
More informationThe final publication is available at IEEE via
This manuscript is a post-print copy of the following article Title: Mining Approximate Multi-Relational Patterns Authors: Eirini Spyropoulou; Tijl De Bie The final publication is available at IEEE via
More informationDynamic Programming Approach for Construction of Association Rule Systems
Dynamic Programming Approach for Construction of Association Rule Systems Fawaz Alsolami 1, Talha Amin 1, Igor Chikalov 1, Mikhail Moshkov 1, and Beata Zielosko 2 1 Computer, Electrical and Mathematical
More informationUnsupervised Classification via Convex Absolute Value Inequalities
Unsupervised Classification via Convex Absolute Value Inequalities Olvi L. Mangasarian Abstract We consider the problem of classifying completely unlabeled data by using convex inequalities that contain
More informationOn the Mining of Numerical Data with Formal Concept Analysis
On the Mining of Numerical Data with Formal Concept Analysis Thèse de doctorat en informatique Mehdi Kaytoue 22 April 2011 Amedeo Napoli Sébastien Duplessis Somewhere... in a temperate forest... N 2 /
More informationUn nouvel algorithme de génération des itemsets fermés fréquents
Un nouvel algorithme de génération des itemsets fermés fréquents Huaiguo Fu CRIL-CNRS FRE2499, Université d Artois - IUT de Lens Rue de l université SP 16, 62307 Lens cedex. France. E-mail: fu@cril.univ-artois.fr
More informationFinding All Minimal Infrequent Multi-dimensional Intervals
Finding All Minimal nfrequent Multi-dimensional ntervals Khaled M. Elbassioni Max-Planck-nstitut für nformatik, Saarbrücken, Germany; elbassio@mpi-sb.mpg.de Abstract. Let D be a database of transactions
More informationSocial and Technological Network Analysis. Lecture 4: Modularity and Overlapping Dr. Cecilia Mascolo
Social and Technological Network Analysis Lecture 4: Modularity and Overlapping Communi@es Dr. Cecilia Mascolo In This Lecture We will describe the concept of Modularity and Modularity Op@miza@on. We will
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationDiscovery of Functional and Approximate Functional Dependencies in Relational Databases
JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 7(1), 49 59 Copyright c 2003, Lawrence Erlbaum Associates, Inc. Discovery of Functional and Approximate Functional Dependencies in Relational Databases
More informationLink Prediction. Eman Badr Mohammed Saquib Akmal Khan
Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling
More informationSpectral Analysis of Directed Complex Networks. Tetsuro Murai
MASTER THESIS Spectral Analysis of Directed Complex Networks Tetsuro Murai Department of Physics, Graduate School of Science and Engineering, Aoyama Gakuin University Supervisors: Naomichi Hatano and Kenn
More informationA Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems
A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems Daniel Meyer-Delius 1, Christian Plagemann 1, Georg von Wichert 2, Wendelin Feiten 2, Gisbert Lawitzky 2, and
More informationCHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS. Due to elements of uncertainty many problems in this world appear to be
11 CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS Due to elements of uncertainty many problems in this world appear to be complex. The uncertainty may be either in parameters defining the problem
More informationOptimizing Abstaining Classifiers using ROC Analysis. Tadek Pietraszek / 'tʌ dek pɪe 'trʌ ʃek / ICML 2005 August 9, 2005
IBM Zurich Research Laboratory, GSAL Optimizing Abstaining Classifiers using ROC Analysis Tadek Pietraszek / 'tʌ dek pɪe 'trʌ ʃek / pie@zurich.ibm.com ICML 2005 August 9, 2005 To classify, or not to classify:
More informationJure Leskovec Joint work with Jaewon Yang, Julian McAuley
Jure Leskovec (@jure) Joint work with Jaewon Yang, Julian McAuley Given a network, find communities! Sets of nodes with common function, role or property 2 3 Q: How and why do communities form? A: Strength
More informationSparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference
Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which
More informationConnectivity of Wireless Sensor Networks with Constant Density
Connectivity of Wireless Sensor Networks with Constant Density Sarah Carruthers and Valerie King University of Victoria, Victoria, BC, Canada Abstract. We consider a wireless sensor network in which each
More informationCube Lattices: a Framework for Multidimensional Data Mining
Cube Lattices: a Framework for Multidimensional Data Mining Alain Casali Rosine Cicchetti Lotfi Lakhal Abstract Constrained multidimensional patterns differ from the well-known frequent patterns from a
More informationIdentification of Bursts in a Document Stream
Identification of Bursts in a Document Stream Toshiaki FUJIKI 1, Tomoyuki NANNO 1, Yasuhiro SUZUKI 1 and Manabu OKUMURA 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute
More informationAn Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers
An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers Fernando Berzal, Juan-Carlos Cubero, Nicolás Marín, and José-Luis Polo Department of Computer Science
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationPreserving Privacy in Data Mining using Data Distortion Approach
Preserving Privacy in Data Mining using Data Distortion Approach Mrs. Prachi Karandikar #, Prof. Sachin Deshpande * # M.E. Comp,VIT, Wadala, University of Mumbai * VIT Wadala,University of Mumbai 1. prachiv21@yahoo.co.in
More informationSupporting Statistical Hypothesis Testing Over Graphs
Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,
More informationAnalysis of Multiclass Support Vector Machines
Analysis of Multiclass Support Vector Machines Shigeo Abe Graduate School of Science and Technology Kobe University Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract Since support vector machines for pattern
More informationIntroduction. Data Mining. University of Szeged. Data Mining
University of Szeged Requirements and grading 2 midterms for 25 pts each (min. 8 pts each) (17 Oct, 28 Nov) Retake: if you have >= 8 pts per midterm you can have a retake. Its material covers that of the
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More informationMultiple Similarities Based Kernel Subspace Learning for Image Classification
Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese
More informationGroups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA
Groups of vertices and Core-periphery structure By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Mostly observed real networks have: Why? Heavy tail (powerlaw most
More information