Supervisor: Prof. Stefano Spaccapietra, Dr. Fabio Porto
Student: Yuanjian Wang Zufferey
EPFL - Computer Science - LBD
- Introduction
- Related Work
- Proposed Solution
- Implementation
- Important Results
- Conclusion
- Future work
- Background
- Problem Statement
Fast-growing collection of knowledge in neuroscience
- Biologists:
  - Biological neural information
  - Experiment data
- Mathematicians, physicians, and computer scientists:
  - Computational neural model information
  - Simulation data
Two kinds of data
- Metadata: basic knowledge about
  - the biological neuron, such as the physiology of a neuron
  - the computational model, such as the mathematical model
- High-dimensional data
  - Experiment data from biological experiments
  - Simulation data from simulations of computational models
User requirements:
- Storage of basic knowledge, such as neural biological structure and computational composition
- Comparison between different computational models of the same biological neural structure
- Interpretation of the connection between the biological and computational aspects of the same biological neural structure
- A bridge between biological experiments and simulations of computational models
Common knowledge database: NeuronDB
- Saves a computational model as a package written in a programming language such as MATLAB, NEURON, Java, or C++
- Little or no biological information
- Cannot easily compare different computational models
- Cannot find the connection between a biological experiment and a computational simulation
Traditional database management systems
- Predefined schema
- Cannot compare high-dimensional data
Semantic web applications
- Well-defined conceptual schema to supply annotations
- Use ontology reasoners or other intelligent tools
- Cannot mine high-dimensional data
Artificial intelligence: machine learning
- Supervised learning
  - The classes are known in advance
  - Needs training data: typical examples
- Unsupervised learning
  - Classes and typical examples are not available
  - Solves clustering problems
- Reinforcement learning
  - Real-time systems
- Summary
- Learning Database Architecture
- Concept Model
- Unsupervised Machine Learning - Competitive Learning
Learning database: a traditional data management system combined with machine learning
- Well-formed storage
- Easy retrieval tool
- Finds possible hidden similarity between biological data and simulated data using an unsupervised machine learning algorithm, the competitive learning algorithm
- Object-oriented UML definition
  - Easy for developers to understand
- A standard XML format
  - Easy to share information between different users
  - Easy to compare the biological and computational information
  - Easy for ontology reasoners to use, or for other intelligent agents to mine
  - Easy to communicate between different existing databases
Biological neuron example
A pyramidal neural cell:
- Neuron_Classification: multipolar neuron
- OrganInstance: hippocampus or cerebral cortex
- Compartments:
  - a triangularly shaped soma
  - a single apical dendrite
  - multiple basal dendrites
  - a single axon
- ElectricalProperty: K+ channels on dendrites
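As a rough illustration of how the pyramidal cell above might look in an XML format, the sketch below serializes the listed fields with Python's standard library. The field names Neuron_Classification, OrganInstance, Compartments, and ElectricalProperty come from the slide; the root element BiologicalNeuron, the child element Compartment, and the overall nesting are assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET


def build_pyramidal_neuron():
    """Build an XML element for the pyramidal cell example.

    "BiologicalNeuron" and "Compartment" are hypothetical element names
    chosen for illustration; the real schema is not given in the slides.
    """
    neuron = ET.Element("BiologicalNeuron", name="pyramidal neural cell")
    ET.SubElement(neuron, "Neuron_Classification").text = "multipolar neuron"
    for organ in ("hippocampus", "cerebral cortex"):
        ET.SubElement(neuron, "OrganInstance").text = organ
    compartments = ET.SubElement(neuron, "Compartments")
    for description in (
        "a triangularly shaped soma",
        "a single apical dendrite",
        "multiple basal dendrites",
        "a single axon",
    ):
        ET.SubElement(compartments, "Compartment").text = description
    ET.SubElement(neuron, "ElectricalProperty").text = "K+ channels on dendrites"
    return neuron


if __name__ == "__main__":
    # Print the serialized record so it can be inspected or shared.
    print(ET.tostring(build_pyramidal_neuron(), encoding="unicode"))
```

A record in this shape can be parsed back with any XML tool, which is the sharing property the concept model aims for.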
Biological Neuron information
Computational Model Example
Simple Model of Spiking Neurons:
- Two equations
- Three variables
- Four parameters
- ReadInterface: I
- WriteInterface: u and v
- Hypothesis: Hodgkin-Huxley-type dynamics and integrate-and-fire
- BiologicalQuestion: Spiking and bursting behavior of known types of cortical neurons
- Reference: Izhikevich artificial neuron model, from E. M. Izhikevich, "Simple Model of Spiking Neurons", IEEE Transactions on Neural Networks, Vol. 14, No. 6, November 2003, pp. 1569-1572
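For readers who want to see the model run, here is a minimal sketch of the Izhikevich model as published in the cited paper: v' = 0.04v^2 + 5v + 140 - u + I and u' = a(bv - u), with the reset v <- c, u <- u + d when v reaches 30 mV. The input I plays the role of the ReadInterface, and u and v the WriteInterface. The forward-Euler step, the step size, the function name, and the default parameter values (the paper's regular-spiking setting) are choices made for this illustration, not part of the project.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, t_max_ms=1000.0, dt_ms=0.5):
    """Integrate the Izhikevich model with forward Euler.

    Returns the list of spike times in milliseconds.
    a, b, c, d are the four model parameters; `current` is the input I.
    """
    v = c          # membrane potential (mV)
    u = b * v      # membrane recovery variable
    spike_times = []
    for step in range(int(t_max_ms / dt_ms)):
        if v >= 30.0:
            # Spike detected: record it and apply the after-spike reset.
            spike_times.append(step * dt_ms)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
    return spike_times


if __name__ == "__main__":
    spikes = simulate_izhikevich()
    print(f"{len(spikes)} spikes in 1000 ms")
```

Changing the four parameters switches the model between firing patterns, which is why a database entry for this model would need to store them alongside the equations.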
Computational Model information
The connection between biological information and a computational model is made through BiologicalExplanation, for example in an equation definition.
Simulation Example
- Two computational models:
  - M1: Computational_Model_Na
  - M2: Computational_Model_K
- Neural cell with 3 compartments: soma, axon, dendrite
- Simulated element: M1 and M2 applied on the membrane of each of the three compartments, with different initial conditions, stop conditions, and parameter settings
Simulation information
Predefined annotations for mining
Important Definitions
- Dimension: The number of points in a one-dimensional vector. For X = (x_1, ..., x_m), m is the dimension of vector X.
- Prototype: The center of a cluster, or a typical example of the data in a cluster.
- Data clustering: A technique for data analysis that partitions a data set into subsets whose elements share common traits.
- Correlated prototypes: A pair of prototypes whose cross-correlation is greater than a predefined threshold, which means they may represent the same cluster.
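The correlated-prototype definition can be sketched in a few lines. The slides do not say which form of cross-correlation is used, so this example assumes the normalized (Pearson) correlation at zero lag; the function names and the threshold of 0.9 are likewise hypothetical.

```python
import math


def normalized_cross_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length vectors.

    Returns a value in [-1, 1]; returns 0.0 if either vector is constant.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(prototypes, threshold=0.9):
    """Return index pairs (i, j), i < j, whose correlation exceeds the threshold."""
    pairs = []
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            if normalized_cross_correlation(prototypes[i], prototypes[j]) > threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    prototypes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    # Prototypes 0 and 1 have the same shape, so they may represent one cluster.
    print(correlated_pairs(prototypes))
```

Pairs flagged this way are candidates for merging or for reinitializing one of the two prototypes.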
K-means competitive learning
- Time-consuming; slow to converge
- Dead units; the number of classes (clusters) is unknown
- Results depend on the initial prototypes
- Everything must be recalculated when a new sample arrives
Kohonen competitive learning
- Simple; the learning rate must be decreased for it to converge
- Dead units; the number of classes is unknown
- Results depend on the initial prototypes
Fuzzy C-means competitive learning
- Partial membership in classes
- Results depend on the initial prototypes
- Everything must be recalculated when a new sample arrives
Principal component analysis (PCA)
- Works for homologous data (data of the same type)
- Everything must be recalculated when a new sample arrives
Kohonen Competitive Learning
- Define more prototypes
  - More dead units at the beginning
  - But no real prototype is missed
- Randomly initialize prototypes
  - Criteria guarantee few correlated prototypes
- Reinitialize prototypes when the sample scale grows, to get better precision
- Decrease the learning rate as the sample size grows
  - Learn fast at the beginning, slowly later
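The points above can be sketched as a small online training loop: prototypes are initialized at random, each sample moves only its nearest prototype, and the learning rate shrinks as more samples are seen. This is an illustrative sketch, not the project's PL/SQL implementation; the decay schedule initial_rate / (1 + decay * n), the default values, and all function names are assumptions.

```python
import random


def learning_rate(n_seen, initial_rate=0.5, decay=0.01):
    """Hypothetical schedule: fast learning at first, slower as samples accumulate."""
    return initial_rate / (1.0 + decay * n_seen)


def squared_distance(x, w):
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def train(samples, n_prototypes, seed=0):
    """Winner-take-all competitive learning over a stream of samples.

    Returns (prototypes, wins); a prototype with wins == 0 is a dead unit.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    low = min(min(s) for s in samples)
    high = max(max(s) for s in samples)
    # Random initialization inside the value range of the data.
    prototypes = [[rng.uniform(low, high) for _ in range(dim)]
                  for _ in range(n_prototypes)]
    wins = [0] * n_prototypes
    for n_seen, x in enumerate(samples):
        rate = learning_rate(n_seen)
        winner = min(range(n_prototypes),
                     key=lambda j: squared_distance(x, prototypes[j]))
        # Move only the winner a fraction `rate` of the way toward the sample.
        prototypes[winner] = [w + rate * (xi - w)
                              for xi, w in zip(x, prototypes[winner])]
        wins[winner] += 1
    return prototypes, wins


if __name__ == "__main__":
    data = [[0.0, 0.0], [10.0, 10.0]] * 20
    prototypes, wins = train(data, n_prototypes=4)
    print(prototypes, wins)
```

Defining more prototypes than expected clusters, as the slide suggests, shows up here as entries of wins that stay at zero; those dead units can be discarded or reinitialized later.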
This project is implemented in Oracle using the PL/SQL language.
Query tool: EasyQueries, a 2007 semester project of the Database Laboratory of EPFL (author: Ariane Pasquier, supervised by Dr. Fabio Porto)
The dimension of homologous data samples can be reduced
To improve the quality of the clusters once the number of samples has grown by some scale:
- Initialize the prototypes
- Execute the competitive learning again
To improve performance in Oracle:
- Use external tools such as MATLAB to execute the learning algorithm
- Oracle serves as the storage and retrieval tool
What has been done in this project:
- A database that stores biological and computational neural information, as well as experiment and simulation information
- A database that learns from high-dimensional data series or graph information to find cluster information
- An intelligent input tool:
  - Editor
  - Import tool
- PCA
  - To reduce the dimension of homologous data samples
- External competitive learning interface
- Phylogenetic tree
  - From the bottom up
  - From detail to general
Thanks to Prof. Stefano Spaccapietra for accepting my proposal for this project.
Thanks to Dr. Fabio Porto for his help throughout the work.
Thanks to my family for their support.
Questions
Cluster analysis. (2008, June 17). In Wikipedia, The Free Encyclopedia. Retrieved 14:58, June 17, 2008, from http://en.wikipedia.org/w/index.php?title=cluster_analysis&oldid=219899952
Neuron. (2008, May 18). In Wikipedia, The Free Encyclopedia. Retrieved 08:16, May 26, 2008, from http://en.wikipedia.org/w/index.php?title=neuron&oldid=213235579
Membrane potential. (2008, May 21). In Wikipedia, The Free Encyclopedia. Retrieved 08:14, May 26, 2008, from http://en.wikipedia.org/w/index.php?title=membrane_potential&oldid=214033480
Pyramidal cell. (2008, May 8). In Wikipedia, The Free Encyclopedia. Retrieved 13:46, May 26, 2008, from http://en.wikipedia.org/w/index.php?title=pyramidal_cell&oldid=211030760
E. M. Izhikevich: Simple Model of Spiking Neurons. IEEE Transactions on Neural Networks, Vol. 14, No. 6, November 2003, pp. 1569-1572
T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu: An Efficient k-means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, July 2002
IBM Corporation, August 2005, http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.qmf.doc.using/dsqk2mst365.htm
IBM Corporation, August 2005, http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.qmf.doc.using/dsqk2mst339.htm
Ariane Pasquier: User Manual of EasyQueries. EPFL Semester Project, April 2007
M. L. Hines, T. Morse, M. Migliore, N. T. Carnevale, G. M. Shepherd: ModelDB: A Database to Support Computational Neuroscience. J Comput Neurosci. 2004 Jul-Aug; 17(1): 7-11
Heng Tao Shen, Xiaofang Zhou, Aoying Zhou: An adaptive and dynamic dimensionality reduction method for high-dimensional indexing. The VLDB Journal (2007) 16(2): 219-234
Wikipedia contributors, 'Machine learning', Wikipedia, The Free Encyclopedia, 11 June 2008, 20:59 UTC, http://en.wikipedia.org/w/index.php?title=machine_learning&oldid=218711697
Neuroscience. (2008, June 10). In Wikipedia, The Free Encyclopedia. Retrieved 14:09, June 17, 2008, from http://en.wikipedia.org/w/index.php?title=neuroscience&oldid=218438467
Moving average. (2008, June 4). In Wikipedia, The Free Encyclopedia. Retrieved 07:27, June 18, 2008, from http://en.wikipedia.org/w/index.php?title=Moving_average&oldid=216962510
Principal components analysis. (2008, June 16). In Wikipedia, The Free Encyclopedia. Retrieved 07:45, June 18, 2008, from http://en.wikipedia.org/w/index.php?title=principal_components_analysis&oldid=219604812
T. Kohonen: Self-Organization and Associative Memory. Springer-Verlag, Berlin Heidelberg, 1989
H. Nielsen: Neurocomputing. Addison-Wesley, Redwood City, 1990
- K-means
- Fuzzy C-means
- Kohonen (http://spie.org/x24069.xml)
- PCA: 3-dimensional gene expression samples are projected onto a 2-dimensional component space (http://www.nlpca.de/pca.html)
- The distance is usually defined as the Euclidean norm.
- The prototype j with the minimum distance is named the winner.
- The prototype vector is moved a certain proportion of the distance between it and the sample.
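In symbols, the three steps above take the standard competitive-learning form shown below. The notation is an assumption, since the slide's own formulas are not available here: x is the input sample of dimension m, w_j is prototype j, j* is the index of the winner, and η with 0 < η < 1 is the learning rate.

```latex
\begin{align*}
d_j &= \lVert x - w_j \rVert = \sqrt{\sum_{i=1}^{m} \left( x_i - w_{ji} \right)^2} \\
j^{*} &= \arg\min_{j} \, d_j \\
w_{j^{*}} &\leftarrow w_{j^{*}} + \eta \left( x - w_{j^{*}} \right)
\end{align*}
```

Only the winner is updated; all other prototypes stay where they are, which is why a prototype that never wins becomes a dead unit.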
- Tree of life
- Handwritten digits