Neural Networks and Machine Learning research at the Laboratory of Computer and Information Science, Helsinki University of Technology

1 Neural Networks and Machine Learning research at the Laboratory of Computer and Information Science, Helsinki University of Technology. Erkki Oja, Department of Computer Science, Aalto University, Finland. AIHelsinki Seminar, April 28, 2017

2 CIS lab: the pre-history

3 CIS Lab. Established first in 1965 as the electronics lab at the Dept. of Technical Physics, HUT. Changed names many times. Chairs: Teuvo Kohonen, then Erkki Oja. Then restructured; now part of the big CS Department at Aalto.

4 Neural networks

5 Neural networks. Teuvo Kohonen started research into neural networks in the early 1970s (associative memories, subspace classifiers, speech recognition), then the Self-Organizing Map (SOM) from the early 1980s. Teuvo will give a presentation in AIHelsinki later this spring. E. Oja completed his PhD in 1977 with Teuvo.

6 Part of my PhD Thesis Kohonen-Oja paper

7 One of my Postdoc papers Cooper et al

8 Subspace book

9 Neural networks and AI. The 1970s and 1980s were a time of deep contradictions between neural computation (connectionism) and true AI. True AI was symbolic; the mainstream was expert (knowledge-based) systems, search, logic, frames, semantic nets, Lisp, Prolog etc. In the late 1980s, probabilistic (Bayesian) reasoning and neural networks slowly sneaked into AI.

10 LeNet by Yann LeCun, 1989

11 The first ICANN ever, in 1991

12 My problem at that time: what is nonlinear Principal Component Analysis (PCA)? My solution: a novel neural network, the deep auto-encoder. E. Oja: Data compression, feature extraction, and auto-association in feedforward neural networks. Proc. ICANN 1991, pp

13 Deep auto-encoder (from the paper)

14 The trick is that a data vector x is both the input and the desired output. This was one of the first papers on multilayer (deep) auto-encoders, which today are quite popular. In those days, such networks were quite difficult to train. Newer results: Hinton and Zemel (1994), Bengio (2009), and many others.
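The "input equals target" idea can be illustrated with a minimal NumPy sketch. It is not the architecture of the 1991 paper: it assumes a single tanh bottleneck layer instead of a deep network, plain gradient descent, and synthetic data; the function name `train_autoencoder` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.02, n_epochs=3000, seed=0):
    """Train a one-hidden-layer auto-encoder by plain gradient descent.

    X has shape (n_samples, n_features). Returns the parameters and the
    mean squared reconstruction error recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))   # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        code = np.tanh(X @ W1 + b1)      # nonlinear bottleneck
        X_hat = code @ W2 + b2           # linear reconstruction
        err = X_hat - X                  # the target is the input itself
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_code = (g_out @ W2.T) * (1.0 - code ** 2)
        W2 -= lr * (code.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_code)
        b1 -= lr * g_code.sum(axis=0)
    return (W1, b1, W2, b2), losses

# Synthetic data (an assumption): 5-dimensional points near a 2-D subspace.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

params, losses = train_autoencoder(X)
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The two-unit bottleneck forces the network to compress the five inputs, which is the sense in which an auto-encoder acts as a nonlinear counterpart of PCA.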

15 CIS lab: the past 25 years

16 The research was structured by four consecutive Centres of Excellence financed by the Academy of Finland: Neural Networks Research Centre (NNRC); its continuation; Adaptive Informatics Research Centre (AIRC); Computational Inference Research Centre (COIN).

17 DEPARTMENT OF INFORMATION AND COMPUTER SCIENCE ADAPTIVE INFORMATICS RESEARCH CENTRE

18 COIN Centre of Excellence in Computational Inference: Introduction Erkki Oja, Director of COIN

19 [Diagram: three research areas combined for added value] Computational logic, Intelligent systems (Niemelä, Myllymäki); Computational statistics, Computational biology (Corander, Aurell); Data analysis, Machine learning (Oja, Kaski, Laaksonen)

20 Some ML algorithms studied at the CoEs [diagram]: visualization and nonlinear dimensionality reduction; probabilistic latent variable models, Bayes; Bayes blocks; relevance by data fusion; nonlinear dynamics, subspaces; relational models; sparse PCA, DSS, nonnegative projections; nonlinear and nonnegative BSS; linear mixtures; SOM; ICA; FastICA; reliability

21 Demonstration of Independent Component Analysis (ICA): Original 9 independent images

22 9 mixtures with random mixing; this is the only available data we have

23 Estimated original images, found by an ICA algorithm
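The demonstration can be reproduced in miniature. The sketch below is an assumption-laden stand-in: it uses three synthetic 1-D signals instead of nine images, and a compact symmetric FastICA iteration with a tanh nonlinearity, which is not necessarily the algorithm used to produce the slides. The helper names `sym_decorrelate` and `fastica` are hypothetical.

```python
import numpy as np

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return (E * d ** -0.5) @ E.T @ W

def fastica(X, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from mixtures X (n_signals, n_samples)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the mixtures so that their covariance is the identity.
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Z = (E * d ** -0.5) @ E.T @ X
    n, T = Z.shape
    W = sym_decorrelate(rng.normal(size=(n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w for every row w.
        W_new = G @ Z.T / T - (1.0 - G ** 2).mean(axis=1)[:, None] * W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Three independent non-Gaussian sources (synthetic, for illustration only).
rng = np.random.default_rng(42)
T = 5000
S = np.vstack([
    rng.uniform(-1, 1, T),
    rng.laplace(size=T),
    np.sign(rng.normal(size=T)),
])
A = rng.normal(size=(3, 3))   # random mixing matrix, unknown to the algorithm
X = A @ S                     # the mixtures are the only data passed on
S_est = fastica(X)

# Absolute correlation between each true source and each estimate.
match = np.abs(np.corrcoef(S, S_est))[:3, 3:]
print(np.round(match.max(axis=1), 3))
```

ICA recovers the sources only up to order, sign and scale, which is why the comparison uses absolute correlations and a maximum over the estimates.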

25 My own present research topic: matrix factorizations for data analysis

26 Some ML algorithms studied at the CoEs (same diagram as on slide 20)

28 Example: spatio-temporal data. Graphically, the situation may be like this: [Figure: data matrix X with space and time axes, together with the factor matrices W and H]

29 Global daily temperature ( points x days)

30 E.g. global warming component: one row of matrix H, and the corresponding column of matrix W
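A small sketch of how a row of H and the matching column of W can isolate one component. Everything here is assumed for illustration: the data are synthetic (a single spatial pattern multiplied by a linear trend, plus noise), the layout is space x time, and a truncated SVD is used as the factorization, which is a stand-in rather than the method behind the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_space, n_time = 30, 200

# Synthetic "warming" component: a spatial pattern times a slow trend.
trend = np.linspace(0.0, 1.0, n_time)
pattern = rng.uniform(0.5, 1.5, n_space)
X = np.outer(pattern, trend) + 0.05 * rng.normal(size=(n_space, n_time))

# Rank-1 factorization X ~ W H from the truncated SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = 1
W = U[:, :rank] * s[:rank]    # column of W: spatial map of the component
H = Vt[:rank, :]              # row of H: its time course

rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
corr = abs(np.corrcoef(H[0], trend)[0, 1])
print(f"relative error {rel_err:.3f}, |corr(H row, trend)| {corr:.3f}")
```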

32 A successful example: the Netflix competition

33 Non-negative matrix factorization

34 NMF and its extensions are today quite an active research topic: tensor factorizations (Cichocki et al, 2009); low-rank approximation (LRA) (Markovsky, 2012); missing data (Koren et al, 2009); robust and sparse PCA (Candès et al, 2011); symmetric NMF and clustering (Ding et al, 2012)

35 NMF and clustering. Clustering is a very classical problem, in which n vectors (data items) must be partitioned into r clusters. The clustering result can be shown by the $n \times r$ cluster indicator matrix $H$. It is a binary matrix whose element $h_{ij} = 1$ if and only if the i-th data vector belongs to the j-th cluster.
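As a concrete illustration, the indicator matrix can be built from a list of cluster labels. The helper name `cluster_indicator` and the example labels are hypothetical.

```python
import numpy as np

def cluster_indicator(labels, r):
    """Build the n x r binary indicator matrix from labels in 0..r-1."""
    labels = np.asarray(labels)
    H = np.zeros((labels.size, r), dtype=int)
    H[np.arange(labels.size), labels] = 1   # one 1 per row, in the cluster's column
    return H

labels = [0, 2, 1, 0, 2]     # five data items, three clusters
H = cluster_indicator(labels, r=3)
print(H)
```

Each row contains exactly one 1, and the column sums give the cluster sizes.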

41 The k-means algorithm is minimizing the cost function: $J = \sum_{j=1}^{r} \sum_{x_i \in C_j} \| x_i - c_j \|^2$. If the indicator matrix is suitably normalized then this becomes equal to (Ding et al, 2012) $J = \| X - X H H^T \|^2$. Notice the similarity to NMF and PCA! ("Binary PCA")
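The equality of the two forms of the cost can be checked numerically. The sketch assumes that the data vectors are the columns of X, that "suitably normalized" means dividing each column of the indicator matrix by the square root of the cluster size, and that the matrix norm is the Frobenius norm; the data and the partition are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r = 2, 12, 3
X = rng.normal(size=(d, n))          # data vectors are the columns of X
labels = np.arange(n) % r            # an arbitrary partition, no empty cluster

# Binary indicator matrix and its normalized version (orthonormal columns).
H = np.zeros((n, r))
H[np.arange(n), labels] = 1.0
sizes = H.sum(axis=0)
Hn = H / np.sqrt(sizes)

# Classical form: squared distances of the points to their cluster centroids.
centroids = (X @ H) / sizes          # d x r, column j is c_j
J_sum = sum(np.sum((X[:, i] - centroids[:, labels[i]]) ** 2) for i in range(n))

# Matrix form: X Hn Hn^T replaces every column of X by its centroid.
J_mat = np.linalg.norm(X - X @ Hn @ Hn.T, "fro") ** 2

print(J_sum, J_mat)
```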

42 Actually, minimizing this (for $H$) is mathematically equivalent to maximizing $\mathrm{tr}(X^T X H H^T)$, which immediately allows the kernel trick of replacing $X^T X$ with a kernel matrix with entries $k(x_i, x_j)$, extending k-means to any data structures (Yang and Oja, IEEE Tr-Neural Networks, 2010).
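The equivalence can be seen by expanding the squared norm. The derivation below assumes the normalized indicator matrix has orthonormal columns, $H^T H = I$ (for example $h_{ij} = 1/\sqrt{n_j}$ for items in cluster $j$ of size $n_j$), so that $H H^T$ is a projection and $(H H^T)^2 = H H^T$.

```latex
\begin{align*}
J &= \| X - X H H^T \|^2 \\
  &= \mathrm{tr}\!\left[ (X - X H H^T)^T (X - X H H^T) \right] \\
  &= \mathrm{tr}(X^T X) - 2\,\mathrm{tr}(X^T X H H^T) + \mathrm{tr}(H H^T X^T X H H^T) \\
  &= \mathrm{tr}(X^T X) - \mathrm{tr}(X^T X H H^T)
\end{align*}
```

The first term does not depend on $H$, so minimizing $J$ amounts to maximizing the trace term, and the data enter only through the inner products in $X^T X$.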

43 A novel clustering method: DCD. Starting again from the binary cluster indicator matrix $H$, we can define another binary matrix, called the cluster incidence matrix, as $M = H H^T$. Its ij-th element is equal to one if the i-th and the j-th data items are in the same cluster, zero otherwise.

44 It is customary to normalize it so that the row sums (and column sums, because it is symmetric) are equal to 1 (Shi and Malik, 2000). Call the normalized matrix also $M$. Assume a suitable similarity measure $S_{ij}$ between every i-th and j-th data item (for example a kernel). Then a nice criterion $J$ is the approximation error between $S$ and $M$.
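A short sketch of the incidence matrix and its normalization. It assumes one particular way of obtaining unit row sums, namely dividing the entries for each cluster by the cluster size; the labels are arbitrary example values.

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 1, 0])   # six data items, three clusters
n, r = labels.size, 3

H = np.zeros((n, r))
H[np.arange(n), labels] = 1.0

# Binary incidence matrix: 1 when items i and j share a cluster.
M_binary = H @ H.T

# Normalized version: entry 1/n_k when i and j are both in cluster k.
M = (H / H.sum(axis=0)) @ H.T

print(M.sum(axis=1))                     # row sums
print(np.linalg.matrix_rank(M))          # rank equals the number of clusters
```

The normalized matrix stays symmetric and nonnegative, and its rank equals the number of clusters, in line with the low-rank structure that the clustering criterion relies on.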

45 This is an example of symmetrical NMF because both the similarity matrix and the incidence matrix are symmetrical, and both are naturally nonnegative. S is full rank, but the rank of M is r. Contrary to the usual NMF, there are two extra constraints: the row sums of M are equal to 1, and M is a (scaled) binary matrix. The solution: probabilistic relaxation to smooth the constraints (Yang, Corander and Oja, JMLR, 2016)

46 Data-cluster-data (DCD) random walk

47 Clustering results for large datasets [table comparing DCD, k-means and Ncut]

48 Clustering results for large datasets [table comparing DCD, k-means and Ncut]

49 CIS lab: the future. Now part of the CS department at Aalto School of Science. Less isolated, many partnerships (other CS groups, HIIT, Helsinki University etc.). Talented researchers, increasingly international. Strong impact on Machine Learning in Finland and in the world, in research and teaching. Example: our M.Sc. Program Macadamia (Machine Learning and Data Mining; Mannila & Oja 2008).

50 Macadamia was the 3rd most popular M.Sc. Program in Aalto School of Science in 2017

51 THANK YOU FOR YOUR ATTENTION!
