Russell Hanson DFCI April 24, 2009

Similar documents
Text mining and natural language analysis. Jefrey Lijffijt

Topic Models and Applications to Short Documents

Collaborative Topic Modeling for Recommending Scientific Articles

Dimension Reduction (PCA, ICA, CCA, FLD,

Collaborative topic models: motivations cont

Recent Advances in Bayesian Inference Techniques

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Topic Modelling and Latent Dirichlet Allocation

Topic Modeling: Beyond Bag-of-Words

Topic Modeling Using Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

Latent Dirichlet Allocation Introduction/Overview

Generative Clustering, Topic Modeling, & Bayesian Inference

* Matrix Factorization and Recommendation Systems

IPSJ SIG Technical Report Vol.2014-MPS-100 No /9/25 1,a) 1 1 SNS / / / / / / Time Series Topic Model Considering Dependence to Multiple Topics S

Distributed ML for DOSNs: giving power back to users

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition

Computational methods for predicting protein-protein interactions

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Bayesian Nonparametrics for Speech and Signal Processing

Motif Extraction and Protein Classification

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Dept. of Physics 2018 Summer Internship Opportunities For undergraduate students

MOLECULAR MODELING IN BIOLOGY (BIO 3356) SYLLABUS

Intelligent Systems I

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

A Continuous-Time Model of Topic Co-occurrence Trends

Bayesian Networks BY: MOHAMAD ALSABBAGH

EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE FILTER

Networks & pathways. Hedi Peterson MTAT Bioinformatics

16 : Approximate Inference: Markov Chain Monte Carlo

Term Filtering with Bounded Error

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1

Machine Learning 2007: Slides 1. Instructor: Tim van Erven Website: erven/teaching/0708/ml/

On the use of Long-Short Term Memory neural networks for time series prediction

Understanding Comments Submitted to FCC on Net Neutrality. Kevin (Junhui) Mao, Jing Xia, Dennis (Woncheol) Jeong December 12, 2014

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations

STRUCTURAL BIOINFORMATICS I. Fall 2015

Content-based Recommendation

Mixtures of Multinomials

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

BSc MATHEMATICAL SCIENCE

Integrating Ontological Prior Knowledge into Relational Learning

Introduction to Bioinformatics

Non-parametric Clustering with Dirichlet Processes

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Topic Models. Brandon Malone. February 20, Latent Dirichlet Allocation Success Stories Wrap-up

Graph Wavelets to Analyze Genomic Data with Biological Networks

Differential Modeling for Cancer Microarray Data

Deep Learning for NLP

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency

11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)

Computational Systems Biology

CS Lecture 18. Topic Models and LDA

人工知能学会インタラクティブ情報アクセスと可視化マイニング研究会 ( 第 3 回 ) SIG-AM Pseudo Labled Latent Dirichlet Allocation 1 2 Satoko Suzuki 1 Ichiro Kobayashi Departmen

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs

Modeling User Rating Profiles For Collaborative Filtering

Bioinformatics Chapter 1. Introduction

Recent Advances in Structured Sparse Models

Approximate inference for stochastic dynamics in large biological networks

Finding Your Literature Match

Applying hlda to Practical Topic Modeling

Latent Dirichlet Allocation Based Multi-Document Summarization

Lecture 13 : Variational Inference: Mean Field Approximation

Computational Biology From The Perspective Of A Physical Scientist

Analysis of N-terminal Acetylation data with Kernel-Based Clustering

Staffing formula and enrollment limits X Other REMOVAL of GWAR attribute

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Predicting Protein Functions and Domain Interactions from Protein Interactions

Collaborative Hotel Recommendation based on Topic and Sentiment of Review Comments

Nonparametric Bayesian Methods (Gaussian Processes)

IBM Q: building the first universal quantum computers for business and science. Federico Mattei Banking and Insurance Technical Leader, IBM Italy

Collaborative Filtering

Comparative Summarization via Latent Dirichlet Allocation

Unsupervised Classification via Convex Absolute Value Inequalities

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

CS612 - Algorithms in Bioinformatics

Welcome to CAMCOS Reports Day Fall 2011

Gene Ontology and overrepresentation analysis

Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs

Latent variable models for discrete data

Research topics and trends over the past decade ( ) of Baltic Coleopterology using text mining methods

Kernel Density Topic Models: Visual Topics Without Visual Words

Simon Trebst. Quantitative Modeling of Complex Systems

Efficient Tree-Based Topic Modeling

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College

Analyzing Burst of Topics in News Stream

Learning Gaussian Process Kernels via Hierarchical Bayes

Machine learning methods for gene/protein function prediction

Text Mining for Economics and Finance Latent Dirichlet Allocation

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

Text Mining: Basic Models and Applications

Clustering and Network

Transcription:

DFCI Boston: Using the Weighted Histogram Analysis Method (WHAM) in cancer biology and the Yeast Protein Databank (YPD); Latent Dirichlet Analysis (LDA) for biological sequences and structures Russell Hanson DFCI April 24, 2009

Outline Part I Uri Alon s cell fate modeling Weighted histogram analysis method Part II Gene function in cell Weighted histogram analysis method Part III Corpora clustering Latent Dirichlet Analysis for topical clustering DFCI 2009 2

Uri Alon s talk (p1) Question: Why do some cells live and some die? Procedure: Add chemotherapy drugs; monitor cell fate DFCI 2009 3

Uri Alon s talk (p2) Superposition idea: P 1,2 (t) = a*p 1 (t)+b*p 2 (t) s.t. a+b=1. N.B. non-linearities; non-linear responses; non-linear systems; 5 coupled differential equations in biochemical processes; simulation methods in biochemical pathways (Plectix); system doesn t saturate to 1 upon additional drug addition DFCI 2009 4

Some ideas from statistical mechanics - WHAM Superposition idea: P 1,2 (t) = a*p 1 (t)+b*p 2 (t) s.t. a+b=1. DFCI 2009 5

Part II Yeast functional classes DFCI 2009 6

YPD data Yeast Protein Data Bank has the following data: DFCI 2009 7

Matrix representation for YPD classifications DFCI 2009 8

Part III Corpora and thematic clustering Shift gears slightly: making recommendations for news websites, etc. will return shortly Want to have more fine-grained recommendations than connectivity in user network -- weight in a given thematic cluster. DFCI 2009 9

Attributized Bayesian Choice Modeling Attributized content items, i, are stored as vectors in the choice-set database such that: Summary of tastes, T: Collaborative Filtering for text and news : Cold Start Problem (it isn t collaborative until it s collaborative) Past Experience: Some people want the most popular ( Dodgers make offer to Manny Ramirez - Boston.com ); some don t ( Non-Abelian Anyons and Topological Quantum Computation ) By weight in whole network; by weight in user s network; by weight in thematic cluster DFCI 2009 10

Methods in corpora clustering (p1) DFCI 2009 11

Methods in corpora clustering (p2) DFCI 2009 12

Latent Dirichlet Allocation/Analysis (p1) DFCI 2009 13

Latent Dirichlet Alloca/on/Analysis (p2) DFCI 2009 14

Latent Dirichlet Allocation/Analysis (p3) DFCI 2009 15

Latent Dirichlet Allocation/Analysis for proteins/genes/cancer targets/clinical data Just as in previous slide, topics were clustered around Arts, Budgets, Children, and Education. - Cluster around patient type - Cluster around drug type - Cluster around drug-interaction type - Cluster around phenotype/genotype - Cluster around structure/sequence types DFCI 2009 16

Selected references: A. A. Cohen, et al. Response to a Drug Dynamic Proteomics of Individual Cancer Cells in DOI: 10.1126/Science.1160165, 1511 (2008); 322 Ferrenberg AM, Swendsen RH. Optimized Monte-Carlo data-analysis Physical Review Letters volume: 63 issue: 12 pages: 1195-1198 Sep 18 1989. Document Clustering in Large German Corpora Using Natural Language Processing, Richard Forster (2006), University of Zurich Latent Dirichlet Allocation, Blei, Ng, and Jordan, Journal of Machine Learning Research 3 (2003) 993-1022 LobeLink.com A. Ben-Hur and W.S. Noble "Kernel methods for predicting protein-protein interactions." Bioinformatics (Proceedings of the Intelligent Systems for Molecular Biology Conference). 21:38-46, 2005. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNBI 4115, pp. 514 524, 2006. Prediction of Protein Complexes Based on Protein Interaction Data and Functional Annotation Data Using Kernel Methods. Springer-Verlag Berlin Heidelberg 2006. B. Schwikowski, P. Uetz, S. Fields. A network of protein-protein in yeast. Nature Biotechnology 18 (12): 1257-1261 Dec. 2000. interactions K. Tsuda, H.J. Shin, B. Schölkopf. Fast protein classification with multiple networks. Bioinformatics 2005 21(Suppl 2):ii59-ii65. J.-P. Vert, K. Tsuda and B. Schölkopf, A primer on kernel methods, in Kernel Methods in Computational Biology, B. Schölkopf, K. Tsuda and J.-P. Vert (Eds.), MIT Press, p.35-70, 2004. DFCI 2009 17