Quantitative Structure-Activity Relationships (QSARs) Tutorial

Similar documents
Computer simulation of radioactive decay

Tutorial 12 Excess Pore Pressure (B-bar method) Undrained loading (B-bar method) Initial pore pressure Excess pore pressure

An area chart emphasizes the trend of each value over time. An area chart also shows the relationship of parts to a whole.

Moving into the information age: From records to Google Earth

Electric Fields and Equipotentials

Tutorial 23 Back Analysis of Material Properties

v WMS Tutorials GIS Module Importing, displaying, and converting shapefiles Required Components Time minutes

Comparing whole genomes

Using Microsoft Excel

Data Structures & Database Queries in GIS

1 Introduction to Minitab

How many states. Record high temperature

WISE Regression/Correlation Interactive Lab. Introduction to the WISE Correlation/Regression Applet

OECD QSAR Toolbox v.3.0

Exercise 6: Coordinate Systems

Didacticiel - Études de cas

Investigating Factors that Influence Climate

TALLINN UNIVERSITY OF TECHNOLOGY, INSTITUTE OF PHYSICS 6. THE TEMPERATURE DEPENDANCE OF RESISTANCE

Lab 1 Uniform Motion - Graphing and Analyzing Motion

TECDIS and TELchart ECS Weather Overlay Guide

Newton s Second Law of Motion

Tutorial 8 Raster Data Analysis

M&M Exponentials Exponential Function

OECD QSAR Toolbox v.3.3. Step-by-step example of how to categorize an inventory by mechanistic behaviour of the chemicals which it consists

Foundations of Computation

OECD QSAR Toolbox v.4.1

OECD QSAR Toolbox v.3.3

Lab 15 Taylor Polynomials

OECD QSAR Toolbox v.3.4

A Scientific Model for Free Fall.

Retrieve and Open the Data

Presenting Tree Inventory. Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

1. Double-click the ArcMap icon on your computer s desktop. 2. When the ArcMap start-up dialog box appears, click An existing map and click OK.

Experiment: Oscillations of a Mass on a Spring

NMR Predictor. Introduction

Tutorial. Getting started. Sample to Insight. March 31, 2016

v Prerequisite Tutorials GSSHA WMS Basics Watershed Delineation using DEMs and 2D Grid Generation Time minutes

Introduction to Hartree-Fock calculations in Spartan

Contents. 9. Fractional and Quadratic Equations 2 Example Example Example

OECD QSAR Toolbox v.4.1

Ligand Scout Tutorials

Dock Ligands from a 2D Molecule Sketch

caused displacement of ocean water resulting in a massive tsunami. II. Purpose

How to Make or Plot a Graph or Chart in Excel

Calculating Conflict Density and Change over Time in Uganda using Vector Techniques

OECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Error Analysis, Statistics and Graphing Workshop

Physics. Simple Harmonic Motion ID: 9461

Geography 3251: Mountain Geography Assignment II: Island Biogeography Theory Assigned: May 22, 2012 Due: May 29, 9 AM

Introduction to Computer Tools and Uncertainties

EOS 102: Dynamic Oceans Exercise 1: Navigating Planet Earth

TRANSIENT MODELING. Sewering

Curriculum Support Maps for the Study of Indiana Coal (Student Handout)

Remember that C is a constant and ë and n are variables. This equation now fits the template of a straight line:

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Better Exponential Curve Fitting Using Excel

Newton s Second Law. Computer with Capstone software, motion detector, PVC pipe, low friction cart, track, meter stick.

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options for grouping with metabolism

Interpolation Techniques

PHY221 Lab 2 - Experiencing Acceleration: Motion with constant acceleration; Logger Pro fits to displacement-time graphs

Project: Equilibrium Simulation

MERGING (MERGE / MOSAIC) GEOSPATIAL DATA

SRM assay generation and data analysis in Skyline

Part 1: Play Purpose: Play! See how everything works before we try to find any relationships.

Harvard Life Science Outreach December 7, 2017 Measuring ecosystem carbon fluxes using eddy covariance data ACTIVITIES I. NAME THAT ECOSYSTEM!

Lesson 4 Linear Functions and Applications

Petrel TIPS&TRICKS from SCM

General Physics Laboratory Experiment Report 1st Semester, Year 2018

Introduction to Spectroscopy: Analysis of Copper Ore

Experiment 0 ~ Introduction to Statistics and Excel Tutorial. Introduction to Statistics, Error and Measurement

MEASUREMENT OF THE CHARGE TO MASS RATIO (e/m e ) OF AN ELECTRON

Figure 2.1 The Inclined Plane

Physics Spring 2006 Experiment 4. Centripetal Force. For a mass M in uniform circular motion with tangential speed v at radius R, the required

Introduction to ArcGIS 10.2

Using SkyTools to log Texas 45 list objects

CC Algebra 2H Transforming the parent function

Experiment 1: The Same or Not The Same?

Quick Start Guide New Mountain Visit our Website to Register Your Copy (weatherview32.com)

41. Sim Reactions Example

Kinematics Lab. 1 Introduction. 2 Equipment. 3 Procedures

Your work from these three exercises will be due Thursday, March 2 at class time.

LAB 2 - ONE DIMENSIONAL MOTION

M E R C E R W I N WA L K T H R O U G H

Hot Spot / Kernel Density Analysis: Calculating the Change in Uganda Conflict Zones

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department. Experiment 03: Work and Energy

Working with ArcGIS: Classification

17-Nov-2015 PHYS MAXWELL WHEEL. To test the conservation of energy in a system with gravitational, translational and rotational energies.

Space Objects. Section. When you finish this section, you should understand the following:

Physical Chemistry Final Take Home Fall 2003

85. Geo Processing Mineral Liberation Data

Exercise 4 Estimating the effects of sea level rise on coastlines by reclassification

Students will explore Stellarium, an open-source planetarium and astronomical visualization software.

7. STREAMBED TEXTURE ANALYSIS

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity

M61 1 M61.1 PC COMPUTER ASSISTED DETERMINATION OF ANGULAR ACCELERATION USING TORQUE AND MOMENT OF INERTIA

1. Write the symbolic representation and one possible unit for angular velocity, angular acceleration, torque and rotational inertia.

How to Create Stream Networks using DEM and TauDEM

Overlay Analysis II: Using Zonal and Extract Tools to Transfer Raster Values in ArcMap

Gravity: How fast do objects fall? Teacher Advanced Version (Grade Level: 8 12)

Transcription:

Quantitative Structure-Activity Relationships (QSARs) Tutorial Single and Multi Linear Regression This tutorial will show you how to perform a QSAR with MLR and PLS statistical tools

In the Gnumerics file enclosed (Tutorial Data) you will find 29 sugars with different sucrose relative power and 8 different descriptors as reported in the publication. The objective of this tutorial is to make you use of simple informatics tools to establish QSAR models. QSAR Pagina 2 The tutorial rely on the data available from Singh s paper. You can fine the publication in elearning website

Gnumeric QSAR Pagina 3 The gnumeric program will be used. The program can be searched in the net as it is freely available. Anyway it is also available from elarning. Download it and install by double click on the file.

Gnumeric QSAR Pagina 4 Answer to the question during the installation as shown in the slides

Gnumeric QSAR Pagina 5

Gnumeric QSAR Pagina 6

Gnumeric QSAR Pagina 7 Look for the folder where gnumeric has been installed and open up it

Open the file QSAR Pagina 8 By mean of the «file» menu

Open the file QSAR Pagina 9 From elearning download the file Rajesh.Student.gnumeric and open it with gnumeric

Initial Table QSAR Pagina 10 The opened file should appear as in the slide. There is a dependent variable (RS = relative sweetness) and 8 descriptors

Descriptors DH in the table W in the table EA and IP in the table SE in the table QSAR Pagina 11 Here are described the 8 used descriptors

Check dependent variable (RS) QSAR Pagina 12 The first thing to do is to check the linear distribution of the dependent variable RS. So copy the column of data belonging to RS it in the memory.

Check dependent variable (RS) QSAR Pagina 13. and paste it the into a new sheet

Check dependent variable (RS) QSAR Pagina 14

Check dependent variable (RS) QSAR Pagina 15 To check for linearity, just insert a column type chart

Check dependent variable (RS) QSAR Pagina 16 As you can see, the preview is showing the plot

Check dependent variable (RS) QSAR Pagina 17 Then click in the area of the sheet grid and the plot will appear. Clearly there is no linear distribution of the dependent variable

Check dependent variable (RS) QSAR Pagina 18 Let s transform it in the logarithm scale as shown in the slide

Check dependent variable (RS) QSAR Pagina 19

Check dependent variable (RS) QSAR Pagina 20 Then apply the formula for the cell «I2» to the other cells just grabbing the small square in the cell right bottom corner

Check dependent variable (RS) QSAR Pagina 21 And all the data will be transformed

Check dependent variable (RS) QSAR Pagina 22 Make the plot as shown before and now the linearity is present.

QSAR Pagina 23 Duplicate the sheet by right clicking on the sheet name.

Check dependent variable (RS) QSAR Pagina 24 Rename the sheet into «New Table»

Check dependent variable (RS) QSAR Pagina 25 Highlight the transformed RS variable and copy into the memory with the right click

Check dependent variable (RS) QSAR Pagina 26 Then paste it to the «New Table» sheet. Use the «paste special» option

Check dependent variable (RS) QSAR Pagina 27 and select «as value»

Check dependent variable (RS) QSAR Pagina 28 Now you have the dependent variable fixed

Independent variables analysis Notice that the 8 molecular descriptors have very different magnitudes. To guarantee the comparability of the MLR coefficients (see previous lesson on QSAR), proceed to the normalization of all the descriptors (for instance for the interval [0; 1] ). The first step is to calculate the MAX and MIN values of each descriptor with the functions =MAX(range) and =MIN(range) as described in the following slides QSAR Pagina 29 Once we have checked the dependent variable (RS) we need to analyse the independent variables (descriptors).

Independent variables analysis QSAR Pagina 30 First we calculate the maximum and the minimum values for each colum

Independent variables analysis QSAR Pagina 31 The method is similar to that already shown. Just do the same as displayed in the slide.

Independent variables analysis QSAR Pagina 32

Independent variables analysis QSAR Pagina 33

Independent variables analysis QSAR Pagina 34 And all the min and max values are promptly calculated

Independent variables analysis QSAR Pagina 35 In a new sheet.

Independent variables analysis QSAR Pagina 36 renamed «Normalized»

Independent variables analysis QSAR Pagina 37 Copy the ID and labels from «New Table» and also the descriptors labels

Independent variables analysis QSAR Pagina 38 paste everything in the «Normalized» sheet

Independent variables analysis QSAR Pagina 39 Make sure you have the same values for RS

Independent variables analysis QSAR Pagina 40 Now apply the normalization as described in the lesson. (see also the formula in the slide). Here the MinMax normalization is being applied: Normalized valued = (original value min) / (max min)

Independent variables analysis QSAR Pagina 41 And let s obtain the normalized matrix, that can be checked by calculating min and max. Their values should range all between 0 and 1.

Independent variables analysis QSAR Pagina 42 At this point we can continue analysing the data.

Independent variables analysis QSAR Pagina 43 Prepare a new sheet «Intercorrelation» as in the slide

Independent variables analysis QSAR Pagina 44 And by the proper function calculate the squared pearson coefficients (r2)

Independent variables analysis QSAR Pagina 45 First RS vs RS. It should be = 1!

Independent variables analysis QSAR Pagina 46 Lock at the formula and define fixed variable (insert $ signs as in the slide)

Independent variables analysis QSAR Pagina 47 Then just by drugging the r2 is calculated for all the values in the first row. Make a similar operation in the others rows

Independent variables analysis QSAR Pagina 48 And the correlation matrix is obtained

Independent variables analysis QSAR Pagina 49 In the first row are reported the r2 values for each variable with the dependent variable, these are monoparametric regressions. The best one is IP that as the highest r^2 value.

Independent variables analysis QSAR Pagina 50 In the yellow area are highlighted the intercorrelations among the descriptors to check for any collinearity. As a limit an r2 greater than 0.5 means that two variable are correlated and should not used together.

Choice of the best descriptors (stepwise method) In order to get an overview of the uniparametric relationship between the various descriptors we use a linear regression analysis function built into Gnumeric (Statistics > Dependent Observations > Regression) QSAR Pagina 51 In the next slides we are going to make a stepwise building of the QSAR model

Choice of the best descriptors (stepwise method) QSAR Pagina 52

Choice of the best descriptors (stepwise method) QSAR Pagina 53 We start with the uniparametric regressions using the «Statistic Regression» menu. Then by clicking in the X and Y variable boxes we define the areas containing the values. While doing this operation select also the label of the variables. Then click on the OK button.

Choice of the best descriptors (stepwise method) QSAR Pagina 54 and a new sheet «Regression (1)» will be created with all the statistical values

Choice of the best descriptors (stepwise method) logrs = 0.90 + 2.40 * EA QSAR Pagina 55 By analysing the new sheet we can observe the intercept and constant values of the line and thus we can write its equation

Choice of the best descriptors (stepwise method) QSAR Pagina 56 Reapeat for all the descriptors

Choice of the best descriptors (stepwise method) QSAR Pagina 57

Choice of the best descriptors (stepwise method) QSAR Pagina 58

Choice of the best descriptors (stepwise method) QSAR Pagina 59

Choice of the best descriptors (stepwise method) QSAR Pagina 60

Choice of the best descriptors (stepwise method) QSAR Pagina 61

Choice of the best descriptors (stepwise method) QSAR Pagina 62

Choice of the best descriptors (stepwise method) QSAR Pagina 63 And record the r^2 values in a new sheet «Regressions»

Choice of the best descriptors (stepwise method) QSAR Pagina 64

Choice of the best descriptors (stepwise method) QSAR Pagina 65 Compare the Regressions with the initial correlations and the they should be the same values.

MLR with all variables Overfitting! QSAR Pagina 66 Just to try, instead of highlighting one parameter, select all the 8 descriptors while doung the regressions, in the new sheet «Regression (9)» the values are clearlyindicating that 8 parameters give a perfect regression of r2 = 1! This is overfitting! We are using collinear parameters and the model is LYING!

No differences with unscaled data! QSAR Pagina 67 IF we repeated with unscaled data (without normalizing) we can observe that the results are the same, the only differences are the intercept and the variable coefficient values. Although it seems that scaling is not necessary, apply always a normalization in you matrix!

Step Forward Multi Linear Regressions QSAR Pagina 68 Going back to the intercorrelation data it is possible to see that the first parameter to use is «IP» and that we can associate with it only «W», «MR», «SASA», «TE» and «DH» values. The ones that display low correlation with «IP». At first glance we could expect a godd correlation by using atriparametric model including «IP»,

Step Forward Multi Linear Regressions QSAR Pagina 69 Let s prepare the biparametric combinations

Step Forward Multi Linear Regressions QSAR Pagina 70 And by using the «Statistic Regression» menu the five biparametric are easily build.

Step Forward Multi Linear Regressions QSAR Pagina 71

Step Forward Multi Linear Regressions QSAR Pagina 72

Step Forward Multi Linear Regressions QSAR Pagina 73

Step Forward Multi Linear Regressions QSAR Pagina 74

Biparametric models QSAR Pagina 75 The best combination resulted to be IP + W, but also IP + MR should be analyzed with other descriptor

Step Forward Multi Linear Regressions QSAR Pagina 76 So prepare the triparametric combination for IP + W

Step Forward Multi Linear Regressions QSAR Pagina 77 And the other threes for IP + MR

Step Forward Multi Linear Regressions QSAR Pagina 78 And build the seven triparametric models

Step Forward Multi Linear Regressions QSAR Pagina 79

Step Forward Multi Linear Regressions QSAR Pagina 80

Step Forward Multi Linear Regressions QSAR Pagina 81

Step Forward Multi Linear Regressions QSAR Pagina 82

Step Forward Multi Linear Regressions QSAR Pagina 83

Step Forward Multi Linear Regressions QSAR Pagina 84

Triparametric models QSAR Pagina 85 From the results the best triparametric model is that with «IP» + «MR» + «SASA»

Triparametric models QSAR Pagina 86 The fitting plot can be made by the «Insert Chart» menu

Compare result with those published Are the same! QSAR Pagina 87 Comparing the results with those reported in the publication the same results were obtained using data not normalized. The same results are obtained either with normalized or autoscaled data!