High-Throughput in Chemical Crystallography from an Industrial Point of View
Ina Dix, Novartis Institutes for Biomedical Research, Basel
Analytics at Novartis (Basel)
technique | staff | # spectra | interpreted
NMR | 8 | 10,000 | 1,500 structure determinations
IR | 3 | 10,000 | all completely interpreted
MS | 5 | 18,000 | 11,000 completely interpreted
Workflow: purified sample (HPLC) → MS (M-H+ = 444): molecular mass; IR: functional groups; NMR: structure
Analytics at Novartis (Basel), X-ray added (NMR, IR and MS figures as on the previous slide)
X-ray | staff | # samples | XP | XX | structures
2010* | 3 | 291 | 199 | 92 | 208
* first 8 months
Number of samples at Novartis (Basel)
[Figure: number of samples per year, 2003 to 2010, series "expected" and "accepted"; y-axis 0 to 450]
Number of samples at Novartis (Basel)
Incoming: crystalline 199 (68.3 %); not crystalline 92 (31.6 %); unlabelled 1 (0.3 %)
Processed: crystalline 119 (41.0 %); grown crystals 89 (30.6 %); not crystalline 79 (27.1 %); unlabelled 3 (1.0 %)
Crystal morphologies at Novartis (Basel): needles 26 %, plates 63 %, others 11 %
Crystal sizes at Novartis (Basel)
[Figure: histogram of crystal sizes in bins from 0.05 to 0.95 plus "other", shown next to the morphology split of 26 % / 63 % / 11 %]
Instruments used for measurements
[Figure: bar chart of measurements per instrument (0 to 120): IμS, RA, SLS (10 µm)]
Categories of compounds
[Figure: bar chart (0 to 160) of chiral and non-chiral compounds, each split into with and without heavy atom (HA)]
Number of independent molecules
[Figure: bar chart of structures (0 to 140) by number of independent molecules, 1 to 8]
Some more statistics
Structures: finished 79.8 %, ongoing 17.8 %, not solved 2.4 %
Refinement: normal 65.4 %, disorder 19.7 %, twinned 11.6 %, modulated 3.3 %
71 % of measured crystals are solvates!
Statistics 2010: quality criteria for 166 structures (8 months)
[Figure: distribution of wR2 and R1 (all), y-axis 0 to 0.35; values given: wR2 10.72, R1 (all) 5.07]
Reasons for X-ray structures
- stereochemistry
- proof of constitution
- bond lengths and angles
- 3-dimensional models
- absolute structure determination
- unknown compounds?
Customised approach
Proof of constitution (routine case):
1. Structure solution as fast as possible: compound o.k.? symmetry o.k.? structure o.k.?
2. Completion of the data set based on requirements, e.g. for publication or absolute structure determination
Time problems with Cu radiation
Partial data set (proof of constitution) versus full data set (Acta Cryst. C standards).
To obtain the structure as quickly as possible, move the detector in to catch as many reflections as possible with only one detector position. From Bragg's law, $n\lambda = 2d\sin\theta$, Mo radiation reaches a given resolution within a small 2θ range, whereas Cu radiation needs a large 2θ range (21 detector settings).
Limiting factor: overlap of reflections for long cell edges; at a detector distance of 4 cm, cell edges up to 50 Å.
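Both points on this slide can be checked with two small formulas: Bragg's law gives the 2θ needed for a target resolution, and a small-angle estimate gives the spacing of neighbouring spots along a long axis. This is a minimal Python sketch; the function names are hypothetical, the wavelengths are the Mo-Kα and Cu-Kα values quoted later in the talk, and the spacing formula (about D·λ/a near the direct beam) is an assumed approximation, not a detector model.

```python
import math

# Kα wavelengths in Å, as quoted in this talk.
WAVELENGTHS = {"Mo": 0.71073, "Cu": 1.54178}


def two_theta_for_resolution(d_min: float, wavelength: float) -> float:
    """2θ (degrees) needed to reach resolution d_min (Å), from nλ = 2d sin θ with n = 1."""
    s = wavelength / (2.0 * d_min)
    if s > 1.0:
        raise ValueError("d_min is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


def spot_spacing_mm(cell_edge: float, wavelength: float, distance_mm: float) -> float:
    """Approximate distance (mm) between neighbouring spots along a long cell axis.

    Small-angle assumption: reciprocal-lattice points 1/a apart differ in
    scattering angle by about λ/a radians close to the direct beam.
    """
    return distance_mm * wavelength / cell_edge


if __name__ == "__main__":
    for name, lam in WAVELENGTHS.items():
        print(f"{name}: 2θ for 0.84 Å = {two_theta_for_resolution(0.84, lam):.1f}°")
    # 50 Å cell edge, detector at 4 cm, Cu radiation
    print(f"Cu, a = 50 Å, 40 mm: {spot_spacing_mm(50.0, WAVELENGTHS['Cu'], 40.0):.2f} mm")
```

For d = 0.84 Å this gives roughly 50° of 2θ with Mo but about 133° with Cu, which is why a single detector position cannot cover a full Cu data set.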
Experimental parameter: increments of rotation
Each scan covers 180° in ω overall; outer sphere: 3 × 180° with doubled exposure time.
Δω (°) | # frames | data* | RA+OH | total
0.3 | 600 | 1:40 h | 1:40 h | 3:20 h
1.2 | 150 | 0:25 h | 0:25 h | 0:50 h
2.0 | 90 | 0:15 h | 0:13 h | 0:28 h
* Exposure time: 5 s
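The frame counts in this table follow from dividing the 180° ω sweep by the increment. A minimal Python sketch (function names are hypothetical); it deliberately covers only frame counts, since the readout and overhead times depend on the instrument.

```python
# ω increments (degrees) compared in the table above.
INCREMENTS_DEG = (0.3, 1.2, 2.0)


def frames_per_scan(increment_deg: float, scan_range_deg: float = 180.0) -> int:
    """Number of frames for one ω scan; round() absorbs float noise such as 180 / 0.3."""
    return round(scan_range_deg / increment_deg)


def outer_sphere_frames(increment_deg: float) -> int:
    """Outer sphere as described on the slide: three 180° scans."""
    return 3 * frames_per_scan(increment_deg)


if __name__ == "__main__":
    for inc in INCREMENTS_DEG:
        print(f"Δω = {inc}°: {frames_per_scan(inc)} frames per scan, "
              f"{outer_sphere_frames(inc)} frames for the outer sphere")
```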
Limiting factors: axis length, exposure time, software
[Figure: 5° image of a 47 Å axis; 0.3° and 2.1° slices compared]
Indexing and integration: center of gravity?
Good compromise: Δω = 1.2°, 150 frames, data 0:25 h, readout 0:25 h, total 0:50 h
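To illustrate the center-of-gravity question, the sketch below cuts one reflection's rocking curve into frames of different width and compares the intensity-weighted centroid of the slice centres with the true peak position. It is a one-dimensional Python toy: the Gaussian profile, its 1.0° FWHM and the 0.7° peak position are hypothetical values, and real integration programs fit three-dimensional profiles rather than using slice centres.

```python
import math


def slice_fractions(mu, sigma, width, lo=-10.0, hi=10.0):
    """Split a Gaussian rocking curve (centre mu, s.d. sigma, in degrees) into ω slices.

    Returns a list of (slice_centre, fraction_of_intensity) pairs.
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    pairs = []
    for k in range(math.floor(lo / width), math.ceil(hi / width)):
        a, b = k * width, (k + 1) * width
        pairs.append(((a + b) / 2.0, cdf(b) - cdf(a)))
    return pairs


def centroid_error(mu, fwhm, width):
    """Absolute error of the slice-centre center of gravity for a given slice width."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    pairs = slice_fractions(mu, sigma, width)
    total = sum(w for _, w in pairs)
    centre = sum(c * w for c, w in pairs) / total
    return abs(centre - mu)


if __name__ == "__main__":
    for width in (0.3, 1.2, 2.1):
        print(f"{width}° slices: centroid error {centroid_error(0.7, 1.0, width):.3f}°")
```

With these assumed numbers the error is negligible for 0.3° slices, small for 1.2° slices and a sizeable fraction of a degree for 2.1° slices.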
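Scaling of Cu data (SADABS)
Typical YLID acceptance measurement at d = 5 (!) cm; after 10 h, redundancy 5.4 (3.3):
run | first frame | 2θ | ω | φ | χ | axis | width | frames | time (s)
1 | 001 | -100.00 | -100.00 | 60.00 | 54.74 | 2 | -0.300 | 600 | 5.00
2 | 001 | -100.00 | -100.00 | 180.00 | 54.74 | 2 | -0.300 | 600 | 5.00
3 | 001 | -100.00 | -100.00 | 300.00 | 54.74 | 2 | -0.300 | 600 | 5.00
4 | 001 | -40.00 | -40.00 | 0.00 | 54.74 | 2 | -0.300 | 600 | 5.00
5 | 001 | -40.00 | -40.00 | 120.00 | 54.74 | 2 | -0.300 | 600 | 5.00
6 | 001 | -40.00 | -40.00 | 240.00 | 54.74 | 2 | -0.300 | 600 | 5.00
Strategy with 1.2° frames:
run | first frame | 2θ | ω | φ | χ | axis | width | frames | time (s)
01 | 001 | -46.00 | -46.00 | 60.00 | 54.74 | 2 | -1.200 | 150 | 5.00
02 | 001 | -94.00 | -94.00 | 0.00 | 54.74 | 2 | -1.200 | 150 | 10.00
03 | 001 | -94.00 | -94.00 | 120.00 | 54.74 | 2 | -1.200 | 150 | 10.00
04 | 001 | -94.00 | -94.00 | 240.00 | 54.74 | 2 | -1.200 | 150 | 10.00
05 | 001 | -46.00 | -46.00 | 180.00 | 54.74 | 2 | -1.200 | 150 | 5.00
06 | 001 | -94.00 | -94.00 | 60.00 | 54.74 | 2 | -1.200 | 150 | 10.00
07 | 001 | -94.00 | -94.00 | 180.00 | 54.74 | 2 | -1.200 | 150 | 10.00
08 | 001 | -94.00 | -94.00 | 300.00 | 54.74 | 2 | -1.200 | 150 | 10.00
09 | 001 | -46.00 | -46.00 | 300.00 | 54.74 | 2 | -1.200 | 150 | 5.00
10 | 001 | -94.00 | -94.00 | 30.00 | 54.74 | 2 | -1.200 | 150 | 10.00
11 | 001 | -94.00 | -94.00 | 150.00 | 54.74 | 2 | -1.200 | 150 | 10.00
12 | 001 | -94.00 | -94.00 | 270.00 | 54.74 | 2 | -1.200 | 150 | 10.00
time (h) | Red. | d/p
2:20 | 4.6 (2.8) | 5.9 (97%)
4:40 | 9.2 (5.5) | 6.8 (99%)
7:00 | 13.7 (8.3) | 7.1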
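The 1.2° strategy can be summarised by counting frames and pure exposure time per run. A minimal Python sketch, assuming the run-list columns are read in the usual Bruker order (2θ, ω, φ, χ, scan axis, frame width, number of frames, exposure time in seconds); the Run type and function name are hypothetical, and readout and overhead are not included, so wall-clock times are longer.

```python
from typing import Iterable, NamedTuple, Tuple


class Run(NamedTuple):
    """One run of the strategy; field order assumed to follow the Bruker run list."""
    two_theta: float  # detector 2θ (°)
    omega: float      # start ω (°)
    phi: float        # φ (°)
    chi: float        # χ (°), fixed at 54.74 here
    width: float      # frame width (°); the sign gives the scan direction
    frames: int
    exposure: float   # seconds per frame


def strategy_summary(runs: Iterable[Run]) -> Tuple[int, float, float]:
    """Return (total frames, pure exposure time in s, total ω sweep in °)."""
    runs = list(runs)
    n_frames = sum(r.frames for r in runs)
    exposure_s = sum(r.frames * r.exposure for r in runs)
    sweep_deg = sum(abs(r.width) * r.frames for r in runs)
    return n_frames, exposure_s, sweep_deg


# 1.2° strategy: 3 low-angle runs (2θ = -46°, 5 s) and 9 high-angle runs (2θ = -94°, 10 s).
LOW = [Run(-46.0, -46.0, phi, 54.74, -1.2, 150, 5.0) for phi in (60.0, 180.0, 300.0)]
HIGH = [Run(-94.0, -94.0, phi, 54.74, -1.2, 150, 10.0)
        for phi in (0.0, 120.0, 240.0, 60.0, 180.0, 300.0, 30.0, 150.0, 270.0)]
STRATEGY = LOW + HIGH

if __name__ == "__main__":
    frames, exposure, sweep = strategy_summary(STRATEGY)
    print(f"{frames} frames, {exposure / 3600:.2f} h exposure, {sweep:.0f}° of ω")
```

For the twelve runs this gives 1800 frames and 15750 s (about 4.4 h) of pure exposure, before readout and overhead.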
Completion of data set
Targets: completeness > 95 %, resolution 0.84 Å, redundancy > 3, I/σ(I) > 3; scan ratio 1:3
Runs 01 to 12 as in the 1.2° strategy on the previous slide, plus:
run | first frame | 2θ | ω | φ | χ | axis | width | frames | time (s)
13 | 001 | -46.00 | -46.00 | 0.00 | 54.74 | 2 | -1.200 | 150 | 5.00
14 | 001 | -94.00 | -94.00 | 90.00 | 54.74 | 2 | -1.200 | 150 | 10.00
15 | 001 | -94.00 | -94.00 | 210.00 | 54.74 | 2 | -1.200 | 150 | 10.00
16 | 001 | -94.00 | -94.00 | 330.00 | 54.74 | 2 | -1.200 | 150 | 10.00
[Figure: results for space groups P1, P-1, P21/c and P21; legend "all" / "w/o"]
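The completion targets on this slide translate into a simple acceptance check. The sketch below is hypothetical: the DataSetStats fields are invented for illustration, and the slide does not say whether I/σ(I) refers to the whole data set or to the outer shell.

```python
from dataclasses import dataclass


@dataclass
class DataSetStats:
    """Hypothetical summary of a processed data set."""
    completeness: float  # fraction, 0 to 1
    d_min: float         # resolution limit in Å (smaller is better)
    redundancy: float
    i_over_sigma: float


def completion_check(stats: DataSetStats) -> dict:
    """Compare a data set with the completion targets listed on the slide."""
    return {
        "completeness > 95 %": stats.completeness > 0.95,
        "resolution 0.84 Å": stats.d_min <= 0.84,
        "redundancy > 3": stats.redundancy > 3.0,
        "I/sigma(I) > 3": stats.i_over_sigma > 3.0,
    }


if __name__ == "__main__":
    # Hypothetical numbers: a quick two-scan data set versus a completed one.
    quick = DataSetStats(completeness=0.52, d_min=0.84, redundancy=1.8, i_over_sigma=15.0)
    full = DataSetStats(completeness=0.97, d_min=0.84, redundancy=4.6, i_over_sigma=12.0)
    for name, s in (("quick", quick), ("full", full)):
        result = completion_check(s)
        print(name, all(result.values()), result)
```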
What does "based on requirements" mean?
Example (substituents R1, R2): residual density peaks Q1: 0.25, Q2: 0.20, Q3: 0.19 versus Q1: 0.48, Q2: 0.20; R1 = 0.0316, wR2 = 0.0788, Flack x = 0.03(1)!!
Further examples: a 5 % impurity not assignable with NMR; > 8 Å; inversion of the six-membered ring causes line broadening (ratio 60:40).
example 1: good scatterer, low symmetry (P-1)
After 2 scans of 180° (50 min): 52 % complete data to 0.84 Å.
Data processing online; preparation of input files: 1 min.
Solution after 55 min altogether!
example 2: twin, monoclinic
After 150 frames (25 min): 77 % complete data set to 1.10 Å.
Structure solution after 30 min; after 2 seconds, 54 % of the atoms were located (the remaining 46 % during refinement).
example 2: twin, monoclinic
 | single crystal | twin (HKLF4) | twin (HKLF5) [ratio 83/17]
R1 all (R1 [I>2σ]) | 3.07 (3.12) | 2.64 (2.67) | 2.82 (2.85)
wR2 all | 7.93 | 6.59 | 7.20
R1* | 3.14 | 2.56 | 2.59
GooF | 1.113 | 1.168 | 1.164
Weight | 0.0110/4.2039 | 0.000/1.9944 | 0.0199/1.5010
Flack x | 0.0691(0.0617) | 0.0303(0.0490) | 0.0337(0.0510)
* reflections after merging for Fourier
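The figures of merit in this table follow the usual SHELXL definitions, and the "Weight" row gives the two weighting-scheme parameters a/b. A minimal Python sketch of these quantities; it is not SHELXL code, the function names are hypothetical, negative Fo² are clipped to zero for R1 (an assumption), and the number of refined parameters for GooF must be supplied by the caller.

```python
import math
from typing import Sequence


def shelxl_weight(fo2: float, fc2: float, sigma_fo2: float, a: float, b: float) -> float:
    """w = 1 / [σ²(Fo²) + (aP)² + bP] with P = [max(Fo², 0) + 2 Fc²] / 3."""
    p = (max(fo2, 0.0) + 2.0 * fc2) / 3.0
    return 1.0 / (sigma_fo2 ** 2 + (a * p) ** 2 + b * p)


def r1(fo2: Sequence[float], fc2: Sequence[float]) -> float:
    """R1 = Σ||Fo| - |Fc|| / Σ|Fo|, computed on amplitudes."""
    fo = [math.sqrt(max(x, 0.0)) for x in fo2]
    fc = [math.sqrt(x) for x in fc2]
    return sum(abs(o - c) for o, c in zip(fo, fc)) / sum(fo)


def _weighted_sq_residual(fo2, fc2, w) -> float:
    return sum(wi * (o - c) ** 2 for o, c, wi in zip(fo2, fc2, w))


def wr2(fo2, fc2, w) -> float:
    """wR2 = sqrt(Σ w (Fo² - Fc²)² / Σ w (Fo²)²)."""
    den = sum(wi * o ** 2 for o, wi in zip(fo2, w))
    return math.sqrt(_weighted_sq_residual(fo2, fc2, w) / den)


def goof(fo2, fc2, w, n_params: int) -> float:
    """S = sqrt(Σ w (Fo² - Fc²)² / (n - p)) for n reflections and p parameters."""
    return math.sqrt(_weighted_sq_residual(fo2, fc2, w) / (len(fo2) - n_params))


if __name__ == "__main__":
    # Hypothetical reflections; weights use a/b from the single-crystal column.
    fo2, fc2, sig = [100.0, 400.0, 250.0], [81.0, 441.0, 256.0], [10.0, 20.0, 15.0]
    w = [shelxl_weight(o, c, s, 0.0110, 4.2039) for o, c, s in zip(fo2, fc2, sig)]
    print(r1(fo2, fc2), wr2(fo2, fc2, w), goof(fo2, fc2, w, n_params=1))
```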
example 3: multiple twin, monoclinic
example 3: multiple twin, monoclinic
After 150 frames (50 min): 73 % complete data to 1.10 Å.
Structure solution after 60 s: 41 atoms (32 o.k.; refinement cycle 1: +14; cycle 2: +2).
R1 3.22 (3.50), wR2 8.26, Flack x 0.023(10)
example 4: weak scatterer, monoclinic
After 75 frames (90 min): 44 % complete data to 1.09 Å.
Structure solution after 75 s; after 2 min: 43 atoms (the remaining 15 atoms were found in one cycle).
example 4: weak scatterer, monoclinic
 | 6 scans | 8 scans | 12 scans | 15 scans | best 10 scans
Compl. (%) | 97.7 | 98.6 | 98.9 | 99.0 | 98.7
Redund. | 4.15 (1.38) | 5.66 (1.63) | 8.49 (2.37) | 10.57 (2.98) | 6.76 (1.61)
R(int) | 11.31 | 12.01 | 12.77 | 13.28 | 11.19
R(σ) | 9.53 | 7.85 | 6.63 | 6.15 | 7.20
R1 (2σ) | 6.41 | 6.10 | 5.99 | 6.00 | 6.00
R1 all | 13.13 | 11.63 | 10.79 | 10.51 | 11.25
wR2 (2σ) | 13.12 | 13.03 | 13.12 | 13.24 | 13.28
wR2 all | 15.81 | 15.45 | 15.37 | 15.46 | 15.64
GooF | 1.006 | 1.028 | 1.032 | 1.034 | 1.030
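R(int), R(σ) and redundancy in this table come from merging symmetry-equivalent measurements. A minimal Python sketch using the common definitions R(int) = Σ|Fo² - ⟨Fo²⟩| / ΣFo² (over reflections measured more than once) and R(σ) = Σσ(Fo²) / ΣFo² (over merged data); the unweighted mean, the input format and the function name are simplifying assumptions, not how SADABS or XPREP merge.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def merging_statistics(observations: Iterable[Tuple[Hashable, float, float]]):
    """observations: (unique_hkl, Fo², σ(Fo²)) for every measurement.

    Returns (R(int), R(σ), redundancy) using an unweighted mean per unique reflection.
    """
    groups = defaultdict(list)
    for hkl, i, s in observations:
        groups[hkl].append((i, s))

    dev_sum = int_sum_multi = 0.0   # R(int): reflections measured more than once
    sig_sum = mean_sum = 0.0        # R(σ): all merged reflections
    n_obs = 0
    for obs in groups.values():
        n = len(obs)
        n_obs += n
        mean_i = sum(i for i, _ in obs) / n
        sigma_mean = math.sqrt(sum(s * s for _, s in obs)) / n
        if n > 1:
            dev_sum += sum(abs(i - mean_i) for i, _ in obs)
            int_sum_multi += sum(i for i, _ in obs)
        sig_sum += sigma_mean
        mean_sum += mean_i
    r_int = dev_sum / int_sum_multi if int_sum_multi else float("nan")
    return r_int, sig_sum / mean_sum, n_obs / len(groups)
```

With these definitions R(int) tends to creep up as redundancy grows, because the scatter between more equivalents is sampled more fully, while R(σ) falls as the σ of each mean shrinks; that is consistent with the trend in the table.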
Absolute structure determination
$f = f_0 + f' + i f''$, where $f'$ and $f''$ depend on atom type and wavelength.
Ag-Kα = 0.55000 Å, Mo-Kα = 0.71073 Å, Cu-Kα = 1.54178 Å, Cr-Kα = 2.28962 Å
atom | Ag f' | Ag f'' | Mo f' | Mo f'' | Cu f' | Cu f'' | Cr f' | Cr f''
B | 0.000 | 0.000 | 0.000 | 0.001 | 0.008 | 0.004 | 0.018 | 0.009
C | 0.000 | 0.001 | 0.002 | 0.002 | 0.017 | 0.009 | 0.035 | 0.021
N | 0.001 | 0.002 | 0.004 | 0.003 | 0.029 | 0.018 | 0.059 | 0.042
O | 0.003 | 0.004 | 0.008 | 0.006 | 0.047 | 0.032 | 0.090 | 0.073
F | 0.006 | 0.006 | 0.014 | 0.010 | 0.069 | 0.053 | 0.129 | 0.119
P | 0.055 | 0.058 | 0.090 | 0.095 | 0.283 | 0.434 | 0.377 | 0.900
S | 0.068 | 0.076 | 0.110 | 0.124 | 0.319 | 0.557 | 0.364 | 1.142
Cl | 0.084 | 0.099 | 0.132 | 0.159 | 0.348 | 0.702 | 0.335 | 1.423
Br | 0.090 | 1.643 | -0.374 | 2.456 | -0.767 | 1.283 | -0.198 | 2.563
I | -1.144 | 1.187 | -0.726 | 1.182 | -0.579 | 6.835 | -5.852 | 12.85
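Friedel's law |F(h)| = |F(-h)| holds only while all scattering factors are real; the imaginary term f'' breaks it for non-centrosymmetric structures. A toy Python sketch: the four-atom P1 "structure", its coordinates and the hkl are hypothetical, f' and f'' are the Cu-Kα values for C and O from the table, and f0 is crudely replaced by the atomic number (its zero-angle value), ignoring its angle dependence.

```python
import cmath
import math

# Hypothetical toy structure in P1: (element, x, y, z) in fractional coordinates.
ATOMS = [
    ("O", 0.10, 0.20, 0.30),
    ("C", 0.35, 0.15, 0.60),
    ("C", 0.70, 0.45, 0.20),
    ("O", 0.55, 0.80, 0.75),
]
# Cu-Kα f' and f'' for C and O, from the table above.
CU_ANOMALOUS = {"C": (0.017, 0.009), "O": (0.047, 0.032)}
# Crude stand-in for f0: the atomic number (zero-angle limit).
F0 = {"C": 6.0, "O": 8.0}


def structure_factor(hkl, atoms, anomalous):
    """F(hkl) = Σ_j (f0 + f' + i f'')_j exp(2πi h·x_j)."""
    h, k, l = hkl
    total = 0j
    for element, x, y, z in atoms:
        f_prime, f_double_prime = anomalous.get(element, (0.0, 0.0))
        f = complex(F0[element] + f_prime, f_double_prime)
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def bijvoet_pair(hkl, atoms, anomalous):
    """Return (|F(h)|², |F(-h)|²)."""
    minus = tuple(-i for i in hkl)
    return (abs(structure_factor(hkl, atoms, anomalous)) ** 2,
            abs(structure_factor(minus, atoms, anomalous)) ** 2)


if __name__ == "__main__":
    print("no anomalous terms:", bijvoet_pair((4, 3, 5), ATOMS, {}))
    print("Cu-Kα f', f'':     ", bijvoet_pair((4, 3, 5), ATOMS, CU_ANOMALOUS))
```

With f'' switched off the two intensities agree to rounding error; with the Cu-Kα terms they differ by roughly 0.5 (electrons squared), a tiny effect typical of a compound containing only C and O.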
Absolute structure determination
S-compound, Cu-Kα: I(7 3 2) = 555.5; I(-7 -3 -2) = 696.9; I(-) - I(+) = 141.4
O-compound, Cu-Kα: I(7 3 2) = 513.1; I(-7 -3 -2) = 513.5; I(-) - I(+) = 0.4
Fc(4 3 5) = 81.91; Fc(-4 -3 -5) = 84.97, so I(-) > I(+)!!
Composition C: 66.7 %, H: 6.7 %, O: 26.6 %; 63693 reflections (5356 unique, 5280 > 2σ), R(int) = 0.0349, R1 = 0.0232 (0.0236), wR2 = 0.0570 (0.0594), GoF = 1.048 (1.047, 17), residual electron density +0.13 / -0.13 e/Å³, Flack x = 0.00(9)
[Listing: individual measurements I and σ(I) of the Bijvoet pair 4 3 5 / -4 -3 -5 and their symmetry equivalents]
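The two Bijvoet pairs quoted on this slide can be compared as absolute and relative differences. A small Python sketch using only the slide's numbers; the relative measure 2|ΔI| / (I(+) + I(-)) and the function name are conventions chosen here for illustration.

```python
def bijvoet_difference(i_plus: float, i_minus: float):
    """Return (I(-) - I(+), relative difference 2|ΔI| / (I(+) + I(-)))."""
    diff = i_minus - i_plus
    return diff, 2.0 * abs(diff) / (i_plus + i_minus)


# I(7 3 2) and I(-7 -3 -2) with Cu-Kα, as listed on the slide.
PAIRS = {"S-compound": (555.5, 696.9), "O-compound": (513.1, 513.5)}

if __name__ == "__main__":
    for name, (i_plus, i_minus) in PAIRS.items():
        diff, rel = bijvoet_difference(i_plus, i_minus)
        print(f"{name}: I(-) - I(+) = {diff:.1f}, relative difference = {rel:.4f}")
```

The relative difference is about 0.23 for the S-compound and below 0.001 for the O-compound, which shows how small the anomalous signal of the light-atom compound is.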
example 5: absolute configuration, monoclinic (C: 78 %, H: 9 %, N: 6 %, O: 7 %)
runs | # refl. | Red. | R(int) | R(σ) | x(u)
4 | 6055 | 4.5 (2.1) | 3.04 | 2.65 | 0.08(29)
8 | 12075 | 9.0 (3.7) | 3.21 | 1.87 | 0.05(27)
12 | 18103 | 13.5 (5.4) | 3.30 | 1.54 | 0.07(27)
16 | 24191 | 18.1 (7.5) | 3.29 | 1.31 | 0.00(26)
20 | 30275 | 22.6 (9.5) | 3.30 | 1.16 | 0.01(26)
24 | 36373 | 27.1 (11.4) | 3.31 | 1.07 | 0.03(26)
29 | 44294 | 33.1 (14.4) | 3.25 | 1.00 | 0.00(26)
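Whether an x(u) value like those in the table settles the absolute structure is usually judged from u as much as from x. A hedged Python sketch: the thresholds (u < 0.04 in general, u < 0.10 if the sample is known to be enantiopure) follow the guideline usually attributed to Flack and Bernardinelli (2000), while the 3u window around 0 or 1 and the function name are assumptions of this sketch.

```python
def classify_flack(x: float, u: float, known_enantiopure: bool) -> str:
    """Rough reading of a Flack parameter x with standard uncertainty u."""
    limit = 0.10 if known_enantiopure else 0.04
    if u >= limit:
        return "inconclusive (u too large)"
    if abs(x) <= 3.0 * u:
        return "consistent with the refined configuration"
    if abs(1.0 - x) <= 3.0 * u:
        return "consistent with the inverted configuration"
    return "neither configuration fits (twinning or systematic error?)"


if __name__ == "__main__":
    # x(u) after 4 and after 29 runs, from the table above.
    for runs, (x, u) in {4: (0.08, 0.29), 29: (0.00, 0.26)}.items():
        print(runs, classify_flack(x, u, known_enantiopure=True))
```

Under these assumptions every row of the table stays inconclusive: u barely drops from 0.29 to 0.26 although the number of reflections grows more than sevenfold.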
Q-value method (Simon Parsons)
Most reflections in a data set are rather insensitive to the absolute structure. For the sensitive reflections a quotient D can be defined:
$$D_{\mathrm{obs}}(h) = \frac{I(h) - I(-h)}{I(h) + I(-h)} = (1 - 2x)\,\frac{|F(h)|^2 - |F(-h)|^2}{|F(h)|^2 + |F(-h)|^2}$$
I(h) and I(-h) are measured in such a way that the quotient is free from systematic errors such as absorption and extinction. These quantities can be applied as restraints (in CRYSTALS). Since the restraints are linear in x, convergence is fast (one cycle). The method is also implemented in XPREP.
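Because the quotient relation is linear in x, x can be estimated in closed form once model quotients are available. The sketch below is a stand-alone weighted least-squares estimate in Python with the model quotients held fixed; it is not the CRYSTALS restraint machinery or the XPREP implementation, and the function names are hypothetical. σ(D) comes from first-order error propagation, and u(x) is not rescaled by the goodness of fit.

```python
import math
from typing import Sequence, Tuple


def quotient(i_plus: float, i_minus: float) -> float:
    """D = [I(h) - I(-h)] / [I(h) + I(-h)]."""
    return (i_plus - i_minus) / (i_plus + i_minus)


def quotient_sigma(i_plus: float, i_minus: float, s_plus: float, s_minus: float) -> float:
    """Standard uncertainty of D by first-order error propagation."""
    total = i_plus + i_minus
    return 2.0 * math.hypot(i_minus * s_plus, i_plus * s_minus) / total ** 2


def flack_from_quotients(d_obs: Sequence[float], d_model: Sequence[float],
                         sigmas: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for D_obs = (1 - 2x) D_model; returns (x, u(x)).

    The model is linear in x, so the solution needs no iteration.
    """
    w = [1.0 / s ** 2 for s in sigmas]
    sxx = sum(wi * m * m for wi, m in zip(w, d_model))
    sxy = sum(wi * m * o for wi, m, o in zip(w, d_model, d_obs))
    slope = sxy / sxx
    return (1.0 - slope) / 2.0, 0.5 / math.sqrt(sxx)


if __name__ == "__main__":
    # Hypothetical Bijvoet pairs: (I+, I-, σ+, σ-) and model quotients.
    pairs = [(105.0, 95.0, 2.0, 2.0), (48.0, 52.0, 1.5, 1.5), (230.0, 190.0, 4.0, 4.0)]
    d_model = [0.05, -0.04, 0.10]
    d_obs = [quotient(a, b) for a, b, _, _ in pairs]
    sig = [quotient_sigma(*p) for p in pairs]
    print(flack_from_quotients(d_obs, d_model, sig))
```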
Q-value method: implementation in XPREP
Prerequisites: .res file (without riding H atoms!) and .hkl file
Q-value method
[Figure: two plots, y-axis -0.1 to 0.35]
runs | # refl. | Red. | R(int) | R(σ) | x(u)
4 | 6055 | 4.5 (2.1) | 3.04 | 2.65 | -0.02(12)
8 | 12075 | 9.0 (3.7) | 3.21 | 1.87 | -0.07(9)
12 | 18103 | 13.5 (5.4) | 3.30 | 1.54 | 0.04(7)
16 | 24191 | 18.1 (7.5) | 3.29 | 1.31 | 0.00(6)
20 | 30275 | 22.6 (9.5) | 3.30 | 1.16 | 0.03(5)
24 | 36373 | 27.1 (11.4) | 3.31 | 1.07 | 0.03(5)
29 | 44294 | 33.1 (14.4) | 3.25 | 1.00 | -0.01(5)
Bayesian Approach
Provides relative probabilities for different models of the chiral compound.
With prior knowledge that the sample is enantiopure, the probability of each enantiomer can be calculated: p2(true) and p2(false).
Without prior knowledge, it is also possible to determine whether the structure is enantiomerically pure or a racemic mixture: p3(true), p3(false) or p3(rac-twin).
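The p2 and p3 probabilities can be sketched with Gaussian likelihoods of the observed Bijvoet differences. A minimal Python version in the spirit of the approach described here, assuming ΔI_obs = (1 - 2x)·ΔI_calc with x = 0 (true), 1 (false) or 0.5 (racemic twin), equal priors, and a known σ for each difference; full implementations also refine a scale factor and may use more robust distributions, which this sketch omits. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _log_likelihood(d_obs: Sequence[float], d_calc: Sequence[float],
                    sigmas: Sequence[float], x: float) -> float:
    """Gaussian log-likelihood (up to a constant) of ΔI_obs = (1 - 2x) ΔI_calc."""
    gamma = 1.0 - 2.0 * x
    return sum(-0.5 * ((o - gamma * c) / s) ** 2 for o, c, s in zip(d_obs, d_calc, sigmas))


def _posterior(log_likelihoods) -> Tuple[float, ...]:
    """Normalise log-likelihoods to probabilities (equal priors), avoiding overflow."""
    top = max(log_likelihoods)
    weights = [math.exp(l - top) for l in log_likelihoods]
    total = sum(weights)
    return tuple(w / total for w in weights)


def p2(d_obs, d_calc, sigmas):
    """(p2(true), p2(false)), assuming the sample is enantiopure."""
    return _posterior([_log_likelihood(d_obs, d_calc, sigmas, x) for x in (0.0, 1.0)])


def p3(d_obs, d_calc, sigmas):
    """(p3(true), p3(false), p3(rac-twin)) without assuming enantiopurity."""
    return _posterior([_log_likelihood(d_obs, d_calc, sigmas, x) for x in (0.0, 1.0, 0.5)])


if __name__ == "__main__":
    # Hypothetical Bijvoet differences (observed, calculated) and their σ.
    d_calc = [1.0, -2.0, 0.5, 1.5]
    d_obs = [0.8, -1.7, 0.9, 1.2]
    sigmas = [0.5] * 4
    print("p2:", p2(d_obs, d_calc, sigmas))
    print("p3:", p3(d_obs, d_calc, sigmas))
```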
Bayesian Approach Using the method
example 5: absolute configuration, monoclinic
[Figure: two plots, y-axis -0.1 to 0.35]
Standard deviations are significantly lowered (by a factor of 2-3).
example 6: being fast, but not too fast (C: 82 %, H: 6 %, N: 8 %, O: 4 %)
scans | # refl. (completeness) | Red. | x(u) | i
4 | 5034 (96 %) | 2.5 | 0.38(34) | 0.37(28)
6 | 7370 (98 %) | 3.7 | 0.56(25) | 0.27(25)
16 | 20154 (100 %) | 10.1 | 0.81(19) | 0.11(19)
Further value pairs shown: 0.65(20) / 0.34(20), 0.86(18) / 0.14(18), 1.02(10) / 0.03(10); 0.6(2) / 0.4(2), 0.74(19) / 0.26(19), 0.91(10) / 0.09(10)
Going to the limits (C: 83 %, H: 7 %, N: 10 %)
Flack x parameter
[Figure: x for data sets 1 to 3, y-axis -0.2 to 0.5]
Parsons / Flack x: 1: 0.07 (0.13); 2: 0.11 (0.15); 3: 0.03 (0.16)
Thanks to: Philippe Piechon, Lukas Oberer, Trixie Wagner (Novartis) George Sheldrick (Uni Göttingen) Bruker AXS