AdaGAN: Boosting Generative Models

AdaGAN: Boosting Generative Models
Ilya Tolstikhin (ilya@tuebingen.mpg.de)
Joint work with Gelly^2, Bousquet^2, Simon-Gabriel^1, Schölkopf^1
^1 MPI for Intelligent Systems, ^2 Google Brain

[Slides with sample images: GAN samples from (Radford et al., 2015) and (Nguyen et al., 2016), and a video still (YouTube: Yota Ishida)]

Motivation
Generative Adversarial Networks, Variational Autoencoders, and other unsupervised representation learning / generative modelling techniques ROCK! Dozens of papers are being submitted, discussed, published, ... Among papers submitted to ICLR 2017, 15 have titles matching "Generative Adversarial*". There was a NIPS 2016 tutorial on GANs, and the GAN workshop at NIPS 2016 was the second largest.
That's all very exciting. However: GANs are very hard to train. The field is, apparently, in its trial-and-error phase. People try all kinds of heuristic hacks; sometimes they work, sometimes not ... Very few papers manage to go beyond heuristics and provide a deeper conceptual understanding of what's going on.

AdaGAN: adaptive mixtures of GANs

Contents
1. Generative Adversarial Networks and f-divergences
GANs are trying to minimize a specific f-divergence: $\min_{P_{model} \in \mathcal{P}} D_f(P_{data} \| P_{model})$. At least, this is one way to explain what they are doing.
2. Minimizing f-divergences with mixtures of distributions
We study theoretical properties of the problem
$$\min_{\alpha_T,\, P_T \in \mathcal{P}} D_f\Big(P_{data} \,\Big\|\, \sum_{t=1}^{T} \alpha_t P_t\Big)$$
and show that it can be reduced to $\min_{P_T \in \mathcal{P}} D_f(P_{data} \| P_T)$. (*)
3. Back to GANs
Now we use GANs to train $P_T$ in (*), which results in AdaGAN.

f-GAN: Adversarial training for f-divergence minimization
For any probability distributions P and Q, the f-divergence is
$$D_f(Q \| P) := \int f\!\left(\frac{dQ}{dP}(x)\right) dP(x),$$
where $f\colon (0, \infty) \to \mathbb{R}$ is convex and satisfies $f(1) = 0$.
Properties of f-divergences:
1. $D_f(P\|Q) \ge 0$ and $D_f(P\|Q) = 0$ when $P = Q$;
2. $D_f(P\|Q) = D_{f^\circ}(Q\|P)$ for $f^\circ(x) := x f(1/x)$;
3. $D_f(P\|Q)$ is symmetric when $f(x) = x f(1/x)$ for all $x$;
4. $D_f(P\|Q) = D_g(P\|Q)$ for any $g(x) = f(x) + C(x-1)$;
5. Taking $f_0(x) := f(x) - (x-1) f'(1)$ is a good choice: it is nonnegative, nonincreasing on $(0,1]$, and nondecreasing on $[1,\infty)$.

f-GAN: Adversarial training for f-divergence minimization
For any probability distributions P and Q, the f-divergence is
$$D_f(Q \| P) := \int f\!\left(\frac{dQ}{dP}(x)\right) dP(x),$$
where $f\colon (0, \infty) \to \mathbb{R}$ is convex and satisfies $f(1) = 0$.
Examples of f-divergences:
1. Kullback-Leibler divergence for $f(x) = x \log x$;
2. Reverse Kullback-Leibler divergence for $f(x) = -\log x$;
3. Total Variation distance for $f(x) = |x - 1|$;
4. Jensen-Shannon divergence for $f(x) = -(x+1) \log\frac{x+1}{2} + x \log x$:
$$JS(P\|Q) = KL\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + KL\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right).$$
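To make these definitions concrete, here is a minimal sketch (ours, not from the talk) that evaluates the generic f-divergence for discrete distributions with the four choices of f listed above, and checks the JS identity against the direct KL computation. The helper names are ours.

```python
import numpy as np

def f_divergence(q, p, f):
    """D_f(Q || P) = sum_x P(x) * f(Q(x)/P(x)) for discrete distributions with full support."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(p * f(q / p))

f_kl  = lambda x: x * np.log(x)           # Kullback-Leibler
f_rkl = lambda x: -np.log(x)              # reverse Kullback-Leibler
f_tv  = lambda x: np.abs(x - 1)           # total variation (slide's convention, no 1/2)
f_js  = lambda x: x * np.log(x) - (x + 1) * np.log((x + 1) / 2)   # JS as on the slide

q = np.array([0.2, 0.5, 0.3])
p = np.array([0.4, 0.4, 0.2])
for name, f in [("KL", f_kl), ("reverse KL", f_rkl), ("TV", f_tv), ("JS", f_js)]:
    print(name, f_divergence(q, p, f))

# Sanity check of the JS identity: D_{f_js}(Q||P) = KL(Q||M) + KL(P||M) with M = (P+Q)/2.
m = (p + q) / 2
print(np.isclose(f_divergence(q, p, f_js),
                 f_divergence(q, m, f_kl) + f_divergence(p, m, f_kl)))
```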

f-GAN: Adversarial training for f-divergence minimization
Kullback-Leibler minimization: assume $X_1, \ldots, X_N \sim Q$ i.i.d. Then
$$\min_P KL(Q\|P) \;\equiv\; \min_P \int \log\!\left(\frac{dQ}{dP}(x)\right) dQ(x) \;\equiv\; \min_P \, -\!\int \log\big(dP(x)\big)\, dQ(x) \;\equiv\; \max_P \int \log\big(dP(x)\big)\, dQ(x) \;\approx\; \max_P \frac{1}{N}\sum_{i=1}^{N} \log\big(dP(X_i)\big).$$
Reverse Kullback-Leibler minimization:
$$\min_P KL(P\|Q) \;\equiv\; \min_P \int \log\!\left(\frac{dP}{dQ}(x)\right) dP(x) \;\equiv\; \min_P \, -H(P) - \int \log\big(dQ(x)\big)\, dP(x) \;\equiv\; \max_P \, H(P) + \int \log\big(dQ(x)\big)\, dP(x).$$
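A small illustration (ours, not from the talk) of the first chain: with P restricted to Gaussians, maximizing the average log-density of samples from Q, i.e. minimizing KL(Q||P) from data, is plain maximum likelihood, so the numerical optimum matches the closed-form MLE.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=5000)   # samples from the unknown Q

def neg_avg_loglik(params):
    # minus (1/N) sum_i log dP(X_i) for P = N(mu, sigma^2)
    mu, log_sigma = params
    return -np.mean(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_avg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)    # numerical optimum of the KL-minimization objective
print(x.mean(), x.std())    # closed-form MLE: the two should agree
```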

f-GAN: Adversarial training for f-divergence minimization
Now we will follow (Nowozin et al., NIPS 2016). For any probability distributions P and Q over $\mathcal{X}$, the f-divergence is
$$D_f(Q \| P) := \int f\!\left(\frac{dQ}{dP}(x)\right) dP(x).$$
Variational representation: a very neat result, saying that
$$D_f(Q \| P) = \sup_{g\colon \mathcal{X} \to \mathrm{dom}(f^*)} \; \mathbb{E}_{X \sim Q}[g(X)] - \mathbb{E}_{Y \sim P}\big[f^*\big(g(Y)\big)\big],$$
where $f^*(x) := \sup_u \, (xu - f(u))$ is the convex conjugate of $f$. So $D_f$ can be computed by solving an optimization problem!
The f-divergence minimization can now be written as
$$\inf_{P \in \mathcal{P}} D_f(Q \| P) \;\equiv\; \inf_{P \in \mathcal{P}} \; \sup_{g\colon \mathcal{X} \to \mathrm{dom}(f^*)} \; \mathbb{E}_{X \sim Q}[g(X)] - \mathbb{E}_{Y \sim P}\big[f^*\big(g(Y)\big)\big].$$
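A minimal numerical sketch (ours): estimating KL(Q||P) from samples via the variational lower bound, with f(u) = u log u and hence f*(t) = exp(t - 1). For Q = N(1,1) and P = N(0,1) the optimal witness g*(x) = 1 + log(dQ/dP)(x) = x + 0.5 is linear, so a linear witness class already attains the true value KL = 0.5.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
xq = rng.normal(1.0, 1.0, size=100_000)   # samples from Q
yp = rng.normal(0.0, 1.0, size=100_000)   # samples from P

def neg_bound(params):
    # minus the variational lower bound E_Q[g(X)] - E_P[exp(g(Y) - 1)]
    a, b = params
    return -(np.mean(a * xq + b) - np.mean(np.exp(a * yp + b - 1.0)))

res = minimize(neg_bound, x0=[0.0, 0.0])
print("estimated KL:", -res.fun)   # should be close to the true value 0.5
```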

f-GAN: Adversarial training for f-divergence minimization
The f-divergence minimization can now be written as
$$\inf_{P \in \mathcal{P}} D_f(Q \| P) \;\equiv\; \inf_{P \in \mathcal{P}} \; \sup_{g\colon \mathcal{X} \to \mathrm{dom}(f^*)} \; \mathbb{E}_{X \sim Q}[g(X)] - \mathbb{E}_{Y \sim P}\big[f^*\big(g(Y)\big)\big].$$
1. Take $P$ to be the distribution of $G_\theta(Z)$, where $Z \sim P_Z$ is a latent noise and $G_\theta$ is a deep net;
2. The JS divergence corresponds to $f(x) = -(x+1)\log\frac{x+1}{2} + x \log x$ and $f^*(t) = -\log(2 - e^t)$, and thus the domain of $f^*$ is $(-\infty, \log 2)$;
3. Take $g = g_f \circ D_\omega$, where $g_f(v) = \log 2 - \log(1 + e^{-v})$ and $D_\omega$ is a deep net (check that $g(x) < \log 2$ for all $x$).
Then, up to an additive $2\log 2$ term, we get
$$\inf_{G_\theta} \sup_{D_\omega} \; \mathbb{E}_{X \sim Q}\!\left[\log \frac{1}{1 + e^{-D_\omega(X)}}\right] + \mathbb{E}_{Z \sim P_Z}\!\left[\log\!\left(1 - \frac{1}{1 + e^{-D_\omega(G_\theta(Z))}}\right)\right].$$
Compare to the original GAN objective (Goodfellow et al., NIPS 2014):
$$\inf_{G_\theta} \sup_{D_\omega} \; \mathbb{E}_{X \sim P_d}[\log D_\omega(X)] + \mathbb{E}_{Z \sim P_Z}\big[\log\big(1 - D_\omega(G_\theta(Z))\big)\big].$$
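A quick numerical check (ours, not from the talk) of the claim above: with the output activation $g_f(v) = \log 2 - \log(1 + e^{-v})$ and conjugate $f^*(t) = -\log(2 - e^t)$, the f-GAN integrand equals the sigmoid-based GAN integrand plus $2\log 2$, pointwise.

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
g_f     = lambda v: np.log(2.0) - np.log(1.0 + np.exp(-v))   # output activation for JS
f_star  = lambda t: -np.log(2.0 - np.exp(t))                 # convex conjugate of the JS f

v  = np.linspace(-5, 5, 11)   # discriminator outputs on real points
vp = np.linspace(-4, 6, 11)   # discriminator outputs on generated points

lhs = g_f(v) - f_star(g_f(vp))                                        # f-GAN objective terms
rhs = np.log(sigmoid(v)) + np.log(1.0 - sigmoid(vp)) + 2.0 * np.log(2.0)  # GAN terms + 2 log 2
print(np.allclose(lhs, rhs))   # True
```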

f-GAN: Adversarial training for f-divergence minimization
Then we get the original GAN objective (Goodfellow et al., NIPS 2014):
$$\inf_{G_\theta} \sup_{D_\omega} \; \mathbb{E}_{X \sim P_d}[\log D_\omega(X)] + \mathbb{E}_{Z \sim P_Z}\big[\log\big(1 - D_\omega(G_\theta(Z))\big)\big]. \qquad (*)$$
Let's denote the distribution of $G_\theta(Z)$ when $Z \sim P_Z$ by $P_\theta$. For any fixed generator $G_\theta$, the theoretically optimal $D$ is
$$D^*_\theta(x) = \frac{dP_d(x)}{dP_d(x) + dP_\theta(x)}.$$
If we insert this back into (*), we get $2\, JS(P_d \| P_\theta) - \log 4$.

f-GAN: Adversarial training for f-divergence minimization
The f-divergence minimization can now be written as
$$\inf_{P \in \mathcal{P}} D_f(Q \| P) \;\equiv\; \inf_{P \in \mathcal{P}} \; \sup_{g\colon \mathcal{X} \to \mathrm{dom}(f^*)} \; \mathbb{E}_{X \sim Q}[g(X)] - \mathbb{E}_{Y \sim P}\big[f^*\big(g(Y)\big)\big].$$
This is GREAT, because:
1. Estimate $\mathbb{E}_{X \sim Q}[g(X)]$ using an i.i.d. sample (data) $X_1, \ldots, X_N \sim Q$:
$$\mathbb{E}_{X \sim Q}[g(X)] \approx \frac{1}{N}\sum_{i=1}^{N} g(X_i);$$
2. Often we don't know the analytical form of $P$, but can only sample from $P$. No problem! Sample i.i.d. $Y_1, \ldots, Y_M \sim P$ and write:
$$\mathbb{E}_{Y \sim P}\big[f^*\big(g(Y)\big)\big] \approx \frac{1}{M}\sum_{j=1}^{M} f^*\big(g(Y_j)\big).$$
This results in implicit generative modelling: using complex model distributions which are analytically intractable, but easy to sample from.
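A tiny helper (ours) spelling out the implicit setting: a Monte Carlo estimate of $\mathbb{E}_Q[g(X)] - \mathbb{E}_P[f^*(g(Y))]$ that only needs black-box samplers for Q and P. The example reuses the KL pair with a fixed witness that happens to be optimal for the chosen Gaussians.

```python
import numpy as np

def fgan_objective_estimate(sample_q, sample_p, g, f_star, n=10_000, m=10_000, seed=0):
    """Sample-average estimate of E_Q[g(X)] - E_P[f*(g(Y))] from two samplers."""
    rng = np.random.default_rng(seed)
    x = sample_q(rng, n)   # data sample X_1..X_N ~ Q
    y = sample_p(rng, m)   # model sample Y_1..Y_M ~ P
    return np.mean(g(x)) - np.mean(f_star(g(y)))

# KL pair f(u) = u log u, f*(t) = exp(t - 1); the fixed witness g(x) = x + 0.5
# is the optimal one for Q = N(1, 1) vs P = N(0, 1).
est = fgan_objective_estimate(
    sample_q=lambda rng, n: rng.normal(1.0, 1.0, n),
    sample_p=lambda rng, m: rng.normal(0.0, 1.0, m),
    g=lambda x: x + 0.5,
    f_star=lambda t: np.exp(t - 1.0),
)
print(est)   # close to KL = 0.5, up to Monte Carlo noise
```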

f-GAN: Adversarial training for f-divergence minimization
Note that there is still a difference between running a GAN,
$$\inf_{G_\theta} \sup_{D_\omega} \; \mathbb{E}_{X \sim P_d}[\log D_\omega(X)] + \mathbb{E}_{Z \sim P_Z}\big[\log\big(1 - D_\omega(G_\theta(Z))\big)\big],$$
and minimizing $JS(P_{data} \| P_{model})$,
$$\inf_{P_{model}} \; \sup_{g\colon \ldots} \; \mathbb{E}_{X \sim P_{data}}[g(X)] - \mathbb{E}_{Y \sim P_{model}}\big[f^*_{JS}\big(g(Y)\big)\big],$$
because:
1. We replaced expectations with sample averages. Uniform laws of large numbers may not apply, as our function classes are huge;
2. Instead of taking the supremum over all possible witness functions $g$, we optimize over a restricted class of DNNs (which is still huge);
3. In practice we never optimize $g$ / $D_\omega$ to the end. This leads to vanishing gradients and other bad things.

Contents (recap). Next: Part 2, Minimizing f-divergences with mixtures of distributions.

Minimizing f-divergences with mixtures of distributions
Assumption: let's assume we know, more or less, how to solve
$$\min_{Q \in \mathcal{G}} D_f(Q \| P)$$
for any $f$ and any $P$, given we have an i.i.d. sample from $P$.
New problem: we now want to approximate an unknown $P_d$ in $D_f$ using a mixture model of the form
$$P_{model}^T := \sum_{i=1}^{T} \alpha_i P_i.$$
Greedy strategy (a sampling sketch follows below):
1. Train $P_1$ to minimize $D_f(P_1 \| P_d)$, set $\alpha_1 = 1$, and get $P_{model}^1 = P_1$;
2. Assume we have trained $P_1, \ldots, P_t$ with weights $\alpha_1, \ldots, \alpha_t$. We train one more component $Q$, choose a weight $\beta$, and update
$$P_{model}^{t+1} = \sum_{i=1}^{t} (1-\beta)\alpha_i P_i + \beta Q.$$
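A minimal sketch (ours, not from the talk) of the bookkeeping behind the greedy strategy: components are stored with their weights, the update $P \leftarrow (1-\beta)P + \beta Q$ rescales all old weights, and sampling first picks a component by weight and then samples from it.

```python
import numpy as np

class AdditiveMixture:
    def __init__(self):
        self.samplers = []   # callables: (rng, n) -> array of n samples
        self.weights = []

    def add_component(self, sampler, beta):
        if not self.samplers:
            self.samplers, self.weights = [sampler], [1.0]
            return
        self.weights = [(1.0 - beta) * w for w in self.weights]   # rescale old weights
        self.samplers.append(sampler)
        self.weights.append(beta)

    def sample(self, rng, n):
        # pick a component index per sample, then sample from that component
        idx = rng.choice(len(self.samplers), size=n, p=self.weights)
        return np.concatenate([self.samplers[i](rng, np.sum(idx == i)) for i in set(idx)])

# Usage: two toy "generators" (Gaussians standing in for trained GANs).
rng = np.random.default_rng(0)
mix = AdditiveMixture()
mix.add_component(lambda rng, n: rng.normal(-2.0, 0.5, n), beta=1.0)
mix.add_component(lambda rng, n: rng.normal(+2.0, 0.5, n), beta=0.5)
print(mix.sample(rng, 5))   # note: output is grouped by component, not shuffled
```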

Minimizing f-divergences with mixtures of distributions
1. Train $P_1$ to minimize $D_f(P_1 \| P_d)$, set $\alpha_1 = 1$, and get $P_{model}^1 = P_1$;
2. Assume we have trained $P_1, \ldots, P_t$ with weights $\alpha_1, \ldots, \alpha_t$. We train one more component $Q$, choose a weight $\beta$, and update
$$P_{model}^{t+1} = \sum_{i=1}^{t} (1-\beta)\alpha_i P_i + \beta Q.$$
The goal: we want to choose $\beta \in [0,1]$ and $Q$ greedily, so as to minimize
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big),$$
where we denoted $P_g := P_{model}^t$.
A modest goal: find $\beta \in [0,1]$ and $Q$ such that, for some $c < 1$,
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le c \cdot D_f(P_g \| P_d).$$
Inefficient in practice: as $t \to \infty$ we expect $\beta \to 0$. Only a $\beta$ fraction of points gets sampled from $Q$, so it gets harder and harder to tune $Q$.

Minimizing f-divergences with mixtures of distributions
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \qquad (*)$$
Inefficient in practice: as $t \to \infty$ we expect $\beta \to 0$. Only a $\beta$ fraction of points gets sampled from $Q$, so it gets harder and harder to tune $Q$.
Workaround: instead optimize an upper bound on (*), which contains a term $D_f(Q \| Q_0)$ for some $Q_0$:
Lemma. Given $P_d$, $P_g$, and $\beta \in [0,1]$, for any $Q$ and any $R$ with $\beta\, dR \le dP_d$,
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \beta D_f(Q \| R) + (1-\beta)\, D_f\!\left(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\right).$$
If furthermore $D_f$ is a Hilbertian metric, then, for any $R$, we have
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \left(\sqrt{\beta D_f(Q \| R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\|\, P_d\big)}\right)^2.$$
The JS divergence, TV, Hellinger, and others are Hilbertian metrics.

Minimizing f-divergences with mixtures of distributions
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \qquad (*)$$
Lemma. Given $P_d$, $P_g$, and $\beta \in [0,1]$, for any $Q$ and any $R$ with $\beta\, dR \le dP_d$,
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \beta D_f(Q \| R) + (1-\beta)\, D_f\!\left(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\right).$$
If furthermore $D_f$ is a Hilbertian metric, then, for any $R$, we have
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \left(\sqrt{\beta D_f(Q \| R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\|\, P_d\big)}\right)^2.$$
Recipe: choose $R^*$ to minimize the right-most terms and then minimize $D_f(Q \| R^*)$, which we can do given we have an i.i.d. sample from $R^*$.

Minimizing f-divergences with mixtures of distributions
First surrogate upper bound.
Lemma. Given $P_d$, $P_g$, and $\beta \in [0,1]$, for any $Q$ and $R$, if $D_f$ is Hilbertian,
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \left(\sqrt{\beta D_f(Q \| R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\|\, P_d\big)}\right)^2.$$
The optimal $R$ is given in the next result:
Theorem. Given $P_d$, $P_g$, and $\beta \in (0,1]$, the solution to
$$\min_{R \in \mathcal{P}} D_f\big((1-\beta)P_g + \beta R \,\|\, P_d\big),$$
where $\mathcal{P}$ is the class of all probability distributions, has the density
$$dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^* dP_d(x) - (1-\beta)\, dP_g(x)\big)_+$$
for a unique $\lambda^*$ satisfying $\int dR^*_\beta = 1$. Also, $\lambda^* = 1$ if and only if $P_d\big((1-\beta)dP_g > dP_d\big) = 0$, which is equivalent to $\beta\, dR^*_\beta = dP_d - (1-\beta)\, dP_g$.
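A small sketch (ours) of the theorem above for discrete distributions: $dR^*_\beta = \frac{1}{\beta}(\lambda^* p_d - (1-\beta)p_g)_+$, with $\lambda^*$ found by bisection so that $R^*_\beta$ sums to one (the total mass is monotone in $\lambda$, and it is at least 1 at $\lambda = 1$, so $\lambda^* \in (0, 1]$).

```python
import numpy as np

def optimal_r(p_d, p_g, beta, tol=1e-12):
    """Density of R*_beta and lambda* for discrete p_d, p_g."""
    p_d, p_g = np.asarray(p_d, float), np.asarray(p_g, float)

    def mass(lam):
        return np.sum(np.maximum(lam * p_d - (1.0 - beta) * p_g, 0.0)) / beta

    lo, hi = 0.0, 1.0          # mass(0) = 0 and mass(1) >= 1, so lambda* lies in (0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) < 1.0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    return np.maximum(lam * p_d - (1.0 - beta) * p_g, 0.0) / beta, lam

p_d = np.array([0.25, 0.25, 0.25, 0.25])   # "data": four equal modes
p_g = np.array([0.50, 0.50, 0.00, 0.00])   # current model covers only two of them
r, lam = optimal_r(p_d, p_g, beta=0.2)
print(r, r.sum(), lam)   # here R* puts all of its mass on the two missing modes
```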

Minimizing f-divergences with mixtures of distributions
Second surrogate upper bound.
Lemma. Given $P_d$, $P_g$, and $\beta \in [0,1]$, for any $Q$ and any $R$ with $\beta\, dR \le dP_d$,
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le \beta D_f(Q \| R) + (1-\beta)\, D_f\!\left(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\right).$$
The optimal $R$ is given in the next result:
Theorem. Given $P_d$, $P_g$, and $\beta \in (0,1]$, assume $P_d(dP_g = 0) < \beta$. Then the solution to
$$\min_{R:\, \beta dR \le dP_d} D_f\!\left(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\right)$$
is given by the density
$$dR^\dagger_\beta(x) = \frac{1}{\beta}\big(dP_d(x) - \lambda^\dagger (1-\beta)\, dP_g(x)\big)_+$$
for a unique $\lambda^\dagger \ge 1$ satisfying $\int dR^\dagger_\beta = 1$.

Minimizing f-divergences with mixtures of distributions
Two different optimal solutions:
$$dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^* dP_d(x) - (1-\beta)\, dP_g(x)\big)_+, \qquad dR^\dagger_\beta(x) = \frac{1}{\beta}\big(dP_d(x) - \lambda^\dagger (1-\beta)\, dP_g(x)\big)_+.$$
They look very symmetric, but we found no obvious connection between them. Both do not depend on $f$! $\lambda^*$ and $\lambda^\dagger$ are implicitly defined via fixed-point equations. $dR^*_\beta(x)$ is also a solution to the original problem, that is, of
$$\min_{Q} D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big).$$
We see that setting $\beta = 1$ is the best possible choice in theory. So we'll have to use various heuristics for choosing $\beta$, including uniform, constant, ...


Minimizing f-divergences with mixtures of distributions
Recipe: choose $R^*$ to minimize the right-most terms in the surrogate upper bounds and then minimize the first term $D_f(Q \| R^*)$.
We found the optimal $R^*$ distributions. Let's now assume we are super powerful and managed to find $Q$ with $D_f(Q \| R^*) = 0$. In other words, we are going to update our model
$$P_g \;\leftarrow\; (1-\beta)P_g + \beta Q$$
with $Q$ equal to either $R^*_\beta$ or $R^\dagger_\beta$.
How well are we doing in terms of the original objective $D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big)$?
Theorem. We have:
$$D_f\big((1-\beta)P_g + \beta R^*_\beta \,\|\, P_d\big) \le D_f\big((1-\beta)P_g + \beta P_d \,\|\, P_d\big) \le (1-\beta)\, D_f(P_g \| P_d)$$
and
$$D_f\big((1-\beta)P_g + \beta R^\dagger_\beta \,\|\, P_d\big) \le (1-\beta)\, D_f(P_g \| P_d).$$

Minimizing f-divergences with mixtures of distributions
Other highlights:
A more careful convergence analysis. In particular, we provide a necessary and sufficient condition for the iterative process to converge in a finite number of steps: there exists $M > 0$ such that $P_d\big((1-\beta)\, dP_1 > M\, dP_d\big) = 0$. In other words, $P_1 \ll P_d$.
In practice we do not attain $D_f(Q \| R^*) = 0$. We prove that, given
$$D_f(Q \| R^*_\beta) \le \gamma\, D_f(P_g \| P_d) \quad \text{or} \quad D_f(Q \| R^\dagger_\beta) \le \gamma\, D_f(P_g \| P_d),$$
we achieve
$$D_f\big((1-\beta)P_g + \beta Q \,\|\, P_d\big) \le c \cdot D_f(P_g \| P_d)$$
with $c = c(\gamma, \beta) < 1$. This is somewhat similar to weak learnability in AdaBoost.

Contents (recap). Next: Part 3, Back to GANs. We use GANs to train $P_T$ in (*), which results in AdaGAN.

AdaGAN: adaptive mixtures of GANs
We want to find $Q$ so as to minimize $D_f(Q \| R^*_\beta)$, where
$$dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^* dP_d(x) - (1-\beta)\, dP_g(x)\big)_+.$$
We would like to use a GAN for that. How do we sample from $R^*_\beta$? Let's rewrite:
$$dR^*_\beta(x) = \frac{dP_d(x)}{\beta}\left(\lambda^* - (1-\beta)\frac{dP_g}{dP_d}(x)\right)_+.$$
Recall: $\mathbb{E}_{P_d}[\log D(X)] + \mathbb{E}_{P_g}[\log(1 - D(Y))]$ is maximized by
$$D^*(x) = \frac{dP_d(x)}{dP_d(x) + dP_g(x)} \quad\Longrightarrow\quad \frac{dP_g}{dP_d}(x) = \frac{1 - D^*(x)}{D^*(x)} =: h(x).$$
Each training sample ends up with a weight
$$w_i = \frac{p_i}{\beta}\big(\lambda^* - (1-\beta)\, h(x_i)\big)_+ = \frac{1}{N\beta}\big(\lambda^* - (1-\beta)\, h(x_i)\big)_+. \qquad (**)$$
Finally, we compute $\lambda^*$ using (**).

AdaGAN: adaptive mixtures of GANs
Each training sample ends up with a weight
$$w_i = \frac{p_i}{\beta}\big(\lambda^* - (1-\beta)\, h(x_i)\big)_+ = \frac{1}{N\beta}\left(\lambda^* - (1-\beta)\frac{1 - D(x_i)}{D(x_i)}\right)_+.$$
Now we run a usual GAN with an importance-weighted sample.
$D(x_i) \approx 1$: our current model $P_g$ is missing this point. Big $w_i$.
$D(x_i) \approx 0$: $x_i$ is covered by $P_g$ nicely. Zero $w_i$.
The better our model gets, the more data points will be zeroed. This leads to an aggressive adaptive iterative procedure. $\beta$ can also be chosen so as to force $\#\{w_i : w_i > 0\} \approx r\,N$.
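A small sketch (ours, not the authors' implementation) of this reweighting step for the empirical case $p_i = 1/N$: given discriminator outputs $d_i = D(x_i)$, compute $h_i = (1-d_i)/d_i$ and $w_i = \frac{1}{N\beta}(\lambda^* - (1-\beta)h_i)_+$, with $\lambda^*$ found by bisection so that the weights sum to one.

```python
import numpy as np

def adagan_weights(d, beta, tol=1e-12):
    d = np.clip(np.asarray(d, float), 1e-8, 1.0 - 1e-8)   # keep h finite
    h = (1.0 - d) / d                                      # estimate of dP_g/dP_d
    n = len(d)

    def total(lam):
        return np.sum(np.maximum(lam - (1.0 - beta) * h, 0.0)) / (n * beta)

    lo, hi = 0.0, 1.0 + (1.0 - beta) * h.max()             # total(hi) >= 1/beta >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < 1.0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    return np.maximum(lam - (1.0 - beta) * h, 0.0) / (n * beta)

# Points the discriminator confidently labels "real" (d close to 1) are the ones
# the current mixture misses; they receive the largest weights.
d = np.array([0.95, 0.90, 0.50, 0.10, 0.05])
w = adagan_weights(d, beta=0.3)
print(w, w.sum())   # weights concentrate on the first entries and sum to 1
```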

AdaGAN: adaptive mixtures of GANs
Coverage metric $C$: mass of the true data covered by the model,
$$C := P_{data}\big(dP_{model}(X) > t\big), \quad \text{where } t \text{ is such that } P_{model}\big(dP_{model} > t\big) = 0.95.$$
Experimental setting: mixture of 5 two-dimensional Gaussians.
[Figure: coverage $C$ for "Best of T GANs", "Bagging", and AdaGAN. Blue points: median over 35 random runs; green error bars: 90% intervals (not confidence intervals for the medians!).]
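A sketch (ours) of how such a coverage metric can be estimated, assuming the model density can be evaluated (true for the toy Gaussian-mixture models here): pick $t$ as the 5th percentile of the model density on model samples, so that roughly $P_{model}(dP_{model} > t) = 0.95$, then measure how much of the data clears $t$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def coverage(data, model_density, model_samples, level=0.95):
    dens_on_model = model_density(model_samples)
    t = np.quantile(dens_on_model, 1.0 - level)      # P_model(density > t) ~= level
    return np.mean(model_density(data) > t)          # ~= P_data(density > t)

# Toy example: data from two well-separated clusters, model covers only one.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3, 0.5, size=(500, 2)),
                  rng.normal(+3, 0.5, size=(500, 2))])
model = multivariate_normal(mean=[-3, -3], cov=0.25 * np.eye(2))
model_samples = model.rvs(size=2000, random_state=0)
print(coverage(data, model.pdf, model_samples))      # about 0.5: half the data is covered
```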

AdaGAN: adaptive mixtures of GANs
Coverage metric $C$: mass of the true data covered by the model,
$$C := P_{data}\big(dP_{model}(X) > t\big), \quad \text{where } P_{model}\big(dP_{model} > t\big) = 0.95.$$
TopKLast r: $\beta_t$ is chosen so as to zero out $r\,N$ of the training points; Boosted: AdaGAN with $\beta_t = 1/t$.

AdaGAN: adaptive mixtures of GANs
Coverage metric $C$: mass of the true data covered by the model,
$$C := P_{data}\big(dP_{model}(X) > t\big), \quad \text{where } P_{model}\big(dP_{model} > t\big) = 0.95.$$
Experimental setting: mixture of 10 two-dimensional Gaussians.

AdaGAN: adaptive mixtures of GANs

Conclusions
Pros:
It seems that AdaGAN is solving the missing modes problem;
AdaGAN is compatible with any implementation of GAN, and more generally with any other implicit generative modelling technique;
The resulting model is still easy to sample from;
Easy to implement, with great freedom in choosing heuristics;
Theoretically grounded!
Cons:
We lose the nice manifold structure in the latent space;
Increased computational complexity;
Individual GANs are still very unstable.

Thank you! AdaGAN: Boosting Generative Models. Tolstikhin, Gelly, Bousquet, Simon-Gabriel, Schoelkopf, NIPS 2017. https://github.com/tolstikhin/adagan

Minimizing the optimal transport
Optimal transport for a cost function $c(x, y)\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ is
$$W_c(P_X, P_G) := \inf_{\Gamma \in \mathcal{P}(X \sim P_X,\, Y \sim P_G)} \mathbb{E}_{(X,Y) \sim \Gamma}[c(X, Y)].$$
If $P_G(Y \mid Z = z) = \delta_{G(z)}$ for all $z \in \mathcal{Z}$, where $G\colon \mathcal{Z} \to \mathcal{X}$, we have
$$W_c(P_X, P_G) = \inf_{Q\colon Q_Z = P_Z} \mathbb{E}_{P_X} \mathbb{E}_{Q(Z \mid X)}\big[c\big(X, G(Z)\big)\big],$$
where $Q_Z$ is the marginal distribution of $Z$ when $X \sim P_X$ and $Z \sim Q(Z \mid X)$.

Relaxing the constraint
$$W_c(P_X, P_G) = \inf_{Q\colon Q_Z = P_Z} \mathbb{E}_{P_X} \mathbb{E}_{Q(Z \mid X)}\big[c\big(X, G(Z)\big)\big].$$
Penalized Optimal Transport replaces the constraint with a penalty:
$$POT(P_X, P_G) := \inf_{Q} \; \mathbb{E}_{P_X} \mathbb{E}_{Q(Z \mid X)}\big[c\big(X, G(Z)\big)\big] + \lambda \cdot D(Q_Z, P_Z),$$
and uses adversarial training in the $\mathcal{Z}$ space to approximate $D$.
For the 2-Wasserstein distance, $c(X, Y) = \|X - Y\|_2^2$, POT recovers Adversarial Auto-Encoders (Makhzani et al., 2016).
For the 1-Wasserstein distance, $c(X, Y) = \|X - Y\|_2$, POT and WGAN are solving the same problem from the primal and dual forms respectively.
Importantly, unlike VAE, POT does not force $Q(Z \mid X = x)$ to intersect for different $x$, which is known to lead to blurriness. (Bousquet et al., 2017)
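A rough sketch (ours, not from the talk) of the penalized objective: a reconstruction cost plus $\lambda$ times a divergence between the aggregated latent codes $Q_Z$ and the prior $P_Z$. Here an RBF-kernel MMD stands in for the adversarially trained $D$ of the slide, and the encoder/decoder are placeholder linear maps.

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 estimate with an RBF kernel: a stand-in for D(Q_Z, P_Z)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def pot_objective(x, encode, decode, prior_sample, lam=10.0, rng=None):
    """Reconstruction cost + lam * D(Q_Z, P_Z), mirroring the POT objective above."""
    z = encode(x)                                             # codes Z ~ Q(Z|X), aggregated
    recon = np.mean(np.sum((x - decode(z)) ** 2, axis=1))     # c(X, G(Z)) = ||X - G(Z)||_2^2
    return recon + lam * mmd_rbf(z, prior_sample(rng, len(x), z.shape[1]))

# Placeholder encoder/decoder (linear maps) and a standard Gaussian prior P_Z.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))                                   # data dim 5, latent dim 2
x = rng.normal(size=(256, 5))
encode = lambda x: x @ W
decode = lambda z: z @ W.T
prior_sample = lambda rng, n, d: rng.normal(size=(n, d))
print(pot_objective(x, encode, decode, prior_sample, rng=rng))
```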

Further details on GAN: the log trick
$$\inf_{G_\theta} \sup_{T_\omega} \; \mathbb{E}_{X \sim P_d}[\log T_\omega(X)] + \mathbb{E}_{Z \sim P_Z}\big[\log\big(1 - T_\omega(G_\theta(Z))\big)\big]$$
In practice (verify this!) $G_\theta$ is often trained to instead minimize $\mathbb{E}_{Z \sim P_Z}\big[-\log T_\omega(G_\theta(Z))\big]$ (the "log trick").
One possible explanation: for any $f_r$ and $P_G$ the optimal $T$ in the variational representation of $D_{f_r}(P_X \| P_G)$ has the form
$$T^*(x) := f_r'\!\left(\frac{p_X(x)}{p_G(x)}\right) \quad\Longrightarrow\quad \frac{p_X(x)}{p_G(x)} = (f_r')^{-1}\big(T^*(x)\big).$$
1. Approximate $T^* \approx T_\omega$ using adversarial training with $f_r$;
2. Express $\frac{p_X}{p_G}(x) = (f_r')^{-1}\big(T_\omega(x)\big)$;
3. Approximate the desired f-divergence $D_f(P_X \| P_G)$ using
$$\int_{\mathcal{X}} f\!\left(\frac{p_X(x)}{p_G(x)}\right) p_G(x)\, dx \;\approx\; \int_{\mathcal{X}} f\Big((f_r')^{-1}\big(T(x)\big)\Big)\, p_G(x)\, dx.$$
Applying this argument with the JS $f_r$ and $f(x) = \log(1 + x^{-1})$ recovers the log trick. The corresponding f-divergence is much more mode-seeking than the reverse KL divergence.
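A quick numerical check (ours) of step 2 for the JS choice: with $f_r(u) = u\log u - (u+1)\log\frac{u+1}{2}$ we have $f_r'(u) = \log\frac{2u}{u+1}$, so $(f_r')^{-1}(t) = \frac{e^t}{2 - e^t}$, and plugging in $T^*(x) = f_r'(p_X(x)/p_G(x))$ recovers the density ratio exactly.

```python
import numpy as np
from scipy.stats import norm

# Two toy densities standing in for p_X (data) and p_G (generator).
xs = np.linspace(-4, 4, 9)
p_x = norm.pdf(xs, loc=1.0, scale=1.0)
p_g = norm.pdf(xs, loc=0.0, scale=1.5)
ratio = p_x / p_g

f_r_prime     = lambda u: np.log(2.0 * u / (u + 1.0))     # derivative of the JS f_r
f_r_prime_inv = lambda t: np.exp(t) / (2.0 - np.exp(t))   # its inverse on (-inf, log 2)

t_star = f_r_prime(ratio)                                 # the optimal witness T*(x)
print(np.allclose(f_r_prime_inv(t_star), ratio))          # True: the ratio is recovered
```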