Estimating Posterior Ratio for Classification: Transfer Learning from Probabilistic Perspective

Estimating Posterior Ratio for Classification: Transfer Learning from Probabilistic Perspective

Song Liu, Kenji Fukumizu
Institute of Statistical Mathematics, Tokyo, Japan

arXiv v3 [stat.ML] 9 Oct 2015

Abstract

Transfer learning assumes classifiers of similar tasks share certain parameter structures. Unfortunately, modern classifiers use sophisticated feature representations with huge parameter spaces, which leads to costly transfer. Under the impression that changes from one classifier to another should be simple, an efficient transfer learning criterion that only learns the differences is proposed in this paper. We train a posterior ratio, which turns out to minimize an upper-bound of the target learning risk. The model of the posterior ratio does not have to share the same parameter space with the source classifier at all, so it can be easily modelled and efficiently trained. The resulting classifier is therefore obtained by simply multiplying the existing probabilistic classifier with the learned posterior ratio.

Keywords: Transfer Learning, Domain Adaptation.

1 Introduction

Transfer learning [12, 13, 6] trains a classifier using a limited number of samples with the help of abundant samples drawn from another, similar distribution. Specifically, we have a target task providing a very small dataset $D_P$, as well as a slightly different source task with a large dataset $D_Q$. Transfer learning usually refers to procedures that make use of the similarity between the two learning tasks to build a superior classifier using both datasets. In this paper, we focus on probabilistic classification problems where the goal is to learn a class posterior $p(y|x)$ over $D_P$, where $p(y|x)$ is the conditional probability of class labels given an input $x$.

Due to the complexity of its parametrization, the predicting function is usually encoded in hardware and executed with great efficiency. It is therefore reasonable to look at a composite algorithm that consists of two parts: a fixed but fast built-in classifier offering a complicated predicting pattern, and a light-weight procedure that works as an adapter, transferring the classifier to a variety of slightly different situations. For example, a general-purpose facial

recognition system built into a camera cannot change its predicting behavior once its model is trained; however, the camera may learn transfer models and adjust itself for recognizing a target user. The challenge is that the transfer procedure is expected to respond rapidly, while learning over the entire feature set of the source classifier may slow us down dramatically.

Intuitively, learning a transfer model does not necessarily need complicated features. Since the task is still facial recognition, we can assume that the changes from one classifier to another are simple and can be described by a trivial (say linear) model with a few key personal features (say hair-style or glasses). The general human facial modelling also plays an important role; however, we may safely assume that such modelling has been taken care of in the source classifier and remains unchanged in the target task. Thus, we can consider the incremental model only in the transfer procedure.

One popular assumption in transfer learning is to reuse the model from the source classifier by training a target classifier while limiting the distance between it and the source classifier model. Regularization has been utilized to enforce the closeness between learned models [6]. More complicated structures, such as dependencies between task parameters, are also used to construct a good classifier [13]. As most methods require learning two classifiers of two tasks simultaneously, some works can take already trained classifiers as auxiliary models and learn to reuse their model structures [18, 2, 5]. However, reusing the existing model means we need to bring the entire feature set from the source task and include it in the target classifier during transfer learning, even if we know that a vast majority of those features does not contribute to the transition from the source to the target classifier. Such an overly expressive model can be harmful given the limited samples in $D_P$. Moreover, the hyper-parameters used for constructing features may also be difficult to tune, since cross-validation may be poor on such a small dataset $D_P$. Finally, obtaining those features may be time-consuming in some applications.

Another natural idea of transfer learning is to borrow informative samples from $D_Q$ and get rid of harmful samples. TrAdaBoost [4] follows this exact learning strategy to assign weights to samples from both $D_P$ and $D_Q$. By assigning high weights to samples that contribute to the performance on the target task, and penalizing samples that mislead the classifier, TrAdaBoost reuses the knowledge from both datasets to construct an accurate classifier on the target task. The idea of importance sampling also gives rise to another set of methods learning weights of samples by using density ratio estimation [14, 9, 19]. Using unlabelled samples from both datasets, an importance weighting function can be learned. By plugging such a function into the empirical risk minimization criterion [16], we can use samples from $D_Q$ as if they were samples from $D_P$. However, such methods do not allow incremental modelling either, since they learn a full classifier model during the transfer.

It can be noticed that if one can directly model and learn the difference between the target and source classifiers, one may use only the incremental features, which leads to a much more efficient learning criterion. The first contribution of this paper is showing that such difference learning is in fact the learning of a posterior ratio, the ratio between the posteriors of the source and target tasks.
We show that learning such a posterior ratio is equivalent to minimizing an upper-bound of

the classification error of the target task. Second, an efficient convex optimization algorithm is given to learn the parameters of the posterior ratio model, and it is proved to give consistent estimates under mild assumptions. Finally, the usefulness of this method is validated over various artificial and real-world datasets. However, we do not claim that the proposed method has superior performance against all existing works based on extra assumptions, e.g. the smoothness of the predicting function over unlabeled target samples [5, 2]. The proposed method is simply a novel probabilistic framework working on a very small set of assumptions, and it offers the flexibility of modelling to transfer learning problems. It is fully extendable to various problem settings once new assumptions are made.

2 Problem Setting

Consider two sets of samples drawn independently from two probability distributions $Q$ and $P$ on $\{-1, 1\} \times \mathbb{R}^d$:

$$D_Q = \{(y^q_j, x^q_j)\}_{j=1}^{n_q} \overset{\mathrm{i.i.d.}}{\sim} Q, \qquad D_P = \{(y^p_i, x^p_i)\}_{i=1}^{n_p} \overset{\mathrm{i.i.d.}}{\sim} P.$$

$D_Q$ and $D_P$ are the source and target datasets respectively. We denote $p(y|x)$ and $q(y|x)$ as the class posteriors in $P$ and $Q$ respectively. Moreover, $n_p \ll n_q$. Our target is to obtain an estimate $\hat{p}(y|x)$ of the class posterior and predict the class label of an input $x$ by $\hat{y} = \mathrm{argmax}_{y \in \{-1,1\}} \hat{p}(y|x)$. Clearly, if $n_p$ is large enough, one may apply logistic regression [3, 20] to obtain a good estimate. In this paper, we focus on a scenario where $n_p$ is relatively small and $n_q$ is sufficiently large. Thus, it is desirable if we can transfer information from the source task to boost the performance of our target classifier.

3 Composite Modeling

Note that the posterior $p(y|x)$ can be decomposed into

$$p(y|x) = \frac{p(y|x)}{q(y|x)} \cdot q(y|x),$$

where $\frac{p(y|x)}{q(y|x)}$ is the class posterior ratio and $q(y|x)$ is a source classifier. This decomposition leads to a simple transfer learning methodology: model and learn the posterior ratio and the general-purpose classifier separately, then multiply them together as an estimate of the posterior. The main interest of this paper is learning such a composite model using samples from $D_P$ and $D_Q$. Now, we introduce two parametric models, $g(y, x; \alpha)$ (or $g_\alpha$ for short) and $q(y, x; \beta)$ (or $q_\beta$ for short), for $\frac{p(y|x)}{q(y|x)}$ and $q(y|x)$ respectively.
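To make the composite construction concrete, the following minimal sketch (Python/NumPy; the function and variable names are illustrative assumptions, not from the paper) shows how a learned ratio model with linear features would re-weight a given source classifier's output. Dividing by the sum of the two unnormalized scores plays the role of the normalization term introduced in Section 3.2 below:

```python
import numpy as np

def composite_posterior(q_pos, alpha, X):
    """P(y=+1|x) under the composite model g(y, x; alpha) * q(y|x).

    q_pos : source classifier outputs q(y=+1|x), shape (n,)
    alpha : ratio-model parameters for linear features f(y, x) = y * [x, 1]
    X     : inputs, shape (n, d)
    """
    s = np.hstack([X, np.ones((len(X), 1))]) @ alpha  # <alpha, [x, 1]>
    u_pos = q_pos * np.exp(s)           # q(+1|x) * exp(<alpha, f(+1, x)>)
    u_neg = (1.0 - q_pos) * np.exp(-s)  # q(-1|x) * exp(<alpha, f(-1, x)>)
    # Dividing by u_pos + u_neg applies the normalizer N(x; alpha),
    # so the composite output is a valid conditional probability.
    return u_pos / (u_pos + u_neg)
```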

3.1 Kullback-Leibler Divergence Minimization

A natural way of learning such a model is to minimize the Kullback-Leibler (KL) [10] divergence between the true posterior and our composite model.

Definition 1 (Conditional KL Divergence).

$$\mathrm{KL}[p \,\|\, q] = P \left[ \log \frac{p(y|x)}{q(y|x)} \right].$$

We denote $P f$ as shorthand for the integral/sum of a function $f$ over the probability distribution $P$ on its domain. Now, we proceed to obtain the following upper-bound of the KL divergence from $p$ to the composite model.

Proposition 1 (Transfer Learning Upper-bound). If $\frac{p(y,x)}{q(y,x)} \le C_{\max} < \infty$ and $0 < q_\beta < 1$, then the following inequality holds:

$$\mathrm{KL}[p \,\|\, g_\alpha q_\beta] \le \mathrm{KL}[p \,\|\, g_\alpha q] + C_{\max}\, \mathrm{KL}[q \,\|\, q_\beta] + C, \qquad (1)$$

where $C$ is a constant that is irrelevant to $\alpha$ or $\beta$.

Proof.

$$\begin{aligned}
\mathrm{KL}[p \,\|\, g_\alpha q_\beta]
&= \mathrm{KL}\Big[p \,\Big\|\, g_\alpha q \cdot \frac{q_\beta}{q}\Big]
= \mathrm{KL}[p \,\|\, g_\alpha q] - P \log q_\beta + P \log q \\
&= \mathrm{KL}[p \,\|\, g_\alpha q] - \int \frac{p(y,x)}{q(y,x)}\, q(y,x) \log q_\beta \,\mathrm{d}y\,\mathrm{d}x + P \log q \\
&\le \mathrm{KL}[p \,\|\, g_\alpha q] + C_{\max}\, Q \log q - C_{\max}\, Q \log q_\beta + C \qquad (2) \\
&= \mathrm{KL}[p \,\|\, g_\alpha q] + C_{\max}\, \mathrm{KL}[q \,\|\, q_\beta] + C,
\end{aligned}$$

where $C = P \log q - C_{\max} Q \log q$, and the inequality uses $\log q_\beta < 0$ together with $\frac{p(y,x)}{q(y,x)} \le C_{\max}$.

Further,

$$\mathrm{KL}[p \,\|\, g_\alpha q] + C_{\max}\, \mathrm{KL}[q \,\|\, q_\beta]
\approx -\frac{1}{n_p} \sum_{i=1}^{n_p} \log g(y^p_i, x^p_i; \alpha)
- C_{\max} \frac{1}{n_q} \sum_{j=1}^{n_q} \log q(y^q_j | x^q_j; \beta) + C', \qquad (3)$$

where $C'$ is a constant that is irrelevant to $\alpha$ or $\beta$.

We may minimize the empirical upper-bound (3) of the KL divergence in order to obtain estimates of $\alpha$ and $\beta$. $C_{\max}$ is an unknown constant introduced in (2) that illustrates how dissimilar the two tasks are. The upper-bound (1) formalizes the common intuition that if two tasks are similar, transfer learning should be easy: the more similar the two tasks are, the smaller $C_{\max}$ is, and the tighter the bound is.

Note that minimizing (3) leads to two separate maximum likelihood estimations (MLE). The MLE of the second likelihood term of bound (3),

$$\hat{\beta} = \mathrm{argmax}_\beta\, \frac{1}{n_q} \sum_{j=1}^{n_q} \log q(y^q_j | x^q_j; \beta),$$

is a conventional MLE of a posterior model and has been well studied: $q$ can be efficiently modeled and trained using techniques such as logistic regression [3, 20]. Here we consider it already given. However, maximizing the first likelihood term, a posterior ratio,

$$\hat{\alpha} = \mathrm{argmax}_\alpha\, \frac{1}{n_p} \sum_{i=1}^{n_p} \log g(y^p_i, x^p_i; \alpha), \qquad (4)$$

is our main focus. In the next section, we show that the modelling and learning of the posterior ratio is feasible and computationally efficient.
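As a concrete illustration of the second MLE above, fitting $q(y|x; \beta)$ amounts to ordinary logistic regression on $D_Q$. A minimal sketch with scikit-learn on synthetic stand-in data (the data and all names here are assumptions, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Conventional MLE of the source posterior q(y|x; beta): plain logistic
# regression on abundant source samples (synthetic stand-ins here).
rng = np.random.default_rng(0)
yq = rng.choice([-1, 1], size=5000)                # source labels
Xq = rng.normal(loc=2.0 * yq, scale=1.0)[:, None]  # source inputs, 1-D
src = LogisticRegression(max_iter=1000).fit(Xq, yq)
q_pos = src.predict_proba(Xq)[:, 1]                # estimates of q(y=+1|x)
```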

3.2 Posterior Ratio Model

Although it is not necessary, to illustrate the idea behind posterior ratio modelling, we assume $p(y|x)$ and $q(y|x)$ belong to the exponential family; e.g. $p(y|x)$ can be parametrized as

$$p(y|x; \beta_p) \propto \exp\Big( y \sum_{i=1}^{m} \beta_{p,i} h_i(x) \Big). \qquad (5)$$

Given the parametrization (5), consider the ratio between $p$ and $q$:

$$\frac{p(y|x; \beta_p)}{q(y|x; \beta_q)} \propto \exp\Big( y \sum_{i=1}^{m} (\beta_{p,i} - \beta_{q,i}) h_i(x) \Big).$$

For every $i$ with $\beta_{p,i} - \beta_{q,i} = 0$, the feature $h_i$ is nullified and can therefore be ignored when modelling the ratio. In fact, once the ratio is considered, $\beta_p$ and $\beta_q$ do not have to be learned separately; their difference $\alpha_i = \beta_{p,i} - \beta_{q,i}$ is sufficient to describe the transition from $p$ to $q$. Thus, we write our posterior ratio model as

$$g(y, x; \alpha) = \frac{1}{N(x; \alpha)} \exp\Big( y \sum_{i \in S} \alpha_i h_i(x) \Big), \qquad (6)$$

where $S = \{ i \mid \beta_{p,i} - \beta_{q,i} \neq 0 \}$ and $N(x; \alpha)$ is the normalization term defined as

$$N(x; \alpha) = \sum_{y \in \{-1, 1\}} q(y|x) \exp\Big( y \sum_{i \in S} \alpha_i h_i(x) \Big).$$

Such normalization is due to the fact that we are minimizing the KL divergence between $p(y|x)$ and $g(y, x; \alpha)\, q(y|x)$: we need to make sure that $g(y, x; \alpha)\, q(y|x)$ is a valid conditional probability, i.e. $\forall x: \sum_y q(y|x)\, g(y, x; \alpha) = 1$.

This modelling technique gives us great flexibility, since it only concerns the effective features $\{h_i\}_{i \in S}$ rather than the entire feature set $\{h_1, h_2, \ldots, h_m\}$. In this paper, we assume the transfer should be simple; thus the potential feature set only contains simple features, such as linear ones: $h_i(x) = x_i$, $i \in S$. From now on, we simplify $y \sum_{i \in S} \alpha_i h_i(x)$ using a linear representation $\langle \alpha, f(y, x) \rangle$, where

$$f(y, x) = [\, y h_{a_1}(x), y h_{a_2}(x), \ldots, y h_{a_{m'}}(x) \,], \qquad a_1, a_2, \ldots, a_{m'} \in S.$$

However, this modelling also causes a problem: we cannot directly evaluate the output value of the model, since we do not have access to the true posterior $q(y|x)$. Therefore, we can only use samples from $D_Q$ to approximate the normalization term.

4 Estimating Posterior Ratio

Now we introduce the estimator of the class-posterior ratio $p(y|x)/q(y|x)$. Let us substitute the model (6) into the objective (4):

$$\hat{\alpha} = \mathrm{argmax}_\alpha\, \frac{1}{n_p} \sum_{i=1}^{n_p} \log g(y^p_i, x^p_i; \alpha)
= \mathrm{argmax}_\alpha\, \frac{1}{n_p} \sum_{i=1}^{n_p} \langle \alpha, f(y^p_i, x^p_i) \rangle - \frac{1}{n_p} \sum_{i=1}^{n_p} \log N(\alpha, x^p_i).$$

The normalization term needs to be evaluated in a pointwise fashion: $N(\alpha, x^p_i)$ for each $x^p_i \in D_P$. Note that if we had sufficient observations $y^q$ paired with each $x^p_i$, i.e. $\{(y^q_j, x)\}_{j=1}^{k}$ with $x = x^p_i$, such normalization could be approximated efficiently via a sample average:

$$N(\alpha, x) \approx \frac{1}{k} \sum_{j=1}^{k} \exp\big( \langle \alpha, f(y^q_j, x) \rangle \big).$$

However, in practice not many observed samples may be paired with $x^p_i$; especially when $x$ is in a continuous domain, we may not observe any paired sample at all. We may instead consider

using the neighbouring pairs $(y^q_j, x^q_j)$, where $x^q_j$ is a neighbour of $x^p_i$, to approximate $N(\alpha, x^p_i)$, which naturally leads to the idea of a k-nearest-neighbour (k-NN) estimate of this quantity (see Figure 1):

$$N(\alpha, x^p_i) \approx N_{n_q,k}(\alpha; x^p_i) = \frac{1}{k} \sum_{j \in N_{n_q}(x^p_i, k)} \exp\big( \langle \alpha, f(y^q_j, x^q_j) \rangle \big),$$

where $N_{n_q}(x^p_i, k) = \{ j \mid x^q_j \text{ is one of the } k\text{-NNs of } x^p_i \}$.

Figure 1: Approximating $N(\alpha, x)$ using nearest neighbours: the conditional expectation $\mathbb{E}_q[\exp\langle\alpha, f(y,x)\rangle \mid x = x^p_i]$ is estimated from the source pairs $(y^q_j, x^q_j)$ closest to $x^p_i$.

Now we have a computable approximation to the posterior ratio model:

$$g_{n_q}(y, x; \alpha) = \frac{\exp\langle \alpha, f(y, x) \rangle}{N_{n_q,k}(x; \alpha)}.$$

The resulting optimization is

$$\hat{\alpha} = \mathrm{argmin}_\alpha\, \ell(\alpha; D_P, D_Q), \qquad
\ell(\alpha; D_P, D_Q) = \frac{1}{n_p} \sum_{i=1}^{n_p} \log \Big[ \frac{1}{k} \sum_{j \in N_{n_q}(x^p_i, k)} \exp\big( \langle \alpha, f(y^q_j, x^q_j) \rangle \big) \Big] - \frac{1}{n_p} \sum_{i=1}^{n_p} \langle \alpha, f(y^p_i, x^p_i) \rangle, \qquad (7)$$

which is convex. Note that $\ell$ represents the negative log-likelihood.
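A minimal sketch of the k-NN normalizer $N_{n_q,k}(\alpha; x^p_i)$ above (Python with scikit-learn's NearestNeighbors; the linear feature map $f(y, x) = y \cdot [x, 1]$ from the experiments section, and all variable names, are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_normalizer(alpha, Xp, Xq, yq, k=10):
    """N_{nq,k}(alpha; x_i^p): average of exp(<alpha, f(y_j^q, x_j^q)>)
    over the k source pairs whose x_j^q are nearest to each x_i^p."""
    # indices of the k nearest source inputs for every target input
    idx = NearestNeighbors(n_neighbors=k).fit(Xq).kneighbors(Xp)[1]
    Fq = yq[:, None] * np.hstack([Xq, np.ones((len(Xq), 1))])  # f(y_j^q, x_j^q)
    return np.exp(Fq @ alpha)[idx].mean(axis=1)  # shape (n_p,)
```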

Moreover, if we assume that the changes between the two posteriors are mild, i.e. each $\alpha_i = \beta_{p,i} - \beta_{q,i}$ is small, we may use an extra $\ell_2$ regularizer to restrict the magnitude of the model parameter $\alpha$:

$$\mathrm{argmin}_\alpha\, \ell(\alpha) + \lambda \|\alpha\|^2, \qquad (8)$$

where $\lambda$ is a regularization parameter that can be chosen via likelihood cross-validation in practice. Finally, the gradient of $\ell$ is given as

$$\nabla \ell = -\frac{1}{n_p} \sum_{i=1}^{n_p} f(y^p_i, x^p_i) + \frac{1}{n_p} \sum_{i=1}^{n_p} \hat{\mathbb{E}}_{n_q}\big[ g_{n_q}(y, x; \alpha)\, f(y, x) \mid x = x^p_i \big],$$

where $\hat{\mathbb{E}}_{n_q}[Z \mid x = x^p_i]$ is the empirical k-NN estimate of a conditional expectation over $Q$:

$$\hat{\mathbb{E}}_{n_q}[Z \mid x = x^p_i] = \frac{1}{k} \sum_{j \in N_{n_q}(x^p_i, k)} Z_j.$$

The computation of this gradient is straightforward, so we can use any gradient-based method, such as quasi-Newton, to solve the unconstrained convex optimization in (8). It can be noticed that this algorithm is similar to the density ratio estimation method KLIEP [15]: both are maximum-likelihood estimators of a ratio function between two probabilities. However, the proposed method differs from [15] in terms of modelling, motivation and usage.
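Putting (7), (8) and the gradient together, the sketch below minimizes the regularized negative log-likelihood with its analytic gradient using L-BFGS from SciPy. It reuses the neighbour indices idx from the sketch above and the same linear features; this is an illustration under those assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def fit_posterior_ratio(Xp, yp, Xq, yq, idx, lam=0.1):
    """Minimize l(alpha) + lam * ||alpha||^2 from (8); N(alpha; x_i^p) is the
    k-NN average over the source pairs indexed by idx[i] (idx: shape (n_p, k))."""
    Fp = yp[:, None] * np.hstack([Xp, np.ones((len(Xp), 1))])  # f(y_i^p, x_i^p)
    Fq = yq[:, None] * np.hstack([Xq, np.ones((len(Xq), 1))])  # f(y_j^q, x_j^q)

    def loss_grad(alpha):
        w = np.exp(Fq @ alpha)[idx]        # exp<alpha, f> at the neighbours
        N = w.mean(axis=1)                 # k-NN normalizer per target point
        loss = -(Fp @ alpha).mean() + np.log(N).mean() + lam * alpha @ alpha
        r = w / (w.shape[1] * N[:, None])  # = g_n(y_j, x_j; alpha) / k
        # empirical conditional expectation E-hat[g_n * f | x = x_i^p], averaged over i
        grad = -Fp.mean(axis=0) + (r[:, :, None] * Fq[idx]).sum(axis=1).mean(axis=0)
        return loss, grad + 2.0 * lam * alpha

    res = minimize(loss_grad, np.zeros(Fp.shape[1]), jac=True, method="L-BFGS-B")
    return res.x
```

In practice, $\lambda$ (and $k$, as described in the appendix) would be chosen by cross-validation.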

5 Consistency of the Estimator

In this section, we analyze the consistency of the estimator given in (7), i.e. whether the estimated parameter converges to the solution of the population objective function. This result is not straightforward, since we use an extra k-NN approximation in our model, so that the model itself is an estimate. The question is: does this approximation lead to a consistent estimator? First, we define the estimated and the true parameter as

$$\hat{\alpha} = \mathrm{argmin}_\alpha\, \ell(\alpha; D_P, D_Q) = \mathrm{argmax}_\alpha\, P_{n_p} \log g_{n_q}(y, x; \alpha), \qquad
\alpha^* = \mathrm{argmax}_\alpha\, P \log g(y, x; \alpha),$$

where $P_{n_p}$ is the empirical measure of the distribution $P$.

Assumption 1 (Bounded Ratio Model). There exists $1 < M_{\max} < \infty$ such that $|\langle \alpha, f(y, x) \rangle| \le \log M_{\max}$. Moreover, $\alpha$ lies in a totally bounded metric space $\Theta$, and $\max_{y,x} \| f(y, x) \|_2 \le F_{\max}$ with $0 < F_{\max} < \infty$. Therefore $\exp\langle \alpha, f(y, x) \rangle$, $N(x; \alpha)$ and $N_{n_q,k}(x; \alpha)$ all lie in $[1/M_{\max}, M_{\max}]$, and the posterior ratio model is always bounded by constants. This is a reasonable assumption: as the posterior ratio measures the differences between two tasks, the true posterior ratio must be close to one if the two tasks are similar.

Assumption 2 (Bounded Covariate Shift). $\frac{p(x)}{q(x)} \le R_{\max} < \infty$. The supports of $P$ and $Q$ must overlap: if the samples in $D_P$ are distributed completely differently from those in $D_Q$, it does not make sense to expect a transfer learning method to work well.

Assumption 3 (Identifiability). $\alpha^*$ is the unique global maximizer of the population objective function $P \log g(y, x; \alpha)$, i.e. for all $\epsilon > 0$,

$$\sup_{\alpha:\, \|\alpha - \alpha^*\| \ge \epsilon} P \log g(y, x; \alpha) < P \log g(y, x; \alpha^*).$$

Then we have the following theorem, which states that our posterior ratio estimator is consistent.

Theorem 1. Suppose that for each $x$, the random variable $\|X - x\|$ is absolutely continuous. If $n_p \to \infty$, $n_q \to \infty$, $k_{n_q}/\log n_q \to \infty$ and $k_{n_q}/n_q \to 0$, where $k_{n_q}$ is the sample-size dependent version of $k$ (the number of nearest neighbours used in the k-NN approximation), then under the above assumptions $\hat{\alpha} \overset{p}{\to} \alpha^*$. Further, $\ell(\hat{\alpha}; D_P, D_Q) \overset{p}{\to} -\mathrm{KL}[p \,\|\, q]$.

The proof relies on the following lemma.

Lemma 1. Under all the assumptions stated above, if $n_p \to \infty$, $n_q \to \infty$, $k_{n_q}/\log n_q \to \infty$ and $k_{n_q}/n_q \to 0$, then

$$\sup_\alpha \big| P_{n_p} \log g_{n_q}(y, x; \alpha) - P \log g(y, x; \alpha) \big| \overset{p}{\to} 0,$$

i.e. the error caused by approximating the objective using samples converges to 0 in probability, uniformly with respect to $\alpha$.

One of the key steps is to decompose the above empirical approximation error of the objective function into an approximation error caused by using samples from $P$, plus a modelling error caused by the k-NN approximation using samples from $Q$. It can be observed that the bound $R_{\max}$ on the density ratio also contributes to the error. The complete proof is included in the appendix.

6 Decomposing Parameter vs. Decomposing Model

Instead of decomposing the model, $p(y|x; \alpha, \beta) = g_\alpha \cdot h_\beta$, as we propose in this paper, model-reuse methods (e.g. [6, 13]) decompose the parameter: $\beta_p = \alpha + \beta_q$, which leads to the problem of minimizing a KL divergence

$$\min_{\beta_q, \alpha}\, \mathrm{KL}[p \,\|\, h_{\alpha + \beta_q}].$$

Two issues come with this criterion. First, the problem is not identifiable, since there exist infinitely many combinations of $\alpha$ and $\beta_q$ that minimize the objective function; one must use extra assumptions. Model-reuse methods add a regularizer on the parameter $\beta_q$ using the KL divergence:

$$(\hat{\beta}_q, \hat{\alpha}) = \mathrm{argmin}_{\beta_q, \alpha}\, \mathrm{KL}[p \,\|\, h_{\alpha + \beta_q}] + \gamma\, \mathrm{KL}[q \,\|\, h_{\beta_q}], \qquad (9)$$

which implies that the minimizer $\beta_q$ should also make the difference between $q$ and $h_{\beta_q}$ small, in terms of KL divergence. Here $\gamma$ is a balancing parameter that has to be tuned using cross-validation, which may perform poorly when the number of samples from $D_P$ is low. As we will show later in the experiments, the choice of $\gamma$ is crucial to the performance when $n_p$ is small. Second, since the model must be normalized, i.e. $\int h_{\alpha + \beta_q}\, \mathrm{d}y = 1$, $\beta_q$ and $\alpha$ are always coupled, and one must always solve for them together, meaning the algorithm has to handle the complicated feature space for both $\beta_q$ and $\alpha$.

However, things are much easier if we have access to the true parameter $\beta_q$ of the posterior: then we can model the posterior of $p$ as $g(y, x; \alpha)\, q(y|x; \beta_q)$, where $g$ is the model of the ratio. This setting leads to the proposed posterior ratio learning method:

$$\alpha^* = \mathrm{argmin}_\alpha\, \mathrm{KL}\big[ p \,\|\, g(y, x; \alpha)\, q(y|x; \beta_q) \big],$$

where $\beta_q$ is a constant, so this optimization is with respect to $\alpha$ only. This paper presents an algorithm that can obtain an estimate of $g(y, x; \alpha)$ even if one does not know $q(y|x; \beta_q)$ exactly: $q(y|x; \beta_q)$ is learned separately and multiplied with $g(y, x; \alpha)$ in order to provide a posterior output. In comparison, the decomposition of the model results in two independent optimizations, and we are free from the joint objective where the choice of the parameter $\gamma$ is problematic. Neither do we have to assume that $\alpha$ and $\beta_q$ are in the same parameter space.

Figure 2: Experiments on artificial datasets. (a) $\ell(\hat{\alpha}; D_P, D_Q)$; (b) negative hold-out likelihood; (c) illustration of the Gaussian dataset shift; (d) $p(y|x; \hat{\beta}_p)$, miss-rate 13.8%; (e) $q(y|x; \hat{\beta}_q)$, miss-rate 15.2%; (f) $g(y, x; \hat{\alpha})\, q(y|x; \hat{\beta}_q)$, miss-rate 8.0%.

Figure 3: 20 News datasets. Panels: (a) sci.crypt; (b) sci.electronics; (c) sci.med; (d) sci.space; (e) talk.politics.guns; (f) talk.politics.mideast; (g) talk.politics.misc; (h) talk.religion.misc.

7 Experiments

We fix the feature function $f$ as $f(x, y) := y \cdot [x, 1]$, consistent with the simple transfer model assumption discussed above.

7.1 Synthetic Experiments

KL convergence. The first experiment uses our trained posterior ratio model to approximate the conditional KL divergence. Since $\hat{\alpha} \overset{p}{\to} \alpha^*$, we hope to see $\ell(\hat{\alpha}; D_P, D_Q) \to -\mathrm{KL}[p \,\|\, q]$ as $n_p, n_q \to \infty$. We draw two balanced classes of samples from two Gaussian distributions with different means for $P$ and $Q$. Specifically, for $y \in \{-1, 1\}$,

we construct $P$ and $Q$ as follows:

$$q(x|y=1) = \mathrm{Normal}(2, 1), \qquad q(x|y=-1) = \mathrm{Normal}(-2, 1),$$
$$p(x|y=1) = \mathrm{Normal}(1.5, 1), \qquad p(x|y=-1) = \mathrm{Normal}(-1.5, 1).$$

We draw 5k samples from distribution $Q$ and $n_p$ samples from $P$; $k$ is chosen to minimize the error of conditional mean estimation (same below), as introduced in the appendix. We then train a posterior ratio model $g(y, x; \hat{\alpha})$. By varying $n_p$ and the random sampling, we create a plot of the averaged $\ell(\hat{\alpha}; D_P, D_Q)$, with standard errors over 25 runs, in Figure 2(a). The true conditional KL divergence is plotted alongside as a blue horizontal dashed line. For comparison, we run the same estimation again with 50k samples from $Q$ and plot it in red. The results show that our estimator does converge to the true KL divergence, and the estimation error shrinks as $n_p$ grows. Increasing $n_q$ also helps slightly reduce the variance (compare the blue error bars with the red ones); however, this improvement is not as significant as that from increasing $n_p$. (A data-generation sketch for this construction is given below.)

Joint vs. Separated. In this experiment, we demonstrate the effect of introducing the balancing parameter $\gamma$ in the joint optimization method discussed in Section 6. We reuse the dataset from the previous experiment and test the averaged negative hold-out likelihood of the approach described in (9) and of the proposed method, using $D_P$ of various sizes. It can be seen that the choice of the parameter $\gamma$ has a huge effect on the hold-out likelihood when $n_p$ is small. The proposed method is free from such a parameter and can achieve a very low negative hold-out likelihood even when using only 10 samples from $D_P$.

4-Gaussian. The second experiment demonstrates how a simple transfer model helps transfer a non-linear classifier. The dataset $D_Q$ is constructed using mixtures of Gaussian distributions with different means on the horizontal axis, and the two classes of samples are not linearly separable. To create the dataset $D_P$, we simply shift their means away from each other along the vertical dimension (see Figure 2(c)).
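For reference, here is a short sketch reproducing the one-dimensional Gaussian construction from the KL-convergence experiment above; the unit class-conditional variance is an assumption, since the extraction lost the second distribution parameter:

```python
import numpy as np

def draw_task(n, mu, rng):
    """Balanced two-class 1-D samples with class-conditionals
    Normal(+mu, 1) and Normal(-mu, 1); unit variance is assumed."""
    y = rng.choice([-1, 1], size=n)
    x = rng.normal(loc=mu * y, scale=1.0)
    return x[:, None], y

rng = np.random.default_rng(0)
Xq, yq = draw_task(5000, 2.0, rng)  # source task Q: class means at +/-2
Xp, yp = draw_task(100, 1.5, rng)   # target task P: class means at +/-1.5, n_p small
```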

We compare the posterior functions learned by kernel logistic regression performed on $D_P$ (Figure 2(d)) and on $D_Q$ (Figure 2(e)) with the proposed transfer learning method (Figure 2(f)), which is the multiplication of the learned $g(y, x; \hat{\alpha})$ and $q(y|x; \hat{\beta}_q)$. We set $n_p = 40$; $n_q$ is much larger. It can be seen from Figure 2(d) that although kernel logistic regression has learned the rough decision boundary by using $D_P$ only, it has completely missed the characteristics of the posterior function near the class border due to the lack of observations. In contrast, built upon a successfully learned posterior function on dataset $D_Q$ (Figure 2(e)), the proposed method successfully transferred the posterior function to the new dataset $D_P$, even though it is equipped only with linear features (Figure 2(f)). The classification boundary it provides is highly non-linear.

7.2 Real-world Applications

20-news. Experiments are run on the 20-news dataset, where articles are grouped into major categories (such as sports) and sub-categories (such as sports.basketball). In this experiment, we adopt a one-versus-the-others scenario: the task is to predict whether an article is drawn from a certain sub-category or not. We first construct $D_P$ by randomly selecting a few samples from a certain sub-category $T$ and then mixing them with an equal number of samples from the rest of the categories. $D_Q$ is constructed using abundant random samples from the same major but different sub-categories, together with random samples from all the rest of the categories as negative samples. We apply PCA to reduce the dimension to just 20. Figure 3 summarizes the miss-classification rates of the proposed transfer learning algorithm and all the other methods, namely LogiP (logistic regression on $D_P$), LogiQ (logistic regression on $D_Q$), TrAdaBoost [4], Reg [6], CovarShift [15, 9] and Adaptive [18], over different sub-categories $T$ in the sci and talk categories. The results show that the proposed method works well in almost all cases, while the comparison methods Reg, CovarShift and TrAdaBoost sometimes have difficulty beating the naive baselines LogiP and LogiQ. In most cases, Adaptive cannot improve much over LogiP.

Figure 4: Amazon sentiment datasets. Panels: (a) kitchen; (b) dvd; (c) books.

Amazon sentiment. The final experiment is conducted on the Amazon sentiment dataset, where the task is to classify the positive or negative sentiment of users' review comments on kitchen, electronics, books and dvd products. Since some of the products (such as electronics) are far better reviewed than others (such as kitchen tools), it is natural to transfer a classifier from a well-reviewed product to another one. In this experiment, we first sample $D_P$ from one product $T$ and construct the dataset $D_Q$ using all samples from all other products. We apply locality preserving projection [8] to reduce the original dimension to 30. The classification error rate is reported in Figure 4 for $T$ = kitchen, dvd and books. We omit $T$ = electronics, since we noticed that LogiP and LogiQ have very close performance on this dataset, suggesting transfer learning is not helpful. It can be seen that the proposed method also achieves a low miss-classification rate on all three datasets, even though Adaptive gradually catches up when $n_p$ is large enough. Interestingly, Figures 4(b) and 4(c) show that LogiQ can achieve a very low error rate, and the

proposed method manages to reach similar rates. Even if the benefit of transferring is not clear in these two cases, the proposed method does not seem to bring in extra errors by also considering samples from the target dataset $D_P$, which could have been misleading.

8 Conclusions

As modern classifiers get increasingly complicated, the cost of transfer learning becomes a major concern: in many applications, the transfer should be both quick and accurate. To reduce the modeling complexity, we introduced a composite method: learn a posterior ratio and the source probabilistic classifier separately, then combine them together. As the posterior ratio allows incremental modeling, features, no matter how complicated, can be ignored as long as they do not participate in the dataset transfer. The posterior ratio is learned via an efficient convex optimization and is proved consistent. Experiments on both artificial and real-world datasets give promising results.

References

[1] D. W. K. Andrews. Generic uniform convergence. Econometric Theory, 8(2):241-257, 1992.

[2] R. Chattopadhyay, Q. Sun, W. Fan, I. Davidson, S. Panchanathan, and J. Ye. Multisource domain adaptation and its application to early detection of fatigue. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(4):18, 2012.

[3] D. R. Cox. The regression analysis of binary sequences. Journal of the Royal Statistical Society, Series B (Methodological), pages 215-242, 1958.

[4] W. Dai, Q. Yang, G. R. Xue, and Y. Yu. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning. ACM, 2007.

[5] L. Duan, I. W. Tsang, D. Xu, and T.-S. Chua. Domain adaptation from multiple sources via auxiliary classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.

[6] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004.

[7] L. Györfi. A Distribution-Free Theory of Nonparametric Regression. Springer Science & Business Media, 2002.

[8] X. He, D. Cai, S. Yan, and H.-J. Zhang. Neighborhood preserving embedding. In Computer Vision (ICCV), Tenth IEEE International Conference on, volume 2. IEEE, 2005.

[9] T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10:1391-1445, 2009.

[10] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951.

[11] W. K. Newey and D. McFadden. Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111-2245, 1994.

[12] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.

[13] R. Raina, A. Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.

[14] M. Sugiyama, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20. Curran Associates, Inc., 2008.

[15] M. Sugiyama, T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699-746, 2008.

[16] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, NY, USA, 1998.

[17] L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Publishing Company, Incorporated, 2010.

[18] J. Yang, R. Yan, and A. G. Hauptmann. Cross-domain video concept detection using adaptive SVMs. In Proceedings of the 15th International Conference on Multimedia. ACM, 2007.

[19] Y. Zhang, X. Hu, and Y. Fang. Logistic regression for transductive transfer learning from multiple sources. In L. Cao, J. Zhong, and Y. Feng, editors, Advanced Data Mining and Applications, volume 6441 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010.

[20] J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. In Advances in Neural Information Processing Systems, 2001.

Appendix 1, Proof for Lemma 1

Proof. First, we decompose the supremum of the approximation error of the empirical objective function:

$$\begin{aligned}
&\sup_\alpha \big| P_{n_p} \log g_{n_q}(y, x; \alpha) - P \log g(y, x; \alpha) \big| \\
&= \sup_\alpha \big| P_{n_p} \langle \alpha, f(y,x) \rangle - P_{n_p} \log N_{n_q}(\alpha; x) - P \langle \alpha, f(y,x) \rangle + P \log N(\alpha; x) \big| \\
&\le \sup_\alpha \big| (P_{n_p} - P) \langle \alpha, f(y,x) \rangle \big| + \sup_\alpha \big| P_{n_p} \log N_{n_q}(\alpha; x) - P \log N(\alpha; x) \big| \\
&\le \sup_\alpha \big| (P_{n_p} - P) \langle \alpha, f(y,x) \rangle \big| + \sup_\alpha \big| (P_{n_p} - P) \log N_{n_q}(\alpha; x) \big| + \sup_\alpha \big| P \log N_{n_q}(\alpha; x) - P \log N(\alpha; x) \big|. \qquad (10)
\end{aligned}$$

The first two terms in (10) are due to the approximation using samples from $D_P$, while the third term is the model approximation error caused by using k-NN to approximate $N(x; \alpha)$. The first two terms are relatively easy to bound: the Uniform Law of Large Numbers (see, e.g., Lemma 2.4 in [11]) can be applied to show that they converge to 0 in probability, since (i) $\Theta$ is compact, (ii) both $\langle \alpha, f(y,x) \rangle$ and $\log N_{n_q}$ are continuous over $\Theta$, and (iii) both functions are Lipschitz continuous, as we will show later. As to the third term, we first prove that for all $\epsilon > 0$,

$$\mathrm{Prob}\Big( \sup_\alpha \big| P \log N_{n_q}(\alpha; x) - P \log N(\alpha; x) \big| > \epsilon \Big) \to 0 \qquad (11)$$

by using the following inequality: $|\log a - \log b| \le \frac{|a - b|}{\min(a, b)}$. Then

$$\begin{aligned}
\sup_\alpha \big| P \log N_{n_q}(\alpha; x) - P \log N(\alpha; x) \big|
&\le \sup_\alpha P\, \big| \log N_{n_q}(\alpha; x) - \log N(\alpha; x) \big| \\
&\le M_{\max} \sup_\alpha P \big| N_{n_q}(\alpha; x) - N(\alpha; x) \big| \\
&\le R_{\max} M_{\max} \sup_\alpha Q \big| N_{n_q}(\alpha; x) - N(\alpha; x) \big|,
\end{aligned}$$

where the second line uses $N, N_{n_q} \ge 1/M_{\max}$ and the last line uses Assumption 2. To show the final line converges to 0 in probability, we use the Generic Uniform Law of Large Numbers (Generic ULLN; see [1], Theorem 1):

Theorem 2 (Generic ULLN). For a random sequence $\{G_{n_q}(\alpha), \alpha \in \Theta, n_q \ge 1\}$, if $\Theta$ is a totally bounded metric space, $G_{n_q}$ is stochastically equicontinuous (SE) and $G_{n_q}(\alpha) \overset{p}{\to} 0$ for each $\alpha \in \Theta$, then $\sup_{\alpha \in \Theta} |G_{n_q}(\alpha)| \overset{p}{\to} 0$ as $n_q \to \infty$.

Since $\Theta$ is bounded by assumption, we now verify the other two conditions of this theorem. The universal consistency of k-NN regression has been proved (see [7], Theorem 23.8); here we restate the result for our convenience:

Theorem 3 (Universal consistency of k-NN). Given $Z$ bounded, assume that for each $x$ the random variable $\|X - x\|$ is absolutely continuous. If $k_{n_q}/\log n_q \to \infty$ and $k_{n_q}/n_q \to 0$, the $k_{n_q}$-NN estimator is strongly universally consistent, i.e.

$$\lim_{n_q \to \infty} \int \Big| \frac{1}{k} \sum_{j \in N_{n_q,k}(x)} z_j - \mathbb{E}[Z \mid X = x] \Big|^2 \mathrm{d}\mu(x) = 0$$

with probability one for all distributions of $(Z, X)$, where $\mu(x)$ is the probability measure of $x$.

From Jensen's inequality, we have

$$\bigg( \int \Big| \frac{1}{k} \sum_{j \in N_{n_q,k}(x)} z_j - \mathbb{E}[Z \mid X = x] \Big| \mathrm{d}\mu(x) \bigg)^2 \le \int \Big| \frac{1}{k} \sum_{j \in N_{n_q,k}(x)} z_j - \mathbb{E}[Z \mid X = x] \Big|^2 \mathrm{d}\mu(x),$$

so the left-hand side also converges to 0 in probability. By the Continuous Mapping Theorem, $\int \big| \frac{1}{k} \sum_{j \in N_{n_q,k}(x)} z_j - \mathbb{E}[Z \mid X = x] \big| \mathrm{d}\mu(x)$ converges to 0 in probability as well.

We let $Z_\alpha = \exp(\langle \alpha, f(y, X) \rangle)$ be a new random variable, so that we have samples $\{(z_{\alpha,j}, x_j)\}_{j=1}^{n_q}$ of $(Z_\alpha, X)$ drawn from distribution $Q$, and

$$Q \big| N_{n_q}(\alpha; x) - N(\alpha; x) \big| = \int \Big| \frac{1}{k} \sum_{j \in N_{n_q,k}(x)} z_{\alpha,j} - \mathbb{E}[Z_\alpha \mid X = x] \Big| \mathrm{d}\mu(x).$$

By applying Theorem 3, we can conclude that $Q| N_{n_q}(\alpha; x) - N(\alpha; x) |$ converges to 0 in probability for every distribution of $(Z_\alpha, X)$ indexed by the parameter $\alpha$.

Next, we verify the SE of $Q| N_{n_q}(\alpha; x) - N(\alpha; x) |$. Given Assumption 1, we have

$$\begin{aligned}
\Big| Q\big| N_{n_q}(\alpha; x) - N(\alpha; x) \big| - Q\big| N_{n_q}(\alpha'; x) - N(\alpha'; x) \big| \Big|
&\le Q\Big( \big| N_{n_q}(\alpha; x) - N_{n_q}(\alpha'; x) \big| + \big| N(\alpha; x) - N(\alpha'; x) \big| \Big) \\
&\le 2 M_{\max} F_{\max} \|\alpha - \alpha'\|_2. \qquad (12)
\end{aligned}$$

The last line is due to the Mean Value Theorem:

$$\big| \exp\langle \alpha, f(y,x) \rangle - \exp\langle \alpha', f(y,x) \rangle \big| \le \| f(y,x) \|_2 \exp\langle \bar{\alpha}, f(y,x) \rangle\, \|\alpha - \alpha'\|_2 \le F_{\max} M_{\max} \|\alpha - \alpha'\|_2,$$

where $\bar{\alpha}$ is a vector in between $\alpha$ and $\alpha'$ elementwise. In fact, (12) shows that the function $Q| N_{n_q}(\alpha; x) - N(\alpha; x) |$ is Lipschitz continuous with respect to $\alpha$, and according to Lemma 2 in [1], this implies SE. Similarly, one can show that $N_{n_q}(x; \alpha)$ is Lipschitz continuous. Now we can utilize (i) the boundedness of $\Theta$, (ii) SE and (iii) universal consistency to conclude that

$$\sup_\alpha Q \big| N_{n_q}(\alpha; x) - N(\alpha; x) \big| \overset{p}{\to} 0,$$

and therefore, by the inequality chain above,

$$\mathrm{Prob}\Big( \sup_\alpha P\big( \log N_{n_q}(\alpha; x) - \log N(\alpha; x) \big) \ge \epsilon \Big) \to 0.$$

Similarly, one can prove that

$$\mathrm{Prob}\Big( \sup_\alpha P\big( \log N(\alpha; x) - \log N_{n_q}(\alpha; x) \big) \ge \epsilon \Big) \to 0.$$

As a consequence, (11) holds and the third term in (10) converges to 0 in probability, which completes the proof.

After obtaining Lemma 1, the rest is similar to the proof of Theorem 9.13 in [17]. Let $M(\alpha) := P \log g(y, x; \alpha)$ and $M_{n_p,n_q}(\alpha) := P_{n_p} \log g_{n_q}(y, x; \alpha)$. Then

$$\begin{aligned}
M(\alpha^*) - M(\hat{\alpha}) &= M(\alpha^*) - M_{n_p,n_q}(\hat{\alpha}) + M_{n_p,n_q}(\hat{\alpha}) - M(\hat{\alpha}) \\
&\le M(\alpha^*) - M_{n_p,n_q}(\alpha^*) + M_{n_p,n_q}(\hat{\alpha}) - M(\hat{\alpha}) \\
&\le \big| M(\alpha^*) - M_{n_p,n_q}(\alpha^*) \big| + \sup_\alpha \big| M_{n_p,n_q}(\alpha) - M(\alpha) \big|,
\end{aligned}$$

where the second line uses $M_{n_p,n_q}(\hat{\alpha}) \ge M_{n_p,n_q}(\alpha^*)$, since $\hat{\alpha}$ maximizes $M_{n_p,n_q}$. That the last line converges to 0 in probability is proved in Lemma 1. Therefore, we can write: for all $\epsilon > 0$, $P\big( M(\alpha^*) - M(\hat{\alpha}) \ge \epsilon \big) \to 0$. Due to Assumption 3, for an arbitrary choice of $\epsilon_0 > 0$, if $\|\hat{\alpha} - \alpha^*\| \ge \epsilon_0$ there must be an $\epsilon_1 > 0$ such that $M(\alpha^*) - M(\hat{\alpha}) > \epsilon_1$. Therefore, we conclude that for all $\epsilon_0 > 0$,

$$P\big( \|\hat{\alpha} - \alpha^*\| \ge \epsilon_0 \big) \le P\big( M(\alpha^*) - M(\hat{\alpha}) \ge \epsilon_1 \big) \to 0.$$

Also, $M_{n_p,n_q}(\hat{\alpha}) - M(\alpha^*) = M_{n_p,n_q}(\hat{\alpha}) - M(\hat{\alpha}) + M(\hat{\alpha}) - M(\alpha^*)$, which converges to 0 in probability due to Lemma 1. Therefore, we have $\ell(\hat{\alpha}; D_P, D_Q) \overset{p}{\to} -\mathrm{KL}[p \,\|\, q]$.

Tuning Parameters in Posterior Ratio Estimation

$k$ in k-NN: As mentioned in Section 7.1, $k$ is tuned via 5-fold cross-validation, based on the testing criterion

$$\mathrm{MSE} = \frac{1}{|D_{\mathrm{HO}}|} \sum_{j \in D_{\mathrm{HO}}} \Big( Z^q_j - \frac{1}{k} \sum_{j' \in N(x^q_j)} Z^q_{j'} \Big)^2, \qquad (13)$$

where $D_{\mathrm{HO}}$ is a hold-out dataset and $Z^q_i = \exp(\langle \alpha, f(y^q_i, x^q_i) \rangle)$. However, this value depends on $\alpha$, which changes at every iteration of the gradient descent. Instead of tuning $k$ after each iteration, we follow a simple heuristic: (1) fix $k$ and run gradient descent; (2) choose a suitable $k$ that minimizes (13). Steps (1) and (2) are carried out repeatedly until convergence. This heuristic performs very well in experiments.
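A simplified single-split sketch of selection step (2) using criterion (13); in the heuristic above this alternates with gradient-descent updates of $\alpha$, and the feature map and all names are the same illustrative assumptions as before:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def choose_k(alpha, Xq_ho, yq_ho, candidates=(5, 10, 20, 40)):
    """Pick k minimizing the hold-out criterion (13): squared error of the
    k-NN estimate of Z = exp(<alpha, f(y, x)>) on a hold-out set D_HO."""
    F = yq_ho[:, None] * np.hstack([Xq_ho, np.ones((len(Xq_ho), 1))])
    Z = np.exp(F @ alpha)
    best_k, best_err = candidates[0], np.inf
    for k in candidates:
        # query the hold-out set against itself; column 0 is the point itself
        idx = NearestNeighbors(n_neighbors=k + 1).fit(Xq_ho).kneighbors(Xq_ho)[1]
        err = np.mean((Z - Z[idx[:, 1:]].mean(axis=1)) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```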


More information

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points. Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the

More information

Best approximation by linear combinations of characteristic functions of half-spaces

Best approximation by linear combinations of characteristic functions of half-spaces Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Covariance Matrix Estimation for Reinforcement Learning

Covariance Matrix Estimation for Reinforcement Learning Covariance Matrix Estimation for Reinforcement Learning Tomer Lancewicki Deartment of Electrical Engineering and Comuter Science University of Tennessee Knoxville, TN 37996 tlancewi@utk.edu Itamar Arel

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

Partial Identification in Triangular Systems of Equations with Binary Dependent Variables

Partial Identification in Triangular Systems of Equations with Binary Dependent Variables Partial Identification in Triangular Systems of Equations with Binary Deendent Variables Azeem M. Shaikh Deartment of Economics University of Chicago amshaikh@uchicago.edu Edward J. Vytlacil Deartment

More information

Optimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment

Optimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment Neutrosohic Sets and Systems Vol 14 016 93 University of New Mexico Otimal Design of Truss Structures Using a Neutrosohic Number Otimization Model under an Indeterminate Environment Wenzhong Jiang & Jun

More information

Linear diophantine equations for discrete tomography

Linear diophantine equations for discrete tomography Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,

More information

Sums of independent random variables

Sums of independent random variables 3 Sums of indeendent random variables This lecture collects a number of estimates for sums of indeendent random variables with values in a Banach sace E. We concentrate on sums of the form N γ nx n, where

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Improved Capacity Bounds for the Binary Energy Harvesting Channel Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,

More information

New Information Measures for the Generalized Normal Distribution

New Information Measures for the Generalized Normal Distribution Information,, 3-7; doi:.339/info3 OPEN ACCESS information ISSN 75-7 www.mdi.com/journal/information Article New Information Measures for the Generalized Normal Distribution Christos P. Kitsos * and Thomas

More information

Generalized optimal sub-pattern assignment metric

Generalized optimal sub-pattern assignment metric Generalized otimal sub-attern assignment metric Abu Sajana Rahmathullah, Ángel F García-Fernández, Lennart Svensson arxiv:6005585v7 [cssy] 2 Se 208 Abstract This aer resents the generalized otimal subattern

More information

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law On Isoerimetric Functions of Probability Measures Having Log-Concave Densities with Resect to the Standard Normal Law Sergey G. Bobkov Abstract Isoerimetric inequalities are discussed for one-dimensional

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition TNN-2007-P-0332.R1 1 Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Haiing Lu, K.N. Plataniotis and A.N. Venetsanooulos The Edward S. Rogers

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Estimation of Separable Representations in Psychophysical Experiments

Estimation of Separable Representations in Psychophysical Experiments Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)

More information

ECE 534 Information Theory - Midterm 2

ECE 534 Information Theory - Midterm 2 ECE 534 Information Theory - Midterm Nov.4, 009. 3:30-4:45 in LH03. You will be given the full class time: 75 minutes. Use it wisely! Many of the roblems have short answers; try to find shortcuts. You

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

Evaluating Process Capability Indices for some Quality Characteristics of a Manufacturing Process

Evaluating Process Capability Indices for some Quality Characteristics of a Manufacturing Process Journal of Statistical and Econometric Methods, vol., no.3, 013, 105-114 ISSN: 051-5057 (rint version), 051-5065(online) Scienress Ltd, 013 Evaluating Process aability Indices for some Quality haracteristics

More information

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Ketan N. Patel, Igor L. Markov and John P. Hayes University of Michigan, Ann Arbor 48109-2122 {knatel,imarkov,jhayes}@eecs.umich.edu

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information