Ordinal Distance Metric Learning with MDS for Image Ranking
Yu, Panpan;Li, Qingna
2019-02-27 00:00:00
Panpan Yu
School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, 100081, China
2120151335@bit.edu.cn

Qing-Na Li*
School of Mathematics and Statistics, Beijing Key Laboratory on MCAACI, Beijing Institute of Technology, Beijing, 100081, China
qnl@bit.edu.cn

*Corresponding author

arXiv:1902.10284v1 [cs.LG] 27 Feb 2019

Image ranking is to rank images based on some known ranked images. In this paper, we propose an improved linear ordinal distance metric learning approach based on the linear distance metric learning model in Li et al. (2015). By decomposing the distance metric $A$ as $L^T L$, the problem can be cast as looking for a linear map between two sets of points in different spaces, while maintaining some data structures. The ordinal relation of the labels can be maintained via classical multidimensional scaling, a popular tool for dimension reduction in statistics. A least squares fitting term is then introduced to the cost function, which can also maintain the local data structure. The resulting model is an unconstrained problem, and can better fit the data structure. Extensive numerical results demonstrate the improvement of the new approach over the linear distance metric learning model both in speed and ranking performance.

Keywords: Image ranking; distance metric learning; classical multidimensional scaling; optimization model.

1. Introduction

Given a labeled image dataset (referred to as the training set), image ranking is to find the most relevant images for a query image based on the training set. Different from binary classification and multi-classification, the labels of the training set in image ranking often have an order, for example, age. The two important and challenging aims for image ranking are as follows. The first aim is to find which class the query image belongs to, and the second is to find the most relevant images in that specific class.
The first aim actually falls into ordinal regression in statistics, where different approaches have been proposed; see Gutierrez et al. (2016) for a survey on ordinal regression and Qiao (2015), Wang et al. (2017) for recent developments. However, the second aim makes image ranking different from ordinal regression, since the training images having the same label as the query image need to be further ranked. Therefore, a direct extension of methods for ordinal regression is not appropriate for image ranking.

As for the second aim, to find the most relevant images, a natural way is to use the Euclidean distance between images to measure their dissimilarities. However, as we will show later, in most cases the Euclidean distance is not an appropriate dissimilarity measure. A practical way is to learn a distance metric (denoted as $A$) to measure the distances between images. This is referred to as distance metric learning (DML). Then for a query image, the most relevant images are those with the smallest distances under metric $A$.

Many DML methods have been developed for image classification and clustering tasks, for example, the SDP approach proposed by Xing et al. (2003), an online learning algorithm proposed by Shalev-Shwartz et al. (2004), neighborhood component analysis (NCA) by Goldberger et al. (2004), and so on (Bar-hillel et al. (2003); Shen et al. (2010); Yang et al. (2007)). However, most of these methods do not assume that the labels are ordered. Therefore, they cannot be directly used for image ranking. Recently, Li et al. (2015) first introduced ordinal DML for image ranking. By a carefully designed weighting factor based on ordinal labels, the ordinal relationship of the images is expected to be maintained. An alternating iterative update was proposed to solve the resulting nonlinear convex semidefinite programming model, which is basically a projected gradient algorithm.
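As a small illustration of ranking under a learned metric, the sketch below sorts training points by their distance to a query under a given positive semidefinite matrix $A$. It is only a toy example under stated assumptions: the function name `rank_by_metric`, the 2-dimensional data and the diagonal matrix `A` are hypothetical and are not taken from the paper.

```python
import numpy as np

def rank_by_metric(A, X_train, q):
    """Rank training points (columns of X_train) by distance to q under metric A.

    Returns the indices sorted from most to least relevant and the distances
    d_A(q, x_i) = sqrt((x_i - q)^T A (x_i - q)).
    """
    diff = X_train - q[:, None]                       # columns are x_i - q
    d2 = np.einsum("in,ij,jn->n", diff, A, diff)      # squared distances under A
    return np.argsort(d2, kind="stable"), np.sqrt(np.maximum(d2, 0.0))

# Toy data: three training points in the plane (columns) and one query.
X_train = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 3.0]])
q = np.array([0.0, 1.0])

# Euclidean ranking (A = I) versus a hypothetical learned metric.
order_euclid, _ = rank_by_metric(np.eye(2), X_train, q)
A = np.diag([9.0, 0.25])                              # illustrative PSD matrix
order_A, dist_A = rank_by_metric(A, X_train, q)
```

With these toy values the Euclidean order is 0, 1, 2, while the metric `A` swaps the last two points, which is the kind of re-ranking a learned metric is meant to produce.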
On the other hand, multidimensional scaling (MDS) is an important method for dimension reduction, which has been widely used in signal processing, molecular conformation, psychometrics and social measurement. We refer to some monographs and surveys for more applications (Anjos and Lasserre (2012); Borg and Groenen (2005); Dattorro (2008); Dokmanic et al. (2015); Liberti et al. (2014)). The idea of classical MDS (cMDS) is to embed the given objects into a low dimensional space based on a Euclidean distance matrix. Recently, there has been great progress in MDS, such as the semismooth Newton method for the nearest Euclidean distance matrix problem (Qi (2013); Qi and Yuan (2014)), the inexact smoothing Newton method for nonmetric MDS (Li and Qi (2017)), as well as the applications of MDS in nonlinear dimension reduction (Ding and Qi (2016, 2017)), binary code learning (Dai et al. (2016)), and sensor network localization (Qi et al. (2013)).

Our Contributions. Note that the distance metric $A$ in DML is positive semidefinite. We represent $A$ as $A = L^T L$, where $L$ is a rectangular matrix. The first contribution of our work is that we look for $L$ instead of $A$, which gets around the positive semidefinite constraint on $A$. As a result, our method does not need a spectral decomposition in each iteration and thus has quite low computational complexity. Moreover, if $L$ has only a few rows, the obtained $A$ is low rank. This brings new insight into the distance metric: distances between images under $A$ are basically the Euclidean distances between new points in a new space.

The second contribution is that we employ cMDS to get the ideal points in the new space, whose Euclidean distances keep the ordinal relations as the labels do. In other words, cMDS is a key step to achieve the goal of maintaining the ordinal relationship of the data.
The third contribution is that we propose a new ordinal DML model, which takes into account the ordinal relations between images and maintains the local data structure.

Extensive experiments are conducted on two datasets: the UMIST face dataset and the FG-NET aging dataset. The results demonstrate the efficiency of the new approach and its improvement over the linear DML model in Li et al. (2015), both in speed and in ranking performance.

The organization of the paper is as follows. In Section 2, we give some preliminaries about the DML model in Li et al. (2015) and cMDS. In Section 3, we propose our new approach, referred to as the cMDS-DML approach. In Section 4, we discuss the numerical algorithm to solve the resulting unconstrained problem. In Section 5, we report the numerical results to demonstrate the efficiency of the proposed model. Final conclusions are given in Section 6.

Notations. We use $S^n$ to denote the space of $n \times n$ symmetric matrices and $S^n_+$ to denote the set of $n \times n$ positive semidefinite matrices; $A \succeq 0$ means $A \in S^n_+$. We use small bold letters to indicate vectors.

2. Preliminaries

In this section, we give a brief review of the linear DML model in Li et al. (2015) and then give some preliminaries on cMDS.

2.1. Problem Statement

Suppose $X = \{(x_i, r_i) : i = 1, \dots, n\}$ is the training set, where $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$, are the observed data, and $r_i \in \mathbb{R}$, $i = 1, \dots, n$, are the corresponding labels, which have an order. Here $n$ is the number of samples in the training set. We need the following assumptions.

Assumption 1. Suppose there are $m$ different ordinal labels in total. Assume that the data in the training set are grouped as follows:

$x_1, \dots, x_{i_1}$, with labels $r_1 = \dots = r_{i_1} := a_1$;
$x_{i_1+1}, \dots, x_{i_2}$, with labels $r_{i_1+1} = \dots = r_{i_2} := a_2$;
$\vdots$
$x_{i_{m-1}+1}, \dots, x_{i_m}$, with labels $r_{i_{m-1}+1} = \dots = r_{i_m} := a_m$;

where $i_m = n$, and $a_1, \dots, a_m$ are distinct ordinal labels.

Assumption 2. Suppose $x_1, \dots, x_n$ are zero-centralized, i.e., $\sum_{i=1}^n x_i = 0$.

To rank images, the distance metric learning approach uses the distance $d_A(\cdot, \cdot)$ defined by

$$d_A(x_i, x_j) = \|x_i - x_j\|_A = \sqrt{(x_i - x_j)^T A (x_i - x_j)},$$

where $A \in S^d$ is positive semidefinite. The goal is then to learn an appropriate $A$, such that the distances under metric $A$ between relevant images are small. Once $A$ is obtained, the most relevant images of a query image can be provided as those with the smallest distances under $A$. To this end, one expects $A$ to have two properties. Firstly, ordinal information needs to be preserved under $A$, that is, for $x_i, x_j$ with $r_i \ne r_j$, $d_A(x_i, x_j)$ is small when $|r_i - r_j|$ is small. Secondly, the local geometric structure of the data needs to be maintained under $A$. That is, for $x_i, x_j$ with $r_i = r_j$, $d_A(x_i, x_j) \approx d_I(x_i, x_j)$, where $I$ denotes the identity matrix $I_d$ of size $d$. See also Li et al. (2015).

2.2. Linear Distance Metric Learning for Ranking

As mentioned in the introduction, most DML approaches do not assume that the labels are ordered. Li et al. (2015) first proposed a method named Linear Distance Metric Learning for Ranking (LDMLR), which deals with ordinal labels. Below we briefly review the main idea of LDMLR.

To derive LDMLR, for each $x_i$, we first specify its $K$ nearest data points (under the Euclidean distance) with the same label as its target neighbors. The LDMLR method learns a metric $A$ by solving the following nonlinear convex semidefinite programming problem:

$$\min_{A \in S^d} \ h(A) \quad \text{s.t.} \quad A \succeq 0, \qquad (1)$$

where

$$h(A) = -\sum_{i,j=1}^n \omega_{ij}\, d_A^2(x_i, x_j) + \mu \sum_{\eta_{ij}=1} \big(d_A^2(x_i, x_j) - d_I^2(x_i, x_j)\big)^2.$$

Here $\mu > 0$ is a tradeoff parameter, and $\eta_{ij} \in \{0, 1\}$ indicates whether $x_j$ is one of $x_i$'s target neighbors, i.e.,

$$\eta_{ij} = \begin{cases} 1, & \text{if } x_j \text{ is a target neighbor of } x_i, \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$

The weighting factor $\omega_{ij}$ is defined as

$$\omega_{ij} = \begin{cases} (|r_i - r_j| + 1)^p, & \text{if } r_i \ne r_j, \\ 0, & \text{otherwise,} \end{cases} \quad \text{where } p > 0. \qquad (3)$$

The first term of $h(A)$ can be viewed as a penalty term on the distance between two data points if they have different labels. The weighting factor $\omega_{ij}$ is used to adjust the importance of such distances. As we can see from the definition of $\omega_{ij}$, the larger $|r_i - r_j|$ is, the bigger $\omega_{ij}$ is. If $x_i$ and $x_j$ have the same label, we do not want to maximize their distance, so $\omega_{ij} = 0$ in this case. The second term of $h(A)$ tries to maintain the local structure between the images with the same label.

Model (1) is a convex model, and can be solved by state-of-the-art quadratic semidefinite programming packages, such as QSDP by Toh (2007). In Li et al. (2015), the projected gradient method is applied to solve (1), i.e., the following update is used:

$$A_{k+1} = \Pi_{S^d_+}\big(A_k - \alpha \nabla h(A_k)\big),$$

where $\alpha > 0$ is a step size and $\Pi_{S^d_+}(\cdot)$ denotes the projection onto $S^d_+$.

In LDMLR, the ordinal relation of the images is maintained by introducing a weighting factor, which is calculated based on the ordinal labels. Furthermore, the local data structure can be kept by the second term in $h(A)$.

2.3. Classical Multidimensional Scaling (cMDS)

The aim of cMDS is to embed data in a lower dimensional space while preserving the distances between data.
Given the coordinates of a set of points, namely $\{y_1, \dots, y_n\}$ with $y_i \in \mathbb{R}^s$, it is straightforward to compute the pairwise Euclidean distances $d_{ij} = \|y_i - y_j\|$, $i, j = 1, \dots, n$. The matrix $D = (d_{ij}^2)$ is known as the (squared) Euclidean Distance Matrix (EDM) of those points. However, the inverse problem is more interesting and important. Suppose $D$ is given. The method of cMDS generates a set of coordinates that preserve the pairwise distances in $D$. We give a short description of cMDS below. Let

$$J := I - \frac{1}{n} \mathbf{1}\mathbf{1}^T \quad \text{and} \quad B(D) := -\frac{1}{2} J D J, \qquad (4)$$

where $I$ is the $n \times n$ identity matrix and $\mathbf{1}$ is the (column) vector of all ones in $\mathbb{R}^n$. In the literature, $J$ is known as the centralization matrix and $B$ is the double-centralized matrix of $D$ (also the Gram matrix of $D$ because $B$ is positive semidefinite). Suppose $B$ admits the spectral decomposition

$$B(D) = [p_1, \dots, p_s]\, \mathrm{diag}(\lambda_1, \dots, \lambda_s)\, [p_1, \dots, p_s]^T, \qquad (5)$$

where $\lambda_1, \dots, \lambda_s$ are the positive eigenvalues of $B$ (the rest are zero) and $p_1, \dots, p_s$ are the corresponding orthonormal eigenvectors. Then the coordinates $y_1, \dots, y_n$ obtained by

$$[y_1, y_2, \dots, y_n] := \mathrm{diag}\big(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_s}\big)\, [p_1, \dots, p_s]^T \qquad (6)$$

preserve the known distances in the sense that $\|y_i - y_j\| = d_{ij}$ for all $i, j = 1, \dots, n$. This is the well known cMDS. We refer to Gower (1985), Schoenberg (1935), Torgerson (1952), Young and Householder (1938), Borg and Groenen (2005), and Dattorro (2008) for detailed descriptions and generalizations of cMDS.

3. A New Approach for Ranking

In this section, we motivate our new approach and discuss some related properties of EDMs.

3.1. A New Approach

The idea of our approach is as follows. First, by decomposing $A = L^T L$, the problem reduces to looking for a linear map $L$ from the original space $\mathbb{R}^d$ to a new space, denoted as $\mathbb{R}^s$. The points $L x_i$ in the new space are referred to as the embedding points corresponding to $x_i$, $i = 1, \dots, n$.
Then we apply cMDS to get the estimations of those embedding points, denoted as $\{y_1, \dots, y_n\}$. Finally, $L$ is learned based on the two sets of points $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$. We detail our approach in the following three steps.

Step 1. Decompose $A = L^T L$

A natural way of learning a distance metric $A \in S^d_+$ is to decompose $A$ as $A = L^T L$, where $L \in \mathbb{R}^{s \times d}$ is a rectangular matrix and $s$ is a prescribed dimension with $s \le d$. The decomposition has been used in several references, see for example Sugiyama (2007), Weinberger and Saul (2009), Xiang et al. (2008). Learning $L$ instead of $A$ brings us some advantages. Firstly, it allows us to get around the positive semidefinite constraint $A \succeq 0$, resulting in an unconstrained model. Secondly, the low rank structure of $A$ can be specified by choosing $s \ll d$. Note that given a query image, it is necessary to compute distances between the query image and every training image. The time complexity of computing distances should be kept as low as possible. With a low rank $A$, such complexity can be reduced from $O(d^2)$ to $O(ds)$. Finally, it provides insight into the Mahalanobis distance metric $A$: $L \in \mathbb{R}^{s \times d}$ is basically a linear map from $\mathbb{R}^d$ to $\mathbb{R}^s$. The distance between $x_i$ and $x_j$ under metric $A$ can be reformulated as

$$d_A(x_i, x_j) = \sqrt{(x_i - x_j)^T L^T L (x_i - x_j)} = \|L(x_i - x_j)\| := d^L(x_i, x_j). \qquad (7)$$

In other words, the distance between $x_i$ and $x_j$ under metric $A$ is essentially the Euclidean distance between the new points $L x_i$ and $L x_j$ in the space $\mathbb{R}^s$.

Recall that we denote the space where $x_i$ lies (i.e., $\mathbb{R}^d$) as the original space and the space where $L x_i$ lies (i.e., $\mathbb{R}^s$) as the new space, and that $L x_i$ is referred to as the embedding point of $x_i$. Now image ranking reduces to looking for a linear map which maps $x_i$ to a proper new space such that the following properties hold.

(i) The distances between embedding points reflect the corresponding ordinal labels well. In other words, the Euclidean distances between embedding points with different labels should follow the order of their label differences, i.e.,

$$\|L x_i - L x_j\| > \|L x_i - L x_k\|, \quad \text{if } |r_i - r_j| > |r_i - r_k|, \ r_i \ne r_j, \ r_i \ne r_k, \ r_j \ne r_k.$$

(ii) The local data structure must be maintained. That is, the Euclidean distances between a point and its target neighbors with the same label in the original space need to be maintained as much as possible in the new space:

$$d_A(x_i, x_j) \approx d_I(x_i, x_j), \quad \text{if } r_i = r_j \text{ and } x_j \text{ is a target neighbor of } x_i.$$

In the following, we apply cMDS to get the estimations $\{y_1, \dots, y_n\}$ of the embedding points in a new space, which enjoy property (i), and then learn a linear mapping $L$ based on the two sets of points $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$.

Step 2. Apply cMDS

In order to apply cMDS to get the estimations of the embedding points, an EDM is needed. Note that the points with the same label can basically be viewed as one point. Further inspired by the weighting factor defined in (3), we can construct an EDM based on the ordinal labels. A trivial choice is to define $D$ by $D = (|r_i - r_j|^2)$, $i, j = 1, \dots, n$. However, from a numerical point of view, we can further add a parameter $\tau$ to $|r_i - r_j|$ to allow more flexibility. This leads to the following form of $D$. Define $D \in S^n$ as

$$D_{ij} = \begin{cases} (|r_i - r_j| + \tau)^2, & \text{if } r_i \ne r_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (8)$$

Under Assumption 1, let

$$\delta_{ij} = \begin{cases} |a_i - a_j|, & \text{if } i \ne j, \ i, j = 1, \dots, m, \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$

The following theorem shows that if $\tau$ is properly chosen, then $D$ is an EDM.

Theorem 1. Let $\Delta := (\delta_{ij})$ and let $\lambda_0$ be the smallest eigenvalue of $-\frac{1}{2} J \Delta J$. If $\tau \ge -4\lambda_0$, then $D$ defined by (8) is an EDM.

The proof is postponed to Section 3.2. If $D$ is not an EDM, we refer to Qi (2013), Li and Qi (2017) for more details. By applying cMDS to $D$, we can get the estimations $\{y_1, \dots, y_n\}$ of the embedding points in the new space.

Remark 1. For $x_i$ and $x_j$ with $r_i = r_j$, their estimations of embedding points $y_i, y_j$ basically collapse to one point, since $\|y_i - y_j\| = \sqrt{D_{ij}} = 0$. For $x_i$ and $x_j$ with $r_i \ne r_j$, the Euclidean distance between their estimations $y_i, y_j$ of embedding points is $\|y_i - y_j\| = \sqrt{D_{ij}} = |r_i - r_j| + \tau$. Consequently,

$$\|y_i - y_j\| > \|y_i - y_k\|, \quad \text{if } |r_i - r_j| > |r_i - r_k|, \ r_i \ne r_j, \ r_i \ne r_k, \ r_j \ne r_k.$$

In other words, $\{y_1, \dots, y_n\}$ enjoy property (i).

Step 3. Matching Two Sets of Points

The final step is to learn $L$ based on the two sets of points $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$ so that $L$ has properties (i) and (ii). To deal with property (i), we need to match $\{L x_1, \dots, L x_n\}$ and $\{y_1, \dots, y_n\}$ as much as possible, since $y_1, \dots, y_n$ already satisfy property (i). A natural statistical way is to use a least squares fitting term. To tackle property (ii), we adopt the second term of $h(A)$ in (1), since it does a good job based on the numerical performance. Now we reach the following model:

$$\min_{L \in \mathbb{R}^{s \times d},\ c \in \mathbb{R}} \ f(L, c) := \sum_{i=1}^n \|L x_i - c\, y_i\|^2 + \mu \sum_{\eta_{ij}=1} \big((d^L(x_i, x_j))^2 - d_I^2(x_i, x_j)\big)^2, \qquad (10)$$

where $\eta_{ij}$ is defined as in (2). To allow more flexibility, we also use a scaling variable $c \in \mathbb{R}$ in the fitting term.

Although (10) is a nonconvex model in $L$, the proposed approach enjoys the following good properties. By dealing with $L$ instead of $A$, the resulting model (10) is an unconstrained problem, which can be solved by various numerical algorithms. Further, we can emphasize the low rank structure of $A$ by restricting $L$ to be a short fat matrix, i.e., $s \ll d$. By applying cMDS, we take into account the ordinal information of the labels, which leads to a good estimation of the embedding points. By matching $\{L x_1, \dots, L x_n\}$ with $\{y_1, \dots, y_n\}$ through the least squares fitting term, the resulting embedding points are expected to keep property (i) as well. Our numerical results verify this observation.

3.2. Proof of Theorem 1

Define $\bar\Delta \in S^m$ as

$$\bar\Delta = (\bar\delta_{ij}), \quad \text{where} \quad \bar\delta_{ij} = \begin{cases} (\delta_{ij} + \tau)^2, & \text{if } i \ne j, \ i, j = 1, \dots, m, \\ 0, & \text{otherwise.} \end{cases} \qquad (11)$$

Then we have the following lemma.

Lemma 1. Let $D \in S^n$ and $\bar\Delta \in S^m$ be defined as in (8) and (11). Let Assumption 1 hold. Then $D$ is an EDM if and only if $\bar\Delta$ is an EDM.

Proof. Suppose $D$ is an EDM generated by points $\{y_1, \dots, y_n\}$. By the definition of $D$, $\|y_i - y_j\| = 0$ if $r_i = r_j$, which implies that $y_{i_{t-1}+1} = \dots = y_{i_t}$, $t = 1, \dots, m$ (with $i_0 := 0$). Let $y_{i_{t-1}+1} = \dots = y_{i_t} := z_t$, $t = 1, \dots, m$. Obviously, $\bar\Delta$ is an EDM generated by the points $\{z_1, \dots, z_m\}$. Conversely, suppose that $\bar\Delta$ is an EDM generated by points $\{z_1, \dots, z_m\}$. Let $y_{i_{t-1}+1} = \dots = y_{i_t} = z_t$, $t = 1, \dots, m$. One can show that $D$ is an EDM generated by $\{y_1, \dots, y_n\}$. The proof is finished.

Next, we show that $\bar\Delta$ is an EDM if $\tau$ is properly chosen.

Lemma 2. Let $\Delta := (\delta_{ij})$ and let $\lambda_0$ be the smallest eigenvalue of $-\frac{1}{2} J \Delta J$. If $\tau \ge -4\lambda_0$, then $\bar\Delta$ defined by (11) is an EDM.

Proof. It is well known (Schoenberg (1935); Young and Householder (1938)) that $\bar\Delta$ is an EDM if and only if

$$\mathrm{diag}(\bar\Delta) = 0 \quad \text{and} \quad B(\bar\Delta) = -\frac{1}{2} J \bar\Delta J \succeq 0.$$

Also note that

$$J\mathbf{1} = 0, \quad B(\bar\Delta) J = B(\bar\Delta), \quad J B(\bar\Delta) = B(\bar\Delta). \qquad (12)$$

To prove that $\bar\Delta$ is an EDM, we only need to show the positive semidefiniteness of $B(\bar\Delta)$. Let $\Delta^{(2)} := (\delta_{ij}^2)$. Since $\bar\Delta = \Delta^{(2)} + 2\tau \Delta + \tau^2 (\mathbf{1}\mathbf{1}^T - I)$, note that

$$B(\bar\Delta) = B(\Delta^{(2)}) + 2\tau B(\Delta) + \frac{\tau^2}{2} J.$$

It suffices to show that if $\tau \ge -4\lambda_0$, then for any $x \in \mathbb{R}^m$,

$$x^T B(\Delta^{(2)}) x + 2\tau\, x^T B(\Delta) x + \frac{\tau^2}{2} x^T J x \ge 0.$$

Obviously, $\Delta^{(2)}$ is an EDM. Consequently, for any $x \in \mathbb{R}^m$, $x^T B(\Delta^{(2)}) x \ge 0$. Further, $B(\Delta) - \lambda_0 I \succeq 0$ implies that

$$x^T (B(\Delta) - \lambda_0 I) x \ge 0, \quad \forall\, x \in \mathbb{R}^m.$$

By substituting $x$ by $Jx$ and noting that the equalities in (12) also hold for $B(\Delta)$, we have

$$x^T B(\Delta) x - \lambda_0\, x^T J x \ge 0, \quad \forall\, x \in \mathbb{R}^m.$$

Since $B(\Delta)\mathbf{1} = 0$, we have $\lambda_0 \le 0$, so the assumption $\tau \ge -4\lambda_0$ implies $\tau \ge 0$. It follows that

$$x^T B(\Delta^{(2)}) x + 2\tau\, x^T B(\Delta) x + \frac{\tau^2}{2} x^T J x \ \ge\ \Big(2\tau\lambda_0 + \frac{\tau^2}{2}\Big) x^T J x \ \ge\ 0,$$

where the last inequality follows from the assumption $\tau \ge -4\lambda_0$ as well as the positive semidefiniteness of $J$. The proof is finished.

The proof of Lemma 2 is inspired by Theorem 1 in Cailliez (1983). The difference is that here $\Delta^{(2)}$ is an EDM and $\lambda_0$ is allowed to be negative in Lemma 2.

Proof of Theorem 1. The result of Theorem 1 can be directly derived from Lemma 1 and Lemma 2.

Remark 2. Note that in cMDS, $\{y_1, \dots, y_n\}$ obtained from $D$ is not unique due to the eigenvalue decomposition of $B(D)$. However, $y_1, \dots, y_n$ are centralized, i.e., $\sum_{i=1}^n y_i = 0$. The computational cost for generating $\{y_1, \dots, y_n\}$ is $O(n^3)$. If $n$ is large, the computational cost can be further reduced to $O(m^3)$ by the following process, which is based on Lemma 1 and Lemma 2. It is easy to verify that $y_1, \dots, y_n$ generated by Algorithm 1 satisfy $\sum_{i=1}^n y_i = 0$, and that the corresponding EDM is $D$ defined in (8).

Algorithm 1 Alternative way to generate $\{y_1, \dots, y_n\}$
Step 1. Compute $\bar\Delta$ defined by (11).
Step 2. Apply cMDS to $\bar\Delta$ to get $z_1, \dots, z_m \in \mathbb{R}^{\bar s}$.
Step 3. Let $\tilde y_{i_{t-1}+1} = \dots = \tilde y_{i_t} = (z_t, 0) \in \mathbb{R}^s$, $t = 1, \dots, m$, where $0 \in \mathbb{R}^{s - \bar s}$.
Step 4. Denote $\bar y = \frac{1}{n} \sum_{i=1}^n \tilde y_i$. Let $y_i = \tilde y_i - \bar y$, $i = 1, \dots, n$.

4. Numerical Algorithm

Problem (10) is an unconstrained nonlinear problem, and can be solved by various algorithms. Here, we choose the traditional steepest descent method with the Armijo line search. The convergence result of the steepest descent method can be found in classical optimization books, e.g., Nocedal and Wright (2006, p. 42).
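The construction of the label-based EDM in (8), cMDS as in (4)-(6), and the shortcut of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names (`cmds`, `label_edm`, `embed_by_labels`), the eigenvalue threshold `tol` and the symbol name `tau` for the additive parameter in (8) are hypothetical, and the zero padding of Step 3 is omitted because it does not change any distance.

```python
import numpy as np

def cmds(D, tol=1e-10):
    """Classical MDS: columns of the result are coordinates whose squared
    pairwise distances reproduce D when D is a (squared) EDM."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix, as in (4)
    B = -0.5 * J @ D @ J                         # double-centered Gram matrix
    lam, P = np.linalg.eigh(B)                   # eigenvalues in ascending order
    keep = lam > tol * max(1.0, lam.max())       # keep positive eigenvalues only
    return np.sqrt(lam[keep])[:, None] * P[:, keep].T   # as in (6)

def label_edm(labels, tau=1.0):
    """Matrix with entries (|r_i - r_j| + tau)^2 for different labels, 0 otherwise."""
    labels = np.asarray(labels, dtype=float)
    gap = np.abs(labels[:, None] - labels[None, :])
    return np.where(gap > 0, (gap + tau) ** 2, 0.0)

def embed_by_labels(r, tau=1.0):
    """Algorithm 1 sketch: run cMDS on the m distinct labels only, copy each
    label's point to all samples carrying that label, then re-center."""
    r = np.asarray(r, dtype=float)
    a, group = np.unique(r, return_inverse=True)   # distinct labels a_1..a_m
    Z = cmds(label_edm(a, tau))                    # one point z_t per label
    Y = Z[:, group]                                # replicate per sample
    return Y - Y.mean(axis=1, keepdims=True)       # centralize over n samples

# Toy check: six samples with three ordinal labels.
r = [1, 1, 2, 2, 2, 3]
Y = embed_by_labels(r, tau=1.0)
diff = Y[:, :, None] - Y[:, None, :]
D_from_Y = (diff ** 2).sum(axis=0)                 # squared distances of the embedding
```

In this toy case `D_from_Y` agrees with `label_edm(r, 1.0)` up to rounding, so samples with equal labels collapse to one point and the distance between different labels grows with the label gap. If the input matrix were not an EDM, `cmds` would silently drop the negative eigenvalues and return only an approximation.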
Algorithm 2 summarizes the details of our approach.

Implementations. Let $X_{ij} := (x_i - x_j)(x_i - x_j)^T$. The gradient $\nabla f(L, c)$ takes the following form:

$$\nabla_L f(L, c) = 2\sum_{i=1}^n (L x_i x_i^T - c\, y_i x_i^T) + 4\mu \sum_{\eta_{ij}=1} L (X_{ij} L^T L X_{ij} - X_{ij}^2)$$
$$= 2\sum_{i=1}^n (L x_i x_i^T - c\, y_i x_i^T) + 4\mu \sum_{\eta_{ij}=1} \big(\|L(x_i - x_j)\|^2 - \|x_i - x_j\|^2\big) L (x_i - x_j)(x_i - x_j)^T,$$

$$\nabla_c f(L, c) = 2\Big(c \sum_{i=1}^n y_i^T y_i - \sum_{i=1}^n y_i^T L x_i\Big).$$

Algorithm 2 cMDS-DML for image ranking
S0 Given a training set $x_1, \dots, x_n \in \mathbb{R}^d$ and the corresponding labels $r_1, \dots, r_n$. Initialize: $L^0 = (e_1, \dots, e_s)^T \in \mathbb{R}^{s \times d}$, $c_0 = 1$. Parameters: $\mu, \epsilon > 0$, $\rho \in (0, 1)$, $\sigma \in (0, 1)$, $\gamma > 0$, $k = 0$.
S1 Compute the Euclidean distance matrix $D$ according to (8).
S2 Apply cMDS to get the estimations of embedding points $y_1, \dots, y_n \in \mathbb{R}^s$.
S3 Search $K$ target neighbors in the original space $\mathbb{R}^d$ for each training sample $x_1, \dots, x_n$.
S4 Compute $\nabla f(L^k, c_k)$. If $\|\nabla f(L^k, c_k)\| \le \epsilon$, stop; otherwise, let $d^k = -\nabla f(L^k, c_k)$ and go to S5.
S5 Apply the Armijo line search to determine a steplength $\alpha_k = \rho^{m_k} \gamma$, where $m_k$ is the smallest positive integer $m$ such that the following inequality holds:
$$f\big((L^k, c_k) + \rho^m \gamma\, d^k\big) - f(L^k, c_k) \le \sigma \rho^m \gamma\, \nabla f(L^k, c_k)^T d^k.$$
S6 Let $(L^{k+1}, c_{k+1}) = (L^k, c_k) + \alpha_k d^k$, $k = k + 1$, and go to S4.

Computational Complexity. We compare the computational complexity (mainly in multiplications and divisions) of Algorithm 2 with that of LDMLR. The details are summarized in Table 1, where steps marked with * are the iterative steps. Note that if $n$ is large, S2 can be replaced by Algorithm 1, and the computational complexity for S2 can be further reduced from $O(n^3)$ to $O(m^3)$. For the iterative process S4-S6, the complexity for each iteration is $O(rnKd^2)$, where $r$ is the maximum number for the line search loop. In contrast, for LDMLR, the computational complexity in each iteration is $O(\max(n^2 d^2, nKd^3))$, which is higher than that of S3-S6 in Algorithm 2, no matter whether $n > d$ or $n < d$.

Table 1. Computational Complexity for Algorithm 2 and LDMLR.

Algorithm 2                               LDMLR
Step   Complexity                         Step                        Complexity
S0     O(sd)                              Initialize                  O(d^2)
S1     O(n^2)                             K target neighbor search    O(dn^2 + Kn^2)
S2     O(n^3)
S3     O(dn^2 + Kn^2)
S4*    O(nsd + nK(d + sd + s^2))          ∇h(A)*                      O(n^2 d^2 + nKd^3)
S5*    O(r(nKsd + nKd^2))                 Π_{S^d_+}(·)*               O(d^3)
S6*    O(ds)

5. Numerical Results

In this section, we present some numerical results to verify the efficiency of the proposed model. To evaluate the performance of the model, we employ the following popular procedure to assess the image ranking model. For a given dataset, we divide it into a training set and a testing set. We first learn a distance metric based on the training set, then apply it to rank each image in the testing set. Denote by $\{m_i\}_{i=1}^N$ the images in the testing set, where $N$ is the size of the testing set. The estimated label $\hat p_i$ is obtained based on the distance in the new space. We employ the popular $k$-nearest neighbor regression to obtain $\hat p_i$, which is used in Li et al. (2015), Weinberger and Saul (2009). The mean absolute error

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^N |\hat p_i - p_i|$$

is used as a measure to evaluate the performance. Here $p_1, \dots, p_N$ are the true labels of the test data $m_1, \dots, m_N$.

We test the proposed method on the UMIST dataset (Graham and Allinson (1998)) and the FG-NET dataset (Lanitis (2008)). We also compare our method with the LDMLR method in Li et al. (2015). For each test problem, we repeat each experiment 50 times and report the average results. The algorithm is implemented in Matlab R2016a and is run on a computer with an Intel Core 2 Duo CPU E7500 2.93GHz and 2GB of RAM.

5.1.
Experiments on the UMIST image dataset

The UMIST face dataset is a multiview dataset which consists of 575 images of 20 people, each covering a wide range of poses from profile to frontal views. Fig. 1 shows some examples from the UMIST dataset.

Fig. 1. Some examples from the UMIST face dataset.

Based on the query "man wearing glasses", we can label the dataset in the following way: a man wearing glasses is regarded as completely relevant, which is labeled as 2 in our experiment; a man not wearing glasses or a woman wearing glasses is regarded as partially relevant, which is labeled as 1; a woman not wearing glasses is regarded as irrelevant, which is labeled as 0. Thus, there are 225, 239 and 111 images in the three categories, respectively. The dimension of the original data is 10304.

In this experiment, for LDMLR, we set the iteration number $T_{\max} = 30$ and the tradeoff parameter $\mu = 10^{3}$ according to Li et al. (2015). For our method, we set the parameters $\mu = 10^{10}$, $\gamma = 10^{9}$, $\rho = 0.5$, $\sigma = 0.05$, and the maximum number for the line search loop is $r = 20$. To get an EDM $D$ in (8), we set the additive parameter in (8) to $\tau = 1$ ($\lambda_0 = 0$ in this situation). To apply our algorithm, we first use PCA to reduce the dimension, as done in Li et al. (2015). When using PCA, we center the data but do not scale the data. The final dimension is 150, i.e., $d = 150$.

Role of the Embedding Dimension s and Distance Metric. To see the role of the embedding dimension $s$ and the distance metric, we do the following test. We randomly select 10 images from each label for training and use the rest for testing. The images in the training set are grouped as follows: the training data $x_1, \dots, x_{10}$ are of label 1, $x_{11}, \dots, x_{20}$ are of label 2, and $x_{21}, \dots, x_{30}$ are of label 3. Then there are $n = 30$ training data in total. We fix the number of target neighbors as $K = 5$.

To choose a proper embedding dimension, we tried several values for $s$, i.e., $s = 2, 3, 5, 8, 10$. The preliminary results are reported in Table 2. Since $n = 30$ is not so big, we directly apply cMDS to $D$ in S2 of Algorithm 2. The observation is that $s = 3$ and $s = 5$ are the best in terms of MAE. Taking visualization into account, we choose $s = 3$ in our following tests.

Table 2. Results of cMDS-DML on the UMIST dataset, with different values of dimension s.

s       2        3        5        8        10
MAE     0.3539   0.3463   0.3498   0.3684   0.3798
STD     0.0812   0.0671   0.0640   0.0762   0.0830
t(s)    2.27     3.10     4.58     6.60     10.08

Then we compute the Euclidean distances between the training data $x_i, x_j$, $i, j = 1, \dots, 30$. Fig. 2 shows $\|x_i - x_1\|$, the Euclidean distance between $x_i$ and the first data point $x_1$, $i = 1, \dots, n$. It is observed that the distance between $x_1$ and $x_{12}$ is less than the distance between $x_1$ and $x_8$. Moreover, the distance between $x_1$ and $x_{16}$ is bigger than the distance between $x_1$ and $x_{22}$. This implies that the Euclidean distances between the original images cannot be used for ranking.

With embedding dimension $s = 3$, we apply our method to learn $L$.
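The learning step itself, i.e., minimizing model (10) by steepest descent with an Armijo backtracking line search as in Algorithm 2, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' Matlab code: the names `target_pairs`, `f_and_grad` and `fit`, the random data, the stand-in targets `Y` (labels placed on a line instead of a cMDS embedding) and all parameter values are hypothetical, and the fitting-term gradient carries the factor 2 obtained by differentiating (10) directly.

```python
import numpy as np

def target_pairs(X, r, K):
    """Pairs (i, j) with eta_ij = 1: x_j is one of the K nearest
    same-label points of x_i under the Euclidean distance."""
    n = X.shape[1]
    pairs = []
    for i in range(n):
        same = np.flatnonzero((r == r[i]) & (np.arange(n) != i))
        dist = np.linalg.norm(X[:, same] - X[:, [i]], axis=0)
        pairs.extend((i, j) for j in same[np.argsort(dist)[:K]])
    return np.array(pairs)

def f_and_grad(L, c, X, Y, pairs, mu):
    """Objective (10) and its gradients with respect to L and c."""
    R = L @ X - c * Y                                  # columns L x_i - c y_i
    U = X[:, pairs[:, 0]] - X[:, pairs[:, 1]]          # columns x_i - x_j
    LU = L @ U
    g = (LU ** 2).sum(axis=0) - (U ** 2).sum(axis=0)   # ||L u||^2 - ||u||^2
    f = (R ** 2).sum() + mu * (g ** 2).sum()
    grad_L = 2.0 * R @ X.T + 4.0 * mu * (LU * g) @ U.T
    grad_c = -2.0 * (R * Y).sum()
    return f, grad_L, grad_c

def fit(X, Y, pairs, mu, gamma=1.0, rho=0.5, sigma=0.05, eps=1e-6,
        max_iter=200, max_backtracks=20):
    """Steepest descent with Armijo backtracking from L^0 = (e_1..e_s)^T, c_0 = 1."""
    L, c = np.eye(Y.shape[0], X.shape[0]), 1.0
    for _ in range(max_iter):
        f, gL, gc = f_and_grad(L, c, X, Y, pairs, mu)
        g2 = (gL ** 2).sum() + gc ** 2                 # squared gradient norm
        if np.sqrt(g2) <= eps:
            break
        step = gamma * rho                             # first trial step rho^1 * gamma
        for _ in range(max_backtracks):
            f_new = f_and_grad(L - step * gL, c - step * gc, X, Y, pairs, mu)[0]
            if f_new - f <= -sigma * step * g2:        # Armijo sufficient decrease
                break
            step *= rho
        else:
            break                                      # no acceptable step found
        L, c = L - step * gL, c - step * gc
    return L, c

rng = np.random.default_rng(0)
r = np.repeat([1.0, 2.0, 3.0], 4)                      # three ordinal labels
X = rng.standard_normal((5, r.size)) + r               # toy data, d = 5, n = 12
X -= X.mean(axis=1, keepdims=True)                     # zero-centered as in Assumption 2
Y = np.vstack([r - r.mean(), np.zeros_like(r)])        # stand-in targets, s = 2
pairs = target_pairs(X, r, K=2)
mu = 1e-2
f0 = f_and_grad(np.eye(2, 5), 1.0, X, Y, pairs, mu)[0]
L, c = fit(X, Y, pairs, mu)
f1 = f_and_grad(L, c, X, Y, pairs, mu)[0]
```

Because every accepted step satisfies the Armijo condition, `f1` is never larger than `f0`. Once `L` is available, the distance between two images under $A = L^T L$ is simply `np.linalg.norm(L @ (x_i - x_j))`.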
After learning $L$, the embedding points of the training data in the three dimensional space can be found, i.e., $L x_i$, $i = 1, \dots, 30$. Fig. 3 plots the embedding points. As we can see, points with the same label cluster tightly together. However, the distances between points with different labels cannot be clearly seen from Fig. 3. We use the learned $L$ to measure the distances between the training data. Fig. 4 illustrates the distances between $x_i$ and $x_1$ under $L$, i.e., $\|L x_i - L x_1\|$, $i = 1, \dots, n$. Comparing Fig. 4 with Fig. 2, we can see that the data is much better layered with the $L$ distance than with the Euclidean distance. Hence the proposed model does preserve the ordinal relationship.

Fig. 2. The Euclidean distance between $x_i$ and $x_1$, i.e., $\|x_i - x_1\|$.

Fig. 3. The embedding data points of the training data points in the three dimensional space, i.e., $L x_i$.

Fig. 4. The distance between $x_i$ and $x_1$ under $L$, i.e., $\|L(x_i - x_1)\|$.

Comparison with LDMLR. Now we compare with LDMLR in Li et al. (2015). First, we randomly select 10 images from each distinct label as the training data and use the rest for testing. Different values of $K$ are chosen to investigate the performance. Table 3 gives the results, including MAE, STD (standard deviation), and CPU time in seconds. We can see that in all cases, cMDS-DML uses much less time than LDMLR, which is not surprising since our method has lower computational complexity. In terms of MAE, cMDS-DML also outperforms LDMLR.

Table 3. Results of cMDS-DML and LDMLR on the UMIST dataset, with different values of target neighbors K and fixed n = 30.

K     Measure   cMDS-DML   LDMLR
4     MAE       0.3488     0.4291
      STD       0.0684     0.0760
      t(s)      2.88       10.26
5     MAE       0.3463     0.4676
      STD       0.0671     0.0735
      t(s)      3.10       10.39
6     MAE       0.3521     0.4782
      STD       0.0689     0.0724
      t(s)      3.32       11.92

Next, to evaluate the influence of the dimension on the performance of our method, we increase the dimension $d$ while fixing the size of the training set $n = 30$ and the number of target neighbors $K = 5$. Table 4 lists the ranking results. It can be seen that as $d$ increases, the resulting MAE of both algorithms is not sensitive to $d$. As for computing time, as $d$ increases, LDMLR obviously costs more time while the CPU time for our method is fairly stable. This is reasonable since the computational complexity of our method is proportional to $d^2$ while that of LDMLR is proportional to $d^3$.

Table 4. Results of cMDS-DML and LDMLR on the UMIST dataset, with different values of dimension d.

d     Measure   cMDS-DML   LDMLR
150   MAE       0.3463     0.4676
      STD       0.0671     0.0735
      t(s)      3.10       10.39
200   MAE       0.3518     0.4695
      STD       0.0668     0.0730
      t(s)      3.18       17.21
250   MAE       0.3545     0.4708
      STD       0.0674     0.0742
      t(s)      4.15       23.90

Finally, we increase the size of the training set $n$ with fixed dimension $d = 150$ and number of target neighbors $K = 5$. We randomly select $n/3$ images from each distinct label for training and report the results in Table 5. As $n$ increases, the performance of both methods becomes better, which is reasonable. cMDS-DML achieves higher ranking performance than LDMLR. In particular, cMDS-DML achieves 25.94%, 36.90% and 39.36% improvement in MAE over LDMLR, respectively, where the improvement is measured by |MAE(LDMLR) - MAE(cMDS-DML)| / MAE(LDMLR). Moreover, cMDS-DML is also faster than LDMLR.

Table 5. Results of cMDS-DML and LDMLR on the UMIST dataset, with different sizes of the training set n.
n cMDS-DML LDMLR MAE 0.3463 0.4676 30 STD 0:0671 0:0735 t(s) 3.10 10.39 MAE 0.1958 0.3103 60 STD 0:0478 0:0465 t(s) 4.94 39.13 MAE 0.1373 0.2264 90 STD 0:0329 0:0319 t(s) 5.91 79.92 5.2. Experiments on the FG-NET dataset In this experiment, we test our algorithm on the FG-NET dataset which is labeled by age. The FG-NET dataset contains 1002 face images. There are 82 subjects in total with the age ranges from 1 to 69. Fig. 5 shows some examples from the FG-NET dataset. To get better performance of LDMLR, we set the 3 10 iteration number T = 50 and the tradeo parameter = 10 . For our method, = 10 ,
= 10, max = 0.5, = 0.05, the embedding dimension s = 3 and the maximum number of line search loops is r = 20.

Fig. 5. Some examples from the FG-NET dataset.

Fig. 6. The embedding of the training data points in the three-dimensional space, i.e., $Lx_i$.

We pick subjects with ages 1, 5, 9, 15, 19 and relabel them as 1, 2, 3, 4, 5. There are 27, 40, 25, 30, 23 images in the five categories, respectively. The original dimension of the images is 136. As in subsection 5.1, we preprocess the data by PCA to reduce the dimension to 80. We randomly select 8 images from each distinct label for training and set K = 5. Fig. 6 plots the embedding of the training data points in the three-dimensional space, i.e., $Lx_i$, $i = 1, \dots, 40$. As we can see, points with the same label largely cluster together.

Next, we randomly select 10 images from each distinct label for training and use the rest for testing. That is, the size of the training set is n = 50. We also set = 1 in (8). We set different values for the number of target neighbors K to investigate the performance. Table 6 lists the experimental results. In the three cases, cMDS-DML achieves 48.55%, 47.27% and 46.57% improvement in MAE over LDMLR, respectively.

Finally, we fix the number of target neighbors at K = 5. We randomly select n/5 images from each distinct label for training. The size of the training set is chosen as n = 40, 50, 75. See Table 7 for the results, which again verify the efficiency of the proposed model.

Overall, our numerical results show that cMDS-DML outperforms LDMLR significantly in both ranking performance and CPU time.

Table 6. Results of cMDS-DML and LDMLR on the FG-NET dataset, with different values of target neighbors K.

K             cMDS-DML    LDMLR
4     MAE     0.7295      1.4179
      STD     0.0694      0.1228
      t(s)    2.70        21.59
5     MAE     0.7109      1.3482
      STD     0.0657      0.0942
      t(s)    3.11        18.37
6     MAE     0.7095      1.3278
      STD     0.0682      0.1141
      t(s)    3.86        25.43

Table 7. Results of cMDS-DML and LDMLR on the FG-NET dataset, with different sizes of the training set n.

n             cMDS-DML    LDMLR
40    MAE     0.7613      1.3634
      STD     0.0845      0.1144
      t(s)    2.34        11.13
50    MAE     0.7377      1.3482
      STD     0.0657      0.0942
      t(s)    4.56        18.37
75    MAE     0.7243      1.2551
      STD     0.0809      0.1023
      t(s)    29.97       40.04

6. Conclusions

In this paper, we proposed a so-called cMDS-DML approach for image ranking, which unifies the idea of classical multidimensional scaling and distance metric learning. The algorithm enjoys low computational complexity, compared with LDMLR in Li et al. (2015). Numerical results verified the efficiency of the new approach and the improvement over LDMLR.

References

Anjos, M F and J B Lasserre (2012). Handbook on Semidefinite, Conic and Polynomial Optimization. Springer US.
Bar-Hillel, A, T Hertz, N Shental and D Weinshall (2003). Learning distance functions using equivalence relations. In Proceedings of the Twentieth International Conference on Machine Learning.
Borg, I and P J F Groenen (2005). Modern Multidimensional Scaling. Springer.
Dai, M, Z Lu, D Shen, H Wang, B Chen, X Lin, S Zhang, L Zhang and H Liu (2016). Design of (4, 8) binary code with MDS and zigzag-decodable property. Wireless Personal Communications, 89(1):1-13.
Dattorro, J (2008). Convex Optimization and Euclidean Distance Geometry. Meboo Publishing.
Ding, C and H D Qi (2016). Convex optimization learning of faithful Euclidean distance representations in nonlinear dimensionality reduction. Mathematical Programming, 164(1):341-381.
Ding, C and H D Qi (2017). Convex Euclidean distance embedding for collaborative position localization with NLOS mitigation. Computational Optimization and Applications, 66(1):187-218.
Dokmanic, I, R Parhizkar, J Ranieri and M Vetterli (2015).
Euclidean distance matrices: Essential theory, algorithms, and applications. IEEE Signal Processing Magazine, 32(6):12-30.
Cailliez, F (1983). The analytical solution of the additive constant problem. Psychometrika, 48(2):305-
Goldberger, J, S T Roweis, G Hinton and R Salakhutdinov (2004). Neighbourhood components analysis. In Proceedings of the 17th International Conference on Neural Information Processing Systems.
Gower, J C (1985). Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications, 67:81-97.
Graham, D B and N M Allinson (1998). Characterising virtual eigensignatures for general purpose face recognition. Face recognition: From theory to applications. NATO ASI Series F, Computer and Systems Sciences, 163:446-456.
Gutierrez, P A, M Perez-Ortiz, J Sanchez-Monedero, F Fernandez-Navarro and C Hervas-Martinez (2016). Ordinal regression methods: survey and experimental study. IEEE Transactions on Knowledge and Data Engineering, 28(1):127-146.
Lanitis, A (2008). Comparative evaluation of automatic age progression methodologies. EURASIP Journal on Advances in Signal Processing, 2008:1-10.
Li, C, Q Liu, J Liu and H Lu (2015). Ordinal distance metric learning for image ranking. IEEE Transactions on Neural Networks and Learning Systems, 26(7):1551-1559.
Li, Q and H D Qi (2017). An inexact smoothing Newton method for Euclidean distance matrix optimization under ordinal constraints. Journal of Computational Mathematics, 35(4):467-483.
Liberti, L, C Lavor, N Maculan and A Mucherino (2014). Euclidean distance geometry and applications. SIAM Review, 56(1):3-69.
Nocedal, J and S J Wright (2006). Numerical Optimization. Springer New York.
Qi, H D (2013). A semismooth Newton method for the nearest Euclidean distance matrix problem. SIAM Journal on Matrix Analysis and Applications, 34(1):67-93.
Qi, H D, N Xiu and X Yuan (2013).
A Lagrangian dual approach to the single-source localization problem. IEEE Transactions on Signal Processing, 61(15):3815-3826.
Qi, H D and X Yuan (2014). Computing the nearest Euclidean distance matrix with low embedding dimensions. Mathematical Programming, 147:351-389.
Qiao, X (2015). Noncrossing ordinal classification. Statistics.
Schoenberg, I J (1935). Remarks to Maurice Frechet's article "Sur la definition axiomatique d'une classe d'espace distances vectoriellement applicable sur l'espace de Hilbert". Annals of Mathematics, 36(3):724-
Shalev-Shwartz, S, Y Singer and A Y Ng (2004). Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-first International Conference on Machine Learning.
Shen, C, J Kim and L Wang (2010). Scalable large-margin Mahalanobis distance metric learning. IEEE Transactions on Neural Networks, 21(9):1524-1530.
Sugiyama, M (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8(1):1027-1061.
Toh, K C (2007). An inexact primal-dual path-following algorithm for convex quadratic SDP. Mathematical Programming, 112:221-254.
Torgerson, W S (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4):401-419.
Wang, H, Y Shi, L Niu and Y Tian (2017). Nonparallel support vector ordinal regression. IEEE Transactions on Cybernetics, 47(10):3306-3317.
Weinberger, K Q and L K Saul (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(1):207-244.
Xiang, S, F Nie and C Zhang (2008). Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition, 41(12):3600-3612.
Xing, E P, A Y Ng, M I Jordan and S Russell (2003). Distance metric learning, with application to clustering with side-information. In Proceedings of the Conference on Neural Information Processing Systems.
Yang, L, R Jin and R Sukthankar (2007). Bayesian active distance metric learning. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence.
Young, G and A S Householder (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3(1):19-22.
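
Supplementary note. The relative MAE improvements quoted in Section 5 follow the ratio |MAE(LDMLR) - MAE(cMDS-DML)| / MAE(LDMLR) and can be recomputed from the MAE rows of Tables 5 and 6. The Python sketch below is only an illustration of that arithmetic. The function names relative_improvement and improvements_percent and the dictionary layout are assumptions introduced for this note and are not part of the original experiments.

```python
def relative_improvement(mae_baseline, mae_new):
    """Return |MAE(baseline) - MAE(new)| / MAE(baseline)."""
    return abs(mae_baseline - mae_new) / mae_baseline


# MAE values copied from Table 5 (UMIST, varying n) and Table 6 (FG-NET, varying K).
# Each entry maps the varied parameter to (MAE of cMDS-DML, MAE of LDMLR).
# The dictionary layout is a hypothetical convenience, not the authors' data format.
table5_umist = {30: (0.3463, 0.4676), 60: (0.1958, 0.3103), 90: (0.1373, 0.2264)}
table6_fgnet = {4: (0.7295, 1.4179), 5: (0.7109, 1.3482), 6: (0.7095, 1.3278)}


def improvements_percent(table):
    """Relative improvement of cMDS-DML over LDMLR in percent, rounded to 2 decimals."""
    return {
        key: round(100 * relative_improvement(ldmlr, cmds), 2)
        for key, (cmds, ldmlr) in table.items()
    }


umist_gain = improvements_percent(table5_umist)
fgnet_gain = improvements_percent(table6_fgnet)
print("UMIST (by n):", umist_gain)
print("FG-NET (by K):", fgnet_gain)
```

With the tabulated MAE values, the rounded results agree with the percentages stated in the text for n = 30, 60, 90 on UMIST and for K = 4, 5, 6 on FG-NET.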