Research Article

Interval-based KKT framework for support vector machines and beyond

Article: 2334017 | Received 11 Nov 2023, Accepted 19 Mar 2024, Published online: 27 Mar 2024

Abstract

Our article proves inequalities for interval optimization and shows that, in the constrained case, the cones of feasible and descent directions do not intersect. Mainly, we establish new interval inequalities for interval-valued functions by defining the LC-partial order. We use the LC-partial order to study Karush–Kuhn–Tucker (KKT) conditions and extend Gordan's theorems to interval linear inequality systems. By applying Gordan's theorem, we derive optimality conditions for constrained interval optimization problems (IOPs), namely the Fritz John and KKT conditions. The optimality conditions are stated with inclusion relations rather than equality. We apply the KKT conditions to binary classification with interval data and support vector machines (SVMs). We present some examples to illustrate our results.

1. Introduction

Optimization theory is applied in a variety of industries, including engineering. Data collection and quantification are crucial in modelling an optimization problem. The data of an optimization model are typically derived from measurements or observations, and acquired data sets are frequently published with an error percentage or imprecision. Fuzzy numbers or intervals are appropriate representations for such data. As a result, the parameters and coefficients used to formulate the modelled constraint and objective functions from the obtained data are represented as intervals or fuzzy numbers.

Optimization is the process of determining the best available values across a collection of inputs to maximize or minimize an objective function. Whether utilizing supervised or unsupervised learning, all deep learning models rely on some form of optimization.

IOPs have been the subject of various investigations. Many existing strategies optimize the objective function's upper or lower function, or their mean, based on worst- or best-case scenarios; the resulting single-valued optimization problems can then be addressed by conventional optimization methods. The KKT optimality conditions have also been applied to IOPs, a topic studied extensively by Wu [Citation1–3].

Using a generalized derivative, Chalco-Cano et al. [Citation4] proposed KKT optimality conditions for IOPs. The generalized derivatives and the partial ordering of intervals were utilized by Singh et al. [Citation5,Citation6] to define KKT conditions for IOPs combining upper and lower functions. It is important to note that existing research on IOP optimality conditions relies on algebraic manipulations rather than a geometrical analysis of an optimal point to generalize classical optimality conditions.

Mathematical and theoretical modelling are crucial components of optimization problems [Citation7]. In practice, determining the coefficients of an objective function as exact real numbers is generally difficult, since a wide range of real-world problems involve data imprecision due to measurement errors or other unanticipated circumstances. Robust optimization [Citation8] and interval-valued optimization are two methods of deterministic optimization that deal with uncertain data.

Slowinski and Teghem compared two different approaches to multi-objective programming problems under uncertainty [Citation9]. The KKT optimality criteria have been studied for decades and play a significant role in optimization theory. Wu [Citation1] and Singh and Dar [Citation6] developed KKT optimality conditions for optimization problems with interval-valued objective functions and constraints. Chalco-Cano and Lodwick [Citation4] used the generalized derivative to examine the KKT optimality criteria of interval-valued optimization problems.

SVMs are a type of machine learning technique that may be used for data classification and prediction. Training and testing data are required to develop an SVM [Citation10]. The user sorts the training data into the appropriate categories, and an optimization problem is solved to build a model from this data. The model generates a hyperplane that linearly separates the testing data into the correct categories.

SVMs are cutting-edge machine-learning approaches with a foundation in structural risk minimization [Citation11,Citation12]. The structural risk minimization theory of Vapnik [Citation13] states that a function from a function class has a low expected risk on data from an underlying distribution if that function has a low empirical risk on a data set sampled from that distribution and the function class has low complexity. By exploiting the fact that a larger margin corresponds to a smaller fat-shattering dimension for the relevant function classes, the well-known large-margin technique in SVMs [Citation14] effectively limits the complexity of the function class.

The derivative is the most commonly used tool in classical optimization theory. In constrained optimization problems, it is useful for studying optimality criteria and duality theorems. The H-derivative is a well-known notion for interval-valued functions (IVFs), but it has limitations. Stefanini and Bede [Citation15] proposed the gH-derivative in 2009 to address the shortcomings of the H-derivative. These derivative notions for IVFs have been widely used by researchers in optimization; for example, Wu [Citation3] used the H-derivative to examine the KKT conditions for nonlinear IOPs.

It is impossible to overstate the importance of derivatives in nonlinear IOPs. Wu [Citation1–3] explored interval-valued nonlinear programming problems and showed how the H-derivative may be used in interval-valued KKT optimality conditions. Chalco-Cano et al. [Citation4] also employed gH-differentiability to derive interval-valued KKT optimality conditions.

Ghosh et al. [Citation16] extended the KKT conditions for IOPs and applied them to interval-valued SVMs using the LU-partial order. In [Citation17,Citation18], researchers recently discussed generalized interval-valued portfolio optimization problems. It is well known that the set $\mathbb{I}$ of all compact intervals is a partially ordered set. In [Citation19], Younus and Nisar defined several partial orders on $\mathbb{I}$ and obtained relationships between them. They showed that the LC-partial order is not equivalent to the LU-partial order; however, the LC-partial order implies the LU-partial order. For some related results, see also Dastgeer et al. [Citation20]. Building on these studies, we extend the results of [Citation16] to the LC-partial order and identify some variations. Motivated by the gH-differentiability of interval-valued functions and the recent LC-partial order, we discuss all optimality conditions and the SVM problem under gH-differentiability and the LC-partial order, which generalizes many results in the literature.

We would now like to outline the contributions of this work.

Interval optimization inequalities: The article introduces and proves inequalities specifically tailored for IOPs. This suggests a departure from traditional optimization techniques and an exploration of methods suited to handling intervals.

Non-intersecting directions: In constrained scenarios, the article demonstrates that feasible and descent directions do not intersect. This observation likely has implications for optimization algorithms and could lead to more efficient optimization strategies.

LC-partial order: The authors build on the recently defined LC-partial order [Citation19], a mathematical framework that is a key component of their approach. This concept may have applications beyond the specific problems discussed in the article.

Extension of Gordan's theorems: The article extends Gordan's theorems to interval linear inequality systems. This extension could have broader implications in mathematical theory and its application to optimization.

Inclusion-based optimality conditions: Instead of traditional equality-based optimality conditions, the article suggests the use of inclusion relations for constrained IOPs. This shift in perspective may lead to new insights and methods for solving such problems.

Application to binary classification: The article highlights the application of KKT conditions for binary classification with interval data and support vector machines. This application demonstrates the practical relevance of the theoretical developments presented in the article.

Illustrative examples: The authors provide examples to illustrate their results, which can help readers to understand the practical implications and potential applications of their findings.

Section 2 provides the basic concepts, definitions, and notations used in the article. The KKT and Fritz John's criteria for IOPs are derived in Section 3 along with extended Gordan's theorems. For both constrained and unconstrained IOPs, we develop the optimality conditions. In Section 4, we apply the optimality conditions given in Section 3 to solve the SVM classification problem on the interval-valued data set. We provide an example of the generated classifier. We give a graphical representation of the classification problem. We also present the conclusion and future scope in Section 5.

2. Preliminaries

Notations

We used the following notations throughout this article:

  • All the capital and bold letters denote interval-valued functions or intervals.

  • $\mathbb{I}$ represents the set of all bounded and closed intervals in $\mathbb{R}$, and $\mathbb{I}^n$ denotes the set of interval-valued vectors.

  • Sets are represented by ordinary capital letters.

  • $C \ominus_{gH} D$ is the gH-difference of two intervals $C$ and $D$.

  • $C \oplus D$ signifies interval addition.

  • $C \ominus D$ is the subtraction of two intervals $C$ and $D$.

  • $kC$ represents the scalar multiplication of an interval $C$ by a real number $k$.

  • $0_v^n$ denotes the zero interval vector with $n$ components.

  • $|K|$ represents the cardinality of the set $K$.

Definition 2.1

The addition and subtraction of intervals $C = [\underline{c}, \overline{c}]$ and $D = [\underline{d}, \overline{d}]$ are defined by $$C \oplus D = [\underline{c} + \underline{d},\ \overline{c} + \overline{d}], \qquad C \ominus D = [\underline{c} - \overline{d},\ \overline{c} - \underline{d}].$$ Similarly, for scalar multiplication, $$kC = \begin{cases} [k\underline{c},\ k\overline{c}] & \text{if } k \ge 0,\\ [k\overline{c},\ k\underline{c}] & \text{if } k < 0, \end{cases}$$ where $k$ is a real constant. It can be seen that this definition of the interval difference has the following two limitations:

(i) $C \ominus C \neq \{0\}$, and

(ii) for $A = C \ominus D$, the relation $C = D \oplus A$ does not necessarily hold.

Definition 2.2

[Citation21]

Let $C$ and $D$ be two intervals in $\mathbb{I}$. The gH-difference between the two intervals is defined by $$A = C \ominus_{gH} D = [\min\{\underline{c} - \underline{d},\ \overline{c} - \overline{d}\},\ \max\{\underline{c} - \underline{d},\ \overline{c} - \overline{d}\}].$$
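To make these operations concrete, the following minimal Python sketch (our illustration, not code from the paper) implements the arithmetic of Definitions 2.1 and 2.2 with intervals stored as (lower, upper) pairs, and checks the two limitations noted above.

```python
# Interval arithmetic of Definitions 2.1-2.2; intervals are (lo, hi) tuples.

def add(C, D):        # C (+) D
    return (C[0] + D[0], C[1] + D[1])

def sub(C, D):        # C (-) D, the Minkowski difference of Definition 2.1
    return (C[0] - D[1], C[1] - D[0])

def scal(k, C):       # scalar multiplication kC
    return (k * C[0], k * C[1]) if k >= 0 else (k * C[1], k * C[0])

def gh_sub(C, D):     # C (-)_gH D, the gH-difference of Definition 2.2
    a, b = C[0] - D[0], C[1] - D[1]
    return (min(a, b), max(a, b))

C, D = (1.0, 3.0), (0.0, 2.0)
print(sub(C, C))                   # (-2.0, 2.0): limitation (i), C - C != {0}
print(gh_sub(C, C))                # (0.0, 0.0): the gH-difference repairs this
print(add(D, sub(C, D)) == C)      # False: limitation (ii)
print(add(D, gh_sub(C, D)) == C)   # True here, since Len(C) >= Len(D)
```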

Definition 2.3

Let $A \subseteq \mathbb{R}^n$ and let $x^0 = (x_1^0, x_2^0, x_3^0, \ldots, x_n^0)$ be an interior point of $A$, so that there exists $h \in \mathbb{R}^n$ with $x^0 + h \in A$. Let $F: A \to \mathbb{I}$ be a function and define $$\Psi_j(x_j) = F(x_1^0, \ldots, x_{j-1}^0, x_j, x_{j+1}^0, \ldots, x_n^0).$$ If the limit $$\lim_{h_j \to 0} \frac{\Psi_j(x_j^0 + h_j) \ominus_{gH} \Psi_j(x_j^0)}{h_j}$$ exists, then $F$ is said to have the $j$th gH-partial derivative at $x^0$, represented by $D_jF(x^0)$, $j = 1, 2, \ldots, n$. The gH-partial derivatives of $F$ at $x^0$ can be written as $$D_jF(x^0) = \left[\min\left\{\frac{\partial \underline{F}}{\partial x_j}(x^0),\ \frac{\partial \overline{F}}{\partial x_j}(x^0)\right\},\ \max\left\{\frac{\partial \underline{F}}{\partial x_j}(x^0),\ \frac{\partial \overline{F}}{\partial x_j}(x^0)\right\}\right], \quad j = 1, 2, \ldots, n.$$
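As a quick numerical illustration (our sketch, with a hypothetical one-dimensional function), the gH-difference quotient of Definition 2.3 can be approximated by finite differences; for $F(x) = [x, 2x]$ with $x > 0$, the gH-partial derivative at $x^0 = 1$ is $[1, 2]$.

```python
# Finite-difference approximation of the gH-difference quotient
# in Definition 2.3 for F(x) = [x, 2x] (x > 0) at x0 = 1.

def gh_sub(C, D):
    a, b = C[0] - D[0], C[1] - D[1]
    return (min(a, b), max(a, b))

def F(x):
    return (x, 2 * x)     # lower and upper endpoint functions

x0, h = 1.0, 1e-6
num = gh_sub(F(x0 + h), F(x0))      # F(x0 + h) (-)_gH F(x0)
quot = (num[0] / h, num[1] / h)     # divide endpoints by h > 0
print(quot)                          # approximately (1.0, 2.0)
```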

Definition 2.4

Consider an interval-valued function $F$. The gH-gradient of $F$ at any point $x^0 \in A$ is the vector $$\nabla F(x^0) = (D_1F(x^0), D_2F(x^0), D_3F(x^0), \ldots, D_nF(x^0))^T.$$

Definition 2.5

Consider a function $F: A \to \mathbb{I}$. $F$ is called gH-differentiable at $x^0 \in A$ if there exist two interval-valued functions $J(x^0; k)$ and $G_{x^0}(k): \mathbb{R}^n \to \mathbb{I}$ such that $$F(x^0 + k) \ominus_{gH} F(x^0) = G_{x^0}(k) \oplus \|k\| J(x^0; k)$$ for $\|k\| < \delta$ for some $\delta > 0$, where $\lim_{\|k\| \to 0} J(x^0; k) = 0$ and $G_{x^0}$ is a function such that $$G_{x^0}(a x^1 + a x^2) = a G_{x^0}(x^1) \oplus a G_{x^0}(x^2), \quad \forall x^1, x^2 \in A \text{ and } a \in \mathbb{R}.$$

Definition 2.6

[Citation19]

For two intervals $A = [\underline{a}, \overline{a}]$ and $B = [\underline{b}, \overline{b}]$, we define the LC-partial order as: $A \preceq_{LC} B$ if $$\underline{a} \le \underline{b} \quad \text{and} \quad C(A) \le C(B),$$ where $C(A) = \frac{\underline{a} + \overline{a}}{2}$ and $C(B) = \frac{\underline{b} + \overline{b}}{2}$.
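A small sketch (our illustration) of Definition 2.6: the check below encodes $A \preceq_{LC} B$ and also shows a pair of intervals that are LC-incomparable, so the order is genuinely partial.

```python
# The LC-partial order of Definition 2.6 on (lo, hi) interval pairs.

def mid(X):                      # the centre C(X) of an interval
    return (X[0] + X[1]) / 2.0

def lc_leq(A, B):                # A <=_LC B
    return A[0] <= B[0] and mid(A) <= mid(B)

A, B = (1.0, 5.0), (2.0, 4.0)
print(lc_leq(A, B), lc_leq(B, A))   # True False: 1 <= 2 and 3 <= 3
P, Q = (1.0, 2.0), (0.0, 10.0)
print(lc_leq(P, Q), lc_leq(Q, P))   # False False: P and Q are incomparable
```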

Definition 2.7

A vector $y$ which gives the minimum value of the objective function $b^T y$ in an optimization problem over the set of vectors satisfying the constraints $By = d$, $y \ge 0$, is called an optimal solution.

Lemma 2.8

For any $C$ and $D$ in $\mathbb{I}$ such that $C \preceq_{LC} D$, we have $C \ominus_{gH} D \preceq_{LC} 0$.

Proof.

Let $C = [\underline{c}, \overline{c}]$ and $D = [\underline{d}, \overline{d}]$ be two intervals in $\mathbb{I}$. Suppose that $C \preceq_{LC} D$, i.e. $[\underline{c}, \overline{c}] \preceq_{LC} [\underline{d}, \overline{d}]$. It implies that
(1) $\underline{c} \le \underline{d}$
and
(2) $\frac{\underline{c} + \overline{c}}{2} \le \frac{\underline{d} + \overline{d}}{2}$.
As we know, $$C \ominus_{gH} D = [\min\{\underline{c} - \underline{d},\ \overline{c} - \overline{d}\},\ \max\{\underline{c} - \underline{d},\ \overline{c} - \overline{d}\}].$$
Case 1: If $\overline{c} - \overline{d} \le \underline{c} - \underline{d}$, then $C \ominus_{gH} D = [\overline{c} - \overline{d},\ \underline{c} - \underline{d}]$, and from inequality (1) we have $\overline{c} - \overline{d} \le \underline{c} - \underline{d} \le 0$. It implies that
(3) $\overline{c} - \overline{d} \le 0$.
From (2), $\frac{\underline{c} + \overline{c}}{2} - \frac{\underline{d} + \overline{d}}{2} \le 0$, and it follows that
(4) $\frac{(\overline{c} - \overline{d}) + (\underline{c} - \underline{d})}{2} \le 0$.
From (3) and (4), $C \ominus_{gH} D \preceq_{LC} 0$.

Case 2: If $\underline{c} - \underline{d} \le \overline{c} - \overline{d}$, then $C \ominus_{gH} D = [\underline{c} - \underline{d},\ \overline{c} - \overline{d}]$. From (1),
(5) $\underline{c} - \underline{d} \le 0$,
and from (5) and (4), $C \ominus_{gH} D \preceq_{LC} 0$. This completes the proof.

Remark 2.9

Let $C = [\underline{c}, \overline{c}]$ and $D = [\underline{d}, \overline{d}]$, and write $\mathrm{Len}(C) := \overline{c} - \underline{c}$ and $\mathrm{Len}(D) := \overline{d} - \underline{d}$. We know from [Citation22]: $$C \ominus_{gH} D = \begin{cases} [\underline{c} - \underline{d},\ \overline{c} - \overline{d}] & \text{if } \mathrm{Len}(C) \ge \mathrm{Len}(D),\\ [\overline{c} - \overline{d},\ \underline{c} - \underline{d}] & \text{if } \mathrm{Len}(C) < \mathrm{Len}(D). \end{cases}$$ If $\mathrm{Len}(C) \ge \mathrm{Len}(D)$, then $$C \ominus_{gH} D \preceq_{LC} 0 \iff C \preceq_{LC} D,$$ and if $\mathrm{Len}(C) < \mathrm{Len}(D)$, then $$C \preceq_{LC} D \implies C \ominus_{gH} D \preceq_{LC} 0.$$
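The following randomized test (our own sanity check, not from the paper) samples interval pairs and verifies Lemma 2.8 together with the equivalence of Remark 2.9 in the case $\mathrm{Len}(C) \ge \mathrm{Len}(D)$.

```python
import random

def gh_sub(C, D):
    a, b = C[0] - D[0], C[1] - D[1]
    return (min(a, b), max(a, b))

def mid(X): return (X[0] + X[1]) / 2.0
def lc_leq(A, B): return A[0] <= B[0] and mid(A) <= mid(B)

def rand_interval():
    lo, hi = sorted(random.uniform(-5, 5) for _ in range(2))
    return (lo, hi)

ZERO = (0.0, 0.0)
for _ in range(10_000):
    C, D = rand_interval(), rand_interval()
    if lc_leq(C, D):                       # Lemma 2.8
        assert lc_leq(gh_sub(C, D), ZERO)
    if (C[1] - C[0]) >= (D[1] - D[0]) and lc_leq(gh_sub(C, D), ZERO):
        assert lc_leq(C, D)                # Remark 2.9, Len(C) >= Len(D)
print("no counterexamples found")
```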

3. KKT conditions under LC-partial order

Definition 3.1

Let $A$ be a convex subset of $\mathbb{R}^n$. We say that an interval-valued function $F: A \to \mathbb{I}$ is LC-convex on $A$ if, for any $x^1$ and $x^2$ in $A$, $$F(\lambda x^1 + (1 - \lambda)x^2) \preceq_{LC} \lambda F(x^1) \oplus (1 - \lambda)F(x^2), \quad \forall \lambda \in [0, 1].$$

Theorem 3.2

Let $F$ be gH-differentiable at $x^0$. Then $G_{x^0}(k)$ exists for every $k$ in $\mathbb{R}^n$ and $G_{x^0}(k) = k^T \nabla F(x^0)$.

Proof.

See [Citation23].

Theorem 3.3

Let $A$ be a non-empty open convex subset of $\mathbb{R}^n$ and $F: A \to \mathbb{I}$ be gH-differentiable at every $x \in A$. Then $F$ is LC-convex on $A$ if and only if $$(x^2 - x^1)^T \nabla F(x^1) \preceq_{LC} F(x^2) \oplus (-1)F(x^1), \quad \forall x^1, x^2 \in A.$$

Proof.

Let $F$ be LC-convex on $A$, and take any $x^1, x^2 \in A$. Then, for $k = x^2 - x^1$ and $0 < \alpha_0 < 1$, $$F(x^1 + \alpha_0 k) = F(x^1 + \alpha_0(x^2 - x^1)) = F((1 - \alpha_0)x^1 + \alpha_0 x^2) \preceq_{LC} (1 - \alpha_0)F(x^1) \oplus \alpha_0 F(x^1 + k).$$ By Lemma 2.8, we have $$F(x^1 + \alpha_0 k) \ominus_{gH} F(x^1) \preceq_{LC} \left((1 - \alpha_0)F(x^1) \oplus \alpha_0 F(x^1 + k)\right) \ominus_{gH} F(x^1) \preceq_{LC} ((1 - \alpha_0) - 1)F(x^1) \oplus \alpha_0 F(x^1 + k) = (-\alpha_0)F(x^1) \oplus \alpha_0 F(x^1 + k).$$ Hence, the above inequality can be written as
(6) $\frac{1}{\alpha_0}\left(F(x^1 + \alpha_0 k) \ominus_{gH} F(x^1)\right) \preceq_{LC} F(x^2) \oplus (-1)F(x^1)$.
Letting $\alpha_0 \to 0^+$, by Definition 2.5 and Theorem 3.2 we have $$F(x^1 + \alpha_0 k) \ominus_{gH} F(x^1) = G_{x^1}(\alpha_0 k) \oplus \|\alpha_0 k\| J(x^1; \alpha_0 k),$$ where $G_{x^1}(\alpha_0 k) = \alpha_0 k^T \nabla F(x^1)$ and $\lim_{\|k\| \to 0} J(x^1; k) = 0$, so the left-hand side of (6) tends to $k^T \nabla F(x^1)$. With $k = x^2 - x^1$ in (6), we get $$(x^2 - x^1)^T \nabla F(x^1) \preceq_{LC} F(x^2) \oplus (-1)F(x^1),$$ which is the desired result.

Now, for the converse part, suppose that $$(x^2 - x^1)^T \nabla F(x^1) \preceq_{LC} F(x^2) \oplus (-1)F(x^1)$$ holds for any $x^1$ and $x^2$ in $A$.

Then, for any $0 \le \beta \le 1$, denote $x^\beta = \beta x^1 + (1 - \beta)x^2$. Applying the hypothesis to the pairs $(x^\beta, x^1)$ and $(x^\beta, x^2)$, and noting that $x^1 - x^\beta = (1 - \beta)(x^1 - x^2)$ and $x^2 - x^\beta = \beta(x^2 - x^1)$, the following inequalities hold:
(7) $(1 - \beta)\left[(x^1 - x^2)^T \nabla F(x^\beta)\right] \preceq_{LC} F(x^1) \oplus (-1)F(x^\beta)$
and
(8) $\beta\left[(x^2 - x^1)^T \nabla F(x^\beta)\right] \preceq_{LC} F(x^2) \oplus (-1)F(x^\beta)$.
Multiplying inequality (7) by $\beta$ and inequality (8) by $(1 - \beta)$, we get, with $k = x^1 - x^2$,
(9) $\beta(1 - \beta)\left[k^T \nabla F(x^\beta)\right] \preceq_{LC} \beta F(x^1) \oplus (-\beta)F(x^\beta)$
and
(10) $-(1 - \beta)\beta\left[k^T \nabla F(x^\beta)\right] \preceq_{LC} (1 - \beta)F(x^2) \oplus (-(1 - \beta))F(x^\beta)$.
Adding inequalities (9) and (10), we obtain $$0 \preceq_{LC} \left(\beta F(x^1) \oplus (1 - \beta)F(x^2)\right) \oplus (-1)F(x^\beta).$$ By rearranging the above inequality, we obtain $F(x^\beta) \preceq_{LC} \beta F(x^1) \oplus (1 - \beta)F(x^2)$. Substituting the value of $x^\beta$, we have $$F(\beta x^1 + (1 - \beta)x^2) \preceq_{LC} \beta F(x^1) \oplus (1 - \beta)F(x^2).$$ The arbitrariness of $\beta \in [0, 1]$ proves that $F$ is LC-convex on $A$.

Theorem 3.4

Consider an interval-valued function $F: \mathbb{R}^n \to \mathbb{I}$ which is gH-differentiable at $x^0$. If a vector $v \in \mathbb{R}^n$ satisfies $v^T \nabla F(x^0) \prec_{LC} 0$, then there exists $\delta > 0$ such that for each $\beta \in (0, \delta)$, $$F(x^0 + \beta v) \prec_{LC} F(x^0).$$

Proof.

As $F$ is gH-differentiable at $x^0$, from Definition 2.5 and Theorem 3.2 we have $$F(x^0 + k) \ominus_{gH} F(x^0) = k^T \nabla F(x^0) \oplus \|k\| J(x^0; k),$$ where $J(x^0; k) \to 0$ as $\|k\| \to 0$. Replacing $k = \beta v$, for $\beta > 0$, we get $$F(x^0 + \beta v) = F(x^0) \oplus \beta v^T \nabla F(x^0) \oplus |\beta| \|v\| J(x^0; \beta v).$$ Since $v^T \nabla F(x^0) \prec_{LC} 0$ and $J(x^0; \beta v) \to 0$ as $\beta \to 0^+$, we have $$F(x^0 + \beta v) \prec_{LC} F(x^0)$$ for each $\beta \in (0, \delta)$, for some $\delta > 0$.
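A numeric illustration (our sketch, with a hypothetical function) of Theorem 3.4: for $F(x) = [1, 2]x^2$ at $x^0 = 1$ we have $\nabla F(x^0) = [2, 4]$, and the direction $v = -1$ gives $v \nabla F(x^0) = [-4, -2] \prec_{LC} 0$, so small steps along $v$ strictly decrease $F$ in the LC order.

```python
# Descent along v = -1 for F(x) = [1,2] x^2 at x0 = 1 (Theorem 3.4).

def mid(X): return (X[0] + X[1]) / 2.0
def lc_lt(A, B):                 # strict LC order A <_LC B
    return A[0] < B[0] and mid(A) < mid(B)

def F(x):
    return (x * x, 2 * x * x)    # the interval [x^2, 2x^2]

x0, v = 1.0, -1.0
for beta in (0.5, 0.1, 0.01):
    print(beta, lc_lt(F(x0 + beta * v), F(x0)))   # True for each beta
```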

Definition 3.5

Let $F: \mathbb{R}^n \to \mathbb{I}$ be an interval-valued function. If $F$ is gH-differentiable at $x^0$, the set of descent directions at $x^0$ is given by $$\mathcal{F}(x^0) = \{v \in \mathbb{R}^n : v^T \nabla F(x^0) \prec_{LC} 0\}.$$ Since for any $v$ in $\mathcal{F}(x^0)$ we have $\beta v \in \mathcal{F}(x^0)$ for all $\beta > 0$, the set $\mathcal{F}(x^0)$ is said to be the cone of descent directions.

Definition 3.6

For a non-empty set $A \subseteq \mathbb{R}^n$ and $x^0 \in A$, the cone of feasible directions of $A$ at $x^0$ is given by $$\mathcal{R}(x^0) = \{v \in \mathbb{R}^n : v \neq 0,\ x^0 + \beta v \in A\ \forall \beta \in (0, \delta) \text{ for some } \delta > 0\}.$$

Definition 3.7

A feasible solution $x^* \in A$ is said to be an efficient solution of the IOP
(11) $\min_{x \in A \subseteq \mathbb{R}^n} F(x)$
if there does not exist any $x \in A \cap N_\delta(x^*)$ such that $F(x) \prec_{LC} F(x^*)$, where $N_\delta(x^*)$ is a $\delta$-neighbourhood of $x^*$. If a solution $x^*$ is an efficient solution, then we say that $F(x^*)$ is a non-dominated solution of the IOP.

Theorem 3.8

For a non-empty set $A \subseteq \mathbb{R}^n$, let us consider the IOP $$\min_{x \in A \subseteq \mathbb{R}^n} F(x),$$ where $F: \mathbb{R}^n \to \mathbb{I}$. If $F$ is gH-differentiable at $x^0 \in A$ and $x^0$ is a local efficient solution to the IOP (11), then $$\mathcal{F}(x^0) \cap \mathcal{R}(x^0) = \emptyset.$$

Proof.

On the contrary, suppose that $\mathcal{F}(x^0) \cap \mathcal{R}(x^0) \neq \emptyset$ and let $v$ be an element of $\mathcal{F}(x^0) \cap \mathcal{R}(x^0)$. Then, by Theorem 3.4, there exists $\delta_1 > 0$ such that $$F(x^0 + \beta v) \prec_{LC} F(x^0) \quad \text{for each } \beta \in (0, \delta_1).$$ By Definition 3.6, there exists $\delta_2 > 0$ such that $$x^0 + \beta v \in A \quad \text{for each } \beta \in (0, \delta_2).$$ Defining $\delta := \min(\delta_1, \delta_2) > 0$, we note that for all $\beta \in (0, \delta)$, $x^0 + \beta v \in A$ and $F(x^0 + \beta v) \prec_{LC} F(x^0)$. This contradicts $x^0$ being a local efficient solution. Hence, $\mathcal{F}(x^0) \cap \mathcal{R}(x^0) = \emptyset$.

The next example illustrates the necessary condition given in Theorem 3.8.

Example 3.9

Let $A \subseteq \mathbb{R}^2$ be the set $\{(y_1, y_2) : 1 \le y_1 \le 2,\ 1 \le y_2 \le 2\}$. Consider the IOP
(12) $\min_{y \in A} F(y_1, y_2)$,
where $F(y_1, y_2) = [\underline{F}(y_1, y_2), \overline{F}(y_1, y_2)]$ with $$\underline{F}(y_1, y_2) = 2 + 3(y_1 - 1)^2 + 3(y_2 - 2)^2, \qquad \overline{F}(y_1, y_2) = 6 + 6(y_1 - 1)^2 + 6(y_2 - 1)^2.$$

The lower and upper functions are shown in Figure 1. It can be verified that $y^0 = (1, 1.5) \in A$ is an efficient point. The cone of feasible directions at $y^0$ is given by $$\mathcal{R}(y^0) = \{(v_1, v_2) \neq (0, 0) : (1 + \beta v_1,\ 1.5 + \beta v_2) \in A\ \forall \beta \in (0, \delta) \text{ for some } \delta > 0\} = \{(v_1, v_2) \neq (0, 0) : v_1 \ge 0\}.$$ The gH-partial derivatives of $F$ at $y^0$ are $$D_1F(y^0) = [\min\{\underline{F}_{y_1}(y^0), \overline{F}_{y_1}(y^0)\},\ \max\{\underline{F}_{y_1}(y^0), \overline{F}_{y_1}(y^0)\}] = [0, 0]$$ and $$D_2F(y^0) = [\min\{\underline{F}_{y_2}(y^0), \overline{F}_{y_2}(y^0)\},\ \max\{\underline{F}_{y_2}(y^0), \overline{F}_{y_2}(y^0)\}] = [-3, 6].$$ The cone of descent directions at $y^0$ is $$\mathcal{F}(y^0) = \{(v_1, v_2) \in \mathbb{R}^2 : (v_1, v_2)^T \nabla F(y^0) \prec_{LC} 0\} = \{(v_1, v_2) \in \mathbb{R}^2 : v_1 D_1F(y^0) \oplus v_2 D_2F(y^0) \prec_{LC} 0\} = \{(v_1, v_2) \in \mathbb{R}^2 : v_2[-3, 6] \prec_{LC} 0\}.$$ Now $v_2[-3, 6] \prec_{LC} 0$ requires $-3v_2 < 0$ and $\frac{-3v_2 + 6v_2}{2} < 0$, i.e. $v_2 > 0$ and $v_2 < 0$, which is impossible. Hence $\mathcal{F}(y^0) = \emptyset$, and therefore $$\mathcal{R}(y^0) \cap \mathcal{F}(y^0) = \emptyset.$$
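The endpoint computations of Example 3.9 can be cross-checked numerically; the sketch below (ours) evaluates the gradients of $\underline{F}$ and $\overline{F}$ at $y^0 = (1, 1.5)$ and assembles the gH-partial derivatives of Definition 2.3.

```python
# Cross-check of the gH-partial derivatives in Example 3.9 at y0 = (1, 1.5).

def grad_lower(y1, y2):   # gradient of F_lower = 2 + 3(y1-1)^2 + 3(y2-2)^2
    return (6 * (y1 - 1), 6 * (y2 - 2))

def grad_upper(y1, y2):   # gradient of F_upper = 6 + 6(y1-1)^2 + 6(y2-1)^2
    return (12 * (y1 - 1), 12 * (y2 - 1))

y0 = (1.0, 1.5)
lo, up = grad_lower(*y0), grad_upper(*y0)
D1F = (min(lo[0], up[0]), max(lo[0], up[0]))
D2F = (min(lo[1], up[1]), max(lo[1], up[1]))
print(D1F, D2F)           # (0.0, 0.0) and (-3.0, 6.0), as computed above

# Descent cone: v2 * [-3, 6] <_LC 0 needs -3*v2 < 0 and 1.5*v2 < 0,
# i.e. v2 > 0 and v2 < 0 at once, which is impossible: the cone is empty.
```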

Lemma 3.10

Let us consider the set $R = \{x \in A : H_j(x) \preceq_{LC} 0 \text{ for } j = 1, 2, \ldots, k\}$ for interval-valued functions $H_j: \mathbb{R}^n \to \mathbb{I}$, where $A \neq \emptyset$ is an open set in $\mathbb{R}^n$. Let $x^0 \in R$ and $J(x^0) = \{j : H_j(x^0) = 0\}$. Supposing $H_j$ to be gH-differentiable at $x^0$ for all $j \in J(x^0)$ and gH-continuous at $x^0$ for $j \notin J(x^0)$, we define $$\mathcal{H}(x^0) = \{v : v^T \nabla H_j(x^0) \prec_{LC} 0\ \forall j \in J(x^0)\}.$$ Then $$\mathcal{H}(x^0) \subseteq \mathcal{R}(x^0),$$ where $\mathcal{R}(x^0) = \{v \in \mathbb{R}^n : v \neq 0,\ x^0 + \beta v \in R\ \forall \beta \in (0, \delta) \text{ for some } \delta > 0\}$.

Proof.

Consider $v$ to be an element of $\mathcal{H}(x^0)$. As $x^0 \in A$ and $A$ is an open set, there exists $\delta_0 > 0$ such that $$x^0 + \beta v \in A \quad \text{for } \beta \in (0, \delta_0).$$ For each $j \notin J(x^0)$, as $H_j$ is gH-continuous at $x^0$, $$H_j(x^0 + \beta v) = H_j(x^0) \oplus L_j(x^0; \beta v),$$ where $L_j(x^0; \beta v) \to 0$ as $\beta \to 0$. Since $H_j(x^0) \prec_{LC} 0$ for $j \notin J(x^0)$, there exists $\delta_j > 0$ such that $$H_j(x^0 + \beta v) \preceq_{LC} 0 \quad \text{for } \beta \in (0, \delta_j) \text{ and } j \notin J(x^0).$$ As we know that $v \in \mathcal{H}(x^0)$, for each $j \in J(x^0)$ there exists, by Theorem 3.4, $\delta_j > 0$ such that $$H_j(x^0 + \beta v) \prec_{LC} H_j(x^0) = 0, \quad \forall \beta \in (0, \delta_j).$$ Suppose $\delta = \min\{\delta_0, \delta_1, \delta_2, \ldots, \delta_k\}$; it is evident that $\delta > 0$. From the above inequalities, we note that the points of the form $x^0 + \beta v$ belong to $R$ for each $\beta \in (0, \delta)$. Therefore, $v \in \mathcal{R}(x^0)$. Hence, $\mathcal{H}(x^0) \subseteq \mathcal{R}(x^0)$.

Theorem 3.11

Let $A \neq \emptyset$ be an open set in $\mathbb{R}^n$. Let us consider the IOP $$\begin{cases} \min F(x)\\ \text{such that } H_j(x) \preceq_{LC} 0 \text{ for } j = 1, 2, \ldots, k,\\ x \in A, \end{cases}$$ where $F: \mathbb{R}^n \to \mathbb{I}$ and $H_j: \mathbb{R}^n \to \mathbb{I}$ for $j = 1, 2, \ldots, k$. For a feasible point $x^0$, define $J(x^0) = \{j : H_j(x^0) = 0\}$. Suppose that $F$ and $H_j$, $j \in J(x^0)$, are gH-differentiable at $x^0$, and that $H_j$, $j \notin J(x^0)$, are gH-continuous at $x^0$. If $x^0$ is an efficient solution of the IOP, then $$\mathcal{F}(x^0) \cap \mathcal{H}(x^0) = \emptyset,$$ where $\mathcal{F}(x^0) = \{v : v^T \nabla F(x^0) \prec_{LC} 0\}$ and $\mathcal{H}(x^0) = \{v : v^T \nabla H_j(x^0) \prec_{LC} 0 \text{ for each } j \in J(x^0)\}$.

Proof.

By using Theorem 3.8 and Lemma 3.10, we conclude that, if $x^0$ is a local efficient solution, then $$\mathcal{F}(x^0) \cap \mathcal{R}(x^0) = \emptyset \implies \mathcal{F}(x^0) \cap \mathcal{H}(x^0) = \emptyset.$$

Theorem 3.12

For an interval-valued vector $B_v^n = (b_j)_{n \times 1}$ in $\mathbb{I}^n$, exactly one of the following systems has a solution:

  1. $x^T B_v^n \prec_{LC} 0$ for some $x = (x_j)_{n \times 1} \in \mathbb{R}^n$;

  2. $0_v^n \in y B_v^n$ for some $y \in \mathbb{R}$, $y > 0$.

Proof.

Suppose (i) is true; let us prove that (ii) must be false. On the contrary, let (ii) also be true. Since (i) is true, we have $$x_0^T B_v^n \prec_{LC} 0 \quad \text{for some } x^0 \in \mathbb{R}^n;$$ consequently, $y(x_0^T B_v^n) \prec_{LC} 0$ for all $y \in \mathbb{R}$, $y > 0$, which can also be written as
(13) $x_0^T (y B_v^n) \prec_{LC} 0$ for all $y \in \mathbb{R}$, $y > 0$.
As (ii) is also true, $0_v^n \in y^0 B_v^n$ for some $y^0 \in \mathbb{R}$, $y^0 > 0$; we also have
(14) $0 \in x^T (y^0 B_v^n)$ for all $x \in \mathbb{R}^n$.
As (13) and (14) cannot be true together, we have a contradiction. Hence, if (i) is true, (ii) cannot be true.

For the other case, suppose that (i) is false; we prove that (ii) is true. On the contrary, suppose (ii) is also false. Then $$0_v^n \notin y B_v^n\ \forall y \in \mathbb{R},\ y > 0 \implies 0_v^n \notin B_v^n.$$ Consequently,
(15) $\exists\, j \in \{1, 2, \ldots, n\}$ such that $0 \notin b_j$, i.e. $b_j \prec_{LC} 0$ or $0 \prec_{LC} b_j$.
Define the sets $$K = \{k : 0 \in b_k,\ k \in \{1, 2, \ldots, n\}\} \quad \text{and} \quad I = \{i : 0 \notin b_i,\ i \in \{1, 2, \ldots, n\}\}.$$ By (15), $I \neq \emptyset$; also $K \cup I = \{1, 2, \ldots, n\}$ and $K \cap I = \emptyset$. Let us create a vector $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)^T$ such that $$x_j^0 = \begin{cases} 0 & \text{if } j \in K,\\ 1 & \text{if } j \in I \text{ and } b_j \prec_{LC} 0,\\ -1 & \text{if } j \in I \text{ and } 0 \prec_{LC} b_j. \end{cases}$$ For this $x^0 \in \mathbb{R}^n$, we can see that $$\bigoplus_{i \in I} x_i^0 b_i \oplus \bigoplus_{k \in K} x_k^0 b_k \prec_{LC} 0.$$ More generally,
(16) $(x^0)^T B_v^n \prec_{LC} 0$.
As (i) is false, $x^T B_v^n \prec_{LC} 0$ for no $x \in \mathbb{R}^n$, which is a contradiction to (16). So (ii) must be true, which completes the proof.
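The proof of Theorem 3.12 is constructive, and the following sketch (our illustration) turns it into a decision procedure: given an interval vector $B_v^n$, it either certifies system (ii), i.e. $0 \in b_j$ for every component, or builds the vector $x^0$ of the proof and verifies $x_0^T B_v^n \prec_{LC} 0$.

```python
# Decide which alternative of Theorem 3.12 holds for an interval vector B.

def add(C, D): return (C[0] + D[0], C[1] + D[1])
def scal(k, C): return (k * C[0], k * C[1]) if k >= 0 else (k * C[1], k * C[0])
def mid(X): return (X[0] + X[1]) / 2.0
def lc_lt0(X): return X[0] < 0 and mid(X) < 0      # X <_LC 0

def gordan_alternative(B):
    if all(b[0] <= 0 <= b[1] for b in B):          # 0 in every b_j: system (ii)
        return ("(ii)", None)
    # otherwise construct x as in the proof: 0 if 0 in b_j,
    # +1 if b_j is entirely negative, -1 if entirely positive
    x = [0 if b[0] <= 0 <= b[1] else (1 if b[1] < 0 else -1) for b in B]
    total = (0.0, 0.0)
    for xj, bj in zip(x, B):
        total = add(total, scal(xj, bj))
    assert lc_lt0(total)                           # x^T B <_LC 0: system (i)
    return ("(i)", x)

print(gordan_alternative([(-2.0, 1.0), (-1.0, 3.0)]))   # ('(ii)', None)
print(gordan_alternative([(-2.0, 1.0), (2.0, 5.0)]))    # ('(i)', [0, -1])
```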

Theorem 3.13

If $y^0$ is a local efficient solution to the following IOP $$\min_{y \in \mathbb{R}^n} G(y),$$ then $0_v^n \in \nabla G(y^0)$, where $G: \mathbb{R}^n \to \mathbb{I}$ is gH-differentiable at $y^0$.

Proof.

By using Definition 3.5 and Theorem 3.8, if $y^0$ is a local efficient solution, then $\mathcal{F}(y^0) = \emptyset$. Consequently, $$v^T \nabla G(y^0) \prec_{LC} 0 \quad \text{for no } v \in \mathbb{R}^n.$$ From Theorem 3.12 with $B_v^n = \nabla G(y^0)$, there exists a scalar $\lambda \in \mathbb{R}$, $\lambda > 0$, such that $$0_v^n \in \lambda \nabla G(y^0) \implies 0_v^n \in \nabla G(y^0).$$

Remark 3.14

It is very interesting to observe that the optimality condition in the above theorem is not the equality relation $\nabla G(y^0) = 0_v^n$ but the inclusion relation $0_v^n \in \nabla G(y^0)$. Inclusion relations are less restrictive and more appropriate than equality relations. For example, if $\nabla G(y^0) = ([0, 0], [-2, 5])^T$, then $\nabla G(y^0) \neq 0_v^2$ but $0_v^2 \in \nabla G(y^0)$.

Theorem 3.15

For an interval-valued matrix $B = (b_{ij})_{m \times n}$, where $b_{ij} \in \mathbb{I}$, exactly one of the following systems has a solution:

  1. $B^T x \prec_{LC} 0_v^n$ for some $x = (x_i)_{m \times 1} \in \mathbb{R}^m$;

  2. $0_v^m \in B y$ for some $0 \neq y = (y_i)_{n \times 1} \in \mathbb{R}^n$, $y_i \ge 0$.

Proof.

Suppose (i) is true; let us prove that (ii) is false. On the contrary, suppose (ii) is also true. Since (i) is true, we have $$B^T x^0 \prec_{LC} 0_v^n \quad \text{for some } x^0 = (x_1^0, x_2^0, \ldots, x_m^0) \in \mathbb{R}^m.$$ Consequently, $y^T (B^T x^0) \prec_{LC} 0$ for all $y \neq 0$, $y = (y_i)_{n \times 1} \in \mathbb{R}^n$, $y_i \ge 0$, which can be written as
(17) $(By)^T x^0 \prec_{LC} 0$ for all $y \neq 0$, $y = (y_i)_{n \times 1} \in \mathbb{R}^n$, $y_i \ge 0$.
As we assumed (ii) is also true, there is some non-zero $y^0 = (y_i^0)_{n \times 1} \in \mathbb{R}^n$ with $y_i^0 \ge 0$ such that
(18) $0_v^m \in B y^0$.
Consider $z = B y^0 = (z_1, z_2, \ldots, z_m)^T$. It follows that $z \in \mathbb{I}^m$ and $(B y^0)^T x^0 = \bigoplus_{i=1}^m x_i^0 z_i$. From (18), we have $0 \in z_i$ for $i = 1, 2, \ldots, m$, hence $0 \in x_i^0 z_i$ for $i = 1, 2, \ldots, m$. Then it can be written as
(19) $0 \in (B y^0)^T x^0$.
As (17) and (19) cannot be true together, we have a contradiction. Hence, (ii) cannot be true if (i) is true.

Now suppose that (i) is not true; we shall prove that (ii) is true. If (i) is not true, then
(20) $B^T x \prec_{LC} 0_v^n$ for no $x \in \mathbb{R}^m$.
Suppose, on the contrary, that (ii) is also false. Then $0_v^m \notin B y$ for all $y \neq 0$, $y = (y_i)_{n \times 1} \in \mathbb{R}^n$ with $y_i \ge 0$. Consequently, there exists $i \in \{1, 2, \ldots, m\}$ such that $0 \notin z_i$, which implies
(21) $\exists\, i \in \{1, 2, \ldots, m\}$ such that $z_i \prec_{LC} 0$ or $0 \prec_{LC} z_i$,
where $B y = (z_1, z_2, \ldots, z_m)^T$. Define the sets $$K = \{k : 0 \in z_k,\ k \in \{1, 2, \ldots, m\}\} \quad \text{and} \quad J = \{j : 0 \notin z_j,\ j \in \{1, 2, \ldots, m\}\}.$$ By (21), $J \neq \emptyset$; also $K \cup J = \{1, 2, \ldots, m\}$ and $K \cap J = \emptyset$. Let us create a vector $x^0 = (x_1^0, x_2^0, \ldots, x_m^0)^T \in \mathbb{R}^m$ by $$x_i^0 = \begin{cases} 0 & \text{if } i \in K,\\ 1 & \text{if } i \in J \text{ and } z_i \prec_{LC} 0,\\ -1 & \text{if } i \in J \text{ and } 0 \prec_{LC} z_i. \end{cases}$$ For this $x^0 \in \mathbb{R}^m$, we notice that $$\bigoplus_{j \in J} x_j^0 z_j \oplus \bigoplus_{k \in K} x_k^0 z_k \prec_{LC} 0,$$ which is equivalent to
(22) $y^T (B^T x^0) \prec_{LC} 0$ for all $y \neq 0$, $y = (y_i)_{n \times 1} \in \mathbb{R}^n$, $y_i \ge 0$.
The inequality (22) can be true only when $B^T x^0 \prec_{LC} 0_v^n$. The inequalities (20) and (22) are contradictory, so (i) and (ii) cannot fail together. Hence, (ii) must be true. This completes the proof.

Theorem 3.16

Let $A \neq \emptyset$ be a set in $\mathbb{R}^n$, and let $F: \mathbb{R}^n \to \mathbb{I}$ and $J_k: \mathbb{R}^n \to \mathbb{I}$ for $k = 1, 2, \ldots, m$. Consider the IOP
(23) $\begin{cases} \min F(x)\\ \text{such that } J_k(x) \preceq_{LC} 0,\ k = 1, 2, \ldots, m,\\ x \in A. \end{cases}$
Let $x^0$ be a feasible point of the IOP (23), and define $$K(x^0) = \{k : J_k(x^0) = 0\}.$$ Suppose $F$ and $J_k$ are gH-differentiable at $x^0$ for $k \in K(x^0)$ and gH-continuous at $x^0$ for $k \notin K(x^0)$. If $x^0$ is a local efficient point of (23), then there exist constants $w_0$ and $w_k$ for $k \in K(x^0)$ such that $$\begin{cases} 0_v^n \in \left(w_0 \nabla F(x^0) \oplus \bigoplus_{k \in K(x^0)} w_k \nabla J_k(x^0)\right),\\ w_0 \ge 0,\ w_k \ge 0 \text{ for } k \in K(x^0),\\ (w_0, w_K) \neq (0, \mathbf{0}), \end{cases}$$ where $w_K$ is the vector whose components are the $w_k$ for $k \in K(x^0)$. Furthermore, if the $J_k$, $k \notin K(x^0)$, are also gH-differentiable at $x^0$, then there exist constants $w_1, w_2, \ldots, w_m$ such that $$\begin{cases} 0_v^n \in \left(w_0 \nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right),\\ w_k J_k(x^0) = 0,\ k = 1, 2, \ldots, m,\\ w_0 \ge 0,\ w_k \ge 0,\ k = 1, 2, \ldots, m,\\ (w_0, w) \neq (0, \mathbf{0}), \end{cases}$$ where $w$ is the vector $(w_1, w_2, \ldots, w_m)$.

Proof.

As $x^0$ is a local efficient point of (23), by Theorem 3.11 we get $$\mathcal{F}(x^0) \cap \mathcal{H}(x^0) = \emptyset,$$ or, equivalently, there is no $v \in \mathbb{R}^n$ such that
(24) $v^T \nabla F(x^0) \prec_{LC} 0$ and $v^T \nabla J_k(x^0) \prec_{LC} 0$, $\forall k \in K(x^0)$.
Let $B$ be the matrix $$B = \left[\nabla F(x^0),\ [\nabla J_k(x^0)]_{k \in K(x^0)}\right]_{n \times (1 + |K(x^0)|)}.$$ By (24) we notice that $$B^T v \prec_{LC} 0_v^{1 + |K(x^0)|} \quad \text{for no } v \in \mathbb{R}^n.$$ Now by Theorem 3.15, there exists $q \neq 0$, $q = (q_k)_{(1 + |K(x^0)|) \times 1} \in \mathbb{R}^{1 + |K(x^0)|}$ with $q_k \ge 0$, such that $0_v^n \in B q$. Consider $q$ of the form
(25) $q = \begin{bmatrix} w_0 \\ [w_k]_{k \in K(x^0)} \end{bmatrix}$.
Substituting (25) into $0_v^n \in B q$, we have $$0_v^n \in \left[\nabla F(x^0),\ [\nabla J_k(x^0)]_{k \in K(x^0)}\right]_{n \times (1 + |K(x^0)|)} \begin{bmatrix} w_0 \\ [w_k]_{k \in K(x^0)} \end{bmatrix},$$ and by simplifying the above expression we have $$\begin{cases} 0_v^n \in \left(w_0 \nabla F(x^0) \oplus \bigoplus_{k \in K(x^0)} w_k \nabla J_k(x^0)\right),\\ w_0 \ge 0,\ w_k \ge 0 \text{ for } k \in K(x^0),\\ (w_0, w_K) \neq (0, 0, \ldots, 0). \end{cases}$$ This proves the first part of the theorem.

For the second part, note that for $k \in K(x^0)$, $J_k(x^0) = 0$, and consequently $w_k J_k(x^0) = 0$. If the $J_k$, $k \notin K(x^0)$, are also gH-differentiable at $x^0$, then by setting $w_k = 0$ for $k \notin K(x^0)$ we have $$\begin{cases} 0_v^n \in \left(w_0 \nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right),\\ w_k J_k(x^0) = 0,\ k = 1, 2, \ldots, m,\\ w_0 \ge 0,\ w_k \ge 0,\ k = 1, 2, \ldots, m,\\ (w_0, w) \neq (0, \mathbf{0}), \end{cases}$$ which completes our proof.

Definition 3.17

The collection of $n$ interval vectors $\{(Y_v^k)_1, (Y_v^k)_2, \ldots, (Y_v^k)_n\}$ is called linearly independent if, for $n$ real numbers $a_1, a_2, \ldots, a_n$, $$0_v^k \in a_1 (Y_v^k)_1 \oplus a_2 (Y_v^k)_2 \oplus \cdots \oplus a_n (Y_v^k)_n \quad \text{iff } a_i = 0\ \forall i = 1, 2, \ldots, n;$$ otherwise the collection is linearly dependent.

Example 3.18

Let $(Y_v^2)_1 = ([1, 3], [5, 7])$ and $(Y_v^2)_2 = ([2, 3], [4, 7])$.

Now we have $$a_1 (Y_v^2)_1 \oplus a_2 (Y_v^2)_2 = a_1([1, 3], [5, 7]) \oplus a_2([2, 3], [4, 7])$$ and $$0_v^2 \in a_1 (Y_v^2)_1 \oplus a_2 (Y_v^2)_2 \quad \text{for } a_1 = a_2 = 0.$$ So, it is a linearly independent set of interval vectors.

Theorem 3.19

Let $A \neq \emptyset$ be a subset of $\mathbb{R}^n$, and let $F: \mathbb{R}^n \to \mathbb{I}$ and $J_k: \mathbb{R}^n \to \mathbb{I}$ for $k = 1, 2, \ldots, m$ be IVFs. Let $x^0$ be a feasible point of the IOP $$\begin{cases} \min F(x)\\ \text{such that } J_k(x) \preceq_{LC} 0,\ k = 1, 2, \ldots, m,\\ x \in A, \end{cases}$$ and define $$K(x^0) = \{k : J_k(x^0) = 0\}.$$ Suppose $F$ and $J_k$ are gH-differentiable at $x^0$ for $k \in K(x^0)$ and gH-continuous at $x^0$ for $k \notin K(x^0)$. If the interval vectors $\{\nabla J_k(x^0) : k \in K(x^0)\}$ are linearly independent and $x^0$ is an efficient solution, then there exist scalars $w_k$, $k \in K(x^0)$, such that $$\begin{cases} 0_v^n \in \left(\nabla F(x^0) \oplus \bigoplus_{k \in K(x^0)} w_k \nabla J_k(x^0)\right),\\ w_k \ge 0,\ k \in K(x^0). \end{cases}$$ Furthermore, if the $J_k$, $k \notin K(x^0)$, are also gH-differentiable at $x^0$, then there exist constants $w_1, w_2, \ldots, w_m$ such that $$\begin{cases} 0_v^n \in \left(\nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right),\\ w_k J_k(x^0) = 0,\ k = 1, 2, \ldots, m,\\ w_k \ge 0,\ k = 1, 2, \ldots, m. \end{cases}$$

Proof.

By Theorem 3.16, there exist real constants $w_0'$ and $w_k'$, $k \in K(x^0)$, not all zero, such that $$\begin{cases} 0_v^n \in \left(w_0' \nabla F(x^0) \oplus \bigoplus_{k \in K(x^0)} w_k' \nabla J_k(x^0)\right),\\ w_0' \ge 0,\ w_k' \ge 0\ \forall k \in K(x^0). \end{cases}$$ We must have $w_0' > 0$, for otherwise the set $\{\nabla J_k(x^0) : k \in K(x^0)\}$ would be linearly dependent. Defining $w_k = w_k'/w_0'$, we get $w_k \ge 0$ for all $k \in K(x^0)$ and $$0_v^n \in \left(\nabla F(x^0) \oplus \bigoplus_{k \in K(x^0)} w_k \nabla J_k(x^0)\right).$$ For $k \in K(x^0)$, $J_k(x^0) = 0$, and thus $w_k J_k(x^0) = 0$. If the functions $J_k$, $k \notin K(x^0)$, are also gH-differentiable at $x^0$, then by setting $w_k = 0$ for $k \notin K(x^0)$ we have $$0_v^n \in \left(\nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right),$$ which proves the second part of our theorem.

To illustrate the necessary conditions given in Theorems 3.16 and 3.19, let us consider a detailed example.

Example 3.20

Consider the IOP with feasible point $y^0 = (0, 2)$: $$\begin{cases} \min F(y_1, y_2) = [-4, 1]y_1^2 \oplus [-1, 0]y_2^3 \oplus [-3, 2]y_2^2 \oplus [0, 1](y_1^2 y_2)\\ \text{such that } J_1(y_1, y_2) = [-3, 2]y_1 \oplus [-3, 2]y_2 \ominus_{gH} [-6, 4] \preceq_{LC} 0,\\ \phantom{\text{such that }} J_2(y_1, y_2) = [0, 1]y_1^2 \oplus [-6, 4]y_2 \ominus_{gH} [-2, 1] \preceq_{LC} 0. \end{cases}$$ In this IOP, the functions $F$, $J_1$, and $J_2$ are gH-differentiable on $\mathbb{R}^2$. We notice that, at $y^0$, $$J_1(y^0) = [0, 0] \preceq_{LC} 0 \quad \text{and} \quad J_2(y^0) = [-10, 7] \preceq_{LC} 0.$$ Now, we find the gH-partial derivatives of $F(y_1, y_2)$ by using Definition 2.3: $$D_1F(y_1, y_2) = [-4, 1]2y_1 \oplus [0, 1](2y_1y_2) \quad \text{and} \quad D_2F(y_1, y_2) = [-1, 0]3y_2^2 \oplus [-3, 2]2y_2 \oplus [0, 1]y_1^2.$$ Thus, $K(y^0) = \{1\}$. We notice that $$\nabla F(y^0) = (D_1F(y^0), D_2F(y^0))^T = ([0, 0], [-24, 8])^T,$$ $$\nabla J_1(y^0) = (D_1J_1(y^0), D_2J_1(y^0))^T = ([-3, 2], [-3, 2])^T,$$ $$\nabla J_2(y^0) = (D_1J_2(y^0), D_2J_2(y^0))^T = ([0, 0], [-6, 4])^T.$$ We do not actually need $\nabla J_2(y^0)$, because $K(y^0) = \{1\}$; $\nabla J_1(y^0)$ alone suffices. Now, the conclusions of Theorem 3.16 hold for $w_0 = 2$, $w_1 = 1$ and $w_2 = 0$, as $$0_v^2 \in \left(2([0, 0], [-24, 8])^T \oplus 1([-3, 2], [-3, 2])^T\right) \implies 0_v^2 \in \left(([0, 0], [-48, 16])^T \oplus ([-3, 2], [-3, 2])^T\right),$$ and those of Theorem 3.19 hold for $w_0 = 1$, $w_1 = 1$ and $w_2 = 0$, as $$0_v^2 \in \left(([0, 0], [-24, 8])^T \oplus 1([-3, 2], [-3, 2])^T\right) \implies 0_v^2 \in \left(([0, 0], [-24, 8])^T \oplus ([-3, 2], [-3, 2])^T\right).$$
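The inclusion claimed in Example 3.20 is easy to verify mechanically; the sketch below (ours) forms $w_0 \nabla F(y^0) \oplus w_1 \nabla J_1(y^0)$ componentwise and checks that each component contains zero.

```python
# Check the Fritz John inclusion of Example 3.20 at y0 = (0, 2).

def add(C, D): return (C[0] + D[0], C[1] + D[1])
def scal(k, C): return (k * C[0], k * C[1]) if k >= 0 else (k * C[1], k * C[0])
def contains0(X): return X[0] <= 0 <= X[1]

grad_F  = [(0.0, 0.0), (-24.0, 8.0)]    # grad F(y0)  = ([0,0], [-24,8])^T
grad_J1 = [(-3.0, 2.0), (-3.0, 2.0)]    # grad J1(y0) = ([-3,2], [-3,2])^T

w0, w1 = 2.0, 1.0                        # multipliers from Theorem 3.16
combo = [add(scal(w0, f), scal(w1, g)) for f, g in zip(grad_F, grad_J1)]
print(combo)                             # [(-3.0, 2.0), (-51.0, 18.0)]
print(all(contains0(X) for X in combo))  # True: 0_v^2 lies in the combination
```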

Theorem 3.21

Let $A \neq \emptyset$ be an open convex set, and let $F: A \subseteq \mathbb{R}^n \to \mathbb{I}$ and $J_k: A \to \mathbb{I}$, $k = 1, 2, \ldots, m$, be gH-differentiable LC-convex functions on $A$. Let $x^0$ be a feasible point of the IOP $$\begin{cases} \min F(x)\\ \text{such that } J_k(x) \preceq_{LC} 0,\ k = 1, 2, \ldots, m,\\ x \in A. \end{cases}$$ If there exist scalars $w_1, w_2, \ldots, w_m$ such that $$\begin{cases} 0_v^n \in \left(\nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right),\\ w_k J_k(x^0) = 0,\ k = 1, 2, \ldots, m,\\ w_k \ge 0,\ k = 1, 2, \ldots, m, \end{cases}$$ then $x^0$ is an efficient solution of the IOP.

Proof.

By supposition, for every $x \in A$ satisfying $J_k(x) \preceq_{LC} 0$, $k = 1, 2, \ldots, m$, we have $$0 \in \left(\nabla F(x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)\right)^T (x - x^0) = \nabla F(x^0)^T (x - x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)^T (x - x^0).$$ By using Theorem 3.3, $$\nabla F(x^0)^T (x - x^0) \oplus \bigoplus_{k=1}^m w_k \nabla J_k(x^0)^T (x - x^0) \preceq_{LC} \left(F(x) \oplus (-1)F(x^0)\right) \oplus \bigoplus_{k=1}^m w_k \left(J_k(x) \oplus (-1)J_k(x^0)\right) \preceq_{LC} \left(F(x) \oplus (-1)F(x^0)\right).$$ Therefore, for every $x \in A$, either $$0 \in \left(F(x) \oplus (-1)F(x^0)\right) \quad \text{or} \quad 0 \preceq_{LC} \left(F(x) \oplus (-1)F(x^0)\right).$$ In either case, $x^0$ is an efficient solution of the considered IOP. This completes the proof.

4. Applications of KKT conditions in SVM

Let us consider a binary classification problem. For a data set $$S = \{(u_i, v_i) : u_i \in \mathbb{R}^n,\ v_i \in \{-1, 1\},\ i = 1, 2, \ldots, k\},$$ the problem of classifying the data using SVMs is identical to the optimization problem
(26) $\begin{cases} \min F(z, d) = \frac{1}{2}\|z\|^2\\ \text{such that } v_i(z^T u_i + d) \ge 1,\ i = 1, 2, \ldots, k, \end{cases}$
where $z \in \mathbb{R}^n$ is the weight vector and $d \in \mathbb{R}$ is the bias.

The constraints specify that the data points of the two classes must lie on opposite sides of the separating hyperplanes $z^T u_i + d = \pm 1$. In many classification problems there is uncertainty and imprecision in the data set, which might be due to errors in measurement, implementation, and so on.

For example, weather problems involve interval-type data: we cannot measure the weather condition at a single instant of time; observations always refer to some duration and the circumstances during that time. The constraints in (26) do not deal with interval-type data. As a result, the SVM problem is adjusted for the interval-valued data set $$\{(U_i, v_i) : U_i \in \mathbb{I}^n,\ v_i \in \{-1, 1\},\ i = 1, 2, \ldots, k\}$$ as $$\begin{cases} \min F(z, d) = \frac{1}{2}\|z\|^2\\ \text{such that } v_i(z^T U_i \oplus d) \succeq_{LC} [1, 1],\ i = 1, 2, \ldots, k. \end{cases}$$ By adjusting the above problem accordingly, we get
(27) $\begin{cases} \min F(z, d) = \frac{1}{2}\|z\|^2\\ \text{such that } H_i(z, d) = [1, 1] \ominus_{gH} v_i(z^T U_i \oplus d) \preceq_{LC} 0,\ i = 1, 2, \ldots, k. \end{cases}$
We can see that $F$ and the $H_i$ are both gH-differentiable and LC-convex functions. Their gH-gradients are $$\nabla F(z, d) = (D_1F(z, d), D_2F(z, d))^T = (z, 0)^T$$ and $$\nabla H_i(z, d) = (D_1H_i(z, d), D_2H_i(z, d))^T = (-v_iU_i, -v_i)^T,$$ where $D_1$ and $D_2$ are the gH-partial derivatives corresponding to $z$ and $d$, respectively.

By Theorem 3.19, for an efficient point $(z, d)$ of (27) there exist non-negative constants $w_1, w_2, \ldots, w_k$ such that
(28) $0_v^{n+1} \in \left((z, 0)^T \oplus \bigoplus_{i=1}^k w_i(-v_iU_i, -v_i)^T\right)$
and
(29) $0 = w_iH_i(z, d)$, $i = 1, 2, \ldots, k$.
The condition (28) can be written as $$0_v^n \in \left([z, z] \oplus \bigoplus_{i=1}^k (-1)(w_iv_i)U_i\right) \quad \text{and} \quad \sum_{i=1}^k w_iv_i = 0.$$ The data points $U_i$ for which $w_i \neq 0$ are called support vectors. From (29), we notice that corresponding to any $w_i > 0$ we have $H_i(z, d) = 0$. Therefore, given $z$, the bias $d$ is a value such that $H_i(z, d) = 0$ for every $i \in \{1, 2, \ldots, k\}$ with $w_i > 0$.

Since the functions $F(z, d)$ and $H_i(z, d)$ are gH-differentiable and LC-convex, by Theorems 3.19 and 3.21 the collection of conditions that we must solve in order to find efficient solutions of the SVM IOP (27) is
(30) $\begin{cases} 0_v^n \in \left([z, z] \oplus \bigoplus_{i=1}^k (-1)(w_iv_i)U_i\right),\\ \sum_{i=1}^k w_iv_i = 0, \quad \text{and}\\ 0 = w_iH_i(z, d),\ i = 1, 2, \ldots, k. \end{cases}$
For any value of $z$ that fulfils the conditions in (30), we define the set of possible values of the bias as
(31) $\bigcap_{i : w_i > 0} \{d : H_i(z, d) = 0\}$.
By using any solution $\tilde{z}$ and $\tilde{d}$ of (30) and (31), a classifying hyperplane and the SVM classifier function are given respectively by $$\tilde{z}^T U \oplus \tilde{d} = 0 \quad \text{and} \quad p(U) = \mathrm{sign}(\tilde{z}^T U \oplus \tilde{d}).$$
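A short sketch (our illustration) of the resulting classifier $p(U)$: the sign of the interval $\tilde{z}^T U \oplus \tilde{d}$ is taken to be well defined whenever the interval lies strictly on one side of zero, which holds for correctly separated data; returning 0 for a straddling interval is our own convention.

```python
# Interval SVM classifier p(U) = sign(z^T U (+) d) on (lo, hi) interval data.

def add(C, D): return (C[0] + D[0], C[1] + D[1])
def scal(k, C): return (k * C[0], k * C[1]) if k >= 0 else (k * C[1], k * C[0])

def classify(z, d, U):
    acc = (d, d)                       # start from the degenerate interval [d, d]
    for zi, Ui in zip(z, U):
        acc = add(acc, scal(zi, Ui))   # accumulate z^T U (+) d
    if acc[1] < 0:
        return -1
    if acc[0] > 0:
        return +1
    return 0                           # interval straddles the hyperplane

z, d = (1.0, -1.0), 0.0                # the classifier found in Example 4.1 below
print(classify(z, d, [(3.0, 4.0), (1.0, 2.0)]))   # +1 (this is U1)
print(classify(z, d, [(0.0, 2.0), (3.0, 4.0)]))   # -1 (this is U6)
```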

Example 4.1

Let us consider the interval data set $$\begin{aligned} U_1 &= ([3, 4], [1, 2]), & v_1 &= 1, & U_2 &= ([4, 5], [2, 3]), & v_2 &= 1, & U_3 &= ([5, 6], [1, 2]), & v_3 &= 1,\\ U_4 &= ([0, 1], [1, 2]), & v_4 &= -1, & U_5 &= ([1, 2], [2, 3]), & v_5 &= -1, & U_6 &= ([0, 2], [3, 4]), & v_6 &= -1. \end{aligned}$$ Let us find a classifying hyperplane for this data set.

To find a classifying hyperplane, we require a solution $(z, d)$ of (30) together with the corresponding $w_i$'s.

For the choice $(w_1, w_2, w_3, w_4, w_5, w_6) = (1, 0, 0, 0, 0, 1)$, we notice that $$\sum_{i=1}^6 w_iv_i = 0.$$ The values $U_i$ for which $w_i \neq 0$ are $U_1$ and $U_6$. The first condition in (30) becomes
(32) $0_v^n \in \left([z, z] \oplus (-1)(1 \times 1)U_1 \oplus (-1)(1 \times (-1))U_6\right) \implies 0_v^n \in \left([z, z] \oplus (-1)U_1 \oplus U_6\right)$, or $[z, z] \in (-1)\left((-1)U_1 \oplus U_6\right)$,
where $$(-1)U_1 = (-1)([3, 4], [1, 2]) = ([-4, -3], [-2, -1]),$$ $$(-1)U_1 \oplus U_6 = ([-4, -3], [-2, -1]) \oplus ([0, 2], [3, 4]) = ([-4, -1], [1, 3]),$$ $$(-1)\left((-1)U_1 \oplus U_6\right) = (-1)([-4, -1], [1, 3]) = ([1, 4], [-3, -1]).$$ Substituting into (32), we have $$[z, z] \in ([1, 4], [-3, -1]).$$ As $z \in \mathbb{R}^n$ and we are working in 2D, $n = 2$ and $z = (z_1, z_2)$. The above condition becomes $$1 \le z_1 \le 4 \quad \text{and} \quad -3 \le z_2 \le -1.$$ We choose $z_1 = 1$ and $z_2 = -1$. Corresponding to this $z = (z_1, z_2) = (1, -1)$, from (31) and the third condition in (30), the set of possible values for the bias $d$ is $$\bigcap_{i = 1, 6} \{d \in \mathbb{R} : H_i(z, d) = 0\} = \{d \in \mathbb{R} : H_1(z, d) = 0\} \cap \{d \in \mathbb{R} : H_6(z, d) = 0\}.$$ Recall that $H_i(z, d) = [1, 1] \ominus_{gH} v_i(z^T U_i \oplus d) \preceq_{LC} 0$, $i = 1, 2, \ldots, k$. Then, for $i = 1$, we have $$H_1(z, d) = [1, 1] \ominus_{gH} 1\left((1, -1)([3, 4], [1, 2])^T \oplus d\right);$$ upon simplification and setting $H_1(z, d) = 0$, we get $$[\min(0 - d,\ -2 - d),\ \max(0 - d,\ -2 - d)] = 0.$$ By the definition of the gH-difference there are two cases, $-2 - d \le -d$ or $-d \le -2 - d$, but this does not matter because of the equality with 0: the endpoints vanish at $$0 - d = 0 \quad \text{and} \quad -2 - d = 0, \quad \text{i.e. } d = 0 \text{ and } d = -2.$$ For $H_1(z, d) = 0$, we therefore have
(33) $d \in [-2, 0]$.
Similarly, for $i = 6$, $$H_6(z, d) = [1, 1] \ominus_{gH} (-1)\left((1, -1)([0, 2], [3, 4])^T \oplus d\right);$$ upon simplification, we get $$H_6(z, d) = [1, 1] \ominus_{gH} [1 - d,\ 4 - d].$$ For $H_6(z, d) = 0$, we have $$[\min(-3 + d,\ 0 + d),\ \max(-3 + d,\ 0 + d)] = 0 \implies -3 + d = 0 \text{ and } 0 + d = 0 \implies d = 3 \text{ and } d = 0.$$ This implies that
(34) $d \in [0, 3]$.
Now, substituting (33) and (34) into $$\bigcap_{i = 1, 6} \{d \in \mathbb{R} : H_i(z, d) = 0\} = \{d \in \mathbb{R} : d \in [-2, 0]\} \cap \{d \in \mathbb{R} : d \in [0, 3]\},$$ we get $d = 0$. Therefore, corresponding to $z_1 = 1$ and $z_2 = -1$, the expression for the classifying hyperplane (see Figure 2) is $$z_1u_1 + z_2u_2 + d = 0,\ d = 0 \implies u_1 - u_2 = 0.$$ Similarly, we can find equations for other hyperplanes.
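The box constraint on $z$ in (32) can be reproduced mechanically; the sketch below (ours) computes $(-1)((-1)U_1 \oplus U_6)$ componentwise and recovers $1 \le z_1 \le 4$, $-3 \le z_2 \le -1$.

```python
# Reproduce the admissible box for z in Example 4.1 from condition (30).

def add(C, D): return (C[0] + D[0], C[1] + D[1])
def neg(C): return (-C[1], -C[0])      # (-1) * C

U1 = [(3.0, 4.0), (1.0, 2.0)]          # support vector with v1 = +1, w1 = 1
U6 = [(0.0, 2.0), (3.0, 4.0)]          # support vector with v6 = -1, w6 = 1

# [z, z] must lie in (-1)((-1)U1 (+) U6), componentwise
box = [neg(add(neg(a), b)) for a, b in zip(U1, U6)]
print(box)   # [(1.0, 4.0), (-3.0, -1.0)]: 1 <= z1 <= 4 and -3 <= z2 <= -1

# Choosing z = (1, -1), the bias sets d in [-2, 0] (from U1) and d in [0, 3]
# (from U6) intersect in d = 0, giving the hyperplane u1 - u2 = 0.
```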

Figure 1. The objective function $F(y_1, y_2)$ of Example 3.9.

Figure 2. The hyperplane represents $u_1 - u_2 = 0$.

Now, we give a graphical representation of this hyperplane in Figure 2.

By translating the above equation, we can obtain further classifying hyperplanes, as shown in Figure 3. The blocks in Figure 3 show the interval data set considered in the example. The red hyperplane is the best classifying hyperplane, representing the equation $2u_1 - 2u_2 + 1 = 0$, and the blue dotted lines are the margin lines, representing the equations $u_1 - u_2 = 0$ and $u_1 - u_2 + 1 = 0$. The strip formed by the equations $u_1 - u_2 = 0$ and $u_1 - u_2 + 1 = 0$ contains all the hyperplanes that classify the given data.

Figure 3. An interval-valued classifying hyperplane.

5. Conclusion

In this article, we considered constrained IOPs and characterized their efficient solutions. We generalized Gordan's theorems to obtain the Fritz John conditions for IOPs, and we derived an extension of the KKT necessary optimality conditions for the LC-partial order. The derived conditions formed the basis of our formulation for SVMs, and we demonstrated how SVMs classify data using the KKT conditions. Obtaining a classifying hyperplane for interval-valued data is not a simple task: in Example 4.1 we only consider a very small data set, and for a large data set the computation can be far more complex. Moreover, in this article we only tackle non-overlapping data sets; the method will not work for overlapping data sets. Therefore, in future work, SVM regression analysis might be applied to the SVM classifier, overlapping data sets could be discussed, and this idea could be generalized to multi-classification problems.

Authors' contributions

All authors contributed equally to this article. They have read and approved the final manuscript.

Disclosure statement

The authors confirm that there are no known conflicts of interest associated with this publication.

References

  • Wu HC. The Karush–Kuhn–Tucker optimality conditions in multiobjective programming problems with interval-valued objective functions. Eur J Oper Res. 2009;196:49–60. doi: 10.1016/j.ejor.2008.03.012
  • Wu HC. On interval-valued nonlinear programming problems. J Math Anal Appl. 2008;338:299–316. doi: 10.1016/j.jmaa.2007.05.023
  • Wu HC. The Karush–Kuhn–Tucker optimality conditions in an optimization problem with interval-valued objective function. Eur J Oper Res. 2007;176:46–59. doi: 10.1016/j.ejor.2005.09.007
  • Chalco-Cano Y, Lodwick WA, Rufian-Lizana A. Optimality conditions of type KKT for optimization problem with interval-valued objective function via generalized derivative. Fuzzy Optim Decis Mak. 2013;12:305–322. doi: 10.1007/s10700-013-9156-y
  • Singh D, Dar B, Kim DS. KKT optimality conditions in interval valued multiobjective programming with generalized differentiable functions. Eur J Oper Res. 2016;254:29–39. doi: 10.1016/j.ejor.2016.03.042
  • Singh D, Dar BA, Goyal A. KKT optimality conditions for interval valued optimization problems. J Nonlinear Anal Optim. 2014;5:91–103.
  • Slyn'ko VI, Tunç C. Instability of set differential equations. J Math Anal Appl. 2018;467:935–947. doi: 10.1016/j.jmaa.2018.07.048
  • Ben-Tal A, El Ghaoui L, Nemirovski A. Robust optimization. Princeton (NJ): Princeton University Press; 2009. (Princeton Series in Applied Mathematics).
  • Slowinski R, Teghem J, editors. Stochastic versus fuzzy approaches to multiobjective mathematical programming under uncertainty. Dordrecht: Kluwer Academic Publishers; 1990. (Theory and Decision Library, Series D: System Theory, Knowledge Engineering and Problem Solving; 6).
  • Snyder C. The optimization behind support vector machines and an application in handwriting recognition; 2016. p. 1–15.
  • Shawe-Taylor J, Bartlett PL, Williamson RC, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Trans Inform Theory. 1998;44:1926–1940. doi: 10.1109/18.705570
  • Vapnik V. Estimation of dependences based on empirical data (translated from the Russian by Samuel Kotz). New York–Berlin: Springer-Verlag; 1982. (Springer Series in Statistics).
  • Vapnik VN. The nature of statistical learning theory. New York: Springer-Verlag; 1995.
  • Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on Computational learning theory. 1992; p. 144–152.
  • Stefanini L, Bede B. Generalized Hukuhara differentiability of interval-valued functions and interval differential equations. Nonlinear Anal. 2009;71:1311–1328. doi: 10.1016/j.na.2008.12.005
  • Ghosh D, Singh A, Shukla KK, et al. Extended Karush–Kuhn–Tucker condition for constrained interval optimization problems and its application in support vector machines. Inform Sci. 2019;504:276–292. doi: 10.1016/j.ins.2019.07.017
  • Debnath AK, Ghosh D. Generalized-Hukuhara penalty method for optimization problem with interval-valued functions and its application in interval-valued portfolio optimization problems. Oper Res Lett. 2022;50:602–609. doi: 10.1016/j.orl.2022.08.010
  • Kumar K, Ghosh D, Kumar G. Weak sharp minima for interval-valued functions and its primal-dual characterizations using generalized Hukuhara subdifferentiability. Soft Comput. 2022;26:10253–10273. doi: 10.1007/s00500-022-07332-0
  • Younus A, Nisar O. Convex optimization of interval valued functions on mixed domains. Filomat. 2019;33:1715–1725. doi: 10.2298/FIL1906715Y
  • Dastgeer Z, Younus A, Tunç C. Distinguishability of the descriptor systems with regular pencil. Linear Algebra Appl. 2022;652:82–96. doi: 10.1016/j.laa.2022.07.004
  • Chalco-Cano Y, Rufián-Lizana A, Román-Flores H, et al. Calculus for interval-valued functions using generalized Hukuhara derivative and applications. Fuzzy Sets Syst. 2013;219:49–67. doi: 10.1016/j.fss.2012.12.004
  • Tao J, Zhang ZH. Properties of interval vector-valued arithmetic based on gH-difference. Math Comput. 2015;4:7–12.
  • Ghosh D. Newton method to obtain efficient solutions of the optimization problems with interval-valued objective functions. J Appl Math Comput. 2017;53:709–731. doi: 10.1007/s12190-016-0990-2