
Compression schemes for concept classes induced by three types of discrete undirected graphical models

Pages 287-295 | Received 08 Nov 2022, Accepted 14 Jul 2023, Published online: 26 Sep 2023

Abstract

Sample compression schemes were first proposed by Littlestone and Warmuth in 1986. The undirected graphical model is a powerful tool for classification in statistical learning. In this paper, we consider labelled compression schemes for concept classes induced by discrete undirected graphical models. For the undirected graph on two vertices with no edge, where one vertex takes two values and the other takes any finite number of values, we propose an algorithm that establishes a labelled compression scheme whose size equals the VC dimension of the associated concept class. We then extend the result to two other types of undirected graphical models and show the existence of labelled compression schemes of size equal to the VC dimension for the induced concept classes. This work makes a step towards solving the sample compression problem for concept classes induced by general discrete undirected graphical models.

1. Introduction

Compression is an important facet of learning. In statistical learning, suitable compression guarantees a small generalization error (Blumer et al., Citation1987). A significant notion of compression is the sample compression scheme, introduced by Littlestone and Warmuth (Citation1986). There are two types of sample compression schemes, labelled and unlabelled; both are approaches for encoding a finite sample by a small subsample.

A labelled sample compression scheme of size $k$ for a concept class consists of a compression function $g$ and a reconstruction function $h$. The compression function $g$ compresses a finite sample to a small labelled subsample of size at most $k$, called the compression set, and the reconstruction function $h$ constructs from the compression set a concept consistent with the original sample. A crucial fact, the compactness lemma, was given by Ben-David and Litman (Citation1998); it essentially says that if every finite subclass of a concept class admits a compression scheme of a given size, then so does the (possibly infinite) class itself. The Vapnik-Chervonenkis (VC) dimension is a measure of the complexity of function classes. In statistical learning, the VC dimension is deeply related to probably approximately correct (PAC) learning and to sample compression schemes (Blumer et al., Citation1989; Floyd & Warmuth, Citation1995). In particular, for any concept class of VC dimension $d$, the size of its sample compression scheme must be at least $d$ (Bollobás & Radcliffe, Citation1995). Furthermore, Floyd and Warmuth (Citation1995) and Warmuth (Citation2003) proposed the sample compression conjecture: for every class of VC dimension $d$, there exists a sample compression scheme of size $O(d)$. To date, this conjecture remains open. However, it has been proved for many families of concept classes that are natural and important in statistical learning. Floyd and Warmuth (Citation1995) showed that there is a labelled sample compression scheme of size $d$ for any maximum class of VC dimension $d$. Moran and Warmuth (Citation2016) presented labelled sample compression schemes of size $d$ for ample classes, a natural generalization of maximum classes, based on the sandwich lemma (Anstee et al., Citation2002; Bollobás & Radcliffe, Citation1995). One noteworthy result is that Moran and Yehudayoff (Citation2016) proved the existence of labelled compression schemes of size $\exp(d)$ for arbitrary classes of VC dimension $d$. In a recent study, Chepoi et al. (Citation2021) showed that the topes of a complex of oriented matroids (COM) admit a labelled sample compression scheme of size equal to the VC dimension $d$.

Another type of compression scheme is the unlabelled one, in which the compression set is unlabelled; in other words, the compression function compresses each given sample to a subset of the domain of the sample. For several specific concept classes, unlabelled sample compression schemes have been explored and a series of results obtained (Chalopin et al., Citation2019; Helmbold et al., Citation1990; Kuzmin & Warmuth, Citation2007; Marc, Citation2022; Pálvölgyi & Tardos, Citation2020; Rubinstein & Rubinstein, Citation2012).

The undirected graphical model (UGM), also known as a Markov network, is one of the active research topics in statistical learning, and in the past few decades it has been successfully applied to many real problems (Friedman, Citation2004; Lauritzen, Citation1996). A statistical model is a family of probability distributions, which from the viewpoint of algebraic geometry is (part of) a real algebraic variety (Pistone et al., Citation2001). In particular, for discrete data, a statistical model is the set of all solutions of some polynomials within the probability simplex (Geiger et al., Citation2001; Settimi & Smith, Citation2000).

Given an undirected graph (UG) and a sample space, one has an algebraic-geometric characterization of the corresponding discrete UGM (Geiger et al., Citation2006). Building on this work, Li and Yang (Citation2018) developed an algebraic method for computing the VC dimensions of the concept classes induced by general discrete UGMs. Their work lays the theoretical foundation for this paper.

A natural question is whether the sample compression conjecture holds for the concept classes induced by discrete UGMs. In this paper, we consider concept classes induced by three types of discrete UGMs. Using the quadratic binomials associated with a positive UGM, we construct labelled compression schemes of size equal to the VC dimension $d$; that is, the compression conjecture holds for these three families of concept classes. Whether the concept class induced by a general discrete UGM has a sample compression scheme of size equal to its VC dimension remains an open question.

The remainder of this paper is organized as follows. Section 2 introduces main definitions and notations. Section 3 discusses a property of one-inclusion graphs of concept classes induced by general UGMs and establishes the existence of labelled sample compression schemes of size equal to their VC dimensions for classes induced by three kinds of discrete UGMs. Section 4 summarizes the work of this paper.

2. Preliminaries

In this section, we introduce some formal definitions and notations that appear in this paper.

2.1. Discrete UGMs

We consider UGMs whose underlying UGs are simple. A UG $G=(V,E)$ consists of a set of vertices $V$ and a set of unordered pairs of vertices (undirected edges) $E$. A graph is complete if there is an edge between every pair of vertices. Given a UG $G$, a clique is a maximal complete subgraph of $G$, and we use $\kappa_G$ to denote the set of all cliques of $G$. Let $X_1,X_2,\ldots,X_n$ be discrete random variables (the vertices of $G$), where $X_i\in[k_i]\triangleq\{0,1,\ldots,k_i-1\}$, $k_i\in\mathbb{N}$, $k_i\geq 2$. Let $X=(X_1,X_2,\ldots,X_n)$ be an $n$-dimensional vector, so that the sample space is $\mathcal{X}=\prod_{i=1}^{n}[k_i]$. A UGM $\mathcal{P}$ is a family of probability distributions, and each element (probability distribution on $\mathcal{X}$) of $\mathcal{P}$ has the form
(1) $\quad p_{x_1x_2\cdots x_n}\triangleq P(x_1,x_2,\ldots,x_n)=P(x)\propto\prod_{K\in\kappa_G}\psi_K(x)$,
where $x=(x_1,x_2,\ldots,x_n)\in\mathcal{X}$ and $\psi_K(x)$ is a potential function that depends on $x$ only through the values of the variables in $K$. We focus on the collection of positive probability distributions in $\mathcal{P}$, denoted by $\mathcal{P}^+$.
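To make the factorization (1) concrete, here is a minimal Python sketch (our own illustration, not part of the original development): for the two-vertex edgeless graph $G_1$ of Example 2.1 below, the cliques are the singletons $\{X_1\}$ and $\{X_2\}$, so every positive distribution of the form (1) is a normalized product of one-dimensional potentials, and all of its $2\times 2$ minors vanish.

```python
import numpy as np

# Minimal sketch (our illustration): for an edgeless graph on {X1, X2} the
# cliques are the singletons, so (1) reads P(x1, x2) ∝ psi1(x1) * psi2(x2).
k1, k2 = 2, 3                          # X1 in [2], X2 in [3]
rng = np.random.default_rng(0)
psi1 = rng.uniform(0.1, 1.0, k1)       # positive potential on clique {X1}
psi2 = rng.uniform(0.1, 1.0, k2)       # positive potential on clique {X2}

P = np.outer(psi1, psi2)               # unnormalized product of potentials
P /= P.sum()                           # normalize: P is now a member of P^+

# X1 and X2 are independent under P: every 2x2 minor of P vanishes.
for j in range(k2):
    for k in range(j + 1, k2):
        assert abs(P[0, j] * P[1, k] - P[0, k] * P[1, j]) < 1e-12
```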

Example 2.1

The graph $G_1$ in Figure 1 is a simple UG with $V=\{X_1,X_2\}$, $E=\emptyset$ and $\kappa_{G_1}=\{\{X_1\},\{X_2\}\}$. If $X_1,X_2\in[2]$, then $\mathcal{X}$ has four elements: $x^{(1)}=(0,0)$, $x^{(2)}=(1,1)$, $x^{(3)}=(0,1)$, $x^{(4)}=(1,0)$.

Figure 1. A simple UG G1.

2.2. Algebraic geometry of discrete UGMs

We work in the real polynomial ring $\mathbb{R}[\mathcal{X}]$ whose indeterminates are the elementary probabilities $p_{x_1x_2\cdots x_n}$. A subset $I\subseteq\mathbb{R}[\mathcal{X}]$ is an ideal if it satisfies the following conditions: (1) $0\in I$; (2) if $f,g\in I$, then $f+g\in I$; and (3) if $f\in I$ and $h\in\mathbb{R}[\mathcal{X}]$, then $hf\in I$. Every ideal $I$ in $\mathbb{R}[\mathcal{X}]$ is associated with a variety
$X_I\cap\mathbb{R}^m_{>0}=\{y\in\mathbb{R}^m_{>0}: f(y)=0 \text{ for every } f\in I\}$,
where $m=\prod_{i=1}^{n}k_i$ and $\mathbb{R}^m_{>0}$ is the set of $m$-dimensional vectors whose components are positive real numbers.

Given a UG $G$, we say $X_i$ is independent of $X_j$ given $\{X_1,X_2,\ldots,X_n\}\setminus\{X_i,X_j\}$ if $(X_i,X_j)\notin E$. Denote this by $X_i\perp X_j\mid\{X_1,X_2,\ldots,X_n\}\setminus\{X_i,X_j\}$ and call it a pairwise conditional independence statement. Each such statement corresponds to a set of quadratic binomials: for all $x_i,x_i'\in[k_i]$, $x_j,x_j'\in[k_j]$, and $z\in\prod_{l=1,\,l\neq i,j}^{n}[k_l]$,
$p_{x_ix_jz}\,p_{x_i'x_j'z}-p_{x_ix_j'z}\,p_{x_i'x_jz}$.
Let $I_{\text{pairwise}}(G)$ be the ideal in $\mathbb{R}[\mathcal{X}]$ generated by the quadratic binomials corresponding to all the pairwise conditional independence statements. The well-known Hammersley–Clifford theorem shows that $X_{I_{\text{pairwise}}(G)}\cap\mathbb{R}^m_{>0}=\mathcal{P}^+$ (Geiger et al., Citation2006; Lauritzen, Citation1996).
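The correspondence between non-edges and quadratic binomials is mechanical, and a short sketch (ours; the function name and representation are our own) can enumerate the index quadruples $(x_ix_jz,\,x_i'x_j'z,\,x_ix_j'z,\,x_i'x_jz)$ of the generators of $I_{\text{pairwise}}(G)$:

```python
from itertools import combinations, product

def pairwise_binomials(ks, edges):
    """Sketch (our illustration): enumerate the index quadruples of the
    binomials p[xi,xj,z]*p[xi',xj',z] - p[xi,xj',z]*p[xi',xj,z] for every
    non-edge (i, j), i.e. every pairwise conditional independence
    statement of the UG with vertex cardinalities ks."""
    n = len(ks)
    binomials = []
    for i, j in combinations(range(n), 2):
        if (i, j) in edges or (j, i) in edges:
            continue  # Xi and Xj are adjacent: no CI statement
        others = [l for l in range(n) if l not in (i, j)]
        for xi, xip in combinations(range(ks[i]), 2):
            for xj, xjp in combinations(range(ks[j]), 2):
                for z in product(*(range(ks[l]) for l in others)):
                    def point(a, b):
                        x = [None] * n
                        x[i], x[j] = a, b
                        for l, v in zip(others, z):
                            x[l] = v
                        return tuple(x)
                    binomials.append((point(xi, xj), point(xip, xjp),
                                      point(xi, xjp), point(xip, xj)))
    return binomials

# Two binary vertices, no edge: exactly the single binomial of Example 2.2.
print(pairwise_binomials([2, 2], edges=set()))
# [((0, 0), (1, 1), (0, 1), (1, 0))]
```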

Example 2.2

Consider the same UG and sample space as in Example 2.1. Since $X_1\perp X_2$, the unique quadratic binomial associated with this discrete UGM is
(2) $\quad p_{x^{(1)}}p_{x^{(2)}}-p_{x^{(3)}}p_{x^{(4)}}$.

2.3. Concept classes, VC dimension, sample compression schemes

A concept class $C$ over a domain $\mathcal{X}$ (also denoted $\mathrm{dom}(C)$) is a family of functions of the form $c:\mathcal{X}\to\{0,1\}$. Alternatively, $C$ can be represented by its one-inclusion graph (Haussler et al., Citation1994): the vertices of the graph are the concepts in $C$, and two concepts are joined by an edge if they differ at a single point. Each function $c\in C$ is called a concept, and it can also be viewed as a subset of $\mathcal{X}$, where $x\in c$ if and only if $x\in\mathcal{X}$ and $c(x)=1$. For a subset $A\subseteq\mathcal{X}$, the restriction of $C$ to $A$ is the class $C|_A=\{c|_A : c\in C\}$. If $|C|_A|=2^{|A|}$, which means that for every binary vector $b\in\{0,1\}^{|A|}$ there is a concept $c$ with $c(A)=b$, then $A$ is said to be shattered by $C$. The VC dimension of $C$ is
$\mathrm{VCdim}(C)=\sup\{m : \exists A\subseteq\mathcal{X} \text{ shattered by } C \text{ with } |A|=m\}$.
The sign function is defined as
(3) $\quad \mathrm{sign}(z)=\begin{cases}1, & \text{if } z\geq 0,\\ 0, & \text{if } z<0,\end{cases}$
where $z\in\mathbb{R}$. The concept class induced by a discrete UGM is the set of Boolean functions over $\mathcal{X}$ of the form $\mathrm{sign}(\log(P(x)/Q(x)))$ for $P,Q\in\mathcal{P}^+$, where $\mathcal{P}^+$ is the associated distribution class. Hence $\mathrm{sign}(\log(P(x)/Q(x)))=1$ if $P(x)\geq Q(x)$ and $0$ otherwise. A concept class $C$ is called maximum if its size is $\binom{n}{\leq d}=\sum_{i=0}^{d}\binom{n}{i}$, where $n=|\mathrm{dom}(C)|$ and $d=\mathrm{VCdim}(C)$.
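Since all concept classes considered in this paper are finite, the VC dimension can be checked by brute force. The following sketch (ours; exponential in the domain size, so suitable only for tiny classes such as those below) tests every subset for shattering:

```python
from itertools import combinations

def vc_dim(concepts, domain):
    """Brute-force VC dimension of a finite class; each concept is a
    dict mapping domain point -> {0, 1} (our own sketch)."""
    d = 0
    for m in range(1, len(domain) + 1):
        shattered = False
        for A in combinations(domain, m):
            patterns = {tuple(c[x] for x in A) for c in concepts}
            if len(patterns) == 2 ** m:   # every labelling of A occurs
                shattered = True
                break
        if not shattered:
            return d                      # shattering is monotone downward
        d = m
    return d
```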

Example 2.3

According to Example 2.2, there is clearly no pair of distributions $P,Q\in\mathcal{P}_1^+$ for which $(\mathrm{sign}(\log(p_{00}/q_{00})),\mathrm{sign}(\log(p_{11}/q_{11})),\mathrm{sign}(\log(p_{01}/q_{01})),\mathrm{sign}(\log(p_{10}/q_{10})))$ takes the value $(1,1,0,0)$ or $(0,0,1,1)$: for instance, $(1,1,0,0)$ would give $p_{00}p_{11}\geq q_{00}q_{11}=q_{01}q_{10}>p_{01}p_{10}=p_{00}p_{11}$, contradicting binomial (2). Since $p_{00}+p_{01}+p_{10}+p_{11}=q_{00}+q_{01}+q_{10}+q_{11}=1$, the value $(0,0,0,0)$ cannot appear either. The concept class $C_1$ is shown in Figure 2(a). Note that $\mathrm{VCdim}(C_1)=3$, and $C_1$ is not a maximum class. The one-inclusion graph of $C_1$ is given in Figure 2(b).

Figure 2. (a) C1 and (b) its one-inclusion graph.
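One can reproduce $C_1$ empirically (our own sketch): by the Hammersley–Clifford theorem, $\mathcal{P}_1^+$ consists exactly of the positive product distributions on the four points, so sampling random pairs of product distributions sweeps out the realizable sign patterns. The only subtlety is the all-ones concept, which arises solely from the tie $P=Q$ (as $\mathrm{sign}(0)=1$), a probability-zero event under random sampling, so we add it explicitly.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
points = [(0, 0), (1, 1), (0, 1), (1, 0)]   # x(1), x(2), x(3), x(4)

def random_product():
    # A random positive product distribution on [2] x [2] (member of P1^+).
    a = rng.uniform(0.01, 1.0, 2); a /= a.sum()
    b = rng.uniform(0.01, 1.0, 2); b /= b.sum()
    return {(i, j): a[i] * b[j] for i in range(2) for j in range(2)}

patterns = {(1, 1, 1, 1)}                    # the tie case P = Q
for _ in range(100000):
    P, Q = random_product(), random_product()
    patterns.add(tuple(int(P[x] >= Q[x]) for x in points))

print(len(patterns))                                     # expect 13
print(sorted(set(product([0, 1], repeat=4)) - patterns))
# expect [(0,0,0,0), (0,0,1,1), (1,1,0,0)]
```

Feeding the resulting 13 concepts into the vc_dim sketch above confirms $\mathrm{VCdim}(C_1)=3$.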

A labelled sample is a set $s=\{(x^{(1)},y_1),\ldots,(x^{(m)},y_m)\}$, where $x^{(i)}\in\mathcal{X}$ and $y_i\in\{0,1\}$. A labelled sample compression scheme for a concept class $C$ is a pair of functions $(g,h)$. The compression function $g$ maps a finite labelled sample that is realizable by some concept to a labelled subsample, the compression set; the reconstruction function $h$ maps this subsample to a hypothesis on $\mathcal{X}$ consistent with the entire sample. The size of the compression scheme is the size of the largest compression set. A sample compression scheme is proper if $h(g(s))\in C$ for any realizable sample $s$; otherwise, it is called improper.

3. Labelled sample compression schemes

In this section, we point out that the one-inclusion graph of the concept class induced by a discrete UGM is not the tope graph of a COM whenever the underlying graph is incomplete, and we construct labelled sample compression schemes of size equal to the VC dimension for the classes corresponding to three types of discrete UGMs.

3.1. One-inclusion graphs of concept classes induced by general discrete UGMs

For a discrete UGM, the vertices of its underlying UG represent random variables. In this section, by contrast, each vertex of the graphs under consideration represents an object that is not a random variable; in a one-inclusion graph, each vertex is a concept.

Let $G=(V,E)$ be a finite, connected and simple graph. The distance $d(u,v)\triangleq d_G(u,v)$ between vertices $u$ and $v$ is defined as the length of a shortest $(u,v)$-path. We say that $G=(V,E)$ is isometrically embeddable into a graph $H=(W,F)$ if there is a map $\varphi:V\to W$ such that $d_H(\varphi(u),\varphi(v))=d(u,v)$ for all $u,v\in V$. A graph $G$ is a partial cube if it admits an isometric embedding into a hypercube graph $Q_n$.

We now introduce two types of metric subgraphs. A subgraph $H$ is even if for every vertex $v$ of $H$ there exists a unique vertex $\bar{v}$ of $H$ such that $d(v,\bar{v})=\operatorname{diam}(H)$, where $\operatorname{diam}(H)$ is the diameter of $H$. An even subgraph is symmetric if $d(u,v)+d(v,\bar{u})=\operatorname{diam}(H)$ for all $u,v$ in $H$. A symmetric-even subgraph $H$ of $G$ is also called an antipodal subgraph. Two antipodal subgraphs of the graph in Figure 2(b) are shown in Figure 3. A subgraph $H$ is gated if for each vertex $v$ of $G$ there exists a gate $x$ in $H$ such that, for each vertex $y\in H$, some shortest $(v,y)$-path passes through $x$. In a partial cube, this means that there is a path from $v$ to $H$ that does not use any colour appearing on the edges of $H$. We say that a graph is antipodally gated if all of its antipodal subgraphs are gated.

Figure 3. Two antipodal subgraphs of the graph in Figure 2(b).

Consider the subgraph in Figure 3(b): it is an antipodal subgraph, yet the vertex '0010' has no gate in it, since there is no path from '0010' to this subgraph that avoids both the red and the blue edges. Similarly, the subgraph induced by the vertex set {'0010','0110','1110','1010'} is a non-gated antipodal subgraph. Therefore, the graph in Figure 2(b) is not antipodally gated, and by Theorem 1.1 in Knauer and Marc (Citation2020) it is not the tope graph of a COM. More generally, the one-inclusion graph of a concept class induced by a UGM whose underlying graph is incomplete is never the tope graph of a COM.

3.2. Sample compression schemes for the concept classes induced by discrete UGMs

Given a sample space, if the UG is complete, the corresponding concept class is a maximum class (Li & Yang, Citation2018) and its one-inclusion graph is the tope graph of a COM (Knauer & Marc, Citation2020). We now consider a labelled compression scheme for $C_1$.

Proposition 3.1

C1 has a labelled compression scheme of size 3.

Proof.

Let $X_1,X_2\in\{0,1\}$ and consider $G_2$ in Figure 4, where $X_1$ and $X_2$ are dependent. Note that the induced concept class $C_2$ contains 15 concepts and is a maximum class. Then we can obtain a labelled compression scheme for $C_2$ (Floyd & Warmuth, Citation1995; Moran & Warmuth, Citation2016), and hence a labelled compression scheme of size 3 for $C_1$, as shown in Table 1, since $C_1\subseteq C_2$ and $\mathrm{VCdim}(C_1)=\mathrm{VCdim}(C_2)=3$.

Figure 4. A UG G2.

Table 1. A labelled compression scheme for C1.

Remark 3.1

For $G_1$ in Figure 1, let $X_1\in[k_1]$ and $X_2\in[k_2]$, where $k_1,k_2\in\mathbb{N}$ and $k_1,k_2\geq 2$, excluding the case $k_1=k_2=2$. Then the algebraic characterization of this discrete UGM is
(4) $\quad p_{ij}p_{lk}-p_{ik}p_{lj}$,
where $i,l\in[k_1]$, $i<l$, $j,k\in[k_2]$, $j<k$. It is easy to see that only $(0,0,1,1)$ and $(1,1,0,0)$ cannot occur on $\{(i,j),(l,k),(i,k),(l,j)\}$; that is, there are 14 functions on such a quadruple, and we denote this class by $C_3$.

Note that $\mathrm{VCdim}(C_3)=3$. A natural idea is to embed $C_3$ into a maximum class of the same VC dimension. There are 16 functions on 4 domain points in total, so deleting $(0,0,1,1)$ yields a maximum class of VC dimension 3 that contains $C_3$; the labelled compression scheme for this maximum class is shown in Table 2. The concept $(1,1,0,0)\notin C_3$, and in Table 2 it has four compression sets. If we know $\{((i,j),1),((l,k),1),((i,k),0)\}$, this evidence cannot be used under the scheme of Table 2 to predict the label of $(l,j)$ in $C_3$, even though within $C_3$ it obviously comes from $(1,1,0,1)$. The key point is that, in the maximum class $C_3\cup\{(1,1,0,0)\}$, the compression sets corresponding to $(1,1,0,1)$ do not contain $\{((i,j),1),((l,k),1),((i,k),0)\}$. Thus we can reassign the labelled compression set $\{((i,j),1),((l,k),1),((i,k),0)\}$ to $(1,1,0,1)$, $\{((i,j),1),((l,k),1),((l,j),0)\}$ to $(1,1,1,0)$, $\{((i,j),1),((i,k),0),((l,j),0)\}$ to $(1,0,0,0)$, and $\{((l,k),1),((i,k),0),((l,j),0)\}$ to $(0,1,0,0)$. One then obtains a new compression scheme for $C_3$, shown in Table 3. This compression scheme is crucial in this paper.

Table 2. A labelled compression scheme for a maximum class.

Table 3. A new labelled compression scheme for C3.

Remark 3.2

Consider formula (4). If we know any three of the four values $p_{ij}$, $p_{lk}$, $p_{ik}$, $p_{lj}$, then the value of the remaining probability can be obtained. However, according to Table 3, we must abide by the following rules: to reconstruct $(0,0,0,0)$, $(0,1,0,1)$, $(0,1,1,0)$, $(1,0,0,1)$, $(1,0,1,0)$, $(1,1,1,1)$, we must know $\{((i,j),0),((l,k),0)\}$, $\{((i,j),0),((l,j),1)\}$, $\{((i,j),0),((i,k),1)\}$, $\{((l,k),0),((l,j),1)\}$, $\{((l,k),0),((i,k),1)\}$, $\{((i,k),1),((l,j),1)\}$, respectively. Similarly, to reconstruct the concepts $(0,0,0,1)$, $(0,0,1,0)$, $(0,1,1,1)$, $(1,0,1,1)$, we need to know the specified three points $\{((i,j),0),((l,k),0),((l,j),1)\}$, $\{((i,j),0),((l,k),0),((i,k),1)\}$, $\{((i,j),0),((i,k),1),((l,j),1)\}$, $\{((l,k),0),((i,k),1),((l,j),1)\}$, respectively. The remaining four concepts of $C_3$ are handled by the compression sets reassigned in Remark 3.1.

Next we show that when $X_1\in[2]$ and $X_2$ takes any finite number of values, each concept in the induced class can be compressed to a labelled set of size equal to the VC dimension.

Lemma 3.2

For $G_1$ in Figure 1, let $X_1\in[2]$ and $X_2\in[k_2]$, where $k_2\in\mathbb{N}$ and $k_2\geq 2$, and let $C_4$ denote the class induced by this discrete UGM. Then for each concept $c\in C_4$ there exists a compression set of size $k_2+1$ from which $c$ can be reconstructed exactly.

Proof.

For $k_2=2$ we have $C_4=C_1$, so the lemma holds by Proposition 3.1. For $k_2>2$, $C_4$ has $2k_2$ domain points, denoted $(0,0),(1,0),\ldots,(1,k_2-1)$. The quadratic binomials associated with this UGM are
(5) $\quad p_{0i}p_{1j}-p_{0j}p_{1i}$,
where $i,j\in[k_2]$ and $i<j$. We know that $\mathrm{VCdim}(C_4)=k_2+1$ (Li & Yang, Citation2018). For each $c\in C_4$ there are four cases to discuss.

Case (1): $\{c|_{\{(0,m),(1,m)\}}:m\in[k_2]\}=\{(1,1)\}$.

Note that $c|_{\{(0,i),(1,j),(0,j),(1,i)\}}=(1,1,1,1)$, so by Remark 3.2 we must retain $((0,j),1)$ and $((1,i),1)$ to reconstruct '1111'. Therefore, fix any $m$ and first retain $((0,m),1)$ and $((1,m),1)$. Then, for each $n\in[k_2]\setminus\{m\}$, retain $((1,n),1)$ if $n<m$ and $((0,n),1)$ if $n>m$. This yields a compression set of size $k_2+1$.

Case (2): $\{c|_{\{(0,m),(1,m)\}}:m\in[k_2]\}=\{(1,1),(0,0)\}$. There are two subcases.

Case (2a): $c|_{\{(0,m),(1,m)\}}=(0,0)$ and $c|_{\{(0,n),(1,n)\}}=(1,1)$ for each $n<m$. We first retain $((0,m),0)$ and $((1,m),0)$. Then, according to the reconstruction rules, for $n<m$ we retain $((1,n),1)$. For $n>m$, if $c|_{\{(0,n),(1,n)\}}=(1,1)$, retain $((0,n),1)$; if $c|_{\{(0,n),(1,n)\}}=(0,0)$, retain $((1,n),0)$.

Case (2b): $c|_{\{(0,m),(1,m)\}}=(1,1)$ and $c|_{\{(0,n),(1,n)\}}=(0,0)$ for each $n<m$. We first retain $((0,m),1)$, $((1,m),1)$ and $((0,n),0)$ for $n<m$. For $n>m$, if $c|_{\{(0,n),(1,n)\}}=(1,1)$, retain $((0,n),1)$; if $c|_{\{(0,n),(1,n)\}}=(0,0)$, retain $((1,n),0)$.

Case (3): $c|_{\{(0,m),(1,m)\}}\in\{(1,1),(0,0),(0,1)\}$ for every $m\in[k_2]$, and there exists at least one $m\in[k_2]$ such that $c|_{\{(0,m),(1,m)\}}=(0,1)$.

Let $k=\min\{m : c|_{\{(0,m),(1,m)\}}=(0,1)\}$. We first retain $((0,k),0)$ and $((1,k),1)$. Then, by the rules shown in Table 3, for any $n<k$: if $c|_{\{(0,n),(1,n)\}}=(0,0)$, we retain $((0,n),0)$ or $((1,n),0)$; if $c|_{\{(0,n),(1,n)\}}=(1,1)$, we retain $((0,n),1)$ or $((1,n),1)$. For any $n>k$: if $c|_{\{(0,n),(1,n)\}}=(0,0)$, we retain $((1,n),0)$; if $c|_{\{(0,n),(1,n)\}}=(0,1)$, we retain $((0,n),0)$ or $((1,n),1)$; if $c|_{\{(0,n),(1,n)\}}=(1,1)$, we retain $((0,n),1)$. In total, $k_2+1$ points are retained.

Case (4): $c|_{\{(0,m),(1,m)\}}\in\{(1,1),(0,0),(1,0)\}$ for every $m\in[k_2]$, and there exists at least one $m\in[k_2]$ such that $c|_{\{(0,m),(1,m)\}}=(1,0)$.

Let $l=\max\{m : c|_{\{(0,m),(1,m)\}}=(1,0)\}$. First, we retain $((0,l),1)$ and $((1,l),0)$. Then, using similar techniques as in Case (3): for any $n<l$, if $c|_{\{(0,n),(1,n)\}}=(0,0)$, we retain $((0,n),0)$; if $c|_{\{(0,n),(1,n)\}}=(1,1)$, we retain $((1,n),1)$; if $c|_{\{(0,n),(1,n)\}}=(1,0)$, we retain $((0,n),1)$ or $((1,n),0)$. For any $n>l$, if $c|_{\{(0,n),(1,n)\}}=(0,0)$, we retain $((0,n),0)$ or $((1,n),0)$; if $c|_{\{(0,n),(1,n)\}}=(1,1)$, we retain $((0,n),1)$ or $((1,n),1)$.

In summary, for any concept $c$ we obtain a compression set $A$ of size $k_2+1$, and the compression process ensures that $c|_{\mathrm{dom}(c)\setminus A}$ can be reconstructed correctly using only the points in $A$.

In fact, in the proof of Lemma 3.2, Case (3) comprises the following four subcases: $\{c|_{\{(0,m),(1,m)\}}:m\in[k_2]\}=\{(0,1)\}$, $\{(0,1),(0,0)\}$, $\{(0,1),(1,1)\}$, and $\{(1,1),(0,0),(0,1)\}$. Case (4) likewise has four subcases: $\{c|_{\{(0,m),(1,m)\}}:m\in[k_2]\}=\{(1,0)\}$, $\{(1,0),(0,0)\}$, $\{(1,0),(1,1)\}$, and $\{(1,1),(0,0),(1,0)\}$.
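The case analysis above is entirely mechanical, so we record it as a Python sketch (our own transcription of the proof; where the proof allows 'retain A or B', we always fix the first option). The input lists the column pairs $(c(0,m),c(1,m))$ for $m=0,\ldots,k_2-1$; the output is a compression set of exactly $k_2+1$ labelled points.

```python
def compress_concept(cols):
    """cols[m] = (c(0,m), c(1,m)) for m = 0, ..., k2 - 1.  Our sketch of
    the proof of Lemma 3.2; assumes cols comes from a realizable concept
    of C4 (in particular, the columns are not all (0,0))."""
    k2, kinds, A = len(cols), set(cols), []
    if kinds == {(1, 1)}:                              # Case (1), with m = 0
        A += [((0, 0), 1), ((1, 0), 1)]
        A += [((0, n), 1) for n in range(1, k2)]
    elif kinds == {(1, 1), (0, 0)}:                    # Case (2)
        if cols[0] == (1, 1):                          # Case (2a)
            m = cols.index((0, 0))
            A += [((0, m), 0), ((1, m), 0)] + [((1, n), 1) for n in range(m)]
        else:                                          # Case (2b)
            m = cols.index((1, 1))
            A += [((0, m), 1), ((1, m), 1)] + [((0, n), 0) for n in range(m)]
        for n in range(m + 1, k2):
            A.append(((0, n), 1) if cols[n] == (1, 1) else ((1, n), 0))
    elif (0, 1) in kinds:                              # Case (3)
        k = cols.index((0, 1))                         # first (0,1) column
        A += [((0, k), 0), ((1, k), 1)]
        A += [((0, n), cols[n][0]) for n in range(k)]
        for n in range(k + 1, k2):
            A.append({(0, 0): ((1, n), 0), (0, 1): ((0, n), 0),
                      (1, 1): ((0, n), 1)}[cols[n]])
    else:                                              # Case (4)
        l = max(m for m, c in enumerate(cols) if c == (1, 0))
        A += [((0, l), 1), ((1, l), 0)]
        for n in range(l):
            A.append({(0, 0): ((0, n), 0), (1, 1): ((1, n), 1),
                      (1, 0): ((0, n), 1)}[cols[n]])
        A += [((0, n), cols[n][0]) for n in range(l + 1, k2)]
    assert len(A) == k2 + 1
    return A

# Example: columns (1,1), (0,1), (0,0) fall under Case (3) with k = 1.
print(compress_concept([(1, 1), (0, 1), (0, 0)]))
# [((0, 1), 0), ((1, 1), 1), ((0, 0), 1), ((1, 2), 0)]
```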

At a high level, our goal is to construct a sample compression scheme for $C_4$ that compresses any realizable sample $s$ of $C_4$ to a subsample of size at most $k_2+1$, from which a hypothesis consistent with $s$ can be reconstructed. Next we give an algorithm that generates such a compression scheme. The input is a labelled sample $s$ of $C_4$; the output is a subsample $s'$ of size at most $k_2+1$ that represents a hypothesis consistent with $s$.

The Algorithm for Constructing a Labelled Compression Scheme.

  • The compression algorithm: The input is a finite sample $s$ of $C_4$. Let $A_1,A_2,\ldots,A_N$ denote the four-point domain sets of the quadratic binomials that the $2k_2$ domain points must satisfy, where $N=\binom{k_2}{2}$, and let $\mathcal{A}=\{A_t:|s|_{A_t}|=4,\ t=1,\ldots,N\}$. If $\mathcal{A}=\emptyset$, then $s'=s$. Otherwise, let $A=\bigcup_{A_t\in\mathcal{A}}A_t$, compress $s|_A$ as in the proof of Lemma 3.2, denote the resulting compression set by $s''$, and set $s'=s''\cup(s|_{\mathrm{dom}(s)\setminus A})$.

  • The reconstruction algorithm: The input is a subsample $s'$ of size at most $k_2+1$, and the reconstruction function must predict the labels of the elements of $\mathrm{dom}(C_4)\setminus\mathrm{dom}(s')$. For any element $x\in\mathrm{dom}(C_4)\setminus\mathrm{dom}(s')$, if there is a quadratic binomial that contains $x$ and whose other three points lie in $s'$, predict the label of $x$ by the scheme in Table 3. Let $B$ be the set of such $x$; for the elements of $\mathrm{dom}(C_4)\setminus(\mathrm{dom}(s')\cup B)$, assign as labels any 0-1 values consistent with the binomials (5).
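At the quadruple level, the reconstruction rules of Remarks 3.1 and 3.2 can be distilled into a lookup (our own sketch; Table 3 itself is not reproduced in this text, so the dictionary below encodes the evidence sets listed in the two remarks, in the point order $((i,j),(l,k),(i,k),(l,j))$). Given the labelled points of a quadruple that landed in the compression set, it returns the concept of $C_3$ whose designated evidence is present and consistent:

```python
# Evidence rules distilled from Remarks 3.1 and 3.2: position p -> point,
# with 0 -> (i,j), 1 -> (l,k), 2 -> (i,k), 3 -> (l,j).  RULES maps each
# concept of C3 to the labelled points that must be known to reconstruct it.
RULES = {
    (0, 0, 0, 0): {(0, 0), (1, 0)},
    (0, 1, 0, 1): {(0, 0), (3, 1)},
    (0, 1, 1, 0): {(0, 0), (2, 1)},
    (1, 0, 0, 1): {(1, 0), (3, 1)},
    (1, 0, 1, 0): {(1, 0), (2, 1)},
    (1, 1, 1, 1): {(2, 1), (3, 1)},
    (0, 0, 0, 1): {(0, 0), (1, 0), (3, 1)},
    (0, 0, 1, 0): {(0, 0), (1, 0), (2, 1)},
    (0, 1, 1, 1): {(0, 0), (2, 1), (3, 1)},
    (1, 0, 1, 1): {(1, 0), (2, 1), (3, 1)},
    (1, 1, 0, 1): {(0, 1), (1, 1), (2, 0)},
    (1, 1, 1, 0): {(0, 1), (1, 1), (3, 0)},
    (1, 0, 0, 0): {(0, 1), (2, 0), (3, 0)},
    (0, 1, 0, 0): {(1, 1), (2, 0), (3, 0)},
}

def reconstruct(known):
    """known: dict position -> label over positions 0..3 of one quadruple.
    Returns the concept whose designated evidence is contained in the known
    points and which agrees with all of them, or None (our sketch)."""
    for concept, evidence in RULES.items():
        if evidence <= set(known.items()) and \
           all(concept[p] == v for p, v in known.items()):
            return concept
    return None

# Knowing positions 1, 2, 3 with labels 1, 1, 1 (as produced in Case (1)
# of Lemma 3.2), the rules return (1,1,1,1), not the completion (0,1,1,1).
print(reconstruct({1: 1, 2: 1, 3: 1}))   # (1, 1, 1, 1)
```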

The following theorem shows that the algorithm produces a correct compression set of size at most $k_2+1$.

Theorem 3.3

Let $s$ be a sample labelled consistently with some concept of $C_4$, with $1\leq|s|=m\leq 2k_2$. Then the algorithm produces a compression set of size at most $k_2+1$, and the reconstructed hypothesis is consistent with the original sample $s$.

Proof.

We first show that the size of the compression set $s'$ is at most $k_2+1$. If $\mathcal{A}=\emptyset$, then the sample $s$ contains at most one complete pair of points $((0,i),y_{0i})$, $((1,i),y_{1i})$, where $y_{0i},y_{1i}\in\{0,1\}$, so $|s'|=|s|\leq 2+(k_2-1)=k_2+1$. If $\mathcal{A}\neq\emptyset$, there exist at least two complete pairs $((0,i),y_{0i})$, $((1,i),y_{1i})$ and $((0,j),y_{0j})$, $((1,j),y_{1j})$ in $s$ with $i\neq j$. Let $D$ be the set of all points of $s$ lying in complete pairs, $|D|=2n$, where $2\leq n\leq k_2$. By Lemma 3.2, we can compress the labelled set $D$ to a set $s''$ of size $n+1$. Since each of the remaining $k_2-n$ values of $X_2$ contributes at most one point of $s$, we have $m\leq 2n+(k_2-n)=k_2+n$, and therefore $|s'|=|s''|+|s\setminus D|=n+1+m-2n=m-n+1\leq k_2+1$.

From the algorithm and Lemma 3.2 it follows immediately that the reconstructed hypothesis is consistent with $s$ on $\mathrm{dom}(s)\setminus\mathrm{dom}(s')$, and consequently with the whole sample $s$. This ends the proof.

Example 3.4

Consider the graph $G_1$ with $X_1\in[2]$ and $X_2\in[3]$, and let $C_5$ denote the concept class induced by this discrete UGM; then $\mathrm{dom}(C_5)=\{(0,0),(1,0),(0,1),(1,1),(0,2),(1,2)\}$. The associated quadratic binomials are:

(1) $p_{00}p_{11}-p_{01}p_{10}$; (2) $p_{00}p_{12}-p_{02}p_{10}$; (3) $p_{01}p_{12}-p_{02}p_{11}$.

It is easy to check that the VC dimension of $C_5$ is 4 (Li & Yang, Citation2018). Consider the sample $s=\{((0,0),1),((1,0),1),((0,1),1),((1,1),1),((0,2),0),((1,2),1)\}$, which is a concept in $C_5$ (e.g. take $P=(0.12,0.28,0.12,0.28,0.06,0.14)$ and $Q=(0.09,0.01,0.09,0.01,0.72,0.08)$, listed in the order of $\mathrm{dom}(C_5)$, so that $s=\mathrm{sign}(\log(P(x)/Q(x)))$ for $x\in\mathrm{dom}(C_5)$). Using the proposed algorithm, we can compress $s$ to $\{((0,0),1),((0,1),1),((0,2),0),((1,2),1)\}$. The reconstruction function predicts the label '1' for $(1,0)$ and '1' for $(1,1)$ by binomials (2) and (3), respectively. That is, the reconstructed hypothesis is exactly $s$.

If $s=\{((0,0),1),((0,1),1),((1,1),1),((1,2),1)\}$, then by the proposed algorithm $s$ itself is the labelled compression set, and $(1,0)$ and $(0,2)$ are both labelled '0' upon reconstruction. The resulting concept does not belong to $C_5$, which means that the labelled compression scheme we present is improper.
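As a sanity check (ours, not from the paper), one can verify numerically that the pair $(P,Q)$ given in Example 3.4 is legitimate: both are positive, sum to one, satisfy the three binomials of $C_5$, and induce exactly the sample $s$.

```python
import math

dom = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
P = dict(zip(dom, [0.12, 0.28, 0.12, 0.28, 0.06, 0.14]))
Q = dict(zip(dom, [0.09, 0.01, 0.09, 0.01, 0.72, 0.08]))

quadruples = [                            # {(0,i), (1,j), (0,j), (1,i)}, i < j
    ((0, 0), (1, 1), (0, 1), (1, 0)),     # binomial (1)
    ((0, 0), (1, 2), (0, 2), (1, 0)),     # binomial (2)
    ((0, 1), (1, 2), (0, 2), (1, 1)),     # binomial (3)
]
for R in (P, Q):
    assert abs(sum(R.values()) - 1.0) < 1e-12          # a distribution
    for a, b, c, d in quadruples:
        assert abs(R[a] * R[b] - R[c] * R[d]) < 1e-12  # binomial vanishes

s = {x: int(math.log(P[x] / Q[x]) >= 0) for x in dom}
print(s)   # labels 1,1,1,1,0,1 in the order of dom, matching Example 3.4
```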

Naturally, we can extend Lemma 3.2 to a family of concept classes whose corresponding graph has three vertices and two edges.

Theorem 3.5

For the graph shown in Figure 5, let $X_1\in[2]$ and $X_i\in[k_i]$, where $k_i\in\mathbb{N}$, $k_i\geq 2$, $i=2,3$. Let $C_6$ denote the class induced by this UGM. Then there exists a labelled compression scheme of size $k_2(k_3+1)$ for $C_6$.

Figure 5. A UG of three vertices with two edges.

Proof.

There are $2k_2k_3$ domain points in $C_6$, and $\mathrm{VCdim}(C_6)=k_2(k_3+1)$ (Li & Yang, Citation2018). For the UG in Figure 5 we have $X_1\perp X_3\mid X_2$, so one can use the sample compression scheme of $C_4$ to construct a scheme for $C_6$.

For any sample set $s$, we can divide it into $k_2$ parts according to the value of $X_2$, denoted $s_1,s_2,\ldots,s_{k_2}$; for the elements of each part $s_i$, $X_2$ takes the same value $i-1\in[k_2]$. Conditional on this value, each part behaves as a sample of a $C_4$-type class over $2k_3$ points. Let the compression sets of $s_1,s_2,\ldots,s_{k_2}$ be $s_1',s_2',\ldots,s_{k_2}'$, respectively, where $|s_n'|\leq k_3+1$, $n=1,2,\ldots,k_2$. Then the compression set of $s$ is $\bigcup_n s_n'$, and $|\bigcup_n s_n'|\leq k_2(k_3+1)$. This ends the proof of Theorem 3.5.
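The proof translates into a short sketch (ours): split the sample by the value of $X_2$ and compress each slice with the $C_4$ scheme. Here `compress_part` is a hypothetical stand-in for the compression function of the Lemma 3.2/Theorem 3.3 scheme applied to one slice.

```python
from collections import defaultdict

def compress_C6(sample, k2, compress_part):
    """Sketch of Theorem 3.5 (ours). sample: list of ((x1, x2, x3), label);
    compress_part: hypothetical stand-in for the C4 compression function of
    Theorem 3.3, applied to the slice X2 = v (valid there because X1 is
    independent of X3 given X2)."""
    parts = defaultdict(list)
    for (x1, x2, x3), y in sample:
        parts[x2].append(((x1, x3), y))      # group the sample by X2
    compressed = []
    for v in range(k2):                      # compress slice by slice
        compressed += [((x1, v, x3), y)
                       for ((x1, x3), y) in compress_part(parts[v])]
    return compressed                        # size <= k2 * (k3 + 1)
```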

Immediately, we have the following result.

Corollary 3.6

For any UGM whose underlying graph has exactly two cliques $K_1=\{X_1,X_2,\ldots,X_{n_1}\}$ and $K_2=\{X_2,X_3,\ldots,X_n\}$ with $X_1\in[2]$, there is a labelled sample compression scheme of size equal to the VC dimension of the corresponding induced concept class.

Proof.

Number the vertices so that the vertex belonging only to $K_1$ comes first, then the vertices in $K_1\cap K_2$, and finally those in $K_2\setminus K_1$. Suppose $X_i\in[k_i]$, where $k_i\in\mathbb{N}$, $k_i\geq 2$, $i=2,\ldots,n$.

If $n_1=1$ (i.e. $K_1\cap K_2=\emptyset$), the two cliques can be viewed as two isolated vertices taking $2$ and $\prod_{i=2}^{n}k_i$ values, respectively, and this case reduces to the setting of Lemma 3.2. If $n_1\geq 2$ (i.e. $K_1\cap K_2=\{X_2,\ldots,X_{n_1}\}$), this UG can be viewed as the UG in Figure 5 with the three vertices taking $2$, $\prod_{i=2}^{n_1}k_i$ and $\prod_{i=n_1+1}^{n}k_i$ values, respectively, and this case reduces to the setting of Theorem 3.5. This confirms the conclusion.

4. Conclusion and discussion

UGMs have become popular models for classification in statistical learning. In this paper, we focus on the question of whether there exists a sample compression scheme of size equal to the VC dimension for the concept class induced by a discrete UGM, and we show that for three types of discrete UGMs the answer is positive. The construction of these labelled compression schemes utilizes the algebraic characterization of discrete UGMs.

A natural follow-up question is whether there are unlabelled sample compression schemes of size equal to the VC dimension for these three types of induced concept classes. For a general discrete UGM, whether there exists a labelled sample compression scheme of size equal to the VC dimension of the induced concept class remains open.

Acknowledgments

We would like to thank the Editor, Managing Editor, the Associate Editor and the reviewer for constructive comments and helpful suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [Grant Number 12171382], the Natural Science Basic Research Plan in Shaanxi Province [Grant Number 2020JM-188], and the Fundamental Research Funds for the Central Universities [Grant Number QTZX23002].

References

  • Anstee, R. P., Rónyai, L., & Sali, A. (2002). Shattering news. Graphs and Combinatorics, 18(1), 59–73. https://doi.org/10.1007/s003730200003
  • Ben-David, S., & Litman, A. (1998). Combinatorial variability of Vapnik–Chervonenkis classes with applications to sample compression schemes. Discrete Applied Mathematics, 86(1), 3–25. https://doi.org/10.1016/S0166-218X(98)00000-6
  • Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24(6), 377–380. https://doi.org/10.1016/0020-0190(87)90114-1
  • Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36(4), 929–965. https://doi.org/10.1145/76359.76371
  • Bollobás, B., & Radcliffe, A. J. (1995). Defect Sauer results. Journal of Combinatorial Theory, Series A, 72(2), 189–208. https://doi.org/10.1016/0097-3165(95)90060-8
  • Chalopin, J., Chepoi, V., Moran, S., & Warmuth, M. K. (2019). Unlabeled sample compression schemes and corner peelings for ample and maximum classes. ICALP, 34, 1–15. https://doi.org/10.4230/LIPIcs.ICALP.2019.34
  • Chepoi, V., Knauer, K., & Philibert, M. (2021). Labeled sample compression schemes for complexes of oriented matroids. arXiv:2110.15168v1.
  • Floyd, S., & Warmuth, M. K. (1995). Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Machine Learning, 21(3), 269–304. https://doi.org/10.1023/A:1022660318680
  • Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science, 303(5659), 799–805. https://doi.org/10.1126/science.1094068
  • Geiger, D., Heckerman, D., King, H. P., & Meek, C. (2001). Stratified exponential families: Graphical models and model selection. The Annals of Statistics, 29(2), 505–529. https://doi.org/10.1214/aos/1009210550
  • Geiger, D., Meek, C., & Sturmfels, B. (2006). On the toric algebra of graphical models. The Annals of Statistics, 34(3), 1463–1492. https://doi.org/10.1214/009053606000000263
  • Haussler, D., Sloan, R., & Warmuth, M. K. (1994). Predicting {0,1} functions on randomly drawn points. Information and Computation, 115(2), 248–292. https://doi.org/10.1006/inco.1994.1097
  • Helmbold, D., Sloan, R., & Warmuth, M. K. (1990). Learning nested differences of intersection-closed concept classes. Machine Learning, 5, 165–196. https://doi.org/10.1023/A:1022696716689
  • Knauer, K., & Marc, T. (2020). On tope graphs of complexes of oriented matroids. Discrete & Computational Geometry, 63(2), 377–417. https://doi.org/10.1007/s00454-019-00111-z
  • Kuzmin, D., & Warmuth, M. K. (2007). Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8, 2047–2081.
  • Lauritzen, S. L. (1996). Graphical models. Oxford University Press.
  • Li, B., & Yang, Y. (2018). Complexity of concept classes induced by discrete Markov networks and Bayesian networks. Pattern Recognition, 82, 31–37. https://doi.org/10.1016/j.patcog.2018.04.026
  • Littlestone, N., & Warmuth, M. K. (1986). Relating data compression and learnability [Unpublished manuscript]. http://www.cse.ucsc.edu/manfred
  • Marc, T. (2022). Unlabeled sample compression schemes for oriented matroids. arXiv:2203.11535v2.
  • Moran, S., & Warmuth, M. K. (2016). Labeled compression schemes for extremal classes. In ALT (pp. 34–39). Springer. https://doi.org/10.1007/978-3-319-46379-7_3
  • Moran, S., & Yehudayoff, A. (2016). Sample compression schemes for VC classes. In ITA (pp. 1–14). IEEE. https://doi.org/10.1109/ITA.2016.7888187
  • Pálvölgyi, D., & Tardos, G. (2020). Unlabeled compression schemes exceeding the VC-dimension. Discrete Applied Mathematics, 276, 102–107. https://doi.org/10.1016/j.dam.2019.09.022
  • Pistone, G., Riccomagno, E., & Wynn, H. P. (2001). Algebraic statistics: Computational commutative algebra in statistics. Chapman and Hall/CRC.
  • Rubinstein, B. I. P., & Rubinstein, J. H. (2012). A geometric approach to sample compression. Journal of Machine Learning Research, 13(42), 1221–1261.
  • Settimi, R., & Smith, J. Q. (2000). Geometry, moments and conditional independence trees with hidden variables. The Annals of Statistics, 28(4), 1179–1205. https://doi.org/10.1214/aos/1015956712
  • Warmuth, M. K. (2003). Compressing to VC dimension many points. In Proceedings of the 16th Annual Conference on Learning Theory (pp. 743–744). Springer. https://doi.org/10.1007/978-3-540-45167-9_60