### Principle of spectral analysis

The main point of this technique is to introduce a useful metric on the data set based on the connectivity of points within the graph of the data, and also to provide coordinates on the data set that reorganize the points according to this metric [1, 2]. Let *X* = {*x*_{1}, *x*_{2}, ..., *x*_{N}} be *N* data points (images), each data point *x*_{i} ϵ *R*^{n}, where *n* is the dimension of the data space (measures). The first step is to represent the dataset *X* = {*x*_{1}, *x*_{2}, ..., *x*_{N}} by a weighted symmetric graph *G* = (*V*, *E*) in which each data point *x*_{i} corresponds to a node. Two nodes *x*_{i} and *x*_{j} are connected by an edge with weight *w*(*x*_{i}, *x*_{j}) = *w*(*x*_{j}, *x*_{i}), reflecting the degree of similarity (or affinity) between these two points. The weight *w*(.,.) describes the first-order interaction between the data points, and its choice is application-driven. For instance, in applications where a distance *d*(.,.) already exists on the data, it is customary to weight the edge between *x*_{i} and *x*_{j} by:

*w*(*x*_{i}, *x*_{j}) = exp(−*d*(*x*_{i}, *x*_{j})²/*ε*),

where *ε* > 0 is a scale parameter; other weighting functions can also be used.
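As a minimal sketch of this weighting step (assuming a Euclidean distance *d* and the Gaussian weight above; the function name and array layout are illustrative):

```python
import numpy as np

def affinity_matrix(X, eps):
    """Symmetric Gaussian affinity w(x_i, x_j) = exp(-d(x_i, x_j)^2 / eps).

    X is an (N, n) array of N points in R^n; eps > 0 is the scale parameter.
    """
    # Pairwise squared Euclidean distances d(x_i, x_j)^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / eps)
```

By construction the matrix is symmetric, w(x_i, x_j) = w(x_j, x_i), with ones on the diagonal.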

Following a classical construction in spectral graph theory and manifold learning, we now create a random walk on the data set *X* by forming the kernel:

*p*(*x*_{i}, *x*_{j}) = *w*(*x*_{i}, *x*_{j})/*d*(*x*_{i}), where *d*(*x*_{i}) = Σ_{j} *w*(*x*_{i}, *x*_{j})

is the degree of node *x*_{i}.

As we have that *p*(*x*_{i}, *x*_{j}) ≥ 0 and

Σ_{j} *p*(*x*_{i}, *x*_{j}) = 1,

the quantity *p*(*x*_{i}, *x*_{j}) can be interpreted as the probability of a random walker jumping from *x*_{i} to *x*_{j} in a single time step.
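The row normalization that turns the affinity matrix into this Markov (random-walk) kernel can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def random_walk_kernel(W):
    """Normalize a symmetric affinity matrix W into a Markov kernel P.

    P[i, j] = w(x_i, x_j) / d(x_i), where d(x_i) = sum_j w(x_i, x_j)
    is the degree of node x_i, so every row of P sums to 1.
    """
    d = W.sum(axis=1)       # node degrees d(x_i)
    return W / d[:, None]   # p(x_i, x_j) = w(x_i, x_j) / d(x_i)
```

Each row of the result is a probability distribution over the one-step jumps of the walker.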

From spectral theory and harmonic analysis, we know that the eigenfunctions of this kernel can be interpreted as a generalization of the Fourier harmonics on the manifold defined by the data points. In our problem, smaller eigenvalues correspond to higher-frequency eigenfunctions, while larger eigenvalues correspond to lower-frequency ones.

The eigenvalues and eigenvectors provide embedding coordinates for the set *X*. The data points can be mapped into a Euclidean space via the embedding:

*x*_{i} ↦ (*λ*_{2}*ψ*_{2}(*x*_{i}), *λ*_{3}*ψ*_{3}(*x*_{i}), ..., *λ*_{k}*ψ*_{k}(*x*_{i})),

where *λ*_{l} and *ψ*_{l} denote the eigenvalues and eigenvectors of the kernel *p*.

The second eigenvector *ψ*
_{2} is known as the Fiedler vector and can be used to order the underlying dataset *X* (segmentation and data reduction). Combined with the third eigenvector *ψ*
_{3}, it allows a two-dimensional visualization of the dataset.
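Putting the steps together, the whole pipeline can be sketched as below (a sketch under stated assumptions, not a definitive implementation: it assumes the Gaussian weights, the row normalization above, and the *λ*_{l}*ψ*_{l} scaling of the coordinates; names and the choice of `numpy.linalg.eig` are illustrative):

```python
import numpy as np

def spectral_embedding(X, eps=1.0, k=3):
    """Embed N points into R^(k-1) via eigenvectors of the random-walk kernel.

    The first eigenvector (lambda_1 = 1, constant) is discarded; the second,
    psi_2, is the Fiedler vector, used to order the data set.
    """
    # Gaussian affinities and row normalization into a Markov kernel P.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)
    P = W / W.sum(axis=1, keepdims=True)

    # Eigendecomposition of P, eigenvalues sorted in decreasing order.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]

    # Coordinates lambda_l * psi_l(x_i) for l = 2, ..., k.
    return vals[1:k] * vecs[:, 1:k]
```

The first column of the result (*λ*_{2}*ψ*_{2}) orders the data along the Fiedler vector; plotting the first two columns gives the two-dimensional visualization mentioned above.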