The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set. It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed. With the definition given below, a value close to 1 indicates that the data are strongly aggregated (clustered), whereas randomly distributed data tend to give values around 0.5.

Preliminaries

A typical formulation of the Hopkins statistic follows.

Let $X$ be the set of $n$ data points.
Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.
Generate a set $Y$ of $m$ uniformly randomly distributed data points.
Define two distance measures,
$u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
$w_i$, the minimum distance of $\tilde{x}_i \in \tilde{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $\tilde{x}_i \neq x_j$.
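
The sampling and distance steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `nearest_distance` and `hopkins_distances` are hypothetical, the metric is assumed to be Euclidean, and the uniform points of $Y$ are assumed to be drawn from the axis-aligned bounding box of $X$ (the text only requires them to be uniformly distributed, so the sampling window is a choice).

```python
import math
import random


def nearest_distance(point, candidates):
    """Smallest Euclidean distance from `point` to any point in `candidates`."""
    return min(math.dist(point, c) for c in candidates)


def hopkins_distances(X, m, rng=None):
    """Return (u, w), two lists of m nearest-neighbour distances.

    X is a list of equal-length coordinate tuples.
    Assumption: Y is sampled uniformly from the bounding box of X.
    """
    if rng is None:
        rng = random.Random()
    n = len(X)
    d = len(X[0])

    # Bounding box of the data, used as the sampling window for Y.
    lows = [min(p[k] for p in X) for k in range(d)]
    highs = [max(p[k] for p in X) for k in range(d)]

    # Indices of the sample drawn from X without replacement.
    sample_idx = rng.sample(range(n), m)

    # m uniformly distributed points in the sampling window.
    Y = [tuple(rng.uniform(lows[k], highs[k]) for k in range(d))
         for _ in range(m)]

    # u_i: distance from each uniform point to its nearest data point.
    u = [nearest_distance(y, X) for y in Y]

    # w_i: distance from each sampled data point to its nearest
    # neighbour in X, excluding the sampled point itself (by index).
    w = [nearest_distance(X[i], X[:i] + X[i + 1:]) for i in sample_idx]
    return u, w
```

The brute-force search costs on the order of $m \cdot n$ distance evaluations, which is acceptable for small data sets; a spatial index would be the usual replacement for larger ones.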

Definition

With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:

$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$

Under the null hypothesis, this statistic has a Beta(m, m) distribution.
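
As an illustration of how the statistic and its null distribution might be used together, the sketch below computes $H$ from lists of distances $u_i$ and $w_i$ and then an upper-tail p-value under Beta(m, m). The function names are hypothetical. The Beta(m, m) cumulative distribution function is evaluated through the standard identity for integer parameters, $I_x(a, b) = P(\mathrm{Bin}(a + b - 1, x) \geq a)$, so no external library is assumed. Treating large values of $H$ as evidence of clustering (a one-sided test) is a choice made for this example.

```python
import math


def hopkins_statistic(u, w, d):
    """H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d)) for d-dimensional data."""
    num = sum(ui ** d for ui in u)
    den = num + sum(wi ** d for wi in w)
    return num / den


def beta_mm_cdf(x, m):
    """CDF of Beta(m, m) at x, for integer m >= 1.

    Uses the binomial-sum form of the regularized incomplete beta
    function, which is exact for integer parameters.
    """
    n = 2 * m - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(m, n + 1))


def clustering_p_value(h, m):
    """Upper-tail probability P(H >= h) under the Beta(m, m) null."""
    return 1.0 - beta_mm_cdf(h, m)
```

For example, `clustering_p_value(hopkins_statistic(u, w, d), len(u))` gives a small value when the observed $H$ is far above 0.5, which under this one-sided reading counts against the null hypothesis of spatial randomness.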

External links

  • http://www.sthda.com/english/wiki/assessing-clustering-tendency-a-vital-issue-unsupervised-machine-learning
