K-Means

class RONAALP.utilities.online_kmeans.K_Means(k=2, tol=0.001, max_iter=300, n_knn=5)[source]

Custom k-means clustering class based on the scikit-learn version, augmented with a sequential (online) update procedure.

Parameters
k : int, default=2

The number of clusters to form as well as the number of centroids to generate.

max_iter : int, default=300

Maximum number of iterations of the k-means algorithm for a single run.

tol : float, default=1e-3

Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.

n_knn : int, default=5

Number of centroid neighbors to consider when computing the mean inter-cluster distance.

Attributes
centroids : ndarray of shape (k, n_features)

Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.

labels_ : ndarray of shape (n_samples,)

Cluster label of each point.

delta : float

Mean inter-cluster distance.

counts : ndarray of shape (k,)

Number of data points belonging to each cluster.

nearest_C : scikit-learn nearest-neighbor object

Nearest-neighbor graph fitted on the k-means centroids.
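The exact formula behind delta is not spelled out above. One plausible reading, sketched here under that assumption, takes for each centroid the average distance to its n_knn nearest other centroids and then averages over all centroids. The function name mean_inter_cluster_distance is hypothetical, and plain Python lists stand in for ndarrays to keep the sketch dependency-free.

```python
import math


def mean_inter_cluster_distance(centroids, n_knn=5):
    """Assumed definition: mean distance to the n_knn nearest other centroids."""
    k = len(centroids)
    # A centroid is not counted as its own neighbor.
    n_neighbors = min(n_knn, k - 1)
    if n_neighbors < 1:
        return 0.0
    total = 0.0
    for i, c in enumerate(centroids):
        # Distances from centroid i to every other centroid, smallest first.
        dists = sorted(
            math.dist(c, other) for j, other in enumerate(centroids) if j != i
        )
        total += sum(dists[:n_neighbors]) / n_neighbors
    return total / k


centroids = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
delta = mean_inter_cluster_distance(centroids, n_knn=1)
print(delta)
```

With n_knn=1 every centroid in this toy layout has its nearest neighbor at distance 5, so the mean is 5 as well.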

fit(data)[source]

Fit k-means centroids to data using scikit-learn's implementation.

Parameters
data : ndarray of shape (n_samples, n_features)

Array of points to divide in k clusters.
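For intuition about what fitting involves, here is a dependency-free sketch of the standard Lloyd iteration with a tol and max_iter stopping rule. It is not the class's code, which delegates to scikit-learn. The name lloyd_kmeans is hypothetical, the seeding with the first k points is a simplification, and the stopping rule compares the squared Frobenius norm of the centroid shift with tol directly, whereas scikit-learn's own rule also rescales tol by the variance of the data.

```python
import math


def lloyd_kmeans(data, k=2, tol=1e-3, max_iter=300):
    """Minimal Lloyd iteration on lists of floats (illustrative only)."""
    # Simplification: seed with the first k points instead of k-means++.
    centroids = [list(p) for p in data[:k]]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in data
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep an empty cluster where it was.
                new_centroids.append(centroids[j])
        # Squared Frobenius norm of the change in the centroid matrix.
        shift = sum(math.dist(a, b) ** 2 for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = lloyd_kmeans(data, k=2)
print(centroids, labels)
```

Because the labels are computed before the final centroid update, stopping on max_iter rather than on tol can leave the centroids slightly out of step with the labels, which is the inconsistency the centroids attribute warns about.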

predict(data)[source]

Predict the cluster to which each new data point belongs.

Parameters
data : ndarray of shape (n_samples, n_features)

Array of points to classify.
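Prediction in k-means amounts to a nearest-centroid lookup. The sketch below shows that rule with a brute-force search. The helper name predict_labels is hypothetical, and whether the class performs this lookup through its nearest_C object is an assumption rather than something stated here.

```python
import math


def predict_labels(data, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in data
    ]


fitted_centroids = [[0.0, 0.5], [10.0, 10.5]]
new_points = [[1.0, 1.0], [9.0, 9.0]]
predicted = predict_labels(new_points, fitted_centroids)
print(predicted)
```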

set_delta(new_delta)[source]

Update the delta parameter.

update(new_data)[source]

Sequentially update the clustering using the online version of k-means.

Parameters
new_data : ndarray of shape (n_samples2, n_features)

Array of points to cluster sequentially.
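A minimal sketch of the textbook sequential k-means rule may help here: each incoming point is assigned to its nearest centroid, that cluster's count is incremented, and the centroid moves toward the point by 1/count of the gap, which keeps it equal to the running mean of its members. The function name online_update is hypothetical, and any role delta plays during updates in the actual class is not modeled.

```python
import math


def online_update(centroids, counts, new_data):
    """Sequential k-means update; modifies centroids and counts in place."""
    for x in new_data:
        # Nearest centroid to the incoming point.
        i = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
        counts[i] += 1
        # Running-mean update: move the centroid by 1/count of the gap.
        centroids[i] = [c + (xi - c) / counts[i] for c, xi in zip(centroids[i], x)]
    return centroids, counts


centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [2, 2]
online_update(centroids, counts, [[3.0, 0.0], [10.0, 13.0]])
print(centroids, counts)
```

In this toy run each cluster already holds two points, so a new point shifts its centroid by one third of the distance between them.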

References

[1] Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification. Hoboken: Wiley.