Metadata-Version: 2.1
Name: lshkrepresentatives
Version: 1.0.2
Summary: A Python package for the LSH-k-Representatives algorithm
Home-page: https://github.com/nmtoan91/lshkrepresentatives
Author: nmtoan91
Author-email: [email protected]
License: UNKNOWN
Description: Python implementations of the LSH-k-Representatives algorithms for clustering categorical data.
Unlike the k-Modes algorithm, LSH-k-Representatives defines "representatives" that keep the frequencies of all categorical values in each cluster.
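
To make the difference concrete, here is a minimal sketch in plain Python. It is not the package's internal code: the helper names `mode_center` and `representative` are hypothetical and exist only for this illustration. It contrasts a k-Modes style center (one value per attribute) with a frequency-based representative (a value distribution per attribute).

```python
from collections import Counter

def mode_center(cluster):
    """k-Modes style center: the single most frequent value of each attribute."""
    # zip(*cluster) iterates over the attributes (columns) of the cluster.
    return [Counter(col).most_common(1)[0][0] for col in zip(*cluster)]

def representative(cluster):
    """Frequency-based center: relative frequency of every value, per attribute."""
    n = len(cluster)
    return [{value: count / n for value, count in Counter(col).items()}
            for col in zip(*cluster)]

# Four objects with three categorical attributes each (hypothetical cluster).
cluster = [[0, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 1]]

print(mode_center(cluster))
print(representative(cluster))
```

The mode keeps only one value per attribute, so the split between values 0 and 1 in the third attribute is lost, while the representative records both frequencies.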
## Installation:
### Using pip:
```shell
pip install lshkrepresentatives
```
### Import the packages:
```python
import numpy as np
from LSHkRepresentatives.LSHkRepresentatives_Init import LSHkRepresentatives_Init
from LSHkRepresentatives.LSHkRepresentatives_Full import LSHkRepresentatives_Full
```
### Generate a simple categorical dataset:
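```python
# Seven objects with three categorical attributes each
X = np.array([[0, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 1], [2, 2, 2], [2, 3, 2], [2, 3, 2]])
# Ground-truth labels, used for evaluation only
y = np.array([0, 0, 0, 0, 1, 1, 1])
```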
### LSHk-Representatives (Init):
```python
kreps = LSHkRepresentatives_Init(X, y, n_init=5, n_clusters=2, verbose=3)
kreps.SetupLSH()
kreps.DoCluster()
```
### Built-in evaluation metrics:
```python
kreps.CalcScore()
```
### Output:
```text
Generating disMatrix for DILCA
Saving DILCA to: saved_dist_matrices/json/DILCA_None.json
Generating LSH hash table: hbits: 2(4) k 2 d 3 n= 7
LSH time: 0.016015699999999633 Score: 6.333333333333334 Time: 0.0019595600000000602
Purity: 1.00 NMI: 1.00 ARI: 1.00 Sil: 0.59 Acc: 1.00 Recall: 1.00 Precision: 1.00
```
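
As an illustration of what one of the reported scores means, the sketch below computes purity in the standard way: each predicted cluster is credited with its most common true label, and the credited objects are divided by the total number of objects. This is an assumption about the textbook definition, not a copy of what `CalcScore()` does internally; the function name `purity` is hypothetical.

```python
from collections import Counter

def purity(y_true, y_pred):
    """Fraction of objects that carry the majority true label of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(y_true, y_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    # For each cluster, count how many members share its most common true label.
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(y_true)

y = [0, 0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 1, 0, 0, 0]   # same grouping as y, cluster ids swapped
one_off = [1, 1, 1, 0, 0, 0, 0]   # one object placed in the wrong cluster

print(purity(y, perfect))
print(purity(y, one_off))
```

Purity ignores the numbering of the clusters, which is why a perfect grouping with swapped ids still scores 1.0.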
### LSHk-Representatives (Full):
```python
kreps = LSHkRepresentatives_Full(X, y, n_init=5, n_clusters=2, verbose=3)
kreps.SetupLSH()
kreps.DoCluster()
```
### Built-in evaluation metrics:
```python
kreps.CalcScore()
```
### Output:
```text
SKIP LOADING distMatrix because: True bd=None
Generating disMatrix for DILCA
Saving DILCA to: saved_dist_matrices/json/DILCA_None.json
Generating LSH hash table: hbits: 2(4) k 2 d 3 n= 7
n_group=2 Average neighbors:1.0
LSH time: 0.00661619999999985 Score: 6.333333333333334 Time: 0.000932080000000024
Purity: 1.00 NMI: 1.00 ARI: 1.00 Sil: 0.59 Acc: 1.00 Recall: 1.00 Precision: 1.00
```
## Parameters:
X: Categorical dataset\
y: Labels of the objects (for evaluation only)\
n_init: Number of initializations \
n_clusters: Number of target clusters\
max_iter: Maximum iterations\
verbose: Verbosity level of the log output\
random_state: Seed for the random number generator\
If the variable MeasureManager.IS_LOAD_AUTO is set to True, DILCA loads the pre-calculated distance matrix.
## Outputs:
cluster_representatives: List of final representatives\
labels_: Predicted labels\
cost_: Final sum of squared distances from objects to their centroids\
n_iter_: Number of iterations\
epoch_costs_: Average time for an initialization
## References:
T. N. Mau and V.-N. Huynh, "An LSH-based k-Representatives Clustering Method for Large Categorical Data," Neurocomputing, Volume 463, 2021, Pages 29-44, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2021.08.050.
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown