Hello! I have recently been using the code to analyze text, but I encounter an error when executing the "tutorial_french" notebook. I have already tried different versions of Python, but the error (see below) persists. After changing each of the NarrativeModel parameters one by one, I found that the issue lies with the clustering parameter. I also tried reinstalling hdbscan, but the error remains. The tutorial_english notebook runs without any problem. I appreciate your help with this issue.
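For reference, here is a minimal sketch of the cell that triggers the error. Only the arguments visible in the traceback below are reproduced; the ones elided there ("(...)") are omitted, and postproc_roles is the post-processed semantic-role output produced by the tutorial's earlier cells.

from relatio.narrative_models import NarrativeModel

# Sketch of the failing cell from tutorial_french. Arguments elided in the
# traceback ("(...)") are omitted here; only clustering = 'hdbscan' seems
# to matter for the error.
m = NarrativeModel(
    clustering = 'hdbscan',  # switching this parameter is what triggers the error
    PCA = False,
    threshold = 0.3,
)

# postproc_roles comes from the tutorial's earlier preprocessing steps.
# This call raises: TypeError: 'float' object cannot be interpreted as an integer
m.fit(postproc_roles, weight_by_frequency = True)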
Thank you in advance!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 15
1 from relatio.narrative_models import NarrativeModel
3 m = NarrativeModel(
4 clustering = 'hdbscan',
5 PCA = False,
(...)
12 threshold = 0.3
13 )
---> 15 m.fit(postproc_roles, weight_by_frequency = True)
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/relatio/narrative_models.py:158, in NarrativeModel.fit(self, srl_res, pca_args, umap_args, cluster_args, weight_by_frequency, progress_bar)
156 print("No fitting required, this model is deterministic!")
157 if self.clustering in ["hdbscan", "kmeans"]:
--> 158 self.fit_static_clustering(
159 srl_res,
160 pca_args,
161 umap_args,
162 cluster_args,
163 weight_by_frequency,
164 progress_bar,
165 )
166 if self.clustering == "dynamic":
167 pass
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/relatio/narrative_models.py:362, in NarrativeModel.fit_static_clustering(self, srl_res, pca_args, umap_args, cluster_args, weight_by_frequency, progress_bar)
355 if k not in [
356 "min_cluster_size",
357 "min_samples",
358 "cluster_selection_method",
359 ]:
360 args[k] = v
--> 362 hdb = hdbscan.HDBSCAN(**args).fit(self.training_vectors)
363 models.append(hdb)
364 score = hdbscan.validity.validity_index(
365 self.training_vectors.astype(np.float64), hdb.labels_
366 )
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/hdbscan/hdbscan_.py:1205, in HDBSCAN.fit(self, X, y)
1195 kwargs.pop("prediction_data", None)
1196 kwargs.update(self._metric_kwargs)
1198 (
1199 self.labels_,
1200 self.probabilities_,
1201 self.cluster_persistence_,
1202 self._condensed_tree,
1203 self._single_linkage_tree,
1204 self._min_spanning_tree,
-> 1205 ) = hdbscan(clean_data, **kwargs)
1207 if self.metric != "precomputed" and not self._all_finite:
1208 # remap indices to align with original data in the case of non-finite entries.
1209 self._condensed_tree = remap_condensed_tree(
1210 self._condensed_tree, internal_to_raw, outliers
1211 )
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/hdbscan/hdbscan_.py:824, in hdbscan(X, min_cluster_size, min_samples, alpha, cluster_selection_epsilon, max_cluster_size, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
820 elif metric in KDTREE_VALID_METRICS:
821 # TO DO: Need heuristic to decide when to go to boruvka;
822 # still debugging for now
823 if X.shape[1] > 60:
--> 824 (single_linkage_tree, result_min_span_tree) = memory.cache(
825 _hdbscan_prims_kdtree
826 )(
827 X,
828 min_samples,
829 alpha,
830 metric,
831 p,
832 leaf_size,
833 gen_min_span_tree,
834 **kwargs
835 )
836 else:
837 (single_linkage_tree, result_min_span_tree) = memory.cache(
838 _hdbscan_boruvka_kdtree
839 )(
(...)
849 **kwargs
850 )
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/joblib/memory.py:349, in NotMemorizedFunc.__call__(self, *args, **kwargs)
348 def __call__(self, *args, **kwargs):
--> 349 return self.func(*args, **kwargs)
File ~/anaconda3/envs/adsml/lib/python3.8/site-packages/hdbscan/hdbscan_.py:265, in _hdbscan_prims_kdtree(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
260 core_distances = tree.query(
261 X, k=min_samples + 1, dualtree=True, breadth_first=True
262 )[0][:, -1].copy(order="C")
264 # Mutual reachability distance is implicit in mst_linkage_core_vector
--> 265 min_spanning_tree = mst_linkage_core_vector(X, core_distances, dist_metric, alpha)
267 # Sort edges of the min_spanning_tree by weight
268 min_spanning_tree = min_spanning_tree[np.argsort(min_spanning_tree.T[2]), :]
File hdbscan/_hdbscan_linkage.pyx:55, in hdbscan._hdbscan_linkage.mst_linkage_core_vector()
File hdbscan/_hdbscan_linkage.pyx:165, in hdbscan._hdbscan_linkage.mst_linkage_core_vector()
TypeError: 'float' object cannot be interpreted as an integer
Hi there,
This is a general bug on hdbscan's side (a scikit-learn-contrib project) related to Cython versions.
See the issue here: scikit-learn-contrib/hdbscan#600
We will have to let them figure this out on their end before we can do anything about it.
In the meantime, I would suggest relying on the KMeans algorithm instead, along the lines of the sketch below.
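A minimal sketch of what I mean, keeping your other arguments unchanged (the values shown besides clustering are just placeholders taken from your traceback):

from relatio.narrative_models import NarrativeModel

# Same setup as in the failing cell, but with KMeans instead of HDBSCAN,
# which avoids the affected hdbscan/Cython code path.
m = NarrativeModel(
    clustering = 'kmeans',  # the only change needed
    PCA = False,
    threshold = 0.3,
)

# The fit call itself stays the same.
m.fit(postproc_roles, weight_by_frequency = True)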
Sorry for the inconvenience!
Best,
Germain