Concept Clustering

Browser-based · No backend

Embedding Model
Clustering Method
Pairwise Parameters
Links concept pairs whose cosine similarity is ≥ the threshold, then treats each connected component as a cluster.
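The pairwise method can be sketched as follows. This is an illustrative sketch, not the app's actual code; the function name and union-find approach are assumptions.

```python
import numpy as np

def pairwise_clusters(embeddings, threshold=0.8):
    """Link pairs with cosine similarity >= threshold, then return
    connected components. Illustrative sketch, not the app's code."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sim = X @ X.T                                     # cosine similarity matrix
    n = len(X)
    parent = list(range(n))                           # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]             # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)             # union the linked pair

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Each returned group is one connected component, i.e. one cluster.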
UMAP Hyperparameters
Applied before clustering to reduce noise. Matches CLUSTERING_PARAMS in config.py.
Controls how many neighbours UMAP considers. Larger = more global structure.
Minimum distance between embedded points. Smaller = tighter packing.
Dimensionality of UMAP space used as input to HDBSCAN (0 = skip UMAP pre-step, use raw cosine distance).
HDBSCAN Hyperparameters
Optional distance threshold for cluster selection: clusters closer than this are merged (0 = disabled, i.e. pure EOM selection). Corresponds to HDBSCAN's cluster_selection_epsilon.
Minimum concepts to form a cluster.
Minimum neighbours for a core point. Higher = more conservative, more noise.
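Taken together, the UMAP and HDBSCAN settings above could live in a single parameter dict. This is a hypothetical sketch of what CLUSTERING_PARAMS in config.py might look like; the key names and defaults are assumptions, not the real file:

```python
# Hypothetical shape of CLUSTERING_PARAMS in config.py; the actual
# keys and defaults in the app may differ.
CLUSTERING_PARAMS = {
    # UMAP pre-step (applied before clustering to reduce noise)
    "umap_n_neighbors": 15,   # larger = more global structure
    "umap_min_dist": 0.1,     # smaller = tighter packing
    "umap_n_components": 5,   # 0 = skip UMAP, use raw cosine distance
    # HDBSCAN
    "min_cluster_size": 5,    # minimum concepts to form a cluster
    "min_samples": 5,         # minimum neighbours for a core point
    "cluster_selection_epsilon": 0.0,  # 0 = disabled (pure EOM)
}
```

Keeping the two stages in one dict makes it easy for the UI controls above to mirror the backend defaults.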
ELSST Hierarchy Lookup
Search the ELSST thesaurus hierarchy. Each concept’s root→leaf path is embedded using a selectable strategy. Returns the most similar leaf concepts. Duplicate leaves are collapsed to the best-scoring path.
Embedding model for both pre-computed paths and browser query.
How each concept’s path is converted to text before embedding.
Number of similar leaf concepts to return.
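A minimal sketch of the lookup: each root→leaf path is flattened to text by a selectable strategy, embedded, and ranked against the query by cosine similarity, collapsing duplicate leaves to their best-scoring path. The strategy names and plain-vector inputs are assumptions for illustration; the app would use real model embeddings.

```python
import numpy as np

# Hypothetical path-to-text strategies; the app's actual names may differ.
STRATEGIES = {
    "leaf_only": lambda path: path[-1],
    "full_path": lambda path: " > ".join(path),
}

def top_k_leaves(query_vec, paths, path_vecs, k=3):
    """Rank root->leaf paths by cosine similarity to the query,
    collapsing duplicate leaves to their best-scoring path."""
    q = query_vec / np.linalg.norm(query_vec)
    V = path_vecs / np.linalg.norm(path_vecs, axis=1, keepdims=True)
    scores = V @ q
    best = {}  # leaf -> (score, path)
    for path, s in zip(paths, scores):
        leaf = path[-1]
        if leaf not in best or s > best[leaf][0]:
            best[leaf] = (float(s), path)
    ranked = sorted(best.values(), key=lambda t: -t[0])
    return ranked[:k]
```

Collapsing before truncating to k ensures a leaf reachable by several paths occupies only one result slot.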
Stage 1: the bi-encoder retrieves the top-N candidates. Stage 2: the cross-encoder (ms-marco-MiniLM-L-6-v2) scores each (query, concept) pair and re-orders the list. Slower on first use (the model must be downloaded), but significantly more accurate for nuanced free-text queries.
How many candidates the bi-encoder retrieves before the cross-encoder re-ranks them. Must be ≥ Top-K.
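The two-stage flow can be sketched with stub scorers standing in for the real models: here the bi-encoder stage is plain cosine similarity and the cross-encoder is a hypothetical callable, whereas the app would use sentence-transformer models for both.

```python
import numpy as np

def retrieve_then_rerank(query_vec, cand_vecs, cross_score,
                         n_candidates=10, top_k=3):
    """Stage 1: bi-encoder (cosine here) keeps the top-N candidates.
    Stage 2: a cross-encoder scores each survivor and re-orders them.
    Requires n_candidates >= top_k, as the UI enforces."""
    assert n_candidates >= top_k, "candidate pool must be >= Top-K"
    q = query_vec / np.linalg.norm(query_vec)
    V = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    coarse = V @ q                              # stage 1: cheap scores
    pool = np.argsort(-coarse)[:n_candidates]   # top-N candidate ids
    fine = [(i, cross_score(i)) for i in pool]  # stage 2: deep scoring
    fine.sort(key=lambda t: -t[1])
    return [int(i) for i, _ in fine[:top_k]]
```

The design point is that the expensive scorer only sees N items, not the whole collection, which is why N must be at least Top-K.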
Enter a concept or phrase (case-insensitive). Press Enter to search, Shift+Enter for a newline.
Concepts: 0
Ready