projects

software, datasets, and research tools.

software

CESSC

causal and non-causal sentence classification dataset and model

CESSC provides a curated dataset and a fine-tuned BERT-based model for the binary classification of sentences in social science texts as causal or non-causal. The work is associated with the paper (Norouzi et al., 2025).

The release includes 1,000 manually annotated sentences, additional machine-labeled sentences, scripts for fine-tuning and evaluating the model, and benchmark results.

SocioCausaNet

Multi-task BERT model for joint causal extraction from text

SocioCausaNet is a fine-tuned BERT-based multi-task model that jointly extracts causal relationships from text. It performs three tasks simultaneously: classifying whether sentences contain causal claims, identifying cause and effect spans via BIO tagging, and linking cause-effect pairs with typed relations. The model handles complex patterns including one-to-many and many-to-many cause-effect structures.

The model is used in production by the MetaCheck tool on ScienceVerse to evaluate randomization and causal claims in scientific reports. Its training data includes expert-annotated sentences, and the model supports multiple prediction strategies with adjustable confidence thresholds.
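To make the BIO tagging step concrete, here is a minimal sketch of how token-level tags can be turned into cause and effect spans. It assumes a hypothetical tag set (`B-C`/`I-C` for causes, `B-E`/`I-E` for effects, `O` elsewhere); SocioCausaNet's actual label names and decoding logic may differ. The `naive_pairs` helper simply pairs every cause with every effect to show what a one-to-many structure looks like; it is an illustrative stand-in, not the model's learned relation classifier.

```python
from itertools import product
from typing import List, Optional, Tuple

Span = Tuple[Optional[str], str]  # (label, text), e.g. ("C", "low income")


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group tokens into labelled spans using (hypothetical) BIO tags."""
    spans: List[Span] = []
    label: Optional[str] = None
    current: List[str] = []

    def flush() -> None:
        # Emit the currently open span, if any.
        if current:
            spans.append((label, " ".join(current)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            flush()  # a new span begins (a stray I- tag is treated like B-)
            label, current = tag[2:], [token]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open span
        else:
            flush()  # "O" closes any open span
            label, current = None, []
    flush()
    return spans


def naive_pairs(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair every cause with every effect (illustration only, not the model's relation head)."""
    causes = [text for lab, text in spans if lab == "C"]
    effects = [text for lab, text in spans if lab == "E"]
    return list(product(causes, effects))


tokens = "Poverty increases crime and stress".split()
tags = ["B-C", "O", "B-E", "O", "B-E"]
spans = bio_to_spans(tokens, tags)
print(naive_pairs(spans))  # [('Poverty', 'crime'), ('Poverty', 'stress')]
```

The one cause linked to two effects in the usage example is the simplest one-to-many case; a trained relation component would decide which of these candidate pairs are actually linked.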

Social Science Construct Harmonization

Benchmarking ML models for merging heterogeneous social science concepts

Social Science Construct Harmonization evaluates machine learning models for concept integration: merging heterogeneous social science terms such as “Insomnia” and “Sleeping Disorders” into standardized, unified constructs. The project uses a factorial experimental design that tests five vector representation models against three harmonization strategies, with an evaluation framework that assesses performance, reliability, and bias.

The companion web app runs fully client-side; it clusters conceptual terms and maps them to the ELSST thesaurus hierarchy. It offers three techniques:

  • Pairwise and HDBSCAN (with UMAP dimensionality reduction): using the best-calibrated hyperparameters as defaults, these two methods cluster related constructs by semantic similarity directly in the embedding space (powered by All-MPNet-Base-v2).
  • ELSST Lookup: based on the ELSST thesaurus, this mode standardizes the user’s input query and maps it to the most similar concept in the taxonomy, with an optional two-stage re-ranking pipeline that uses the ms-marco-MiniLM-L-6-v2 cross-encoder for more accurate matching.
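As a rough illustration of the pairwise idea, the sketch below links any two terms whose embedding cosine similarity reaches a threshold, merges linked terms with union-find, and maps a query to its nearest concept. Everything here is assumed for illustration: the 3-dimensional vectors and concept labels are made up (the app itself uses All-MPNet-Base-v2 embeddings and the real ELSST hierarchy), the 0.8 threshold is not the app's calibrated default, and HDBSCAN, UMAP, and the cross-encoder re-ranking stage are not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_clusters(terms: List[str], vectors: List[Vector], threshold: float = 0.8) -> List[List[str]]:
    """Union-find over all term pairs whose cosine similarity is >= threshold."""
    parent = list(range(len(terms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for i, term in enumerate(terms):
        groups.setdefault(find(i), []).append(term)
    return list(groups.values())


def nearest_concept(query: Vector, concepts: Dict[str, Vector]) -> str:
    """Return the concept label whose vector is most similar to the query."""
    return max(concepts, key=lambda label: cosine(query, concepts[label]))


# Toy, hand-made vectors standing in for real sentence embeddings.
terms = ["Insomnia", "Sleeping Disorders", "Income"]
vectors = [[0.9, 0.1, 0.0], [0.85, 0.2, 0.0], [0.0, 0.1, 0.95]]
print(pairwise_clusters(terms, vectors))  # [['Insomnia', 'Sleeping Disorders'], ['Income']]

toy_concepts = {"sleep disorders": [0.8, 0.3, 0.0], "income": [0.0, 0.0, 1.0]}
print(nearest_concept(vectors[0], toy_concepts))  # sleep disorders
```

Union-find makes the grouping transitive: if A is close to B and B is close to C, all three end up in one cluster even when A and C fall below the threshold. Whether the app's pairwise mode behaves this way is not stated above.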

Paper Hunter

Human-in-the-loop bulk paper downloader using DOIs, forked from the eScience project

Paper Hunter is a human-in-the-loop tool for bulk-downloading academic papers by DOI, created as a fork in collaboration with the eScience project. It provides a straightforward browser-based interface where researchers paste a list of DOIs and receive direct download links, with smart handling for both open-access and paywalled content.
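The first step of a DOI-driven workflow like this is turning a pasted list into clean, de-duplicated DOIs and resolver links. The sketch below is an assumption about that step, not Paper Hunter's code: it uses a simplified regular expression for modern DOIs (`10.` plus a registrant code, a slash, and a suffix), trims trailing punctuation heuristically, treats DOIs case-insensitively when de-duplicating, and builds `https://doi.org/` links. Open-access detection and paywall handling are not shown.

```python
import re
from typing import List
from urllib.parse import quote

# Simplified pattern: "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_dois(pasted: str) -> List[str]:
    """Pull DOIs out of free text (bare DOIs or doi.org URLs), dropping duplicates."""
    seen = set()
    dois: List[str] = []
    for match in DOI_PATTERN.finditer(pasted):
        doi = match.group(0).rstrip(".,;)\"'")  # heuristic: trim trailing punctuation
        key = doi.lower()  # DOI names are case-insensitive
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


def resolver_url(doi: str) -> str:
    """Build a doi.org link, percent-encoding characters that are unsafe in URLs."""
    return "https://doi.org/" + quote(doi, safe="/")


pasted = """10.1000/xyz123
https://doi.org/10.1038/nature12373,
10.1000/XYZ123"""
for doi in extract_dois(pasted):
    print(resolver_url(doi))
```

The third line of the sample is dropped as a duplicate of the first because the comparison ignores case.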

research

Research Tools in Progress

DAG generation and Meta-Science RAG prototypes

Several research tools are currently in development:

  • DAG Generator: automated generation of directed acyclic graphs from causal texts.
  • Meta-Science RAG: a retrieval-augmented generation pipeline for supporting theory-building in the social sciences.

These entries are placeholders for ongoing work and will be expanded as the tools become ready to share.
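One check any causal-graph generator needs is whether the extracted cause-to-effect edges actually form a directed acyclic graph. A minimal sketch using Kahn's algorithm follows; the `(cause, effect)` edge-list format is an assumption, and this is not the DAG Generator's implementation, which is still in development.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


def topological_order(edges: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Kahn's algorithm: return a topological order of the nodes, or None if there is a cycle."""
    successors: Dict[str, List[str]] = defaultdict(list)
    indegree: Dict[str, int] = {}
    for cause, effect in dict.fromkeys(edges):  # drop duplicate edges, keep input order
        successors[cause].append(effect)
        indegree.setdefault(cause, 0)
        indegree[effect] = indegree.get(effect, 0) + 1

    # Start from nodes that nothing points into.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes left unvisited sit on a cycle, so the graph is not a DAG.
    return order if len(order) == len(indegree) else None


edges = [("poverty", "stress"), ("stress", "insomnia"), ("poverty", "insomnia")]
print(topological_order(edges))  # ['poverty', 'stress', 'insomnia']
print(topological_order([("a", "b"), ("b", "a")]))  # None
```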

fun

Audio Compress

browser-based tool to compress and split audio files

Compress audio files, split them, or both, directly in the browser. Supported formats are MP3, M4A, OGG, FLAC, WAV, and WebM. Nothing is uploaded to a server; all processing runs client-side via the Web Audio API.
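Compressing to a target size and splitting both come down to simple arithmetic on bitrate and duration. The hypothetical helpers below estimate the highest average bitrate that keeps a file under a target size and compute segment boundaries for splitting. They are a sketch under stated assumptions (1 kbit = 1000 bits, container and metadata overhead ignored) and are not taken from Audio Compress itself, which runs in the browser.

```python
import math
from typing import List, Tuple


def max_bitrate_kbps(target_bytes: int, duration_s: float) -> float:
    """Average bitrate (kbit/s) that fills target_bytes over duration_s, ignoring overhead."""
    return target_bytes * 8 / duration_s / 1000


def segment_bounds(duration_s: float, segment_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive segments of at most segment_s seconds."""
    count = math.ceil(duration_s / segment_s)
    # min() stops the last segment from running past the end of the file.
    return [(i * segment_s, min((i + 1) * segment_s, duration_s)) for i in range(count)]


# One hour of audio squeezed into 25 MB needs roughly 55.6 kbit/s or less.
print(round(max_bitrate_kbps(25_000_000, 3600), 1))  # 55.6
# A 125-second file split into 60-second pieces.
print(segment_bounds(125, 60))  # [(0, 60), (60, 120), (120, 125)]
```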

Sub Translator

SRT subtitle translation tool that runs in the browser

An SRT subtitle translator that runs entirely client-side. Upload .srt files, choose the source and target languages, and batch-translate them via DeepSeek, Google Gemini, or OpenRouter. Results appear in a side-by-side preview and can be downloaded individually or in bulk as a .zip archive.
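The core of an SRT translator is parsing cues, sending their text out in batches, and writing the cues back with their original numbering and timestamps untouched. The sketch below assumes that flow; `translate_batch` is a hypothetical placeholder for a call to one of the providers (whose APIs are not shown), and the default batch size of 50 is arbitrary.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    index: int
    timing: str  # e.g. "00:00:01,000 --> 00:00:03,500"
    text: str


def parse_srt(content: str) -> List[Cue]:
    """Split an .srt file into cues; malformed blocks are skipped."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", content):  # cues are separated by blank lines
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue
        cues.append(Cue(int(lines[0].strip()), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def translate_cues(cues: List[Cue], translate_batch: Callable[[List[str]], List[str]],
                   batch_size: int = 50) -> List[Cue]:
    """Translate cue text in batches, keeping index and timing unchanged."""
    out: List[Cue] = []
    for start in range(0, len(cues), batch_size):
        chunk = cues[start:start + batch_size]
        translated = translate_batch([c.text for c in chunk])
        if len(translated) != len(chunk):
            raise ValueError("translator returned a different number of lines")
        out.extend(Cue(c.index, c.timing, t) for c, t in zip(chunk, translated))
    return out


def to_srt(cues: List[Cue]) -> str:
    """Serialize cues back into SRT text."""
    return "\n\n".join(f"{c.index}\n{c.timing}\n{c.text}" for c in cues) + "\n"


sample = "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n2\n00:00:03,000 --> 00:00:04,500\nGood\nbye\n"
# A fake "translator" that upper-cases text stands in for a real API call.
result = to_srt(translate_cues(parse_srt(sample), lambda texts: [t.upper() for t in texts], batch_size=1))
print(result)
```

Checking that each batch returns as many lines as it was sent is one simple way to keep translated text aligned with its timestamps.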

Live app: rasoulnorouzi.github.io/sub_translator