Text Mining Systematic Reviews
About This Workshop
This workshop teaches you how to build a complete, transparent pipeline for extracting structured knowledge from published scientific papers. You will go from a search query all the way to a visual causal knowledge graph using only deterministic models, so the same input always produces the same output and every result can be traced back to an exact sentence in an exact paper.
The workshop is built around a real tool: rasoultilburg/SocioCausaNet, a model specifically trained to detect causal claims in social science text and extract the cause-effect pairs from them.
What You Will Learn
📥 Module 1
Set up API keys and use the eScience Center package to bulk-download papers on any topic you choose.
📄 Module 2
Tour the major PDF-to-text tools and convert your downloaded corpus into clean, usable text.
✂️ Module 3
Split paper text into sentences and run the light preprocessing steps needed to feed them into the model.
🤗 Module 4
Explore the HuggingFace Hub, understand model types, and run your first inference in a few lines of code.
🕸️ Module 5
Run SocioCausaNet, harmonize the extracted social science constructs, and build a queryable causal knowledge graph.
See the Schedule for detailed topics and code examples for each module.
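To give a feel for how Modules 3 and 5 fit together, here is a minimal sketch using only the Python standard library. It is not the workshop's actual code: the regex splitter, the `SYNONYMS` table, the `harmonize` and `build_graph` helpers, and the paper ID are all hypothetical, and the cause-effect pairs are written by hand as a stand-in for what a model such as SocioCausaNet would extract. The point is only to show how each edge in the graph can keep a pointer back to its source sentence.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter: break after . ! or ? when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


# Hypothetical synonym table used to harmonize construct labels.
SYNONYMS = {"joblessness": "unemployment", "well-being": "wellbeing"}


def harmonize(label):
    """Lowercase a construct label and map known synonyms to one canonical name."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)


def build_graph(claims):
    """Build cause -> effect -> list of (paper_id, sentence) evidence records."""
    graph = defaultdict(lambda: defaultdict(list))
    for paper_id, sentence, cause, effect in claims:
        graph[harmonize(cause)][harmonize(effect)].append((paper_id, sentence))
    return graph


text = (
    "Joblessness reduces well-being. "
    "Unemployment also lowers social trust. "
    "We discuss limits."
)
sentences = split_sentences(text)

# Hand-written stand-in for model output (assumption, not real extraction results).
claims = [
    ("paper_001", sentences[0], "Joblessness", "well-being"),
    ("paper_001", sentences[1], "Unemployment", "social trust"),
]
graph = build_graph(claims)

# Query: what does "unemployment" affect, and which sentences support each link?
for effect, evidence in graph["unemployment"].items():
    print("unemployment ->", effect, "| evidence:", evidence)
```

Because both "Joblessness" and "Unemployment" are harmonized to the same label, the two claims end up as outgoing edges of a single node, and each edge still carries the paper ID and the exact sentence it came from. A real pipeline would likely replace the regex with a proper sentence tokenizer and the dictionary with a graph library.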
Who Should Come
This workshop is for:
- Social science PhD students who want to make their literature reviews faster and more systematic
- Researchers interested in automating qualitative reviews
- Anyone who wants to extract and visualize the causal theory embedded in a field’s published literature
- People curious about NLP tools but with no background in machine learning
You need basic Python familiarity. Nothing else is assumed.
This Workshop Is Relevant For
- Knowledge extraction from published papers
- Automating qualitative systematic reviews
- Transparent and reproducible text analysis
- Constructing theory from published causal claims
- Creating a nomological map for a research area