Research Focus
This project studies whether synthetic chest X-rays can improve tuberculosis classification workflows and serve as a privacy-preserving alternative when real clinical data is limited.
The work compares two generative approaches:
- DCGAN
- Stable Diffusion, fine-tuned from RoentGen-v2 and then adapted to tuberculosis chest X-ray data
Methodology
The full pipeline was designed around both generation quality and downstream utility. Rather than stopping at sample inspection, the project evaluates how synthetic data changes classifier performance under controlled training regimes.
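As a rough picture of what evaluating downstream utility can look like, the sketch below trains the same classifier once on real data and once on synthetic data, then scores both on one held-out real test set. Everything in it is an illustrative assumption and not the project's code: the two-dimensional Gaussian features stand in for images, the nearest-centroid model stands in for a real classifier, and the names `make_split`, `fit_centroids`, and `accuracy` are hypothetical.

```python
import random
from statistics import fmean


def make_split(n, shift, seed):
    """Toy stand-in for image features: two classes drawn from shifted Gaussians."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = shift if label else -shift
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def fit_centroids(train):
    """Hypothetical minimal classifier: one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def accuracy(centroids, test):
    """Fraction of test items whose nearest class centroid matches the label."""
    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return fmean(1.0 if predict(x) == y else 0.0 for x, y in test)


# Assumption: the "synthetic" split is drawn from a slightly different
# distribution to mimic an imperfect generator.
real_train = make_split(200, shift=1.5, seed=0)
synthetic_train = make_split(200, shift=1.2, seed=1)
real_test = make_split(200, shift=1.5, seed=2)

results = {
    "train_real_test_real": accuracy(fit_centroids(real_train), real_test),
    "train_synth_test_real": accuracy(fit_centroids(synthetic_train), real_test),
}
print(results)
```

The point of the design is that the test set never changes: any gap between the two numbers can then be attributed to the training data rather than to the evaluation split.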

Evaluation Design
The repo README and figures document the evaluation in more detail than the résumé summary:
- Structural and diversity metrics were tracked with SSIM and LPIPS.
- A foundation-model distance based on RAD-DINO embeddings was proposed to measure disease-structure realism.
- DenseNet-121 was evaluated across fixed-size, augmented fixed-budget, and augmented scaled-budget regimes.
- Train-synthetic/test-real comparisons were used to probe whether diffusion outputs could act as privacy-preserving surrogate training data.
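To make the SSIM-based metric above concrete, here is a minimal sketch of how structural similarity can double as a diversity signal: the mean SSIM over all pairs of generated samples, where a lower value suggests more varied outputs. It is an assumption-laden simplification, not the repo's implementation. It computes SSIM over a single global window on flat grayscale pixel lists, whereas standard SSIM averages over local windows, and the function names are hypothetical. LPIPS is omitted because it requires a pretrained network.

```python
from itertools import combinations
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM for two equally sized grayscale images (flat pixel lists).

    Simplification: standard SSIM averages this statistic over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Stabilizing constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_pairwise_ssim(images):
    """Average SSIM over all unordered image pairs; lower suggests more diversity."""
    if len(images) < 2:
        raise ValueError("need at least two images")
    return fmean(global_ssim(a, b) for a, b in combinations(images, 2))


# Tiny hypothetical 4-pixel "images" for illustration only.
img_a = [10, 50, 200, 240]
img_b = [12, 48, 205, 238]   # nearly identical to img_a
img_c = [240, 200, 50, 10]   # img_a reversed: same pixel values, opposite structure

near_duplicates = mean_pairwise_ssim([img_a, img_b])
mixed_set = mean_pairwise_ssim([img_a, img_b, img_c])
print(near_duplicates, mixed_set)
```

A set of near-duplicate samples scores close to 1, so a generator that collapses to a few modes would show up as a high mean pairwise SSIM even if each individual sample looks plausible.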
Key Findings
Diffusion consistently outperformed GAN generation in structural coherence and downstream usefulness. The most important result was not simply that diffusion looked better, but that it preserved clinically useful signal under downstream classification more reliably than GAN-based samples.


Engineering Work
This repo is not just a paper artifact. It includes training code, downstream experiment scripts, evaluation tooling, data outputs, and environment setup details for cluster-based experimentation. That makes it a strong portfolio research entry because the contribution is both analytical and infrastructural.
Why It Matters
The project connects generative modeling, evaluation methodology, and deployment-minded engineering. It demonstrates how to frame a research problem as a reproducible system instead of a one-off notebook experiment.