Test Beam Data Pipeline

Project Overview

Role: Developer & Analyst
Duration: 2021-Present
Data Volume: TB-scale per campaign

The Challenge

Process and analyze TB-scale datasets from particle beam experiments with a turnaround fast enough to guide ongoing data-taking decisions.

Solution

Developed an automated analysis pipeline featuring:

  • Real-time monitoring: Data quality checks during beam time (see the monitoring sketch below)
  • Parallel processing: Distributed computing on GRID infrastructure via HTCondor (see the submission sketch below)
  • Automated reporting: Daily summaries and visualizations (see the reporting sketch below)
  • Version control: Reproducible analysis framework (see the provenance sketch after the Technical Stack)

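As a concrete illustration of the real-time monitoring point, here is a minimal sketch of an online data-quality loop that polls for newly written run files and flags obvious problems. The directory layout, tree name ("events"), branch name ("adc"), and thresholds are illustrative assumptions rather than the project's actual configuration; the checks use PyROOT, which ships with ROOT.

```python
"""Minimal sketch of an online data-quality check loop (assumed layout)."""
import time
from pathlib import Path

import ROOT  # PyROOT, distributed with ROOT

INCOMING = Path("incoming")   # hypothetical drop directory for new runs
MIN_EVENTS = 10_000           # illustrative thresholds
ADC_RANGE = (50.0, 3500.0)


def check_run(path: Path) -> list[str]:
    """Return a list of human-readable problems found in one run file."""
    problems = []
    f = ROOT.TFile.Open(str(path))
    if not f or f.IsZombie():
        return [f"{path.name}: file unreadable"]
    tree = f.Get("events")
    if not tree:
        f.Close()
        return [f"{path.name}: no 'events' tree"]
    if tree.GetEntries() < MIN_EVENTS:
        problems.append(f"{path.name}: only {tree.GetEntries()} events")
    # Mean ADC as a crude pedestal / saturation check
    tree.Draw("adc >> h_adc", "", "goff")
    h = ROOT.gDirectory.Get("h_adc")
    if h and not (ADC_RANGE[0] < h.GetMean() < ADC_RANGE[1]):
        problems.append(f"{path.name}: mean ADC {h.GetMean():.1f} out of range")
    f.Close()
    return problems


def monitor(poll_seconds: int = 60) -> None:
    """Poll for new run files and print any quality problems."""
    seen: set[Path] = set()
    while True:
        for run in sorted(INCOMING.glob("run_*.root")):
            if run in seen:
                continue
            seen.add(run)
            for problem in check_run(run):
                print("DQ WARNING:", problem)
        time.sleep(poll_seconds)


if __name__ == "__main__":
    monitor()
```
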
[Figure: GRID computing usage statistics]
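The parallel-processing step fans the same analysis executable out over many runs on the GRID. Below is a minimal sketch of how such a fan-out could look with HTCondor; the executable name, run list, and log layout are placeholders for illustration, not the project's actual job description.

```python
"""Minimal sketch: submit one HTCondor job per run number (illustrative)."""
import subprocess
from pathlib import Path

RUNS = ["4021", "4022", "4023"]   # hypothetical run numbers
Path("logs").mkdir(exist_ok=True)
Path("runs.txt").write_text("\n".join(RUNS) + "\n")

SUBMIT = """\
universe   = vanilla
executable = run_analysis.sh
arguments  = $(run_number)
output     = logs/$(run_number).out
error      = logs/$(run_number).err
log        = logs/condor.log
request_cpus = 1
queue run_number from runs.txt
"""

Path("analysis.sub").write_text(SUBMIT)
# One condor_submit call queues one job per line of runs.txt
subprocess.run(["condor_submit", "analysis.sub"], check=True)
```
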

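For the automated reporting point, a minimal sketch of the daily summary step: per-run metrics are aggregated with Pandas into a table and a simple trend plot. The CSV schema and output paths are assumptions, and the plotting uses matplotlib, which is not named in the stack and is included here only for illustration.

```python
"""Minimal sketch of a daily summary report (assumed metrics schema)."""
from datetime import date

import matplotlib

matplotlib.use("Agg")  # headless backend: write files, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-run metrics produced by the monitoring step:
# columns run, timestamp, n_events, mean_adc
metrics = pd.read_csv("dq_metrics.csv")

daily = (
    metrics.assign(day=pd.to_datetime(metrics["timestamp"]).dt.date)
    .groupby("day")
    .agg(runs=("run", "count"),
         events=("n_events", "sum"),
         mean_adc=("mean_adc", "mean"))
)
daily.to_csv(f"report_{date.today()}.csv")

# Simple trend plot of collected statistics per day
fig, ax = plt.subplots(figsize=(8, 3))
daily["events"].plot(ax=ax, marker="o")
ax.set_ylabel("events per day")
ax.set_title("Test beam statistics")
fig.tight_layout()
fig.savefig(f"report_{date.today()}.png")
```
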
Technical Stack

Python, ROOT, Pandas, NumPy, Git, Bash, HTCondor, GRID Computing
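
To illustrate the version-control point from the Solution list, here is a minimal sketch of stamping each analysis output with its Git provenance so results can be traced back to the exact code version; the metadata file name and fields are illustrative.

```python
"""Minimal sketch: record Git provenance next to each analysis output."""
import json
import subprocess
from datetime import datetime, timezone


def git_provenance() -> dict:
    """Return the current commit hash and whether the working tree is clean."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    dirty = bool(
        subprocess.run(
            ["git", "status", "--porcelain"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    )
    return {
        "commit": commit,
        "dirty_working_tree": dirty,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Written alongside each result so the analysis can be reproduced later
with open("analysis_metadata.json", "w") as fh:
    json.dump(git_provenance(), fh, indent=2)
```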

Results

  • Reduced analysis turnaround from weeks to days
  • Reproducible results across different test beam campaigns
  • Framework adopted by collaborating institutions
  • Enabled quick feedback for detector optimization

Industry Relevance

Skills directly applicable to:

  • Data Engineering: Building robust data pipelines
  • DevOps: Automation and workflow orchestration
  • Scientific Computing: Statistical analysis and visualization
  • Distributed Systems: GRID/cluster computing experience