Senior Data Engineer (Bioinformatics)

Location: Cracow & Remote

Region / State / Province: Kraków; Remote

Offer description

At Ardigen, we apply the power of Artificial Intelligence to accelerate the development of precision therapies.

We are seeking a highly motivated and entrepreneurial person to join our services team as a Senior Data Engineer (Bioinformatics). This role is ideal for someone experienced in developing, refactoring, and managing scalable data pipelines, specifically for genomic data ingestion and processing. The selected candidate will primarily focus on designing, building, and improving AWS Glue-based ETL pipelines for GWAS datasets, ensuring data quality, robustness, and efficiency. The role will require strong collaboration with bioinformatics teams to integrate biological knowledge into computational workflows. This individual must be able to communicate effectively in both formal and informal settings and work well within a team.

Let's grow together & code against cancer!

Key duties and responsibilities

Develop, maintain, and enhance AWS Glue-based ETL pipelines for GWAS dataset ingestion and processing using PySpark and Python.
Refactor existing data pipelines to improve efficiency, reliability, maintainability, and scalability, including implementing unit and integration tests.
Troubleshoot, debug, and resolve pipeline-related issues and bugs.
Collaborate closely with the Human Genetics team to integrate biological context and scientific rigor into data engineering workflows.
Utilize SQL and R for data analysis, querying, and validation.
Maintain databases, perform optimization, and ensure data consistency and integrity.
Create and maintain clear documentation and reports.
Facilitate discussions with internal teams and clients to define requirements and deliver solutions aligned with biological research needs.

Requirements

Master's or Ph.D. in Bioinformatics, Computational Biology, Computer Science, Data Engineering, or a related discipline.
At least 3 years of relevant work experience as a data engineer, bioinformatics engineer, or computational biologist, with significant exposure to data pipeline engineering.
Proven experience in developing and managing AWS Glue ETL pipelines, PySpark scripts, and data processing workflows.
Solid knowledge of AWS cloud services for data engineering (EMR, EMR Serverless, Batch, MWAA).
Strong proficiency in Python, SQL, Bash scripting, and R.
Familiarity with database management systems and SQL optimization techniques.
Biological knowledge or experience working with genomic data (GWAS, genomic variation, sequencing data).
Experience with cloud infrastructure (AWS) and workflow management tools such as Nextflow or Snakemake.
Excellent communication skills in English, both written and verbal.

You get extra points for:

Hands-on experience with projects in Data Science, Machine Learning, Big Data, and Data Mining.
Contributions to scientific publications and/or demonstrated presentation skills.

We offer

Flexible working hours
Employee Stock Option Plan
Mental health support (HearMe Platform)
English classes
Funding for professional development, training, and an internal mentoring program
The opportunity to code with a purpose and make a meaningful impact through your daily work #CodeAgainstCancer
Private medical care
Multisport card