LinkedIn Profile | Lattes CV | ORCID Profile
Bioinformatician and Data Scientist with more than 7 years of experience in Computational Biology.
M.Sc. in Genetics (UFMG). I specialize in Data Architecture for biological systems, leveraging Python, R, SQL, and Bash to build reproducible analysis pipelines.
My work bridges the gap between molecular biology and data science, handling complex datasets such as eDNA metabarcoding and variant calling. I focus on statistical rigor, data visualization, and software development to drive biological discovery in any organism or context.
This project implements the pipeline I developed for the group 2 exercise, “somatic variants”. The objective is to filter somatic variants and prepare them for analysis in the Cancer Genome Interpreter (CGI).
The script is available at /variantes_somaticas/script_variantes_somaticas.sh.
Filters VCF files to retain only clinically relevant somatic variants
Applies filters based on population frequency (MAX_AF ≤ 0.01), impact (excluding LOW), clinical significance (pathogenic, not benign), and gene panel matching
Generates TSV files formatted for CGI interpretation
Processes multiple samples in batch mode
Removes variants without consequence annotations (CSQ)
Filters variants using Ensembl VEP criteria
Extracts and formats variant data into TSV with specific columns
Ensures adequate sequencing depth (DP≥20) and allele frequency (AF≥0.1)
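To make the thresholds above concrete, here is a minimal sketch that applies four of them (MAX_AF ≤ 0.01, impact other than LOW, DP ≥ 20, AF ≥ 0.1) to a small tab-separated table with awk. It is a stand-in for illustration, not the code in script_variantes_somaticas.sh. The function name `filter_variants`, the column layout, the toy rows, and the choice to keep rows with a missing MAX_AF are all assumptions.

```shell
#!/usr/bin/env bash
# Toy illustration of the thresholds, applied to a tab-separated table.
# Assumed (hypothetical) column layout: GENE, IMPACT, MAX_AF, DP, AF.

filter_variants() {
  # Keep rows with MAX_AF <= 0.01, IMPACT != LOW, DP >= 20 and AF >= 0.1.
  # Assumption: a missing MAX_AF ("." or empty) is treated as passing.
  awk -F'\t' '
    NR == 1 { print; next }                # pass the header through
    {
      max_af_ok = ($3 == "." || $3 == "" || $3 + 0 <= 0.01)
      impact_ok = ($2 != "LOW")
      depth_ok  = ($4 + 0 >= 20)
      af_ok     = ($5 + 0 >= 0.1)
      if (max_af_ok && impact_ok && depth_ok && af_ok) print
    }
  '
}

# Invented rows, one per filter outcome (the values are not real data).
toy_table=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  GENE IMPACT MAX_AF DP AF \
  JAK2 MODERATE 0.0004 85 0.42 \
  CALR HIGH . 40 0.31 \
  MPL LOW 0.0001 60 0.25 \
  ASXL1 HIGH 0.2 90 0.48 \
  TET2 MODERATE 0.001 12 0.30)

filtered=$(printf '%s\n' "$toy_table" | filter_variants)
printf '%s\n' "$filtered"
```

In this toy table only the JAK2 and CALR rows survive: MPL is dropped for LOW impact, ASXL1 for a population frequency above 0.01, and TET2 for a depth below 20.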
The pipeline prepares somatic variant data from annotated VCFs into analysis-ready formats for cancer genomics interpretation.
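As a rough sketch of the VCF-to-TSV step, the snippet below turns VCF records into a four-column table. The column names (chr, pos, ref, alt) and the function name `vcf_to_tsv` are assumptions for illustration; the columns the script actually writes, and the ones CGI expects, should be checked against the script and the CGI documentation.

```shell
#!/usr/bin/env bash
# Hypothetical VCF-to-TSV conversion; the output columns are an assumption.

vcf_to_tsv() {
  # Reads VCF text on stdin and writes a four-column TSV on stdout.
  awk -F'\t' -v OFS='\t' '
    BEGIN { print "chr", "pos", "ref", "alt" }
    /^#/  { next }                         # skip meta and header lines
    {
      n = split($5, alts, ",")             # one output row per ALT allele
      for (i = 1; i <= n; i++) print $1, $2, $4, alts[i]
    }
  '
}

# Invented records (positions and alleles are placeholders, not real variants).
toy_vcf=$(printf '%b\n' \
  '##fileformat=VCFv4.2' \
  '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' \
  'chr1\t100\t.\tA\tT\t.\tPASS\tDP=85' \
  'chr2\t200\t.\tG\tC,A\t.\tPASS\tDP=40')

tsv=$(printf '%s\n' "$toy_vcf" | vcf_to_tsv)
printf '%s\n' "$tsv"
```

Splitting multi-allelic records into one row per ALT allele is a design choice made here so that every row describes a single variant.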
git clone https://github.com/gbrl-mendes/variantes_somaticas.git
cd variantes_somaticas/ && vi script_variantes_somaticas.sh
# Define main directories
BCFTOOLS_DIR="/home/gabriel/downloads/bcftools-1.21"
FILTER_VEP_DIR="/home/gabriel/downloads/ensembl-vep"
PROJECT_DIR="/home/gabriel/projetos/variantes_somaticas"
INPUT_DIR="/home/gabriel/projetos/variantes_somaticas/liftOver-hg38-MF-annotVep"
GENES_FILE="/home/gabriel/projetos/variantes_somaticas/genes/myelofibrosis.txt"
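Since these paths are specific to my machine, it can help to confirm they exist after editing them. The helper below is a hypothetical pre-flight check (`check_paths` is not part of script_variantes_somaticas.sh) that reports any configured path that is missing.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of script_variantes_somaticas.sh.

check_paths() {
  # Usage: check_paths PATH...
  # Prints each missing path to stderr and returns 1 if any is missing.
  local missing=0
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "Missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example with the variables defined above (uncomment after setting them):
# check_paths "$BCFTOOLS_DIR" "$FILTER_VEP_DIR" "$PROJECT_DIR" \
#             "$INPUT_DIR" "$GENES_FILE" || exit 1
```

Failing early on a wrong path avoids a half-finished batch run.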
/variantes_somaticas$ ./script_variantes_somaticas.sh
The script runs every VCF filtering step and creates the TSV files, so that the results can be interpreted in the Cancer Genome Interpreter (CGI).
The annotation results are written to the /variantes_somaticas/output directory.
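The batch mode can be pictured as a loop that pairs each input VCF with an output TSV. The sketch below only lists those pairs and does not run any filtering. The function name `list_batch_jobs`, the accepted extensions (.vcf and .vcf.gz), and the `<sample>.tsv` naming scheme are assumptions, not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical batch loop; the file-naming scheme is an assumption.

list_batch_jobs() {
  # Usage: list_batch_jobs INPUT_DIR OUTPUT_DIR
  # Prints "input<TAB>output" for every *.vcf and *.vcf.gz file in INPUT_DIR.
  local input_dir="$1"
  local output_dir="$2"
  local vcf sample
  for vcf in "$input_dir"/*.vcf "$input_dir"/*.vcf.gz; do
    [ -e "$vcf" ] || continue              # skip glob patterns with no match
    sample=$(basename "$vcf")
    sample=${sample%.gz}                   # strip optional .gz
    sample=${sample%.vcf}                  # strip .vcf to get the sample name
    printf '%s\t%s\n' "$vcf" "$output_dir/$sample.tsv"
  done
}

# Example with the configured directories:
# list_batch_jobs "$INPUT_DIR" "$PROJECT_DIR/output"
```

Listing the pairs first makes it easy to check which samples will be processed before starting a long run.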
For more information, contact me by e-mail 😊