LinkedIn Profile | Lattes CV | ORCID Profile
Bioinformatician and Data Scientist with 7+ years of experience in Computational Biology.
M.Sc. in Genetics (UFMG). I specialize in Data Architecture for biological systems, leveraging Python, R, SQL, and Bash to build reproducible analysis pipelines.
My work bridges the gap between molecular biology and data science, handling complex datasets such as eDNA metabarcoding and variant calling. I focus on statistical rigor, data visualization, and software development to drive biological discovery in any organism or context.
This project evaluates the efficacy of Environmental DNA (eDNA) metabarcoding for monitoring fish biodiversity in the Cipó River basin. We compared eDNA results with traditional survey methods (electrofishing and netting) to assess taxonomic coverage, alpha/beta diversity, and the potential of eDNA as a non-invasive monitoring tool.
tidyverse (Data manipulation), vegan (Ecological statistics), phyloseq (Sequence manipulation), ggplot2 (Visualization).

Our bioinformatics pipeline processes raw high-throughput sequencing (HTS) data to identify fish species with high confidence. The workflow involves strict quality control, contamination filtering, and taxonomic assignment using BLASTn against a curated reference database.
The analysis was performed in R using a custom pipeline integrating dada2, phyloseq, and vegan.
| Stage | Description |
|---|---|
| 1. Raw Screening | Initial filtering of ASVs by length (120–240 bp) and target class (Actinopteri). Removal of contaminants found in extraction/PCR controls. |
| 2. Curation | Validation of taxonomic assignments. Ambiguous hits (e.g., Brycon aff.) were manually corrected based on local biogeography. |
| 3. Comparison | Integration with traditional survey data to compare species richness and composition across 7 sampling sites (SC1–SC7). |
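
The Raw Screening stage in the table above can be illustrated with a short sketch. It is written in Python for illustration only and is not the repository's R code. The `Asv` record, its field names, and the rule "drop any ASV detected in a control" are assumptions; the actual contaminant-removal criterion used in the pipeline may differ (for example, a read-count threshold).

```python
from dataclasses import dataclass

# Thresholds taken from the Raw Screening row of the table.
MIN_LEN_BP = 120
MAX_LEN_BP = 240
TARGET_CLASS = "Actinopteri"


@dataclass(frozen=True)
class Asv:
    """Hypothetical ASV record; field names are illustrative."""
    asv_id: str
    length_bp: int
    taxon_class: str
    in_controls: bool  # True if seen in an extraction/PCR control


def screen_asvs(asvs):
    """Keep ASVs within the length window, assigned to the target
    class, and absent from negative controls (assumed rule)."""
    return [
        a for a in asvs
        if MIN_LEN_BP <= a.length_bp <= MAX_LEN_BP
        and a.taxon_class == TARGET_CLASS
        and not a.in_controls
    ]


# Toy input, not real data from the study.
toy_asvs = [
    Asv("asv1", 170, "Actinopteri", False),  # passes all filters
    Asv("asv2", 95, "Actinopteri", False),   # too short
    Asv("asv3", 180, "Mammalia", False),     # wrong class
    Asv("asv4", 200, "Actinopteri", True),   # found in a control
]
kept_ids = [a.asv_id for a in screen_asvs(toy_asvs)]
print(kept_ids)
```

Keeping the thresholds as named constants makes it easy to check them against the values stated in the table.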
At most sampling sites, eDNA metabarcoding detected observed species richness equal to or higher than that of traditional methods.
Figure 1: Comparison of observed species richness between eDNA metabarcoding and traditional methods across sampling sites.
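
Observed richness is simply the number of distinct species detected per site and method. The sketch below shows that calculation in Python for illustration; it is not the repository's R code, and the `(site, method, species)` tuple layout and the placeholder species labels are assumptions made for this example.

```python
from collections import defaultdict


def observed_richness(detections):
    """Count distinct species per (site, method) pair.

    `detections` is an iterable of (site, method, species) tuples;
    repeated detections of the same species are counted once.
    """
    species_sets = defaultdict(set)
    for site, method, species in detections:
        species_sets[(site, method)].add(species)
    return {key: len(found) for key, found in species_sets.items()}


# Toy detections with placeholder labels, not results from the study.
toy_detections = [
    ("SC1", "eDNA", "sp_A"),
    ("SC1", "eDNA", "sp_B"),
    ("SC1", "eDNA", "sp_A"),  # duplicate, counted once
    ("SC1", "traditional", "sp_A"),
    ("SC2", "eDNA", "sp_C"),
    ("SC2", "traditional", "sp_C"),
    ("SC2", "traditional", "sp_D"),
]
richness = observed_richness(toy_detections)
for (site, method), n_species in sorted(richness.items()):
    print(site, method, n_species)
```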
We generated compositional heatmaps to visualize the relative abundance of detected taxa. The analysis revealed that eDNA captures cryptic and rare species often missed by traditional gear.
(Note: High-resolution heatmaps and PCoA ordination plots are available in the results/ folder of the repository.)
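
For readers unfamiliar with the quantities behind these plots, here is a minimal Python sketch of relative abundance and of a pairwise dissimilarity of the kind a PCoA takes as input. It is illustrative only and not the repository's R code. The choice of Bray-Curtis is an assumption (it is the default method of `vegdist` in vegan); the README does not state which distance was used, and the species labels and counts are toy values.

```python
def relative_abundance(counts):
    """Convert a {taxon: read_count} dict to proportions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: value} profiles.

    Assumed metric for illustration: sum |a_i - b_i| / sum (a_i + b_i).
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Toy read counts with placeholder labels, not data from the study.
site_a = relative_abundance({"sp_A": 30, "sp_B": 10})
site_b = relative_abundance({"sp_A": 10, "sp_C": 10})
dissimilarity = bray_curtis(site_a, site_b)
print(site_a, site_b, dissimilarity)
```

A value of 0 means two samples have identical composition, and 1 means they share no taxa.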
The repository is organized to ensure reproducibility:
- scripts/: Contains the main analysis file (eDNA_cipo_script.qmd) and helper functions.
- data/: Raw and curated datasets (metabarcoding + traditional).
- results/: Generated figures and statistical tables.

If you use this code or data, please cite the associated publication (in prep). For questions, please open an issue in the GitHub repository.