A reproducible geospatial baseline for Chincha Norte (Peru) built as a conservation-focused MVP: Sentinel-2 seasonal composites, an NDVI/NDWI-based guano-candidate proxy, 2020-to-2025 delta masks, a terrain-derived sensor-siting shortlist, vessel-activity context, and lightweight validation (patch coherence plus visual plausibility overlays). Published with outreach-ready notebook outputs, figures, and GeoJSON artifacts.
A SparkSQL case study on the UCI Online Retail dataset simulating a holdout-style experiment: eligible population selection, deterministic hash-based cohort assignment, conversion lift and demand impact analysis, and group comparison with Welch's t-test. Includes practical caveats around non-causal interpretation, seasonality effects, and heavy-tailed demand.
An ELECTRA model for binary classification that determines whether a tweet is about a disaster. This implementation is part of an exercise to find the best-performing model among an RNN, an MLP, and a transformer-based model.
A repository containing my solutions to all the CS224N assignments from the 2021 class.
An experimentation project to help choose a state-of-the-art neural model for summarization. The project uses Hugging Face pre-trained models on the XSum and CNN/Daily Mail datasets. The generated summaries are evaluated intrinsically and extrinsically with ROUGE-1 and ROUGE-L.
A Spanish-English translator based on a Transformer model. It was trained on Google Cloud with 114k examples and a custom training loop. Rather than a notebook, it is a modular object-oriented project with a simple TUI for interacting with the translator.
A prototype for measuring household water consumption using IoT, time-series databases, and visualization software. The project aims to show how we use water so that we can reduce our footprint and help fight climate change.
A website I made with several objectives in mind: