Available for Opportunities

I'M
SAI SANJEEV

ALPHA TL Manager · NIE Mysuru · AI/ML Engineer

CS grad specializing in AI/ML. Deployed 6+ production ML models, optimized data pipelines for 25% higher throughput, and led cross-functional teams of 10+ members.

6+ ML Projects
8.7 CGPA
200+ Students Led
Professional Journey

Work Experience

Oct 2024

Machine Learning Intern

Plasmid
  • Developed 3 ML models achieving 25% higher prediction accuracy than baseline
  • Collaborated with 4 cross-functional teams to complete projects 20% faster
  • Analyzed 50K+ real-world records, identifying 8 key trends that boosted model precision by 15%
May 2025 – Present

Student TL Manager

ALPHA, NIE Mysuru
  • Organized 6 workshops and 2 hackathons with 200+ student attendees
  • Mentored 15 junior students, guiding 5 AI-driven projects to completion
  • Launched weekly technical discussions, increasing peer participation by 40%
Oct 2025 – Present

Teaching Assistant

Interdisciplinary PBL · NIE
  • Delivered 20+ lectures to 80 first-year students on project-based learning
  • Advised 12 student teams, helping 10 secure top grades in final presentations
  • Streamlined documentation templates, reducing report preparation time by 30%
Technical Build Log

Featured Projects

Industrial IoT

Modbus-Based SCADA System

A low-cost, production-grade SCADA system for real-time industrial monitoring and remote device control — built entirely with off-the-shelf hardware and open protocols.

Goal

Replace expensive proprietary SCADA setups with a cost-effective alternative using commodity IoT hardware. The system needed to be offline-capable, remotely accessible, and maintainable by non-specialist staff.

Architecture

ESP32 acts as the Modbus Slave, collecting DHT11 temperature/humidity readings into mapped Modbus registers. A Raspberry Pi 5 serves as the Modbus Master, polling the ESP32 every 2 seconds via Modbus RTU over serial (UART). The Pi also runs a lightweight local web server (Flask/Python) that exposes a live dashboard with real-time readings and device control buttons.
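
To make the polling step concrete, here is a minimal sketch of the request frame a Modbus RTU master sends to read ten holding registers. It is illustrative only: the project used pymodbus rather than hand-built frames, and the slave address (1), the start address (0, assuming register 40001 maps to protocol address 0 under the traditional numbering), and both function names are assumptions.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_frame(slave_id: int, start_addr: int, count: int) -> bytes:
    """Build a function-code 0x03 request; the CRC is appended low byte first."""
    body = struct.pack(">BBHH", slave_id, 0x03, start_addr, count)
    return body + struct.pack("<H", crc16_modbus(body))


# Assumed mapping: registers 40001-40010 -> protocol addresses 0-9.
frame = read_holding_registers_frame(slave_id=1, start_addr=0, count=10)
print(frame.hex(" "))
```

A receiver can validate a frame by running the same CRC over all of its bytes, checksum included; an intact frame yields zero.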

Implementation

  • Programmed ESP32 in embedded C to sample DHT11 sensor data and populate Modbus register map (holding registers 40001–40010)
  • Configured Raspberry Pi as Modbus RTU master with automatic reconnection and error-handling for dropped packets
  • Built responsive web dashboard (HTML/JS/Chart.js) showing live time-series graphs and toggle controls for connected relays
  • Integrated Cloudflare Tunnel (cloudflared) for zero-trust remote access — no port forwarding, no public IP exposure
  • Added local data logging to SQLite for offline analytics and audit trails
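
The local logging step might look roughly like the sketch below, which uses Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical, and an in-memory database stands in for the on-disk file the gateway would use.

```python
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the readings log. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "ts REAL NOT NULL, temperature_c REAL, humidity_pct REAL)"
    )
    return conn


def log_reading(conn, temperature_c, humidity_pct, ts=None):
    """Append one polled sample, timestamped now unless ts is given."""
    stamp = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?)",
        (stamp, temperature_c, humidity_pct),
    )
    conn.commit()


def recent(conn, limit=10):
    """Return the newest rows first, e.g. to feed a dashboard chart."""
    return conn.execute(
        "SELECT ts, temperature_c, humidity_pct FROM readings "
        "ORDER BY ts DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = open_log()
log_reading(conn, 27.0, 61.0, ts=1.0)
log_reading(conn, 27.5, 60.0, ts=3.0)
print(recent(conn, limit=1))
```

Because every sample is committed locally first, the dashboard and audit trail keep working when the remote tunnel is down.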

Outcome & Impact

System runs continuously with 99% data accuracy across 72-hour stress tests. Remote access latency under 200ms via Cloudflare Tunnel. Total hardware cost under ₹4,000 vs. ₹50,000+ for commercial equivalents.

Skills Applied

Modbus RTU/TCP, embedded C, Python (pymodbus, Flask), Cloudflare Tunnel, SQLite, IoT system architecture, real-time dashboard development.

99% data accuracy
<200ms remote latency
₹4K vs ₹50K cost
Offline capable
Cybersecurity

STIX Malware Analyser

An automated ML pipeline for malware detection using structured cyber threat intelligence in STIX format — combining unsupervised anomaly detection with supervised classification for layered defence.

Goal

Automate the process of ingesting raw threat feeds, normalizing them into the STIX 2.1 standard, and applying ML models to detect both known and unknown malware with minimal analyst intervention.

Approach

  • Parsed and normalized threat data (IP indicators, file hashes, attack patterns, TTPs) into STIX 2.1 JSON bundles using the stix2 Python library
  • Engineered features from STIX objects: IP geolocation behavior, file signature entropy, lateral movement patterns, C2 communication frequency
  • Layer 1 — Isolation Forest (unsupervised): flags zero-day and novel anomalies where no labelled data exists
  • Layer 2 — SVM with RBF kernel (supervised): classifies flagged samples against a labelled corpus of known malware families
  • Built a report generation module that outputs STIX Course-of-Action objects with remediation suggestions for each detected threat
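
For readers unfamiliar with STIX 2.1, the sketch below shows the shape of a normalized IP indicator inside a bundle. It builds plain dictionaries with the standard library to keep the example self-contained; the project itself used the stix2 library, and the helper names, the indicator name, and the documentation-range IP address are assumptions.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt: datetime) -> str:
    """Format a datetime as a UTC timestamp with millisecond precision."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def make_ip_indicator(ip: str, now: datetime) -> dict:
    """Build an Indicator object carrying the properties STIX 2.1 requires."""
    ts = stix_timestamp(now)
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": f"Suspicious IP {ip}",  # hypothetical label
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def make_bundle(objects: list) -> dict:
    """Wrap STIX objects in a bundle container."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
bundle = make_bundle([make_ip_indicator("198.51.100.7", now)])
print(json.dumps(bundle, indent=2))
```

Feature engineering can then read fields such as `pattern` and `valid_from` from each object in `bundle["objects"]`.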

Results

Hybrid two-layer approach improved overall detection reliability by 30% over a single-model baseline. False positive rate reduced by 18% compared to Isolation Forest alone. End-to-end pipeline processes a 10,000-indicator feed in under 45 seconds.

Skills Applied

STIX 2.1 standard, Isolation Forest, SVM, feature engineering from threat intelligence data, end-to-end ML pipeline automation, Python (scikit-learn, stix2).

+30% detection reliability
-18% false positives
10K indicators / 45s
Predictive Maintenance

NVMe Level-3 Health Analyzer

A predictive maintenance system for NVMe SSDs that uses SMART telemetry data and XGBoost — tuned via Genetic Algorithm — to forecast drive failures before they occur.

Goal

Provide data centre operators and end users with an early warning system for NVMe drive failure, reducing unplanned downtime and data loss. Target: predict failure with at least 90% accuracy with actionable lead time of 24–72 hours.

Workflow

  • Automated extraction of 15+ SMART attributes: NVMe temperature, read/write error rates, power-on hours, unsafe shutdowns, media errors, wear levelling counts, percentage-used endurance
  • Built a custom feature extraction module to handle missing SMART attributes (vendor-specific registers) using median imputation and flag encoding
  • Derived rolling-window features (7-day and 30-day) to capture degradation trends, not just point-in-time values
  • Trained XGBoost classifier, chosen for its native handling of missing values and its suitability for structured (tabular) telemetry data
  • Applied Genetic Algorithm (DEAP library, 200+ generations, tournament selection) to optimize XGBoost hyperparameters: max_depth, learning_rate, subsample, colsample_bytree, n_estimators
  • Compared GA-tuned model against Grid Search and Bayesian Optimization baselines — GA achieved best F1 at comparable compute cost
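
The rolling-window step can be sketched as follows. This is a simplified illustration, not the project's feature module: the choice of mean and slope as the two derived features, the function name, and the sample media-error counts are all assumptions.

```python
from statistics import mean


def rolling_features(values, window):
    """Return (mean, slope) over the trailing window for each daily reading.

    slope is (last - first) / (n - 1) within the window, or 0.0 when the
    window holds a single value. Early days use whatever history exists.
    """
    features = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        slope = (win[-1] - win[0]) / (len(win) - 1) if len(win) > 1 else 0.0
        features.append((mean(win), slope))
    return features


# Made-up daily media-error counts for one drive.
media_errors = [0, 0, 1, 1, 2, 4, 7, 11]
feats = rolling_features(media_errors, window=7)
print(feats[-1])
```

A rising slope flags a drive whose error count is accelerating even when the latest absolute value still looks harmless, which is the point of trend features over point-in-time values.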

Performance

Processed 10,000 drive records in 137 seconds end-to-end. Achieved 92% failure prediction accuracy with 89% recall on the minority (failure) class. GA tuning added 4.2% accuracy improvement over default XGBoost hyperparameters.

Skills Applied

XGBoost, Genetic Algorithms (DEAP), feature engineering, predictive maintenance, NVMe/SMART telemetry parsing, model evaluation (precision/recall/F1), Python.

92% failure accuracy
89% recall on failures
10K records / 137s
+4.2% from GA tuning
RAG + LLM

AI Document Query Chatbot

A Retrieval-Augmented Generation (RAG) chatbot that answers user questions from uploaded documents using semantic search — not keyword matching — backed by LLaMA 2 and Chroma DB.

Goal

Enable users to query large documents (PDFs, reports, manuals) in natural language and receive accurate, grounded answers, reducing hallucination by anchoring every response to retrieved document chunks.

Architecture

  • Documents ingested and split into 500-token overlapping chunks (50-token overlap) to preserve context at boundaries
  • Each chunk converted to a 768-dim dense vector using Sentence Transformers (all-mpnet-base-v2 model)
  • Vectors stored in Chroma DB (local persistent vector store) with document metadata for source attribution
  • At query time: user query embedded with same Sentence Transformer, cosine similarity search retrieves top-3 most relevant chunks
  • Retrieved chunks + user query assembled into a structured prompt and passed to LLaMA 2 (7B, 4-bit quantized, GGUF format) via LangChain's RetrievalQA chain
  • Source citations returned alongside answers, showing which document sections were used
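
The chunking step above can be sketched in a few lines. The function name is hypothetical, and the example treats list items as tokens; a real pipeline would count tokens with the embedding model's tokenizer.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Stand-in tokens; 1,200 tokens yield windows starting at 0, 450 and 900.
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The final 50 tokens of each chunk reappear at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.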

Optimizations

  • Switched from FAISS to Chroma DB — 30% faster retrieval due to persistent indexing
  • Quantized LLaMA 2 to 4-bit GGUF format — reduced memory footprint from 14GB to 4GB, enabling local CPU inference
  • Implemented query caching for repeated questions — reduced average response time from 2.1s to 1.26s (40% improvement)
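
One simple way to cache repeated questions is an in-memory LRU cache keyed on a normalized query string, sketched below. `run_rag_pipeline` is a hypothetical stand-in for the retrieval and generation steps, and the normalization rule and cache size are assumptions rather than the project's actual settings.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive path actually runs


def run_rag_pipeline(question: str) -> str:
    """Hypothetical stand-in for retrieval plus LLM generation."""
    calls["count"] += 1
    return f"answer to: {question}"


def normalize(question: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share a key."""
    return " ".join(question.lower().split())


@lru_cache(maxsize=256)
def _cached_answer(normalized_question: str) -> str:
    return run_rag_pipeline(normalized_question)


def answer(question: str) -> str:
    return _cached_answer(normalize(question))


first = answer("What is the warranty period?")
second = answer("  what is the WARRANTY period? ")
print(calls["count"])  # prints 1: the second call is served from the cache
```

An exact-match cache like this only helps with repeated or near-identical wording; paraphrased questions still go through the full pipeline.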

Metrics

90% answer accuracy on a held-out evaluation set of 200 QA pairs. Retrieval precision@3 of 88%. Runs fully offline on consumer hardware (8GB RAM, no GPU required).

Skills Applied

RAG architecture, vector databases (Chroma DB), LLM integration, LangChain, Sentence Transformers, GGUF quantization, Python.

90% answer accuracy
40% faster responses
2.1s → 1.26s avg
Runs fully offline
FinTech

Online Payment Fraud Detection

A robust fraud detection pipeline for credit card transactions tackling the real-world challenge of extreme class imbalance (only 0.17% fraud cases in 284,807 transactions).

Goal

Build a production-ready fraud classifier that maintains high precision and recall simultaneously — minimising both missed fraud (false negatives, which cost money) and false alarms (false positives, which hurt customer trust).

Process

  • Dataset: Kaggle European credit card dataset — 284,807 transactions, 492 fraud cases (0.17%), 28 PCA-transformed features + Time + Amount
  • Applied SMOTE (Synthetic Minority Oversampling Technique) to oversample fraud class from 492 → 10,000 synthetic samples, preserving realistic feature distributions
  • Feature engineering: transaction velocity (frequency per hour per card), balance shift ratio, time-since-last-transaction, normalized Amount (RobustScaler to handle outliers)
  • Trained and benchmarked three models: Logistic Regression (baseline), XGBoost, Random Forest with 5-fold stratified cross-validation
  • Optimized prediction threshold (default 0.5 → 0.35) using Precision-Recall curve to maximize F1 on imbalanced test set
  • Built a lightweight inference pipeline: feature transformation → model predict_proba → threshold check → alert flag, achieving 25% faster runtime than naive sklearn pipeline
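
The threshold-tuning step can be illustrated with a small sweep that picks the cut-off with the highest F1. The labels and probabilities below are made-up toy data chosen for illustration, and the function names are hypothetical; the project derived its threshold from a Precision-Recall curve on real model outputs.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the positive (fraud) class when flagging prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates):
    """Return the first candidate threshold that maximizes F1."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.90]
candidates = [k / 20 for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best = best_threshold(y_true, y_prob, candidates)
print(best, round(f1_at_threshold(y_true, y_prob, best), 3))
```

On this toy set, lowering the threshold below 0.5 recovers a fraud case scored at 0.40 at the cost of one false alarm, which raises F1 overall.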

Outcome

Random Forest outperformed both Logistic Regression and XGBoost. Final model: 95% precision, 93% recall, F1-score 0.94 on held-out test set. ROC-AUC of 0.98. Pipeline throughput: 50,000 transactions processed per minute.

Skills Applied

SMOTE, imbalanced learning, threshold optimization, Random Forest, XGBoost, feature engineering, precision/recall tradeoff analysis, Python (scikit-learn, imbalanced-learn).

95% precision
93% recall
0.98 ROC-AUC
25% faster pipeline
Smart Agri · Active 🌱

SOIL — Sustainable Organic Intelligence Layer

An end-to-end smart agriculture system providing real-time soil health monitoring and AI-powered crop/fertilizer recommendations for small-scale farmers — a Govt. of Karnataka Grassroot Innovation 2025 finalist.

Goal

Eliminate the dependence on expensive and infrequent laboratory soil testing for small farmers. Provide continuous, affordable, and actionable soil intelligence directly in the field, reducing input costs and improving yield decisions.

System Components

  • Sensor layer: capacitive soil moisture sensor, analog pH probe (SEN0161), NPK sensor (RS485 Modbus output) — all connected to an ESP32 microcontroller
  • ESP32 aggregates multi-sensor readings every 5 minutes and transmits compressed packets over LoRa (SX1278, 433 MHz) to a central gateway up to 2km away
  • Gateway (Raspberry Pi 4) receives LoRa packets, decodes sensor values, and runs local AI inference — no internet dependency
  • AI model: Random Forest classifier trained on a regional crop-soil dataset (5,000 labelled samples across Karnataka crops) — classifies soil health into 3 categories (Healthy / Nutrient-deficient / Degraded) and recommends suitable crops + fertilizer ratios
  • Results displayed on a local e-ink display at the gateway node and a mobile-optimised web dashboard accessible on the farm's local WiFi
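
LoRa payloads are small, so sensor readings are usually packed into a fixed binary layout rather than sent as text. The sketch below shows one possible encoding with Python's struct module; the field order, field widths, the pH scaling, and the function names are all assumptions, not the project's actual packet format.

```python
import struct

# Hypothetical little-endian layout (10 bytes total):
# node id (uint8), moisture % (uint8), pH x 100 (uint16), N, P, K (uint16 each).
PACKET_FORMAT = "<BBHHHH"


def encode_reading(node_id, moisture_pct, ph, n, p, k) -> bytes:
    """Pack one multi-sensor reading; pH is stored as an integer in hundredths."""
    return struct.pack(
        PACKET_FORMAT, node_id, round(moisture_pct), round(ph * 100), n, p, k
    )


def decode_reading(packet: bytes) -> dict:
    """Unpack a packet on the gateway side back into named values."""
    node_id, moisture, ph_x100, n, p, k = struct.unpack(PACKET_FORMAT, packet)
    return {
        "node_id": node_id,
        "moisture_pct": moisture,
        "ph": ph_x100 / 100,
        "n": n,
        "p": p,
        "k": k,
    }


packet = encode_reading(node_id=3, moisture_pct=41, ph=6.85, n=120, p=35, k=180)
print(len(packet), decode_reading(packet))
```

Shorter payloads mean less airtime per transmission, which helps both range and battery life.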

Pilot Results

Tested on 3 farms in the Mysore district over 8 weeks. Reduced manual soil testing time by 80% (from 2 hours per sample trip to 20 minutes per week of dashboard review). Crop recommendations matched expert agronomist advice in 85% of cases. Avg. sensor power draw: 12mA — projected battery life of 6 months on a 10,000mAh pack.
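
The battery projection can be sanity-checked with simple arithmetic. The sketch below assumes an ideal battery (no self-discharge or regulator losses) and takes six months as 182.5 days; both are assumptions, as are the function names. Under them, a constant 12 mA draw would last about 35 days, so a six-month life implies a long-run average near 2.3 mA. Deep sleep between the 5-minute readings would be one way to reach that, though the sleep current is not stated here.

```python
HOURS_PER_DAY = 24


def battery_life_days(capacity_mah: float, avg_current_ma: float) -> float:
    """Ideal runtime in days for a given average current draw."""
    return capacity_mah / avg_current_ma / HOURS_PER_DAY


def required_avg_current_ma(capacity_mah: float, target_days: float) -> float:
    """Average current that would exactly exhaust the pack in target_days."""
    return capacity_mah / (target_days * HOURS_PER_DAY)


print(round(battery_life_days(10_000, 12), 1))          # prints 34.7
print(round(required_avg_current_ma(10_000, 182.5), 2))  # prints 2.28
```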

Planned Upgrades

  • Weather API integration (OpenWeather) to factor rainfall forecasts into irrigation advice
  • Auto-irrigation control via relay-controlled solenoid valves triggered by soil moisture thresholds
  • SMS alert gateway for farmers without smartphones

Skills Applied

LoRa communication, ESP32 embedded programming, Modbus RS485, edge AI inference, Random Forest, agricultural domain knowledge, low-power IoT design, Raspberry Pi, Python.

80% less manual testing
85% expert match rate
2km LoRa range
🏆 Govt. Finalist 2025
Technical Toolkit

Core Competencies

Languages
Python (advanced) · Java · C · C++ · SQL
Web & APIs
HTML / CSS · JavaScript · REST APIs
ML / Data Science
Scikit-learn · XGBoost · Pandas · NumPy · LangChain · Matplotlib
Embedded & Hardware
Raspberry Pi · ESP32 · Modbus · LoRa · Sensor Integration
Tools & Platforms
Git / GitHub · Chroma DB · Cloudflare · Figma
Interests
Drone Building · Competitive Robotics · Open-source ML
Academics & Recognition

Education & Certifications

Degree

B.E. Computer Science (AI & ML)

National Institute of Engineering, Mysuru
Aug 2022 – Present
CGPA: 8.7 / 10
Core: Data Structures · Algorithms · ML · Deep Learning · Embedded Systems
Certifications
🏅 Certificate of Merit – Techkriti'24 (top 5% of 500 teams)
🌱 Finalist – Grassroot Innovation 2025, SOIL project (Govt. of Karnataka)
🤖 Generative AI Specialist – NVIDIA DLI Certified (Sep 2024)
📜 Internship Completion Certificate – Plasmid (Oct 2024)
Get in Touch

Let's Connect

Contact Details

+91 80737 08129
Bellary, Karnataka, India — 583101
Languages
English (professional) · Kannada (native) · Telugu (native) · Hindi (basic)