APPLIED AI ENGINEER
My focus is on bridging the gap between research and production. I architect systems that optimize for latency, deployment constraints, and reliability in the real world.
We needed to process five concurrent 1080p video streams on edge hardware. The initial PyTorch implementation had a latency of 150 ms, which caused frame drops.
I migrated the inference to the OpenVINO runtime and applied INT8 quantization. I also replaced batch processing with an asynchronous streaming architecture to manage memory usage effectively.
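As a rough illustration of the streaming idea (not the production code), the sketch below uses a bounded asyncio queue so that frame producers are paused whenever inference falls behind, which keeps the number of frames held in memory capped. The names camera, infer, and worker, and the queue size of 8, are assumptions made for this example; infer is a placeholder where the quantized model call would go.

```python
import asyncio

NUM_STREAMS = 5   # from the project description: five concurrent streams
QUEUE_SIZE = 8    # assumed cap on frames waiting for inference


async def camera(stream_id, n_frames, queue):
    """Hypothetical frame source that emits (stream_id, frame_index) tuples."""
    for i in range(n_frames):
        # put() suspends while the queue is full, so a slow consumer applies
        # backpressure instead of letting frames accumulate in memory.
        await queue.put((stream_id, i))


async def infer(frame):
    """Placeholder for the real inference call on one frame."""
    await asyncio.sleep(0)  # yield to the event loop, simulating async work
    return frame


async def worker(queue, results):
    """Consume frames one at a time, as they arrive, rather than in batches."""
    while True:
        frame = await queue.get()
        results.append(await infer(frame))
        queue.task_done()


async def run(n_frames=10):
    queue = asyncio.Queue(maxsize=QUEUE_SIZE)
    results = []
    consumer = asyncio.create_task(worker(queue, results))
    await asyncio.gather(*(camera(s, n_frames, queue) for s in range(NUM_STREAMS)))
    await queue.join()  # wait until every queued frame has been processed
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(len(asyncio.run(run())), "frames processed")
```

The bounded queue is the design choice that matters here: memory use depends on QUEUE_SIZE rather than on how many frames the cameras can produce.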
Standard dense vector search struggled with queries requiring exact matches, such as specific part numbers or error codes.
I implemented a hybrid search system combining dense vectors with BM25 keyword retrieval. I also built a fallback mechanism to switch providers automatically if API rate limits were triggered.
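A minimal sketch of both ideas follows. It assumes the dense and BM25 result lists are merged with reciprocal rank fusion, which is one common way to combine rankings and not necessarily the method used in this project. The function names, the constant k=60, and the RateLimitError class are hypothetical and exist only for illustration.

```python
def rrf_fuse(dense_ranked, keyword_ranked, k=60):
    """Fuse two ranked lists of document ids with reciprocal rank fusion."""
    scores = {}
    for ranking in (dense_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


class RateLimitError(Exception):
    """Hypothetical error raised by a provider client when rate limited."""


def call_with_fallback(providers, query):
    """Try each provider in order, moving on when one is rate limited."""
    last_error = None
    for provider in providers:
        try:
            return provider(query)
        except RateLimitError as err:
            last_error = err
    raise RuntimeError("all providers are rate limited") from last_error
```

With this fusion, a document that only the keyword index finds (an exact part number, for example) still reaches the merged list, while documents ranked well by both retrievers rise to the top.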
Synchronous processing of large documents was blocking the main application thread, leading to timeouts.
I decoupled the ingestion process using FastAPI background tasks for lightweight updates and Celery workers for CPU-intensive OCR tasks. This ensured the application remained responsive under load.
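The routing decision can be sketched with the standard library alone. In the snippet below the two executors are stand-ins: the light pool plays the role of FastAPI background tasks and the heavy pool plays the role of Celery workers. The function names and the needs_ocr flag are assumptions made for this example, not the project's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins (assumptions): light_pool mimics in-process background tasks,
# heavy_pool mimics a separate pool of workers for CPU-intensive jobs.
light_pool = ThreadPoolExecutor(max_workers=2)
heavy_pool = ThreadPoolExecutor(max_workers=2)


def update_metadata(doc):
    """Hypothetical lightweight update."""
    return {"id": doc["id"], "status": "indexed"}


def run_ocr(doc):
    """Hypothetical CPU-intensive OCR step."""
    return {"id": doc["id"], "status": "ocr_done"}


def submit(doc):
    """Hand the document off and return a future immediately,
    so the request path never blocks on ingestion."""
    if doc.get("needs_ocr"):
        return heavy_pool.submit(run_ocr, doc)
    return light_pool.submit(update_metadata, doc)
```

Calling submit({"id": 1, "needs_ocr": True}) returns right away; the caller can poll or await the future later, which is the property that keeps the main application responsive.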
This research focused on optimizing vehicle detection accuracy under challenging lighting conditions.
I implemented the training pipeline and defined evaluation metrics on a custom annotated dataset. The findings highlighted specific failure cases relevant to deploying vision systems in real-world environments.
I write to clarify my thinking on system design and data strategy.