
Bob Hosseini, PhD
Senior Data Scientist | GenAI & ML Systems | Team Lead
Designing AI systems that scale and teams that ship.
What I Do
I'm passionate about pushing AI beyond traditional boundaries, bridging state-of-the-art concepts with scalable engineering solutions. I design end-to-end AI systems, from backend integration to intuitive frontend experiences, delivering impactful products like automated call summarization agents, intelligent CRM tools, and enterprise-scale recommendation systems.
GenAI Systems
Lead development of GenAI systems using LLMs, RAG, and agentic pipelines
ML Architecture
Architect ML solutions from experimentation to scalable production
Team Leadership
Mentor data teams and implement best practices for ML/AI delivery
Data Strategy
Drive data product strategy and cross-functional execution in e-commerce
Problem Solving
Translate complex business problems into real-world AI products
Skills & Technical Expertise
AI & ML Stack
Cloud & DevOps
Software & Engineering
Real-World AI Systems (Professional)
LLM-Based Call Summarization
Problem:
Manual call logging was time-consuming and error-prone.
Solution:
Developed a generative AI system summarizing 2,000+ calls/month using LLMs and audio transcripts.
Impact:
Increased productivity and customer satisfaction.
Data Analytics Assistant
Problem:
Business teams lacked real-time insights from complex data.
Solution:
Built an LLM-powered dashboard enabling natural-language querying of business metrics.
Impact:
Accelerated insight generation and decision-making.
Customer Churn Prediction
Problem:
Retention teams struggled to identify at-risk customers.
Solution:
Built predictive models using behavior and transaction data.
Impact:
Reduced churn by 20% via proactive outreach.
Product Recommendation Engine
Problem:
Users struggled to find relevant products in a large portfolio.
Solution:
Developed a collaborative and content-based recommendation system for a 55K-product catalog.
Impact:
Boosted sales and improved user engagement.
Generative AI Projects (Personal)

Two-Stage RAG for Document QA
Problem:
Traditional RAG systems either suffer from poor precision or high compute overhead. Enterprises waste resources on retrieving irrelevant document chunks during semantic search.
Data:
Enterprise-grade PDFs and document collections that demand scalable and precise retrieval in QA workflows.
Solution:
Architected a two-stage retrieval pipeline:
- Stage 1: Fast keyword search (BM25) on fine-grained chunks for high-recall filtering.
- Stage 2: Cross-Encoder reranker for semantic scoring and Sentence Transformer embeddings over coarse-grained chunks.
Powered by LangChain, ChromaDB, and Docker, with support for OpenAI and local LLaMA models (via llama-cpp-python).
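The two stages above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the BM25 scorer is written out by hand, `overlap_score` is a hypothetical stand-in for the Cross-Encoder reranker, and the chunk data, parent ids, and the `k_fine`/`k_final` parameters are assumptions made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def overlap_score(query, passage):
    """Placeholder for a Cross-Encoder: Jaccard overlap of token sets."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if (q | p) else 0.0

def two_stage_retrieve(query, fine_chunks, coarse_chunks, rerank_fn, k_fine=5, k_final=2):
    """fine_chunks: list of (parent_id, text); coarse_chunks: parent_id -> text."""
    # Stage 1: BM25 over fine-grained chunks, keeping the parents of the best hits.
    scores = bm25_scores(query, [text for _, text in fine_chunks])
    ranked = sorted(range(len(fine_chunks)), key=lambda i: scores[i], reverse=True)
    candidates = []
    for i in ranked[:k_fine]:
        parent = fine_chunks[i][0]
        if scores[i] > 0 and parent not in candidates:
            candidates.append(parent)
    # Stage 2: rerank only the surviving coarse-grained chunks.
    reranked = sorted(candidates, key=lambda p: rerank_fn(query, coarse_chunks[p]), reverse=True)
    return reranked[:k_final]

# Toy data, invented for this example.
coarse_chunks = {
    "risk": "Key risk factors for tech stocks in Q2 2024 included rising interest rates and supply chain delays.",
    "revenue": "Revenue grew in Q2 2024, driven by cloud subscriptions.",
    "hr": "The company updated its vacation policy for employees.",
}
fine_chunks = [
    ("risk", "Key risk factors for tech stocks in Q2 2024"),
    ("risk", "included rising interest rates and supply chain delays"),
    ("revenue", "Revenue grew in Q2 2024"),
    ("revenue", "driven by cloud subscriptions"),
    ("hr", "The company updated its vacation policy"),
    ("hr", "for employees"),
]
query = "What were the key risk factors mentioned for tech stocks in Q2 2024?"
top_sections = two_stage_retrieve(query, fine_chunks, coarse_chunks, overlap_score)
print(top_sections)
```

In a real pipeline, `rerank_fn` would call a semantic model rather than a token-overlap heuristic. The point of the sketch is only the shape of the flow: a cheap lexical filter narrows the candidates so the expensive scorer runs on a small set.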
Impact:
- 75% reduction in retrieval compute
- Lower latency and higher precision across QA tasks
- Designed for real-world document QA and scalable cloud or local deployment
- Interactive Streamlit demo for stakeholder engagement
Example Use:
"What were the key risk factors mentioned for tech stocks in Q2 2024?" → Precise retrieval of relevant document sections with semantic context.

LLM Agents for Clinical Trials
Problem:
Matching patients to clinical trials is complex, time-consuming, and prone to regulatory risks, often requiring human review of hundreds of eligibility criteria.
Data:
Clinical trial criteria and patient profiles, with structured and unstructured medical data.
Solution:
Built an agentic LLM workflow using LangGraph to automate trial eligibility screening, hallucination detection, and compliance checks. Integrated human-in-the-loop review and tool calling, with a modular backend and vector search.
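The control flow of such a workflow can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code and not the LangGraph API: each node is an ordinary function over a shared state, rule checks stand in for the LLM screening step, and the field names (`PII_FIELDS`, `findings`, `needs_human_review`) and the sample patient are hypothetical.

```python
from dataclasses import dataclass, field

PII_FIELDS = {"name", "address"}  # hypothetical list of identifying fields

@dataclass
class ScreeningState:
    patient: dict
    criteria: dict
    findings: dict = field(default_factory=dict)
    needs_human_review: bool = False

def redact(state):
    """Drop identifying fields before any screening step sees the record."""
    state.patient = {k: v for k, v in state.patient.items() if k not in PII_FIELDS}
    return state

def screen(state):
    """Stand-in for the LLM screening step: one finding per criterion."""
    c, p = state.criteria, state.patient
    state.findings = {
        "min_age": p["age"] >= c["min_age"],
        "max_age": p["age"] <= c["max_age"],
        "condition": c["condition"] in p["conditions"],
    }
    return state

def check_grounding(state):
    """Flag findings that cite a criterion the trial does not actually define."""
    ungrounded = [name for name in state.findings if name not in state.criteria]
    state.needs_human_review = bool(ungrounded)
    return state

def run_workflow(state, nodes=(redact, screen, check_grounding)):
    """Run the nodes in order, then route to a verdict or to human review."""
    for node in nodes:
        state = node(state)
    if state.needs_human_review:
        return "human_review", state
    verdict = "eligible" if all(state.findings.values()) else "not_eligible"
    return verdict, state

# Invented sample record and trial criteria.
patient = {"name": "Jane Doe", "age": 54, "conditions": ["type 2 diabetes"]}
criteria = {"min_age": 18, "max_age": 65, "condition": "type 2 diabetes"}
verdict, final_state = run_workflow(ScreeningState(patient, criteria))
print(verdict, final_state.findings)
```

Keeping each step as a separate node makes it straightforward to swap the rule-based `screen` for a model call while leaving the grounding check and the human-review route unchanged.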
Impact:
- Live MVP deployed with real-time UI and backend orchestration
- Improved speed and consistency of trial matching
- Reduced manual effort in preliminary screening
- Privacy-safe design with structured redaction
- Extensible to domains like insurance or finance
Example Use:
"Is Patient 15 eligible for any trials?" → Response with rationale, matched trials, and policy check.

LLM Tutorials & Applications
A collection of practical LLM architectures and end-to-end notebooks featuring carefully selected case studies across domains like healthcare, customer support, and product search. Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.
Tutorials:
Machine Learning & Data Science Projects (Personal)

Social Sphere: Student Behavior Analytics
Problem:
Educators and psychologists seek to understand how digital behavior affects students' mental health, relationships, and academic performance.
Data:
Survey of ~700 students aged 16–25 across multiple countries (Kaggle Q1 2025 dataset). Features include screen time, platform usage, conflicts, sleep, and well-being.
Solution:
Led this SuperDataScience community project to predict social-media addiction scores and relationship conflicts. Conducted comprehensive data exploration to uncover key behavioral patterns and insights.
Impact:
- Accurately predicted student addiction levels with 1% error
- Flagged at-risk relationship conflicts with 99% sensitivity (Recall)
- Revealed critical insights into student behavior, such as the impact of daily screen time and platform usage on mental health and relationships

Energy Forecasting with SARIMAX
Problem:
Facility managers and sustainability teams need reliable short-term energy forecasts to optimize planning and reduce costs. However, real-world energy consumption is volatile and only weakly correlated with exogenous drivers like temperature.
Data:
Synthetic building energy dataset (Kaggle) with hourly and daily electricity usage, temperature, humidity, and occupancy variables.
Solution:
Built a time-series forecasting pipeline using SARIMAX, with time-series cross-validation, ADF stationarity tests, and outlier detection for robust preprocessing. Simulated noisy exogenous inputs via random walks to mimic real-world uncertainty. Trained SARIMAX and benchmarked it against ARIMA, achieving R² ≈ 0.33 despite the injected noise.
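The noise-injection and scoring steps can be sketched with the standard library alone. This is a minimal illustration, not the project's pipeline: the function names, the Gaussian step distribution, and the `step_std` and `seed` parameters are assumptions, and the temperature values are invented. The score is the usual coefficient of determination, R² = 1 - SS_res / SS_tot.

```python
import random

def random_walk(n, step_std, seed=0):
    """Cumulative sum of Gaussian steps: a simple random-walk noise path."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, step_std)
        path.append(level)
    return path

def add_random_walk_noise(series, step_std, seed=0):
    """Perturb an exogenous series (e.g. temperature) with drifting noise."""
    noise = random_walk(len(series), step_std, seed)
    return [x + e for x, e in zip(series, noise)]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented hourly temperatures, perturbed before being fed to a forecaster.
temperature = [20.0, 21.5, 23.0, 24.0, 22.5, 21.0]
noisy_temperature = add_random_walk_noise(temperature, step_std=0.5, seed=42)
print(noisy_temperature)
```

Unlike independent noise, a random walk drifts over time, so errors in the exogenous input accumulate the way a degrading weather forecast would. A model scored with `r_squared` on such inputs is being judged under deliberately pessimistic conditions.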

Data Science & ML Mini Tasks
Single-notebook projects showcasing applied machine learning and data science, including various prediction and recommendation tasks across different domains.
Sub-Projects:
Writing
Interpretable AI: Explaining Predictions with SHAP
An exploration of how SHAP transforms black-box models into explainable systems by quantifying feature impact on individual predictions, with real-world examples from student mental health analytics and medical risk assessment.
Why AI ROI Still Fails, Even With the Right Talent?
A data-backed reflection on why enterprise AI projects often underdeliver, despite surging user adoption and engineering talent. Emphasizes the need for product-thinking and clear business alignment from Day 1.
Guardrails in LLM Apps
Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.
LLM Model Selection and Updates
Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.
Two-Stage RAG for Document QA
An innovative approach to document-based question answering using a two-stage retrieval strategy to enhance precision and scalability in Retrieval-Augmented Generation systems.
Data Engineers: The Unsung Heroes Behind AI
An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.