Bob Hosseini, PhD

Senior Data Scientist | GenAI & ML Systems | Team Lead

Designing AI systems that scale and teams that ship.

What I Do

I'm passionate about pushing AI beyond traditional boundaries by turning state-of-the-art concepts into scalable engineering solutions. I design end-to-end AI systems, from backend integration to intuitive frontend experiences, and have delivered products such as automated call summarization agents, intelligent CRM tools, and enterprise-scale recommendation systems.

GenAI Systems

Lead development of GenAI systems using LLMs, RAG, and agentic pipelines

ML Architecture

Architect ML solutions from experimentation to scalable production

Team Leadership

Mentor data teams and implement best practices for ML/AI delivery

Data Strategy

Drive data product strategy and cross-functional execution in e-commerce

Problem Solving

Translate complex business problems into real-world AI products

🛠️ Skills & Technical Expertise

🧠 AI & ML Stack

Large Language Models (LLMs) Retrieval-Augmented Generation (RAG) Agentic Pipelines Model Context Protocol (MCP) NLP & Semantic Search Transformers LangChain LlamaIndex OpenAI APIs Vector Databases Scikit-learn TensorFlow PyTorch

☁️ Cloud & DevOps

AWS Azure Google Cloud (GCP) Docker CI/CD Airflow MLflow Jenkins

💻 Software & Engineering

Python C++ FastAPI SQL Pydantic Git REST APIs Streamlit Gradio Data Pipelines Backend Design Frontend Integration

🏢 Real-World AI Systems (Professional)

📞 LLM-Based Call Summarization

Problem:

Manual call logging was time-consuming and error-prone.

Solution:

Developed a generative AI system summarizing 2,000+ calls/month using LLMs and audio transcripts.

Impact:

Increased productivity and customer satisfaction.

OpenAI LangChain AWS Lambda AWS S3 AWS CloudWatch AWS SAM Prompt Engineering SQL CLI Bash Audio Preprocessing

📊 Data Analytics Assistant

Problem:

Business teams lacked real-time insights from complex data.

Solution:

Built an LLM-powered dashboard enabling natural-language querying of business metrics.

Impact:

Accelerated insight generation and decision-making.

LangChain Google Cloud LangGraph SQL Pandas Prompt Engineering Data Visualization Streamlit

🔄 Customer Churn Prediction

Problem:

Retention teams struggled to identify at-risk customers.

Solution:

Built predictive models using behavior and transaction data.

Impact:

Reduced churn by 20% via proactive outreach.

XGBoost Scikit-learn SHAP Google Cloud MLflow Data Visualization

🛒 Product Recommendation Engine

Problem:

Users struggled to find relevant products in a large portfolio.

Solution:

Developed a collaborative and content-based recommendation system for a 55K-product catalog.

Impact:

Boosted sales and improved user engagement.

FastAPI Matrix Factorization Pandas PySpark Docker MLflow

🧠 Generative AI Projects (Personal)

🟢 Production Ready - Backend + Frontend

🔁 Two-Stage RAG for Document QA

Problem:

Traditional RAG systems suffer from either poor precision or high compute overhead. Enterprises waste resources retrieving irrelevant document chunks during semantic search.

Data:

Enterprise-grade PDFs and document collections that demand scalable and precise retrieval in QA workflows.

Solution:

Architected a two-stage retrieval pipeline:

Stage 1: Fast keyword search (BM25) on fine-grained chunks for high-recall filtering.
Stage 2: Semantic scoring over coarse-grained chunks, using Sentence Transformer embeddings and a Cross-Encoder reranker.
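
The two stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: BM25 is implemented inline rather than through a library, `overlap_rerank` is a hypothetical stand-in for a real cross-encoder, the `parent_of` mapping from fine-grained chunks to their coarse-grained parents is an assumed data layout, and the sample chunks are made up.

```python
import math
from collections import Counter

PUNCT = ".,?!\"'()"


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would use something richer."""
    return [tok.strip(PUNCT) for tok in text.lower().split() if tok.strip(PUNCT)]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_rerank(query, passage):
    """Hypothetical stand-in for a cross-encoder: Jaccard overlap of token sets."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


def two_stage_retrieve(query, fine_chunks, parent_of, coarse_chunks,
                       rerank_fn, top_k=3, top_n=1):
    """Stage 1: BM25 over fine chunks. Stage 2: rerank their parent coarse chunks."""
    scores = bm25_scores(query, fine_chunks)
    ranked = sorted(range(len(fine_chunks)), key=lambda i: scores[i], reverse=True)
    # Keep the top_k fine chunks that matched at all, mapped to distinct parents.
    candidates = []
    for i in ranked[:top_k]:
        if scores[i] > 0 and parent_of[i] not in candidates:
            candidates.append(parent_of[i])
    reranked = sorted(candidates,
                      key=lambda p: rerank_fn(query, coarse_chunks[p]),
                      reverse=True)
    return reranked[:top_n]


# Made-up sample data: two coarse chunks, each split into two fine chunks.
coarse_chunks = [
    "Tech stocks face risk from rising interest rates. Supply chain delays also hurt margins.",
    "The cafeteria menu changes weekly. Parking is available on level two.",
]
fine_chunks = [
    "Tech stocks face risk from rising interest rates.",
    "Supply chain delays also hurt margins.",
    "The cafeteria menu changes weekly.",
    "Parking is available on level two.",
]
parent_of = [0, 0, 1, 1]

result = two_stage_retrieve("risk factors for tech stocks", fine_chunks,
                            parent_of, coarse_chunks, overlap_rerank)
print(result)  # [0]
```

Only the coarse chunks that survive the keyword filter reach the more expensive second-stage scorer, which is where a compute saving of the kind reported below would come from. Swapping `overlap_rerank` for a real cross-encoder only requires a function with the same `(query, passage) -> score` shape.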

Impact:

🔍 75% reduction in retrieval compute
⚡ Lower latency and higher precision across QA tasks
💡 Designed for real-world document QA and scalable cloud or local deployment
📺 Interactive Streamlit demo for stakeholder engagement

Example Use:

📄 "What were the key risk factors mentioned for tech stocks in Q2 2024?" → Precise retrieval of relevant document sections with semantic context.

Live MVP Medium Article GitHub

RAG Sentence Transformers Cross-Encoder Reranker LangChain ChromaDB Docker Poetry Streamlit Modal

🟢 Production Ready - Backend + Frontend

🧬 LLM Agents for Clinical Trials

Problem:

Matching patients to clinical trials is complex, time-consuming, and prone to regulatory risks, often requiring human review of hundreds of eligibility criteria.

Data:

Clinical trial criteria and patient profiles, with structured and unstructured medical data.

Solution:

Built an agentic LLM workflow using LangGraph to automate trial eligibility screening, hallucination detection, and compliance checks. Integrated human-in-the-loop review and tool calling, with modular backend and vector search.
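
The workflow described above can be pictured as a sequence of steps sharing one state object. The sketch below is a plain-Python stand-in, not the project's LangGraph code: the rule-based `screen` step replaces the LLM call, `check_grounding` is one simple way to approximate hallucination detection by comparing cited facts against the patient record, and every field name, along with the toy patient and trial, is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningState:
    """State handed from step to step (illustrative field names)."""
    patient: dict
    trial: dict
    eligible: bool = False
    cited_facts: dict = field(default_factory=dict)
    needs_human_review: bool = False


def screen(state):
    """Rule-based stand-in for the LLM eligibility step."""
    age = state.patient["age"]
    in_range = state.trial["min_age"] <= age <= state.trial["max_age"]
    has_condition = state.trial["condition"] in state.patient["conditions"]
    state.eligible = in_range and has_condition
    # Record the patient facts the decision relied on.
    state.cited_facts = {"age": age, "conditions": state.patient["conditions"]}
    return state


def check_grounding(state):
    """Flag the case for human review if a cited fact disagrees with the record."""
    for key, value in state.cited_facts.items():
        if state.patient.get(key) != value:
            state.needs_human_review = True
    return state


def run_pipeline(patient, trial, steps=(screen, check_grounding)):
    """Run each step in order over a fresh state object."""
    state = ScreeningState(patient=patient, trial=trial)
    for step in steps:
        state = step(state)
    return state


# Toy, made-up inputs.
patient = {"age": 54, "conditions": ["type 2 diabetes"]}
trial = {"condition": "type 2 diabetes", "min_age": 40, "max_age": 65}

state = run_pipeline(patient, trial)
print(state.eligible, state.needs_human_review)  # True False
```

Keeping each step as a function from state to state makes it straightforward to add further checks, such as a compliance step, without touching the existing ones.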

Impact:

✅ Live MVP deployed with real-time UI and backend orchestration
⚡ Improved speed and consistency of trial matching
💼 Reduced manual effort in preliminary screening
🛡️ Privacy-safe design with structured redaction
📈 Extensible to domains like insurance or finance

Example Use:

🧪 "Is Patient 15 eligible for any trials?" → Response with rationale, matched trials, and policy check.

Live MVP Demo Video GitHub Repo

LangGraph OpenAI Agentic Tool-calling Pydantic ChromaDB SQLite Gradio

🔵 Development Notebooks - Educational

📚 LLM Tutorials & Applications

A collection of practical LLM architectures and end-to-end notebooks featuring carefully selected case studies across domains like healthcare, customer support, and product search. Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.

GitHub Repo

OpenAI RAG LangChain ChromaDB Pinecone Streamlit

📈 Machine Learning & Data Science Projects (Personal)

🟢 Production Ready - Backend + Frontend

📊 Social Sphere: Student Behavior Analytics

👨‍💻 Team Lead • SuperDataScience community project

Problem:

Educators and psychologists seek to understand how digital behavior affects students' mental health, relationships, and academic performance.

Data:

Survey of ~700 students aged 16–25 across multiple countries (Kaggle Q1 2025 dataset). Features include screen time, platform usage, conflicts, sleep, and well-being.

Solution:

Led this SuperDataScience community project to predict social-media addiction scores and relationship conflicts. Conducted comprehensive data exploration to uncover key behavioral patterns and insights.

Impact:

Accurately predicted student addiction levels with 1% error
Flagged at-risk relationship conflicts with 99% sensitivity (Recall)
Revealed critical insights into student behavior, such as the impact of daily screen time and platform usage on mental health and relationships

Live MVP Live MLflow Dashboard GitHub Repo Dataset

Python Scikit-Learn XGBoost Regression Clustering SHAP MLflow Modal Dagshub Streamlit Data Visualization

🟢 Production Ready - Backend + Frontend

🔋 Energy Forecasting with SARIMAX

SuperDataScience community project

Problem:

Facility managers and sustainability teams need reliable short-term energy forecasts to optimize planning and reduce costs. However, real-world energy consumption is volatile and only weakly correlated with exogenous drivers like temperature.

Data:

Synthetic building energy dataset (Kaggle) with hourly and daily electricity usage, temperature, humidity, and occupancy variables.

Solution:

Built a time-series forecasting pipeline using SARIMAX, including time-series CV, ADF tests, and outlier detection for robust preprocessing. Simulated noisy exogenous inputs via random walks to mimic real-world uncertainty. Trained SARIMAX and benchmarked it against ARIMA, achieving R² ≈ 0.33 despite the injected noise.
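
As a rough illustration of the random-walk noise idea, the sketch below perturbs an exogenous series with a cumulative sum of Gaussian steps. It is a minimal standard-library example under assumptions, not the project's code: the function names, the `step_std` parameter, and the sample temperature readings are all hypothetical.

```python
import random


def random_walk(n_steps, step_std, seed=None):
    """Cumulative sum of Gaussian steps with standard deviation step_std."""
    rng = random.Random(seed)
    position, walk = 0.0, []
    for _ in range(n_steps):
        position += rng.gauss(0.0, step_std)
        walk.append(position)
    return walk


def add_random_walk_noise(series, step_std, seed=None):
    """Return a copy of series perturbed by a random walk of the same length."""
    noise = random_walk(len(series), step_std, seed)
    return [value + drift for value, drift in zip(series, noise)]


temperature = [20.0, 21.5, 23.0, 22.0, 19.5]  # hypothetical hourly readings
noisy_temperature = add_random_walk_noise(temperature, step_std=0.5, seed=42)
print(len(noisy_temperature))  # 5
```

Because the noise accumulates, the perturbed input drifts further from the true value over time, much as a forecast of an exogenous driver would. The noisy series could then be supplied as the `exog` argument when fitting a SARIMAX model in statsmodels.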

Live MVP GitHub Repo Dataset

Python pandas statsmodels SARIMAX Modal Streamlit

🔵 Development Notebooks

🔬 Data Science & ML Mini Tasks

Single-notebook projects showcasing applied machine learning and data science, including various prediction and recommendation tasks across different domains.

Full Repository

Python scikit-learn XGBoost pandas Jupyter Streamlit matplotlib

✍️ Writing

🔍 Interpretable AI: Explaining Predictions with SHAP

An exploration of how SHAP transforms black-box models into explainable systems by quantifying feature impact on individual predictions, with real-world examples from student mental health analytics and medical risk assessment.

Model Interpretability SHAP Explainable AI Trust in ML Feature Importance ML Transparency

Read Article

📉 Why AI ROI Still Fails, Even With the Right Talent?

A data-backed reflection on why enterprise AI projects often underdeliver, despite surging user adoption and engineering talent. Emphasizes the need for product-thinking and clear business alignment from Day 1.

AI Product Strategy AI ROI Business Alignment AI Adoption Enterprise AI Product Thinking

Read Article

🛡️ Guardrails in LLM Apps

Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.

LLM Safety AI Ethics Compliance Security Guardrails

Read Article

⚙️ LLM Model Selection and Updates

Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.

Model Selection LLM Updates Cost Optimization Scalability Quality Management

Read Article

📚 Two-Stage RAG for Document QA

An innovative approach to document-based question answering using a two-stage retrieval strategy to enhance precision and scalability in Retrieval-Augmented Generation systems.

RAG Document QA Retrieval Systems Precision Scalability Two-Stage

Read Article

👷 Data Engineers: The Unsung Heroes Behind AI

An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.

Data Engineering AI Infrastructure Data Quality Team Collaboration Data Science

Read Article

Let's Connect

Feel free to reach out for collaboration, professional opportunities, or just to swap ideas on building better GenAI systems.

Email LinkedIn GitHub