Bob Hosseini, PhD - AI Architect & Data Scientist

Senior Data Scientist | GenAI & ML Systems | Team Lead

Designing AI systems that scale and teams that ship.

What I Do

I turn state-of-the-art AI concepts into scalable, production-grade engineering. I design end-to-end AI systems, from backend integration to intuitive frontend experiences, and have delivered products such as automated call summarization agents, intelligent CRM tools, and enterprise-scale recommendation systems.

GenAI Systems

Lead development of GenAI systems using LLMs, RAG, and agentic pipelines

ML Architecture

Architect ML solutions from experimentation to scalable production

Team Leadership

Mentor data teams and implement best practices for ML/AI delivery

Data Strategy

Drive data product strategy and cross-functional execution in e-commerce

Problem Solving

Translate complex business problems into real-world AI products

🛠️ Skills & Technical Expertise

🧠 AI & ML Stack

Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Agentic Pipelines, Model Context Protocol (MCP), NLP & Semantic Search, Transformers, LangChain, LlamaIndex, OpenAI APIs, Vector Databases, Scikit-learn, TensorFlow, PyTorch

☁️ Cloud & DevOps

AWS, Azure, Google Cloud (GCP), Docker, CI/CD, Airflow, MLflow, Jenkins

💻 Software & Engineering

Python, C++, FastAPI, SQL, Pydantic, Git, REST APIs, Streamlit, Gradio, Data Pipelines, Backend Design, Frontend Integration

🏢 Real-World AI Systems (Professional)

📞 LLM-Based Call Summarization

Problem:

Manual call logging was time-consuming and error-prone.

Solution:

Developed a generative AI system summarizing 2,000+ calls/month using LLMs and audio transcripts.

Impact:

Increased productivity and customer satisfaction.

OpenAI, LangChain, AWS Lambda, AWS S3, AWS CloudWatch, AWS SAM, Prompt Engineering, SQL, CLI, Bash, Audio Preprocessing

📊 Data Analytics Assistant

Problem:

Business teams lacked real-time insights from complex data.

Solution:

Built an LLM-powered dashboard enabling natural-language querying of business metrics.

Impact:

Accelerated insight generation and decision-making.

LangChain, Google Cloud, LangGraph, SQL, Pandas, Prompt Engineering, Data Visualization, Streamlit

🔄 Customer Churn Prediction

Problem:

Retention teams struggled to identify at-risk customers.

Solution:

Built predictive models using behavior and transaction data.

Impact:

Reduced churn by 20% via proactive outreach.

XGBoost, Scikit-learn, SHAP, Google Cloud, MLflow, Data Visualization

🛒 Product Recommendation Engine

Problem:

Users struggled to find relevant products in a large portfolio.

Solution:

Developed a recommendation system combining collaborative and content-based filtering for a 55K-product catalog.

Impact:

Boosted sales and improved user engagement.

FastAPI, Matrix Factorization, Pandas, PySpark, Docker, MLflow
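
To illustrate the matrix-factorization idea listed for the recommendation engine, here is a minimal sketch, assuming explicit ratings and plain stochastic gradient descent. The toy ratings, the hyperparameters, and the function names (`init_factors`, `train`, `predict`, `rmse`) are hypothetical and are not taken from the production system, which is not described in detail here.

```python
import math
import random


def init_factors(n_rows, k, rng):
    """Create small random latent vectors, one per user or item."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]


def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def train(ratings, P, Q, lr=0.02, reg=0.02, epochs=500):
    """Update the factors in place with regularized SGD."""
    k = len(P[0])
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)


# Hypothetical (user, item, rating) triples for 4 users and 4 products.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 4), (3, 3, 5),
]

rng = random.Random(0)
P = init_factors(4, 2, rng)  # user factors
Q = init_factors(4, 2, rng)  # item factors

initial_rmse = rmse(ratings, P, Q)
train(ratings, P, Q)
trained_rmse = rmse(ratings, P, Q)

# Score an unseen (user, item) pair to rank candidates for recommendation.
unseen_score = predict(P, Q, 0, 3)
```

A content-based component would typically add item features (for example, text or category embeddings) alongside these learned factors; that part is omitted here.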

🧠 Generative AI Projects (Personal)

🟢 Production Ready - Backend + Frontend
LLM Agents for Clinical Trials - AI-powered clinical trial matching system

🧬 LLM Agents for Clinical Trials

Problem:

Matching patients to clinical trials is complex, time-consuming, and prone to regulatory risks, often requiring human review of hundreds of eligibility criteria.

Data:

Clinical trial criteria and patient profiles, with structured and unstructured medical data.

Solution:

Built an agentic LLM workflow using LangGraph to automate trial eligibility screening, hallucination detection, and compliance checks. Integrated human-in-the-loop review and tool calling, with modular backend and vector search.

Impact:

  • ✅ Live MVP deployed with real-time UI and backend orchestration
  • ⚡ Improved speed and consistency of trial matching
  • 💼 Reduced manual effort in preliminary screening
  • 🛡️ Privacy-safe design with structured redaction
  • 📈 Extensible to domains like insurance or finance

Example Use:

🧪 "Is Patient 15 eligible for any trials?" → Response with rationale, matched trials, and policy check.

LangGraph, OpenAI, Agentic, Tool-calling, Pydantic, ChromaDB, SQLite, Gradio

🔵 Development Notebooks - Educational
LLM Tutorials & Applications - Educational notebooks and case studies

📚 LLM Tutorials & Applications

A collection of practical LLM architectures and end-to-end notebooks featuring carefully selected case studies across domains like healthcare, customer support, and product search. Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.

OpenAI, RAG, LangChain, ChromaDB, Pinecone, Streamlit

📈 Machine Learning & Data Science Projects (Personal)

🟢 Production Ready - Backend + Frontend
Social Sphere: Student Behavior Analytics - Data science project

📊 Social Sphere: Student Behavior Analytics

👨‍💻 Team Lead • SuperDataScience community project

Problem:

Educators and psychologists seek to understand how digital behavior affects students' mental health, relationships, and academic performance.

Data:

Survey of ~700 students aged 16–25 across multiple countries (Kaggle Q1 2025 dataset). Features include screen time, platform usage, conflicts, sleep, and well-being.

Solution:

Led this SuperDataScience community project to predict social-media addiction scores and relationship conflicts. Conducted comprehensive data exploration to uncover key behavioral patterns and insights.

Impact:

  • Accurately predicted student addiction levels with 1% error
  • Flagged at-risk relationship conflicts with 99% sensitivity (Recall)
  • Revealed critical insights into student behavior, such as the impact of daily screen time and platform usage on mental health and relationships
Python, Scikit-Learn, XGBoost, Regression, Clustering, SHAP, MLflow, Modal, Dagshub, Streamlit, Data Visualization

🟢 Production Ready - Backend + Frontend
Energy Forecasting with SARIMAX - Time series forecasting project

🔋 Energy Forecasting with SARIMAX

Problem:

Facility managers and sustainability teams need reliable short-term energy forecasts to optimize planning and reduce costs. However, real-world energy consumption is volatile and only weakly correlated with exogenous drivers like temperature.

Data:

Synthetic building energy dataset (Kaggle) with hourly and daily electricity usage, temperature, humidity, and occupancy variables.

Solution:

Built a time-series forecasting pipeline using SARIMAX, with time-series cross-validation, ADF stationarity tests, and outlier detection for robust preprocessing. Simulated noisy exogenous inputs via random walks to mimic real-world uncertainty. Trained SARIMAX and benchmarked it against ARIMA, achieving R² ≈ 0.33 despite the injected noise.
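
The random-walk perturbation of exogenous inputs can be sketched roughly as follows. This is a minimal illustration, assuming Gaussian steps added to a clean series; the function names (`random_walk_noise`, `perturb`), the temperature values, and the step size are hypothetical rather than the project's actual settings.

```python
import random


def random_walk_noise(n_steps, step_std, seed=0):
    """Cumulative sum of Gaussian steps: noise that drifts over time."""
    rng = random.Random(seed)
    level = 0.0
    path = []
    for _ in range(n_steps):
        level += rng.gauss(0.0, step_std)
        path.append(level)
    return path


def perturb(series, step_std, seed=0):
    """Add random-walk noise to an exogenous series, point by point."""
    noise = random_walk_noise(len(series), step_std, seed)
    return [value + drift for value, drift in zip(series, noise)]


# Hypothetical hourly temperature readings for one day.
temperature = [20.0 + 0.5 * hour for hour in range(24)]
noisy_temperature = perturb(temperature, step_std=0.3, seed=42)
```

Unlike independent white noise, the error here accumulates, which resembles a weather forecast that degrades with horizon. The perturbed series could then be passed as the `exog` argument of statsmodels' `SARIMAX` in place of the clean one.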

Python, pandas, statsmodels, SARIMAX, Modal, Streamlit

🔵 Development Notebooks
Data Science & ML Mini Tasks - Applied machine learning projects

🔬 Data Science & ML Mini Tasks

Single-notebook projects showcasing applied machine learning and data science, including various prediction and recommendation tasks across different domains.

Python, scikit-learn, XGBoost, pandas, Jupyter, Streamlit, matplotlib

✍️ Writing

🔍 Interpretable AI: Explaining Predictions with SHAP

An exploration of how SHAP transforms black-box models into explainable systems by quantifying feature impact on individual predictions, with real-world examples from student mental health analytics and medical risk assessment.

Read Article
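
As a rough illustration of what "quantifying feature impact on individual predictions" means, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. The model `risk_score`, its coefficients, and the input values are hypothetical. The SHAP library itself relies on faster estimators and a background dataset rather than this brute-force enumeration over a single baseline.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features are switched from their baseline value to their actual
    value one at a time, in every possible order, and the change in
    the prediction is credited to the feature that was switched.
    """
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [total / len(orders) for total in totals]


def risk_score(features):
    """Hypothetical model with one interaction term."""
    screen_time, sleep_hours, conflicts = features
    return 0.5 * screen_time - 0.3 * sleep_hours + 0.2 * screen_time * conflicts


x = [6.0, 5.0, 2.0]         # the individual being explained
baseline = [3.0, 7.0, 1.0]  # reference point, e.g. an average student
phi = shapley_values(risk_score, x, baseline)
```

The attributions in `phi` sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that makes them usable as a per-prediction explanation.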

📉 Why AI ROI Still Fails, Even With the Right Talent?

A data-backed reflection on why enterprise AI projects often underdeliver, despite surging user adoption and engineering talent. Emphasizes the need for product-thinking and clear business alignment from Day 1.

Read Article

🛡️ Guardrails in LLM Apps

Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.

Read Article

⚙️ LLM Model Selection and Updates

Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.

Read Article

📚 Two-Stage RAG for Document QA

An innovative approach to document-based question answering using a two-stage retrieval strategy to enhance precision and scalability in Retrieval-Augmented Generation systems.

Read Article
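
The article's exact design is not reproduced here, but one common reading of a two-stage retrieval strategy is: first narrow the corpus to the most relevant documents, then rank only the chunks inside those documents. The sketch below assumes that reading and uses bag-of-words cosine similarity as a stand-in for embeddings and a vector database. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that trims basic punctuation."""
    return [word.strip(".,?!").lower() for word in text.split()]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(query, docs, top_docs=1, top_chunks=1):
    """Stage 1 selects documents; stage 2 ranks chunks within them."""
    query_tokens = tokenize(query)

    # Stage 1: score each document as a whole and keep the best few.
    selected = sorted(
        docs,
        key=lambda name: cosine(query_tokens, tokenize(" ".join(docs[name]))),
        reverse=True,
    )[:top_docs]

    # Stage 2: score individual chunks, but only from selected documents.
    candidates = [(name, chunk) for name in selected for chunk in docs[name]]
    candidates.sort(
        key=lambda pair: cosine(query_tokens, tokenize(pair[1])),
        reverse=True,
    )
    return candidates[:top_chunks]


# Hypothetical corpus: document name -> list of text chunks.
docs = {
    "refund_policy": [
        "A refund is issued within 14 days of purchase.",
        "Contact support to open a return request.",
    ],
    "shipping_guide": [
        "Standard shipping takes 5 business days.",
        "Express shipping is available at checkout.",
    ],
}

results = two_stage_retrieve("How long does a refund take?", docs)
```

Restricting the second stage to a few documents keeps chunk-level scoring cheap as the corpus grows and prevents a stray chunk from an unrelated document from outranking relevant ones.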

👷 Data Engineers: The Unsung Heroes Behind AI

An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.

Read Article

Let's Connect

Feel free to reach out for collaboration, professional opportunities, or just to swap ideas on building better GenAI systems.