How Azure AI Search Works as a Vector DB

Azure AI Search can function as a vector database when an index is configured with vector fields, and it can optionally vectorize content for you during indexing. Here’s how it works and where the vectors are stored:

✅ How Azure AI Search Works as a Vector DB

When you set up Azure AI Search to import and vectorize data (e.g., documents from a selected folder in a Blob Storage container), it performs the following:

Data Ingestion:
- You define a data source (Blob Storage).
- Azure AI Search pulls documents from the selected folder.
Vectorization:
- You can use built-in vectorization (via Azure OpenAI embedding models) or bring your own embeddings.
- Each document (or chunk of text) is converted into a vector representation.
Indexing:
- These vectors are stored in a vector field within the Azure Search index.
- You define this field in your index schema (e.g., contentVector).
Search:
- You can perform vector similarity search using cosine similarity or other metrics.
- Combine it with a keyword search for hybrid search scenarios.

📍 Where Are Vectors Stored?

The vectors are stored inside the Azure AI Search index itself. Specifically:

Each document in the index has a field (e.g., vector) that holds the vector embedding.
Azure AI Search indexes these vectors and uses them for similarity search.
You can configure the vector field with parameters such as dimensions and a vector search profile that points to an algorithm configuration (for example, HNSW).
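
A minimal sketch of what such an index definition can look like, written as a Python dict that mirrors the JSON body sent to the service. The index name, field names, profile name, and algorithm name are made up for illustration. The property names (`dimensions`, `vectorSearchProfile`, `vectorSearch`) follow the REST index schema as I understand it, so check them against the API version you use.

```python
import json

# Hypothetical index definition; every name below is an illustrative assumption.
index_definition = {
    "name": "docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # must match the embedding model's output size
            "vectorSearchProfile": "default-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [
            {"name": "hnsw-cosine", "kind": "hnsw", "hnswParameters": {"metric": "cosine"}}
        ],
        "profiles": [{"name": "default-profile", "algorithm": "hnsw-cosine"}],
    },
}

print(json.dumps(index_definition, indent=2))
```

The point to notice is the indirection: the vector field names a profile, and the profile names the algorithm configuration.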

🧠 Example Use Case

If you’re indexing PDFs or text files from Blob Storage:

Azure AI Search will chunk the documents (when the indexer’s skillset includes a text-splitting step).
Each chunk gets vectorized.
Vectors are stored in the index.
You can then query using a vector (e.g., from a user query) to retrieve semantically similar chunks.
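
To make the last step concrete, here is a small sketch of similarity search done by brute force in plain Python. The three-dimensional vectors and chunk ids are invented for illustration; a real service uses an approximate nearest-neighbor index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, index, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
index = {
    "reset-password": [0.9, 0.1, 0.0],
    "change-credentials": [0.8, 0.2, 0.1],
    "leave-policy": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index))
```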

Comments Leave a Comment
Categories AI/ML

What is RAG (Retrieval-Augmented Generation)?

30 Mar

RAG = Search + LLM generation

Retrieval-Augmented Generation (RAG) is an LLM architecture pattern that combines information retrieval with text generation.

Instead of relying only on a model’s frozen training data, RAG:

Retrieves relevant documents from an external knowledge source at query time
Injects that context into the prompt
Generates an answer grounded in retrieved data

Core Components

Embedding model → converts documents and queries into vectors
Vector store → performs semantic similarity search
Retriever → fetches top-K relevant chunks
Generator (LLM) → produces the final response using retrieved context

High‑level idea of RAG

Instead of asking an LLM to answer purely from its training data, RAG:

Retrieves relevant knowledge from your private data (documents, PDFs, wikis, etc.)
Augments the prompt with that knowledge
Generates a grounded, accurate answer

This reduces hallucinations and enables enterprise knowledge Q&A.

1️⃣ Indexing Phase (Offline / Preprocessing)

This happens before users ask questions.

Goal

Convert raw documents into a searchable vector index.

Step 1: Document ingestion & parsing

PDF / DOCX / HTML

↓ parse

Plain text

PDFs, Word files, web pages, etc. are parsed into text
Parsing removes layout noise (headers, footers, images)
Output is clean text

✅ Why this matters
LLMs and embedding models operate on text, not binary formats.

Step 2: Chunking (critical design decision)

Text → Chunks (small passages)

Large documents are split into chunks (e.g., 300–1,000 tokens)
Often overlapping chunks (e.g., 20–30%) to preserve context

✅ Why chunking is needed

Embedding models have token limits
Smaller chunks improve retrieval precision
You retrieve just the relevant part, not the entire document

✅ Typical chunk strategies

Fixed-size tokens
Semantic chunking (paragraph / heading based)
Sliding window with overlap
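
A minimal sketch of the sliding-window strategy. It counts words instead of model tokens, which is an assumption made to keep the example dependency-free; a real pipeline would count tokens with the embedding model's tokenizer.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is still seen whole by at least one chunk when the overlap is large enough.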

Step 3: Embedding (semantic encoding)

Chunk → Azure Embedding Model → Vector

Each chunk is passed to an Azure OpenAI embedding model
Output is a high‑dimensional vector (e.g., 1,536 dimensions)

✅ What embeddings represent
They capture semantic meaning, not keywords.

Example:

“How to reset password”
“Steps to change login credentials”
→ very similar vectors

Step 4: Store in Vector Database

Embeddings → Azure Vector Store

Stored items typically include:

Vector embedding
Chunk text
Metadata (document name, page, section, timestamp)

✅ Azure options

Azure AI Search (vector + hybrid)
Cosmos DB with vector search
PostgreSQL + pgvector

✅ Outcome
You now have a semantic index of your enterprise knowledge.

2️⃣ Retrieval Phase (R – Runtime)

This happens when a user asks a question.

Step 1: User query

User → “How does leave approval work?”

Raw natural language question.

Step 2: Query embedding

Query → Azure Embedding Model → Query Vector

The same embedding model used during indexing must be used here
This ensures vector space consistency

✅ Important
Using different embedding models breaks similarity search.

Step 3: Semantic search in vector DB

Query Vector → Similarity Search → Top‑K chunks

Cosine similarity / dot product used
Returns most semantically relevant chunks

✅ Often combined with:

Metadata filters (department, date, access level)
Hybrid search (vector + keyword)

✅ Output
A small set of relevant chunks, not whole documents.
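
A rough sketch of retrieval with a metadata filter applied before ranking. The records, the two-dimensional vectors, and the `department` field are invented for illustration. Dot product is used as the score, which matches cosine similarity only when the vectors are normalized.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query_vector, records, k=2, **filters):
    """Keep records whose metadata matches every filter, then rank by dot product."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda r: dot(query_vector, r["vector"]), reverse=True)
    return [r["text"] for r in candidates[:k]]

records = [
    {"text": "Leave requests need manager approval.", "vector": [0.9, 0.1],
     "metadata": {"department": "HR"}},
    {"text": "Annual leave is 25 days.", "vector": [0.7, 0.3],
     "metadata": {"department": "HR"}},
    {"text": "Expense claims need receipts.", "vector": [0.95, 0.05],
     "metadata": {"department": "Finance"}},
]
print(search([1.0, 0.0], records, k=2, department="HR"))
```

The Finance record has the highest raw score but never reaches the ranking step, which is the behavior you want for access-level filters.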

3️⃣ Augmentation Phase (A)

This is the bridge between search and generation.

Step 1: Combine retrieved chunks

Relevant Chunks → Context

Chunks are:
- Deduplicated
- Ordered
- Truncated to token limits

✅ Typical structure

Context:

[Chunk 1]

[Chunk 2]

[Chunk 3]
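
The three operations above (deduplicate, order, truncate) can be sketched as one function. The word budget stands in for a token budget, and the normalization used for deduplication is an assumption; real systems often compare chunk ids or hashes instead.

```python
def build_context(chunks, max_words=50):
    """chunks: list of (score, text). Returns a context string within max_words."""
    seen = set()
    selected = []
    used = 0
    # Order: highest-scoring chunks first.
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        key = " ".join(text.lower().split())  # normalize case/whitespace for dedup
        if key in seen:
            continue
        n_words = len(text.split())
        if used + n_words > max_words:
            break  # truncate: stop before exceeding the budget
        seen.add(key)
        selected.append(text)
        used += n_words
    return "\n\n".join(f"[Chunk {i}] {t}" for i, t in enumerate(selected, start=1))

print(build_context([(0.9, "A b c"), (0.8, "a  b c"), (0.7, "d e"), (0.6, "f g h i")],
                    max_words=5))
```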

Step 2: Augment the user query

Prompt = System Instructions + User Query + Retrieved Context

Example:

You are an HR policy assistant.
Answer ONLY using the context below.

Context:
[retrieved chunks]

Question:
How does leave approval work?

✅ Why this is powerful

The LLM is steered to ground its answers in the supplied context
Less reliance on the model’s internal memory
Enables citations & traceability
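
As a small illustration of the augmentation step, the prompt can be assembled with plain string formatting. The chunk text passed in below is invented; only the instruction wording and the question come from the example above.

```python
def build_prompt(system_instructions, context, question):
    """Assemble the augmented prompt: instructions, then context, then question."""
    return (
        f"{system_instructions}\n"
        "Answer ONLY using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}"
    )

prompt = build_prompt(
    "You are an HR policy assistant.",
    "[Chunk 1] Leave requests are approved by the line manager.",  # made-up chunk
    "How does leave approval work?",
)
print(prompt)
```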

4️⃣ Generation Phase (G)

This is where the LLM produces the final answer.

Step 1: Feed augmented prompt to LLM

Prompt → Azure OpenAI GPT Model

Model sees:
- The question
- The retrieved enterprise knowledge
It does reasoning + language generation

Step 2: Generate response

LLM → Final Answer

✅ Characteristics of RAG responses

Grounded in provided data
Up‑to‑date (depends on index)
Enterprise‑safe
Explainable (can show source chunks)

🔁 Why RAG is superior to fine‑tuning for enterprise data

Aspect	Fine‑tuning	RAG
Data freshness	Static until retrained	As fresh as the index
Cost	High	Low
Hallucination risk	Medium	Low
Source citations	Hard	Easy
Compliance	Risky	Strong

🧠 Key architectural best practices

Chunk size often matters as much as model choice
Use hybrid search (vector + keyword) in production
Add metadata filtering for access control
Keep prompt instructions strict
Log retrieved chunks for observability

✅ Final mental model

Think of RAG as:

“Search engine + LLM reasoning layer”

Vector DB = semantic memory
Embeddings = meaning encoder
LLM = language + reasoning engine

Why RAG exists

Problem	RAG Solution
LLM hallucinations	Ground answers in real data
Stale knowledge	Fetch live or frequently updated content
Private data	Keep proprietary data outside model training
Cost of fine-tuning	Avoid retraining models

Production Use Cases Commonly Implemented with RAG

Below are real, production-grade RAG use cases that teams deploy—not demos.

1. Enterprise Knowledge Assistant

Use case

Internal chatbot for policies, SOPs, wikis, Confluence, PDFs

How RAG helps

Retrieves policy clauses or documents
Answers with citations and source links

Production details

Chunking by semantic sections
Role-based access filtering at retrieval time
Caching frequent queries

2. Customer Support & Helpdesk Automation

Use case

Support bot answering FAQs, troubleshooting guides, and manuals

How RAG helps

Grounds answers in official docs
Reduces hallucinated instructions

Enhancements

Confidence thresholds → fallback to human agent
Query rewriting for vague user questions

3. Code & Developer Assistants

Use case

Query internal repositories, APIs, and design docs

How RAG helps

Retrieves relevant code snippets
Explains logic using actual implementation

Key technique

Repository-aware chunking (functions, classes)
Metadata filters (language, repo, branch)

4. Legal / Compliance Search

Use case

Contract analysis, regulation Q&A, audit prep

Why RAG is critical

Exact wording matters
Answers must be traceable

Production safeguards

Source citation mandatory
Retrieval-only mode for sensitive answers

5. Analytics & BI Natural Language Interface

Use case

Ask questions over dashboards, metrics definitions, and data catalogs

RAG role

Retrieves metric definitions before generating explanations
Prevents semantic drift (“revenue” vs “net revenue”)

6. Healthcare / Scientific Literature Assistants

Use case

Search clinical guidelines, research papers
(for example, care plans: the person receiving care and the care staff can ask how to cope with or manage particular situations)

Why RAG

Models must not invent facts
Must cite authoritative sources

Controls

Strict context window limits
Generation constrained to retrieved text

Typical Production RAG Stack

Common tooling used in real systems:

Frameworks
- LangChain
- LlamaIndex
Vector Databases
- Pinecone
- Weaviate
- FAISS (a similarity-search library rather than a database service)
LLMs
- OpenAI
- Anthropic

What Makes a RAG System “Production-Ready”?

Key differences from toy implementations:

Advanced chunking (semantic, hierarchical)
Hybrid retrieval (vector + keyword/BM25)
Re-ranking models for precision
Observability (retrieval quality, answer grounding)
Security (PII filtering, ACL-aware retrieval)
Evaluation pipelines (faithfulness, relevance, latency)

Summary (Executive View)

RAG = Retrieval + Generation
It grounds LLM outputs in trusted, up-to-date data
It’s a common default architecture for enterprise LLM applications
Many real-world LLM products today are RAG-based

Re-ranking is a second-pass ranking that improves the quality of retrieved documents. Initial retrieval is fast but approximate. Re-ranking is slower but more accurate. You use both together: retrieve the top 100 quickly with vectors, then re-rank those 100 with a stronger model and keep only the best few.
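
A minimal sketch of that two-stage pattern. The fast scores are given as precomputed numbers, and `word_overlap` is only a stand-in for a real re-ranking model (such as a cross-encoder); both are assumptions for illustration.

```python
def retrieve_then_rerank(query, candidates, first_stage_k, final_k, rerank_score):
    """candidates: list of (fast_score, text).

    Keep the first_stage_k best by fast score, then reorder that shortlist
    with the slower rerank_score(query, text) and return the final_k best.
    """
    shortlist = sorted(candidates, key=lambda c: c[0], reverse=True)[:first_stage_k]
    reranked = sorted(shortlist, key=lambda c: rerank_score(query, c[1]), reverse=True)
    return [text for _, text in reranked[:final_k]]

def word_overlap(query, text):
    """Stand-in scorer: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)

candidates = [
    (0.91, "holiday calendar for the office"),
    (0.88, "leave approval needs manager sign-off"),
    (0.40, "leave approval workflow overview"),
]
print(retrieve_then_rerank("leave approval", candidates, 2, 1, word_overlap))
```

Note that the third candidate never gets re-ranked: anything the first stage drops is gone, which is why the first-stage cutoff is usually generous.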

=====================================================================

1️⃣ End-to-End Production RAG Architecture

High-Level Flow

A production RAG system has two pipelines:

Offline indexing pipeline
Online query pipeline

A. Offline Pipeline (Indexing Phase)

Step 1 — Data Ingestion

Sources:

PDFs
Confluence / SharePoint
Databases
S3
Git repos
APIs

Step 2 — Preprocessing

Cleaning
Deduplication
PII masking (if required)
Metadata enrichment (doc type, department, ACL tags)

Step 3 — Chunking Strategy

This is critical.

Common strategies:

Fixed token windows (e.g., 512 tokens + overlap)
Semantic chunking (split by section headers)
Recursive chunking (hierarchical)

Poor chunking = poor retrieval.

Step 4 — Embeddings

Use embedding models from:

OpenAI
Cohere
Google

Each chunk → converted into a dense vector.

Step 5 — Vector Store

Stored in:

Pinecone
Weaviate
FAISS
Milvus

Metadata indexing:

department
document version
access control tags
timestamps

B. Online Pipeline (Query Time)

Step 1 — User Query

Example:

“What’s the data retention policy for EU customers?”

Step 2 — Query Processing

Query rewriting
Expansion
Intent detection
Metadata filters (e.g., EU region only)

Step 3 — Retrieval

Modern systems use Hybrid Retrieval:

Dense vector similarity
BM25 keyword search
Metadata filtering

Then:

Re-ranking using cross-encoder models
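
One common way to merge the dense and BM25 result lists before re-ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method it has in mind, so treat RRF here as an assumption; the chunk ids are invented and `k=60` is a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["c2", "c1", "c3"]    # order from vector similarity
keyword = ["c1", "c4", "c2"]  # order from BM25 keyword search
print(reciprocal_rank_fusion([dense, keyword]))
```

RRF only needs ranks, not raw scores, which is convenient because cosine similarities and BM25 scores are not on the same scale.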

Step 4 — Context Construction

Top-K chunks (e.g., 5–20) are:

Deduplicated
Ordered
Compressed (if needed)

Inserted into prompt template:

You must answer using ONLY the context below.

If answer not found, say “Not found”.

Context:

[retrieved chunks]

Step 5 — Generation

LLM examples:

OpenAI
Anthropic

Output:

Answer
Citations
Confidence score (optional)

Step 6 — Observability Layer

Track:

Retrieval latency
Answer faithfulness
Token cost
Query success rate
Hallucination rate

Production systems ALWAYS include logging + evaluation.

2️⃣ RAG vs Fine-Tuning

Here’s the decision framework.

Dimension	RAG	Fine-Tuning
Knowledge updates	Real-time	Requires retraining
Private data	Stays external	Embedded into weights
Hallucination control	High (if good retrieval)	Lower
Cost	Cheaper long term	Expensive training
Personalization	Metadata-based	Style-based
Domain knowledge depth	Moderate	Very deep possible

When to Use RAG

Dynamic knowledge
Large document corpora (a corpus is a collection of written or spoken texts)
Need citations
Compliance-heavy domains
Enterprise data access control

When to Fine-Tune

Tone/style control
Structured output consistency
Domain language modeling
Classification tasks
Reducing prompt length

Hybrid Approach (Common in Production)

Most serious systems:

Use RAG for knowledge
Fine-tune for behavior

Example:
Fine-tuned LLM + RAG retrieval backend.

3️⃣ Common Production Failure Modes

Now we move into real problems teams face.

1. Retrieval Miss (Most Common)

Problem:
Correct answer exists in corpus but not retrieved.

Causes:

Bad chunking
Embedding mismatch
Query phrasing mismatch
Top-K too small

Fix:

Hybrid retrieval
Query rewriting
Better chunk granularity

2. Context Overload

Too many chunks → LLM confusion.

Symptoms:

Blended answers
Irrelevant info included
Long but low-quality responses

Fix:

Re-ranking
Context compression
Smaller top-K

3. Hallucination Despite Retrieval

LLM ignores context and fabricates.

Fix:

Strict prompting
Answer-only-from-context instructions
Retrieval-only fallback mode

4. Access Control Leakage

User retrieves documents they shouldn’t.

Fix:

Metadata-based ACL filtering before retrieval
Zero trust design

5. Latency Explosion

Vector search + rerank + LLM = slow.

Fix:

Cache embeddings
Smaller embedding models
Asynchronous re-ranking
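
As a sketch of the first fix, repeated queries can reuse a cached embedding instead of calling the model again. `embed_uncached` below is a hypothetical stand-in for a slow embedding API call, and an in-process `lru_cache` is only one option; a shared cache service would be more typical across several servers.

```python
from functools import lru_cache

calls = 0

def embed_uncached(text):
    """Stand-in for a slow, billable embedding call (hypothetical)."""
    global calls
    calls += 1
    return tuple(float(ord(c)) for c in text[:4])

@lru_cache(maxsize=10_000)
def embed(text):
    # Tuples are returned so cached values cannot be mutated by callers.
    return embed_uncached(text)

embed("How does leave approval work?")
embed("How does leave approval work?")  # served from the cache
print(calls)
```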

6. Embedding Drift

Switching embedding models breaks retrieval quality.

Always re-embed full corpus if model changes.

4️⃣ Evaluation Metrics for RAG Systems

Evaluation must measure:

Retrieval quality
Generation quality
End-to-end performance

A. Retrieval Metrics

Measured against labeled dataset.

Recall@K – % of queries where the correct doc appears in the top K
Precision@K – % of the top-K retrieved docs that are relevant
MRR (Mean Reciprocal Rank)
nDCG (ranking quality metric)
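
The first three metrics are short enough to sketch directly. The query ids, document ids, and relevance labels below are invented; Recall@K follows the definition used in the list above (share of queries with a relevant doc in the top K).

```python
def recall_at_k(results, relevant, k):
    """Share of queries with at least one relevant doc in the top k."""
    hits = sum(1 for q, ranked in results.items() if set(ranked[:k]) & relevant[q])
    return hits / len(results)

def precision_at_k(ranked, relevant_ids, k):
    """Share of the top-k retrieved docs that are relevant (single query)."""
    return sum(1 for d in ranked[:k] if d in relevant_ids) / k

def mean_reciprocal_rank(results, relevant):
    """Average of 1 / rank of the first relevant doc (0 if none is retrieved)."""
    total = 0.0
    for q, ranked in results.items():
        for rank, d in enumerate(ranked, start=1):
            if d in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(results)

results = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d8", "d2"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}
print(recall_at_k(results, relevant, k=2), mean_reciprocal_rank(results, relevant))
```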

B. Generation Metrics

Key metric:

1. Faithfulness (Groundedness)

Is the answer supported by retrieved context?

Measured via:

LLM-as-judge
Fact overlap scoring

2. Answer Relevance

Does answer match question?

3. Hallucination Rate

% answers containing unsupported claims.

C. End-to-End Business Metrics

Most important in production:

Task completion rate
Escalation rate (to human)
CSAT (if support bot) – customer satisfaction score, measuring how pleased customers are with their interactions with your automated chatbot
Cost per query
Latency

D. Automated RAG Evaluation Frameworks

Used in production:

LangChain
LlamaIndex
Weights & Biases

They help measure:

Retrieval recall
Groundedness
Regression testing after updates

Final Executive Summary

Production RAG is NOT:

“Embed PDFs → call LLM → done.”

It is:

Carefully designed ingestion
Advanced retrieval strategies
Context optimization
Strict evaluation loops
Continuous monitoring


Machine Learning and Deep Learning Basics

10 Sep

1. Introduction

Artificial Intelligence is a broad field, but most of its modern breakthroughs stem from Machine Learning (ML) and its subfield Deep Learning (DL).

Machine Learning focuses on algorithms that learn patterns from data and improve with experience.

Deep Learning is a specialized subset of ML that uses neural networks with many layers to process large, complex data like images, speech, and text.

2. Concepts & Explanations

2.1 Machine Learning Paradigms

Machine Learning (ML) is a core subfield of Artificial Intelligence that enables systems to learn from data and improve over time without being explicitly programmed for every task. In ML, models identify patterns and make decisions based on training data.

Machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead.

In short, ML builds a model by learning the patterns and relationships in historical data in order to make data-driven predictions.

General Architecture of Machine Learning:

Business understanding: Understand the given use case; it also helps to know the domain for which the use case is built.

Data acquisition and understanding: Gather data from different sources and understand it. This covers cleaning the data, handling any missing values, data wrangling, and EDA (exploratory data analysis).

Modeling: Feature engineering covers scaling the data and feature selection, since not all features are important. We use backward elimination, correlation factors, PCA, and domain knowledge to select the features.

Model training: We select the algorithm by trial and error or by experience, and train it with the selected features.

Model evaluation: We check the accuracy of the model, the confusion matrix, and cross-validation results. If accuracy is not high enough, we tune the model by changing the algorithm, revisiting feature selection, gathering more data, and so on.

Deployment: Once the model has good accuracy, we deploy it in the cloud, on a Raspberry Pi, or anywhere else. After deployment we monitor the model’s performance. If it is good, we go live with the model; otherwise we repeat the whole process until performance is good.

It’s not done yet! What if, after a few days, the model performs badly because of new data? In that case we repeat the whole process: collect new data, retrain, and redeploy the model.

ML can be classified into 3 main paradigms:

Supervised Learning

The machine learns from labeled data (input + output).

Example: Predicting house prices from square footage.

Algorithms: Linear regression, decision trees, support vector machines.

In supervised learning, the model is trained on a labeled dataset — where each input is paired with the correct output. The goal is to learn a mapping from inputs to outputs, enabling the model to make accurate predictions on new, unseen data.

Supervised learning is classified into two categories of algorithms:
Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, “disease” or “no disease”.
Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
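
To make the regression case concrete, here is a small least-squares fit for the house-price example, in plain Python. The sizes and prices are made-up numbers chosen so the relationship is exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Labeled training data: square footage -> price in thousands (invented).
sizes = [1000, 1500, 2000]
prices = [200, 300, 400]

slope, intercept = fit_line(sizes, prices)
prediction = slope * 1200 + intercept  # price for an unseen 1,200 sq ft house
print(round(prediction, 2))
```

The labels (prices) are what make this supervised: the model is fitted to known input-output pairs and then applied to an input it has not seen.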

Examples:

Predicting house prices (input: size, location; output: price)
Email spam detection (input: email content; output: spam/not spam)
Image classification (input: image pixels; output: object label)

Common Algorithms:

Linear Regression
Logistic Regression
Decision Trees
Support Vector Machines (SVM)
Neural Networks

Use Cases:
1. Fraud detection
2. Sentiment analysis
3. Medical diagnosis

Unsupervised Learning
- Learn from unlabeled data, finding patterns and structure.
- Example: Grouping customers into segments based on shopping behavior.
- Algorithms: K-means clustering, PCA (Principal Component Analysis).

In unsupervised learning, the model is given input data without labels. The goal is to find hidden patterns, groupings, or structures within the data.

In contrast to supervised learning, an unsupervised algorithm is given unlabeled data and tries to make sense of it by extracting features, co-occurrences, and underlying patterns on its own.

Examples:

Clustering users based on browsing behavior
Dimensionality reduction for data visualization
Anomaly detection in network traffic

Common Algorithms:

K-Means Clustering
Hierarchical Clustering
Principal Component Analysis (PCA)
Autoencoders
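
A toy version of K-Means on one-dimensional data shows the idea without any library. The spend figures are invented, the number of clusters is fixed at two, and starting the centroids at the minimum and maximum is a simplification (it assumes the data has at least two distinct values).

```python
def kmeans_1d(points, iterations=10):
    """Two-cluster k-means on numbers; centroids start at the min and max."""
    centroids = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 11, 95, 102, 99]
centroids, clusters = kmeans_1d(spend)
print(clusters)
```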

Use Cases:

Market segmentation
Recommender systems
Data compression

Reinforcement Learning (RL)

An agent learns by interacting with an environment, receiving rewards or penalties.

Example: Training a robot to walk, or AI to play chess.

Reinforcement Learning (RL) is a goal-directed learning paradigm in which an agent learns to make decisions by interacting with an environment. It receives feedback in the form of rewards or penalties based on its actions and aims to maximize cumulative reward over time.

Reinforcement learning involves less direct supervision: the learning agent works out the solution itself by trying different possible actions and keeping those that lead to the best outcome.

Examples:

Teaching a robot to walk

Training an AI to play chess or Go

Optimizing delivery routes

Key Concepts:

Agent: The learner or decision-maker

Environment: Everything the agent interacts with

Reward Signal: Feedback for good or bad behavior

Policy: The strategy the agent uses to make decisions
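
The four concepts can be seen together in a tiny multi-armed bandit. This is a sketch under simplifying assumptions: each arm pays a fixed reward, the agent uses an epsilon-greedy policy, and value estimates start above any possible reward so that every arm gets tried.

```python
import random

def run_bandit(rewards, steps=50, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arm i always pays rewards[i]."""
    rng = random.Random(seed)
    # Optimistic start (assumes a known upper bound on rewards).
    estimates = [max(rewards) + 1.0] * len(rewards)
    counts = [0] * len(rewards)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:  # explore: pick a random action
            action = rng.randrange(len(rewards))
        else:                       # exploit: follow the current policy
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]    # reward signal from the environment
        counts[action] += 1
        # Incremental mean of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    best = max(range(len(rewards)), key=lambda a: estimates[a])
    return best, total

best_arm, total_reward = run_bandit([0.2, 1.0])
print(best_arm)
```

Here the agent is `run_bandit`'s loop, the environment is the `rewards` list, and the policy is "mostly pick the arm with the highest estimate".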

Use Cases:

Game AI
Robotics control
Dynamic pricing
Personalized recommendations

💡 Analogy:

Supervised → A teacher provides answers to all practice problems.
Unsupervised → A student tries to find patterns in problems without answers.
Semi-supervised → Some answers are given, the rest must be figured out.

Reinforcement → Learning by trial and error, like a baby learning to walk.

2.2 Traditional ML vs. Deep Learning

Traditional ML
- Relies on hand-crafted features (e.g., edge detectors in images).
- Works well for small-to-medium datasets.
- Examples: Decision Trees, Random Forests, SVMs.

Deep Learning
- Learns features automatically from raw data using neural networks.
- Requires large datasets and high computational power.
- Excels in complex tasks (e.g., speech recognition, image generation).

📘 Diagram (text description):

Traditional ML pipeline: Raw Data → Feature Engineering → Model Training → Prediction.

Deep Learning pipeline: Raw Data → Neural Network (learns features + model) → Prediction.

2.3 Neural Networks: Architecture & Learning

A neural network is inspired by the human brain:

Neurons → simple units that take input, apply a function, and pass output.
Layers →
- Input layer (data features).
- Hidden layers (transformations).
- Output layer (prediction/classification).

💡 Analogy: Imagine a bakery:

Input layer → Ingredients.
Hidden layers → Baking process (mixing, heating, decorating).
Output layer → Final cake.

2.4 Core Concepts

Activation Functions: Decide how much signal passes through a neuron.
- Examples: Sigmoid, ReLU, Tanh.
Loss Function: Measures how far predictions are from true values.
- Example: Mean Squared Error for regression, Cross-Entropy Loss for classification.
Optimizers: Algorithms that adjust model parameters to minimize loss.
- Example: Gradient Descent, Adam optimizer.
Backpropagation: The process of propagating errors backward through the network to update weights.
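
A one-weight model is enough to see loss, gradient, and optimizer working together. This sketch assumes a single linear neuron with no activation function, mean squared error as the loss, and plain gradient descent; the data is invented so that the true weight is 2.

```python
def train(xs, ys, learning_rate=0.05, epochs=200):
    """Fit y = w * x with mean squared error and plain gradient descent."""
    w = 0.0
    n = len(xs)
    loss = 0.0
    for _ in range(epochs):
        predictions = [w * x for x in xs]  # forward pass
        loss = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / n
        # Backward pass via the chain rule: dLoss/dw = mean(2 * (p - y) * x)
        grad = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / n
        w -= learning_rate * grad          # optimizer step
    return w, loss

w, loss = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))
```

In a real network the same three steps run for every weight in every layer, with backpropagation reusing intermediate results so the gradients are computed efficiently.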

2.5 Evaluation Metrics

Different tasks require different evaluation metrics:

Classification: Accuracy, Precision, Recall, F1-score.
Regression: Mean Squared Error (MSE), R² score.
Generative models (later chapters): BLEU score, Perplexity, Fréchet Inception Distance (FID).
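
For the classification metrics, the definitions fit in a few lines of plain Python. The labels below are invented, and the function assumes a binary problem with one class treated as positive.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```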

2.6 Bias, Fairness & Ethics

AI models can inherit bias from data:

Example: A recruitment model trained on biased data may unfairly reject female candidates.
Fairness techniques: Data balancing, bias detection, and fairness-aware algorithms.
Ethics: Transparency, accountability, and ensuring AI benefits society.

3. Use Cases & Applications

Healthcare: Predict disease risks, detect cancer from medical scans.
Finance: Credit scoring, fraud detection.
Education: Personalized learning systems.
Retail: Customer segmentation, demand forecasting.
Transportation: Autonomous driving using deep learning for object detection.

4. Algorithms & Techniques

Let’s explore two practical ML approaches:

4.1 Supervised Learning Example (Classification)

# Classifying iris flowers using scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train decision tree
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

4.2 Deep Learning Example (Neural Network for Digit Recognition)

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

# Load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # normalize
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Build neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Accuracy:", accuracy)

5. Case Study / Mini-Project

Mini-Project: Spam Email Classifier

We’ll build a simple spam detection model using Naive Bayes.

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Example dataset (far too small for a real model; illustration only)
emails = ["Win money now!!!", "Meeting at 3 pm", "Get cheap loans instantly", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Vectorize text into word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train-test split (with 4 emails, the test set holds a single email)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)

# Train Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Predictions; zero_division=0 avoids warnings when a class has no predictions
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))

➡️ This shows how supervised learning can classify emails as spam or not spam.

6. Summary

We explored practical examples: iris classification, digit recognition, and spam detection.
Machine Learning enables computers to learn patterns from data.

Main paradigms: supervised, unsupervised, semi-supervised, reinforcement learning.
Traditional ML relies on feature engineering; Deep Learning learns features automatically.
Key concepts: activation functions, loss functions, optimizers, and backpropagation.
Evaluation metrics help measure model performance.
Ethical challenges (bias, fairness) must be addressed.

➡️ (The above content is taken from my best-selling book Generative AI & Machine Learning)

Tags: AI, artificial-intelligence, data-science, machine-learning, Technology


FastAPI Project Structure

14 Aug

This post presents a FastAPI project structure that incorporates multiple design patterns, including:

Dependency Injection
Repository Pattern
Service Layer
Strategy Pattern
Factory Pattern
Observer Pattern
Builder Pattern
Domain-Driven Design (DDD) principles

🧱 Project Structure

fastapi_project/
│
├── app/
│   ├── main.py                  # FastAPI app entry point
│   ├── config.py                # App configuration
│   ├── dependencies/            # DI providers
│   │   ├── db.py
│   │   ├── auth.py
│   │   └── __init__.py
│   │
│   ├── models/                  # Pydantic models
│   │   ├── user.py
│   │   ├── request.py
│   │   └── __init__.py
│   │
│   ├── domain/                  # Domain models (DDD)
│   │   ├── entities/
│   │   │   ├── user_entity.py
│   │   │   └── __init__.py
│   │   └── __init__.py
│   │
│   ├── repositories/            # Repository pattern
│   │   ├── user_repository.py
│   │   └── __init__.py
│   │
│   ├── services/                # Business logic (Service Layer)
│   │   ├── user_service.py
│   │   └── __init__.py
│   │
│   ├── strategies/              # Strategy pattern
│   │   ├── auth/
│   │   │   ├── jwt_strategy.py
│   │   │   ├── oauth_strategy.py
│   │   │   └── __init__.py
│   │   └── __init__.py
│   │
│   ├── factories/               # Factory pattern
│   │   ├── service_factory.py
│   │   └── __init__.py
│   │
│   ├── observers/               # Observer pattern
│   │   ├── event_manager.py
│   │   └── __init__.py
│   │
│   ├── builders/                # Builder pattern
│   │   ├── report_builder.py
│   │   └── __init__.py
│   │
│   ├── middleware/              # Custom middleware
│   │   ├── logging_middleware.py
│   │   └── __init__.py
│   │
│   ├── routes/                  # API routes
│   │   ├── user_routes.py
│   │   └── __init__.py
│   │
│   └── utils/                   # Utility functions
│       ├── helpers.py
│       └── __init__.py
│
├── requirements.txt
└── README.md

🧩 How Patterns Fit Together

Pattern	Folder	Purpose
Dependency Injection	`dependencies/`	Inject DB, auth, config
Repository	`repositories/`	Abstract DB access
Service Layer	`services/`	Business logic
Strategy	`strategies/`	Pluggable auth or processing logic
Factory	`factories/`	Create services based on config
Observer	`observers/`	Event-driven notifications
Builder	`builders/`	Construct complex objects
DDD	`domain/`	Domain entities and aggregates
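
To show how three of these pieces can fit together, here is a framework-free sketch of a repository, a service that receives it by injection, and a provider function that wires them up. All class and function names are illustrative and not taken from the tree above. In a FastAPI route, a provider like `make_user_service` would typically be passed through `Depends`.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class User:
    id: int
    name: str

class UserRepository(Protocol):
    def get(self, user_id: int) -> Optional[User]: ...

class InMemoryUserRepository:
    """Repository: hides how users are stored (a dict here, a DB in practice)."""
    def __init__(self) -> None:
        self._rows = {1: User(id=1, name="Ada")}

    def get(self, user_id: int) -> Optional[User]:
        return self._rows.get(user_id)

class UserService:
    """Service layer: business rules, with the repository injected."""
    def __init__(self, repository: UserRepository) -> None:
        self._repository = repository

    def display_name(self, user_id: int) -> str:
        user = self._repository.get(user_id)
        if user is None:
            raise LookupError(f"user {user_id} not found")
        return user.name.upper()

def make_user_service() -> UserService:
    """Provider: the one place that chooses the concrete repository."""
    return UserService(InMemoryUserRepository())

print(make_user_service().display_name(1))
```

Because the service only depends on the `UserRepository` protocol, a test can inject a fake repository without touching a database.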

Conclusion: In a FastAPI project, you can implement several software design patterns to improve modularity, scalability, and maintainability. The table above gives a categorized overview of the most relevant patterns and how they apply to FastAPI.


Understanding the Evolution: Generative AI, AI Agents, and Agentic AI

12 Aug

Introduction

As artificial intelligence continues to evolve, it’s essential to distinguish between three foundational yet distinct paradigms: Generative AI, AI Agents, and Agentic AI. While these concepts are closely related, each represents a different level of autonomy, complexity, and capability. This guide breaks down their core differences, practical applications, and how they build upon one another.

1. Generative AI: Creating Content on Demand

What It Is

Generative AI refers to models—like large language models (LLMs) and image generators—that produce original content based on patterns learned from massive datasets. These models are trained on diverse data (text, images, audio, video) and contain billions of parameters.

How It Works

Generative AI is reactive: it responds to user prompts without initiating actions or managing tasks. For example, when prompted to “write a poem about data science,” the model generates a poem but doesn’t decide to write one on its own.

Key Features

Trained on large, multimodal datasets
Generates text, images, audio, or video
Requires prompt engineering to guide output
Examples: OpenAI’s GPT-4, Meta’s LLaMA 3
Supported by tools like LangChain, LlamaIndex, and Grok

2. AI Agents: Task-Oriented Intelligence

What They Are

AI agents extend generative AI by adding autonomy and interactivity. They can perform specific tasks by integrating with external tools and APIs, making them more dynamic and useful in real-world applications.

Why They Matter

LLMs alone can’t access real-time or private data. AI agents solve this by making tool calls—requests to external systems—to fetch current or specialized information.

Example Workflow

User asks a question.
Agent checks if the LLM can answer.
If not, it calls an external API (e.g., for today’s news).
It processes the response.
It delivers a final answer to the user.
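
The workflow above can be sketched with stub functions. `fake_llm` and `news_tool` are hypothetical stand-ins for a real model call and a real external API; the control flow is the part that matters.

```python
def fake_llm(question):
    """Stand-in for an LLM call: returns None when it cannot answer alone."""
    known = {"What is RAG?": "Retrieval-Augmented Generation."}
    return known.get(question)

def news_tool(question):
    """Stand-in for an external API (hypothetical); returns canned data."""
    return {"headline": "Example headline"}

def run_agent(question, llm=fake_llm, tool=news_tool):
    answer = llm(question)        # steps 1-2: ask the model first
    if answer is not None:
        return answer
    data = tool(question)         # step 3: fall back to a tool call
    summary = f"Top story: {data['headline']}"  # step 4: process the response
    return summary                # step 5: deliver the final answer

print(run_agent("What is RAG?"))
print(run_agent("What is in the news today?"))
```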

Key Features

Built on LLMs with external tool integration
Can retrieve real-time or private data
Perform single, well-defined tasks
Still reactive: they act only when a user request triggers them
Act autonomously within defined boundaries once triggered (e.g., deciding whether to call a tool)

3. Agentic AI: Orchestrating Complex Workflows

What It Is

Agentic AI represents the next level—multi-agent systems that collaborate to complete complex, multi-step workflows. Each agent specializes in a subtask, and together they operate like a coordinated team.

Use Case: YouTube to Blog

An agentic AI system might:

Extract a transcript from a YouTube video
Generate a blog title
Write a summary and description
Compose a conclusion

Each step is handled by a different agent, and outputs are passed between them to produce a polished blog post.
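
A minimal sketch of this hand-off, assuming each agent is just a function that reads and extends a shared draft. Every name here (`BlogDraft`, the four `*_agent` functions, `run_pipeline`) and the example URL are hypothetical; real agents would call a transcript service and an LLM rather than build placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class BlogDraft:
    """Shared state that is passed from one agent to the next."""
    video_url: str
    parts: Dict[str, str] = field(default_factory=dict)

# Each agent handles one subtask and writes its output into the draft.
def transcript_agent(draft: BlogDraft) -> None:
    draft.parts["transcript"] = f"(transcript of {draft.video_url})"

def title_agent(draft: BlogDraft) -> None:
    draft.parts["title"] = f"Title drawn from {draft.parts['transcript']}"

def summary_agent(draft: BlogDraft) -> None:
    draft.parts["summary"] = f"Summary of {draft.parts['transcript']}"

def conclusion_agent(draft: BlogDraft) -> None:
    draft.parts["conclusion"] = f"Conclusion for '{draft.parts['title']}'"

# The order matters: later agents depend on earlier outputs.
PIPELINE: List[Callable[[BlogDraft], None]] = [
    transcript_agent, title_agent, summary_agent, conclusion_agent,
]

def run_pipeline(video_url: str) -> BlogDraft:
    draft = BlogDraft(video_url)
    for agent in PIPELINE:
        agent(draft)  # each agent sees everything produced so far
    return draft
```

For example, `run_pipeline("https://example.com/video").parts` ends up with the keys `transcript`, `title`, `summary`, and `conclusion`, in that order. Keeping the shared state in one object is a design choice for the sketch; agents could equally exchange messages.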

Key Features

Multiple agents working in sequence or parallel
Each agent handles a specific subtask
Enables end-to-end automation of complex workflows
Supports human feedback for refinement
Adds adaptability and robustness through collaboration
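
Two of these features, parallel execution and human feedback, can be illustrated with Python's standard `concurrent.futures` module. This is a sketch under assumptions: `draft_title`, `draft_summary`, `run_parallel`, and `refine` are hypothetical names, the agents are stubs, and the `review` callback stands in for a human reviewer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stub agents that do not depend on each other's output.
def draft_title(transcript: str) -> str:
    return f"Title for: {transcript}"

def draft_summary(transcript: str) -> str:
    return f"Summary of: {transcript}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "title": draft_title,
    "summary": draft_summary,
}

def run_parallel(transcript: str) -> Dict[str, str]:
    """Run independent agents concurrently and collect their outputs by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agent, transcript)
                   for name, agent in AGENTS.items()}
        return {name: future.result() for name, future in futures.items()}

def refine(results: Dict[str, str],
           review: Callable[[str, str], str]) -> Dict[str, str]:
    """Pass every output through a reviewer, who may return revised text."""
    return {name: review(name, text) for name, text in results.items()}
```

A reviewer that only rewrites the title could be passed as `refine(results, lambda name, text: text.upper() if name == "title" else text)`; everything else is returned unchanged.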

4. Comparative Summary

5. Strategic Implications

Generative AI

Ideal for creative content generation, but limited by its reactive nature. Success depends heavily on prompt quality.

AI Agents

Bridge the gap between static models and dynamic applications. Useful in domains like customer service, analytics, and decision support.

Agentic AI

Best suited for automating complex, multi-step processes. Aligns with real-world workflows and supports scalability, adaptability, and human oversight.

Conclusion

Understanding the distinctions between generative AI, AI agents, and agentic AI is essential for anyone working with modern AI systems. From content creation to autonomous task execution and workflow orchestration, these paradigms represent a clear evolution in capability and complexity. By choosing the right approach, organizations can unlock new levels of efficiency, creativity, and intelligence in their AI-driven solutions.

