We built an AI agent that actually works in production. Here's every line of code.

Let me tell you about the moment I stopped being skeptical about AI agents.

It was 11pm on a Tuesday. Our team had just deployed a customer support agent for a Series B fintech client. Not a chatbot. Not a glorified FAQ lookup. A real, multi-step reasoning agent that could pull a customer's transaction history from Snowflake, cross-reference it against compliance rules stored in Unity Catalog, search through 40,000 pages of policy documentation via Mosaic AI Vector Search, and compose a response that actually solved the problem. The first ticket it resolved autonomously would have taken a human analyst 25 minutes. The agent did it in 4.2 seconds.

That was 6 months ago. Since then, we have built and deployed 9 production agents on Databricks for clients across financial services, healthcare, and logistics. Some of them are simple retrieval agents. Some are full multi-agent orchestrations that make decisions, call tools, and self-correct when they get it wrong.

This post is the guide I wish existed when we started. I am going to walk you through building a production-grade AI agent on Databricks from scratch: the architecture, the code, the evaluation framework, the deployment pipeline, and the hard lessons we learned along the way. Every code snippet here is adapted from real production systems. Nothing is hypothetical.

Grab some coffee. This is going to be fun.

The problem: why most AI agents never leave the notebook

Here is the uncomfortable truth about AI agents in 2026: most of them are demos. They work in notebooks. They impress stakeholders in a conference room. Then they die quietly because nobody figured out how to evaluate them, govern the data they access, or monitor them after deployment.

The gap between "cool notebook demo" and "production system that handles 10,000 requests per day without hallucinating" is enormous. It involves answering questions that most tutorials skip entirely. How do you version an agent's tools separately from its model? How do you run automated quality checks before every deployment? How do you catch a hallucination before it reaches a customer? How do you do all of this while keeping your data governance team happy?

This is where Databricks and the Mosaic AI Agent Framework earn their keep. Unity Catalog gives you governed tool access. MLflow gives you versioned evaluation and deployment. Mosaic AI Vector Search gives you retrieval that actually scales. Model Serving gives you a production endpoint with monitoring baked in.

Let me show you what this looks like in practice.

The use case: an intelligent document agent for financial compliance

Our client is a mid-market financial services company processing 2,000+ compliance inquiries per week. Each inquiry requires an analyst to search through regulatory documents, cross-reference customer account data, apply business rules, and compose a response. Average handling time: 22 minutes. Error rate on manual review: 8%.

We built an agent that handles 70% of these inquiries autonomously, escalating the rest to human analysts with a pre-drafted response and supporting evidence. Here is the full architecture.

Architecture overview

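At a high level, the pieces connect like this (a simplified sketch; the component names follow the steps in this post):

```
customer inquiry
      │
      ▼
Model Serving endpoint (MLflow ChatAgent)
      │
      ▼
LangGraph reasoning loop ──────────────► escalate to human analyst
      │                                  (low confidence / high-risk account)
      ├── Tool 1: document retriever ──► Mosaic AI Vector Search (40,000 pages)
      ├── Tool 2: account lookup ──────► customer account data
      └── Tool 3: business rules ──────► compliance rules in Unity Catalog
      │
      ▼
response + citations ──► inference tables (Delta) ──► monitoring & evaluation
```

All three tools are Unity Catalog functions, and MLflow tracks evaluation, versions, and deployment end to end.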

Now let's build it, piece by piece.

Step 1: Setting up the data foundation with Vector Search

Before your agent can reason about anything, it needs access to knowledge. In our case, that means 40,000 pages of regulatory documentation, internal policy guides, and compliance procedures.

Most teams reach for a third-party vector database at this point. We did not. Mosaic AI Vector Search is built directly into the Databricks platform, which means it inherits Unity Catalog governance automatically. Every query is auditable. Every document is access-controlled. Your compliance team will thank you.

First, we prepare and chunk the documents using a Delta table:

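A minimal sketch of that step. It assumes a parsed-documents Delta table with a `text` column; the table names (`compliance.docs.parsed_documents`, `compliance.docs.chunks`) are illustrative, not the client's real schema. The word-based splitter is a rough stand-in for a proper token-based chunker.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping word-based chunks (a simple stand-in
    for token-based chunking with overlap)."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks


if __name__ == "__main__":
    # Inside Databricks: `spark` is the ambient SparkSession in a notebook.
    from pyspark.sql import functions as F, types as T

    chunk_udf = F.udf(chunk_text, T.ArrayType(T.StringType()))

    (spark.table("compliance.docs.parsed_documents")
        .withColumn("chunk", F.explode(chunk_udf(F.col("text"))))
        .withColumn("chunk_id", F.monotonically_increasing_id())
        .select("chunk_id", "source_uri", "chunk")
        .write.mode("overwrite")
        .saveAsTable("compliance.docs.chunks"))

    # Delta Sync indexes require Change Data Feed on the source table.
    spark.sql("ALTER TABLE compliance.docs.chunks "
              "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
```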

Next, create a Vector Search endpoint and sync the index:

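A sketch using the `databricks-vectorsearch` client. The endpoint name, index name, and embedding endpoint are illustrative; swap in your own.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# One endpoint can serve many indexes.
client.create_endpoint(
    name="compliance-vs-endpoint",
    endpoint_type="STANDARD",
)

# A Delta Sync index tracks the source table; Databricks computes the
# embeddings from the chunk column using the model endpoint we name here.
index = client.create_delta_sync_index(
    endpoint_name="compliance-vs-endpoint",
    index_name="compliance.docs.chunks_index",
    source_table_name="compliance.docs.chunks",
    pipeline_type="CONTINUOUS",   # re-sync automatically as the table changes
    primary_key="chunk_id",
    embedding_source_column="chunk",
    embedding_model_endpoint_name="databricks-gte-large-en",
)
```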

The beautiful thing here: Databricks handles embedding generation automatically. You point it at a text column, specify an embedding model endpoint, and the platform manages the rest. When your source Delta table updates, the index syncs. No Airflow DAG. No cron job. No orphaned embeddings.

Step 2: Defining agent tools in Unity Catalog

Here is where things get interesting. In the Mosaic AI Agent Framework, tools are first-class citizens registered in Unity Catalog. This means they are versioned, governed, discoverable, and auditable, just like your tables and models.

Our agent needs three tools. Let's build them.

Tool 1: Regulatory document retriever

We wrap our Vector Search index in a Unity Catalog function so the agent can call it as a tool:

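A sketch of the shape this takes, assuming the SQL `vector_search` table function; the catalog and schema names are illustrative:

```sql
-- Illustrative catalog/schema; adjust to your workspace.
CREATE OR REPLACE FUNCTION compliance.agent.search_regulatory_docs(
  query STRING COMMENT 'A natural-language question about a regulation, policy, or compliance procedure'
)
RETURNS TABLE (chunk STRING, source_uri STRING)
COMMENT 'Searches regulatory documentation and internal policy guides. Use this whenever the answer depends on what a regulation or policy actually says. Returns the most relevant passages with their source documents.'
RETURN
  SELECT chunk, source_uri
  FROM VECTOR_SEARCH(
    index => 'compliance.docs.chunks_index',
    query_text => query,
    num_results => 5
  );
```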

Notice the COMMENT on the function and each parameter. These are not decoration. The agent's LLM reads these comments to decide when and how to invoke the tool. Good tool descriptions are the difference between an agent that works and an agent that flails.

Tool 2: Customer account lookup

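Same pattern, this time wrapping a governed table. The table, columns, and account-ID format below are illustrative placeholders:

```sql
CREATE OR REPLACE FUNCTION compliance.agent.lookup_customer_account(
  account_id STRING COMMENT 'The customer account identifier'
)
RETURNS TABLE (account_id STRING, risk_tier STRING, account_status STRING, last_review_date DATE)
COMMENT 'Looks up a customer account: risk tier, current status, and date of last compliance review. Use this before applying any account-specific rule.'
RETURN
  SELECT a.account_id, a.risk_tier, a.account_status, a.last_review_date
  FROM compliance.accounts.account_summary AS a
  WHERE a.account_id = lookup_customer_account.account_id;
```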

Tool 3: Business rules engine

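The rules engine follows the same template: a governed lookup keyed on inquiry type and risk tier. Again, the schema is an illustrative stand-in:

```sql
CREATE OR REPLACE FUNCTION compliance.agent.evaluate_business_rules(
  inquiry_type STRING COMMENT 'Category of the inquiry, e.g. wire_transfer or account_opening',
  risk_tier STRING COMMENT 'Customer risk tier returned by lookup_customer_account'
)
RETURNS TABLE (rule_id STRING, rule_text STRING, action_required STRING)
COMMENT 'Returns the compliance rules that apply to a given inquiry type and customer risk tier, with the action each rule requires. Call this after the account lookup.'
RETURN
  SELECT r.rule_id, r.rule_text, r.action_required
  FROM compliance.rules.active_rules AS r
  WHERE r.inquiry_type = evaluate_business_rules.inquiry_type
    AND r.risk_tier = evaluate_business_rules.risk_tier;
```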

Three tools. All governed by Unity Catalog. All with explicit comments that guide the LLM's tool selection. All queryable, auditable, and version-controlled.

Step 3: Building the agent with LangGraph and ChatAgent

Now we wire everything together. We are using LangGraph for the orchestration layer because it gives us explicit control over the agent's reasoning loop. We wrap the whole thing in an MLflow ChatAgent so it plugs directly into Databricks Model Serving.

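Here is a condensed sketch, assuming the `databricks-langchain` and `langgraph` packages. The thresholds, the `LOW_CONFIDENCE` marker, and the config keys are illustrative choices, not the production values:

```python
import uuid

import mlflow
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse


def should_escalate(answer: str, steps: int, risk_tier: str, max_steps: int = 6) -> bool:
    """Escalation policy: reasoning-depth cap, self-reported low confidence,
    and a hard rule for high-risk accounts."""
    if steps > max_steps:
        return True
    if "LOW_CONFIDENCE" in answer:  # the system prompt asks the agent to emit this marker
        return True
    return risk_tier == "high"


class ComplianceAgent(ChatAgent):
    def __init__(self, config_path: str = "agent_config.yaml"):
        self.config = mlflow.models.ModelConfig(development_config=config_path)
        self.graph = self._build_graph()

    def _build_graph(self):
        # Deferred imports: these packages live in the Databricks runtime.
        from databricks_langchain import ChatDatabricks, UCFunctionToolkit
        from langgraph.prebuilt import create_react_agent

        llm = ChatDatabricks(endpoint=self.config.get("llm_endpoint"))
        tools = UCFunctionToolkit(function_names=self.config.get("uc_tool_names")).tools
        return create_react_agent(llm, tools, prompt=self.config.get("system_prompt"))

    def predict(self, messages, context=None, custom_inputs=None):
        state = self.graph.invoke(
            {"messages": [m.model_dump(exclude_none=True) for m in messages]}
        )
        answer = state["messages"][-1].content
        steps = len(state["messages"])
        risk_tier = (custom_inputs or {}).get("risk_tier", "unknown")

        if should_escalate(answer, steps, risk_tier,
                           self.config.get("max_reasoning_steps")):
            answer = "[ESCALATED TO HUMAN ANALYST]\n\nDraft response:\n" + answer

        return ChatAgentResponse(
            messages=[ChatAgentMessage(role="assistant", content=answer,
                                       id=str(uuid.uuid4()))]
        )


# Models-from-code: Model Serving imports this module and serves AGENT.
AGENT = ComplianceAgent()
mlflow.models.set_model(AGENT)
```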

A few things worth highlighting in this code.

The ModelConfig pattern. We externalize all configuration into a YAML file. Swap the LLM from Llama 3.3 70B to DBRX or GPT-4o by changing one line of config, then re-evaluate. No code changes required.

The escalation logic. This is where most agent tutorials stop and real production systems start. Our agent tracks reasoning depth, checks for self-reported low confidence, and escalates automatically for high-risk accounts. In production, 30% of queries get escalated. That is by design. A good agent knows what it does not know.

The ChatAgent wrapper. By subclassing ChatAgent and implementing predict and predict_stream, your agent automatically gets a REST API endpoint, request logging, trace capture, and integration with the Agent Evaluation framework. One class, and you are production-ready.

Here is the agent configuration YAML:

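The values below are illustrative stand-ins for the production config; the structure is what matters:

```yaml
# agent_config.yaml -- illustrative values
llm_endpoint: databricks-meta-llama-3-3-70b-instruct
temperature: 0.0
max_reasoning_steps: 6
uc_tool_names:
  - compliance.agent.search_regulatory_docs
  - compliance.agent.lookup_customer_account
  - compliance.agent.evaluate_business_rules
system_prompt: |
  You are a compliance support agent for a financial services company.
  Answer only from retrieved documents and account data, and cite your
  sources for every claim. If you are not confident in your answer,
  include the marker LOW_CONFIDENCE so the inquiry is escalated.
```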

Step 4: Evaluating the agent before deployment

This is the step that separates production agents from demo agents. You would not deploy a machine learning model without evaluating it on a test set. Why would you deploy an agent without doing the same?

Databricks Agent Evaluation lets you run structured quality assessments using LLM judges, custom metrics, and human feedback, all tracked in MLflow.

First, build an evaluation dataset:

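The dataset is just a table of requests and expected responses. The three examples below are invented stand-ins, not real client policy; the real dataset grew from 30 to 340 cases:

```python
import pandas as pd

# Illustrative examples only -- the questions and answers are invented.
eval_examples = [
    {
        "request": "What documentation is required before raising the wire limit on a high-risk account?",
        "expected_response": "Enhanced due diligence documentation, including source-of-funds verification, is required before any limit change on a high-risk account.",
    },
    {
        "request": "Is a retroactive filing permitted if the reporting deadline was missed?",
        "expected_response": "Yes, but the filing must be flagged as late and escalated to the compliance officer for review.",
    },
    {
        "request": "Which rules apply when a dormant account suddenly shows activity?",
        "expected_response": "Dormant-account reactivation rules apply: identity re-verification and a transaction-pattern review before processing.",
    },
]

eval_df = pd.DataFrame(eval_examples)
```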

Now run the evaluation:

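A sketch of the evaluation call, passing a registered agent version and the evaluation DataFrame built above (the model URI is illustrative):

```python
import mlflow

with mlflow.start_run(run_name="agent-eval-v3"):
    results = mlflow.evaluate(
        model="models:/compliance.agent.support_agent/3",  # illustrative URI
        data=eval_df,
        model_type="databricks-agent",  # turns on the Mosaic AI LLM judges
    )
    print(results.metrics)  # groundedness, relevance, correctness, safety, ...
```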

We ran this loop through three agent versions before deployment, and the judge metrics (groundedness, relevance, correctness) climbed with each iteration.

The jump from V1 to V2 came almost entirely from better tool descriptions and a more explicit system prompt. The jump from V2 to V3 came from switching chunking from 1024-token to 512-token chunks with overlap, and adding the "cite your sources" instruction to the system prompt.

The ability to swap configurations and re-evaluate in minutes is what makes this iterative loop practical. Without it, you are guessing.

Step 5: Deploying to production with Model Serving

Deployment is where the Databricks platform really shines. Because our agent is wrapped in a ChatAgent, deploying it to a production endpoint is straightforward:

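A sketch of the deploy step, assuming the `databricks-agents` package and the models-from-code pattern from Step 3 (model and file names are illustrative):

```python
import mlflow
from databricks import agents

# Register models in Unity Catalog rather than the workspace registry.
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    logged = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="agent.py",            # the ChatAgent module from Step 3
        model_config="agent_config.yaml",
        registered_model_name="compliance.agent.support_agent",
    )

# agents.deploy creates the Model Serving endpoint, wires up the AI Gateway,
# and enables inference table logging for the audit trail.
deployment = agents.deploy(
    model_name="compliance.agent.support_agent",
    model_version=logged.registered_model_version,
)
```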

The AI Gateway configuration enables usage tracking and inference table logging: a complete audit trail of every decision your agent makes, written to a Delta table in Unity Catalog. Not optional in financial services.

Step 6: Production monitoring that actually catches problems

Deploying the agent is not the finish line. It is the starting line. Here is how we set up production monitoring:

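A sketch of the scheduled monitoring job: pull recent traffic from the endpoint's inference table, re-score it with the same judges used in development, and alert on regressions. The table name, metric key, and threshold are illustrative:

```python
ALERT_THRESHOLD = 0.85


def check_groundedness(metrics: dict, threshold: float = ALERT_THRESHOLD):
    """Return an alert message if the groundedness score dips below threshold,
    else None."""
    score = metrics.get("groundedness")
    if score is not None and score < threshold:
        return f"ALERT: groundedness dropped to {score:.2f}"
    return None


if __name__ == "__main__":
    # Inside Databricks: `spark` is the ambient session.
    import mlflow

    recent = (
        spark.table("compliance.agent.support_agent_payload")  # inference table
        .where("timestamp >= current_timestamp() - interval 24 hours")
        .selectExpr("request", "response")
        .toPandas()
    )

    # Judge-only evaluation of logged production traffic.
    results = mlflow.evaluate(data=recent, model_type="databricks-agent")

    # Exact metric key names vary by judge version; normalize before checking.
    metrics = {"groundedness": results.metrics.get(
        "response/llm_judged/groundedness/rating/percentage")}

    message = check_groundedness(metrics)
    if message:
        print(message)  # in production this posts to the on-call channel
```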

The same evaluation logic you used in development now runs against production traffic. When groundedness drops below 0.85 or a safety issue is detected, alerts fire immediately. We caught a retrieval degradation issue within 45 minutes of it starting. Without monitoring, it would have been days.

Results and what they mean for your stack

After 6 months in production, here are the numbers:

- 70% of inquiries resolved autonomously; the remaining 30% escalated with a pre-drafted response and supporting evidence
- 2.1% error rate on agent-handled inquiries, versus the 8% human-only baseline
- Average handling time down from 22 minutes to seconds for autonomously resolved inquiries

The 2.1% error rate is lower than the human-only baseline because the agent is consistent. It does not forget to check a rule. It does not skip a step because it is the end of the day and it is tired. And when it is not confident, it escalates. Every time.

The 5 things we learned the hard way

After 9 production agents, here are the lessons that no documentation will teach you.

1. Tool descriptions are your most important prompt engineering. We spent 3x more time writing and iterating on Unity Catalog function COMMENT fields than on the system prompt. The LLM reads these to decide which tool to call and how to call it. Vague descriptions produce vague tool usage.

2. Start with aggressive escalation, then relax. Our V1 agent escalated 60% of queries. That is fine. We gradually tuned the confidence thresholds down as we built trust. Launching with a 5% escalation rate and hoping for the best is how you end up on the front page for the wrong reasons.

3. Chunk size matters more than model size. Switching from 1024-token to 512-token chunks improved our retrieval relevance by 26%. Switching from Llama 3.1 70B to Llama 3.3 70B improved response quality by 8%. Fix your retrieval before you upgrade your model.

4. Evaluation datasets are living documents. Every production failure becomes a new test case. Our evaluation dataset started at 30 examples and is now at 340. The dataset is the institutional memory of everything your agent has gotten wrong.

5. Inference tables are gold. The production logs that Model Serving writes to Delta tables are the richest source of insight you have. We run weekly analyses on them to find edge cases, measure tool usage patterns, and identify queries the agent struggles with. This feeds directly back into the evaluation dataset.

Where this is heading

We are currently building multi-agent orchestrations where a supervisor agent routes inquiries to specialized sub-agents (one for KYC, one for AML, one for transaction monitoring), each with their own tool sets and evaluation criteria. Databricks recently introduced Agent Bricks, which automates much of the agent construction process. We are watching this closely.

The infrastructure for production AI agents is finally mature enough that the bottleneck is no longer technology. It is knowing what to build and how to evaluate it. The code patterns in this post are our answer to the "how." The "what" is up to you.

If your team is sitting on a pile of unstructured documents, a set of business rules, and a manual process that is eating your analysts' time, you have all the ingredients. The Databricks platform gives you the tools. Unity Catalog gives you the governance. MLflow gives you the evaluation loop. The rest is engineering.

And engineering is what we do.

Rizwan Yousuf is VP of Data and AI at Blue Orange Digital, where he leads a team of data engineers and ML practitioners building production data platforms and AI systems. He has shipped 200+ production pipelines and 9 production AI agents across financial services, healthcare, and logistics.

Rizwan Yousuf
Vice President of Data and AI

25+ years leading large-scale enterprise data and AI projects, expert in AWS, Databricks, and GenAI architectures.