Week of July 15th

Stefan Krawczyk

CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs

Published Jul 19, 2024


TL;DR:

#Hamilton release highlights: User contributed Graceful Failure Adapter improvements, SparkConnect support, Async updates & upgrades, new UI Schema view.
Hamilton OS Meetup Group: August scheduled.
#Burr release highlights: new GraphBuilder API, SERDE fix for test case creation, new adaptive CRAG example, new FastAPI async streaming example with Server-Sent Events + React.
In the wild: A Hamilton user blog

Hamilton Release Notes:

Hamilton Framework == 1.71.0, 1.70.0

User Contributed Graceful Failure Adapter Improvements

A few weeks ago we added a new feature: the ability to run a Hamilton DAG all the way through even if an error occurs. Last week, an open source user (thanks James Arruda!) added some cool new capabilities to this adapter.

At a high level, the adapter lets the DAG keep running when an upstream node fails: instead of failing the whole run, it bypasses every downstream node. Here's a simple example: you define an error to catch (so you don't catch everything), as well as a sentinel value that gets cascaded through. Execution continues as normal, but any node that detects a failed upstream node is skipped and returns the sentinel value instead of running.

# my_module.py

class DoNotProceed(Exception):
    pass # custom exception

def wont_proceed() -> int:
    raise DoNotProceed()

def will_proceed() -> int:
    return 1

def never_reached(wont_proceed: int) -> int:
    return 1  # this should not be reached

# your driver code:
from hamilton import driver
from hamilton.lifecycle import default

import my_module

dr = (
    driver.Builder()
    .with_modules(my_module)
    .with_adapters(
        default.GracefulErrorAdapter(
            error_to_catch=DoNotProceed,
            sentinel_value=None
        )
    )
    .build()
)
# will return {'will_proceed': 1, 'never_reached': None}
dr.execute(["will_proceed", "never_reached"])

The newly added features let the adapter work with the `Parallel[]/Collect[…]` constructs and add a few more toggles; see the documentation for details. For example, a new decorator, `@accept_error_sentinels`, lets a function receive the sentinel "error value" as an input so it can handle the error in its own way. Thanks James Arruda!
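To make the sentinel cascade concrete, here is a minimal, library-free sketch of the idea. It is not Hamilton's implementation: `run_gracefully`, `deps`, and `order` are hypothetical names invented for illustration, and the real adapter hooks into Hamilton's lifecycle rather than running a loop like this.

```python
from typing import Any, Callable, Dict, List, Set


class DoNotProceed(Exception):
    """Custom exception we are willing to swallow."""


SENTINEL = None  # value cascaded to every node downstream of a failure


def run_gracefully(
    nodes: Dict[str, Callable[..., Any]],
    deps: Dict[str, List[str]],
    order: List[str],
) -> Dict[str, Any]:
    """Run nodes in topological `order`; skip anything downstream of a caught failure."""
    results: Dict[str, Any] = {}
    failed: Set[str] = set()
    for name in order:
        upstream = deps.get(name, [])
        if any(dep in failed for dep in upstream):
            # An upstream node failed: do not run this node, pass the sentinel along.
            failed.add(name)
            results[name] = SENTINEL
            continue
        try:
            results[name] = nodes[name](*(results[dep] for dep in upstream))
        except DoNotProceed:
            # Only the declared error type is caught; anything else still raises.
            failed.add(name)
            results[name] = SENTINEL
    return results


def wont_proceed() -> int:
    raise DoNotProceed()


def will_proceed() -> int:
    return 1


def never_reached(wont_proceed: int) -> int:
    return 1  # skipped, because its upstream node failed


nodes = {
    "wont_proceed": wont_proceed,
    "will_proceed": will_proceed,
    "never_reached": never_reached,
}
deps = {"never_reached": ["wont_proceed"]}  # node -> upstream nodes
order = ["wont_proceed", "will_proceed", "never_reached"]

results = run_gracefully(nodes, deps, order)
print(results)  # {'wont_proceed': None, 'will_proceed': 1, 'never_reached': None}
```

Catching only the declared exception type is the important design choice here: unexpected errors still stop the run, so real bugs are not silently turned into sentinel values.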

Spark Connect Support

Databricks recently pushed out some changes where the "SparkSession" class used in a "Spark Connect" context is a different class. This meant that Hamilton's type checking would fail and complain. Databricks plans to unify the classes, but that won't happen for a while. So in the meantime we've added an adapter that can help you out. To use it, you'd just do:

from hamilton import driver
from hamilton.plugins import h_spark

dr = (
    driver.Builder()
    .with_modules(...)
    # add the adapter if you're using Hamilton with Spark Connect.
    .with_adapters(h_spark.SPARK_INPUT_CHECK)
    .build()
)

Async Upgrades & Updates

Thanks to Ryan Whitten for finding some 🐛s. We've upgraded the Async Builder and Driver.
The AsyncBuilder can now construct an AsyncDriver in a synchronous fashion, i.e. no await needed. Just use the build_without_init() function:

def build_without_init(self) -> AsyncDriver:
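As a rough illustration of why a synchronous build path is handy, here is a toy sketch of the pattern: construct the object without awaiting, then run the async initialization later from inside an event loop. `ToyAsyncDriver` and `ToyAsyncBuilder` are hypothetical stand-ins, not Hamilton classes, and the `ainit()` method name is an assumption made for this sketch.

```python
import asyncio


class ToyAsyncDriver:
    """Stand-in for a driver whose setup needs an event loop."""

    def __init__(self) -> None:
        self.initialized = False

    async def ainit(self) -> "ToyAsyncDriver":
        # Pretend to do async setup work (e.g. opening connections).
        await asyncio.sleep(0)
        self.initialized = True
        return self


class ToyAsyncBuilder:
    async def build(self) -> ToyAsyncDriver:
        # Fully initialized driver, but the caller must await.
        return await ToyAsyncDriver().ainit()

    def build_without_init(self) -> ToyAsyncDriver:
        # Plain synchronous construction: safe at import time or in sync code.
        return ToyAsyncDriver()


# Construct at module level without an event loop...
dr = ToyAsyncBuilder().build_without_init()
state_before = dr.initialized

# ...and initialize later, once an event loop is available.
asyncio.run(dr.ainit())
state_after = dr.initialized
```

This split is useful when the driver has to be created somewhere `await` is not allowed, such as module scope or a synchronous factory function.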

Hamilton SDK & UI

We've improved the capture of schema metadata and added support for capturing extra metadata. This required some SDK and UI work. So now, for example, when you run a PySpark job with the HamiltonTracker, you'll get a nice schema view and a way to explore it.

Examples / Documentation Updates:

Hamilton OS Meetup Group

Reminder: there's no meetup in July, but August is scheduled. Join/sign-up here. We're excited to have Gilad Rubin speak about some of the work he's been doing on Hamilton.

Burr Release Updates 🌟

Burr == 0.23.0

GraphBuilder API

In an effort to streamline the API, we've made it possible to separate the graph definition from the application definition by adding a GraphBuilder API. This lets you construct the graph once and then reference it wherever it's needed.

from burr.core import ApplicationBuilder, graph

base_graph = (
    graph.GraphBuilder()
    .with_actions(
        # your actions go here
    )
    .with_transitions(
        # transitions go here
    )
    .build()
)
# then you can build an application like this
# (tracker and app_id are assumed to be defined elsewhere)
app = (
    ApplicationBuilder()
    .with_graph(base_graph)  # <--- this is where you add the graph
    .with_tracker(tracker)
    .with_identifiers(app_id=app_id)
    .build()
)

For a full example, see it in action here.

SERDE Handling for Test Case Creation

Thanks to Rinat Gareev for finding the bug: we pushed a fix so that Burr's test case creation capability handles serialization and deserialization correctly. It now properly handles the custom serialization/deserialization that Burr enables.

More Burr Examples

We've added two new examples:

A Corrective RAG example - thanks to Hamza Farhan for adding it!
New examples using Server-Sent Events, FastAPI, and React to build a streaming chat app

Corrective RAG

Corrective-RAG (CRAG) is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents. In this example we show how you can build an application with Burr, using LanceDB as the vector store, Exa as the search engine, Instructor by Jason Liu, and Google's Gemini.
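The control flow behind CRAG can be sketched in a few lines of plain Python. This is a simplified illustration, not the code from the Burr example: the `corrective_rag` function and all the `toy_*` helpers are hypothetical, and a real application would swap them for a vector store lookup, an LLM-based grader, a web search call, and an LLM generation step.

```python
from typing import Callable, List


def corrective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Retrieve, self-grade the documents, and fall back to search if none are relevant."""
    docs = retrieve(question)
    relevant = [doc for doc in docs if grade(question, doc)]
    if not relevant:
        # The corrective step: retrieval was not good enough, so look elsewhere.
        relevant = web_search(question)
    return generate(question, relevant)


# Toy stand-ins so the sketch runs without any external services.
def toy_retrieve(question: str) -> List[str]:
    return ["Burr builds stateful applications.", "Bananas are yellow."]


def toy_grade(question: str, doc: str) -> bool:
    # Keyword overlap as a crude relevance check (an LLM would grade here).
    words = [w for w in question.lower().split() if len(w) > 3]
    return any(w in doc.lower() for w in words)


def toy_search(question: str) -> List[str]:
    return [f"web result for: {question}"]


def toy_generate(question: str, context: List[str]) -> str:
    return " | ".join(context)


answer_from_docs = corrective_rag(
    "what does burr build", toy_retrieve, toy_grade, toy_search, toy_generate
)
answer_from_web = corrective_rag(
    "who won the match", toy_retrieve, toy_grade, toy_search, toy_generate
)
```

In the first call one retrieved document passes grading and is used directly; in the second nothing passes, so the web search fallback supplies the context.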

Streaming Chatbot with Burr, FastAPI, and React

We're excited by this example and accompanying blog post, as it's a great overview and introduction to a few things, for example async, streaming, and server-sent-events.

Example code snippet in the blog that explains how to create a streaming endpoint
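For readers new to server-sent events, here is a small sketch of the wire format, separate from the blog's own snippet: each event is a `data: ...` line followed by a blank line. The token source, the JSON payload shape (`delta`, `done`), and the function names are assumptions made for illustration.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM response: yields one word at a time."""
    for token in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in SSE framing: a 'data:' line terminated by a blank line."""
    async for token in tokens:
        yield f"data: {json.dumps({'delta': token})}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield f"data: {json.dumps({'done': True})}\n\n"


async def collect(text: str) -> List[str]:
    return [event async for event in sse_events(fake_token_stream(text))]


events = asyncio.run(collect("hello world"))
for event in events:
    print(repr(event))
```

In a FastAPI app, an async generator like `sse_events` would typically be handed to a `StreamingResponse` with `media_type="text/event-stream"`, and the React side would read the events as they arrive.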

We've seen a hunger for this type of content, so we're working on adding more.

Seen in the wild: a Hamilton User Blog

It's always fun to hear that someone has written about Hamilton. This time it was Carl Trachte, a user who stopped by our booth at PyCon and wrote about his first experience picking up Hamilton for some processing; writing about tools is one way he internalizes them.

It's a short read, and what I like the most is how straightforward it is to read and understand his code. Thanks Carl!
