Wild Moose helps on-call developers identify the source of production incidents faster by providing a conversational AI trained on their environment. When you’re on call and everything’s on fire, instead of frantically sifting through logs and other people’s code, our moose just gives you the answers.
Debugging in production with our moose lets you solve issues in minutes instead of hours, reducing MTTR by up to 100x. It helps you avoid costly downtime and keep your SLAs in check.
Here are the biggest misconceptions about AI adoption in incident response:
1: “Our logs are too messy.”
I hear this one all the time. But messy data is exactly where AI shines.
LLMs are built for unstructured data: logs, Slack threads, postmortems… In fact, the richer and more organic the data, the better the results.
2: “Our observability stack is all over the place”
Maybe you’re using legacy observability tools, or you’re right in the middle of migrating to new ones.
The right AI tool adds an abstraction layer that integrates with any stack you’re already using.
3: “Our playbooks are too outdated.”
Yes, that’s true at every company. Production moves so fast, it’s impossible to keep up. But AI doesn’t need you to manually update everything.
It learns from what’s happening in real time and feeds that knowledge back into the model.
Here’s the bottom line: These barriers are simply myths.
Tools like ours - Wild Moose - can handle messy logs, legacy tools, and outdated playbooks, so your team can focus on solving real problems. AI isn’t the future of incident response - it’s already here.
The bad news: today, many companies use AI and LLMs unsafely and irresponsibly. This is an unfortunate consequence of teams being constantly pushed to deliver new functionality while holding an increasingly powerful set of AI-based tools. The trend is troubling: just two years ago, realistic threats stemming from the use of AI were rare and exotic. Today, those of us keeping an eye out see them at every turn.
AI practitioners are not always aware of the pervasive security pitfalls: LLMs are universally vulnerable to adversarial inputs, and they leak information in unexpected ways that defy intuitive usage.
While these problems are ubiquitous, there are also solutions. Once we've analyzed a given system to characterize the threats, we can usually address each of them through a combination of tools like differential privacy, input/data validation, complete analysis of control flows affected by LLM outputs, and more.
At Wild Moose we work hard to ensure our SRE copilot is absolutely safe and secure to use. AI safety is a top concern, to the point where we are actively participating in the public discussion that shapes how the industry addresses it.
Yesterday I was a panelist in the excellent NeurIPS workshop on adversarial AI. This was a great opportunity to process and share our vision on how our tech ecosystem can produce trustworthy and robust AI systems; ones that can be safely entrusted with the increasingly sensitive roles they play in our daily lives.
It was also an opportunity to give back to the academic community that I grew up in. Shout out to Cornell University, Cornell Tech, Vector Institute, and Tel Aviv University!
I thank the organizers (especially Avital Shafran and Niv Cohen), and fellow workshop participants, from whom I learned a lot in very little time. Fortunately, the future of AI safety and alignment is truly in the best of hands.
(in the pictures: the panel, and a beautiful Vancouver rainbow as seen from the NeurIPS conference venue)
So good to be in the same room with our engineers in Tel Aviv!
Many early-stage Israeli startups face the challenge of managing two sites - with tech talent often in Tel Aviv and the market in the US. Some make the transition later, but we decided to take this step from the very beginning at Wild Moose.
It’s not without its challenges, but the benefits of building strong foundations early are clear.
Our two-site system includes an always-on ambient Zoom room, work hours adjusted to maximize overlap, and frequent visits to the Tel Aviv office.
With this setup, we maintain the speed, flexibility, and unity of a small startup, while also building a solid foundation for scale.
Grateful to work alongside such talented people, no matter the distance! 🥲🤍
You’ve been dragged out of bed at 4 a.m. by a production alert, and you're not even sure where to start. Every engineer has been there.
AI is going to change this for good.
It’s going to replace engineers as first responders.
For years, we’ve suffered through the chaos of critical incidents. This isn’t just frustrating—it’s risky. Every extra minute of downtime costs money and risks impacting major accounts.
But - there’s a new standard, thanks to LLMs.
When an alert is triggered, an AI agent can:
🔸 Analyze logs
🔸 Identify metric anomalies
🔸 Review recent code changes
And -
✨ Give you actionable insights ✨
All before an engineer even looks at the alert.
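For the curious, here’s a toy sketch of what that automated first pass might look like. All names, data, and thresholds here are illustrative assumptions, not our actual implementation:

```python
# Illustrative triage sketch: run the repetitive checks before an engineer looks.
from dataclasses import dataclass, field

@dataclass
class Alert:
    service: str
    logs: list = field(default_factory=list)      # recent log lines
    metrics: dict = field(default_factory=dict)   # metric name -> recent samples
    recent_commits: list = field(default_factory=list)

def analyze_logs(logs):
    """Surface error-level lines as candidate evidence."""
    return [line for line in logs if "ERROR" in line]

def find_metric_anomalies(metrics, threshold=3.0):
    """Flag metrics whose latest sample deviates sharply from their history."""
    anomalies = []
    for name, samples in metrics.items():
        if len(samples) < 2:
            continue
        history, latest = samples[:-1], samples[-1]
        mean = sum(history) / len(history)
        if mean and abs(latest - mean) / mean > threshold:
            anomalies.append(name)
    return anomalies

def triage(alert):
    """Collect the findings an engineer would otherwise gather by hand."""
    return {
        "errors": analyze_logs(alert.logs),
        "anomalies": find_metric_anomalies(alert.metrics),
        "suspect_changes": alert.recent_commits[-3:],  # most recent changes
    }

alert = Alert(
    service="checkout",
    logs=["INFO request ok", "ERROR db timeout", "ERROR db timeout"],
    metrics={"p99_latency_ms": [120, 130, 125, 900]},
    recent_commits=["abc123 bump db pool size"],
)
report = triage(alert)
```

The point isn’t the heuristics (a real agent reasons over far richer signals); it’s that this whole report exists before anyone gets paged awake.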
That means, first of all, you stop wasting time on repetitive checks for alerts that often turn out to be nothing.
But that’s not all. With AI that *truly* understands your system, you can convert the tribal knowledge that burdens your SMEs and blocks engineers from solving issues on their own into powerful automations.
Soon - every company will rely on AI as the first responder to production issues.
That’s exactly what we do at Wild Moose - we integrate into your system in minutes.
The result is less downtime, happier engineers, and more time for innovation.
#AI #AIOps #Reliability #SRE #UptimeMatters
Your team’s tribal knowledge is buried in Slack. How can AI surface the answers you need—right when you need them?
In critical incidents, context is everything. Engineers need immediate access to the right information to resolve issues fast, without dragging half the team into the fire.
But here’s the reality we know all too well:
You’re on-call, unsure who’s familiar with the service.
The documentation? Incomplete and outdated.
You end up endlessly scrolling Slack, hunting for clues.
This isn’t just frustrating—it’s risky.
Every extra minute of downtime risks impacting major accounts, with serious business consequences.
We realized there’s a better way.
The shift to remote work has unintentionally done us a favor: troubleshooting conversations are now documented in Slack. Enter LLMs: technology that thrives on large, unstructured, human-centric data like this.
Wild Moose uses advanced AI to make Slack work for you by turning it into a living knowledge base:
👉 Surface past conversations to see who solved similar issues and how.
👉 Generate and update incident response playbooks, automatically.
👉 Transform Slack into a dynamic, searchable resource for your entire team.
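To make the first point concrete, here’s a toy sketch of surfacing relevant past threads. The data, names, and crude word-overlap scoring are illustrative assumptions, not Wild Moose’s actual pipeline:

```python
# Toy retrieval sketch: rank past Slack threads by word overlap with a new incident.
def tokenize(text):
    return set(text.lower().split())

def rank_threads(query, threads):
    """Return past threads sorted by crude relevance to the current incident."""
    query_words = tokenize(query)
    scored = []
    for thread in threads:
        overlap = len(query_words & tokenize(thread["text"]))
        if overlap:
            scored.append((overlap, thread))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [thread for _, thread in scored]

past_threads = [
    {"text": "payments timeout fixed by restarting the db connection pool",
     "solver": "dana"},
    {"text": "frontend build broken after node upgrade", "solver": "yuval"},
]
matches = rank_threads("payments db timeout again", past_threads)
# The top match tells you who solved a similar issue, and how.
```

In practice you’d use semantic embeddings rather than word overlap, but the shape of the idea is the same: the answer is usually already in Slack.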
The result? Less downtime, happier engineers, and more time for innovation.
Finding a co-founder for a startup is one of the toughest things ever. I met mine over Zoom.
TL;DR I believe stress-testing the relationship as fast as possible is crucial.
What was your experience meeting your co-founders?
—
When we first started exploring the idea of working together, we were all living in different countries. I was in California completing my MBA, Roei was in Canada doing his postdoc and commuting on weekends to New York to be with his wife, and Tom had just moved to Israel with his new fiancée.
To start getting to know each other, we went through a list of 50 questions for co-founders over Zoom calls.
It was clear this could be a strong partnership, but still, this is a huge decision: you are deciding who you’re potentially spending the next 10 years of your life with. You are tying your professional destiny with those people. Arguably, it’s the most important decision you’ll make as a founder, far more important than any startup idea that can change a hundred times.
To make a decision, we flew for weekend offsites together. We rented Airbnbs in Santa Cruz, then Austin, then New York... Each time, we spent days and nights brainstorming and, more importantly, getting to know each other.
When I was about to graduate, I wanted us to reach a decision: are we going for this? But Roei would have to leave his postdoc for the startup - a decision with no turning back.
We decided to apply the “fail fast” principle to this choice. We moved in together for three months. It was a high-stakes experiment, it could fail miserably but we’d know quickly. We rented an Airbnb in the middle of nowhere with a goal: to secure initial funding by the end of our stay. If we succeeded, we’d continue. If not, we’d walk away.
In the first week, a small disagreement escalated into a huge argument, simply because we hadn’t yet learned how to have difficult conversations. We could have said goodbye, but we chose to work through it. And I think that was the real test. We proved to ourselves, again and again, that we are willing to do A LOT to make this work.
Because, as every founder knows, things are going to get tough. If you’re growing the company and working together for years, you can be sure you’ll hit new levels of professional and personal challenges - and you want to know as early as possible that your team can face them, support each other, and grow from them.
Moving in together was kind of crazy, and it definitely isn’t the right move for every team, but I do think stress-testing the relationship as fast as possible is really important - while, of course, working continuously to build strong communication and trust.
Production moves fast—here’s how we built an on-call copilot that can keep up.
The problem was obvious:
Autonomously solving production issues is an incredibly complex challenge for an AI agent.
Why? First, playbooks are outdated.
We heard this from every company we spoke with.
And it makes sense: production is always shifting, and companies want to move fast. So trying to keep every potential issue and response up to date is practically impossible.
When we realized we couldn’t rely on playbooks, we decided to explore new approaches to make our AI agent fully autonomous.
But letting AI make every decision in real-time didn’t work. The debugging search space is massive, with endless checks to run, making it too slow and computationally expensive.
So, what’s the solution? It's all about finding a balance between the two extremes.
Here’s what we’ve learned:
✨ Slack Is Gold
With so much troubleshooting happening in Slack (especially since WFH became the norm), we can capture this information without adding extra work for teams. Slack conversations show us exactly what steps were taken—and when.
✨ Coverage Can Grow With Use
If a new issue arises and someone runs a query we haven’t seen before—great! We’ll add it to our automated responses for the next time a similar issue shows up.
✨ Every Company Has Repetitive, Time-Consuming Checks
While these checks may differ between organizations, they exist everywhere. Automating them saves critical time—right where teams need it most.
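Here’s a toy sketch of the second lesson, coverage growing with use. The class and the example check are illustrative assumptions, not our real system:

```python
# Illustrative sketch: when an unseen check is run during a live incident,
# record it so it runs automatically the next time a similar issue appears.
class CheckLibrary:
    def __init__(self):
        self.checks = {}  # issue signature -> list of known checks

    def checks_for(self, signature):
        """Checks we can already run automatically for this kind of issue."""
        return self.checks.get(signature, [])

    def record(self, signature, check):
        """Learn a new check observed during an investigation."""
        known = self.checks.setdefault(signature, [])
        if check not in known:
            known.append(check)

library = CheckLibrary()
# First incident of this kind: no automated coverage yet.
assert library.checks_for("db-timeout") == []
# An engineer runs a query we haven't seen before; capture it.
library.record("db-timeout", "SELECT count(*) FROM pg_stat_activity")
# Next similar incident: the check runs automatically, no engineer needed.
```

The key design choice is that coverage is earned from real investigations rather than from manually maintained playbooks, so it never goes stale the way a playbook does.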
The results speak for themselves. 💁♀️
With this approach, we’re cutting investigation time by over 40% right out of the gate.
In the photo: my team enjoying the best view while we work to nail the perfect balance between the two extremes.
Yesterday I met Yasmin Dunsky of Wild Moose. They’re building autonomous investigation and debugging agents to help SREs triage and resolve incidents. They’ve made great progress since we last met, and quite a few companies are already using them to reduce the load and the time to solve production issues. If you’re an SRE or a manager, reach out to Yasmin and check Wild Moose out!