Show HN: Instantly visualize any codebase as an interactive diagram

corysama · 2024-12-27T17:03:57 1735319037

To save everyone a lot of time and OP some money, here is https://github.com/id-Software/Quake-III-Arena diagrammed:

https://gitdiagram.com/id-Software/Quake-III-Arena

Clicking on any box takes you directly to either a file or a folder in the repo. AFAICT, the boxes, wires, groups, labels are all inferred by the AI.

ahmedkhaleel · 2024-12-27T17:09:09 1735319349

yup, if you look at the backend its basically just a pipeline of information extracted from the file tree and readme using Claude 3.5 Sonnet, sick diagram tho

ComputerGuru · 2024-12-27T17:20:52 1735320052

Bug: url checker is case sensitive. For those of us that type out the url from memory on a stupid touch-type device with auto incorrect, you’ll get things like Http(s)://Github

Also might want to coalesce https and http

Not sure if it queues jobs for processing so that when I refresh after a failure it is continuing where it left off or if it is starting over anew? “Progress bar” makes it hard to say.

Aside: I dislike the “modern progress bar” that’s just a scrolling marquee of pithy quips. One of the difficult problems I worked on for a SW project was adding sane progress to a multi-stage backup tool so that the completed percentage and ETA correctly represented a mix of millions of single kb files and random multi-gb files, backed up across multiple pipelines on multiple cores, asynchronously piping from one stage to the next with buffering. Needed to add a good progress metric without poisoning cpu core caches or hurting the efficiency of how work was being divided. This doesn’t seem as hard by comparison!

Sorry for only having tangentially relevant things to report at this time; still waiting for it to finish with the fish-shell codebase so I can give some good feedback!

layer8 · 2024-12-27T22:29:41 1735338581

It’s also unclear why you have to enter the prefix https://github.com in the first place.
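
The input problems raised in this subthread (case sensitivity, http vs. https, the mandatory prefix) could all be handled by normalizing the input before validating it. A minimal sketch, assuming the site only needs an owner/repo pair; `parse_repo` and its regex are hypothetical and not taken from the gitdiagram source:

```python
import re
from typing import Tuple

# Hypothetical pattern: the scheme, "www." and "github.com/" are all optional,
# and matching is case-insensitive, so "Https://Github.com/..." is accepted.
_REPO_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?(?:github\.com/)?"
    r"(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$",
    re.IGNORECASE,
)

def parse_repo(user_input: str) -> Tuple[str, str]:
    """Return (owner, repo) from a full URL or a bare 'owner/repo' string."""
    match = _REPO_RE.match(user_input.strip())
    if match is None:
        raise ValueError(f"not a recognizable GitHub repo: {user_input!r}")
    return match["owner"], match["repo"]
```

With this, `parse_repo("Https://Github.com/torvalds/linux")` and `parse_repo("torvalds/linux")` return the same pair.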

uncomplexity_ · 2024-12-27T23:16:55 1735341415

Of course I have to test it with https://gitdiagram.com/torvalds/linux

"Estimated cost: $8.07 USD" lol

Edit: "Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens."

mshockwave · 2024-12-28T00:09:14 1735344554

> Of course I have to test it with https://gitdiagram.com/torvalds/linux

I'd really like to see how well (or badly) it works on mega projects, because those are usually the ones where I need diagrams like this the most.

mazambazz · 2024-12-28T22:26:17 1735424777

Same for https://gitdiagram.com/nixos/nixpkgs @ 1208750 tokens, haha.

freakynit · 2024-12-28T04:27:09 1735360029

Same lol... my immediate test case for this was this repo...

""" Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens. """

btown · 2024-12-27T22:39:32 1735339172

I really, really like this approach! Rather than trying to build a full graph of related low-level components from function calls etc., which is almost always overwhelming, this just finds the "vibes" of how modules are named in the filesystem, and how they relate to common design patterns - which in many cases is exactly what you want for exploring a codebase or understanding the scope of its offering.

https://github.com/ahmedkhaleel2004/gitdiagram/blob/main/bac... - the prompts in question. Don't sell yourself short as per the comments, they're very well designed prompts!

Using an inexpensive LLM to summarize each file might be an interesting next step, putting few-word summaries alongside the filenames in much the same setup you currently have! But, honestly, it may not be particularly necessary for large existing open-source projects that have already bikeshedded their file naming over many iterations, and/or have highly intentional structures for maintainability.
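
A minimal sketch of the file-tree half of this approach, assuming the input is just a list of repo-relative paths. `render_tree` is a hypothetical helper and the resulting outline is only one plausible way to present a tree to a model; it is not taken from the prompts linked above.

```python
def render_tree(paths):
    """Turn repo-relative paths into an indented outline, directories marked with '/'."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines = []

    def walk(node, depth):
        # Sort names so the outline is stable between runs.
        for name in sorted(node):
            suffix = "/" if node[name] else ""
            lines.append("  " * depth + name + suffix)
            walk(node[name], depth + 1)

    walk(tree, 0)
    return "\n".join(lines)


print(render_tree(["README.md", "src/main.c", "src/line.c"]))
```

The few-word per-file summaries suggested above would fit naturally as a suffix on each line of this outline.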

billyp-rva · 2024-12-27T17:32:45 1735320765

Didn't get a response (expected), but I would caution everyone to keep expectations low. Generating system diagrams from code is extremely difficult, if not impossible, even with AI [0].

[0] https://www.ilograph.com/blog/posts/diagrams-ai-can-and-cann...

mulmboy · 2024-12-27T21:12:03 1735333923

This is a poor quality blog. They "upload" source code to ChatGPT, so who knows if it's in context or RAG'd, plus it looks like they reuse the chat from a previous prose -> code session. They criticise the LLM for producing a high-level diagram despite using a garbage prompt and rejecting the idea of iterating on the diagram.

Anecdotally I've had great success with code to diagram via LLM, including fine details. But as with anything LLM you need to really get the context right. This cannot be overemphasized. And iterate with the LLM, goodness.

billyp-rva · 2024-12-27T21:23:28 1735334608

As mentioned in the blog, iterating on generating a diagram from a repo kind of defeats the purpose. The information is in the repo; if the LLM isn't going to analyze it properly, you might as well just tell it exactly what to diagram (like in the previous section on "whiteboarding"). It is much more capable at that.

If you have some examples of an LLM doing better, by all means please share.

mulmboy · 2024-12-27T21:55:06 1735336506

"properly" is the key word here. You've got to communicate what you want. Or at least communicate something like what your goal is from the diagram, why you want it, so the LLM knows the audience it's targeting.

Like imagine giving the same prompt (instruction, directive, task) to a human - you would in all likelihood get out a similar high level diagram because you've not provided even the slightest whiff of what you want to use the diagram for.

The blog's takeaway is essentially "LLM didn't read my mind so no good". They're tools to be used and you get out what you put in.

lor_louis · 2024-12-27T16:57:45 1735318665

I tried it on a personal repo and it never ended up generating a diagram.

Might be a bug, so here's the repo. https://github.com/lorlouis/cedit

ahmedkhaleel · 2024-12-27T17:05:30 1735319130

i see a diagram there now, might've taken a while to load https://gitdiagram.com/lorlouis/cedit

billyp-rva · 2024-12-27T17:53:21 1735322001

I guess to get the discussion going: the diagrams look nice, but I think the erroneous and missed connections limit their usefulness. In this example, there is a connection from "Line manager" (line.h/.c) to "Memory Allocator" (xalloc.h/.c), but it doesn't look like such a dependency exists in the codebase. Meanwhile, "Syntax Highlighter" does directly import xalloc.h, but there is no dependency shown in the diagram.
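
This kind of cross-check (does a file really include a given header?) can be done mechanically for C code by scanning `#include "..."` lines. A minimal sketch; the file name `highlight.c` and the sample sources are made up for illustration and are not the actual contents of the cedit repo:

```python
import re

# Matches local includes such as:  #include "xalloc.h"
# System includes in angle brackets are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def include_edges(files):
    """files maps filename -> source text; returns a set of (file, header) pairs."""
    return {
        (name, header)
        for name, text in files.items()
        for header in INCLUDE_RE.findall(text)
    }

# Illustrative input only.
sample = {
    "highlight.c": '#include <stdio.h>\n#include "xalloc.h"\n',
    "line.c": '#include "str.h"\n',
}
edges = include_edges(sample)
print(("highlight.c", "xalloc.h") in edges)
```

Comparing such an edge set against the arrows in a generated diagram would surface both spurious and missing connections, though it only sees direct includes, not transitive ones.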

lor_louis · 2024-12-27T19:03:09 1735326189

As the author of the repo I was surprised how close to my mental model the graph was. (I haven't worked on this repo in a while so my mental model is a bit fuzzy)

And yeah it seems to be missing a few connections, but the ones that are there are correct: line.h depends on str.h, which links str.c, which depends on xmalloc.c/h.

But had I been a new contributor to the project I would have found the missing links between modules pretty frustrating.

My guess is that the graph is only good enough to give an overview of what depends on what, for example if I were working on a monorepo and had to justify to my manager that team B needs to do something for us.

I like the idea though; a lot of direct code-to-graph tools are too noisy, and that tends to scare non-technical people away.

antonpirker · 2024-12-27T18:13:25 1735323205

This looks really nice!

I tried it with mine: https://gitdiagram.com/getsentry/sentry-python

A few things:

There are way more integrations in the integration layer, so maybe they should be either shown or a "..." somewhere should tell people that there is more.

The "Hub" is deprecated, so it would be cool if this fact were shown somewhere.

Otherwise really cool!

Animats · 2024-12-27T21:52:04 1735336324

Nice.

I put in a repository of mine that implements a UI in Rust, and it gave me a reasonable diagram. It's just a top-level structure of the program, though. No detail. Not much info about connections between components. The layout was kind of weird.[1]

Another one, from a fork I have of a rendering library.[2] It found the big parts, but provides little insight.

Here's a JPEG 2000 decoder. Even less insight.[3]

The progress messages are bogus. They have no relationship to what's going on. Progress messages indicating progress appear for a bad URL.

[1] https://gitdiagram.com/John-Nagle/ui-mock

[2] https://gitdiagram.com/John-Nagle/rend3-hp

[3] https://gitdiagram.com/John-Nagle/jpeg2000-decoder

owenpalmer · 2024-12-27T16:36:44 1735317404

Inputted llvm-project, resulted in this error:

  Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1448461 tokens.

karmakaze · 2024-12-27T20:10:39 1735330239

I tried with sorbet/sorbet:

> File tree and README combined exceeds token limit (50,000). Current size: 159829 tokens. This GitHub repository is too large for my wallet, but you can continue by providing your own Anthropic API key.

Without having an idea of the output it would produce, I can't tell if it's worth it. I'm not particularly interested in this test example so it's something I might try for an easy win, but probably tweak and maintain whatever it produces--or discard it and make something by hand. Showing something subjectively incorrect is good motivation.
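
The two error messages quoted in this thread imply two thresholds: 50,000 tokens for the hosted key and 200,000 tokens for the model's context. A minimal sketch of that gating logic; the function names, the return labels, and the 4-characters-per-token estimate are assumptions, not the site's actual code:

```python
FREE_TIER_LIMIT = 50_000   # from the "too large for my wallet" message
CONTEXT_LIMIT = 200_000    # from the "max context length is 200k tokens" message

def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # A real tokenizer would give a different count.
    return len(text) // 4

def classify(token_count: int) -> str:
    """Decide how a request of this size would be handled."""
    if token_count > CONTEXT_LIMIT:
        return "too_large"
    if token_count > FREE_TIER_LIMIT:
        return "needs_own_key"
    return "ok"

print(classify(159_829))    # sorbet/sorbet, as reported above
print(classify(1_448_461))  # llvm-project, as reported above
```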

ahmedkhaleel · 2024-12-27T16:42:15 1735317735

1.4M tokens... even if Claude can process that there would be a crater in my wallet

d0mine · 2024-12-27T17:12:43 1735319563

Deepseek v3 is ~$1 for 1M tokens (cheaper at the moment). It is comparable to Sonnet in performance.

https://api-docs.deepseek.com/news/news1226
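
As a rough worked example of that pricing, using the llvm-project token count reported above and a flat $1 per million tokens. This is a simplification: it ignores output tokens and any difference between input and output prices.

```python
def estimated_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Flat-rate cost estimate: tokens / 1M * price per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million

cost = estimated_cost_usd(1_448_461, 1.0)
print(f"${cost:.2f}")  # prints $1.45
```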

NathanFlurry · 2024-12-28T04:55:20 1735361720

This is fun! I’ve come across a few tools like this before and almost dismissed it due to past poor experiences. However, I just diagrammed our startup’s codebase, and it was surprisingly similar to our hand-made diagram. I tried customizing it with specific instructions to ignore legacy code & provide an understanding of our edge architecture, but the one-generation-per-day rate limit is a bit restrictive.

For comparison:

- Hand made diagram: https://github.com/rivet-gg/rivet/blob/d45bf556e903404ab2df0...

- GitDiagram (no instructions): https://gitdiagram.com/rivet-gg/rivet

jasfi · 2024-12-28T10:39:45 1735382385

I tried it on one of my repos and was somewhat impressed. I'd like to be able to drill down into subsystems though. This sort of tool could help to learn about codebases faster.

thih9 · 2024-12-27T21:57:59 1735336679

Note: the messages that are displayed during diagram generation seem to describe the current progress - but actually they are generic/comedic messages that run in an endless loop.

Also, I tried this with https://github.com/rails/rails and it never finished.

dylan604 · 2024-12-27T22:04:32 1735337072

Are any of them kicking the llama’s ass while reticulating splines?

thih9 · 2024-12-27T22:34:36 1735338876

It’s mostly re-reticulating splines that I find annoying. By that I mean: running it all in a loop, endlessly - showing neither a result nor an error message.

But you’re right, reticulating splines and similar generic/comedic messages do have a long tradition.

dylan604 · 2024-12-27T23:05:42 1735340742

Getting useful feedback from remote servers via webui is my least fave.

thih9 · 2024-12-28T07:19:43 1735370383

A simple timeout would already be an improvement. Or some generic information about how long the process is supposed to take.

ssivark · 2024-12-28T04:38:48 1735360728

This kind of capability would be quite nifty as an IDE plugin -- I'd like to get visualizations based on some selected code -- maybe a few lines, files or even a whole project.

initramfs · 2024-12-27T21:35:03 1735335303

Wow, this is awesome:

https://gitdiagram.com/EI2030/Low-power-E-Paper-OS

https://gitdiagram.com/hatonthecat/Solar-Kernel

https://gitdiagram.com/hatonthecat/OpenSourceCondo

https://gitdiagram.com/hatonthecat/Open-Source-Car

gloosx · 2024-12-28T06:37:42 1735367862

>I created this because I wanted to contribute to open-source projects but quickly realized their codebases are too massive for me to dig through manually

I don't get this statement: tried the first repository with the following result

>Repository is too large (>200k tokens) for analysis

So it seems this is not suitable for codebases that are "too massive", because they are "too large". What to do with these?

shahzaibmushtaq · 2024-12-27T17:01:48 1735318908

I learn faster with visualization, whether it's a stat or a codebase. Much appreciated!

ahmedkhaleel · 2024-12-27T17:06:19 1735319179

yup, exact same here, glad to see my project helping others alike

layer8 · 2024-12-27T22:34:35 1735338875

I tried this with a larger project, and after looping through the obviously-fake progress messages for a couple of minutes, it resulted in “Failed to generate diagram. Please try again later.”

diamondage · 2024-12-28T16:56:47 1735405007

would love to see this taken further. I note that structure 101 seems to have been bought by sonar (https://www.youtube.com/watch?v=6Mie3Iya4zE) but they did this back in the day for java etc. would love to have an accessible modern add on with the same concepts of fat, tangle etc.

WillAdams · 2024-12-27T20:57:23 1735333043

Failed with:

https://github.com/WillAdams/gcodepreview

which is probably the weirdest structure one could imagine (Literate Program as a .tex file containing Python and OpenSCAD code for https://pythonscad.org/ ), where the Python file is the core, there is an intermediate OpenSCAD file which wraps it, and then a top-level OpenSCAD file which the user interacts with.

jesse__ · 2024-12-27T17:06:33 1735319193

Tried https://github.com/scallyw4g/bonsai and I believe it hung after the green loading bar completed. Left it for several minutes

ahmedkhaleel · 2024-12-27T17:07:44 1735319264

yea i just checked the db, nothing there. thats weird, ill try loading it myself

jesse__ · 2024-12-27T22:35:48 1735338948

Looks like it works now, although the diagram it generated is pretty watery.

It mis-judged that the "work queue system" is kind of off by itself, when in fact almost all of the important work in the engine goes through the work queue. It did do a good job of at least approximately figuring out the render pipeline stages. Somehow it thinks that "input processing" isn't related to the platform layer, which doesn't make any sense at all.

Seems like a pretty reasonable result for a weekend project, nice work :)

DoingIsLearning · 2024-12-28T01:47:31 1735350451

https://gitdiagram.com/anntzer/mplcursors gave me this err:

Error code: 529 - {'type': 'error', 'error': {'type': 'overloaded_error', 'message': 'Overloaded'}}

I assume Anthropic is suffering...
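
Transient "overloaded" responses like this are commonly handled by retrying with exponential backoff. A minimal sketch, assuming the caller can detect the 529 case; `OverloadedError` and `call_with_backoff` are hypothetical stand-ins, not part of any real SDK:

```python
import random
import time

class OverloadedError(Exception):
    """Hypothetical stand-in for an HTTP 529 'overloaded_error' response."""

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on OverloadedError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
```

The `sleep` parameter exists only so the delay can be swapped out in tests.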

nulld3v · 2024-12-27T21:23:38 1735334618

Hashicorp Vault (had to use my own API key as the repo is fairly large): https://gitdiagram.com/hashicorp/vault

Diagram could maybe have a bit more detail but what is there looks accurate! Really cool stuff OP!

Minervaskell · 2024-12-27T21:32:08 1735335128

Final boss: https://github.com/torvalds/linux

Error message: Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens.

Cool project though! Kudos!

fsndz · 2024-12-27T20:57:12 1735333032

I have been attempting to do this and failed. Good implementation, but still fails in a lot of cases like my implementation. For example fails for this: https://github.com/stanfordnlp/dspy

nhatcher · 2024-12-27T19:29:38 1735327778

I tried with mine, of course. It worked quite well I would say:

https://gitdiagram.com/ironcalc/IronCalc

I think the color coding for the legend is incorrect though.

Overall looks great, congratulations and thanks!

chris_5f · 2024-12-27T18:00:27 1735322427

This is just amazing. I tried out with my startup's repo and it was a blast. Shared with my community

ahmedkhaleel · 2024-12-27T18:05:20 1735322720

thank you so much!!

k0ns0l · 2024-12-28T08:20:13 1735374013

Love what you've built OP

Quick thought: Since you're tackling large codebases, maybe add some zoom controls?

phoenixreader · 2024-12-28T06:43:32 1735368212

That’s very cool! Does it use the source code at all? Or does it just use README and directory tree?

louthy · 2024-12-27T17:26:46 1735320406

Doesn’t like my repo [1]

“Failed to generate diagram. Please try again later.”

[1] https://github.com/louthy/language-ext

mparnisari · 2024-12-27T16:58:46 1735318726

It never generated a diagram for mine :(

ahmedkhaleel · 2024-12-27T17:06:36 1735319196

thats unfortunate, whats the url? ill check it out

abrookewood · 2024-12-27T23:50:48 1735343448

Looks great, but would love to see it work with Private Repos.

whalesalad · 2024-12-27T16:04:10 1735315450

I hope you have a cap set on your billing!

ahmedkhaleel · 2024-12-27T16:06:34 1735315594

yup, i did that, i have it currently on balance mode so its not using anything without my knowledge

visch · 2024-12-27T16:07:21 1735315641

:D , he got a star from me for the ease!

ahmedkhaleel · 2024-12-27T16:10:18 1735315818

haha thanks so much! really went for that with this project. hopefully wont cause any issues

blondin · 2024-12-27T17:22:27 1735320147

love this idea! here's ghostty: https://gitdiagram.com/ghostty-org/ghostty

ahmedkhaleel · 2024-12-27T18:06:06 1735322766

ghostty is sick, and the diagram seems accurate on a higher level

Jerrrry · 2024-12-28T03:00:40 1735354840

This is the neatest and most useful thing Shown on HN using AI in recent memory.

Well done!