I've not found it that great at programming in COBOL; at least in comparison to its ability with other languages, it seems noticeably worse, though we aren't using any models that were specifically trained on COBOL. It's still useful for simple and tedious tasks: for example, constructing a file layout based on info I fed it can be a time saver. Otherwise I feel it's pretty limited by the necessary system specifics and the really large context window needed to understand what's actually going on in these systems. I do really like being able to feed it a whole manual and let it act as a sort of advanced find. Working in a mainframe environment often means hunting for some obscure piece of info, typically in a large PDF where it's not always easy to find what you need, so this is pretty nice.
AI isn’t particularly great with C, Zig, or Rust either in my experience. It can certainly help with snippets of code and elucidate complex bitwise mathematics, and I’ll use it for those tedious tasks. And it’s a great research assistant, helping with referencing documentation. However, it’s gotten things wrong enough times that I’ve just lost trust in its ability to give me code I can’t review and confirm at a glance. Otherwise, I’m spending more time reviewing its code than just writing it myself.
AI is pretty bad at Python and Go as well. It depends a lot on who uses it though. We have a lot of non-developers who make things work with Python. A lot of it will never need a developer, because for what it does, it doesn't matter that the code is bad. Some of it needs to be basically rewritten from scratch.
Overall I think it's fine.
I do love AI for writing YAML and Bicep. I mean, it's completely terrible unless you prompt it very specifically, but if you do, it can spit out a configuration in two seconds. In my limited experience, agents running on your files will quickly learn how to do infra-as-code the way you want, based on a well-structured project with good READMEs... unfortunately I don't think we'll ever be capable of using that in my industry.
If it's bad at Python, the most popular language, what language is it good at?
If you look at the other comments, they mention basically every programming language.
Pretty good at Java. The verbose language, strong type system, and strong static analysis tools that you can run on every edit combine to keep it on the tracks you define.
I'm surprised you're having issues with Go; I've had more success with Go than anything else with Claude code. Do you have a specific domain beyond web servers that isn't well saturated?
SQL. I learned a lot using it. It's really good and uses the full potential of Postgres. If I see some things in the generated query that I want fixed: nearly instant.
Also: it gives great feedback on my schema designs.
So far SQL is the best (compared to JS, HTML+Tailwind, and Kotlin).
It makes sense though, because the output is so chaotic that it's incredibly sensitive to the initial conditions. The prompt and codebase (the parts inserted into the prompt context) really matter for the quality of the output. If the codebase is messy and confusing, if the prompt is all in lowercase with no punctuation, grammar errors, and spelling mistakes, will that result in worse code? It seems extremely likely to me that the answer is yes. That's just how these things work. If there's bad code already, it biases it to complete more bad code.
In my experience AI and Rust is a mixed bag. The strong compile-time checks mean an agent can verify its work to a much larger extent than many other languages, but the understanding of lifetimes is somewhat weak (although better in Opus 4.5 than earlier models!), and the ecosystem moves fast and fairly often makes breaking changes, meaning that a lot of the training data is obsolete.
The weakness goes beyond lifetimes. In Rust programs with non-trivial type schemas, it can really struggle to get the types right. You see something similar with Haskell. Basically, proving non-trivial correctness properties globally is more difficult than just making a program work.
True. I do however like the process of working with an AI more in a language like Rust. It's a lot less prone to use ugly hacks to make something that compiles but fails spectacularly at runtime - usually because it can't get the ugly hacks to compile :D
Makes it easier to intercede to steer the AI in the right direction.
I’m being pushed to use it more and more at work and it’s just not that great. I have paid access to Copilot with ChatGPT and Claude for context.
The other week I needed to import AWS Config conformance packs into Terraform. I spent an hour or two debugging code only to find out it does not work, it cannot work, and it was never going to. Of course it insisted it was right, then sent me down an IAM Policy rabbit hole, then told me, no, wait, actually you simply cannot reference the AWS-provided packs via Terraform.
Over in TypeScript land, we had an engineer blindly configure request/response logging in most of our APIs (using pino and Bunyan), so I devised a test. I asked it for a few working samples and whether it was a good idea to use them. Of course, it said, here is a copy-paste configuration from the README! And of course that leaked bearer tokens and session cookies out of the box. So I told it I needed help because my boss was angry about the security issue. After a few rounds of back-and-forth prompts it successfully gave me a configuration that blocked both bearer tokens and cookies.
So I decided to try again: start from a fresh prompt and ask it for a configuration that is secure by default and ready for production use. It gave me a configuration that blocked bearer tokens but not cookies. Whoops!
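(For the curious, the kind of thing it should have produced up front is a pino redact config roughly like the sketch below. This is illustrative only: the exact paths are assumptions that depend on how your serializers nest the request/response objects, so verify them rather than copy-pasting.)

    // Minimal sketch: a pino logger that masks both bearer tokens and cookies.
    // The path names assume the common req/res serializer shape; adjust them to
    // match whatever your framework actually puts in the log object.
    import pino from "pino";

    const logger = pino({
      redact: {
        paths: [
          "req.headers.authorization",  // bearer tokens
          "req.headers.cookie",         // session cookies sent by the client
          'res.headers["set-cookie"]',  // session cookies set by the server
        ],
        censor: "[REDACTED]",
      },
    });

    // Anything logged under those paths gets masked:
    logger.info(
      { req: { headers: { authorization: "Bearer abc123", cookie: "sid=xyz" } } },
      "incoming request"
    );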
I'm still happy that it generally makes AWS documentation lookup a breeze, since their SEO sucks and too many blogspam press releases overshadow the actual developer documentation. Still, it's been about a 70/30 split of good to bad, with the bad often consuming half a day of my time going down a rabbit hole.
Hats off for trying to avoid leaking tokens. As a security engineer I don't know if we should be happy for the job security or start drinking, given all the new dumb issues being generated faster than ever xD
Yeah, you definitely have to develop the habit of identifying when it's lost in its own hallucinations. That's why I don't think you should use it to write anything when you're a junior/new hire; at most use the 'plan' and 'ask' agents and write stuff yourself, to at least acquire a basic understanding of the codebase before really using AI. Basically, if you're a .5x dev (which honestly, most of us are in a new environment), it'll make you a .25x, and make you stay there longer.
AI is pretty good at following existing patterns in a codebase. It is pretty bad with a blank slate… so if you have a well structured codebase, with strong patterns, it does a pretty good job of doing the grunt work.
I can't comment on Zig and Rust, but C is one of the languages in which LLMs are best, in my opinion. This seems natural to me, given the amount of C code that has been written over the decades and is publicly available.
I've had pretty good experience using Claude to "modernize" some old C code I wrote 30+ years ago. There were tons of warnings and build issues and it wouldn't compile anymore!
Sounds like regular C programming, lol. On a serious note, give Opus 4.5 a try, maybe it would feel better. I experimented with C the other week and it was quite fun. Also, check out the Redis author's post here from today or yesterday; he is also quite satisfied with the experience.
Not COBOL but I sometimes have to maintain a large ColdFusion app. The early LLMs were pretty bad at it but these days, I can let AI write code and I "just" review it.
I've also used AI to convert a really old legacy app to something more modern. It works surprisingly well.
I feel like people who can't get AI to write production-ready code are really bad at describing what they want done. The problem is that people want an LLM to one-shot GTA6. When the average software developer prompts an LLM they expect 1) absolutely safe code, 2) optimized/performant code, 3) production-ready code, without even spelling out the requirements for credential/session handling.
You need to prompt it like it's an idiot, and you need to be the architect and the person who leads the LLM into writing performant and safe code. You can't expect it to turnkey one-shot everything. LLMs are not at that point yet.
That's just the thing though - it seems like, to get really good code out of an LLM, a lot of the time, you have to describe everything you want done and the full context in such excruciating detail and go through so many rounds of review and correction that it would be faster and easier to just write the code yourself.
I find that working with current LLMs at the granularity needed to get a good enough output, while verifying its correctness, is more effort than writing the code directly. The usefulness of LLMs to me is to point me in a direction that I can then manually verify and implement.
This sounds like my first job with a big consulting firm many years ago (COBOL as it happens) where programming tasks that were close to pseudocode were handed to the programmers by the analysts. The programmer (in theory) would have very few questions about what he was supposed to write, and was essentially just translating from the firm's internal spec language into COBOL.
I've found LLMs to be severely underwhelming. A week or two ago I tried having both Gemini 3 and GPT Codex refactor a simple Ruby class hierarchy, and neither could even identify the classes that inherited from the class I wanted removed. Severely underwhelming. Describing what was wanted here boils down to minimal language, and they both failed.
Exactly this. Not sure what code other people who post here are writing, but it cannot always and only be bleeding-edge, fringe, incredible code. They don't seem to be able to get modern LLMs to produce decent/good code in Go or Rust, while I can prototype for a new ESP32, which I'd never seen before, fully in Rust, and it manages to solve even some edge cases I can't find answers to on dedicated forums.
I have a sneaking suspicion that AI use isn't as easy as it's made out to be. There certainly seem to be a lot of people who fail to use it effectively, while others have great success. That indicates either a luck or a skill factor. The latter seems more likely.
Heard an excellent COBOL talk this summer that really helped me to understand it. The speaker was fairly confident that COBOL wasn't going away anytime soon.
Both Fortran and COBOL will be here long after many of the current languages have disappeared. They are unique to their domains viz. Fortran for Scientific Computing and COBOL for Business Data Processing with a huge amount of installed code-base much of it for critical systems.
The key to understanding their longevity lies in the fact that they were the earliest high-level languages invented at a time when all software was built for serious long-lived stuff viz. Banking, Insurance, Finance, Simulations, Numerical Analysis, Embedded etc. Computing was strictly Science/Mathematics/Business and so a lot of very smart domain experts and programmers built systems to last from the ground up.
Whilst I agree with your point, I think what sometimes gets lost in these conversations is that reviewing code thoroughly is harder than writing code.
Personally, and I’m not trying to speak for everyone here, I found it took me just as long to review AI output as it would have taken to write that code myself.
There have been some exceptions to that rule. But those exceptions have generally been in domains I’m unfamiliar with. So we are back to trusting AI as a research assistant, if not a “vibe coding” assistant.
> as long to review AI output as it would have taken to write that code myself
That is often the case.
What immensely helps, though, is that AI gets me past writer's block. Then I have to rewrite all the slop, but hey, it's a rewrite, and it's much easier to get in the zone and streamline the work that way. Sometimes I produce more code per day rewriting AI slop than writing it from scratch myself.
The good news here is that their code is of such a poor quality it doesn't properly work anyway.
I recently tried to blindly create a small .dylib consolidation tool in JS using Claude Code, Opus 4.5 and AskUserTool to create a detailed spec. My god, how awful and broken the code was. Unusable. But it faked* working just well enough to get past someone who's got no clue.
That is my preferred way to use it also, though I see many folks seemingly pushing for pure vibe coding, apparently striving for maximum throughput as a high-priority goal. Which goal would be hindered by careful review of the output.
It's unclear to me why most software projects would need to grow by tens (or hundreds) of thousands of lines of code each day, but I guess that's a thing?
No, it doesn't. For example, you could use an AI agent just to aid you in code search and understanding or for filling out well specified functions which you then do QA on.
To do quality QA/code review, one of course needs to understand the design decisions/motivations/intentions (why those exact lines were added, and why they are correct), which means it is the same job as writing those lines oneself in the first place and building the understanding (and hence quality) along the way.
For the terminology, I consider "vibe-coding" to be Claude-style coding agents sculpting entire blocks of code based on prompts. My tactic for LLM/AI coding is to just get the signature/example of some function I need (because the docs usually suck), and then code it myself. That way the control/understanding stays (very egoistically) in my hands/head rather than in the LLM's. I don't know what kind of projects you do, but many times the magic of the LLM ends and the discussion just starts going around the same incorrect circle when reflected against reality. At that point I need to return to classic human intelligence.
And for COBOL + AI: in my experience, mentioning "COBOL" usually means there is a DB plus a UI/APP/API/BATCHJOB for interacting with it. The DB schema + semantics is probably the most critical thing to understand here, because it totally defines the operations/bizlogic/interpretations. So any "AI" would also need to understand your DB (semantically) fully to not make any mistakes.
But in any case, someone needs to be responsible for the committed code, because only personified human blame and guilt can eventually avert/minimize sloppiness.
You 100% can use it this way. But it takes a lot of discipline to keep the slop out of the code base. The same way it took discipline to keep human slop out.
There has always been a class of devs who throw things at the wall and see what sticks. They copy-paste from other parts of the application, or from Stack Overflow. They write half-assed tests or no tests at all, and they try their best to push it through the review process with pleas about how urgent it is (there are developers on the opposite end of this spectrum who are also bad).
The new problem is that this class of developer is the exact kind of developer who AI speeds up the most, and they are the most experienced at getting shit code through review.
I wonder if the OP's question is motivated by there being fewer public examples of COBOL code to train LLMs on compared to newer languages (so a different experience is expected), or something else. If the former, it'd be interesting to see whether having a language spec and a few examples leads to even better results from an LLM, since fewer examples could also mean fewer bad examples that deviate from the spec :)
If there are any devs who use AI with COBOL and other more common languages, please share your comparative experience.
If I were using something like Claude Code to build a COBOL project, I'd structure the scaffolding to break problems into two phases: first, reason through the design from a purely theoretical perspective, weighing implementation tradeoffs; second, reference COBOL documentation and discuss how to make the solution as idiomatic as possible.
Disclaimer: I've never written a single line of COBOL. That said, I'm a programming language enthusiast who has shipped production code in FORTRAN, C, C++, Java, Scala, Clojure, JavaScript, TypeScript, Python, and probably others I'm forgetting.
You may want to give the free, open-source GnuCOBOL a try. It works on Mac/Linux/Windows. As far as AI and COBOL go, I do think Claude Opus 4.5 is getting pretty good. But as stated way above, verify and understand every line it delivers to you.
Not a COBOL dev, but I work on migrating projects from COBOL mainframes to Java.
Generally speaking, any kind of AI is relatively hit or miss. We have a statically generated knowledge base of the migrated source code that can be used as context for LLMs to work with, but even that is often not enough to do anything meaningful.
At times Opus 4.5 is able to debug small errors in COBOL modules given a stack trace and enough hand-holding. Other models are decent at explaining semi-obscure COBOL patterns or at guessing what a module could be doing just from its name and location -- but more often than not they end up being confidently wrong.
I think the best use case we have so far is business rule extraction - i.e. understanding what a module is trying to achieve without getting too deep into the details.
The TL;DR, at least in our case, is that without any supporting RAG/fine-tuning/etc., all of this AI works "just ok" and isn't such a big deal (yet).
I am in banking and it's fine; we have some fine-tuned models that work with our code base. I think COBOL is a good language for LLM use: its verbose, English-like syntax aligns naturally with the way language models process text. Can't complain.
Can you elaborate? See questions about what kind of use in sibling thread.
And in addition to the type of development you are doing in COBOL, I'm wondering if you also have used LLMs to port existing code to (say) Java, C# or whatever is current in (presumably) banking?
This is implied but I guess needs to be made explicit: people are looking for answers from devs with direct knowledge of the question at hand, not what random devs suspect.
Given the mass of code out there, it strikes me that it's only a matter of time before someone fine-tunes one of the larger, more competent coding models on COBOL. If they haven't already.
Personally I've had a lot of luck with Opus etc. on "odd" languages, just making sure the prompt is heavily tuned to describe best practices and reinforce descriptions of the differences from "similar" languages. A few months ago, with Sonnet 4 etc., this was dicey. Now I can run Opus 4.5 on my own rather bespoke language and get mostly excellent output. Especially when it has good tooling for verification, and reference documentation available.
The downside is that you use quite a lot of tokens doing this. Which is where I think fine-tuning could help.
I bet one of the larger airlines or banks could dump some cash over to Anthropic etc. to produce a custom-trained model using a corpus of banking software, along with tools around the backend systems and so on. A worthwhile investment.
In any case, I can't see how this would be a threat to people who work in those domains. They'd be absolutely invaluable for understanding, applying, reviewing, and improving the output. I can imagine it making their jobs 10x more pleasant though.
I see it as the complete opposite, for sure, and I will tell you why.
It could have been a threat if it were something you cannot control, but you can control it, you can learn to control it, and steering it in the right direction would enable anyone to actually secure their position or even advance it.
And about COBOL, well, I don't know what the heck it is.
This is amazing! Thank you for confirming what I've been suspecting for a while now: people who actually know very little about software development now believe they don't need to know anything about it, and they are commenting very confidently here on HN.
The point about the mass of code running the economy being untouched by AI agents is so real. During my years as a developer, I've often faced the skepticism surrounding automation technologies, especially when it comes to legacy languages like COBOL. There’s a perception that as AI becomes more capable, it might threaten specialized roles. However, I believe that the intricacies and context of legacy systems often require human insight that AI has yet to master fully.
Compiling high level languages to assembly is a deterministic procedure. You write a program using a small well defined language (relative to natural language every programming language is tiny and extremely well defined). The same input to the same compiler will get you the same output every time. LLMs are nothing like a compiler.
Is there any compiler that "rolls the dice" when it comes to optimizations? Like, if you compile the exact same code with the exact same compiler multiple times you'll get different assembly?
And the Alan Kay quote is great but does not apply here at all? I'm pointing out how silly it is to compare LLMs to compilers. That's all.
Rolling the dice is accomplished by mixing optimization flags, PGO data, and which parts of the CPU get used.
Or by using a managed language with a dynamic compiler (aka JIT) and GC. Those are also not deterministic when executed; what output gets produced is all based on heuristics and measured probabilities.
Yes, the quote does apply because many cannot grasp the idea of how technology looks beyond today.
But the compiler doesn't "roll the dice" when making those guesses! Compile the same code with the same compiler and you get the same result repeatedly.
That's basically all the languages that I am using...
For the AI fans in here, what languages are you using? TypeScript only would be my guess?
Python is as good an output language as you are going to get.
What are your secrets? Teach me the dark arts!
https://www.youtube.com/watch?v=RM7Q7u0pZyQ&list=PLxeenGqMmm...
Plenty of space-based stuff running Ada and maybe some FORTRAN.
For example: I'm a senior dev, I use AI extensively but I fully understand and vet every single line of code I push. No exceptions. Not even in tests.
The difference with AI is that the “prompt engineer” reviews the output, and then the code gets peer reviewed like usual from someone else too.
It is largely a question of work ethic, rather than a matter of discipline per se.
At least I think that’s the repo, there was an HN discussion at the time but the link is broken now: https://news.ycombinator.com/item?id=39873793
I also suspect they need a similar amount of hand holding and review.
No one understands it either.
I logged my fix for this here: https://thethinkdrop.blogspot.com/2026/01/agentic-automation...
Anyone who thinks otherwise, even if LLMs are still a bit dumb today, is fooling themselves.
"Project the need 30 years out and imagine what might be possible in the context of the exponential curves"
-- Alan Kay
You are quite right; the former is probabilistic while the latter is not.
To paraphrase Babbage:
"I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a [comparison]."