#softwaredevelopment

72 posts · 63 participants · 0 posts today

Can AI really code? Study maps the roadblocks to autonomous software engineering:

news.mit.edu/2025/can-ai-reall

“Without a channel for the #AI to expose its own confidence — ‘this part’s correct … this part, maybe double‑check’ — developers risk blindly trusting hallucinated logic that compiles, but collapses in production. Another critical aspect is having the AI know when to defer to the user for clarification.”

THIS! 👆

MIT News | Massachusetts Institute of Technology: Can AI really code? Study maps the roadblocks to autonomous software engineering. By Rachel Gordon | MIT CSAIL

I mean, I'm never really compiling binaries, unless they are #NodeJs dependencies (some might be), so it's never *really* mattered, but the obsessive-compulsive part of me just wants my dev environment to be as much like production as possible.

Do you write code that runs on Linux and macOS? If so, what does your development environment look like? Please boost for visibility.

Do AI models help produce verified bug fixes?

"Abstract: Among areas of software engineering where AI techniques — particularly, Large Language Models — seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR), the production of satisfactory corrections to software bugs. Does this expectation materialize in practice? How do we find out, making sure that proposed corrections actually work? If programmers have access to LLMs, how do they actually use them to complement their own skills?

To answer these questions, we took advantage of the availability of a program-proving environment, which formally determines the correctness of proposed fixes, to conduct a study of program debugging with two randomly assigned groups of programmers, one with access to LLMs and the other without, both validating their answers through the proof tools. The methodology relied on a division into general research questions (Goals in the Goal-Query-Metric approach), specific elements admitting specific answers (Queries), and measurements supporting these answers (Metrics). While applied so far to a limited sample size, the results are a first step towards delineating a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.

These results caused surprise as compared to what one might expect from the use of AI for debugging and APR. The contributions also include: a detailed methodology for experiments in the use of LLMs for debugging, which other projects can reuse; a fine-grain analysis of programmer behavior, made possible by the use of full-session recording; a definition of patterns of use of LLMs, with 7 distinct categories; and validated advice for getting the best of LLMs for debugging and Automatic Program Repair."

arxiv.org/abs/2507.15822


Anyone who has built a complex software product (and I don't mean the usual landing page or website) knows very well that #AI has to be used judiciously; otherwise you spend more time cleaning up the mess, only to end up with an architecturally embarrassing product.

The problem with trying to define wellbeing is that there is no one-size-fits-all definition...

We all have the power to decide what wellbeing means for us…

Wellbeing can mean completely different things to different people…

🟢 If you find yourself feeling lethargic all the time, improving your wellbeing might mean looking at your diet and physical activity.

Shutting down the conversation because you're uncomfortable with change or cannot see someone else's perspective is not cool... (Ok, neither is saying "cool" - I know!)

It doesn't show your confidence and it doesn't show strong leadership, quite the opposite.

When I witness this - and it happens all too often - it sticks out like a sore thumb... why? 🤷‍♂️

Interesting tidbits from #Anthropic’s blog on how they use Claude Code:
anthropic.com/news/how-anthrop

Top tip from Data Science and ML Engineering teams: treat it like a *slot machine*. Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh…
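The "slot machine" tip above boils down to a checkpoint, run, accept-or-reset loop. Here is a minimal sketch of that pattern in Python, not anything taken from the Anthropic post: `run_with_checkpoint`, `risky_edit`, and `compiles` are hypothetical names, the "state" is just an in-memory dict of file contents, and the acceptance check is a plain syntax check. In a real repository the checkpoint would more likely be a commit or a stash.

```python
import copy
from typing import Callable, Dict, Tuple

Tree = Dict[str, str]  # hypothetical stand-in for a working tree: path -> source


def run_with_checkpoint(
    state: Tree,
    work: Callable[[Tree], Tree],
    accept: Callable[[Tree], bool],
) -> Tuple[Tree, bool]:
    """Run `work` on a copy of `state`; keep the result only if `accept` approves."""
    checkpoint = copy.deepcopy(state)        # save state before letting the tool work
    candidate = work(copy.deepcopy(state))   # the tool may freely mutate its own copy
    if accept(candidate):
        return candidate, True               # accept the result
    return checkpoint, False                 # otherwise start fresh from the checkpoint


def risky_edit(tree: Tree) -> Tree:
    """Hypothetical 'AI edit' that leaves the file syntactically broken."""
    tree["app.py"] = "print('hello'"
    return tree


def compiles(tree: Tree) -> bool:
    """Acceptance check: every file must at least parse as Python."""
    try:
        for path, src in tree.items():
            compile(src, path, "exec")
    except SyntaxError:
        return False
    return True


files = {"app.py": "print('hello')"}
result, kept = run_with_checkpoint(files, risky_edit, compiles)
```

Because the work runs on a copy, a rejected run leaves `files` untouched and `result` equal to the saved checkpoint, so starting fresh costs nothing.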

Top tip from Product Engineering teams: treat it as an *iterative partner*, not a one-shot solution…

www.anthropic.com: How Anthropic teams use Claude Code. Discover how Anthropic's internal teams leverage Claude Code for development workflows, from debugging to code assistance.
#AI #coding #genAI