#anthropic


"Why do language models sometimes hallucinate—that is, make up information? At a basic level, language model training incentivizes hallucination: models are always supposed to give a guess for the next word. Viewed this way, the major challenge is how to get models to not hallucinate. Models like Claude have relatively successful (though imperfect) anti-hallucination training; they will often refuse to answer a question if they don’t know the answer, rather than speculate. We wanted to understand how this works.

It turns out that, in Claude, refusal to answer is the default behavior: we find a circuit that is "on" by default and that causes the model to state that it has insufficient information to answer any given question. However, when the model is asked about something it knows well—say, the basketball player Michael Jordan—a competing feature representing "known entities" activates and inhibits this default circuit (see also this recent paper for related findings). This allows Claude to answer the question when it knows the answer. In contrast, when asked about an unknown entity ("Michael Batkin"), it declines to answer.

Sometimes, this sort of “misfire” of the “known answer” circuit happens naturally, without us intervening, resulting in a hallucination. In our paper, we show that such misfires can occur when Claude recognizes a name but doesn't know anything else about that person. In cases like this, the “known entity” feature might still activate, and then suppress the default "don't know" feature—in this case incorrectly. Once the model has decided that it needs to answer the question, it proceeds to confabulate: to generate a plausible—but unfortunately untrue—response."

anthropic.com/research/tracing
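
As a rough mental model of the mechanism described in the quote, here is a toy Python sketch. It is not Anthropic's actual circuit; the function, the recognition scores, and the threshold are invented purely for illustration of a default refusal pathway that a "known entity" signal can inhibit.

def answer_or_refuse(entity, recognition_score):
    """Toy illustration of the described circuit: refusal is the default,
    and a sufficiently strong 'known entity' signal switches it off."""
    refusal_activation = 1.0                  # default "can't answer" circuit is on
    refusal_activation -= recognition_score   # "known entity" feature inhibits it
    if refusal_activation > 0:
        return f"I'm not sure I know enough about {entity} to answer."
    return f"<model attempts an answer about {entity}>"

# A well-known name suppresses refusal; an unknown one leaves it on.
print(answer_or_refuse("Michael Jordan", 1.5))   # answers
print(answer_or_refuse("Michael Batkin", 0.0))   # refuses
# A "misfire" corresponds to a recognition score above 1.0 for a name the
# model barely knows: refusal switches off and the model confabulates.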

Whoa! LOTS to unpack here. Weekend Reading!

Anthropic has revealed research into how AI systems process information and make decisions. AI models can perform chains of reasoning, can plan ahead, and sometimes work backward from a desired outcome. The research also provides insight into why language models hallucinate.

Interpretability techniques called “circuit tracing” and “attribution graphs” enable researchers to map out the specific pathways of neuron-like features that activate when models perform tasks. See the links below for details; a toy sketch of the idea follows them.

Summary Article: venturebeat.com/ai/anthropic-s

Circuit Tracing: transformer-circuits.pub/2025/

Research Overview: transformer-circuits.pub/2025/ #AI #Anthropic #LLMs #Claude #ChatGPT #CircuitTracing #neuroscience
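
For a rough sense of what an attribution graph records, here is a toy sketch, not the method in the linked papers: nodes stand for hypothetical features, and edges carry signed weights describing how strongly one feature drove another on a single prompt. The class, feature names, and weights are all invented for illustration.

from collections import defaultdict

class AttributionGraph:
    """Toy attribution graph: hypothetical feature nodes connected by
    signed weights indicating how strongly one feature drove another."""
    def __init__(self):
        self.edges = defaultdict(dict)   # source -> {target: weight}

    def add_edge(self, source, target, weight):
        self.edges[source][target] = weight

    def strongest_inputs(self, target):
        """Incoming edges to `target`, sorted by absolute attribution weight."""
        incoming = [(src, w) for src, outs in self.edges.items()
                    for tgt, w in outs.items() if tgt == target]
        return sorted(incoming, key=lambda e: -abs(e[1]))

# Hypothetical features mirroring the hallucination example quoted above.
g = AttributionGraph()
g.add_edge("token: 'Michael Jordan'", "feature: known entity", 0.9)
g.add_edge("feature: known entity", "feature: can't answer (default)", -0.8)
g.add_edge("feature: can't answer (default)", "output: refusal", 0.7)
print(g.strongest_inputs("feature: can't answer (default)"))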

I discovered NotebookLM from Google. It's like WiseBase from #SiderAI or Project from #anthropic: a knowledge base where I upload various sources and then discuss them with #AI.
On top of that, Google can generate not only study materials or an FAQ from them, but also a podcast! Two voices (a man and a woman) discuss the given topic with each other. The conversation is incredibly realistic. I now have the entire Svelte 5 documentation as half an hour of audio.
But what's an absolute mindfuck: you can step into the podcast and talk with them 🤯
notebooklm.google.com/
