Egregoros

LLM generated comments:

import i3ipc

# Create connection to i3
i3 = i3ipc.Connection()

...makes me think of this ancient blogpost from the 2000s about not writing useless comments, one that has probably disappeared from the Internet by now:

/*****************************
 *       Adds 1 to x         *
 *****************************/
x = x + 1
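To make the joke concrete: the comment restates the code instead of recording intent. A minimal Python sketch of the difference (the "skip the header row" reason is made up for illustration):

```python
x = 0

# Redundant comment: restates what the code already says.
x = x + 1  # adds 1 to x

# Useful comment: records intent the code alone can't convey.
x = x + 1  # hypothetical example: skip the header row of the input

print(x)
```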

Replies

@djsumdog @Zergling_man humans make up function names that don't exist too, if you don't let them run the code, don't let them look anything up, and ask them to write a few thousand LOC on paper. You can solve most of these issues by giving the LLM ground truth it can check its work against (i.e., a compiler it can run, a test suite it can run; for network software, a few server implementations it can test against).
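The "ground truth" idea above can be sketched as a tiny harness: run the project's compiler or test suite and hand the raw output back to the model as a pass/fail signal. Everything here is a hedged sketch; the command is a placeholder for whatever test runner the project actually uses:

```python
import subprocess

def check_llm_patch(test_cmd):
    """Run a ground-truth command (compiler, test suite) and return
    output an LLM can read. test_cmd is hypothetical, e.g.
    ["pytest", "-x", "-q"]; anything that exits nonzero on failure works.
    """
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "PASS"
    # On failure, feed stdout+stderr back to the model as its signal.
    return "FAIL:\n" + result.stdout + result.stderr

# Stand-in for a real test suite: a trivially true assertion.
print(check_llm_patch(["python", "-c", "assert 1 + 1 == 2"]))
```

The point is that the model never has to be trusted on its own claims: the loop only accepts output that the external checker accepts.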
@lain @djsumdog @Zergling_man the best fix for hallucinations is better documentation. I found web3 libraries like ethers.js hallucinate all the time because they change everything constantly and don't document it well. Before AI I couldn't get it right myself. After AI I solved a couple of problems I had set aside for literally two years because I couldn't read their dog-shit JavaScript source and figure it out.

They're not hallucinations ... the random code generator has no intent. It can't "lie" or "hallucinate." It's randomly guessing the wrong next token.

I hate this human-personification bullshit applied to the weighted random code machine. If you treat them as "not all that great," you might get something useful out of them. If you're in the "I haven't written code in 3 months and pass all my code reviews" camp, your co-workers are incompetent, and your job will be replaced in 5 years by someone who has to clean up everything.

I usually refuse most agents when they ask to run commands. They don't make it easy to just give the LLM feedback if you run the command (or something close to it) yourself. The few times I've allowed an agent to run build/test commands on a difficult problem, it usually goes into a big loop that just eats through tokens (even with Claude Opus via Copilot ... because it's free, paid for by my job).

I get Opus 4.5 and GPT-5.2 from work. GPT-5.2 is cheaper (Opus is 3x usage). I had something recently that ate up a few days' worth of tokens in a loop before I stopped it ... but that might have been Claude Sonnet, now that I think about it :bunthink: