@nihilistic_capybara LLMs aren't omniscient, and they never will be.
If I take a picture on a sim in an OpenSim-based grid (that's a 3-D virtual world) which was started up for the first time only 10 minutes ago, and which the WWW knows exactly zilch about, and I feed that picture to an LLM, I do not think the LLM will correctly pinpoint the place where the image was taken. It will not be able to correctly say that the picture was taken at <Place> on <Sim> in <Grid>, and then explain that <Grid> is a 3-D virtual world, a so-called grid, based on the virtual-world server software OpenSimulator, and carry on explaining what OpenSim is, why a grid is called a grid, what a region is and what a sim is. But I can do that.
If there's a sign with three lines of text somewhere within the borders of the image, but at the image's resolution it's so tiny that it amounts to only a few dozen pixels altogether, then no LLM will be able to transcribe those three lines verbatim. It probably won't even be able to identify the sign as a sign. But I can do that, by reading the sign not in the image, but directly in-world.
By the way: all my original images are from within OpenSim grids. I've probably put more thought into describing images from virtual worlds than anyone else. And I've pitted my own hand-written image description against an AI-generated description of the selfsame image twice. So I guess I know what I'm writing about.
CC: @🅻🅸🅲🅴 () @nihilistic_capybara

#Long #LongPost #CWLong #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #CWLongPost #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLM #AIVsHuman #HumanVsAI