Thread:
I've conducted several informal experiments over the last few weeks about alt text for #photos as described by humans and as provided by #AI systems.
#LLMs, despite providing a plethora of details when describing images, still miss the nuances of what the photos contain. Human #descriptions certainly continue to be better at conveying context.
This is not to suggest that automatic descriptions aren't useful. Several tools available to #blind people are now capable of providing a good idea of the contents of photos when no descriptions are available. Apps and hardware can analyze photos and videos for quick access to the environment, which wasn't possible a short time ago.
Even though these tools exist, automatic #descriptions should not be a substitute for alt text.
For this experiment, I asked several friends to send me photos and share #descriptions with me.
To give you one example, I was sent a photo of a table surface with a small coffee pot and a cup of coffee with foam on top. The table also had a plate of cookies, muffins, and croissants.
The #AI description identified the drink in the coffee cup as a yogurt-based concoction. It also missed the cookies on the plate.
#Photo #descriptions are becoming essential when I'm traveling, for instance. While in a hotel, I used to have to rely on a human to distinguish between various shampoo/conditioner/soap bottles, something insignificant for most people but rather frustrating for someone like me. I was able to take a quick photo and get those bottles identified while traveling this weekend. Getting a general description and querying for details got me information that I wouldn't have gotten just by touch.
I was also able to easily identify and manipulate the temperature controls in the room. In the absence of human assistance, this was a great way to get what I needed.
There were a couple of instances when the #LLM just made up things that weren't in the room.
@ppatel I wonder when we get the first reports of “AI” “seeing” people that don’t exist in the room. I think that could get fairly scary…
@yatil "I see no people!"