Thread:
I've conducted several informal experiments over the last few weeks comparing alt text for #photos as written by humans and as generated by #AI systems.
#LLMs, despite providing a plethora of details when describing images, still miss the nuances of what the photos contain. Human #descriptions certainly continue to be better at conveying context.
This is not to suggest that automatic descriptions aren't useful. Several tools available to #blind people are now capable of providing a good idea of the contents of photos when no descriptions are available. Apps and hardware can analyze photos and videos, giving quick access to information about one's environment in a way that wasn't possible a short time ago.
Even though these tools exist, automatic #descriptions should not be a substitute for alt text.
For this experiment, I asked several friends to send me photos along with their own #descriptions.
To give you one example, I was sent a photo of a table surface with a small coffee pot and a cup of coffee with foam on top. The table also had a plate of cookies, muffins, and croissants.
The #AI description identified the drink in the coffee cup as a yogurt-based concoction. It also missed the cookies on the plate.
#Photo #descriptions are becoming essential when I'm traveling, for instance. In a hotel, I used to have to rely on a human to distinguish between the various shampoo, conditioner, and soap bottles, something insignificant for most people but rather frustrating for someone like me. While traveling this weekend, I was able to take a quick photo and get those bottles identified. Getting a general description and then querying for details gave me information I couldn't have gotten by touch alone.
I was also able to easily identify and manipulate the temperature controls in the room. In the absence of human assistance, this was a great way to get what I needed.
There were a couple of instances when the #LLM just made up things that weren't in the room.
@ppatel I wonder when we'll get the first reports of “AI” “seeing” people that don’t exist in the room. I think that could get fairly scary…
@yatil "I see no people!"
@ppatel did you explore the hybrid approach?
Something I’ve been doing for alt text is pasting the image into Claude, getting its first draft of alt text, and then prompting for follow-up improvements: “shorter”, “more details about the lighthouse”, etc.
(To be honest I usually then manually edit the text as well before using it)
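If you wanted to script that loop instead of using the chat UI, here's a rough sketch with the Anthropic Python SDK (the model name, file path, and prompts are placeholders, not my actual setup):

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def image_block(path: str, media_type: str = "image/jpeg") -> dict:
    """Encode a local image as the base64 block the Messages API expects."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {"type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data}}

# First pass: ask for a draft of the alt text.
messages = [{"role": "user", "content": [
    image_block("lighthouse.jpg"),
    {"type": "text", "text": "Write alt text for this image."},
]}]
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=300,
    messages=messages,
)
print(response.content[0].text)

# Follow-up passes: keep the conversation going with short refinement prompts.
for tweak in ["shorter", "more details about the lighthouse"]:
    messages.append({"role": "assistant", "content": response.content[0].text})
    messages.append({"role": "user", "content": tweak})
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        messages=messages,
    )
    print(response.content[0].text)
```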
@simon Oh absolutely. I often query for details and try to suss out the context whenever feasible. In one of the tools I use, my prompt tells OpenAI to give me details while being concise, or it would go on forever. I also ask it to give me the text verbatim rather than summarizing it if there's text in the image.
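For the curious, that kind of prompt could be wired up roughly like this sketch against the OpenAI chat completions API (the prompt wording and model name here are illustrative, not my production setup):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

# Illustrative system prompt along the lines described above.
system_prompt = (
    "You describe images for a blind user. Give useful detail but stay "
    "concise; do not ramble. If the image contains text, transcribe it "
    "verbatim rather than summarizing it."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this photo."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]},
    ],
)
print(response.choices[0].message.content)
```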
@simon @ppatel
One problem with AI tech companies training their models on publicly available data without asking permission first is that they've very likely scraped up family photo albums and Instagram feeds. Their AI could thus identify people by their full names in any photo a would-be stalker uploaded. To avoid the bad publicity, AI companies blur out all the faces.
As a result, the AI will describe the presence of "gray blocks" in the image.
https://masto.ai/@bornach/112987668496258631
@bornach @simon I used one of my tools to get a description of the photo in the original post.
"The photo on the webpage shows two ten-pound banknotes placed on a wooden surface. The top banknote features a portrait of Queen Elizabeth II, while the bottom banknote features a portrait of King Charles III. Both notes have intricate designs and security features typical of currency."