@Jeffrey D. Stark I don't know for certain what the majority want in general. And I don't know what they'd want in my very specific case.
I can only try and extrapolate what they might want from other Fediverse users' image descriptions and feedback on other Fediverse users' image descriptions, as little as there is.
The problem with this is that I don't post what everyone else posts. Not real-life photographs, not screenshots from social media etc., but renderings from very obscure 3-D virtual worlds. This means that there is next to nothing in my images that anyone is familiar with, nothing that a blind or visually-impaired user would already have a rough idea of.
I've seen real-life photographs, sometimes literally focusing on one specific element in them with the whole background blurred out, that were described in over 800 characters. I've seen them be praised for their alt-texts. On the other hand, I've never seen a real-life photograph in the Fediverse be criticised for too much alt-text.
This, however, doesn't easily translate to virtual world renderings. Real-life photographs are much more familiar and much more likely to mostly contain things that people are more or less familiar with. And yet, people love it when they're described in 800 characters and more, all the way to replying with hashtags such as #AltTextAward, #AltTextAwards or #AltTextHallOfFame.
Logical conclusion: If there's more in the images that people aren't familiar with, I'll have to describe more than in these real-life photographs. And there is more in the images that people aren't familiar with.
Virtual world renderings are a largely unexplored edge-case. Only very few people in the Fediverse post these. I think only two describe them. And I'm the only one who really puts some thought into describing them instead of trying to get away with the bare minimum. This means that what I'm trying to do is a first. Nobody has done it before me. There's no prior experience with it.
Thus, I have to go with my own assumptions and conclusions based on a) observations on Mastodon and b) the differences in familiarity between real life and what I post about.
Three things are clear about my images.
First, if sighted people see it, they don't really know what it is, where it is etc.
Second, if non-sighted people come across the image, there is nothing in it whose appearance they already know from having been told often enough what it looks like, because they've never been told what anything in the image looks like. But they may want to know what it looks like. And it's their right to know what it looks like.
Third, this topic is such a small niche and so extremely obscure that if you don't know something, you can't just look it up on Wikipedia. You can't even Google it. Generally, the only source of information that could really help you with my pictures is me. I'm definitely the only way "to get the larger details and nuances".
And so there's much more in my images that needs to be described. And there's much more that needs to be explained, one of the reasons why I always describe my virtual world renderings twice.
This starts with the location, the place where an image was taken. There are cases in which it does matter where an image was taken. My virtual world renderings are such cases.
If a real-life location is shown in a photo, sighted people may recognise it because it's so famous. Otherwise and for non-sighted people, simple name-dropping is usually sufficient. There's hardly any place in real life that can't be sufficiently mentioned in under 50 characters.
I can't name-drop. It won't tell anyone anything because nobody would know the name I've dropped. If I want to tell people where an image is from, I'll first have to break it down and then explain it and explain the explanation and so forth. I can't tell anyone, sighted or not, where my images are from in under 1,000 characters. Not if I want them to understand it.
As for visual descriptions, the usual advice is to limit yourself to what's important within the context, describe only the one important element in detail and hint at everything else at most. But I don't always have that one important element. I may have about two dozen important elements. Or, more often, the post is about the whole image, the whole scenery, and everything in it is important just the same.
But even if something in the image is more important than something else, I still have to describe everything. I mean, we're talking about what amounts to glances into a whole new universe for 99.999% of all people out there. Sure, many will shrug it off.
Others, however, may be intrigued, curious even. After all, this is evidence that "the Metaverse" is, surprisingly, alive. It is not suggested in AI-generated pictures. It really exists. And it looks better than all of Zuckerberg's propaganda. These people don't care what matters in the image and what doesn't. They go on an exploration trip all across the whole image and take in all the details.
Blind or visually-impaired people can't do this. But they may want to do it. And they've got the right to do it, just like sighted people. So they should be given just the same opportunity to do it. Remember that I can't assume that they know what anything in the image looks like unless there's a real-life counterpart that looks very much the same.
Whenever there's something in one of my images that doesn't exist in real life, I have to assume that nobody knows anything about it. So not only do I need an extensive visual description, but I often also need an extensive explanation of this one item.
Finally, there's one more thing that takes up a lot of room: text transcripts. The rule is that if there is text within the borders of an image, it must be transcribed. I rarely even see the exception "unless it doesn't matter within the context". And, again, it tends to happen that everything in one of my images matters within the context because the context is the very image itself.
What this rule doesn't cover at all is text that is unreadable in the image as it is shown. There is no exception for this kind of text, nor is it explicitly included in this rule. It isn't handled at all. It has never even been thought of. Hence, I must assume that the rule applies to this kind of text just as well.
Before you say that I can't transcribe text that I can't read: I actually can. I don't transcribe text by looking at it in the image. I transcribe text by looking at it in-world. And all of a sudden, those six pixels in a row that are ever so slightly more greenish than the surrounding white are two lines of text. That blot, four pixels wide, three pixels high, is actually a sign with a 1024x768-pixel texture and text that's perfectly legible. That tree trunk in front of that sign? In-world, I can look behind it.
If I can transcribe all this text, and nothing says I must not do so, I assume I must do so. And so I may end up with several dozen transcripts of more or less text which, including their respective contexts in the image description, take up more characters than fit into a Mastodon alt-text. If this is the case, then the text transcripts must go into the long description in the post rather than the short description in the alt-text.
This is not by user request. This is an accessibility rule that I follow.
Now you may say that I don't have to deliver such an enormous infodump at once on a silver platter, whether people want it or not. You may say that they could always ask if they want something more.
But seriously, this is about accessibility. And if people have to ask and then wait for assistance, it isn't accessible. They could just as well ask for the whole image description, and if they don't, I don't have to write it. It wouldn't make much of a difference.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #A11y #Accessibility