john,
@john@sauropods.win avatar

I cannot get good results out of #StableDiffusion.

“A fox crossing a residential street. The fox has a human face. There are autumn leaves on the ground, terraced houses in the background, and a slight mist.”

It's just ignoring most of my prompt (as well as really struggling with what foxes look like). I've tried many iterations and variations, they're all like this.

A street in an autumnal park. There isn’t even a fox in this one, and no houses.
A three-legged fox with no torso standing on a road in an autumnal park. It hasn’t got a human face and there are no houses in the background.
Some sort of shrunken fox-adjacent thing on a road in an autumnal park. No human face, no houses in the background.

rakyat,
@rakyat@hachyderm.io avatar

@john For a non-paleontologist like me, dare I say the fox in the third pic looks a little bit like a… pterosaur?

john,
@john@sauropods.win avatar

@rakyat Yes, the very short torso will do that!

Gustodon,
@Gustodon@mas.to avatar

@john No one gets good results from Stable Diffusion, because all of its results are evil as hell.

BlackPhi,

@john The thing about Stable Diffusion is that it is not a tool for following a logical set of instructions. It is about associations and links into features of existing images on the internet. In the case of your prompt, the associations attached to 'autumn' and 'leaves' override the associations of 'terraced houses' and 'residential street'. Also, pictures on the internet of foxes with human faces are, as you say above, not common, so SD doesn't have a lot to work from.

BlackPhi,

@john Another thing about SD is that it can be sensitive to your prompt details. There may be few foxes with human faces on the internet, but there are plenty of foxes so it should be able to do a lot better than your outputs. The challenge of using SD as a tool can often be in getting the feel for how it likes to work and going with it - more like sailing than motoring. Then if you get a picture you like with a fox, SD lets you experiment with infilling the face area with a new prompt.

john,
@john@sauropods.win avatar

@BlackPhi I guess I'm a little bit surprised that SD doesn't get which bit is the face and composite, like it can do with subject and background. But the experimenting I’ve seen today leads me to think that although AIs have a strong bias toward the conventional, some can be coaxed to do such things. Dall-E was certainly resistant, but can do it.

mike,
@mike@sauropods.win avatar

@john I love the one with no fox!

bhawthorne,

@john Can’t you ask an LLM to write the prompt?

john,
@john@sauropods.win avatar

@bhawthorne Yes, I’ve actually thought about that. The LLM would need to know what it is aiming at, so you’d need to hook in an image description AI to describe a model image, and then tell it to turn that into an effective prompt. I don't know whether they know what makes good prompts though, because they avoid training on AI generated stuff I believe.

bhawthorne,
john,
@john@sauropods.win avatar

@bhawthorne That's pretty good, the best one yet.

Turns out AIs do write good prompts. ChatGPT does it when you ask for an image, and in fact that might be going on invisibly with other interfaces too.

karafuto,
@karafuto@mas.to avatar
john,
@john@sauropods.win avatar

@karafuto that last one! Which AI are you using?

MesozoicMind,
@MesozoicMind@sauropods.win avatar

@john Oh god even you have joined the AI bandwagon!? No! Don't get on it!

john,
@john@sauropods.win avatar

@MesozoicMind move forward or die. Don’t worry, I won’t be doing something as boring as being a prompt-jockey.

futurebird,
@futurebird@sauropods.win avatar

@john Ah yes, the "make you so angry with bad drawings that you are motivated to just draw it yourself" machine!

john,
@john@sauropods.win avatar

Thought I'd try it with a cat, in case it understands cats better than foxes... holy moly now.

I do like it more.

mike,
@mike@sauropods.win avatar

@john Did you ask for the cat to be levitating?

john,
@john@sauropods.win avatar

@mike I did not, it's the same prompt as the earlier one, just with “cat” substituted for “fox”.

mike,
@mike@sauropods.win avatar

@john Yikes.

mike,
@mike@sauropods.win avatar

@john Now realise that when asked for scientific or technical information, LLMs emit similar errors — they're just harder to spot.

john,
@john@sauropods.win avatar

@mike I feel like they're more at the DALL-E level for a lot of stuff. Mostly correct but with small or subtle errors, and gaps in capability.

I don't care much about the current error rate for stuff I do. It's adequate. I care more about prompt comprehension, which is also hit and miss (but it's probably as good as people on average).

llewelly,
@llewelly@sauropods.win avatar

@john ... and your residential street is populated with insidious doppelgangers of actual trees.

Zeugs,
@Zeugs@social.cologne avatar

@john this is quite normal. What were you expecting? Dall-e 3 is a bit better, but in general that's how they do stuff...

john,
@john@sauropods.win avatar

@Zeugs Well, my experience of Dall-e even a while ago was considerably better. Here's what it does with the prompt now. It's ignoring the human face part, but obviously these are light-years better.


Zeugs,
@Zeugs@social.cologne avatar

@john yeah like I said it's a bit better. 🤷

john,
@john@sauropods.win avatar

@Zeugs Just a tinsy bit!

Zeugs,
@Zeugs@social.cologne avatar

@john depends on the randomly chosen random seed.🤷 Have you retried the same prompt like ... More often?

john,
@john@sauropods.win avatar

@Zeugs Yeah, I’ve tried the same prompt six times, with slight variations a few more times. The results are quite consistent, and all terrible.

miekeroth,
@miekeroth@socialserver.science avatar

@john similar experience to mine

john,
@john@sauropods.win avatar

@miekeroth The image generators are really uneven. Unfortunately, it seems the ones that actually work are proprietary.

PeterFalkingham,
@PeterFalkingham@sauropods.win avatar

@john @miekeroth Bing (free, using Dall-E) was a bit better, but didn't catch the human face bit:

Zeugs,
@Zeugs@social.cologne avatar

@PeterFalkingham @john @miekeroth

It's practically another random seed variation of this:
https://sauropods.win/@john/111455719352164697

Dall-e seems to have seen more residential areas than SD.
Maybe in the end there is not that much different stuff in there.

john,
@john@sauropods.win avatar

@Zeugs @PeterFalkingham @miekeroth It also knows the shape of foxes, which SD clearly does not.

Zeugs,
@Zeugs@social.cologne avatar

@john @PeterFalkingham @miekeroth
Looks like a fox to me. Source stable diffusion. 🦊🤷

john,
@john@sauropods.win avatar

@Zeugs @PeterFalkingham @miekeroth Try it with legs.

Zeugs,
@Zeugs@social.cologne avatar

@john @PeterFalkingham @miekeroth
This turns into: let's Google that for me with Stable Diffusion. It just takes at least 60 seconds longer than googling. First shot. Fox running no filters got the legs problem. The shape fits better than on your SD stuff.

Zeugs,
@Zeugs@social.cologne avatar

@john @PeterFalkingham @miekeroth
In the end, the hard truth about AI generation is that you should kill your darlings early. If the model doesn't hook, it's pointless.
But look at residential cyberfox: he also has back legs, but here the model can compensate. It relaxes.

john,
@john@sauropods.win avatar

@Zeugs @PeterFalkingham @miekeroth I'm not sure exactly what you're arguing, to be honest, but my argument's pretty simple: SD seems a long way behind. I need dozens of attempts to get things like a fox without egregious errors, whereas DALL-E doesn't even seem to make mistakes (although it's interesting it still ignores important components of prompts and absolutely will not make a fox with a human face!)

Quality is not particularly important for what I’m vaguely planning to do, so oh well I guess.

john,
@john@sauropods.win avatar

Aiming for something like this:

etchlings,
@etchlings@wandering.shop avatar

@john what’s the source on the ideal image here? It’s very unsettling.

john,
@john@sauropods.win avatar
etchlings,
@etchlings@wandering.shop avatar

@john love the atmosphere and that fox… being. Are you just seeing if you can make processors give you output similar to what you already create?

john,
@john@sauropods.win avatar

@etchlings Yeah, essentially. I don't think artists should be ignoring AI, or pinning their hopes on some sort of copyright salvation, so I'm diving in.

etchlings,
@etchlings@wandering.shop avatar

@john it’s a worthy effort. My vague understanding was that most of the “effective” visual AI was being paywalled, but that’s not something I looked deep at.

john,
@john@sauropods.win avatar

@etchlings Sure seems to be the case.

llewelly,
@llewelly@sauropods.win avatar

@john every aspect of this is so much better than even the DALL-E stuff @PeterFalkingham posted; the fox body is more accurate, the leaves look real, the street looks better, the houses look better, and so on.

john,
@john@sauropods.win avatar

@llewelly @PeterFalkingham Yes, but of course it took me several days to paint rather than 30 seconds to prompt!

PeterFalkingham,
@PeterFalkingham@sauropods.win avatar

@john Maybe the key part of the prompt that was missing was 'human-like arse' :)

john,
@john@sauropods.win avatar