I cannot get good results out of #StableDiffusion.... - StableDiffusion

john, 6 months ago

I cannot get good results out of #StableDiffusion.

“A fox crossing a residential street. The fox has a human face. There are autumn leaves on the ground, terraced houses in the background, and a slight mist.”

It's just ignoring most of my prompt (as well as really struggling with what foxes look like). I've tried many iterations and variations, they're all like this.

A street in an autumnal park. There isn’t even a fox in this one, and no houses.
A three-legged fox with no torso standing on a road in an autumnal park. It hasn’t got a human face and there are no houses in the background.
Some sort of shrunken fox-adjacent thing on a road in an autumnal park. No human face, no houses in the background.

rakyat, 6 months ago

@john For a non-paleontologist like me, dare I say the fox in the third pic looks a little bit like a… pterosaur?

john, 6 months ago

@rakyat Yes, the very short torso will do that!

Gustodon, 6 months ago

@john No one gets good results from Stable Diffusion, because all of its results are evil as hell.

BlackPhi, 6 months ago

@john The thing about Stable Diffusion is that it is not a tool for following a logical set of instructions. It is about associations and links into features of existing images on the internet. In the case of your prompt, the associations attached to 'autumn' and 'leaves' override the associations of 'terraced houses' and 'residential street'. Also, pictures on the internet of foxes with human faces are, as you say above, not common, so SD doesn't have a lot to work from.

BlackPhi, 6 months ago

@john Another thing about SD is that it can be sensitive to your prompt details. There may be few foxes with human faces on the internet, but there are plenty of foxes so it should be able to do a lot better than your outputs. The challenge of using SD as a tool can often be in getting the feel for how it likes to work and going with it - more like sailing than motoring. Then if you get a picture you like with a fox, SD lets you experiment with infilling the face area with a new prompt.

john, 6 months ago

@BlackPhi I guess I'm a little bit surprised that SD doesn't get which bit is the face and composite, like it can do with subject and background. But the experimenting I’ve seen today leads me to think that although AIs have a strong bias toward the conventional, some can be coaxed to do such things. Dall-E was certainly resistant, but can do it.

mike, 6 months ago

@john I love the one with no fox!

bhawthorne, 6 months ago

@john Can’t you ask an LLM to write the prompt?

john, 6 months ago

@bhawthorne Yes, I’ve actually thought about that. The LLM would need to know what it is aiming at, so you’d need to hook in an image description AI to describe a model image, and then tell it to turn that into an effective prompt. I don't know whether they know what makes good prompts though, because they avoid training on AI generated stuff I believe.

bhawthorne, 6 months ago

@john Okay, here’s the best I was able to do: https://www.bing.com/images/create/an-image-of-a-fox2c-no-fox-face2c-woman27s-face-on-fo/1-655f6604b88a41359d5c0774f3cf4922?id=eDMkDgJr4X1wDGPLRUuu1Q%3d%3d&view=detailv2&idpp=genimg&idpclose=1&FORM=SYDBIC&ssp=1&setlang=en-US&safesearch=moderate

john, 6 months ago

@bhawthorne That's pretty good, the best one yet.

Turns out AIs do write good prompts. ChatGPT does it when you ask for an image, and in fact that might be going on invisibly with other interfaces too.

karafuto, 6 months ago

@john ok that's actually a fun challenge

A person with a fox head walking his pet fox
this needs a trigger warning. a very creepy creature, semi fox semi human, with blue eyes and fox ears

john, 6 months ago (edited 6 months ago)

@karafuto that last one! Which AI are you using?

MesozoicMind, 6 months ago (edited 6 months ago)

@john Oh god even you have joined the AI bandwagon!? No! Don't get on it!

john, 6 months ago

@MesozoicMind move forward or die. Don’t worry, I won’t be doing something as boring as being a prompt-jockey.

futurebird, 6 months ago

@john Ah yes, the “make you so angry with bad drawings that you are motivated to just draw it yourself” machine!

john, 6 months ago

Thought I'd try it with a cat, in case it understands cats better than foxes... holy moly now.

I do like it more.

mike, 6 months ago

@john Did you ask for the cat to be levitating?

john, 6 months ago

@mike I did not, it's the same prompt as the earlier one, just with “cat” substituted for “fox”.

mike, 6 months ago

@john Yikes.

mike, 6 months ago

@john Now realise that when asked for scientific or technical information, LLMs emit similar errors — they're just harder to spot.

john, 6 months ago

@mike I feel like they're more at the DALL-E level for a lot of stuff. Mostly correct but with small or subtle errors, and gaps in capability.

I don't care much about the current error rate for stuff I do. It's adequate. I care more about prompt comprehension, which is also hit and miss (but it's probably as good as people on average).

llewelly, 6 months ago

@john ... and your residential street is populated with insidious doppelgangers of actual trees.

Zeugs, 6 months ago

@john this is quite normal. What were you expecting? Dall-e 3 is a bit better, but in general that's how they do stuff...

john, 6 months ago

@Zeugs Well, my experience of Dall-e even a while ago was considerably better. Here's what it does with the prompt now. It's ignoring the human face part, but obviously these are light-years better.

Zeugs, 6 months ago

@john yeah like I said it's a bit better. 🤷

john, 6 months ago

@Zeugs Just a tinsy bit!

Zeugs, 6 months ago

@john It depends on the randomly chosen seed. 🤷 Have you retried the same prompt, like... more often?

john, 6 months ago

@Zeugs Yeah, I’ve tried the same prompt six times, with slight variations a few more times. The results are quite consistent, and all terrible.

miekeroth, 6 months ago

@john similar experience as I have

john, 6 months ago

@miekeroth The image generators are really uneven. Unfortunately, it seems the ones that actually work are proprietary.

PeterFalkingham, 6 months ago

@john @miekeroth Bing (free, using DALL-E) was a bit better, but didn't catch the human face bit:

Zeugs, 6 months ago

@PeterFalkingham @john @miekeroth

It's practically another random seed variation of this:
https://sauropods.win/@john/111455719352164697

Dall-e seems to have seen more residential areas than SD.
Maybe, in the end, there is not that much different stuff in there.

john, 6 months ago

@Zeugs @PeterFalkingham @miekeroth It also knows the shape of foxes, which SD clearly does not.

Zeugs, 6 months ago

@john @PeterFalkingham @miekeroth
Looks like a fox to me. Source: Stable Diffusion. 🦊🤷

john, 6 months ago

@Zeugs @PeterFalkingham @miekeroth Try it with legs.

Zeugs, 6 months ago

@john @PeterFalkingham @miekeroth
This turns into: let's Google that for me with Stable Diffusion. It just takes at least 60 seconds longer than googling. First shot. Fox running, no filters, got the legs problem. The shape fits better than in your SD stuff.

Zeugs, 6 months ago

@john @PeterFalkingham @miekeroth
In the end, the hard truth about AI generation is that you should kill your darlings early. If the model doesn't hook, it's pointless.
But look at residential cyberfox: he also has back legs, but here the model can compensate. It relaxes.

john, 6 months ago

@Zeugs @PeterFalkingham @miekeroth I'm not sure exactly what you're arguing, to be honest, but my argument's pretty simple: SD seems a long way behind. I need dozens of attempts to get things like a fox without egregious errors, whereas DALL-E doesn't even seem to make mistakes (although it's interesting it still ignores important components of prompts and absolutely will not make a fox with a human face!)

Quality is not particularly important for what I’m vaguely planning to do, so oh well I guess.

john, 6 months ago

Aiming for something like this:

etchlings, 6 months ago

@john what’s the source on the ideal image here? It’s very unsettling.

john, 6 months ago

@etchlings It's my painting. https://johnconway.art/vulpes-persona

etchlings, 6 months ago

@john love the atmosphere and that fox… being. Are you just seeing if you can make processors give you output similar to what you already create?

john, 6 months ago

@etchlings Yeah, essentially. I don't think artists should be ignoring AI, or pinning their hopes on some sort of copyright salvation, so I'm diving in.

etchlings, 6 months ago

@john it’s a worthy effort. My vague understanding was that most of the “effective” visual AI was behind paywalls, but that’s not something I looked deeply at.

john, 6 months ago

@etchlings Sure seems to be the case.

llewelly, 6 months ago

@john every aspect of this is so much better than even the DALL-E stuff @PeterFalkingham posted; the fox body is more accurate, the leaves look real, the street looks better, the houses look better, and so on.

john, 6 months ago

@llewelly @PeterFalkingham Yes, but of course it took me several days to paint rather than 30 seconds to prompt!

PeterFalkingham, 6 months ago

@john Maybe the key part of the prompt that was missing was 'human-like arse' :)

john, 6 months ago

@PeterFalkingham
