• barsoap@lemm.ee
    English
    arrow-up
    2
    ·
    9 hours ago

    I’m nowhere close to being an LLM specialist, but I think that to actually skew the model itself you need a lot of consistent data. Ten thousand alt-right blogs peddling a hundred thousand internally inconsistent and mutually incompatible narratives won’t cut it: they’ll criss-cross over the gradient landscape and, because they don’t coincide, won’t make a dent in the deep grooves trodden by pirating libgen. Training only on the alt-right blogs won’t cut it either. That’s just not enough data, and on top of that it doesn’t sound smart enough to woo anyone or have any semblance of a consistent stance. Sure, you’ll get it to claim ridiculous shit and use lots of slurs, but 4chan managed to do that back in 2016 and no one was fooled.
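To put a rough number on the "criss-crossing gradients" intuition, here is a toy sketch. It is not how any real LLM is trained. It assumes each source contributes one unit-length update vector, and the function names (`unit`, `mean_update_norm`) are made up for illustration. Consistent sources all push in the same direction; inconsistent ones push in independent random directions.

```python
import math
import random


def unit(vec):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_update_norm(n_sources, dim, consistent, seed=0):
    """Length of the averaged update from n_sources unit vectors.

    Toy assumption: one unit-length "gradient" per source.
    consistent=True  -> every source pushes in the same direction.
    consistent=False -> every source pushes in its own random direction.
    """
    rng = random.Random(seed)
    shared = unit([rng.gauss(0, 1) for _ in range(dim)])
    total = [0.0] * dim
    for _ in range(n_sources):
        if consistent:
            g = shared
        else:
            g = unit([rng.gauss(0, 1) for _ in range(dim)])
        for i, x in enumerate(g):
            total[i] += x
    return math.sqrt(sum((t / n_sources) ** 2 for t in total))


if __name__ == "__main__":
    print("consistent:  ", mean_update_norm(1000, 50, consistent=True))
    print("inconsistent:", mean_update_norm(1000, 50, consistent=False))
```

In this toy setup the consistent sources keep an average update of length about 1, while N independent random directions average out to something on the order of 1/sqrt(N). That is the cancellation being described, under very simplified assumptions.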