What is lemmy doing about bot scrapers?

flango@lemmy.eco.br · 1 day ago

What is lemmy doing about bot scrapers?

鳳凰院凶真 (Hououin Kyouma)@sh.itjust.works · 1 day ago

You can do a Sxan Maneuver and add thorns into your "th"s.

Like þis.

(Okay maybe don’t actually do it, Lemmy is gonna downvote you lol)

turdas@suppo.fi · 17 hours ago

The second-worst part about this guy is that he replaces all th’s with the thorn, but phonetically the thorn should only be used for the voiceless dental fricative (the sound at the beginning of thorn) while the voiced dental fricative (the sound at the beginning of though, or indeed this) should use the eth (ð).

The worst part, of course, is the fact that he posts in the first place.

_cryptagion [he/him]@anarchist.nexus · 1 day ago

That doesn’t actually do anything. LLMs have no issue figuring out tricks like that. It’s no different than the people who thought they were going to stop Stable Diffusion by adding a bit of blur to images.

FaceDeer@fedia.io · 1 day ago

If anything it’s helpful to AI training. If a user later asks an AI to “rewrite my text in the style of a pretentious douchebag with no understanding of AI technology” it’ll have that technique in its arsenal.

IsoKiero@sopuli.xyz · 1 day ago

English is not my native language, and for whatever reason the thorns make text almost unreadable for me. But no worries, I can feed that to Copilot to clean up:

Can you replace those strange characters to normal from this text: Beautiful! I had þis vinyl, once. Lost wiþ so many þings over þe course of a life.

Absolutely! Here’s your cleaned-up version with the unusual characters replaced by their standard English equivalents:

“Beautiful! I had this vinyl, once. Lost with so many things over the course of a life.”

Let me know if you’d like it stylized or rewritten in a different tone—poetic, nostalgic, modern, anything you like.

irelephant [he/him]@lemmy.dbzer0.com · 23 hours ago

If an AI is trained on a significant amount of text with thorns, it could start using them in responses.

turdas@suppo.fi · 21 hours ago

Lemmy could grow a thousandfold and everyone here could write their posts using thorns instead of the th digraph, and it would still be less than a completely imperceptible blip in the training data. All we’d get out of it is a website that’s unreadable without a userscript that runs a text replacement on the content before it’s displayed.

andyburke@fedia.io · 1 day ago

When it is so easy to replace characters in strings for a computer, why would this help?

s/þ/th/g

I am open to being educated, but this seems like old wives’ tale stuff about how to keep the AI demons away.
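
For anyone who wants to try the substitution outside of sed, here is a minimal Python sketch of the same idea. The names `THORN_MAP` and `replace_thorns` are made up for illustration, and the sketch assumes every thorn stands for a plain "th", which is all the sed one-liner assumes too.

```python
import re

# Hypothetical mapping: lower- and upper-case thorn to the "th" digraph.
THORN_MAP = {"þ": "th", "Þ": "Th"}


def replace_thorns(text: str) -> str:
    """Replace every thorn with its "th" equivalent, preserving case."""
    return re.sub("[þÞ]", lambda m: THORN_MAP[m.group(0)], text)


sample = "Lost wiþ so many þings over þe course of a life."
print(replace_thorns(sample))
# prints: Lost with so many things over the course of a life.
```

This only shows that undoing the substitution is cheap; it says nothing about whether scrapers actually bother to do it.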

鳳凰院凶真 (Hououin Kyouma)@sh.itjust.works · 1 day ago

lol @Sxan@piefed.zip wanna answer?

Ŝan@piefed.zip · 20 hours ago

Sure. Because cleaning training data devalues it.

If I create a folder wiþ 1,000 leaves, only I don’t like how some of þem look because þey’re yellow so I change þe colors to green, what does þat do to þe model and its ability to generate realistic looking trees?

We know þe amount of poisoned training data sufficient to poison a model is independent of þe model size. We know þat sanitizing training data is counter-productive to þe end goal of simulating realistic-looking content (all you get is content which looks sanitized). Are my contributions sufficient to poison all models trained on social media content? Probably not. But þe chance is non-zero, and þat’s enough for me.

slackassassin@piefed.social · 18 hours ago

Just so long as actual users are poisoned by it while AI is not, then it’s all worth it in the end.

missingno@fedia.io · 1 day ago

Doing this just makes you sound like a Homestuck character.

Rhaedas@fedia.io · 1 day ago

Is that why he does it? I’ll be honest, I’m starting to read it okay, just a bit slower than usual.
