Content Recommendation Algorithms

gandalf_der_12te@discuss.tchncs.de · 1 month ago

FoundFootFootage78@lemmy.ml · edit-2 1 month ago

The problem with recommendation algorithms isn’t just the power, it’s the fact that they deprive us of a shared reality. It’s one thing if we filter ourselves into a bubble but it’s another if the site itself does it.

gandalf_der_12te@discuss.tchncs.de · 1 month ago

what if we filter ourselves into a bubble by choosing to use a site that does it?

FoundFootFootage78@lemmy.ml · 1 month ago

That’s not filtering ourselves, that’s letting ourselves be filtered. If an algorithm does the mental work of filtering us into bubbles, that makes it harder to escape.

Die4Ever@retrolemmy.com · 1 month ago

i want to see posts from communities that i already subscribed to, but because there’s more than 1000 communities on the fediverse and i’m only subscribed to a small countable subset of them, i inevitably lose out on a lot of content. (The “all” feed sucks unfortunately). So how to solve this?

You don’t use the Subscribed feed? I like Subscribed+Scaled

The Fediverse lags significantly behind in content discoverability technology.

https://quiblr.com/

https://quiblr.com/understanding_your_private_personalized_feed

gandalf_der_12te@discuss.tchncs.de · 1 month ago

https://quiblr.com/understanding_your_private_personalized_feed

Unfortunately it’s very sparse on technical details. How does it actually work? How much network traffic does it cause for the servers?

Die4Ever@retrolemmy.com · 1 month ago

idk, it’s all done client-side though

https://github.com/Technicolor-Dreamcoat/Quiblr

Aurelius@lemmy.world · 16 days ago

Hi, I made Quiblr. The “For You” customized feed is all client side (i.e. nothing is sent externally).

Dessalines@lemmy.ml · 1 month ago

@nutomic@lemmy.ml brought up https://github.com/LemmyNet/lemmy/issues/5871 , which would at least be a way for communities to explicitly link to each other, and we could possibly create a superset of Subscribed.

In 1.0 there is the Suggested filter, where admins have a preset / chosen list of curated communities they like, and a way to view all the posts from them.

I’d rather not do complicated algorithms to try to figure out interest / community adjacency based on user activity, as this could get really complicated and also probably show things people don’t want to see.

I totally agree that content discovery could be better, but I’d like it to be explicitly chosen by the user, rather than generated. So I think the best way is still just to go to the communities page, and click subscribe on anything that might potentially interest you.

gandalf_der_12te@discuss.tchncs.de · 30 days ago

I’d rather not do complicated algorithms to try to figure out interest / community adjacency based on user activity, as this could get really complicated

I agree that per-user recommendations could easily get out of hand in terms of computational expense, and a really complicated algorithm would also be opaque, which would undermine user trust.

I do think that having a few explicitly defined feeds/lists would be better than trying to dynamically and implicitly generate a list for each user.

So instead of having a “suggested for you (personally)” feed for each user, there could be a few large feeds like /feed/technology, /feed/science, /feed/events_you_can_attend_in_germany, … that people can view.

I’m not sure how computationally expensive such a feed would be. I imagine each feed is defined by a small text file that might look like this: (example feed_science.txt hosted on mander.xyz)

/c/science@mander.xyz
/c/publichealth@mander.xyz
/c/mycology@mander.xyz
...

And then the server can re-generate a cached version of the feed every 10 minutes. That is then served to each user who accesses /feed/science@mander.xyz . In this way, the feed only needs to be generated once every 10 minutes and is then served out to all users that want to view it. That means that feed-generation time is independent of number of viewers.
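A rough sketch of that caching idea in Python, not anything Lemmy actually implements. The `fetch_posts` callback, the `published` field and the `CachedFeed` class are all made-up names for illustration; the only things taken from the example above are the one-community-per-line file format and the 10-minute interval.

```python
import time
from typing import Callable, Optional

# Same shape as the feed_science.txt example: one community per line.
FEED_SCIENCE = """\
/c/science@mander.xyz
/c/publichealth@mander.xyz
/c/mycology@mander.xyz
"""


def parse_feed_definition(text: str) -> list[str]:
    """Return the communities listed one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


class CachedFeed:
    """Rebuilds the merged post list at most once per ttl_seconds."""

    def __init__(
        self,
        communities: list[str],
        fetch_posts: Callable[[str], list[dict]],  # hypothetical data source
        ttl_seconds: float = 600.0,  # 10 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._communities = communities
        self._fetch_posts = fetch_posts
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[list[dict]] = None
        self._built_at = 0.0

    def get(self) -> list[dict]:
        """Serve the cached list, regenerating it only when it has expired."""
        now = self._clock()
        if self._cached is None or now - self._built_at >= self._ttl:
            posts: list[dict] = []
            for community in self._communities:
                posts.extend(self._fetch_posts(community))
            # Newest first; "published" is an assumed numeric timestamp field.
            posts.sort(key=lambda post: post["published"], reverse=True)
            self._cached = posts
            self._built_at = now
        return self._cached
```

Every call to `get()` inside the 10-minute window returns the same cached list, so the number of `fetch_posts` calls depends on the number of communities in the feed and not on the number of viewers.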

squirrel@piefed.kobel.fyi · 1 month ago

Mastodon has an algorithm in their official app. The Explore feed shows you posts that are popular among people you follow and interact with.

Loops has a For You Page algorithm.

There are algorithms on the fediverse. They are rare, but they do exist.

Coelacanth@feddit.nu · 1 month ago

Is the Mastodon algorithm new? I could have sworn they used to pride themselves on the chronological-only feed.

squirrel@piefed.kobel.fyi · 1 month ago

It’s been there for a couple of years. I found a GitHub comment from 2022 by Gargron describing what it is:

The explore tab is in essence a moderated, quality-filtered federated timeline. Its purpose is to help you discover other people and expand your visibility, but without being a vector for spam and abuse.

Ephera@lemmy.ml · 1 month ago

There should be an open-source recommendation algorithm, though; I’m sure of it.

The problem is that the kind of algorithm you envision is technologically a black box, not just by choice. It’s a machine learning model. At best, you could make the training data and instructions public, but it would still be hard to reason about why it makes certain decisions. Corporations traditionally try to eliminate biases by throwing as much data at it as possible, but that makes it even harder to reason about.

I guess, maybe you could try to split the tasks. So, set up a list of e.g. 50 topics, such as sports, IT, politics etc… Then use a small language model to decide into which categories each post fits. And then you could let the user decide the weights for the topics + weights for recency and vote count.
Or I guess, automatically decide the weights based on what the user upvotes and then make the weights transparent to each user.
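To make the scoring half of that concrete, here is a minimal Python sketch. It assumes the topic labels already exist (the small language model that would assign them is not shown), and the exact formula, the field names and the `nudge_on_upvote` step size are arbitrary choices for illustration, not a tested design.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    topics: list[str]  # assumed output of an upstream classifier (not shown)
    votes: int
    age_hours: float


@dataclass
class Weights:
    topics: dict[str, float]  # per-topic weight, visible to and editable by the user
    recency: float = 1.0
    votes: float = 1.0


def explain(post: Post, w: Weights) -> dict[str, float]:
    """Break a post's score into its parts so the user can see why it ranks."""
    return {
        "topics": sum(w.topics.get(topic, 0.0) for topic in post.topics),
        "recency": w.recency / (1.0 + post.age_hours),
        "votes": w.votes * math.log1p(max(post.votes, 0)),
    }


def score(post: Post, w: Weights) -> float:
    """Total score is just the sum of the explained parts."""
    return sum(explain(post, w).values())


def rank(posts: list[Post], w: Weights) -> list[Post]:
    """Highest score first."""
    return sorted(posts, key=lambda post: score(post, w), reverse=True)


def nudge_on_upvote(w: Weights, post: Post, step: float = 0.1) -> None:
    """Optional automatic tuning: raise the weight of each topic the user upvoted."""
    for topic in post.topics:
        w.topics[topic] = w.topics.get(topic, 0.0) + step
```

Because the whole state is a small dictionary of numbers, showing `Weights` and the output of `explain` to the user is all it takes to make the ranking inspectable.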

But yeah, I don’t think there’s prior art in this respect, so would probably need lots of experimenting still.

gandalf_der_12te@discuss.tchncs.de · 1 month ago

hmm, i think you’re overthinking this. what if the recommendation algorithm simply gives you stuff from communities you’ve subscribed to and “similar” communities (these would have to be linked from the original communities / link to the original communities)?

that should be reasonably easy and not involve any neural networks. i think it basically constructs a “feed” (post list) which is a remix of other lists (the individual communities that stuff is taken from), maybe weighted with a certain scalar factor.
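roughly what i mean, as a python sketch. the `score` field, the `links` mapping and the 0.5 factor for linked communities are all made up for illustration, none of this is an existing lemmy feature:

```python
def build_weights(
    subscribed: set[str],
    links: dict[str, list[str]],  # hypothetical: community -> communities it links to
    similar_factor: float = 0.5,
) -> dict[str, float]:
    """Subscribed communities get weight 1.0, communities they link to get less."""
    weights = {community: 1.0 for community in subscribed}
    for community in subscribed:
        for linked in links.get(community, []):
            # setdefault keeps 1.0 for communities the user is also subscribed to.
            weights.setdefault(linked, similar_factor)
    return weights


def remix(
    feeds: dict[str, list[dict]],  # community -> its own post list
    weights: dict[str, float],
    limit: int = 20,
) -> list[dict]:
    """Merge several post lists into one, scaling each post's score per community."""
    merged = []
    for community, posts in feeds.items():
        factor = weights.get(community, 0.0)
        if factor <= 0.0:
            continue  # neither subscribed nor linked: leave it out entirely
        for post in posts:
            merged.append((post["score"] * factor, post))
    merged.sort(key=lambda item: item[0], reverse=True)
    return [post for _, post in merged[:limit]]
```

no model, no training, just a weighted merge of lists that already exist.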

Nutomic@lemmy.ml · 1 month ago

Sounds exactly like this: https://github.com/LemmyNet/lemmy/issues/5871

Ephera@lemmy.ml · 1 month ago

Right, yeah, I guess on Lemmy, the categorization is already mostly there. I was thinking more generally… 😅

PumpkinDrama@reddthat.com · 1 month ago

You can’t have a recommendation algorithm on open-source software because it requires a lot of compute to calculate personalized recommendations for each user, which simply isn’t feasible for most instances. Instead, there should be an API endpoint that returns post metadata for the last week, allowing users to implement their own ranking algorithm via a userscript running on their own hardware.
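A minimal sketch of the client side of that idea, in Python rather than as an actual userscript. The JSON payload shape (`published`, `upvotes`, `comments`) is an assumption, since no such endpoint exists yet, and `discussion_heavy` is just one example of a ranking a user might write for themselves.

```python
import json
from datetime import datetime, timedelta
from typing import Callable


def rank_locally(
    payload: str,  # JSON list of post metadata from the hypothetical endpoint
    rank_key: Callable[[dict], float],  # the user's own ranking function
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> list[dict]:
    """Filter to the last week and sort with a user-supplied key, all on the client."""
    posts = json.loads(payload)
    cutoff = now - window
    recent = [
        post
        for post in posts
        # "published" is assumed to be an ISO 8601 timestamp with a UTC offset.
        if datetime.fromisoformat(post["published"]) >= cutoff
    ]
    return sorted(recent, key=rank_key, reverse=True)


def discussion_heavy(post: dict) -> float:
    """Example user-defined ranking: favour posts with many comments."""
    return post["upvotes"] + 2 * post["comments"]
```

The server only has to serve the same metadata to everyone; all per-user computation happens on the user's own hardware.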

I also believe there should be a more personalized “All” feed per instance. Each instance could surface different content tuned to the admins or to a subset of long-term users—something stable that doesn’t change often but varies from server to server.