"Buy Me A Coffee"

  • 0 Posts
  • 10 Comments
Joined 1 year ago
Cake day: June 13th, 2023

  • marsara9@lemmy.world to Lemmy@lemmy.ml: Lemmy content aggregator bot list
    English · 1 upvote, 1 downvote · 1 year ago

    Maybe. A second idea I’ve got: if 75-80% of your posts still have no replies after, say, 24 hours, and you have at least 100 such posts, you get added to the list?

    My main concern with something like this is false positives: someone real could end up getting blocked.

    I definitely want to think on this some more but it might have some legs.
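
The rule floated above could be sketched roughly like this. It is only a minimal illustration, not anything that exists in the search engine: the `Post` class, the `looks_like_aggregator` name and the default thresholds (24 hours, 75%, 100 posts) are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    # Hypothetical minimal record; a real crawler would carry more fields.
    created: datetime
    reply_count: int


def looks_like_aggregator(posts, now, min_posts=100, ratio=0.75,
                          grace=timedelta(hours=24)):
    """Flag an account if most of its settled posts never got a reply."""
    # Only consider posts old enough to have had a chance to receive replies.
    settled = [p for p in posts if now - p.created >= grace]
    unanswered = [p for p in settled if p.reply_count == 0]
    # Require a minimum number of unanswered posts to limit false positives.
    if not settled or len(unanswered) < min_posts:
        return False
    return len(unanswered) / len(settled) >= ratio
```

Raising `min_posts` or `ratio` trades missed bots for fewer real people being flagged, which is the false-positive worry mentioned above.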


  • …I wonder if there’s a programmatic way to detect these bots? Some sort of analysis on their posting behavior?

    If they’re playing nice they’ll have the bot flag checked in their profile, so maybe I could build a list of any flagged bot that creates posts, since most of the “good” bots just reply to comments? Anyway, just thinking out loud. But I’m thinking I could easily add a public API to my search engine that just returns a list of “posting bots”…
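
A rough sketch of that "posting bots" list, assuming the crawler already has account records and per-account post counts on hand. The dictionary shapes and the `posting_bots` name are hypothetical; `bot_account` is used here as the name of the profile flag, which is an assumption about how the data is stored.

```python
def posting_bots(accounts, post_counts):
    """Return names of flagged bots that have created at least one post.

    accounts: list of dicts with "name" and "bot_account" keys (assumed shape).
    post_counts: dict mapping account name -> number of posts created.
    """
    return sorted(
        a["name"]
        for a in accounts
        if a.get("bot_account") and post_counts.get(a["name"], 0) > 0
    )
```

Bots that only reply to comments have a post count of zero, so they stay off the list.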







  • I’m using the public API to grab every post / comment and then I essentially replace the content with only its unique words. Then when you go to search, it just looks for any post or comment in my database that has the words you typed in. Finally, I sort based on the number of upvotes.
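
To make the unique-words idea concrete, here is a small in-memory sketch. It is not the actual implementation: the `TinyIndex` class, the tokenising regex and the choice to require every query word to match are all assumptions for illustration.

```python
import re


def unique_words(text):
    # Lowercase the text and keep each distinct word once.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class TinyIndex:
    def __init__(self):
        self.words = {}    # item id -> set of unique words
        self.upvotes = {}  # item id -> upvote count

    def add(self, item_id, text, upvotes):
        # The original content is dropped; only its unique words are kept.
        self.words[item_id] = unique_words(text)
        self.upvotes[item_id] = upvotes

    def search(self, query):
        wanted = unique_words(query)
        if not wanted:
            return []
        # Assumption: an item matches only if it contains every query word.
        hits = [i for i, w in self.words.items() if wanted <= w]
        # Most upvoted first.
        return sorted(hits, key=lambda i: self.upvotes[i], reverse=True)
```

Storing only word sets keeps the index small, at the cost of losing word order and the original text.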

    Right now it only crawls a specific instance that you point it to. But as long as that instance is federated it /should/ get everything. Eventually I plan on using that instance’s list of federated instances to scan everything and lighten the load on any one particular instance.

    Edit: I thought about tapping into the existing database, but it’s geared more towards serving content than searching. The database that I’m building is searchable, but I drop so much of the original data that using it to serve content is worthless.


  • marsara9@lemmy.world to Lemmy@lemmy.ml: Is Lemmy search-engine unfriendly?
    English · 16 upvotes, 1 downvote · 1 year ago

    I’m doing tests in the next couple days. But I’m trying to build a search engine specifically for Lemmy.

    • It should in theory work similar-ish to Google / Bing.
    • You can filter by instance, community or author.
    • It only indexes Lemmy posts and it won’t keep duplicates.
    • It’ll also open any link you find in your own instance.
    • You’ll be able to self host it and point it to any instance you want as well.
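
The filtering and no-duplicates points in the list above might look something like this. It is a sketch only: the result dictionaries, their `instance`, `community`, `author` and `ap_id` keys, and the idea of using a shared ID to spot copies of the same federated post are assumptions, not the engine's real schema.

```python
def filter_results(results, instance=None, community=None, author=None):
    """Apply optional filters and drop duplicate copies of the same post."""
    seen = set()
    kept = []
    for r in results:
        if instance and r["instance"] != instance:
            continue
        if community and r["community"] != community:
            continue
        if author and r["author"] != author:
            continue
        # Assumption: copies of one federated post share the same "ap_id".
        if r["ap_id"] in seen:
            continue
        seen.add(r["ap_id"])
        kept.append(r)
    return kept
```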

    I’m hoping I can open it to the public in a week or so.