Fabrice Canel from Microsoft said that every day Bing discovers dozens of billions of normalized URLs never seen before. That's a lot of new URLs for BingBot to find in a single day, don't you think?
But the web is huge and content is constantly being produced — not just quality content but plenty of junk, gibberish, machine-generated content, and so on.
Fabrice explained on Twitter that much of that content is "mostly useless content"; he listed examples such as duplicate content, scraped content, automatically generated content, spam content, junk content, and more.
So while Bing may discover billions and billions of new URLs per day, I doubt it indexes much of it.
Here are those tweets:
Size of the web = ♾. We discover at #bing every day dozens of billions of normalized URLs never seen before. Mostly useless content (duplicate/scraped/automatically generated content, spam, junk, etc.). See our guidelines https://t.co/IKdDkLNs6W including the "Things to avoid"
— Fabrice Canel (@facan) August 17, 2022
Forum discussion at Twitter.