The Bing Index Quality team is raising its bar on web spam filtering. In an official blog post, Microsoft’s Igor Rondel, Principal Development Manager of the Bing Index Quality team, details how the team detects web spam, processes it, and filters it out of the index.
How Bing Detects Spam
Igor says in his blog post that the team first works to understand spammers’ motivations and activities, then develops algorithms to make the internet ecosystem cleaner, reduce malicious content and give users high-quality search results.
The basic ways detailed in the blog post include:
Quality of content – Bing will examine page content more closely, looking for content written for users rather than for search engines and their algorithms.
Igor says, “The result is that, in most cases, spam pages have inadequate content with limited value to the user. We use this fact to facilitate detection”.
Presence and positioning of ads – Bing will monitor how many ads appear on your page, what type of ads they are and how intrusive they are.
Positional & layout information – If the main content on your page is poorly positioned relative to the ads, Bing might consider the page spam. Igor explains,
“Where is the main content located? Where are the ads located? Do the ads take up the prime real estate or are they neatly separated away from the main content (e.g. in the header/ footer or side pane)? Is it easy for users to mentally separate content from ads?”
Creative Clustering Algorithms
The Bing team will now use what it calls “creative clustering algorithms” to counteract the black hat content generation techniques that spammers use to maximize their web presence. According to the post, content generated using techniques like “(a) copying other’s content (either entirely or with minor tweaks), b) using programs to automatically generate page content, c) using external APIs to populate their pages with non-unique content” will now be considered spam.
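To make the first technique concrete, here is a minimal sketch of how copied or lightly tweaked content can be spotted in principle. It is not Bing's algorithm, which the post does not disclose. It uses a common textbook approach, word shingles compared with Jaccard similarity. The function names and the 0.5 threshold are assumptions chosen for illustration only.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(page: str, original: str, threshold: float = 0.5) -> bool:
    """Flag a page whose shingles overlap heavily with another page.

    The threshold is an arbitrary illustrative value, not a Bing parameter.
    """
    return jaccard(shingles(page), shingles(original)) >= threshold


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog near the river bank"
    tweaked = "the quick brown fox jumps over the lazy cat near the river bank"
    print(looks_copied(tweaked, original))
```

Changing a single word only disturbs the few shingles that contain it, so a page copied "with minor tweaks" still scores high against its source, while unrelated pages share almost no shingles.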
Bing will also target techniques such as:
- Stuffing page body / URL / anchors with keywords
- Performing link manipulation via link farms, link networks, forum post abuse
- Including hidden content on the page not meant for human consumption.
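As an illustration of the first item in the list above, the sketch below measures how much of a page body is taken up by its single most frequent word, a crude proxy for keyword stuffing. This is an assumption-laden toy, not Bing's detection logic. The helper names and the 0.2 cutoff are hypothetical, and a realistic version would at least ignore stop words and look at URLs and anchors too.

```python
from __future__ import annotations

import re
from collections import Counter


def keyword_density(text: str, top_n: int = 1) -> list[tuple[str, float]]:
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    total = len(words)
    return [(word, count / total)
            for word, count in Counter(words).most_common(top_n)]


def looks_stuffed(text: str, max_density: float = 0.2) -> bool:
    """Flag text where one word exceeds max_density of all words.

    max_density is an arbitrary illustrative cutoff, not a Bing parameter.
    """
    top = keyword_density(text, 1)
    return bool(top) and top[0][1] > max_density


if __name__ == "__main__":
    print(looks_stuffed("cheap shoes cheap shoes buy cheap shoes cheap cheap"))
```

Very short texts will trip a density check like this easily, so in practice it would only make sense above some minimum word count.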
Anything that looks unnatural may be flagged as spam by Bing’s web spam filtering algorithms, and Bing will take different levels of action on the spam it finds. It may demote the page or even remove it from the index. The action will depend on the spam techniques used on the page.
Head over to Bing Blogs for detailed information on Bing’s evolving web spam filtering algorithms.