sproutworks
September 25th, 2006 5:23 AM PST
My blog search engine, SproutSearch, is now indexing over 8 million blogs. I am now working on changing the way the blogs are ranked. For now, they are sorted by the sheer amount of content they contain. One big problem I noticed with this method is that many spam blogs contain masses of content. I don't want SproutSearch linking to so much spam, so I need to find a way to remove many of these listings.

It is not practical for me to read 8 million blogs, so I need an automated way to detect spam. Many spam blogs use the same words over and over, so I wrote a program to count the number of repeated words. Most spam blogs also seem to use a similar number of words in every post, so I wrote another program that computes the standard deviation of the number of words per post; a low value means the post lengths barely vary. Using these metrics, I will make a program that flags potential spam so I can review and delete it.
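A rough sketch of how the two metrics and the flagging step could fit together is below. It is not the actual SproutSearch code. The function names, the simple lowercase word tokenizer, the "repeated-word ratio" (the fraction of word occurrences that repeat a word already seen in the blog) and both thresholds are hypothetical choices made for illustration and would need tuning against real data.

```python
import re
import statistics

# Hypothetical tokenizer: lowercase runs of letters/apostrophes count as words.
WORD_RE = re.compile(r"[a-z']+")


def words(text):
    return WORD_RE.findall(text.lower())


def repeated_word_ratio(posts):
    """Fraction of all word occurrences in a blog that repeat an earlier word."""
    tokens = [w for post in posts for w in words(post)]
    if not tokens:
        return 0.0
    return 1 - len(set(tokens)) / len(tokens)


def words_per_post_stdev(posts):
    """Population standard deviation of the word count per post."""
    counts = [len(words(p)) for p in posts]
    if len(counts) < 2:
        return None  # too few posts to say anything about variation
    return statistics.pstdev(counts)


def looks_like_spam(posts, max_repeat_ratio=0.6, min_stdev=5.0):
    """Flag a blog for manual review (thresholds are illustrative guesses)."""
    ratio = repeated_word_ratio(posts)
    sd = words_per_post_stdev(posts)
    # Flag if the vocabulary is highly repetitive, or if post lengths barely vary.
    return ratio > max_repeat_ratio or (sd is not None and sd < min_stdev)


spam = [
    "buy cheap pills buy cheap pills now",
    "buy cheap pills buy cheap pills today",
    "buy cheap pills buy cheap pills here",
]
normal = [
    "I went hiking today and saw a deer.",
    "Short note.",
    "Here is a much longer post about my trip to the mountains with friends last weekend.",
]
print(looks_like_spam(spam), looks_like_spam(normal))
```

Flagging rather than deleting keeps a person in the loop: the metrics only narrow 8 million blogs down to a list worth reviewing by hand.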