
sproutworks
May 27th, 2005 4:27 AM PST
I have recently made some advances on SproutSearch, my blog directory site. SproutSearch has helped me get a bit more traffic, because it adds massive amounts of content to my site. At the moment, it contains about 1.3 million links to blogs.

When I set out to write SproutSearch, it was an exercise in site scraping. Using an HTML parser I had developed earlier, I wrote a simple program that parses Blogger's recently updated blogs XML feed and stores every blog it finds in a database. I set up a cron job to run this script every ten minutes, so even when I am sleeping, my website is finding new blogs for me.
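
A minimal sketch of that fetch-and-store step, written in Python for illustration rather than taken from the site's actual code. The feed layout (`weblog` elements with `name` and `url` attributes), the `blogs` table, and the function names are all assumptions. The sample feed is inlined so the example runs without network access.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the real feed's element names may differ.
SAMPLE_FEED = """<weblogUpdates>
  <weblog name="Alpha Blog" url="http://alpha.example.com/"/>
  <weblog name="Beta Blog" url="http://beta.example.com/"/>
  <weblog name="Alpha Blog" url="http://alpha.example.com/"/>
</weblogUpdates>"""


def open_db(path=":memory:"):
    """Open the database and create the (assumed) blogs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blogs ("
        "url TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def store_feed(conn, feed_xml):
    """Insert every blog listed in the feed; return how many were new."""
    before = conn.total_changes
    root = ET.fromstring(feed_xml)
    for node in root.iter("weblog"):
        url = node.get("url")
        if not url:
            continue  # skip entries without a usable link
        name = node.get("name") or url
        # The primary key on url makes repeated runs skip known blogs.
        conn.execute(
            "INSERT OR IGNORE INTO blogs (url, name) VALUES (?, ?)",
            (url, name),
        )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    db = open_db()
    print(store_feed(db, SAMPLE_FEED), "new blogs stored")
```

Because duplicates are ignored at insert time, the script can safely run on a schedule. A crontab entry along the lines of `*/10 * * * * python3 fetch_blogs.py` (the script name is hypothetical) would run it every ten minutes.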

Next I made a program to sort the blogs alphabetically and put them in organized pages. As the number of blogs grew, so did the amount of time my script needed to display a page. I read an article in PHP Magazine about caching, and integrated a caching script into my site. Now each page is saved in a file so that it can be recalled quickly.
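
The two ideas in this step, alphabetical pages and a file cache, can be sketched roughly as follows. This is an illustration in Python under assumed names (`paginate_by_letter`, `cached_page`), not the caching script from the article. It also leaves out cache expiry, which a real directory would need as new blogs arrive.

```python
import os
from collections import defaultdict

PAGE_SIZE = 500  # blogs per page, matching the page size mentioned in the post


def paginate_by_letter(names, page_size=PAGE_SIZE):
    """Group names by first letter, sort each group, and split it into pages."""
    groups = defaultdict(list)
    for name in names:
        first = name[:1].upper()
        # Names that do not start with a letter share one bucket.
        key = first if first.isalpha() else "#"
        groups[key].append(name)
    pages = {}
    for letter, members in groups.items():
        members.sort(key=str.lower)
        pages[letter] = [
            members[i:i + page_size]
            for i in range(0, len(members), page_size)
        ]
    return pages


def cached_page(cache_dir, key, render):
    """Return the saved page for key; on a miss, render it and save it."""
    path = os.path.join(cache_dir, key + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    content = render()  # the slow part: query, sort, and build the page
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return content
```

With this shape, the expensive query and sort run only the first time a page is requested. Every later request is a single file read.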

I left things this way for a while, and it was working well. I started to get more traffic, which was one goal of this project. I noticed that the first page of each letter was getting most of the inbound traffic. For some reason, the search engines won't point people to the hundreds of additional pages that come after the first one.

I might try to make some pages with a lot more blogs per page, maybe about 10,000. Then I will see if these big pages generate more traffic than my 500-blog pages.

The other day, I wrote a program to perform searches on my directory and turn the results into web pages. I typed in a whole bunch of popular keywords and let it generate pages. I am hoping that these keyword pages will attract a new wave of traffic from people searching for niche terms.
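
One way such a keyword-page generator might look, sketched in Python. The substring match on blog names, the page markup, and the function names are all assumptions for illustration. The actual search may work quite differently.

```python
import html
import re


def slugify(keyword):
    """Turn a keyword into a safe file name stem, e.g. 'Home Brew' -> 'home-brew'."""
    return re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")


def search(blogs, keyword):
    """Case-insensitive substring match on the blog name (an assumed strategy)."""
    needle = keyword.lower()
    return [(name, url) for name, url in blogs if needle in name.lower()]


def render_keyword_page(keyword, matches):
    """Build a simple HTML list of the matching blogs."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(name)
        )
        for name, url in matches
    )
    return "<h1>Blogs about {}</h1>\n<ul>\n{}\n</ul>".format(
        html.escape(keyword), items
    )


def build_keyword_pages(blogs, keywords):
    """Return {filename: page} for every keyword with at least one match."""
    pages = {}
    for keyword in keywords:
        matches = search(blogs, keyword)
        if matches:  # skip keywords that would produce an empty page
            pages[slugify(keyword) + ".html"] = render_keyword_page(keyword, matches)
    return pages
```

Skipping keywords with no matches avoids publishing empty pages. Escaping the names and URLs keeps scraped blog titles from breaking the markup.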

The next step for me is to categorize the topic pages I have been generating.