sproutworks
I have recently made some progress on SproutSearch, my blog directory site. SproutSearch has helped me get a bit more traffic because it adds a massive amount of content to my site. At the moment, it contains about 1.3 million links to blogs.
When I set out to write SproutSearch, it was an exercise in site scraping. Using an HTML parser I had developed earlier, I wrote a simple program that parsed Blogger's "recently updated blogs" XML feed and stored every blog it could find in a database. I set up a cron job to run this script every ten minutes, so even while I am sleeping, my website is finding new blogs for me.
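Here is a minimal Python sketch of that scrape-and-store loop. It is not the original program (which used a homemade HTML parser); it swaps in the standard library's `xml.etree.ElementTree` and `sqlite3`, and it assumes the feed uses a weblogs.com-style changes.xml layout with one `<weblog name="..." url="..."/>` element per updated blog. The sample feed, the `blogs` table, and the function names are hypothetical. Blogs are keyed on URL so that repeated runs don't store duplicates.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical sample in a weblogs.com-style changes.xml layout; the real
# feed's element and attribute names may differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<weblogUpdates version="1" count="3">
  <weblog name="Cooking Notes" url="http://cookingnotes.example.com/" when="12"/>
  <weblog name="Garden Diary" url="http://gardendiary.example.com/" when="40"/>
  <weblog name="Cooking Notes" url="http://cookingnotes.example.com/" when="55"/>
</weblogUpdates>"""


def fetch_feed(url):
    """Download the raw feed bytes from `url` (the feed address is up to you)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_updates(xml_text):
    """Return (name, url) pairs for every <weblog> element that has a url."""
    root = ET.fromstring(xml_text)
    return [
        (w.get("name", ""), w.get("url"))
        for w in root.iter("weblog")
        if w.get("url")
    ]


def store_blogs(conn, blogs):
    """Insert blogs keyed on URL, skipping ones already stored; return how many were new."""
    conn.execute("CREATE TABLE IF NOT EXISTS blogs (url TEXT PRIMARY KEY, name TEXT)")
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO blogs (url, name) VALUES (?, ?)",
        [(url, name) for name, url in blogs],
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    blogs = parse_updates(SAMPLE_FEED)
    print(store_blogs(conn, blogs))  # 2: the duplicate entry is skipped
    print(store_blogs(conn, blogs))  # 0: both blogs are already stored
```

To run a script like this every ten minutes, a crontab entry along the lines of `*/10 * * * * python3 /path/to/scrape.py` would do, with the path changed to wherever the script actually lives.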
Next, I wrote a program to sort the blogs alphabetically and split them into organized pages. As the number of blogs grew, so did the time my script needed to display a page. After reading an article in PHP Magazine about caching, I integrated a caching script into my site. Now each page is saved to a file so that it can be retrieved quickly.
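The PHP Magazine script isn't reproduced here, but the general idea of a file-based page cache fits in a few lines of Python. Treat this as an illustrative sketch only: `cached_page`, the one-hour `max_age` default, and hashing the page key into a file name are choices made for this example, not details of the original site.

```python
import hashlib
import os
import tempfile
import time


def cached_page(cache_dir, key, render, max_age=3600):
    """Return the HTML for `key`, calling render() only when the cached file
    is missing or older than max_age seconds."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the key so arbitrary page names become safe file names.
    name = hashlib.md5(key.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = render()
    # Write to a temp file and rename it, so a reader never sees a half-written page.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)
    return html


if __name__ == "__main__":
    renders = []

    def render_letter_a():
        renders.append(1)  # count how often the expensive render actually runs
        return "<html><body>Blogs starting with A, page 1</body></html>"

    cache = tempfile.mkdtemp()
    cached_page(cache, "letter-a/page-1", render_letter_a)
    cached_page(cache, "letter-a/page-1", render_letter_a)
    print(len(renders))  # 1: the second call is served from the cached file
```

Deleting the cache directory forces every page to be rebuilt on its next request, which is a simple way to refresh the whole site after a big import.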
I left things this way for a while, and it worked well. I started to get more traffic, which was one goal of this project. I noticed that the first page of each letter was getting most of the inbound traffic. For some reason, the search engines don't send people to the hundreds of additional pages beyond page 1.
I might try making some pages with many more blogs per page, maybe about 10,000. Then I will see whether these big pages generate more traffic than my 500-blog pages.
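If page size is just a parameter of the pagination routine, this experiment is cheap to try. Below is a minimal Python sketch, assuming blogs are grouped by the first letter of their name and that names not starting with A-Z go into a hypothetical `#` bucket; `letter_pages` is an illustrative name, not the site's actual code.

```python
from collections import defaultdict


def letter_pages(names, page_size=500):
    """Group names by first letter, sort each group case-insensitively, and
    split it into pages of at most page_size names.

    Returns {letter: [page1_names, page2_names, ...]}; names that don't start
    with A-Z are filed under '#' (a convention chosen for this sketch).
    """
    groups = defaultdict(list)
    for name in names:
        first = name[:1].upper()
        groups[first if "A" <= first <= "Z" else "#"].append(name)
    pages = {}
    for letter, group in sorted(groups.items()):
        group.sort(key=str.lower)
        pages[letter] = [group[i:i + page_size] for i in range(0, len(group), page_size)]
    return pages


if __name__ == "__main__":
    names = ["Blog %04d" % i for i in range(1200)] + ["apple pie", "Zen Garden"]
    for size in (500, 10000):
        pages = letter_pages(names, page_size=size)
        print(size, {letter: len(p) for letter, p in pages.items()})
    # 500 {'A': 1, 'B': 3, 'Z': 1}
    # 10000 {'A': 1, 'B': 1, 'Z': 1}
```

As rough arithmetic (ignoring the partial last page of each letter), 1.3 million links at 500 per page works out to about 2,600 pages, while 10,000 per page works out to about 130.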
The other day, I wrote a program that runs searches on my directory and turns the results into web pages. I typed in a whole bunch of popular keywords and let it generate pages. I am hoping that these keyword pages will attract a new wave of traffic from people searching for niche terms.
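A keyword page generator could be as simple as a case-insensitive substring match on blog names. The original program's search logic isn't described, so everything in this Python sketch is an assumption: the matching rule, the `slugify` file-naming scheme, and skipping keywords that have no matches instead of publishing empty pages.

```python
import html
import re


def slugify(keyword):
    """Turn a keyword into a URL-friendly file name stem, e.g. 'Home Brewing' -> 'home-brewing'."""
    return re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")


def keyword_page(keyword, blogs):
    """Build an HTML page listing (name, url) blogs whose name contains keyword.

    Returns (page_html, match_count)."""
    kw = keyword.lower()
    hits = sorted((b for b in blogs if kw in b[0].lower()), key=lambda b: b[0].lower())
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (html.escape(url, quote=True), html.escape(name))
        for name, url in hits
    )
    title = html.escape(keyword)
    page = (
        "<html><head><title>%s blogs</title></head><body>\n"
        "<h1>%s blogs</h1>\n<ul>\n%s\n</ul>\n</body></html>" % (title, title, items)
    )
    return page, len(hits)


def generate_keyword_pages(keywords, blogs):
    """Return {file_name: page_html} for every keyword with at least one match."""
    pages = {}
    for kw in keywords:
        page, count = keyword_page(kw, blogs)
        if count:  # skip keywords with no matching blogs
            pages[slugify(kw) + ".html"] = page
    return pages


if __name__ == "__main__":
    blogs = [
        ("Home Brewing Tips", "http://brewtips.example.com/"),
        ("Knitting Corner", "http://knitting.example.com/"),
        ("Brewing & Baking", "http://brewbake.example.com/"),
    ]
    for filename, page in generate_keyword_pages(["brewing", "surfing"], blogs).items():
        print(filename, page.count("<li>"))  # brewing.html 2
```

Matching on names alone is crude; searching blog descriptions or post titles would likely give richer niche pages, if that data is stored.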
The next step for me is to categorize the topic pages I have been generating.