Anatomy of a Twitter Bot


Update: If you’re here in search of sample code for a Twitter repost bot, I would strongly recommend going to the Anatomy of a Better Twitter Bot post, which has a much-improved iteration of the LOTD bot code.

Update: Since Fred Twittered this post, it has reached a somewhat larger audience than expected and I’ll add an additional note: there will be some interesting new stuff happening with @lotd. I’ve been chatting back and forth with Daryn Nakhuda, a gentleman and a real programmer, about possibilities, and when you put together his ideas, my ideas, and his ability to write sane, reliable code, cool things should result. Stay tuned.

While the logic of “hide your shame” should dictate that I never, ever reveal the code that runs the Twitter Lyric of the Day Club, there’s been enough interest that I’m just going to suck it up and publish it. For any programmers who end up at this post: yes, the code is a little eccentric, and yes, there are a number of ways that it could fall down and hurt itself. Believe me, I know. That said, here’s the rundown:

To run a variation of this @lotd script, you must be able to understand Perl well enough to make some basic modifications to the script, and must be able to set up a simple database table (phpMyAdmin is your friend). In addition, you need a server or account at a Web hosting service that:

  • Allows you to set up a (small) database.
  • Allows you to run Perl.
  • Has the XML::Simple Perl module installed, or will allow you to install it.
  • Allows you to run scheduled jobs (i.e., cron jobs).

It’s a very simple setup: there’s a single Perl script running on my server that gets the replies posted to @lotd via the Twitter API, loads them into a database table, and then republishes the posts from the @lotd account (again via the API). It’s scheduled to run once every 15 minutes, around the clock. The script uses the XML::Simple and DBI modules, but doesn’t have any other dependencies.
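For the curious, here’s a stripped-down sketch of that flow. To be clear, this is not the actual @lotd script: the endpoints are the Basic Auth Twitter REST API of the day, and the credentials, database name, and table/field names (a posts table with status_id, author, body, and reposted columns) are placeholders invented for illustration.

    #!/usr/bin/perl
    # Sketch of the fetch -> store -> repost loop. NOT the real @lotd
    # script; credentials, database, and table names are placeholders.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::Simple;
    use DBI;

    my $bot  = 'YourTwitterBot';   # the repost bot's Twitter account
    my $pass = 'secret';           # Basic Auth, as the API then allowed
    my $dbh  = DBI->connect('DBI:mysql:database=lotd;host=localhost',
                            'dbuser', 'dbpass', { RaiseError => 1 });

    # 1. Fetch the recent @replies to the bot account.
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get("http://$bot:$pass\@twitter.com/statuses/replies.xml");
    die 'Twitter fetch failed: ' . $res->status_line unless $res->is_success;
    my $xml = XMLin($res->content, ForceArray => ['status'], KeyAttr => []);

    # 2. Load each reply into the database; INSERT IGNORE plus a UNIQUE
    #    key on status_id skips anything we've already seen.
    my $insert = $dbh->prepare(
        'INSERT IGNORE INTO posts (status_id, author, body) VALUES (?, ?, ?)');
    for my $status (@{ $xml->{status} || [] }) {
        my $body = $status->{text};
        $body =~ s/^\s*\@\Q$bot\E\s*//i;   # strip the leading @-mention
        $insert->execute($status->{id}, $status->{user}{screen_name}, $body);
    }

    # 3. Republish anything not yet reposted, again via the API.
    my $queued = $dbh->selectall_arrayref(
        'SELECT id, author, body FROM posts WHERE reposted = 0');
    for my $row (@$queued) {
        my ($id, $author, $body) = @$row;
        my $post = $ua->post(
            "http://$bot:$pass\@twitter.com/statuses/update.xml",
            { status => $body });
        $dbh->do('UPDATE posts SET reposted = 1 WHERE id = ?', undef, $id)
            if $post->is_success;
    }

The UNIQUE key and INSERT IGNORE are what keep the bot from reposting the same reply twice if a run overlaps with the next one or fails partway through.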

To run your own bot using this script, you’ll need to:

  1. Create the Twitter account that will be the repost bot.
  2. Create a database table as described in the next section.
  3. Update the script below:
    • The DB connect information for your database. (Lines 21, 43, 75)
    • The Twitter username/password information for your Twitterbot account. (Lines 6, 84)
    • The regex to remove @YourTwitterBot from replies before it reposts them. (Line 18; a hedged example follows this list.)
  4. Upload the finished script to your server.
  5. Set up a cron job to periodically run your script (@lotd runs every 15 minutes).
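For step 3’s regex and step 5’s cron entry, something like the following should do. This is a hedged illustration; the line numbers cited above refer to the linked script, not to this fragment.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Step 3: strip the leading @YourTwitterBot mention before reposting.
    my $bot  = 'YourTwitterBot';   # your bot's screen name
    my $text = '@YourTwitterBot these are the days of miracle and wonder';
    $text =~ s/^\s*\@\Q$bot\E\s*//i;
    print "$text\n";   # prints: these are the days of miracle and wonder

    # Step 5: a crontab entry that runs the script every 15 minutes,
    # the same schedule @lotd uses:
    #   */15 * * * * /usr/bin/perl /home/you/bin/lotd.pl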

…and that’s pretty much it. As I said in my first email to Fred about this, “the hardest part of making something like the @lotd bot is just having the idea — once you’ve got the idea, Twitter makes it easy to build what you’ve got in your head.”

Good luck, have fun, and let me know about anything interesting that you make!

Update: In response to a couple of questions, I guess I didn’t make it explicit—this stuff is released under the Woody Guthrie license: “This song is Copyrighted in U.S., under Seal of Copyright # 154085, for a period of 28 years, and anybody caught singin it without our permission, will be mighty good friends of ourn, cause we don’t give a dern. Publish it. Write it. Sing it. Swing to it. Yodel it. We wrote it, that’s all we wanted to do.”

Get the Database Table Structure (MySQL CREATE)
Click the link above, copy and paste, and change the table or field names as seems appropriate.
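If the link doesn’t work for you, here’s a guessed-at stand-in, wrapped in DBI so it stays runnable. It is not the actual @lotd table structure, just a minimal shape that matches the sketch in the post above; change the table and field names as you see fit.

    #!/usr/bin/perl
    # A stand-in for the linked MySQL CREATE; the real @lotd schema may
    # differ. Matches the placeholder names used in the post's sketch.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=lotd;host=localhost',
                           'dbuser', 'dbpass', { RaiseError => 1 });
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS posts (
            id        INT UNSIGNED    NOT NULL AUTO_INCREMENT PRIMARY KEY,
            status_id BIGINT UNSIGNED NOT NULL UNIQUE,  -- Twitter's reply id
            author    VARCHAR(20)     NOT NULL,         -- sender's screen name
            body      VARCHAR(255)    NOT NULL,         -- text, mention stripped
            reposted  TINYINT(1)      NOT NULL DEFAULT 0,
            created   TIMESTAMP       DEFAULT CURRENT_TIMESTAMP
        )
    });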

Get the Example Script
Click the link above, copy and paste, and make the updates noted in the post above. If you change the database table or field names, make sure that you also update the script appropriately.

FCC: US Broadband Doesn’t Suck If You Squint Just Right


This via Dave Farber’s IP list. If I tell you that the big news is that the FCC has finally decided to raise the definition of “broadband” from 200Kbps (in one direction) all the way up to 768Kbps, and that “availability” will no longer be determined at the ZIP code level, does that give you a sense of the bizarro broadband fantasy land we’re still living in here in the US?

There’s a link to the actual report (PDF) at the bottom, if you want some more depressing reading.

[Note:  This item comes from friend Ken DiPietro.  DLH]

From: ken 
Date: March 20, 2008 3:49:54 AM PDT
To: Dewayne Hendricks 
Subject: The FCC has released their High-Speed Internet Status report.

Included in this report are some pretty interesting facts, for those of us who follow this kind of stuff.

There are now over 100,000,000 high-speed connections (as defined by exceeding 200Kbps in one direction) in service in the United States. A little over 60,000,000 of those are connected to residential dwellings.

Of those connections, only 5.6% have a greater throughput than …

The total number of connections that have speeds in excess of 100Mbps (in one direction) is a staggering 21,708, as opposed to Japan, which has already achieved close to 100% deployment of …

Over 95% of all lines are serviced by the duopoly. This would be the same duopoly that does not exist, according to AT&T's …

And with a level of hubris that is beyond all concept of reality, we find the FCC stating that 99% of all US ZIP Codes now have at least one broadband provider, a statement that led Commissioner Copps to call the ZIP code methodology "stunningly meaningless." Even better, roughly 85% of all ZIP Codes are estimated to have four or more providers.

And in a move that I can only term better late than never, the FCC has decided that 200Kbps (in only one direction) is no longer a true definition of broadband, and has voted to increase that rate to 768Kbps, which coincidentally is the speed that many of the ILECs provide as entry-level DSL.

The FCC's report, titled "High-Speed Services for Internet Access: Status as of June 30, 2007," can be downloaded here:

A reasonably good review of this report can be found here: 

Twitter Client Feature Request


So let’s say that you’re like me and have finally come to terms with the fact that you like Twitter.

You initially followed a couple of friends and some internet-famous people. Then you added some of the people you work with and some people who always seem to come up with funny things to Twitter. Then some more internet-famous people, and a few genuinely interesting people that you’ve never heard of. And a few more friends. And then a couple of people that you found through @messages sent by the people you already follow.

How many people is that you’re following, now? Do you actually have enough spare bandwidth to make effective use of an input stream that’s made up of the output from 50, or 500, or 5,000 [yes, I’m looking at you, jowyang] people?

Sure, one good answer is to modify Dave Winer’s RSS insight and view Twitter as a river of tweets. You hop in and catch what you can, and don’t worry too much about the stuff that gets carried past you by the current. But I think there’s a limit to how well that works. The different people in my Twitterstream are important and interesting to me for many different reasons; some people I always want to hear from, but I may not want to hear from (and about) all of them all the time.

So here’s what I propose: a Twitter client that (a) allows you to flag each person that you follow as a member of one or more groups, and (b) allows you to dynamically filter which tweets are displayed, based on group.
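To make (a) and (b) concrete, here’s a back-of-the-napkin Perl sketch of the data model and the filter; the handles and group names are made up.

    #!/usr/bin/perl
    # Sketch of the group/filter idea; handles and groups are made up.
    use strict;
    use warnings;

    # (a) Each person you follow is flagged with one or more groups.
    my %groups = (
        fred      => ['friends', 'must-read'],
        jowyang   => ['internet-famous'],
        coworker1 => ['work'],
    );

    # (b) The client filters the displayed stream by group on demand.
    sub filtered {
        my ($want, @tweets) = @_;
        return grep {
            my $who = $_->{from};
            grep { $_ eq $want } @{ $groups{$who} || [] };
        } @tweets;
    }

    my @stream = (
        { from => 'fred',      text => 'a new post on unions and ventures' },
        { from => 'jowyang',   text => 'five trends to watch this week' },
        { from => 'coworker1', text => 'the build is broken again' },
    );

    # Show only must-read right now; switch groups to widen the river.
    print "$_->{from}: $_->{text}\n" for filtered('must-read', @stream);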

With that functionality I can follow all the people who interest me while also ensuring that I don’t miss the output from the small group of people that I’m most interested in. I like both the [shudder] “ambient intimacy” experience of updates from people I know and the “exhibitionist meets stalker” experience of updates from people that I don’t know, and I’d like to be able to keep both of them as a part of my Twitter experience.

Maybe it’s just me, but this feels like something that could be really useful. Any takers?

Admin Note: NYC Web Analytics Meetup


In NYC, interested in Web analytics, and looking for something to do on this coming Tuesday evening? Lucky you!

Following the untimely demise of the former NYC Web analytics meetup group, a friend has started a brand new shiny NYC Web analytics meetup group. He started the group on Friday and scheduled the first meetup for Tuesday, so the extent of the agenda is some casual discussion and some coffee and beer drinking, but it should be a good time.

It’ll be happening at 7PM on Tuesday, January 22nd, at Café Orlin on St. Marks Place (#41, between First and Second Avenues), so swing by if you’re in the area and interested in Web analytics, coffee, or beer.