The Technology of Tracking


No, not the Verichip and its “the end times are here” fan club.

Just plain old tracking of who’s doing what on the internet. The Christian Science Monitor recently published a non-technical article on the difficulties of accurately tracking how many people visit their Web site. “People” is the operative word here. The beauty of the Web from a tracking perspective is that you’ve got a very precise record of how the machines involved are interacting; knowing something about the people attached to those machines is something else entirely.

With didtheyreadit’s recent, brief moment of email tracking infamy, a million and one discussions of how one might track RSS feed usage (including FeedBurner’s excellent update to their tracking reports), and — of course — MarketingSherpa’s belated realization that email open and clickthrough reporting may not be all that they’re cracked up to be, a couple of things seem to be happening.

Companies are starting to pay attention to online operations again, and asking the right sort of questions: who is coming to my site/getting my emails/reading my RSS feed? What are those people doing when they access the content I’m putting out there?

Companies are realizing that this tracking is a lot harder than it seems. While DoubleClick, 24/7, and a host of smaller companies offer tools (some better, some worse) to track and analyze Web traffic and email activity, relatively few organizations have the money to spend on those sorts of tools. Even fewer have any idea what to do with the data once they have it.

We’ve mostly moved past using the httpd access_log for purposes that nature never intended, but even when tracking tools use more user-focused metrics, we often don’t know what those metrics are or what assumptions they make. Because there are machines involved in every step of online business, we often opt for the comforting illusion that we therefore have volumes of bulletproof data about users and their actions, when that’s just not the case.
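To make the point about hidden assumptions concrete, here is a rough sketch in Python of how a "visitor count" pulled from an access log shifts with the definition of a visitor. Everything in it is an assumption for illustration: the log lines are made up (using documentation-only IP addresses), the layout is assumed to be Apache's Combined Log Format, and `count_visitors` is a hypothetical helper, not how any particular tracking product works.

```python
import re

# Assumed layout: Apache Combined Log Format
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Made-up sample data. The first three requests share one IP address
# (think of an office proxy) but come from two different browsers.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows)"',
    '203.0.113.5 - - [10/Oct/2004:13:56:02 -0700] "GET /about.html HTTP/1.1" 200 1450 "-" "Mozilla/5.0 (Windows)"',
    '203.0.113.5 - - [10/Oct/2004:13:57:10 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Macintosh)"',
    '198.51.100.7 - - [10/Oct/2004:20:15:44 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows)"',
]


def count_visitors(lines):
    """Count 'visitors' under two different assumptions about what one is."""
    requests = 0
    unique_ips = set()             # assumption 1: one IP address = one person
    unique_ip_agent_pairs = set()  # assumption 2: one (IP, browser) pair = one person
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        requests += 1
        unique_ips.add(match.group("host"))
        unique_ip_agent_pairs.add((match.group("host"), match.group("agent")))
    return {
        "requests": requests,
        "unique_ips": len(unique_ips),
        "unique_ip_agent_pairs": len(unique_ip_agent_pairs),
    }


print(count_visitors(SAMPLE_LOG))
```

On this sample the same four requests come out as two visitors under the first assumption and three under the second. Neither number is the count of people: the log cannot say whether the evening request came from someone who had already visited from the office that afternoon.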

Users are (for the moment) not hardwired into their computers, and it’s the computers that we have data on, not the users. We can extrapolate from machine to user pretty well, but it’s essential that we understand the assumptions that we’re making and the attendant limitations.