
Monday, June 07, 2004

Using Junk Mail Filtering techniques to Spot Common Threads

So a couple of weeks ago, the good folks at O'Reilly MacDevCenter posted a concise article explaining how the Mail.app junk mail filter works. The article, "The Fight Against Spam, Part 2," is somewhat light on details, but it is a GREAT overview. If you've ever wondered how junk mail filters work on your Mac, you should click over and check it out.

I'll spare you the gory details, but one aspect of Mail.app's junk mail filter is that it tries to find clusters of messages in a "phase space." This space is a multi-dimensional place with one dimension for each word in the English language. Messages that share many of the same words wind up close to each other in this imaginary space. Mail.app looks for clusters of messages surrounding known SPAM messages and guesses that, since they have a lot of the same words, they're probably SPAM as well. (This is a gross oversimplification, of course...)
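To make the "phase space" idea a bit more concrete, here is a toy sketch in Python. It is NOT how Mail.app actually implements its filter; it just assumes a simple bag-of-words model where each distinct word is one dimension, measures closeness with cosine similarity, and lets the few nearest labeled messages vote. The function names (`vectorize`, `cosine`, `looks_like_junk`) and the `k=3` neighborhood size are my own hypothetical choices.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Map a message to a sparse word-count vector: one dimension per distinct word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 1.0 means identical word proportions."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def looks_like_junk(message, labeled, k=3):
    """Let the k labeled (text, is_junk) messages closest to `message` vote."""
    vec = vectorize(message)
    nearest = sorted(labeled, key=lambda item: cosine(vec, vectorize(item[0])),
                     reverse=True)[:k]
    junk_votes = sum(1 for _, is_junk in nearest if is_junk)
    return junk_votes * 2 > len(nearest)  # strict majority of the neighbors

labeled = [
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("win cash now click here", True),
    ("meeting notes for the cocoa project", False),
    ("lunch tomorrow to discuss the project", False),
]
```

With those made-up samples, `looks_like_junk("buy cheap pills here now", labeled)` lands in the neighborhood of the three junk messages, while "notes from the project meeting" sits closest to the two project messages and is left alone.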

So I was just thinking... A lot of my online experience is in mailing lists and newsgroups. I scan these visually, using subject lines as a key. Mailing list and newsgroup threads tend to wander off from the original topic eventually, but the subject line frequently stays the same. Wouldn't it be nice to have a system like the junk mail filter that scans messages sent to a mailing list and clusters similar messages into a "thread"?
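One naive way to try this, sketched in Python under some loud assumptions: ignore subject lines entirely, turn each message body into a bag-of-words vector, and make a single greedy pass that attaches each message to the most similar existing "thread" (compared against that thread's summed word vector), or starts a new thread if nothing is similar enough. The `cluster_into_threads` name and the 0.3 similarity threshold are hypothetical; a real list archive would need stop-word removal and tuning.

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Hypothetical tokenizer: lowercase words, one dimension per distinct word.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    norms = (math.sqrt(sum(n * n for n in a.values()))
             * math.sqrt(sum(n * n for n in b.values())))
    return dot / norms if norms else 0.0

def cluster_into_threads(bodies, threshold=0.3):
    """Greedy single pass over message bodies; subject lines are ignored.
    Returns a list of threads, each a list of message indices."""
    threads = []  # each entry: (summed word vector, list of message indices)
    for i, body in enumerate(bodies):
        vec = vectorize(body)
        best, best_score = None, threshold
        for thread in threads:
            score = cosine(vec, thread[0])
            if score >= best_score:
                best, best_score = thread, score
        if best is None:
            threads.append((vec, [i]))  # nothing close enough: new thread
        else:
            best[0].update(vec)         # fold the message into the thread's vector
            best[1].append(i)
    return [members for _, members in threads]

bodies = [
    "How do I set up an NSTableView data source?",
    "Anyone going to WWDC this year?",
    "Your NSTableView data source must implement numberOfRowsInTableView",
    "Going to WWDC this year? See you there",
]
```

Here `cluster_into_threads(bodies)` groups the two NSTableView messages together and the two WWDC messages together, regardless of which subject lines they were posted under. A single greedy pass is order-dependent, which is the price paid for its simplicity.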

Even more fun would be to use OpenGL to create a 3D representation of the phase space clusters, possibly with links between clusters. The links could be labeled with concepts or word groups that are the same (or similar) in different clusters...
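Leaving the OpenGL part aside, labeling a link between two clusters could start out very simply. The Python sketch below assumes a "concept" is just one of a cluster's most frequent words (after dropping a tiny, hypothetical stop-word list) and labels the link with the words both clusters have in their top `n`. The names `top_words` and `link_label` and the default `n=10` are illustrative only; catching merely *similar* concepts would need something smarter, such as synonym lists.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "i", "is", "in", "for", "it"}

def top_words(messages, n=10):
    """The n most frequent non-stop-words across all messages in one cluster."""
    counts = Counter(
        word
        for text in messages
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in STOP_WORDS
    )
    return {word for word, _ in counts.most_common(n)}

def link_label(cluster_a, cluster_b, n=10):
    """Label the link between two clusters with the top words they share."""
    return sorted(top_words(cluster_a, n) & top_words(cluster_b, n))

cocoa = ["NSTableView data source question", "data source for an outline view"]
bindings = ["bindings versus a data source", "bindings make the data source optional"]
```

For these two made-up clusters, `link_label(cocoa, bindings)` comes out as `["data", "source"]`, which is the kind of word group that could be drawn along the link between them.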

Just an idea...

Wednesday, June 02, 2004

General Service Level of Orkut

People who know me know that I have an urge to use the coolest, newest technology. Except possibly for mobile phones... I'm burnt out on mobile phones. But when an Orkut invitation came my way, I was overjoyed. I thought, "Orkut! Cool! I'm getting invited to an invitation-only service. Someone loves me!" So I'm pretty happy with Orkut. I've used Ryze and LinkedIn before. My wife likes Tribe, and I'm sure I know someone out there using Friendster. And just about everyone I know in the Bay Area has spent time looking at Craig's List, even if it's just to buy a toaster. All this is simply my way of saying that I've used a number of online FOAF and community-building sites.

But I've noticed a lot of service outages lately. Many times I've tried to log on, looking for someone's contact info, only to get some sort of 'server not available' error. Ugh. If you really want me to consider Orkut a "critical service," you're going to have to have better uptime.