August 28, 2003

Spam

With four main email accounts (for my too-many jobs and personal email), Spam became a real problem for me in early 2003. The Spam in my inbox(es) every day outnumbered the legitimate mail, and I started deleting real email by accident. Filtering had become a necessity, so I got more serious about it.

In March I started using SpamAssassin and haven't looked back. SA uses several methods of spam detection.

Filters -- SA looks for various patterns in the headers and body of an email and adds to a score for each one it finds. You can set your "threshold" level to whatever you're comfortable with. Once the score reaches your threshold, the message is marked as Spam.
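To make the scoring idea concrete, here is a minimal Python sketch. It is not SpamAssassin's code: the rule names, patterns, scores, and threshold value below are all made up for illustration, and a real rule set is far larger.

```python
import re

# Hypothetical rules as (name, pattern, score). Not SpamAssassin's real rule set.
RULES = [
    ("SUBJECT_ALL_CAPS", re.compile(r"^Subject: [^a-z]*$", re.MULTILINE), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.0),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.0),
]

THRESHOLD = 3.0  # assumed value; pick whatever you're comfortable with


def score_message(raw_message):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, score) for name, pattern, score in RULES
            if pattern.search(raw_message)]
    total = sum(score for _, score in hits)
    return total, [name for name, _ in hits]


def is_spam(raw_message, threshold=THRESHOLD):
    """Treat a score at or above the threshold as Spam (an assumption of this sketch)."""
    total, _ = score_message(raw_message)
    return total >= threshold
```

Lowering the threshold catches more Spam at the cost of more false positives; raising it does the opposite.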

Bayesian -- SA also uses a statistical approach popularized by Paul Graham in his article "A Plan for Spam". It works incredibly well by "learning" what you consider Spam and what you consider Ham (real email). In effect, the more email you process with the "learning" software, the more effective your filter becomes.
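The "learning" part can be sketched in a few lines of Python. This is a toy word-frequency classifier with simple smoothing, written only to show the idea; it is neither the exact formula from "A Plan for Spam" nor what SA does internally, and the class and method names are my own.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy word-frequency classifier; a sketch, not SpamAssassin's implementation."""

    def __init__(self):
        # Per label: how many learned messages contained each word.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def learn(self, text, label):
        """Record one message the user has classified as 'spam' or 'ham'."""
        self.messages[label] += 1
        self.words[label].update(set(self.tokenize(text)))

    def spam_probability(self, text):
        """Combine per-word evidence in log space; returns a value between 0 and 1."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # smoothed prior odds
        for word in set(self.tokenize(text)):
            p_spam = (self.words["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.words["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically safe conversion from log-odds to a probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        odds = math.exp(log_odds)
        return odds / (1 + odds)
```

Every call to `learn` shifts the word counts, which is why the filter gets better the more mail you feed it.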

I was "learning" messages today (well, SA was), so I thought I'd post the stats for my primary personal (and oldest) email account. In the roughly 6 months I've been using SA, it's caught nearly 8000 Spam messages. That's 1330+ Spams a month, or 40+ a day, for a single account (and I'm fairly conservative about posting my email address online or giving it away).


Posted by ben at 11:21 AM
Comments (0)

April 12, 2004

GMail "scandal"

What's with the constant battering Google's getting in the press regarding its newest service offering (not yet widely available)?

I see daily op-ed pieces, like this one, with computer experts, privacy rights advocates, and lawmakers all in an uproar over the fact that GMail has potential privacy implications. Of course it does. So do all free or for-pay email services. What's different about Google? According to most of these articles, two main differences from other, similar services concern them:

  • 1GB of email storage
  • Content-specific ads based on text in your email

Update 4/16: Finally others are pointing out that the critics of GMail are a little off base: Matt Howie and Slate Magazine for starters.

The argument goes: since Google is offering you so much storage, the potential for someone getting at private data if your account is hacked is much higher. On top of that, people are generally afraid that the index of keywords used to build the content-specific ads will be archived and used later to track you (and your activities). Usually all the arguments dissolve into one main point: because all this personal data (1GB of email is likely years of data) and this index of keywords (to build the personalized ads) are now on Google's servers, your chances of losing your data to a) marketing companies, b) hackers, or c) a government subpoena are much greater.

Frankly, this argument could be used against ANY online email provider, even your private ISP -- it just seems everyone is complaining that Google is offering too much space. I suppose if I were concerned about that, I could set my own (arbitrary) limit on stored data just like Yahoo's or Hotmail's own limits, right? The same potential breach exists whether you store 1MB or 1GB -- it only takes one email.

What about the ad index data though? Well, EFF was curious too -- turns out Google claims it doesn't keep an index anyway. All that data is created (and thrown away) on the fly.

I bet the NSA and other law enforcement agencies only smirk when they read how all these privacy rights groups are up-in-arms on this one. They've got to be thinking to themselves, "Why would we risk the media attention by going to Google directly? We'll just pull anything we want off the wire." (which surely they can already do with applications like Carnivore and laws like the Patriot Act).

While raising the issues that exist with GMail (and any other online service that stores personal data) is important, it seems many experts simply find privacy rights and related issues to be the current hot topic. I don't understand why they go out of their way to condemn GMail before it's even publicly available. Hotmail and Yahoo have been around for years with the same issues plus a history of vulnerabilities that affected millions of users.

Just like much of life (both personal and professional), it comes down to an issue of trust. Who do you trust to keep your data? Who do you trust to handle the security of this data? Many companies have the potential to collect data on you and your activities these days (think about what your phone company, your ISP, your credit card company / bank, or even your grocery store already knows about you). Do you trust "them" to handle your data with the same level of care that you do? I suspect your answer is no.

On a related note, I played with GMail over the weekend. Some pretty neat features.

In particular, I like their concept of 'labels' to replace 'folders' for email storage. Quite convenient. The way it works is you create 'labels' for things you want to be able to find later (e.g., 2003 Taxes). You simply assign a label (actually you can assign multiple labels too!) to your emails and hit 'archive'. The email disappears from your inbox into your archive. To get back to it, simply click into the label you created.

It also groups emails into conversations, so a reply to an email that's already labeled inherits the same label(s), and you can view them together more easily. In my limited use over the past couple of days, this works pretty well.
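Here is how I picture labels and conversations as a data structure, in a small Python sketch. I have no idea how GMail actually stores any of this; the class, method names, and conversation ids are hypothetical. The point is that a label is just a set of conversations, so one conversation can sit under many labels and replies pick them up for free.

```python
from collections import defaultdict


class Mailbox:
    """Toy model of labels and conversations; all names here are hypothetical."""

    def __init__(self):
        self.inbox = set()                      # conversation ids still in the inbox
        self.labels = defaultdict(set)          # label name -> conversation ids
        self.conversations = defaultdict(list)  # conversation id -> messages, in order

    def receive(self, conversation_id, message):
        """A new message (or reply) joins its conversation and lands in the inbox."""
        self.conversations[conversation_id].append(message)
        self.inbox.add(conversation_id)

    def label(self, conversation_id, *names):
        """Attach one or more labels to a whole conversation."""
        for name in names:
            self.labels[name].add(conversation_id)

    def archive(self, conversation_id):
        """Remove from the inbox; nothing is deleted and labels stay attached."""
        self.inbox.discard(conversation_id)

    def view(self, name):
        """Everything filed under a label, grouped by conversation."""
        return {cid: self.conversations[cid]
                for cid in sorted(self.labels.get(name, ()))}
```

With folders, a message lives in exactly one place; here the same conversation shows up under '2003 Taxes' and any other label you gave it.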


Posted by ben at 03:00 PM
Comments (1)

June 02, 2004

Del.icio.us + Google == PKB?

I finally got a chance to start playing with Del.icio.us this last weekend. In case you missed its release party a couple months ago, Del.icio.us is sorta a social-networked-bookmark site. Simply create an account and start 'bookmarklet'-ing what you're browsing. Super-simple interface lets you add category tags (one or more) to each link plus some metadata. It's quick, it's easy, and it's centralized so you can build and use your bookmarks from anywhere. Joshua, the author, continues to build out the API so you can pull your data out whenever you want.

Anyway, I wanted a simple way to track articles and other goodies I find online (see the 'Notable URLs' to the right -- that's Del.icio.us!). Next time I'm trying to remember where that obscure comparison article on CMS systems is, I know where to look. It's a personal log of what I thought was important each day: articles, references, software, etc. But couldn't it be more?

What I really want is to cache and index the full-text articles I bookmark too. But that would be a time-consuming and resource-intensive task (putting aside the various copyright issues). In my perfect world, Google would help to build my own Personal Knowledge Base, and yours too.

I see it going down like this:

  • Google buys Del.icio.us. (OK, OK. I threw that in. Really, Google only needs to build a way to grab/cache the individual user's Del.icio.us XML feed...)
  • Google incorporates Del.icio.us into their User Profile framework.
  • Google develops a way to scope searches of web content based on Del.icio.us bookmarks.

So now, I can log into my Google Profile, and manage my Del.icio.us bookmarks. More importantly, I can perform Google searches that limit the content searched to only links (and the full-text of those links) found in my own Del.icio.us database. Google (in my fantasy land) would, of course, also allow you to search based on any combo of Del.icio.us tags, as well as (hey, I'm already dreaming here) let me also search other users' Del.icio.us data. I know, for example, Drew will have the best Apple links, or Josh the best Java links (or Bruce Springsteen or late '80s Goth bands, respectively -- whatever, you get the idea *smartass* ).
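As a rough illustration of what a "scoped" search means, here is a tiny Python sketch. Everything in it is an assumption on my part: the `Bookmark` record, the cached `text` field, and the naive substring matching stand in for whatever Google and Del.icio.us would really do.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    url: str
    tags: set
    text: str  # cached full text of the bookmarked page (hypothetical)


def search(bookmarks, query, tags=()):
    """Return URLs that carry every requested tag and whose text contains every query word."""
    words = query.lower().split()
    wanted = set(tags)
    return [
        b.url
        for b in bookmarks
        if wanted <= b.tags and all(w in b.text.lower() for w in words)
    ]
```

Searching another user's data would just mean passing in their list of bookmarks instead of mine.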

I have no idea how much work this would involve for Google. I have no idea what their long-term strategy is, and maybe this makes no sense for them. I do know how they could pay for it though. Just think of the highly targeted AdWords placements they'll be able to deliver when you are searching your own PKB. These are searches you'd be performing on your own personal corpus of bookmarked web content!

Someone at Google should start working on this for us. We know they get time to work on side projects, and this should be one of them! So, uh, I'll be watching my referrer logs for 'google.com'. *knocks wood* *smile*


Posted by ben at 09:45 PM
Comments (0)

June 13, 2004

Del.icio.us Link Log

Since I found Jeff's and Matthew's posts on their Del.icio.us link log script so helpful, I thought I'd post my modifications for others as well. My version does the link log you see to the right (if you're on /). I had slightly different designs on how I wanted to use my Del.icio.us links:

  • I wanted to post a day's worth of my 'noted' URLs (automatically) every day.
  • I did not want a separate blog, with a single post per link (and matching category-to-tag mapping)
  • I decided to just have a single post per day, with each link as a formatted bullet point (w/ tags) in the body of the post.
  • If available, I wanted the extended information from my link posted as well.


I used Jeffrey's script as a base and, with some helpful hints on using MT::Placement from Matthew's script, came up with this 'lightly modified' version of Veen's. Not even worth a copyright, really, but may be helpful to others.

I use cron to run this script just after midnight every day (the script calculates 'yesterday', and grabs those links only). My MT index template is set up to a) grab posts that are NOT in the 'links' category for the main column and recent posts sections (using MTSQLEntries) and b) only grab entries from the 'links' category for the link log column (using <MTEntries category="links">). MT and its development community certainly make this stuff simple.
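For anyone who just wants the gist of the 'yesterday' logic and the one-post-per-day formatting, here is a short Python sketch. It is not the script itself (that one talks to MT and Del.icio.us directly); the link dictionaries and their keys are a made-up stand-in for whatever your feed parser hands you.

```python
from datetime import date, timedelta
from html import escape


def yesterday(today=None):
    """The day the cron job should collect links for."""
    today = today or date.today()
    return today - timedelta(days=1)


def build_post(links, day):
    """Format one day's links as a single post body, one bullet per link.

    Each link is a dict with 'date', 'url', 'title' and optional
    'extended' and 'tags' keys (a hypothetical shape for this sketch).
    Returns None when there is nothing to post.
    """
    items = []
    for link in links:
        if link["date"] != day:
            continue
        item = '<li><a href="{}">{}</a>'.format(
            escape(link["url"], quote=True), escape(link["title"]))
        if link.get("extended"):
            item += ": " + escape(link["extended"])
        if link.get("tags"):
            item += " [" + ", ".join(escape(t) for t in link["tags"]) + "]"
        items.append(item + "</li>")
    if not items:
        return None
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because the job runs just after midnight, computing the date from the clock rather than hard-coding it is what keeps each post lined up with the previous day's links.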


Posted by ben at 10:41 PM
Comments (0)

July 03, 2004

Solving the worldwide VIN shortage

Apparently the worldwide auto industry is running out of VINs. Sounds like the same problem we ran into with IP addresses and I suggest the following solution: Assign every car an IPv6 address. Why not? It's a standard. The math is already done. There's a ton of address space (340 trillion trillion trillion, or so...). Cars will eventually be 'on' the Internet. It makes perfect sense!
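For the skeptics, the math is easy to check in Python. The address count comes straight from the 128-bit address width; the car numbering scheme below is purely hypothetical, and the prefix is the one reserved for documentation, used here as a stand-in for a real registry block.

```python
import ipaddress

TOTAL = 2 ** 128       # IPv6 addresses are 128 bits wide
TRILLION = 10 ** 12

# Documentation-only prefix, standing in for a block a registry might hand out.
REGISTRY = ipaddress.ip_network("2001:db8::/32")


def car_address(serial):
    """Map a car's serial number (hypothetical scheme) onto an address in REGISTRY."""
    if not 0 <= serial < REGISTRY.num_addresses:
        raise ValueError("serial number does not fit in the registry block")
    return REGISTRY.network_address + serial


if __name__ == "__main__":
    print("Total IPv6 addresses:", TOTAL)
    print("In trillion trillion trillions:", TOTAL // TRILLION ** 3)
    print("Car number 1 gets:", car_address(1))
```

Even this single /32 block holds 2^96 addresses, so running out of cars to number is not the problem.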

Now, I know many of you are shaking your collective heads... Assign a single IP address to each car!? Pure insanity. And you're all asking the same question now: What about all the other devices in my car that will require IP connectivity? Will I have to assign each device its own IPv6 address? Won't that be messy? Yes, that *could* be a problem. Really though, basic network router technology is cheap and abundant. Shouldn't each car have a small router/firewall (as standard equipment) to handle IP traffic for internal devices? Shouldn't future IP-enabled auto devices use DHCP and be designed to use an in-car NAT?


Posted by ben at 12:00 PM
Comments (0)

September 18, 2004

Sprint PCS Customer Service

Last week my year-old Treo 600 started acting funny. It was resetting itself and dying randomly -- the only way to bring it back was to put it back on its charger. I finally broke down and completely wiped my install and started re-installing all my software (and data!) from scratch.

It lasted about a day, then while (ironically) at the Verizon store buying Tedd his new phone (birthday gift), it just gave up. I, obviously, immediately suspected sabotage by the tricky Verizon sales people. *wink*

OK, so this must be a hardware problem. To be fair to the Treo, it's survived for a year living with me. I know I've introduced it to several cement floors in that time... so I can't really blame the phone.

The reasons I kept my Sprint account and insured my phone after leaving Medsphere were 1) at the time they were the only company offering reasonable all-you-can-eat data services and 2) their customer service was very good. I figured I'd encounter the same helpful folks.

I called into the support line expecting that they'd diagnose over the phone and ship a new Treo out. While the support person was able to diagnose that the phone needed to be replaced, he couldn't ship one out. I had to go to a store with a technician before they could approve the replacement. The closest store with a technician (I had to ask this!) is in Fountain Valley, 20 minutes away, and not the store less than a mile from my house. Grrr. Fine.

I drove to work last Wednesday, specifically so I could take the phone to Fountain Valley. I love walking into a store and standing around trying to figure out who is supposed to help me. After a few minutes someone noticed me. She wrote up the phone and told me to come back in 30 minutes. I walked the strip mall twice, and then stood around the store for the remaining 29 minutes. Finally someone asked me if I was waiting for something... gah.

Sprint Woman: We are going to have to replace your phone. It will take three to five business days. Then you can come pick it up.

Me: Ummm, this store isn't convenient to get to, especially since there's a store less than a mile from my house (versus 15+ miles). Can you ship the phone to that store instead?

Sprint Woman: Well, you can take your phone there with this receipt saying the phone needs to be replaced, but they'll probably have to diagnose it again anyway.

Me: Well, they can't do that because they don't have a technician there -- that's why I drove here in the first place.

Sprint Woman: Oh yeah, then no, ... you can't.

Me: Okay, so can you ship the new phone to my house and I'll ship back the old one?

Sprint Woman: Oh. ... No, we can't do that.

Me: So in 3-5 business days someone from this store is going to call me and I'll have to drive back up here to pick it up?

Sprint Woman: Yes. Is that OK?

Arrrrgh! Did you not listen to anything we just talked about?! No, it isn't OK. It's lame.

I did get a new (reconditioned) phone on Saturday, which was nice -- it hadn't even been a full 3 business days. But the customer service was handled by a girl who barely communicated what she was doing and acted like a teenager (messing with her own cell phone and chatting with the other reps while "helping me"). When the Vision service wouldn't provision in the store, she said, "Oh. Just try that again later..." leaving me to wonder if I'd have to come back to this store when it didn't work "later".

Why is it that customer service EVERYWHERE is so bad these days?


Posted by ben at 01:30 PM
Comments (0)