Monday, September 22, 2008

The shocking story behind the banning of No Minister



The banning of No Minister from various government departments created a bit of a stir.
Was it all part of a dastardly wicked government plot? Or do we bloggers need to mind our language? Either way, the technology allows such plotting, it seems.
I present the following correspondence between myself and MailMarshal’s Auckland-based marketing man Adrian Duigan to help you decide.

Our WebMarshal product has two mechanisms for determining the nature of a website. One is a filtering list, the other is lexical analysis scripts (essentially an advanced keyword search).
I have checked your blog site against our filtering list where it is listed under the category of “Newsgroups / Bulletin Boards / Blogs”.
This would seem to be an accurate classification of your site to me. You can check any site you wish against our database yourself here: http://www.marshal.com/products/webmarshal/marshalfilterlist/result.asp
This leaves our lexical analysis scripts. We provide a number of lexical analysis scripts to customers as part of the default rules that ship with our product. These scripts look for various arrangements of words and phrases. The scripts are weighted, requiring multiple instances of words to build a score which will trigger a threshold. Scripts are provided to look for profanity, hate, racism, and other content that customers consider to be objectionable. These scripts are not turned on by default, they are provided to customers for convenience so they don’t have to create them from scratch. The customer determines if they want to turn on a particular script to help enforce their workplace acceptable use policies.
I have checked your site against the default scripts that we provide customers and this seems to be the source of your problem. Your blog triggers our profanity script because of the presence of various potentially offensive words.
Words like “tits” and “pricks” in isolation will not trigger the script, but in context with other phrases like “large cock” a picture forms of the nature of your blog that triggers our product. The problem is not one of nudity but the language used on your site.
My suggestion is that you avoid using these kinds of words if you want government employees to be able to access your web site while they are at work.
Many of our customers do not want their employees exposed to potentially offensive language – even if it is in a humorous context. You are welcome to download a trial of WebMarshal from our website for the purposes of testing it against your site if you wish.
Incidentally, I have noted comments about WebMarshal on your blog site. For the sake of clarity, WebMarshal does not scan images. So, pictures of chicken meat and smiling politicians are simply not a factor when determining the nature of a web site for WebMarshal.

But some other websites seem more offensive than No Minister, I said.

Without more information I cannot really explain this discrepancy as it could be down to any number of factors (different users, different departments, accessing the sites at different times of day, etc). If you could point me towards some of these blogs where the issue is being discussed I might be able to offer some insight into this apparent contradiction.
As for government conspiracies, the information I have provided wouldn't rule that out. "They" just might be a little more clever than you think "they" are.

I suggested Adrian look at some other blogs.

Indeed these sites do make for entertaining reading and I think I have a better insight into your concerns regarding deliberate political censorship. I don't believe that is happening. I will try to keep this brief as it can get complex, and please forgive my language.
No Minister triggers our default profanity script at MOH - that much is clear from the blog at welllingtonhive - the description of why WebMarshal triggered is exactly what I would expect to see and replicated for myself.
I noticed on a few other blogs that words like "tw*t" and "sh*t" appeared but did not seem to be blocked by WebMarshal. These instances seem to be isolated and infrequent. Words have different weightings in WebMarshal.
A word like "tit" might score 3 points while a word like "c**t" might score 15. The trigger level required is 60 points. No Minister scores well over 60. I am not sure if you are aware of just how much profanity is on No Minister, but it is quite a lot.
Part of the problem is links to other blog archives containing profanity in the title, not just the current articles. The words "sh*t", "f**k" and "oral sex" all appear on No Minister. Combined with all of the other words mentioned earlier No Minister scores around 75 points.
Some of these other blogs are scoring 30-40 points, but not enough in total to be blocked by WebMarshal. While they appear to be less offensive to a casual reader, WebMarshal is evaluating every word on the site, so it is seeing every instance and adding that to the score.
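For the technically minded, the weighted scoring Adrian describes can be sketched in a few lines of Python. This is only an illustration and not WebMarshal's actual script: the placeholder words, the function names, and the choice to block at exactly 60 points (rather than only above it) are assumptions. The weights of 3 and 15 and the threshold of 60 are the figures from his email.

```python
import re

# Placeholder vocabulary: the real word list is not public.
# The weights 3 and 15 are the example figures quoted in the email.
WEIGHTS = {"mildword": 3, "strongword": 15}
THRESHOLD = 60  # trigger level quoted in the email


def profanity_score(text, weights=WEIGHTS):
    """Add up the weight of every occurrence of a listed word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(word, 0) for word in words)


def is_blocked(text, threshold=THRESHOLD):
    """Assumption: a page is blocked once its score reaches the threshold."""
    return profanity_score(text) >= threshold


heavy_page = "strongword " * 5                        # 5 x 15 = 75 points
lighter_page = "strongword strongword mildword mildword"  # 30 + 6 = 36 points
```

Under these assumed weights, `heavy_page` is blocked while `lighter_page` is not, which mirrors the 75 points versus 30-40 points comparison above: every occurrence counts, so frequency matters as much as severity.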

So what can we do now? Mind our language?

It is not that easy I am afraid. WebMarshal categorizes websites as users visit them. When the first MOH visitor went to No Minister, WebMarshal read the site and thought "profanity" and categorized your site as 'Adult & Nudity'. When the next user tried to visit your blog WebMarshal didn't bother reading it again as it has already determined your site violates policy.
For MOH users to be able to visit your site again, you will have to reduce the instances of profanity, but also get the MOH system administrator to delete No Minister out of 'Adult & Nudity' category so that WebMarshal doesn't remember categorizing it.
I very much doubt that MOH will comply with your request but they may comply with their own employees/users complaining that WebMarshal has misclassified your site.
So, I suggest you ask MOH readers of your blog to ask their system administrator to re-classify it.
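(An aside for the technically minded: the "categorize once, then remember" behaviour described above amounts to a cache. The Python sketch below is a guess at the general idea only. The class name, the `forget` method, and the toy classifier are all hypothetical and say nothing about how WebMarshal is really built.)

```python
class CategoryCache:
    """Remember the category given to a site the first time it is visited."""

    def __init__(self, classify):
        self._classify = classify   # function: page text -> category name
        self._categories = {}       # site -> remembered category
        self.scans = 0              # how many times a page was actually read

    def category_for(self, site, fetch_text):
        # Only read the page if this site has never been categorized.
        if site not in self._categories:
            self.scans += 1
            self._categories[site] = self._classify(fetch_text())
        return self._categories[site]

    def forget(self, site):
        # Stands in for the administrator deleting the site from a category.
        self._categories.pop(site, None)


def toy_classify(text):
    """Hypothetical classifier using a placeholder word."""
    return "Adult & Nudity" if "badword" in text else "Blogs"


cache = CategoryCache(toy_classify)
first = cache.category_for("example-blog", lambda: "a post with badword in it")
# The blog cleans up its language, but the old category is still remembered.
second = cache.category_for("example-blog", lambda: "a perfectly polite post")
cache.forget("example-blog")
third = cache.category_for("example-blog", lambda: "a perfectly polite post")
```

In this sketch `second` is still 'Adult & Nudity' even though the page is now clean, and only after `forget` does the site get read again. That is why cleaning up the language alone would not be enough.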
One more thing to keep in mind. Context is important to WebMarshal. When you use words like "tit" and "cock" in isolation WebMarshal might give these words a score of 3-5 points.
When you use words like "large" before them, WebMarshal multiplies the significance of the word so you might end up scoring 20 points for things like "large" "cock".
Essentially, WebMarshal will ignore words like "teen" and "pussy" by themselves but put them together and you sound a lot like a porn site.
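The context rule can be sketched the same way. Again, this is a rough Python illustration under assumptions, not the real script: the placeholder words, the multiplier of 4, and the 20-point phrase score are invented so that the numbers line up with the "3-5 points alone, 20 points in context" example in the email.

```python
import re

BASE = {"rudeword": 5}                        # assumed score for a word on its own
MODIFIERS = {"large": 4}                      # assumed multiplier for a preceding word
PHRASES = {("plainword", "otherword"): 20}    # words ignored alone, scored as a pair


def context_score(text):
    """Score words, boosting them when a modifier or phrase partner precedes them."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, word in enumerate(words):
        previous = words[i - 1] if i > 0 else None
        if word in BASE:
            # A preceding modifier multiplies the base score.
            total += BASE[word] * MODIFIERS.get(previous, 1)
        if previous is not None and (previous, word) in PHRASES:
            # Two individually harmless words score only when adjacent.
            total += PHRASES[(previous, word)]
    return total
```

With these assumed values, `context_score("rudeword")` gives 5 and `context_score("large rudeword")` gives 20, while `"plainword"` alone scores nothing and `"plainword otherword"` scores 20.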

3 comments:

Psycho Milt said...

As I suspected, the banning of No Minister was down to Psycho Milt's foul mouth, rather than yet another fiendish and corrupt Labour conspiracy. Sorry chaps.

I do recall someone commenting last year that my language would get No Minister blocked by filtering software, and I decided at the time that there's not much point in having a blog if you let some outside party tell you what you can write.

Oswald Bastable said...

It is quite right that government servers block blogs.

The worker drones should all be- well...working!

No reading blogs, newspapers, cartoons or looking for love on Trade Me!

FAIRFACTS MEDIA said...

PM, while the thought of conspiracy crossed my mind, I was awaiting further evidence before finalising a judgement.

I never thought we had more profanities than Whale Oil, for example.

But as Adrian Duigan notes, the technology for conspiracy is possible.