Now excluding BBC sport articles

April 22nd, 2007

Revisionista now explicitly excludes BBC sport articles from its monitoring, and I’ve deleted any from the last couple of months. These articles generate a vast number of revisions, mostly due to score updates during games, which wastes News Sniffer resources that could be better spent monitoring other articles.

Upgrade: wym comment cleanup, downtime, and improved search

April 20th, 2007

News Sniffer was down for a couple of hours yesterday afternoon whilst I upgraded the software that runs it. This fixes a few display bugs and, more importantly, patches another problem where BBC comments were being misclassified as censored. The misclassification was due to a mistake in handling British Summer Time, so it has only been prevalent since the end of March. We double-checked all censored comments from the last few months and removed any misclassified ones.

Due to the downtime whilst we cleaned and reindexed the database, some news articles and BBC comments were not added to the database and so will not be tracked. If you’re looking for a particular news article published yesterday afternoon, it’s likely to be missing. Apologies for the inconvenience.

The biggest new feature is a vastly improved search system.

It works like most search engines: just type in keywords and you get results. You can be a bit more advanced too, though. Say you want to find all censored BBC comments by the author ‘John Smith’. Do a search for:

author:"john smith"
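The post doesn’t describe how the search engine itself works, but the query syntax is easy to illustrate. Here is a minimal Python sketch, assuming a hypothetical parse_query function that separates free-text keywords from field:value filters. Only the author field is confirmed by this post, so the KNOWN_FIELDS set is an assumption.

```python
import shlex

# Hypothetical: "author" is the only field this post confirms; others would go here.
KNOWN_FIELDS = {"author"}


def parse_query(query: str) -> tuple[list[str], dict[str, str]]:
    """Split a search string into free-text keywords and field:value filters."""
    keywords: list[str] = []
    fields: dict[str, str] = {}
    # shlex keeps quoted phrases together: author:"john smith" -> 'author:john smith'
    for token in shlex.split(query):
        name, sep, value = token.partition(":")
        if sep and name.lower() in KNOWN_FIELDS:
            fields[name.lower()] = value.lower()
        else:
            keywords.append(token.lower())
    return keywords, fields
```

For example, parse_query('censored author:"John Smith"') gives (['censored'], {'author': 'john smith'}); lower-casing everything is an assumption that matches the lower-case example above.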


New domain name

April 9th, 2007

News Sniffer now has its own domain name: www.newssniffer.co.uk
The old address (newssniffer.newworldodour.co.uk) will automatically redirect you, but update your bookmarks!

No updates for the last 2 days

March 1st, 2007

Apologies, but due to a VPN problem News Sniffer has not been monitoring news articles or Have Your Say forums for the last 2 days. It’s all working again now, though. I’ll set up an alerting system to prevent this happening again.
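The post doesn’t say what form the alerting will take. As a minimal sketch, assuming the fetcher records the time of its last successful fetch, a periodic check could flag a silent gap like this one. The two-hour threshold and the send_alert callback are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Hypothetical threshold: how long the fetcher may go quiet before we alert.
MAX_SILENCE = timedelta(hours=2)


def is_stale(last_success: datetime, now: Optional[datetime] = None,
             max_silence: timedelta = MAX_SILENCE) -> bool:
    """Return True if the last successful fetch is older than max_silence."""
    now = now or datetime.now(timezone.utc)
    return now - last_success > max_silence


def check_and_alert(last_success: datetime, send_alert: Callable[[str], None],
                    now: Optional[datetime] = None) -> bool:
    """Call send_alert (e.g. an email hook) if monitoring appears to have stalled."""
    if is_stale(last_success, now):
        send_alert(f"No successful fetch since {last_success.isoformat()}")
        return True
    return False
```

Run from a scheduler every few minutes, a check like this turns a two-day silent outage into an alert within a couple of hours.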

BBC fixes RSS feeds – breaks Watch Your Mouth

December 8th, 2006

When the BBC discovered News Sniffer, I was invited to discuss it on their techie lists. I mentioned a few of the problems I’d had with the feeds, such as duplicate entries, a lack of useful caching HTTP headers and the huge size of the feeds. In response, they looked into it and fixed the duplicate entries within a couple of weeks.
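For context, useful caching headers are ones such as ETag and Last-Modified, which let a client make a conditional request and receive a short 304 Not Modified response instead of the whole feed when nothing has changed. Here is a minimal sketch using Python’s standard library; the function names are hypothetical and this is not News Sniffer’s actual fetcher.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Build request headers that echo back the validators from the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url: str, etag: Optional[str] = None, last_modified: Optional[str] = None):
    """Return (body, etag, last_modified); body is None when the feed is unchanged."""
    req = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed since the last fetch: keep the old validators, skip parsing.
            return None, etag, last_modified
        raise
```

If a feed sends neither header, the client has no option but to download and re-parse the full feed on every poll, which is why their absence hurts most when the feeds are large.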

Yesterday they changed the default size of the feeds but also rejigged the RSS format. This broke Watch Your Mouth in a number of ways (mostly affecting only new threads since yesterday):

  • The timestamps of new comments weren’t reported correctly
  • The author details of new comments were missing
  • New thread description details were missing

Due to a combination of these problems, some comments were marked censored when they had in fact been published. I’ve adjusted Watch Your Mouth to handle these changes and it’s working OK again now. I’ve also run the clean-up scripts, so any published comments marked censored have been restored; you might notice a bunch of “censored” comments disappear from the indexes.

These kinds of problems are to be expected when you’re monitoring data that’s under someone else’s control (especially in ways they might never have intended). I just need to keep an eye on the situation and make alterations when problems arise.

Calling it an “arms race” isn’t quite right, because there is no evidence that the BBC are deliberately making life difficult for us. In fact, these changes are helpful.

UPDATE: Due to the combination of malformed BBC RSS timestamps (hours going from 0000 to 2400?!) and a bug in Watch Your Mouth, we’ve been missing a lot of potentially censored comments on some threads for the last 3 or 4 days. I’ve now written a workaround for this quirk, so things should be back to normal.
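The workaround itself isn’t described. Here is a minimal Python sketch of one way to tolerate such timestamps, assuming RFC 822-style pubDate values (e.g. "Fri, 08 Dec 2006 24:15:00 GMT") and assuming an hour of 24 means that many minutes past midnight on the following day. The parse_feed_timestamp name is hypothetical. Without the fix, the standard library’s email.utils.parsedate_to_datetime raises ValueError on hour 24.

```python
import re
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

# An hour field of "24" directly after a space, e.g. "... 2006 24:15:00 GMT".
_HOUR_24 = re.compile(r"(?<= )24:(\d{2}:\d{2})")


def parse_feed_timestamp(raw: str) -> datetime:
    """Parse an RFC 822-style date, treating hour 24 as hour 00 of the next day."""
    fixed, replaced = _HOUR_24.subn(r"00:\1", raw, count=1)
    parsed = parsedate_to_datetime(fixed)
    if replaced:
        # "24:mm:ss" on day D becomes "00:mm:ss" on day D + 1.
        parsed += timedelta(days=1)
    return parsed
```

Adding the day after parsing, rather than editing the date string, lets datetime handle month and year rollover (31 December 24:00 becomes 1 January 00:00).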

News Sniffer used by NHS Blog Doctor

December 4th, 2006

The NHS Blog Doctor blog used Revisionista to expose a bit of a “cover-up” on a BBC News article.

Basically, NHS Blog Doctor criticised a BBC story on babies with milk allergies and even made a formal written complaint. The article was then changed, and readers started accusing NHS Blog Doctor of misrepresenting the BBC. They never received a reply from the BBC about their complaint. Not even an acknowledgement.

They used News Sniffer to show the article had been changed. Go read the whole thing.

The Revisionista diff of the particular change is here.

Revisionista parser changes – BBC flurry

November 2nd, 2006

I’ve tweaked the way BBC news articles are parsed for Revisionista. Unfortunately, this means you’ll see a flurry of new revisions with no actual changes to the content (though the whole article will be marked as changed).
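The post doesn’t describe how Revisionista decides that a new revision exists, but the flurry is easy to explain if, as assumed here, it compares the parser’s output for each fetch against the previous version. In this minimal Python sketch (function names hypothetical), the comparison runs on parsed text, so changing the parser changes the output for every article and each one looks revised once.

```python
import hashlib
import re
from typing import Optional


def fingerprint(parsed_text: str) -> str:
    """Hash the parser's output, ignoring differences that are only whitespace."""
    normalised = re.sub(r"\s+", " ", parsed_text).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def is_new_revision(previous: Optional[str], parsed_text: str) -> bool:
    """A fetch counts as a new revision when its fingerprint differs from the last one."""
    return previous != fingerprint(parsed_text)
```

Under this model, one way to avoid a flurry would be to re-parse the last stored copy of each article with the new parser before comparing, though that requires keeping the raw HTML.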

I also fixed the Guardian news article title parsing – they changed things around a bit. This shouldn’t result in any flurries.

BBC Editors blog about News Sniffer

October 31st, 2006

The BBC Editors blog has mentioned News Sniffer today.

It’s largely just marginalised us, but it’s rather ambiguous. When they suggest that Revisionista will not find examples of bias, I can’t decide if they mean that the BBC is not biased, or that they are just very good at it.

Some of the recommended revisions are interesting, but maybe we need a comments feature so people can explain and discuss their recommendations.

They also put “censoring” in quotation marks when describing their removal of ‘Have Your Say’ comments. I’m not sure what this is supposed to mean either. Is it not censorship when they remove comments? Or do they not remove comments?

See the top recommended censored comments for some interesting examples of “censorship”.

Watch Your Mouth updates

October 30th, 2006

Back end changes

I rolled out a new version of News Sniffer last night. A lot of work went into rejigging the censored comment detection to make sure we don’t make mistakes. I also wrote a system that confirms censored comments by HTML scraping, which gives us a way to double-check censorship.
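The scraping system isn’t described in detail. As a minimal sketch, assuming each published comment on a thread page carries an id attribute of the form comment-NNN (a hypothetical markup convention, not the BBC’s confirmed HTML), confirmation amounts to checking which suspected comment IDs are absent from the page. It uses Python’s built-in html.parser.

```python
from html.parser import HTMLParser
from typing import Iterable, Set


class CommentIdCollector(HTMLParser):
    """Collect comment IDs from elements whose id attribute looks like 'comment-<n>'."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for bare attributes, so guard before string checks.
            if name == "id" and value and value.startswith("comment-"):
                self.ids.add(value[len("comment-"):])


def confirm_censored(thread_html: str, suspected_ids: Iterable[str]) -> Set[str]:
    """Return the suspected IDs that really are missing from the published page."""
    collector = CommentIdCollector()
    collector.feed(thread_html)
    collector.close()
    return {cid for cid in suspected_ids if cid not in collector.ids}
```

A comment missing from both the feed and the rendered page is much stronger evidence of removal than either check alone.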

This also allowed me to check the entire backlog of censored comments. A number of comments that we thought were censored turned out not to be, and those are now fixed. We also found a number of comments that we didn’t know had been censored.

Apologies for this. My understanding of the BBC HYS RSS feeds was flawed (to be frank, largely due to some brain-deadness on the part of the BBC forum software).

So you might notice that some existing censored comments have disappeared while others have appeared for the first time. I’ll be running the confirmation script regularly to monitor the new checking system (though less often than the other scripts, as it’s fairly intensive).

Front end changes

The “Recommended Comments” page now lists the latest recommended comments, rather than the most recommended as before. Comments ordered by number of recommendations can now be found on the “Top Recommended Comments” page.

The thread listing pages now display the number of published comments alongside the number censored. This gives an idea of how busy a thread is (you might expect busier threads to be more heavily censored).

And lastly, the thread display page now includes the BBC HYS description, which gives a bit of background to the thread and links to any news articles that might have prompted it.

Thanks to datamining.typepad.com and currybet.net for the feedback that led to some of these improvements.