Updates not working for the last week
June 14th, 2008
Both Revisionista and Watch Your Mouth have not been updating correctly for about a week. Working again now. Sorry for the outage.
The BBC upgraded their forum software on the Have Your Say site and it broke News Sniffer. If you’re interested, they’ve changed from using the “Thread ID” in their URLs to using a “Forum ID”. I’ve done a quick patch to fix things, but any comments censored in the last week or so are lost to News Sniffer. It’s going to take a while for things to get back up to speed, but it should be done by tomorrow morning.
Thanks to the News Sniffer reader who spotted the lack of updates and got in touch with me. It was a relatively quick fix once I knew about it.
News Sniffer was down for a couple of hours yesterday afternoon whilst I upgraded the software that runs it. This fixes a few display bugs and, more importantly, patches up another problem where BBC comments were being misclassified as censored. The misclassification was due to a mistake in handling British Summer Time, so it has only been occurring since the clocks changed at the end of March. We double-checked all censored comments from the last few months and removed any misclassified ones.
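To illustrate the kind of mistake involved, here is a minimal sketch of converting a UK wall-clock timestamp to UTC. It is not the News Sniffer code; the function names are made up for this example, and it assumes the timestamps arrive as naive UK local times. BST runs from 01:00 UTC on the last Sunday of March to 01:00 UTC on the last Sunday of October. In practice a timezone database (such as Python's zoneinfo module) would do this for you.

```python
from datetime import datetime, timedelta


def last_sunday(year: int, month: int) -> datetime:
    """Midnight on the last Sunday of March or October (both have 31 days)."""
    day = datetime(year, month, 31)
    # weekday(): Monday is 0, Sunday is 6, so step back to the nearest Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def is_bst(utc: datetime) -> bool:
    """True if the given naive UTC instant falls inside British Summer Time."""
    start = last_sunday(utc.year, 3) + timedelta(hours=1)
    end = last_sunday(utc.year, 10) + timedelta(hours=1)
    return start <= utc < end


def uk_local_to_utc(local: datetime) -> datetime:
    """Convert a naive UK wall-clock time to naive UTC (hypothetical helper)."""
    candidate = local - timedelta(hours=1)
    # During BST the clocks are one hour ahead of UTC; otherwise they match.
    return candidate if is_bst(candidate) else local


if __name__ == "__main__":
    print(uk_local_to_utc(datetime(2008, 1, 15, 12, 0)))  # winter: unchanged
    print(uk_local_to_utc(datetime(2008, 6, 15, 12, 0)))  # summer: one hour earlier
```

Comparing a local timestamp against a UTC one without this step leaves every summer comment an hour out, which is enough to make a published comment look like it has vanished.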
Due to the downtime whilst we cleaned and reindexed the database, some news articles and BBC comments were not added to the database, so they will not be tracked. If you’re looking for a particular news article published yesterday afternoon, it’s likely that it is missing. Apologies for the inconvenience.
The biggest new feature is a vastly improved search system.
It works like most search engines, so just type keywords in and you get results. You can be a bit more advanced too though. Say you want to find all censored BBC comments by the author ‘John Smith’. Do a search for:
author:"john smith"
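For the curious, a query like the one above can be split into field filters and plain keywords with a few lines of code. This is only a rough sketch of the idea, not how the News Sniffer search is actually implemented; the parse_query function and the TOKEN pattern are made up for this example.

```python
import re

# One token is an optional field prefix followed by either a quoted phrase
# or a single bare word, e.g. author:"john smith" or just iraq.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(query: str):
    """Split a search string into {field: value} filters and free keywords."""
    fields, keywords = {}, []
    for name, quoted, bare in TOKEN.findall(query):
        value = (quoted or bare).lower()
        if name:
            fields[name.lower()] = value
        else:
            keywords.append(value)
    return fields, keywords


if __name__ == "__main__":
    print(parse_query('author:"john smith" iraq'))
```

Quoting the name keeps both words together as a single author filter, rather than treating ‘smith’ as a separate keyword.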
Apologies but due to a VPN problem, News Sniffer has not been monitoring news articles or Have Your Say forums for the last 2 days. All working again now though. I’ll set up some alerting system to prevent this happening again.
When the BBC discovered News Sniffer, I was invited to discuss it on their techie lists. I mentioned a few of the problems I’d had with the feeds such as duplicate entries, a lack of useful caching HTTP headers and the huge size of the feeds. In response to this they looked into it and fixed the duplicate entries within a couple of weeks.
Yesterday they changed the default size of the feeds but also rejigged the RSS format. This broke Watch Your Mouth in a number of ways (mostly affecting only new threads since yesterday):
Due to a combination of some of the above, some comments were marked censored when they were in fact published. I’ve adjusted Watch Your Mouth in response to these changes and it’s working ok again now. I’ve also run the clean-up scripts so any published comments marked censored have been restored - you might notice a bunch of “censored” comments disappear from the indexes.
These kinds of problems are expected when you’re monitoring data that’s in the control of someone else (especially in ways they might not have ever intended). I just need to keep an eye on the situation and make alterations accordingly when problems arise.
To compare it to an “arms race” isn’t quite right because there is no evidence that the BBC are purposefully making life difficult for us. In fact, these changes are actually helpful.
UPDATE: Due to the combination of malformed BBC RSS timestamps (hours going from 0000 to 2400?!) and a bug in Watch Your Mouth, we’ve been missing a lot of potentially censored comments on some threads for the last 3 or 4 days. I’ve now written a workaround to this quirk so things should be back to normal.
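As a rough illustration of this sort of workaround (not the actual Watch Your Mouth code), the sketch below parses a timestamp leniently and accepts an hour of 24. The timestamp layout and the parse_lenient name are assumptions made for the example, as is the choice to read hour 24 as hour 0 of the following day.

```python
import re
from datetime import datetime, timedelta

# Assumed layout for illustration only: "YYYY-MM-DD HH:MM:SS".
STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")


def parse_lenient(stamp: str) -> datetime:
    """Parse a timestamp, tolerating an out-of-range hour of 24."""
    match = STAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    year, month, day, hour, minute, second = map(int, match.groups())
    if hour > 24:
        raise ValueError(f"hour out of range: {stamp!r}")
    # Build the date at hour 0 and add the hours separately, so that
    # hour 24 rolls over to the next day instead of raising an error.
    return datetime(year, month, day, 0, minute, second) + timedelta(hours=hour)


if __name__ == "__main__":
    print(parse_lenient("2008-06-14 24:15:00"))
```

A strict parser would reject such a timestamp outright, which is the kind of failure that can silently drop comments from a thread.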
The BBC Editors blog has mentioned News Sniffer today.
It’s largely just marginalised us, but it’s rather ambiguous. When they suggest that Revisionista will not find examples of bias, I can’t decide if they mean that the BBC is not biased, or that they are just very good at it.
Some of the recommended revisions are interesting, but maybe we need a comments feature so people can explain and discuss their recommendations.
And they also describe their censoring of ‘Have Your Say’ comments as “censoring” in quotation marks. I’m not sure what this is supposed to mean either. Is it not censorship when they remove comments? Or do they not remove comments?
See the top recommended censored comments for some interesting examples of “censorship”.
I rolled out a new version of News Sniffer last night. A lot of work went into rejigging the censored comment detection to make sure we don’t make mistakes. I also wrote a system to confirm censored comments by HTML scraping, which gives us a second way to check for censorship.
This also allowed me to check the entire backlog of censored comments. There were a number of comments that we thought were censored but were not and those are now fixed. We also found a number of comments that we didn’t know had been censored.
Apologies for this. My understanding of the BBC HYS RSS feeds was flawed (to be frank, largely due to some brain-deadness on the part of the BBC forum software).
So you might notice that some existing censored comments disappeared but other ones appeared for the first time. I’ll be running the confirmation script regularly to monitor the new checking system (though not as regularly as the others as it’s a bit intensive).
The “Recommended Comments” page now lists the latest recommended comments, not the highest recommended as before. Comments in order of highest recommendations can now be found on the “Top Recommended Comments” page.
The thread listing pages now display the number of published comments along with the number of censored ones. This helps give an idea of how busy a thread is (you might expect busier threads to be more censored).
And lastly, the thread display page now includes the BBC HYS description, which gives a bit of background to the thread and links to any news articles that might have prompted it.
Thanks to datamining.typepad.com and currybet.net for the feedback that led to some of these improvements.
The ‘Watch Your Mouth’ system is currently marking some comments as censored when they are not. This seems to be due to the BBC’s servers being out of sync with each other, so I sometimes get out-of-date RSS feeds. I have a solution for this and am working on it.
This was brought to my attention (very gracefully) by a BBC employee.
UPDATE: Problem fixed. No more comments should be misclassified. I’m working on verifying the backlog, but it should only have been a small number of comments. It was a bit of a corner case causing the problem.
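For anyone wondering how out-of-sync servers can produce false positives, here is one possible guard, sketched under assumptions rather than taken from the real fix. It assumes each feed snapshot carries a build time and lists every currently published comment in the thread; ThreadState and apply_snapshot are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    last_build: float = float("-inf")            # build time of newest snapshot used
    seen: set = field(default_factory=set)       # every comment ID ever observed
    censored: set = field(default_factory=set)   # seen before, missing now


def apply_snapshot(state: ThreadState, build_time: float, comment_ids) -> bool:
    """Update the thread state from one feed snapshot.

    Returns False, leaving the state untouched, when the snapshot is older
    than one already processed (for example, served by a lagging server).
    """
    if build_time < state.last_build:
        return False
    current = set(comment_ids)
    state.seen |= current
    # A comment counts as censored only if it was seen earlier and is absent
    # from the newest snapshot; comments that reappear are restored.
    state.censored = state.seen - current
    state.last_build = build_time
    return True


if __name__ == "__main__":
    state = ThreadState()
    apply_snapshot(state, 100, [1, 2, 3])
    apply_snapshot(state, 90, [1, 2])      # stale copy: ignored
    apply_snapshot(state, 110, [1, 3, 4])  # comment 2 has gone missing
    print(sorted(state.censored))
```

Without the build-time check, the stale snapshot in the middle would have flagged comment 3 as censored even though it was still published.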