RT articles are now tracked

April 7th, 2016

News Sniffer now monitors the RT news website to track changes in articles.

You can search using the term “source:rtcom” to limit results to RT.

We’ve been monitoring them for only a few days, but changes are already rolling in. In this article, “Dutch voters say no to ratifying EU-Ukraine deal”, for example, you can see the details unfold over three hours. And here, RT editors realise someone spelled Nagorno-Karabakh incorrectly!

Let us know in the comments or on Twitter if you spot any more interesting changes.

News Sniffer hits one million news article versions

July 8th, 2013

News Sniffer retrieved its one millionth news article version last week, after running continuously for almost 7 years!

The first version ever collected was on the 29th August 2006 – 6 years 10 months 10 days ago.

There are currently 1,004,651 versions of 394,967 articles, so each article has on average about 2.5 versions. It’s collected 185,949 versions this year alone, which is about 1,000 versions discovered each day.
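For anyone who wants to check those averages, here is a quick sketch of the arithmetic in Python. It assumes “this year” means 1st January 2013 up to and including the date of this post (8th July 2013); the variable names are just for illustration.

```python
from datetime import date

# Figures quoted in the post.
total_versions = 1_004_651
total_articles = 394_967
versions_this_year = 185_949

# Average number of stored versions per tracked article.
versions_per_article = total_versions / total_articles

# Assumption: count every day from 1 January 2013 through the post date, inclusive.
days_elapsed = (date(2013, 7, 8) - date(2013, 1, 1)).days + 1

# Average number of new versions discovered per day so far this year.
versions_per_day = versions_this_year / days_elapsed

print(f"{versions_per_article:.2f} versions per article")
print(f"{versions_per_day:.0f} versions per day over {days_elapsed} days")
```

Under that assumption the daily figure comes out a little below the rounded 1,000 quoted above.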

For the techies, it currently takes up about 7 gigabytes of data on disk (in MySQL) with an additional 29 gigabytes of search index (in Xapian).

Looking forward to another million versions, which will come sooner as we start monitoring more news sources.

Remember that the News Sniffer project is open source, so if you’re a Ruby programmer (or can hire Ruby programmers) you can make it better, or even run your own sniffer! Join in!

Problems tracking Guardian articles fixed

December 6th, 2012

Due to a change over at The Guardian, News Sniffer was detecting some spurious changes in articles. This has now been fixed, but some versions of Guardian articles going back a couple of weeks have been lost.

BBC News Twitter widget now ignored

November 11th, 2012

News Sniffer was seeing the new BBC News Twitter widget updates as article content changes, and so some BBC articles had several hundred versions. I’ve updated the BBC News parser to ignore the Twitter widget now, and manually cleaned out a few recent articles that had high version counts.
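To illustrate the general idea (this is not News Sniffer’s actual parser, which is written in Ruby), here is a minimal Python sketch that extracts article text while skipping everything inside a widget element, so that two fetches differing only in the widget compare as equal. The class name “twitter-widget” is a made-up placeholder; the real BBC markup isn’t described in this post.

```python
from html.parser import HTMLParser

# Hypothetical class name for the widget container (an assumption for this sketch).
IGNORED_CLASS = "twitter-widget"

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleTextExtractor(HTMLParser):
    """Collects text content, skipping any element marked with IGNORED_CLASS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside an ignored element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at the widget, and track nesting while already skipping.
        if self.skip_depth or IGNORED_CLASS in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.parts.append(text)


def article_text(html):
    """Return the article text with the ignored widget's content removed."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Two fetches of the same (invented) page where only the widget content differs.
first = '<div><p>Story text.</p><div class="twitter-widget"><p>tweet one</p></div></div>'
second = '<div><p>Story text.</p><div class="twitter-widget"><p>tweet two<br>more</p></div></div>'

print(article_text(first) == article_text(second))
```

Comparing the extracted text rather than the raw HTML means a widget refresh on its own would not be recorded as a new article version.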

New York Times articles are now tracked

August 28th, 2012

News Sniffer now tracks changes to New York Times news articles. A couple of interesting changes have already been discovered, such as this article that used to mention how American intelligence agents helped funnel arms to Syrian rebel groups.

You can specifically search New York Times articles using the search keyword “source:nytimes”.

Also, you can now link to specific paragraphs and they’ll be highlighted. For example, here we link to paragraph 4 of this article showing the BBC renaming the Syrian government to the Syrian regime.

BBC News articles missing

May 27th, 2012

I’ve just discovered that on the 12th April 2012 the BBC changed the format of their URLs, causing News Sniffer to stop tracking nearly all their articles.

I’m working on fixing this now but most of the articles from that period (12th April through to 27th May) are lost to News Sniffer.

I’m also figuring out a way to be notified when things like this happen again in future.

I’ll keep this post updated with the progress.

UPDATE 23:04: This is now fixed and BBC articles are being monitored again.