
December 20, 2004

Aggregator Developer

Andy Henderson wrote in with some very interesting questions that he is allowing me to share. Please post comments or reply to him.

Andy writes:

I am a developer of an Aggregator - the CITA RSS Aggregator available from www.SeeITA.com/RSSA/RSSA.html. I developed it for a specific target market and I do not expect its use to grow to the extent that it will materially affect any RSS servers. However, I want to be responsible so I am trying to take suggestions from this forum seriously. I have already implemented ETag and Last-Modified header processing and now I am considering Randy Charles Morin's HowTo document to improve my Aggregator's behaviour.

It gives rise to several issues:

1) How should I understand the word 'Hint' in this context? Is the information an instruction that the Aggregator should obey, or is it a suggestion the Aggregator should convey to its user and allow them to decide how they act on it?

2) The skipHours/skipDays tags give me several concerns:

a) A straw poll of 29 sites that I monitor reveals that none of them implement skipHours or skipDays. Syndic8 says that 1.98% of feeds use skipHours and 0.18% use skipDays. That's a pretty small minority to code for.

b) How many RSS publishers properly understand that the hours are in GMT? Most Americans I have met (I'm a Brit) think London is GMT - but that's only for half the year. Alternatively, how many will simply use their local time zone by mistake?

c) There is an obvious issue when people are in different time zones. For example, a reader in Singapore has no working hours overlapping with working hours in, say, New York. Simple implementation of the skipHours tag could mean that an Aggregator would never poll some feeds for some people.

d) Similarly, not polling on, say, Sundays is subject to interpretation. Is it Sunday for the reader or the publisher that the Aggregator should exclude from polling? I guess it has to be subject to the reader's time zone, but does that implement the publisher's expectation?
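To make the time zone concerns in (b) to (d) concrete, here is a minimal Python sketch of a skipHours/skipDays check. It assumes the feed's values have already been parsed into sets; the function name `should_skip_poll` is hypothetical. RSS 2.0 states that skipHours values are in GMT, but evaluating skipDays in GMT as well is an assumption made here, and it is exactly the ambiguity raised in (d).

```python
from datetime import datetime, timedelta, timezone

# Day names as they appear in <skipDays>, indexed by datetime.weekday().
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def should_skip_poll(now, skip_hours, skip_days):
    """Return True if the feed's hints say not to poll at `now`.

    now        -- timezone-aware datetime (the reader's local clock is fine)
    skip_hours -- set of ints 0-23 taken from <skipHours>, in GMT
    skip_days  -- set of day names taken from <skipDays>
                  (ASSUMPTION: also evaluated in GMT)
    """
    now_gmt = now.astimezone(timezone.utc)
    if now_gmt.hour in skip_hours:
        return True
    return DAY_NAMES[now_gmt.weekday()] in skip_days


# A reader in Singapore (GMT+8) at 07:00 on Monday 20 December 2004.
singapore = timezone(timedelta(hours=8))
monday_morning = datetime(2004, 12, 20, 7, 0, tzinfo=singapore)

# In GMT it is still 23:00 on Sunday, so a "Sunday" hint applies.
print(should_skip_poll(monday_morning, set(), {"Sunday"}))
```

Converting to GMT before comparing keeps the result the same for every reader, at the cost of the "Sunday" boundary not matching the reader's own calendar.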

3) ttl is more widely implemented - Syndic8 says 7.74%. Use of the Syndication module will push this up a bit. However, I find that most sites specify hourly caching or less. My Aggregator's minimum poll interval is one hour, so implementing ttl would increase polling in many cases! That having been said, I plan to implement the higher of ttl and hourly polling as a minimum polling interval - subject to question 1 above.
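The "higher of ttl and hourly polling" rule in point 3 can be sketched in a few lines of Python. The names `MIN_POLL_MINUTES` and `poll_interval_minutes` are hypothetical, and treating a missing or non-positive ttl as "no hint" is an assumption.

```python
# The aggregator's own floor: never poll more often than once an hour.
MIN_POLL_MINUTES = 60


def poll_interval_minutes(ttl):
    """Return the polling interval in minutes for one feed.

    ttl -- the feed's <ttl> value in minutes, or None if the feed has none.
    ASSUMPTION: a missing or non-positive ttl falls back to the floor.
    """
    if ttl is None or ttl <= 0:
        return MIN_POLL_MINUTES
    # Take the higher of the publisher's hint and the aggregator's floor,
    # so a short ttl never increases polling.
    return max(ttl, MIN_POLL_MINUTES)


print(poll_interval_minutes(30))   # feed asks for 30 minutes, floor wins
print(poll_interval_minutes(240))  # feed asks for 4 hours, ttl wins
```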

4) I'm obviously interested in the Accept-Encoding header because it looks like everyone wins from correct implementation of that one. However:

a) I use the .net HttpWebRequest class to read RSS feeds but I can't find any authoritative statement about whether this header is sent automatically by the .net framework, or not. I can't see why it wouldn't be (and the referenced w3.org document suggests that, by default, the server can send a compressed response), but we are talking Microsoft here.

b) As I understand it, the reader tries Accept-Encoding with, say, "gzip,deflate". If it gets a 406 (Not Acceptable) status code back, it has to try again without the Accept-Encoding. But doesn't this mean a double hit on servers that don't support compression? OK, it could remember the initial response, but suppose the server is upgraded to support compression later on?

c) I can't find any tutorial on how to handle compressed responses from a server. If .net doesn't handle them automatically, can anyone point me in the right direction?
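As a rough illustration of (b) and (c), here is a minimal Python sketch of handling a compressed response body by hand. It is not .net code and says nothing about what HttpWebRequest does; the function name `decode_body` is hypothetical. The idea is to send "Accept-Encoding: gzip,deflate", then look at the response's Content-Encoding header to decide how to unpack the bytes. Under HTTP/1.1 the unencoded "identity" form stays acceptable unless the request explicitly rules it out, so a server without compression would normally just reply uncompressed rather than with a 406, and the same code path covers both cases.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Return the uncompressed bytes of an HTTP response body.

    body             -- raw bytes as received from the server
    content_encoding -- value of the Content-Encoding response header,
                        or None if the header was absent
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity"):
        # Server ignored Accept-Encoding and sent the feed as-is.
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" is defined as zlib-wrapped data...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: " + encoding)


# Simulate a server that compressed the feed with gzip.
feed = b"<rss version='2.0'><channel/></rss>"
print(decode_body(gzip.compress(feed), "gzip") == feed)
```

Because the decision is made per response from Content-Encoding, nothing needs to be remembered about a server, and a server that gains compression support later is picked up automatically.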

Sorry if my naivety is showing. Obviously, I don't expect you to be able to answer my questions but maybe one of your readers will be able to help.

Andy Henderson Constructive IT Advice Andy@SeeITA.com

Posted by Glennf at December 20, 2004 04:17 PM

Comments

My answers.
http://www.kbcafe.com/rss/?guid=20041222185425

Posted by: Randy Charles Morin at December 22, 2004 06:51 PM