links for 2008-09-29
-
Selenium is an awesome scraper. RWget is a simple scraper that performs the same function as Wget or Lynx --dump, that is: it gets a remote HTML file. However, instead of retrieving the source code of that file, RWget retrieves its DOM as rendered in Firefox (or another browser).
I can also do a "rendered diff," that is: get the innerHTML of the BODY tag from Firefox, then compare that with the innerHTML of the same page as stored on the server. Based on a presentation by Kord of Splunk at Ajax World 2008.
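
The "rendered diff" idea can be sketched roughly as follows. This is not RWget's code, which is not shown here; it is a minimal Python illustration under stated assumptions. All function names are hypothetical, the body extraction uses a simple regular expression rather than a real HTML parser, and `fetch_rendered_body` assumes the third-party `selenium` package (its current WebDriver API, not the 2008-era tooling) plus a local Firefox install.

```python
import difflib
import re
import urllib.request


def body_inner_html(html):
    """Return the markup between <body ...> and </body>.

    Falls back to the whole input when no body tag is found.
    A regex is a simplification; real pages may need an HTML parser.
    """
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else html


def fetch_source_body(url):
    """Body markup as served, with no JavaScript executed (what Wget would see)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return body_inner_html(response.read().decode(charset, errors="replace"))


def fetch_rendered_body(url):
    """Body markup after the browser has built the DOM.

    Assumption: selenium and Firefox are installed; imported lazily so the
    rest of this sketch runs without them.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.find_element(By.TAG_NAME, "body").get_attribute("innerHTML")
    finally:
        driver.quit()


def rendered_diff(source_body, rendered_body):
    """Line-by-line unified diff of the served body against the rendered body."""
    return list(difflib.unified_diff(
        source_body.splitlines(),
        rendered_body.splitlines(),
        fromfile="source",
        tofile="rendered",
        lineterm="",
    ))
```

For some page address `url`, `print("\n".join(rendered_diff(fetch_source_body(url), fetch_rendered_body(url))))` would list the lines that scripts added to or removed from the body. Browsers also normalize markup when serializing innerHTML, so some differences will be cosmetic rather than script-driven.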