Rants & Raves


July 26, 2002  Exposing Yourself in Public

Yesterday's question of the day was "Is Google too good?" This was in response to a New York Times article about a woman working as a computer tutor. Her client showed her how he found personal information about her just by entering her name into a search engine. As usual, the mainstream press has oversimplified issues relating to technology. When will people learn that the Internet is a public place? Posting information on the web is no different from having it published in a newspaper. Most of the items that were found about this woman were posted on her own family web site, and therefore under her own control. I fail to see how a person with computer training could be so ignorant about Internet technology.

The article was misleading on several other points. If you don't want search engines to index your web site, or even just certain pages, you can create a file on the site to keep them out. It is a text file called "robots.txt" and there are tutorials on the web about how to properly create one. A search engine that is about to index your site will first request this file. If the file specifies that a page is not to be indexed, the search engine will skip that page. One note: obeying this file is voluntary. You have to trust that the search engine will comply with it, but all the major search engines do.
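To make that concrete, here is a rough sketch of the check a well-behaved robot performs before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the robot name "ExampleBot", and the example.com addresses are all made up for illustration; they are not taken from the article or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for this example).
# "User-agent: *" applies the rules to every robot; each Disallow line
# names a path prefix that robots are asked not to index.
ROBOTS_TXT = """\
User-agent: *
Disallow: /family/
Disallow: /old-remarks.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and example.com are placeholders, not a real crawler or site.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/family/photos.html",
        "http://www.example.com/old-remarks.html",
    ):
        verdict = "may index" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "should skip"
        print(url, "->", verdict)
```

Notice that nothing here stops a robot from fetching the page anyway. The file only states your wishes, which is why compliance is voluntary.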

The author made it sound as though the archived copies of web pages on Google are permanent. This is not true. The cached page only lasts until the next time Google indexes a site, which occurs about every two months. If the Google search robot finds a robots.txt file excluding it from indexing the site, the cached files will be deleted.

The article failed to mention a search engine that caches web pages far longer than any of the major search engines: The Internet Archive. This site caches web pages for historical purposes. Although it is not yet searchable, it has stored web pages as far back as 1996. It's fun to see what Yahoo looked like six years ago, and see how far things have come. But anything stored there stays there. Fortunately, if you are the owner of a site, they will remove the information if you request it. But it can still be a little scary for web site owners like me, who, back in my days of relative obscurity, may have made some overly fawning remarks about a certain talk show host which might prove embarrassing. My site has been stored there three times, but I checked, and all three copies were cached after the remarks were removed. Whew!

Posted by Christy on July 26, 2002 09:48 PM

