

			Search Engines and Privacy

			      Marc Roessler
			marc@tentacle.franken.de
			      March 28, 2001



				Abstract

		This paper analyzes the impact of using
		 search engines on the user's privacy.
		Several privacy related issues were found,
		 such as search engines leaking the query
		string to advertising companies or sharing
		 user identifiers with other sites. This
		     paper will show the way this is
		accomplished as well as possible counter-
		   measures to be taken by the user to
		          preserve her privacy.




1. Introduction

Search engines may be one of the most widely used and most important
services the World Wide Web offers.

Concerning privacy, search engines could be described as a 'data bottleneck'.
Companies usually need to analyze lots of access statistics to create a
profile for a single user. Yet when using a search engine the user herself
submits condensed and accurate information on her topics of interest. This
makes search engines an interesting target for profiling and advertising
companies.

Many users do not understand why the leakage of information that contains no
personal data such as email addresses or real names is considered a threat by
privacy concerned users. However, the tiny bits of information which get
dropped every now and then form a trail which can be used to trace users,
i.e. the user is no longer anonymous but pseudonymous. Being pseudonymous
means that it is not known who the user really is, but her actions can be
recognized as belonging to one and the same person. The problem is that
disclosing personal data such as an email address or a name even once will
turn all past and future pseudonymous actions into personal actions which can
be linked to the user's real identity.

Also in the computer security sector it is common practice to disclose as
little information as possible as every tiny bit of information might provide
valuable information to an attacker trying to break into the system. There
exist attacks exploiting bugs in web browsers yielding access to the user's
system, and knowledge of the exact browser and OS version will simplify the
task of the attacker.

After it was discovered that a major search engine had started to use
redirects, which enable the search engine server to log which of the search
hits were actually visited, a closer look was taken at several popular engines.

The following search engines were analyzed:

	www.altavista.com
	www.excite.com
	www.google.com
	www.lycos.com (only www.de.lycos.de tested)
	www.hotbot.com (www.hotbot.lycos.de, www.hotbot.lycos.com)
	www.webcrawler.com
	www.yahoo.com
	www.metacrawler.com
	www.looksmart.com
	www.directhit.com
	www.goto.com
	www.go.com (same as www.infoseek.com)
	www.mamma.com

Banners and advertising links (href) were ignored. Only privacy related issues
concerning regular use of the search engine (submitting a search string,
viewing the results, visiting found webpages) or occurring without user
interaction were examined. An example of an event occurring without user
interaction is the automatic loading of an image.
Be advised that the fact that banners and advertising links were not analyzed
does not say anything about whether information is leaked when clicking them.
In fact it was found that the search query string is often passed to the
advertiser when clicking an advertisement.

For the code snippets presented in this paper unimportant parameters such as
border, height, width etc. were omitted. The search hits used for illustration
were randomly selected without paying attention to the company or site they
pointed to.

Several techniques that have an impact on the user's privacy were found to be
in use. There also were some strange side effects which may or may not pose a
threat to the user's privacy. They are mentioned nevertheless so the reader
can judge for herself.

One of those 'effects' is caused by the use of images instead of buttons for
submitting html-forms. If the image is clicked, the coordinates of the
click will be transmitted along with the data of the html form.
If the image is not clicked but the form is submitted by pressing Enter
in the query string input field, the x and y parameters will usually not appear
in the form data submitted. Several browsers were tested; the following table
shows which parameters are sent along with the form data by each browser when
either pressing Enter or clicking the image:

			Enter			Click
Netscape Nav. 4.76	no x/y params		x/y coordinates
MSIE 5.01		no x/y params		x/y coordinates
Opera 5.02		no x/y params		x/y coordinates
Lynx 2.8.3rel.1		x/y: zero		-

If Lynx is used and the form is sent by activating the text replacement shown
for the submit image, the string '&x=0&y=0' will be appended to the form
data. This string is very unlikely to occur when clicking the image using
Netscape, MSIE or Opera.
 
This means that a probabilistic approach for determining whether a GUI based
browser or Lynx is being used is possible. This works even if the HTTP
headers are rewritten using a simple filtering proxy.
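
The heuristic can be sketched in a few lines of Python. This is only an
illustration of the idea: the function name guess_browser_class and the sample
form data are hypothetical, and it is assumed that the coordinates arrive as
plain 'x' and 'y' fields as in the string quoted above.

```python
from urllib.parse import parse_qs

def guess_browser_class(form_data):
    """Guess the browser family from the x/y fields of an image-submit form.

    form_data is the raw query string as received by the server.
    """
    fields = parse_qs(form_data, keep_blank_values=True)
    if "x" not in fields or "y" not in fields:
        # No coordinates: consistent with pressing Enter in a GUI browser.
        return "gui-enter"
    if fields["x"] == ["0"] and fields["y"] == ["0"]:
        # A real mouse click rarely hits pixel (0,0) exactly.
        return "probably-lynx"
    return "gui-click"

print(guess_browser_class("q=privacy&x=0&y=0"))
print(guess_browser_class("q=privacy&x=17&y=9"))
print(guess_browser_class("q=privacy"))
```

The result is a guess, not a proof: a GUI user may click the top left pixel of
the image, and other text browsers may behave like Lynx.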

Another problem is the fact that the URL of the search results page usually
contains the query string. Contrary to what one might expect, URL parameters
are included in the referer field of the HTTP header (Netscape Nav. 4.76 and
MSIE 5.0 tested), which means that the query string will be leaked to
every site visited. This means that all advertisers displaying banners on the
search results page and all visited sites will know the search string if
referers are not filtered by an intermediate proxy.
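
To illustrate how little effort is needed on the receiving side, the following
Python sketch recovers the search string from a referer value. The function
name is hypothetical; the sample URL is the Excite results URL shown in
section 2.2, and the parameter name differs from engine to engine.

```python
from urllib.parse import urlsplit, parse_qs

def query_from_referer(referer, param="search"):
    """Return the search string a third-party server can read from a referer."""
    values = parse_qs(urlsplit(referer).query).get(param, [])
    return values[0] if values else None

referer = "http://search.excite.com/search.gw?c=web&search=%22privacy+on+the+net%22"
print(query_from_referer(referer))
```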

Another strange piece of code related to ad.preferences.com looks as if
it was used to measure the time needed for loading the search result pages.
At the beginning of the page an image tag such as
	<IMG SRC="http://ad.preferences.com/image;spacedesc=Targeting
        Network_ExciteNet_468x60_RunofSite_Any&ML_NIF=Y&time=2001.03.05
        .23.12.00&TG_NETLOC=wsr/default&TG_COBRAND=&ex_reg=">
is placed. At the end of the page there is a request to a program or
script named 'ping' at the preferences.com site:
	<IMG SRC="http://ad.preferences.com/ping.ng;spacedesc=TCExcite
        Data_WebCrawler_1x1_RunOfSite_Any&event=KeywordResults&time=2001.
        03.05.10.12.05">
In combination with user identifiers (such as cookies) this can, in theory,
be used to measure the speed of the user's internet connection.

Also noticeable were the extremely long expiry times of some of the cookies.
For instance, an expiry date of 31-Dec-2030 23:59:59 seems rather excessive.


Disclaimer:
No guarantee is given concerning correctness or completeness of the information.
Giving correct and up to date information is almost impossible due to the
fact that the HTML templates of search engines are highly subject to changes.
When in doubt browse the html source yourself.
This paper makes no claim as to whether the mentioned companies actually use
the gathered information; it only shows that they have access to the
mentioned data, which means that they are able to store and use it.
Some of the mentioned data can be used to improve the service provided to
the customer, e.g. by sorting search results by the number of users who
followed them, thus making the search more effective.






2. A look at some search engines



2.1 www.altavista.com

Uses cookies:	Yes.
Cookie Example:	AV_USERKEY=AVS037fdee1e2ac000a20203c0051635
Cookie Expiry:	expires=Tuesday, 31-Dec-2013 12:00:00 GMT

The query string ('privacy on the net') is leaked to ad.doubleclick.net,
possibly including IP address if no proxy is used.
This happens without any user interaction by including a layer located at
ad.doubleclick.net, passing a few parameters including the query string.
	<layer src="http://ad.doubleclick.net/adl/avfilteredsearchresults
	.com/results;sz=468x60;kw=%2B%22privacy+on+the+net%22;lang=XX;
	cat=stext;tan=f000;ord=19953907?">
	</layer>
Chris Brenton [1] quotes an unnamed person on the 'ord' parameter seen above.
According to this person the ord value includes the value of the cookie
issued by Altavista, which means that Altavista and doubleclick are sharing
user identifiers. He or she showed that the value of the ord parameter
converted to hex (e.g. by using perl -e 'printf "%x\n", <number>;') is part
of the Altavista cookie. This was in December 1999.
This has been checked again and it turns out Altavista obviously has modified
the 'syntax' of its cookies. They are much longer now and the ord value
is at least not trivially contained in the cookie. Of course this proves
nothing, the ord value may still be part of the cookie in some way.
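
The check described above can be repeated with a short Python equivalent of
the perl one-liner. The function name is hypothetical; the ord value and the
cookie value are the ones from the examples in this section.

```python
def ord_in_cookie(ord_value, cookie_value):
    """Check whether the hex form of the 'ord' parameter occurs in the cookie."""
    ord_hex = format(int(ord_value), "x")
    return ord_hex, ord_hex in cookie_value.lower()

ord_hex, found = ord_in_cookie("19953907", "AVS037fdee1e2ac000a20203c0051635")
print(ord_hex, found)   # prints: 13078f3 False
```

A negative result only rules out the trivial case of a plain hex substring.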

Altavista does not link search hit URLs directly but via a redirector
located on the Altavista server. This (in theory) enables Altavista to tell
which search hits actually were visited. In conjunction with cookies and the
query strings this is nice ground for a database.
	<a href="/r?r=http://www.tamos.com/privacy/"
	onMouseOver="status='http://www.tamos.com/privacy/'; return true;">
	Privacy on the Net: Practical Issues</a>
Note that this redirect is hidden from the user if JavaScript is enabled:
to the user the URL will be shown as 'http://www.tamos.com/privacy/' at the
bottom of the browser window.
The present server hostname (www.altavista.com) is inserted by the browser
automatically. In effect '/r?r=' will become 'http://www.altavista.com/r?r='.
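
The following Python sketch shows both steps: how the relative link is
resolved against the server hostname, and how the real target can be taken
back out of the redirector URL. The function name and the base URL of the
results page are assumptions made for illustration; the parameter name 'r' is
taken from the link above.

```python
from urllib.parse import urljoin, urlsplit, parse_qs

def resolve_and_unwrap(page_url, href, param="r"):
    """Return (absolute redirector URL, real target URL or None)."""
    absolute = urljoin(page_url, href)
    targets = parse_qs(urlsplit(absolute).query).get(param, [])
    return absolute, (targets[0] if targets else None)

absolute, target = resolve_and_unwrap("http://www.altavista.com/",
                                      "/r?r=http://www.tamos.com/privacy/")
print(absolute)
print(target)
```

Requesting the unwrapped target directly bypasses the redirector, so the
search engine does not learn that the hit was visited.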

The "Sponsored Listings" are linked in a strange way. The user gets passed
back to Altavista, which passes her to www.goto.com, which in turn resolves
the xargs string and redirects the user to the URL she intended to visit.
Example:
	<a href="http://jump.altavista.com/av4/spon_res.go?http://www.goto.com/
	d/sr?xargs=00u3hs9yoaT3UOSyDCPRCsDQphWW1BdK6Dwr1RWAxJoFW8TUO94mtkyhbyZv
	On3jYeS0Y5eWgr7buO22jCDgUStPdO6bDYghiVDIyZ4IrCm9OYgUEje7nyQoog3y+sfAyxN
	I5lNf4O8+YtfJ/V/HaphJ9EfAZcFC5DZLh+n4xdBxTaUNtJUDnLfYwnfHP/DfeMkrIAf6EF
	9D8="class=goto>Siegesoft Protects Your Net Privacy</a>
It is unclear if the xargs string is only a key used for looking up the URL
in a database or if it contains the URL itself. It may as well contain
cookie data or even the search string.
 
An image is used as the submit button. This could be used for the mentioned
probabilistic browser statistics.






2.2 www.excite.com

Uses cookies:   Yes.
Cookie Example: UID=E37F3193E55CE9D0
Cookie Expiry:	Expires=Fri, 21-May-2004 02:25:38 GMT
Cookie Example:	registered=no
Cookie Expiry:  Expires=Fri, 21-May-2004 02:25:38 GMT

Search hits are linked using redirects.
	<a href=/r/sr=webresult|ss=%22privacy+on+the+net%22|id=39410229;
	pos=1;http://www.privacy.org/ onMouseOver="window.status=('http:
	//www.privacy.org/');return true" onMouseOut="window.status=('');
	return true">Privacy Site</a>
Just as with Altavista.com, the real URL is hidden from the user.
The id and pos parameters seem not to be related to tracking users.
The id is different for each search hit, it is probably some database key. 
The pos field contains the position of the link on the page (first, second,
third, .. ).

The search form is quite strange. The query is submitted using a redirect:
	<form method=get name=searchbox
	action="/r/co=d_nb_mesp_search;http://search.excite.com/search.gw">
After submitting the form the user will be redirected ("302 Moved Temporarily")
to the search engine itself, at 
http://search.excite.com/search.gw?c=web&search=%22privacy+on+the+net%22.

Yet another "ping".. but this is the only image from ad.preferences.com
which gets loaded on this page, so there is no obvious way how time delta
values for approximating the speed of the internet connection of the user
could be calculated.
	<img src=http://ad.preferences.com/ping;spacedesc=TCExciteData_
	Excite_1x1_RunOfSite_Any&event=KeywordResults&time=2001.03.05.07.11.45>






2.4 www.google.com

Uses cookies:   Yes.
Cookie Example: PREF=ID=246f221b2b735fd7:TM=984763048:LM=984763048
Cookie Expiry:  expires=Sun, 17-Jan-2038 19:14:07 GMT

On Google search hits are usually linked directly, without using any redirects.
Usually means 'not always'.. Once a search results page on google
had all search hits linked like this, using redirects:
	<A HREF=/url?sa=U&start=3&q=http://www.rewi.hu-berlin.de/Datenschutz/
	andere.html&e=747 >Datenschutz-Informationen - Verweise</A>
The same query was retried a few minutes later, yielding the expected
direct links again:
	<A HREF=http://www.rewi.hu-berlin.de/Datenschutz/andere.html>
	Datenschutz-Informationen - Verweise</A>
It is unknown what this behavior depends on, maybe this is done randomly
for doing statistics.

There are no information leaks to third party sites occurring without user
interaction. No JavaScript code is used.






2.5 www.lycos.com

Annotation:
Only www.de.lycos.de was tested since all requests to www.lycos.com from
German domains are automatically redirected to www.de.lycos.de.

Uses cookies:	No

Nothing serious besides the fact that other commercial sites are
linked to using redirects such as <a href="/cgi-bin/nph-bounce?...>,
but this was not part of the analysis.






2.6 www.hotbot.com

www.hotbot.com has recently started redirecting German users to
www.hotbot.lycos.de. As the domain implies, www.hotbot.lycos.de is powered by
lycos.
Search hits are linked directly, the search query string is not leaked.
The search submit button is an image (probabilistic browser statistics may be
possible).
Uses cookies:   Yes
Cookie Example:	referer=set
Cookie Expiry:	expires=Fri, 22-Mar-02 13:24:27 GMT
Cookie Example:	PARTNER=HOTBOT
Cookie Expiry:	-

The 'original' hotbot to be found at www.hotbot.lycos.com is a lot different
from hotbot.lycos.de:

www.hotbot.lycos.com:
Uses cookies:   Yes.
Cookie Example: lubid=010000508B2E0AA20E2C3AB9FE4F0001D64300000000
Cookie Expiry:  expires=Mon, 18-Jan-2038 08:00:00 GMT
Cookie Example:	p_uniqid=BNY/wF9mYBkF
Cookie Expiry:	expires=Fri, 21-Dec-2012 08:00:00 GMT
Cookie Example:	HB%5FSESSION=BT=lowend&BA=false&VE=4%2E7&PL=Win98%2C+I
		&MI=7&BR=Netscape&MA=4&BC=1
Cookie Expiry:	-

hotbot.lycos.com uses an image as submit button, just like Altavista.
This could be used for the mentioned probabilistic browser statistics.

The query string is leaked to doubleclick.net without any user interaction.
In this case this is achieved by loading an image from doubleclick.net, again
passing (among others) the query string as parameter.
	<img src='http://ln.doubleclick.net/ad/hb.ln/r;kw=%22privacy+on+the+
	net%22;h=net;ratio=1_5;sz=468x60;!category=adult;pos=1;tile=1;ord=
	7639673?'>

Search hit URLs are not linked directly but via local redirectors.
Note that the user ID, which is also contained in the p_uniqid cookie, is
passed to the redirector as the userid parameter.
	<a href="/director.asp?target=http%3A%2F%2Fwww%2Etamos%2Ecom%2F
	privacy%2F&id=1&userid=BNY%2FwF9mYBkF&q=MT=%22privacy+on+the+net%22
	&rsource=DH">Privacy on the Net: Practical Issues</a>
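
Decoding the link makes the overlap with the cookie obvious. The Python sketch
below uses the link and the p_uniqid value shown above; the helper name
redirector_fields is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

href = ("/director.asp?target=http%3A%2F%2Fwww%2Etamos%2Ecom%2Fprivacy%2F"
        "&id=1&userid=BNY%2FwF9mYBkF&q=MT=%22privacy+on+the+net%22&rsource=DH")
cookies = {"p_uniqid": "BNY/wF9mYBkF"}

def redirector_fields(href):
    """Decode the parameters handed to the redirector."""
    return {name: values[0]
            for name, values in parse_qs(urlsplit(href).query).items()}

fields = redirector_fields(href)
# Names of all cookies whose value reappears as a link parameter.
leaked = [name for name, value in cookies.items() if value in fields.values()]
print(fields["target"], fields["userid"], fields["q"], leaked)
```

Each click therefore hands the redirector the target, the user ID and the
query string in a single request, even if cookies are not sent.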






2.7 www.webcrawler.com

Uses cookies:   Yes.
Cookie Example: UID=1D3B21C53AA3AC16
Cookie Expiry:  expires=Wednesday, 31-Dec-2010 12:00:00 GMT

Webcrawler loads JavaScript from ad.preferences.com.
	<SCRIPT SRC="http://ad.preferences.com/jscript;spacedesc=Targeting
	Network_ExciteNet_468x60_RunofSite_Any&ML_NIF=Y&ML_CUSTOM=600&time=
	2001.03.05.23.12.00&TG_NETLOC=wsr/default&TG_COBRAND=&ex_reg=">
	</SCRIPT>
The script looks like this:
	var d= new Date();
	var tzOffset= 0 - d.getTimezoneOffset();
	[.. banner advertisements removed ..]
	document.write('<img src="http://ad.preferences.com/data;spacedesc=
	targetingnetwork_excitenet_468x60_runofsite_any&ML_TZOFF=' + tzOffset
	+ '&time=2001.03.24.04.12.00" ');
	document.write(' width=1 height=1 border=0>');
(Annotation: The times seen in the script loading code and the script code
itself differ because the script was loaded one day later manually)
The script displays some banner advertisements and transmits the time zone
offset to ad.preferences.com. This happens without user interaction by
loading an image from ad.preferences.com. The attempt to load the image
will yield a "302 Moved Temporarily" redirect to
http://208.178.169.7:80/mluniversal-1x1.gif, which is a white 1x1 pixel image
(a so called "web bug").


.. and again this strange PING to ad.preferences.com:
	<IMG SRC="http://ad.preferences.com/ping.ng;spacedesc=TCExcite
	Data_WebCrawler_1x1_RunOfSite_Any&event=KeywordResults&time=2001.
	03.05.10.12.05">







2.8 www.yahoo.com

Uses cookies:   Yes.
search.yahoo.com:
	Cookie Example: B=bc2ostktaa2qv&b=2&f=s
	Cookie Expiry:  expires=Thu, 15 Apr 2010 20:00:00 GMT
pa.yahoo.com:
	Cookie Example:	B=bc2ostktaa2qv&b=2&f=s
	Cookie Expiry:	expires=Thu, 15 Apr 2010 20:00:00 GMT
	Cookie Example:	PA=p1=BSFglQ--&e=ylRp6A
	Cookie Expiry:	expires=Tue, 06 Mar 2001 17:08:02 GMT
help.yahoo.com:
	Cookie Example:	B=17d6ccctaa36a&b=2
	Cookie Expiry:	expires=Thu, 15 Apr 2010 20:00:00 GMT
google.yahoo.com:
	Cookie Example:	B=8fdft7gtaa38m&b=2&f=f
	Cookie Expiry:	expires=Thu, 15 Apr 2010 20:00:00 GMT

'Search site' matches are not linked directly but redirected:
	<a href="http://srd.yahoo.com/srst/14982516/%22privacy+on+the+net
	%22/2/2/*http://www.Privacy.net/analyze/">Privacy Analysis of your
	Internet Connection</a>

'Search web page' matches are not linked directly but redirected, too:
	<A HREF="http://srd.yahoo.com/goo/%22privacy+on+the+net%22/4/*
	http://www.eserver.org/internet/censorship.html">Internet: Censorship
	and Privacy on the Net</A>







2.9 www.metacrawler.com

Uses cookies:   Yes.
Cookie Example:	p_go2id=L7n_UYTJClmzivNU_yk9FA
Cookie Expiry:	expires=Thursday, 31-Sep-37 12:47:15 GMT
Cookie Example:	s_go2id=L7n_UYTJClmzivNU_yk9FA
Cookie Expiry:	-

What is strange about the cookies is that those two cookies get set not only
by www.metacrawler.com but also by swizzle.go2net.com, just differing
in the 'domain' field. But how does go2net.com know the p_go2id and
s_go2id values? They are not passed over to go2net in any way using image tags
or similar tricks.

A closer look revealed the following:
Calling http://www.metacrawler.com we get a "302 Moved Temporarily" with new
location http://swizzle.go2net.com/cgi-bin/swizzle?origin=/index.html
&server=www.metacrawler.com, which sets our cookie (L7n_UYTJClmzivNU_yk9FA)
and shares it with metacrawler by 302-passing us back to
http://www.metacrawler.com/go2swizzle?go2id=L7n_UYTJClmzivNU_yk9FA
&origin=/index.html. At this address we get 302-ed to
http://www.metacrawler.com/go2swizzle2?origin=/index.html, which in turn detects
that our browser does not accept cookies and passes us on to
http://www.metacrawler.com/index.html?nocookie, which is the main page.
It was not checked what happens if cookies are enabled.
An interesting and not easily detected way for sharing user information.
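
Such a hand-over can be spotted mechanically once the redirect chain has been
recorded. The Python sketch below works on the chain described above (entered
by hand, nothing is fetched) and lists every parameter that is carried along
when a redirect crosses from one host to another. The function name is
hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

chain = [
    "http://www.metacrawler.com/",
    "http://swizzle.go2net.com/cgi-bin/swizzle?origin=/index.html"
    "&server=www.metacrawler.com",
    "http://www.metacrawler.com/go2swizzle?go2id=L7n_UYTJClmzivNU_yk9FA"
    "&origin=/index.html",
    "http://www.metacrawler.com/go2swizzle2?origin=/index.html",
    "http://www.metacrawler.com/index.html?nocookie",
]

def cross_host_parameters(chain):
    """List (from_host, to_host, name, value) for each hop that changes hosts."""
    handed_over = []
    for source, target in zip(chain, chain[1:]):
        src_host = urlsplit(source).hostname
        dst = urlsplit(target)
        if src_host != dst.hostname:
            for name, value in parse_qsl(dst.query, keep_blank_values=True):
                handed_over.append((src_host, dst.hostname, name, value))
    return handed_over

for entry in cross_host_parameters(chain):
    print(entry)
```

Only the go2id entry carries a user identifier; the other parameters merely
describe where the user came from.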

Metacrawler leaks the query string to blink.com.
The following JavaScript was found on the query results page:
    window.open("http://www.blink.com/add?partnerID=6021739" +
        "&entry.EntryTrack.vendor=metacrawler&entry.EntryTrack.advert=
	bookmarkthis&url="
        + escape(location.href) + "&title="
        + escape(document.title),"Blink",
        "height=230,width=425,screenX=100,screenY=100,resizable");
This poses a problem: document.title is the title of the query results page
and this title contains the complete query string. This is yet another elegant
way, not easily circumvented, of passing the query string to other sites.

Search hits are usually linked directly, sometimes via external redirectors
located at other search engines.

What is also interesting is that among the regular search hits there are
sponsored links not easily distinguished from the regular hits. The sponsored
links are linked via redirects, passing a parameter 'xargs' the content
of which is yet unknown (see above).
	<a href="http://click.go2net.com/adclick?clickurl=http%3A%2F%2F
	www%2Egoto%2Ecom%2Fd%2Fsr%3Fxargs%3D00u3hs9yoaT2UOSyDCPRDdDYN0h
	QPougxCNWwE8akUVNb0qniTp3uJtJMLS8kyZjfvYm1ZhA0CcA60%252BTddfyye
	uAguge7dYM7YFf7Z28%252B6TcI7dp2aHG2Pg63VdAcGmX8fjJFKLgCq0piLSly
	DQTPHynlKdWVYI2XBbObD6Nq78%252BTkPt3jqOf6l0EcUvKzCGDnFqgtcqBFAC
	YU%252F2Y86V9EOOo%253D&cid=00026fa58f021f9800000000
	&area=results.goto.picks&site=MC&shape=textlink">
	ActivePrivacy - Protection Software</a>

Other redirects look like this.
	<a href="http://navigation.realnames.com/resolver.dll?action=
	redirection&amp;uid=978437:1&amp;realname=Don+Ray%27s+104+Privacy+
	Tips&amp;charset=iso-8859-1&amp;locale=en-US&amp;srcq=privacy+on+
	the+net&amp;providerid=154">Don Ray&#39;s 104 Privacy Tips</a>
This one will leak the query string (and some uid) to realnames.com.

(RealNames states on its page: "RealNames Keywords are a better kind of Web
address that improves the internet experience." RealName Keywords can be used
in "Keyword-enabled environments" such as MSIE 4.x or later, MSN, Neoplanet
browsers, Altavista, iWon, LookSmart etc.
It is interesting that RealNames considers IP addresses to be anonymous (see
their privacy page). This is certainly not the case today and will be even
less so with IPv6.)






2.10 www.looksmart.com

Uses cookies:   Yes.
Cookie Example: LookSmartPIN=010320x6c4daa0f33e351b9a21
Cookie Expiry:  expires=Fri, 18 Mar 2011 16:55:42 GMT

The Looksmart people had a 'nice' idea. Even those who disable cookies
completely can be tracked by using 'html-embedded cookies' to pin the user.
Those pins are passed across several pages showing results of the same query.
	<a href="/r_search?l&pin=010320x6c4daa0f33e351b9a21&key=%22privacy
	+on+the+net%22&skip=10&se=0,27,0&search=us302562;local_US">
	Next 10</a>
The pins are even passed across separate queries:
	<form action=/r_search method=get>
	<input type=hidden name=look value=>
	<input type=hidden name=pin value=010320x6c4daa0f33e351b9a21>
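
A content-rewriting filter could try to remove the pin from links before they
reach the browser. The Python sketch below shows the idea for the "Next 10"
link above; the function name is hypothetical, and it is not known whether the
Looksmart server still answers correctly when the pin is missing.

```python
def strip_parameter(url, name="pin"):
    """Remove one query parameter from a URL, leaving the rest untouched."""
    base, sep, query = url.partition("?")
    kept = [pair for pair in query.split("&")
            if pair.split("=", 1)[0] != name]
    return base + sep + "&".join(kept)

href = ("/r_search?l&pin=010320x6c4daa0f33e351b9a21&key=%22privacy"
        "+on+the+net%22&skip=10&se=0,27,0&search=us302562;local_US")
print(strip_parameter(href))
```

The hidden form field would have to be removed separately, since it is part
of the form and not of a link.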

No direct links are used but local redirects:
	<a href=/cgi-bin/go/t=LSSites:1-10-1-US;g=strak;ref=1/
	http://www.tamos.com/privacy/>Privacy on the Net</a>







2.11 www.directhit.com

Uses cookies:   Yes.
Cookie Example: ASPSESSIONIDQQGGQBDU=GNDOLBGDPAJIOJDFIGOFMAHK
Cookie Expiry:	-

The query string is leaked to doubleclick.net without any user interaction.
As seen before, this happens by passing the query string as a parameter
along with an image request.
	<img src="http://ad.doubleclick.net/ad/results.directhit.aj.com/;
	kw=%26quot%3Bprivacy+on+the+net%26quot%3B;tile=1;sz=468x60;ord=
	73133?">

No direct links are used, only redirects.
	<a href="http://askdh.directhit.com/fcgi-bin/RedirURL.fcg?url=
	http%3A%2F%2Fwww%2Etamos%2Ecom%2Fprivacy%2F&qry=privacy+on+the+net
	&rank=1&src=DH_SRCH_POP"><b>Privacy on the Net: Practical Issues</b>
	</a>







2.12 www.goto.com

Uses cookies:   Yes.
Cookie Example: sessionid=NCWX11AAAHZV1QFIEOQAPUQ
Cookie Expiry:	-
Cookie Example:	UserID=BAACAD96692190F7
Cookie Expiry:	expires=Fri, 18-Mar-2011 17:27:42 GMT

The query string is leaked to doubleclick.net:
	<IMG SRC="http://ad.doubleclick.net/ad/www.goto.com/;abr=!ie;kw=privacy
	+on+the+net;ord=7852">

JavaScript is loaded from doubleclick.net. This leaks the query string:
	<SCRIPT language="JavaScript1.1" SRC="http://ad.doubleclick.net/
	adj/www.goto.com/;abr=!ie;kw=privacy+on+the+net;ord=7852"></SCRIPT>

Similar to looksmart.com the user gets 'pinned' with a session ID
(which is also set as a regular cookie, see above):
	<a href=/d/about/howto/usht_search.jhtml;$sessionid$
	NCWX11AAAHZV1QFIEOQAPUQ>Search Tips</a>

The pin is also carried on to "More results" pages:
	<a href=/d/search/;$sessionid$NCWX11AAAHZV1QFIEOQAPUQ?Keywords=
	%22privacy+on+the+net%22&view=2+38+2&did=>more results</a>

..and it is even carried on across several distinct queries:
	<form method=GET action=/d/search/;$sessionid$NCWX11AAAHZV1QFIEOQAPUQ
	name=Search target="_top"><input type=hidden name=type value=bottombar>
	<input type=text value="&quot;privacy on the net&quot;" name=Keywords
	size=12><input type=image src=http://a840.g.akamai.net/7/840/614/af67
	bb5c566a46/www.GoTo.com/images/shared/ar2.gif></form>

There are no direct links to the search hits. Instead the obfuscated mechanism
mentioned before is used for redirection. Note that the mentioned 'pin'
(sessionid) is transmitted as well:
	<a href=/d/sr;$sessionid$NCWX11AAAHZV1QFIEOQAPUQ?xargs=00u3hs9yoahS
	umGpxqaqgGlCBEiyNTM1PT84tK04pAYoGDYAyNNhAYGTs4GBmaWLo7OTqZOhs7rEAEM
	mtbaWxoapKYYmRmZJaumJyfl2hXUFtZlJiWqVyfnV2nkZINqgt0ldhOjpYGJqYuRqZG
	BiamLkYWpxBRYgAMXvYFNA%3D%3D>Siegesoft Protects Your Net Privacy</a>
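
Since the pin always appears as ';$sessionid$' followed by the identifier, it
can be cut out of links with a simple pattern. The Python sketch below is an
illustration only: the names are hypothetical, the character class is guessed
from the examples above, and it was not tested whether www.goto.com accepts
requests without the pin.

```python
import re

# ';$sessionid$' followed by the identifier (letters and digits assumed).
SESSION_PIN = re.compile(r";\$sessionid\$[A-Za-z0-9]+")

def strip_session_pin(url):
    """Remove the path-embedded session ID from a goto.com style URL."""
    return SESSION_PIN.sub("", url)

print(strip_session_pin(
    "/d/about/howto/usht_search.jhtml;$sessionid$NCWX11AAAHZV1QFIEOQAPUQ"))
print(strip_session_pin(
    "/d/search/;$sessionid$NCWX11AAAHZV1QFIEOQAPUQ?Keywords="
    "%22privacy+on+the+net%22&view=2+38+2&did="))
```

This does not help against the xargs string, which may carry the same
information in obfuscated form.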







2.13 www.infoseek.com, www.go.com

These are now one site (infoseek is now go.com).
A few weeks ago they still used Cookies and lots of horribly obscured
JavaScript code which transmitted referer, browser, OS, screen
resolution/color depth, plug-ins, language settings, cookie preferences,
search engine keywords, JS enablement, number of visits, paths taken and time
spent on sites and pages (quoted from http://websidestory.com/privacy)
to stats.hitbox.com, another domain of websidestory.com.

Now this site is powered by www.goto.com.
Cookies are from www.goto.com.

Uses cookies:	Yes.
Cookie Example:	sessionid=NPCHJNIAAJ4HFQFIEOSAPUQ
Cookie Expiry:	-
Cookie Example:	UserID=ABCF5D2AF79E36E9
Cookie Expiry:	expires=Fri, 18-Mar-2011 17:02:02 GMT

The user gets 'pinned' with a session ID which is not only present as a cookie
but passed around between the different pages of a site.
Further queries will carry the same session id:
	<form method=get action="/d/search/p/go/;
	$sessionid$NPCHJNIAAJ4HFQFIEOSAPUQ" name=search>

No direct links but redirectors are used. The session-ID is passed along
and the URL is passed as an obfuscated xargs string which may well contain
further information.
	<a href="/d/sr;$sessionid$NPCHJNIAAJ4HFQFIEOSAPUQ?xargs=
	00u3hs9yoahSumGpxqaqgGlCBEiyNTM1PT84tK04pAYoGDYAyNNhA5GZ
	mZuLhYWZiaO5k4uRuZrEMEMmtbaWxoapKYYmRmZJaumJyfl2hXUFtZlJ
	iWqVyfnV2nkZINqgt0ldhOjpYGJqYuRqZGBiamLkYWpxAaqmEZB6qqCg
	AB3ygZNQ%3D%3D">Siegesoft Protects Your Net Privacy</a>

The 'pin' is also carried to the following "More Results" pages.
	<a href="/d/search/p/go/;$sessionid$NPCHJNIAAJ4HFQFIEOSAPUQ
	?Keywords=%22privacy+on+the+net%22&view=2+13+2&goAdultStatus
	=null">More Results</a>







2.14 www.mamma.com

Uses cookies:	No.

The Mamma search engine leaks the query string to doubleclick.net without
any user interaction.
	<IMG SRC="http://ad.doubleclick.net/ad/mamma.dart/;abr=!ie;kw=
	privacy+on+the+net;sz=468x60;ord=523442253?">

The query string is leaked to admonitor.net as well. Also JavaScript is
loaded from that location:
	<SCRIPT LANGUAGE="JavaScript" SRC="http://ads.admonitor.net/
	adengine.cgi?F1915|1203|2|multi|C|privacy+on+the+net||">
	</SCRIPT>

mamma.com also pins its users.
This 'cookie' is carried across all search result pages of one single query.
This is the link which carries to the next search results:
	<a href=http://mamma49.mamma.com/Mamma?cookie=983898401-
	BDMPWMDDGIZAGGEWTRLZ&query=privacy+on+the+net&qtype=0&rpp=15&index=16>
	Next</a>

No direct links are set to the search hits; usually several layers of
redirects are used. For example the user may be passed back to
mamma.com, then on to yahoo which in turn will pass her to the page she
wanted to visit in the first place:
	<a href="http://mamma49.mamma.com/Search?eng=Yahoo&cb=Mamma&dest=
	http%3A%2F%2Fsrd.yahoo.com%2Fsrst%2F21562988%2Fprivacy%2Bon%2Bthe%2
	Bnet%2F1%2F8%2F%2Ahttp%3A%2F%2FPrivacy.net&engid=1&af=0&qtype=0&qw=
	privacy+on+the+net&idx=0">Privacy.net</a>
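
The layers can be peeled off one by one. The Python sketch below unwraps the
link above; the function names are hypothetical, and the two rules (a 'dest'
parameter for mamma.com, a '/*' separator for srd.yahoo.com) are read off the
examples in this paper rather than from any documentation.

```python
from urllib.parse import urlsplit, parse_qs

def unwrap_mamma(url):
    """Return the URL stored in the 'dest' parameter, or None."""
    return parse_qs(urlsplit(url).query).get("dest", [None])[0]

def unwrap_yahoo(url):
    """Return the part after the '/*' separator, or None."""
    head, sep, target = url.partition("/*")
    return target if sep else None

link = ("http://mamma49.mamma.com/Search?eng=Yahoo&cb=Mamma&dest="
        "http%3A%2F%2Fsrd.yahoo.com%2Fsrst%2F21562988%2Fprivacy%2Bon%2Bthe%2"
        "Bnet%2F1%2F8%2F%2Ahttp%3A%2F%2FPrivacy.net&engid=1&af=0&qtype=0&qw="
        "privacy+on+the+net&idx=0")

via_yahoo = unwrap_mamma(link)
final = unwrap_yahoo(via_yahoo)
print(via_yahoo)
print(final)
```

Note that both mamma.com and yahoo receive the query string along the way.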







3. Summary


3.1 Problems and remedial measures

Several problems were seen while testing the search engines.
They can be classified as follows:

Problem:	IP leakage
Impact:		If the user has a static IP address she can be identified and
		traced across several sessions; this is the case with all
		webpages although external pages (advertisers) have no business
		knowing the user's IP address
Solution:	Use proxies or anonymizers

Problem:	Cookies
Impact:		Tracing the user across several pages and over several sessions
		is possible
Solution:	Disable cookies or use filtering proxy

Problem:	The HTTP header contains plenty of information such as
		language, OS and browser version
Impact:		This information will be leaked to the servers
Solution:	Use filtering proxy

Problem:	The query string is part of the URL of the search results page;
		the query string will be sent to other servers using
		the referer. This has been a problem with all tested search
		engines. Using redirected links may (or may not) be an attempt
		of the search engines to prevent this, though redirects
		are another problem by themselves (see below).
Impact:		External sites jumped to from the search engine page
		will know the query string used for finding them
Solution:	Use filtering proxy

Problem:	html-embedded cookies / user pinning
Impact:		Tracing the user across several pages (but not over several
		distinct sessions) is possible
Solution:	None yet. Maybe content-filtering? This will become tricky
		and will probably not be failsafe. Another possibility is to
		not allow hidden fields in html-forms, thus all data to be
		sent can be seen by the user. Of course this will cause
		a lot of sites not to work any more.

Problem:	Leakage of query strings to other sites by passing them as
		parameters to external servers
Impact:		External sites get to know the query string.
Solution:	No definite solution.
		It seems the sites which the query strings are leaked to
		are only few, prominent among them doubleclick.net.
		By blocking those using a filtering proxy information
		leakage can be prevented.

Problem:	JavaScript, possibly loaded from other sites;
		JavaScript is far too powerful and will enable the server
		to obtain information such as the local IP address (even if
		using a proxy!), local configuration information etc.. 
Impact:		Tracing of users, leakage of information such as OS, browser
		version, screen resolution, used plugins...
Solution:	Disable JavaScript.

Problem:	redirected links
Impact:		The server will know which of the links presented the user
		chooses to follow.
Solution:	Do not use servers using redirected links or use a local
		redirector script which will strip the redirection before
		passing the URL to the webserver. Alternatively content-
		rewriting may be used. The author of this paper developed
		a patched version of the Internet JunkBuster filtering proxy
		which does this [2].

Problem:	Sharing identifiers by using 302-redirects.
Impact:		This can be used to share cookies or other user identifiers
		in a way very difficult to detect for average users.
Solution:	Difficult. In the case presented in this paper disabling
		cookies will suffice, but this concept may be extended
		to do other mischief not thought of yet.

Problem:	The x/y field is treated differently by some browsers
		(notably lynx); see Introduction
Impact:		Lynx users can be detected quite reliably
Solution:	Patch lynx
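
The "use filtering proxy" remedy named several times above boils down to
dropping a few request headers before the request leaves the local machine.
The Python sketch below shows that step in isolation; it is not how the
Internet JunkBuster is implemented, and the header set as well as the sample
User-Agent string are illustrative assumptions.

```python
# Headers a privacy-minded filter might drop (illustrative selection).
SENSITIVE_HEADERS = {"referer", "cookie", "user-agent", "accept-language"}

def filter_request_headers(headers):
    """Return a copy of the request headers without the sensitive ones."""
    return {name: value for name, value in headers.items()
            if name.lower() not in SENSITIVE_HEADERS}

request_headers = {
    "Host": "www.tamos.com",
    "User-Agent": "Mozilla/4.76 [en] (Win98; I)",   # hypothetical value
    "Accept-Language": "de",
    "Referer": "http://search.excite.com/search.gw?c=web"
               "&search=%22privacy+on+the+net%22",
    "Cookie": "UID=E37F3193E55CE9D0",
}
print(filter_request_headers(request_headers))
```

Some sites check the referer or refuse to work without cookies, so a real
filter needs per-site exceptions.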

doubleclick.net is quite prominent concerning excessive statistics
and data leaks; this has been recognized by other people before [1][5].
Due to this there were a lot of discussions about doubleclick during
the last two years [6][7][8]. Some people suggest blocking doubleclick.net
completely using a filtering proxy. As a welcome side effect this will
block a lot of advertising banners.
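
The blocking decision itself is a simple host name comparison. The Python
sketch below assumes a hand-maintained list of domains; the list shown is only
an example assembled from the third party hosts seen in section 2, and the
names BLOCKED_DOMAINS and is_blocked are hypothetical.

```python
from urllib.parse import urlsplit

# Example list only; a real filter would use a maintained block list.
BLOCKED_DOMAINS = {"doubleclick.net", "preferences.com", "admonitor.net"}

def is_blocked(url):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in BLOCKED_DOMAINS)

print(is_blocked("http://ad.doubleclick.net/ad/www.goto.com/;abr=!ie;"
                 "kw=privacy+on+the+net;ord=7852"))
print(is_blocked("http://www.tamos.com/privacy/"))
```

Matching on the domain suffix covers hosts such as ln.doubleclick.net and
ad.doubleclick.net without listing each of them.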




3.2 So which search engines should be used?

The search engines to be preferred by paranoid people are probably
www.lycos.com and www.google.com. As already mentioned you might have to
take care with google as they seem to use redirects sometimes.




3.3 Suggestions to the search engine maintainers

It would be nice if search engines would finally decide to use "POST" instead
of "GET" for their search forms. Contrary to regular GET submits, POST
submits neither show up in the URL of the search results page (thus the query
string leakage via referers would no longer occur) nor are they logged by
most proxies. POST submits do have some disadvantages
if combined with redirects [3] but this is a special case and is probably
not relevant for most applications.
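
The difference can be made visible without sending anything over the network.
The Python sketch below builds the same search once as a GET and once as a
POST request; the host name search.example.invalid and the field name 'search'
are placeholders, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = urlencode({"search": '"privacy on the net"'})

# GET: the form data becomes part of the URL (and thus of later referers).
get_request = Request("http://search.example.invalid/search?" + form)
# POST: the form data travels in the request body, the URL stays clean.
post_request = Request("http://search.example.invalid/search",
                       data=form.encode("ascii"))

for request in (get_request, post_request):
    print(request.get_method(), request.full_url)
```

Only the URL is used as the referer for subsequent requests, so the POST
variant gives third party sites nothing to read the query string from.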




3.4 Suggested features for future browser versions

There are still a lot of helpful features missing in most browsers:

"Suppress Referer" config option;
	  If choosing that option referers would not be
	  included with page requests any more. Optionally this may
	  be overridden by using a right-click popup menu for single
	  links as some sites check referers.

Pre-submit header/URL analysis;
	  a popup function similar to "View Source Code" could be added
	  so the request (including headers!) which usually would be
	  submitted by clicking the link/form submit button can be displayed
	  in advance without actually submitting the data.

Link locations (e.g. with Netscape, displayed at the bottom) not
	  only for links but also for forms

"Ask before following redirects" config option;
	  this will make redirections visible to the user. Judging by 
	  RFC 2616 [4] additional verification of redirects of any kind is
	  never against the specification, i.e. it should be possible to
	  have the user confirm all 3xx redirects.
	  This gives the user the chance to detect and prevent mentioned
	  "302 information sharing" attempts.
	  This is already possible with Opera. Depending on the
	  configuration it will display a page with the link to the new
	  location without automatically forwarding the user to that page.
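
For testing purposes the "ask before following redirects" behaviour can be
approximated outside the browser. The Python sketch below uses the standard
urllib redirect handler; the class name AskingRedirectHandler and the confirm
callback are hypothetical, and no request is sent until the opener is used.

```python
import urllib.request

class AskingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow a 3xx redirect only if the confirm callback approves it."""

    def __init__(self, confirm):
        # confirm(old_url, status_code, new_url) -> bool
        self.confirm = confirm

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if not self.confirm(req.full_url, code, newurl):
            # Returning None stops urllib from following the redirect;
            # the 3xx response is then reported as an HTTPError.
            return None
        return super().redirect_request(req, fp, code, msg, headers, newurl)

def ask_on_console(old_url, code, new_url):
    """Show the redirect to the user and ask whether to follow it."""
    answer = input(f"{code}: {old_url} -> {new_url}  follow? [y/N] ")
    return answer.strip().lower() == "y"

# The opener replaces urllib's default redirect handling with the asking one.
opener = urllib.request.build_opener(AskingRedirectHandler(ask_on_console))
```

Calling opener.open() on a URL would then prompt once for every redirect in a
chain such as the one observed at www.metacrawler.com.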

Without doubt the mentioned features are of no or low use for the average
Internet user but they may be of great value to privacy concerned users
for testing purposes. Of course on the other hand it is questionable whether
a software package of extended size (which most browsers are today) should
be trusted at all. It is definitely more secure and reliable to use small
external 'dumb' programs for testing purposes.




References:
-----------

[1] Chris Brenton on the 'Firewalls' mailinglist: 'Blocking DoubleClick'
    http://lists.gnac.net/firewalls/mhonarc/firewalls.199912/msg00292.html
[2] Patched InternetJunkBuster version which supports content-modifications
    http://www.franken.de/users/tentacle/progs/
[3] A.J.Flavell: 'Redirect in response to POST transaction'
    http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html
[4] Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach,
    P., Berners-Lee, T.: 'Hypertext Transfer Protocol -- HTTP/1.1',
    RFC 2616, June 1999
[5] Privacy & Spying News, Luc Vezina:
    'Web Bugs - Tracking your every move'
    http://www.isn.net/~deighanj/privacy-examples1.html
[6] RISKS Volume 20, Digest 45 (Thursday 17 June 1999):
    Monty Solomon: 'Trouble for DoubleClick'
[7] RISKS Volume 20, Digest 81 (Mon, 21 February 2000):
    Newsscan: 'Michigan puts Doubleclick on notice'
[8] The Dallas Morning News, Feb 10, 2000:
    Doug Bedell: 'Banner ad firm draws protests over tracking of Web surfers:
    New profiles attach names to browsing habits'
    http://www.findarticles.com/m0BJS/2000_Feb_10/59352236/p1/article.jhtml
[9] DECUS Bulletin (DECUS Muenchen e.V., Germany) Issue 85, March 2001:
    Frank Theisen: 'IBM E-Analytics: Data Mining im E-Commerce Umfeld'


Thanks to Nils Hornung, Roland Schulz, Florian Stegmeier and
Markus Ziermann for their suggestions/contributions.

[end of document]
