Search engines for audio-visual content

Legal aspects, policy implications & directions for future research


1.1. Introduction

We are currently witnessing a data explosion. In June 2005, the total number of Internet sites was believed to be on the order of 64 million, with double-digit annual growth rates. This data comes in a variety of formats, and content has evolved far beyond pure text description. It can be assumed that search engines, in order to cope with this increased creation of audiovisual (or multimedia) content, will increasingly become audio-visual (AV) search engines.
By their nature, audio-visual search engines promise to become a key tool in the audio-visual world, as text search has in the current text-based digital environment. Clearly, AV search applications would be necessary in order to reliably index, sift through, and 'accredit' (or give relevance to) any form of audiovisual (individual or collaborative) creation. AV search moreover becomes central to predominantly audiovisual file-sharing applications. AV search also leads to innovative ways of handling digital information. For instance, pattern recognition technology will enable us to search for categories of images or film excerpts. Likewise, AV search could be used for gathering all the past voice-over-IP conversations in which a certain keyword was used.
However, if these key applications are to emerge, search technology must transform rapidly in scale and type. There will be a growing need to investigate novel audio-visual search techniques built, for instance, around user behaviour. Accordingly, AV search is listed as one of the top priorities of the three major US-based search engine operators: Google, Yahoo! and Microsoft. The French Quaero initiative, for the development of a top-notch AV search portal, and the German Theseus research programme on AV search provide further evidence of the important policy dimension. This paper focuses on some legal and policy challenges for European content industries emanating from the development, marketing and use of AV search applications.
As AV search engines are still in their technological infancy, drawing attention to likely future prospects and legal concerns at an early stage may contribute to improving their development. The paper will thus start with a brief overview of trends in AV search technology and market structure. The central part of this paper emphasises the legal, regulatory and policy dimension of AV search. It is possible that existing regulation is lagging behind technological, market and social developments: search engines may either slip through the mesh of existing legal regulation, or the application of existing law to search engines may be sub-optimal from the viewpoint of policy-makers. In order to assess the situation, a variety of EU directives and selected national laws have been screened, covering:
· intellectual property rights (trademarks, copyright, patents)
· competition law (horizontal & vertical integration, joint dominance)
· media law (transparency, oversight, media pluralism, content regulation)
· e-commerce law (liability, self- and co-regulation, codes of conduct)
· communications law (EU electronic communications package)
· law of obligations (consumer protection, anti-spyware/spam, security defects)
· criminal law (e.g. anti-terrorism)
· constitutional law & fundamental rights (freedom of expression, property, privacy)
A fully-fledged analysis of all those legal obligations is beyond the scope of this paper. However, bearing in mind the complete set of obligations, the paper considers a select number of laws in more detail.
The search engine landscape is structured around three main relationships. First, there is a large number of content providers that make their content available for indexing by the search engine's crawlers. Second, there are the advertisers that provide most of the income for the search engine activity. Finally, search engines interact with users, and the relevance of their search results depends to a large extent on the user data they gather.
The relation between search engines and content providers is regulated by means of copyright law. Copyright law, with its dual economic and cultural objectives, is a critical policy tool in the information society because it takes into account the complex nature of information goods. It seeks to strike a delicate balance at the stage of information creation. Copyright law affects search engines in a number of different ways, and determines the ability of search engine portals to return relevant organic results.[1] Courts across the globe are increasingly called on to consider copyright issues in relation to search engines. This paper analyses some recent case law relating to copyright litigation over deep linking, the provision of snippets, cached copies, thumbnail images, news gathering and other aggregation services (e.g. Google Print).
The relation between search engines and advertisers is regulated by means of trademark law. Trademarks are important for search engines. If search engines cannot sell keywords freely, they are not worth their market valuation. Conversely, if competitors are allowed to buy ad keywords that contain registered trademarks, then the search engine may be diverting some of the income streams away from the owners of the trademarks toward their competitors. There has been intense litigation on this issue on both sides of the Atlantic. US courts are currently undecided but leaning towards giving leeway to search engines; EU courts, on the other hand, seem to be in favour of giving trademark (TM) holders broad rights in relation to the use of their registered TM by search engines. This paper considers issues involving the use of TM terms in meta-tags for search engine optimisation, in search engine advertising auctions, and in organic results.
The relation between search engines and their users depends to a large extent on data protection law.
Recently, search engine providers have been confronted with a series of significant complaints regarding the logging of user data. The question arose whether these practices comply with existing EU data protection and data retention obligations, and more generally, whether search engine regulation is in line with the fundamental right to protection of private life. This paper considers the potential impact of data protection and privacy laws on the development of a thriving European AV search engine market. The paper includes a brief overview of the manner in which search engines profile their users, as well as the commercial and other reasons behind these profiling activities. The paper reviews recent high-profile cases in the US (COPA, AOL) and the EU (WP 29 debate). It discusses the likely application of current legal obligations to search engines, and considers the response of search engines both in terms of technological change and in terms of proposals to amend existing legal regulation.
The laws are not the same across Europe. Though they are harmonized to a certain extent, there are differences between EU Member States. It is not the intention of this paper to address particular legal questions from the perspective of a particular jurisdiction or legal order. Instead, the analysis tackles the various questions from the broader perspective of European policy. The aim is to inform European policy in regard to AV search through legal analysis, and to investigate how specific laws could be viable tools in achieving EU policy goals.
Finding the proper regulatory balance in each of these areas of regulation will play a pivotal role in fostering the creation, marketing and use of AV search engines.
For instance, overly strong copyright, trademark or data protection laws may hamper the development of the AV search market; they may affect the creation and availability of content, the source of income of AV search engine operators, as well as their capacity to improve and personalise search engine results. Conversely, laws which are unduly lenient towards AV search engine operators may inhibit the creation of sufficient content, put their advertising income at risk, or instill fear of pervasive user profiling and surveillance. The paper refers each time to relevant developments in the text search engine sector, and considers to what extent the specificities of AV search warrant a different approach.
Section 2 briefly describes the functioning of web search engines and highlights some of the key steps in the information retrieval process that raise copyright issues. Section 3 reviews the market context and business rationales. Section 4 sets out the main legal questions and arguments relating to copyright (relation with content providers), trademarks (relation with advertisers), and data protection (relation with users). Section 5 places these debates in the wider policy context and infers three key messages. Section 6 offers some tentative conclusions.

[1] Organic (or natural) results are not paid for by third parties, and must be distinguished from sponsored results or advertising displayed on the search engine portal. The main legal problem regarding sponsored results concerns trademark law, not copyright law.


1.2. Search Engine Technology

For the purposes of this paper, the term 'web search engine' refers to a service available on the Internet that helps users find and retrieve content or information from the publicly accessible Internet.[1] The best known examples of web search engines are Google, Yahoo!, Microsoft and AOL's search engine services. Web search engines may be distinguished from search engines that retrieve information from non-publicly accessible sources. Examples of the latter include those that only retrieve information from companies' large internal proprietary databases (e.g. those that look for products in eBay or Amazon, or search for information inside Wikipedia), or search engines that retrieve information which, for some reason, cannot be accessed by web search engines.[2] Similarly, we also exclude from the definition those search engines that retrieve data from closed peer-to-peer networks or applications which are not publicly accessible and do not retrieve information from the publicly accessible Internet. Though many of the findings of this paper may be applicable to many kinds of search engines, this paper focuses exclusively on publicly accessible search engines that retrieve content from the publicly accessible web. Likewise, it is better to refer to search results as "content" or "information", rather than web pages, because a number of search engines retrieve other information than web pages. Examples include search engines for music files, digital books, software code, and other information goods.[3] In essence, a search engine is made up of three essential technical components: the crawlers or spiders, the (frequently updated) index or database of information gathered by the spiders, and the query algorithm that is the 'soul' of the search engine. This algorithm has two parts: the first part defines the matching process between the user's query and the content of the index; the second (related) part of this algorithm sorts and ranks the various hits. 
The process of searching can roughly be broken down into four basic information processes, or exchanges of information: a) information gathering, b) user querying, c) information provision, and d) user information access.


1.2.1. Four Basic Information Flows

1.2.1.1. Search Engines Gather and Organise Content
In the early days of search engines, webmasters were encouraged to submit information directly to the search engine operators.[4] Though this is still one possible method, today's major search engines do not require any extra effort to submit information, as they are capable of finding pages via links on other sites. The web search process of gathering information is driven primarily by automated software agents called robots, spiders, or crawlers that have become central to successful search engines.[5] The agents do not 'visit' the pages or content repositories in any literal sense. Rather, as a browser does, the software agent sends requests to the content provider's server and receives the content in return.
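The link-following behaviour described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory "toy web" (the `TOY_WEB` dictionary and the `example` addresses are invented for illustration) in place of real network requests, and it does not reflect how any particular search engine implements its crawler.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch` is any callable returning the HTML for a URL, or None if the
    content cannot be retrieved. Returns a dict mapping URL to stored HTML.
    """
    seen = {seed}
    queue = deque([seed])
    store = {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html  # keep a copy for the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # new pages are discovered only via links
                seen.add(link)
                queue.append(link)
    return store


# Hypothetical toy web: the last page has no incoming links.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
    "http://d.example/": "no links here",
    "http://orphan.example/": "nobody links to me",
}

pages = crawl("http://a.example/", TOY_WEB.get)
print(sorted(pages))
```

In this sketch the page that no other page links to is never discovered, which is one simple reason why part of the web stays invisible to crawlers.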

1.2.1.2. Users Query the Search Engine: From 'Pull' to 'Push'
The second major information flow that determines search results is the series of queries the user enters in the search box. User queries may be divided into three categories: navigational (the user wants to reach a particular site), informational (the user is looking for new data or facts), and transactional (the user is seeking to carry out a transaction, such as a purchase).[6] The query usually consists of a couple of keywords. A number of new search engines currently under development propose query formulation in full sentences,[7] or in audio, video or picture format. Most search engines now record (or log) user information in order to offer better search results. One trend is, for instance, the provision of increasingly personalized search results, tailored to the particular profile and search history of each individual user.[8] Another major trend is the development by search engines of information gathering services regarding news and other types of information. At the intersection of these trends lies the development of proactive search engines that crawl the web and ‘push’ information towards the user according to this user’s search history and profile.

1.2.1.3. Search Engines Return Results
The third information flow is the provision of relevant results by the search engine to its user. This is often an iterative process, in the sense that the user may want to refine his or her query according to the results that are returned by the search engine. Better search engines provide more relevant results without requiring the user to enter many queries. The key here for search engines is to determine the relevance of specific content for a given query. In the past, search engines relied solely on the text of web sites. Over time, however, search engines have become more sophisticated, integrating metadata (data about the pages or content), tags, user click stream data, as well as the link structure. The latter involves information about which pages link to and from which other pages. Link structure analysis is helpful, for instance, in determining the popularity of content. Search engines thus make use of complex ranking algorithms with more than 100 factors for ranking content. Every search engine has its own recipe for the factors used to rank web pages. For instance, Google makes use of the well-known PageRank concept.[9] Given that every search engine uses many such factors, whose composition and weight can change continually, and because their respective algorithms are also different, results are likely to be quite distinct between competing search engines. A web page that ranks high in a particular search engine can rank lower in another search engine or even on the same search engine some days later. Because of their importance in returning relevant results and giving search engines a competitive edge, the ranking algorithms are widely considered the soul of the search engine.
Generally speaking, details of their algorithms and architecture (particularly for the crawlers, indexers, and ranking) are kept under lock and key as business secrets.[10] One important point that needs to be stressed here is that the process is increasingly automated, with as little human intervention as possible. The process of mechanically making sense of the masses of information available on the Internet is now reaching a high level of sophistication. This can be seen at the stages of information gathering, user querying, and the return of relevant results.
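Although actual ranking algorithms are business secrets, the link-based idea behind PageRank has been described publicly and can be sketched in a few lines. The following is a simplified illustration only: the four-page link graph `TOY_LINKS` is invented, and the damping factor of 0.85 and the fixed number of iterations are conventional textbook assumptions, not parameters taken from any search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to. A link from
    page A to page B counts as a vote for B, weighted by A's own rank.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random surfer"
        # who occasionally jumps to an arbitrary page).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to C, nobody links to D.
TOY_LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(TOY_LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page that attracts the most incoming links ends up ranked highest, and the page with no incoming links ranked lowest. A production ranking would combine such a link score with many other factors.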

1.2.1.4. Users obtain the Content
The line between search engines and content providers is increasingly blurred. Many providers of online services provide search engines for their own services. The same holds for sites that are aggregations of user-produced content. Likewise, decentralized peer-to-peer networks use the resources provided by their users (computing power, bandwidth, storage) to retrieve and provide content to their community of users. In addition, a number of search engines provide content directly to their users. They store content in their cache, in order to make it easier for the user to retrieve the information. They archive content, enabling users to retrieve the information even when the original content is no longer available. For visual information, it is now common practice for many search engines to provide thumbnails (or smaller versions) of pictures. Simply put, search engines are powerful intermediaries that determine or facilitate the connection or information exchange between content or information providers, and users. Each such connection may be detrimental to users, content providers, or third parties (be they competing content providers, regulators, or advertisers) who would rather not have such a connection occur.[11]

1.2.2. Search Engine Operations and Trends


1.2.2.1. Indexing
Once the crawler has downloaded a page and stored it on the search engine's own server, a second programme, known as the indexer, extracts various bits of information regarding the page. Important factors include the words the web page or content contains, where these key words are located, the weight that may be accorded to specific words, and any or all links the page contains. The index is further analysed and cross-referenced to form the runtime index that is used in the interaction with the user. A search engine index is like a big spreadsheet of the web. The index breaks the various web pages and content into segments. It stores where the words were located and what other words were near them, and analyses the use of words and their logical structure. Importantly, the index is not an actual reproduction of the page or something a user would want to read. To read the page itself, the user clicks on a link in the engine's search results and retrieves the current version of the page from the server that hosts it.
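A data structure commonly used for this purpose is the positional inverted index, which records for every word the documents in which it occurs and the positions at which it occurs. The sketch below is a deliberately minimal illustration under that assumption: the three sample documents are invented, and ranking by raw word count stands in for the far richer weighting that real engines apply.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to {document id: [positions of the word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def search(index, query):
    """Return ids of documents containing every query word, best first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Matching: keep only documents that contain all query words.
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)

    # Ranking: here simply the total number of occurrences.
    def score(doc_id):
        return sum(len(posting[doc_id]) for posting in postings)

    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))


# Hypothetical sample documents.
DOCS = {
    "page1": "Red mountain at sunset. The red mountain glows.",
    "page2": "A mountain lake in the morning.",
    "page3": "Red wine and red roses.",
}

index = build_index(DOCS)
print(search(index, "red mountain"))
print(index["red"]["page1"])
```

The index holds word positions rather than readable text, which illustrates why it is not a reproduction of the page. The two steps inside `search` correspond to the matching and ranking parts of the query algorithm.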

1.2.2.2. Caching
Most of the major search engines now provide "cache" versions of the web pages that are indexed. The search engine's cache is, in fact, more like a temporary archive: search engines routinely store a copy of the content on their servers for a long period of time. When clicking on the "cache version", the user retrieves the page as it looked the last time the search engine's crawler visited the page in question. This may be useful if the server is down and the page is temporarily unavailable, or if the user wants to find out how the page has recently been amended.

1.2.2.3. Robot Exclusion Protocols
Before embarking on legal considerations, it is worth recalling the regulatory effects of technology or code. Technology or 'code' plays a key role in creating contract-like agreements between content providers and search engines. For instance, since 1994 the robot exclusion standard has allowed newspapers to prevent search engine crawlers from indexing or caching certain content. Web site operators in general can do the same, either through a plain-text robots.txt file or through standardised HTML meta tags. Add '/robots.txt' to the end of any site's web address and it will show the site's instructions for search engine crawlers. Similarly, by inserting the NOARCHIVE directive in a robots meta tag in the code of a given page, web site operators can prevent caching. Individual search engines also provide additional, more detailed ways of excluding content from their index and/or cache. These methods are now increasingly fine-grained, allowing particular pages, directories, entire sites, or cached copies to be removed.[12] Standardisation bodies are currently working on standardised ways to go beyond the current binary options (e.g. to index or not to index). Right now content providers may opt in or opt out, and robot exclusion protocols also work for keeping out images or specific pages (as opposed to entire web sites), but many of the intermediate solutions are technologically harder to achieve. The Automated Content Access Protocol (ACAP) is a standardized way of describing some of the more fine-grained intermediate permissions, which can be applied to web sites so that they can be decoded by the crawler. ACAP might, for instance, indicate that text can be copied, but not the pictures. Or it could say that pictures can be taken on condition that the photographer's name also appears. Demanding payment for indexing might also be part of the protocol.[13] In this way, technology could enable copyright holders to determine the conditions under which their content can be indexed, cached, or even presented to the user.
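To make these exclusion mechanisms concrete, the sketch below shows how a well-behaved crawler might interpret them, using the robots.txt parser from Python's standard library. The robots.txt rules, the crawler names (`ExampleNewsBot`, `OtherBot`) and the `www.example.com` addresses are hypothetical, and the `may_cache` helper is an illustrative assumption about how a crawler could honour the NOARCHIVE directive, not a description of any search engine's actual code.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all crawlers are kept out of two directories,
# and one named crawler is excluded from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /images/

User-agent: ExampleNewsBot
Disallow: /
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent, url):
    """True if the site's robots.txt lets this crawler fetch the URL."""
    return robots.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",")}


def may_cache(html):
    """True unless the page carries a NOARCHIVE robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


print(may_crawl("OtherBot", "http://www.example.com/news/today.html"))
print(may_crawl("OtherBot", "http://www.example.com/archive/2005.html"))
print(may_crawl("ExampleNewsBot", "http://www.example.com/news/today.html"))
print(may_cache('<html><head><meta name="robots" content="NOARCHIVE"></head></html>'))
```

Both mechanisms are purely declaratory: they state the site operator's wishes, and it is up to the crawler to check and respect them. The options remain essentially binary (fetch or not, cache or not), which is the limitation that more fine-grained protocols seek to address.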

1.2.2.4. From Text Snippets & Image Thumbnails to News Portals
Common user queries follow a 'pull'-type scheme. The search engines react to keywords introduced by the user and then submit potentially relevant content.[14] Current search engines return a series of text snippets of the source pages enabling the user to select among the proposed list of hits. For visual information, it is equally common practice to provide thumbnails (or smaller versions) of pictures. However, search engines are changing from a reactive to a more proactive mode. One trend is to provide more personalized search results, tailored to the particular profile and search history of each individual user.[15] To offer more specialized results, search engines need to record (or log) the user's information. Another major trend is news syndication, whereby search engines collect, filter and package news, and other types of information. At the intersection of these trends lies the development of proactive search engines that crawl the web and ‘push’ information towards the user, according to this user’s search history and profile.

1.2.2.5. Audio-visual search
Current search engines are predominantly text-based. They gather, index, match and rank content by means of text and textual tags. Non-textual content, such as image, audio, and video files, is ranked according to the text tags associated with it. While text-based search is efficient for text-only files, this technology and methodology for retrieving digital information has important disadvantages when faced with formats other than text. For instance, images that are very relevant for the subject of enquiry will not be listed by the search engine if the file is not accompanied by the relevant tags or textual clues. Although a video may contain a red mountain, the search engine will not retrieve this video when a user enters the words "red mountain" in the search box. The same is true for any other information that is produced in formats other than text. In other words, a lot of relevant information is systematically left out of the search engine rankings, and is inaccessible to the user. This in turn affects the production of all sorts of new information.[16] There is thus a huge gap in our information retrieval process. This gap is growing with the amount of non-textual information that is being produced at the moment. Researchers across the globe are currently seeking to bridge the gap. One strand of technological developments could provide a solution on the basis of text formats by, for instance, developing intelligent software that automatically tags audio-visual content.[17] Truveo is an example of this for video,[18] and SingingFish for audio content.[19] Another possibility is to create a system that tags pictures using a combination of computer vision and user inputs.[20] AV search often refers specifically to new techniques better known as content-based retrieval.
These search engines retrieve audio-visual content relying mainly on pattern or speech recognition technology to find similar patterns across different pictures or audio files.[21] These pattern or speech recognition techniques make it possible to consider the characteristics of the image itself (for example, its shape and colour), or of the audio content. In the future, such search engines would be able to retrieve and recognise the words "red mountain" in a song, or determine whether a picture or video file contains a "red mountain," despite the fact that no textual tag attached to the files indicates this. This sector is currently thriving. Beta versions of such search engines are starting to make the headlines, both for visual and audio information. Tiltomo[22] and Riya[23] provide state-of-the-art content-based image retrieval tools that retrieve matches from their indexes based on the colours and shapes of the query picture. Pixsy[24] collects visual content from thousands of providers across the web and makes these pictures and videos searchable on the basis of their visual characteristics. Using sophisticated speech recognition technology to create a spoken word index, TVEyes[25] and Audioclipping[26] allow users to search radio, podcasts, and TV programmes by keyword.[27] Blinkx[28] and Podzinger[29] use visual analysis and speech recognition to better index rich media content in audio as well as video format. The most likely scenario, however, is a convergence and combination of text-based search and search technology that also indexes audio and visual information.[30] For instance, Pixlogic[31] offers the ability to search not only metadata of a given image but also portions of an image that may be used as a search query. Two preliminary conclusions may be drawn with respect to AV search. First, the deployment of AV search technology is likely to reinforce the trends discussed above.
Given that the provision of relevant results in AV search is more complex than in text-based search, AV search engines will need to rely even more on user information to retrieve pertinent results. As a consequence, it seems likely that we will witness an increasing trend towards AV content 'push', rather than merely content 'pull'. Second, the key to efficient AV search is the development of better methods for producing accurate meta-data that describe the AV content. This makes it possible for search engines to organise the AV content optimally (e.g. in the run-time index) for efficient retrieval. One important factor in this regard is the ability of search engines to have access to a wide range of AV content sources on which to test their methods. Another major factor is the degree of competition in the market for the production of better meta-data for AV content. Both these factors (access to content, market entry) are intimately connected with copyright law.
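One of the simplest forms of content-based retrieval compares images by their colour distribution rather than by any attached text. The sketch below illustrates that general idea with colour histograms and histogram intersection. It is a generic textbook technique shown under stated assumptions: the "images" are invented lists of RGB pixel values, and nothing here describes the methods actually used by the services named above.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram of a list of (r, g, b) pixels (0-255)."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)

    def bin_of(value):
        return min(int(value / step), bins_per_channel - 1)

    for r, g, b in pixels:
        idx = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[idx] += 1.0
    return [count / len(pixels) for count in hist]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical colour distributions."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def most_similar(query_pixels, collection):
    """Name of the image in `collection` whose colours best match the query."""
    query_hist = colour_histogram(query_pixels)
    return max(
        collection,
        key=lambda name: similarity(query_hist, colour_histogram(collection[name])),
    )


# Hypothetical query image: red rock under a blue sky, with no text tags.
RED_MOUNTAIN = [(200, 30, 30)] * 60 + [(90, 140, 230)] * 40

# Hypothetical untagged image collection.
COLLECTION = {
    "sunset_peak": [(210, 40, 20)] * 55 + [(100, 150, 240)] * 45,
    "green_meadow": [(30, 180, 40)] * 70 + [(90, 140, 230)] * 30,
    "night_city": [(10, 10, 20)] * 100,
}

print(most_similar(RED_MOUNTAIN, COLLECTION))
```

The match is found without any textual tag, purely from the pixels. Colour alone is of course a crude descriptor, and practical systems would also need shape, texture or speech features and far more efficient indexing.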

[1] See for a similar definition, James Grimmelmann, The Structure of Search Engine Law (draft), October 13, 2006, p.3, at http://works.bepress.com/james_grimmelmann/13/.
[2] Part of the publicly accessible web cannot be detected by web search engines, because the search engines’ automated programmes that index the web (crawlers or spiders) cannot access it due to the dynamic nature of the link, or because the information is protected by security measures. Although search engine technology is improving with time, the number of web pages is increasing drastically too, rendering it unlikely that the 'invisible' or 'deep' web will disappear in the near future. As of March 2007, the web is believed to contain 15 to 30 billion pages (not sites), of which one fourth to one fifth is estimated to be accessible by search engines. See and compare www.pandia.com/sew/383-web-size.html and http://technology.guardian.co.uk/online/story/0,,547140,00.html.
[3] Search engines might soon be available for locating objects in the real world. See John Battelle, The Search: How Google and its rivals rewrote the rules of business and transformed our culture (2005), p 176. See James Grimmelmann, supra.
[4] It is acknowledged that Google and Yahoo still offer submission programs, while some search engines, including Yahoo!, even operate paid submission services that assure the inclusion into the database, but do not secure any specific ranking within the search results. But these practices are now no longer mainstream.
[5] There are of course alternatives on the market, such as the open directory project whereby the web is catalogued by humans, or search engines that tap into the wisdom of crowds to deliver relevant information to their users, such as Wiki Search, the wikipedia search engine initiative (http://search.wikia.com/wiki/Search_Wikia), or ChaCha (http://www.chacha.com/). See Wade Roush, New Search Tool Uses Human Guides, Technology Review, February 2, 2007, at http://www.techreview.com/Infotech/18132.
[6] See Andrei Broder, A Taxonomy of Web Search, 36 ACM SIGIR Forum, no.2 (2002), at http://www.acm.org/sigs/sigir/forum/F2002/broder.pdf.
[7] See Stefanie Olson, Spying an Intelligent Search Engine, ZDNet, August 21, 2006, at http://www.zdnet.com.au/news/communications/soa/Spying_an_intelligent_search_engine/0,130061791,139267128,00.htm
[8] See Your Google Search Results Are Personalised, http://www.seroundtable.com/archives/007384.html. See also Kate Greene, A More Personalized Internet?, Technology Review, February 14, 2007, at http://www.technologyreview.com/Infotech/18185/. This raises intricate data protection issues. See Boris Rotenberg, Towards Personalised Search: EU Data Protection Law and its Implications for Media Pluralism. In Machill, M.; M. Beiler (eds.): Die Macht der Suchmaschinen / The Power of Search Engines. Cologne [Herbert von Halem] 2007, forthcoming. Profiling will become an increasingly important way for identification of individuals. It will raise concerns in terms of privacy and data protection. This interesting topic is however outside the scope of this paper (information can be found elsewhere. See Clements, B, Maghiros I, Beslay L, Centeno C, Punie Y, Rodriguez C, Masera M, "Security and privacy for the citizen in the Post-September 11 digital age: A prospective overview" 2003, EUR 20823 available at www.jrc
[9] PageRank is an algorithm that weighs a page's importance based upon the incoming links. PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. It then assesses a page's importance by the number of votes it receives. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. In other words, the PageRank concept values more highly those pages that a random surfer following links is more likely to reach.
[10] Search engines also increasingly learn from the large volumes of user data. Query histories provide valuable information by which search engines can improve the relevance of their results.
[11] See about an attempt to offer more transparency: Google Webmaster Central Adds Link Analysis Tool, February 6, 2007, at http://www.seroundtable.com/archives/007401.html.
[12] See for a detailed overview Danny Sullivan, Google releases improved Content Removal Tools, at http://searchengineland.com/070417-213813.php.
[13] See Struan Robertson, Is Google Legal?, OUT-LAW News, October 27, 2006, at http://www.out-law.com/page-7427
[14] A number of new search engines are being developed at the moment that propose query formulation in full sentences, or in audio, video, picture format.
[15] See Your Google Search Results Are Personalised, http://www.seroundtable.com/archives/007384.html. See also Kate Greene, A More Personalized Internet?, Technology Review, February 14, 2007, at www.technologyreview.com/Infotech/18185/. This raises intricate data protection issues. See Boris Rotenberg, Towards Personalised Search: EU Data Protection Law and its Implications for Media Pluralism. In Machill, M.; M. Beiler (eds.): Die Macht der Suchmaschinen / The Power of Search Engines. Cologne [Herbert von Halem] 2007, pp.87-104. Profiling will become an increasingly important way of identifying individuals, raising concerns in terms of privacy and data protection. This interesting topic is however beyond the scope of this paper (information can be found elsewhere. See Clements, B, et al., "Security and privacy for the citizen in the Post-September 11 digital age: A prospective overview" 2003, EUR 20823 available at www.jrc.es).
[16] See Matt Rand, Google Video's Achilles' Heel, Forbes.com, March 10, 2006, at http://www.forbes.com/2006/03/10/google-video-search-tveyes-in_mr_bow0313_inl.html.
[17] See about this James Lee, Software Learns to Tag Photos, Technology Review, November 9, 2006, at http://www.technologyreview.com/Infotech/17772/.
[19] SingingFish was acquired by AOL in 2003, and has ceased to exist as a separate service as of 2007. See http://en.wikipedia.org/wiki/Singingfish
[21] Pattern or speech recognition technology may also provide for a cogent way to identify content, and prevent the posting of copyrighted content. See, Associated Press, MySpace launches pilot to filter copyright video clips, using system from Audible Magic, Technology Review, February 12, 2007 at http://www.technologyreview.com/read_article.aspx?id=18178&ch=infotech.
[25] http://www.tveyes.com; TVEyes powers a service called Podscope (http://www.podscope.com) that allows users to search the content of podcasts posted on the Web.
[27] See Gary Price, Searching Television News, SearchEngineWatch, February 6, 2006, at http://searchenginewatch.com/showPage.html?page=3582981. See
[30] See Brendan Borrell, Video Searching by Sight and Script, Technology Review, October 11, 2006, at http://www.technologyreview.com/read_article.aspx?ch=specialsections&sc=personal&id=17604.
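
The link-voting idea described in note [9] can be illustrated with a short sketch. The code below is a simplified power-iteration version of PageRank, not the production algorithm of any search engine. The damping factor of 0.85, the fixed iteration count, the uniform treatment of pages without outgoing links, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline (the "random surfer" jumping anywhere).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to,
                # so votes from highly ranked pages weigh more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links votes for all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: page D links out but receives no links itself.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects votes from three pages and ends up ranked first, while page D, which nobody links to, keeps only the baseline share.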

1.3. Market Developments

The technology does not operate in a vacuum. By virtue of the Internet's development, search engines have become vital players. But they can only carry out their mission through their interaction with content or information providers, advertisers, and users. This section will first consider the pivotal role of search engines in the information society. Second, it will provide a brief description of the search engine landscape – that is, the various players involved and their relation with search engines. Finally, it will concisely show how the centrality of search has led a number of players in the digital economy to adapt their business models to this new reality.


1.3.1. The Centrality of Search

Although dominated by three US-based giants (i.e. Google, Yahoo! and Microsoft), the search engine market is currently extremely active. The search engine space spans all sorts of information. We are currently witnessing the deployment of search engines for health, property, news, job, person, code or patent information. They will increasingly be able to sift through information coming from a wide range of information sources (including emails, blogs, chat boxes, etc.) and devices (desktop, mobile). Search engines are able to return relevant search results according to the user's geographic location or search history. Virtually any type of information, and any type of digital device or platform, may be relevant for search engines. Search has also become the default manner of interacting with the vast amounts of information that are available on the Web. For most users the search box is the entry door into the digital environment. Many queries or intentions in the user's mind, whether navigational, transactional, or informational, take the shape of a few words in the search box. Some commentators therefore consider search functionality to be the core of the emerging application platform. That emerging platform supports server-side, AJAX-based online applications that can run smoothly within a web browser.[1] This centrality is evident from the vast amounts of traffic that flow through the major search engines. Search engines are heavily used intermediaries. The search volume for January 2007 was more than 7.19 billion searches in the USA alone.[2] The volume and the market shares may vary slightly depending on the measurement method used, but the ranking is clear: Google comes out on top, followed by Yahoo!, MSN, AOL and Ask.[3] Web search is thus responsible for most web traffic, and both Google and Yahoo! show two-digit growth rates. 
This growth rate is considerable and explains the high expectations of online advertisement of search engines as a promising growth market.


1.3.2. The Adapting Search Engine Landscape

The search engine landscape consists of three main parts. First, there is a large number of content providers that make their content available for indexing by the search engine's crawlers. Second, there are the advertisers that provide most of the income for the search engine activity. Finally, new players have arisen whose livelihood depends on the business model of search engines.[4] The content providers' market is very dynamic at the moment, with a number of business models competing with one another. While technology gives content providers a number of tools for controlling the accessing, using and sharing of content created or owned by them, the need to use so-called Digital Rights Management (DRM)[5] tools is increasingly questioned, and currently highly contentious.[6] For instance, by January 2007 the last publisher to use DRM for audio CDs stopped doing so because the cost of implementing DRM did not measure up to the results. In the Internet music industry, an increasing amount of music is sold without DRM protection, and major players have called upon the industry to remove DRM protection.[7] Major players are also gradually discovering that giving away content "for free" may spur other types of business model that may turn out to be more profitable on the World Wide Web.[8] A number of major players are arising, for instance, in regard to video sharing, such as YouTube, MySpace, or Joost and NetFlix.[9] In other words, content may well be moving from a closed environment to an open environment in which being available and reachable is of paramount importance: survival in this brave new world depends on being found by (prominent) search engines. An important normative question regarding the relation between search engines and content providers is how far and when content providers may have control over the search engines' basic functions. 
The trend, however, seems to be that a content provider may prevent a search engine from indexing or caching some of the content it provides through the use of standardised robot exclusion protocols. Most major search engines routinely agree to respect such exclusions.[10] Some search engines have decided to go even further. Google recently introduced Sitemaps, a new tool for content providers, which aims to give websites more control over what content they do or don't want included in Google News.[11] The second type of players with which search engines interact on a daily basis are the advertisers. The predominant business model for search is advertising.[12] The leading search engines generate revenue primarily by delivering online advertisements. The importance of advertising for search engines is also evident from their spending. In 2006, Google was planning to spend 70% of its resources on search and advertising related topics.[13] A few years ago, advertising on search engine sites was very much like advertising in analogue media. It consisted mainly of banner advertising,[14] and sometimes paid placement, whereby ads were mixed with organic results.[15] But many users considered these ads too intrusive and not sufficiently targeted or relevant to the search or web site topic, and they did not take advantage of the interactive nature of the Web. Online advertising differs from traditional advertising in that results are easier to trace. Mainstream search engines now mainly rely on two techniques, both of which are advertising business models that rely on actual user behaviour: pay-per-click (the advertiser pays each time the user clicks on the ad) and pay-per-performance (the advertiser pays each time the user purchases or prints or takes any action that shows similar interest).[16] Finally, players that depend on search engines (of which there are many) have been adapting their activities in order to take advantage of the centrality of search. 
Both the content and advertising markets have thus been adapting rapidly to the prominence of search engines. As regards the ranking of content or information by relevance in the organic results, a range of strategies and techniques are being employed to get links from other sites. These are called search engine optimisation (SEO), and aim at raising the relevance of certain content or web site for a given query. They include two broad categories. First there are techniques that search engines recommend as part of good design and that are considered desirable because they increase the efficiency of information retrieval and lower transaction costs. But there are also those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spam-dexing.[17] Of course, it is not always easy to draw a line between accepted and non-accepted optimisation techniques, and it is contentious to what extent search engines should be allowed or expected to fiddle with the results brought up by the sole functioning of the algorithm.[18] The above-depicted advertising techniques have also generated their own type of fraud, referred to as click fraud. Click-fraud refers to the situation in which a competitor to a given advertiser creates a program whereby the ads of the advertiser are clicked repeatedly, thereby artificially inflating the figures and the bill for the advertiser. Another type of click-fraud arises when a player registers as an affiliate and then repeatedly clicks on the ads he himself has served, thereby making profit. Since they are dependent on good content and advertising income that relies on accurate measurement of user behaviour, search engines have an interest in fighting these types of malpractices. Search engines have been engaging in a technological arms race with both content and advertising fraudsters. 
The paradox is thus that, while search engines have an interest in maintaining an image of transparency and objectivity, the algorithms that determine the ranking of both the organic results and the ads are kept behind closed doors. This is one of the recurrent tensions underpinning search engine policy.
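
The robot exclusion mechanism mentioned above can be sketched briefly. A content provider publishes a robots.txt file listing the paths that crawlers should leave alone, and a well-behaved crawler consults it before fetching a page. The example below uses Python's standard urllib.robotparser module; the site name, the crawler name "ExampleBot" and the excluded paths are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file published by a newspaper site:
# all crawlers are asked to stay out of the paid archive.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs that the exclusion rules permit this crawler to fetch."""
    parser = RobotFileParser()
    # Parse the rules from text rather than fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "http://news.example.org/today/story.html",
    "http://news.example.org/archive/2006/story.html",
]
crawlable = allowed_urls(ROBOTS_TXT, "ExampleBot", candidates)
print(crawlable)
```

Compliance is voluntary: the file expresses the content provider's wishes, and it is the crawler that decides to honour them, which is why the text speaks of search engines agreeing to respect such exclusions.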

1.3.3. Extending Beyond Search


Search engines affect the business model of content owners by placing targeted advertising on affiliate sites.[19] Search engines use their unique capacity to link relevant ads with relevant keywords and content. Through affiliate networks, they are seeking to reach out, extending their "tentacles" deep into the fabric of the Internet.[20] In doing so, search engines may affect media players' filtering and accreditation power by taking over some of their editorial functions in terms of relevance, ranking, etc. They may affect newspapers' business model by caching content and thus diminishing their sales of archived content, or by directing traffic round their front page and thus potentially curbing their advertising income. They may affect trademark owners by directing traffic to competitors, depending on their trademark policy. This highlights the power of search engines to determine or affect the business model of the various players with which they interact. Some of the players have been competing head-on with the search engines to keep their share of a given market. There is such a clash between search engines and application providers, as well as between search engines and content providers. At the level of the application layer, we are currently witnessing a high degree of technological convergence: more and more creators of technology integrate search engines in their applications. Apple's Mac OS X treats search as a basic functionality of the operating system. Almost every application now on the Internet includes some sort of search functionality. But search engines also integrate new types of applications in their functionality. For instance, a number of search engines are providing open APIs with the aim of providing the next generation OS or platform with search functionality at its core. At the level of the content layer too, search engines are increasingly starting to compete with classic media services. 
Search engines are now populating the many applications that they themselves provide with appropriate content; good examples are Google News, Google Print and Google Earth. Some video sharing sites are easily confused with search engines, because the major search engines are transforming rapidly into full-scale platforms that also provide content. Moreover, the distinction between a classic search engine that responds to a particular user query from its cache or index, and an aggregation service that provides a collection of information online, is bound to become smaller in the future with the move toward ever more personalized services. The Yahoo Pipes service is one more piece of evidence of this growing trend toward proactive search services that are tailored to the user profile.[21] At the initial stage of the search engines' development, there was a marked difference in approach between portals and search engines. Google portrayed itself as a search engine, while Yahoo! with its directory of information was considered to be more like a portal. Gradually, those two approaches have been converging, with Yahoo! integrating a powerful search engine at its core, and Google providing a flurry of applications around its main search functionality and entering the content provision market. But one of the possible consequences of this initial divide may be the fact that Yahoo! does not object so much to being considered a media player. Google, on the other hand, stresses the fact that it is merely providing a tool that facilitates access to all kinds of digital information. The same tensions are defining the environment within which AV search engines unfold. The same players are competing for a share of this important market. As noted previously, AV content online is rising in importance. Evidence can be garnered from 2006 figures concerning the use of YouTube and MySpace for sharing and downloading videos online. 2006 was a banner year for YouTube. 
The video sharing site launched in February 2005 and had claimed over 40 percent of the online video market share by May 2006. By October 2006, YouTube was logging more than 100 million video downloads per day and by the end of the year had become the sixth most popular site on the Internet.[22] We see traditional content providers such as broadcasters making deals with the online video sharing sites for the provision of their content.[23] It is thus only logical to expect AV search to rise in importance with the explosion of AV content online. According to some analysts, image search, for instance, is the fastest growing search category on the Internet today.[24] This paper argues that legal regulation will help determine the extent to which AV search technology is able to fulfil its promise. The next section will briefly consider some high profile copyright cases that have arisen. It will discuss the positions of content owners and search engines on copyright issues, and provide an initial assessment of the strengths of the arguments on either side.


[1] See Stephen E. Arnold, THE GOOGLE LEGACY. HOW GOOGLE’S INTERNET SEARCH IS TRANSFORMING APPLICATION SOFTWARE, (2005); John Battelle, THE SEARCH: HOW GOOGLE AND ITS RIVALS REWROTE THE RULES OF BUSINESS AND TRANSFORMED OUR CULTURE (2005).
[2] Top Search Providers for January 2007, Nielsen/Netratings 28/02/2007, at http://ww.netratings.com/pr/pr_070228.pdf.
[3] At present, more than 60 search engines are operational, but the bulk of the searches are performed by only a few service providers. According to the consultancy firm Nielsen/Netratings, the first three operators control more than eighty percent of the market. In particular, in January 2007 online searches in the US were executed by Google 49.2%, Yahoo! 23.8%, MSN 9.6%, AOL 6.3%, Ask 2.6% and all others together 8.5%. For the same month, comScore Networks sees Google sites capturing 47.5% of the U.S. search market, Yahoo! 28.1%, Microsoft 10.6%, Ask 5.4% and AOL 4.9% (see http://www.comscore.com/press/release.asp?press=1219). Variations between Nielsen/Netratings, comScore and other rating / traffic measuring service providers are a consequence of the measurement methods. However, the ranking amongst the search engines is stable. Equally interesting is the fact that the number of search queries increases annually by 30%. The major beneficiary is Google, which also increased its market share and saw its profits rocketing in 2006 by 110% to $3.07bn. See http://technology.guardian.co.uk/news/story/0,,2003373,00.html; http://business.timesonline.co.uk/article/0,,9075-2578425,00.html.
[4] Namely, these are, on the one hand, the players that offer a set of services and techniques for content providers to be ranked high in the organic results (search engine optimization), and, on the other hand, the players that fraudulently take advantage of the pay-per-click advertising model to make money.
[5] Digital rights management (DRM) tools is an umbrella term that refers to the collection of technologies used by copyright owners for protecting digital content against unwanted copying. With DRM, clients need to be authenticated to access contents. The authentication process controls the access rights clients have paid for and assures that it is delivered. Through DRM technology it is also possible to choose the level of access to the selected song, i.e. listening to the song only once, permission to save, permission to copy, to use in another media, etc. See http://en.wikipedia.org/wiki/DRM.
[6] Recently, Sony UK and Sony France lost a case against a consumer rights organisation because they did not inform consumers about the lack of interoperability of their products and services with other devices. See http://www.edri.org/edrigram/number5.1/drm_sonyfr (January 17, 2007). See for the judgment of December 15, 2006: http://www.tntlex.com/public/jugement_ufc_sony.pdf. A similar case is on-going against Apple's iPod in France, Germany and Norway; see Associated Press, German, French Consumer Groups Join Nordic-Led Drive Against Apple's iTunes Rules, Technology Review, January 22, 2007, at http://www.technologyreview.com/read_article.aspx?id=18098&ch=biztech; Apple DRM Illegal in Norway: Ombudsman, The Register, January 24, 2007, at http://www.theregister.co.uk/2007/01/24/apple_drm_illegal_in_norway.
[7] See Chris Nuttall, Apple Urges End to Online Copy Protection, Financial Times, February 6, 2007, at http://www.ft.com/cms/s/5469e6ea-b632-11db-9eea-0000779e2340.html.
[8] See Eric Pfanner, Internet Pushes Concept of Free Content, Herald Tribune, January 17, 2007, at http://www.iht.com/articles/2007/01/17/yourmoney/media.php. See also Cory Doctorow, EMI abandons CD DRM, January 8, 2007, at http://www.boingboing.net/2007/01/08/emi_abandons_cd_drm.html
[9] See Brendan Borrell, Joost Another YouTube?, Technology Review, January 29, 2007; at http://www.techreview.com/Biztech/18111/; See The Economist Editorial, The Future of Television – What's On Next, The Economist, February 8, 2007; at http://economist.com/business/displaystory.cfm?story_id=8670279.
[10] See above, in the technology section.
[11] See Aoife White, Court to Hear Google-Newspaper Fight, CBS News, November 23, 2006, at http://www.cbsnews.com/stories/2006/11/23/ap/business/mainD8LITLI00.shtml.
[12] Another source of revenue is selling search functionality to businesses. The revenues from licensing are, however, modest relative to the income from advertising; see http://investor.google.com/releases/2006Q4.html
[13] More specifically, 20% is spent on local search, Google Earth, Gmail, Google Talk, Google Video, Enterprise solutions, Book Search, Adsense, Desktop search and mobile search and the remaining 10% for Orkut, Google Suggest, Google Code, Adsense Offline, Google Movies, Google Readers, Google Pack and Wifi. See Jonathan Rosenberg, Google Analyst Meeting 2006, at http://investor.google.com/pdf/20060302_analyst_day.pdf.
[14] In this technique, the advertiser pays the search engine or platform provider each time the user sees the ad.
[15] The idea is that the bidder who values the high ranking most will pay the price for it, and as a result users will encounter information in an efficient manner.
[16] At present, pay-per-click seems to strike the best balance. On the one hand, the system provides an incentive for search engines or affiliate sites to target ads correctly; on the other hand, those advertisers who value their ranking most will be prepared to pay the most for a given set of keywords, during the online auctions. Most leading search engines provide the ads in a separate column. These ads are generated using similar algorithms as for organic search results. That is, the sponsored ad depends on the user's key words, advertiser's willingness to pay, and the popularity of this ad with other users. This selection process continuously adapts itself according to the circumstances and developments.
[17] Some industry commentators classify these methods, and the practitioners who utilize them, as either "white hat SEO", or "black hat SEO". Black hat SEO includes hiding popular keywords invisibly all over the page, or showing the search engine another page than the one shown to users. Given the importance of link structure, prominent black hat SEO now includes the creation of so-called "linkfarms" with thousands of sites and pages that point to each other, giving the sense of a community of users.
[18] See Rachel Williams, Search engine takes on web bombers, Sydney Morning Herald, January 31, 2007, at http://www.smh.com.au/articles/2007/01/30/1169919369737.html.
[19] Search engines were instrumental in the development of banner and pay-per-click advertising. See John Battelle, THE SEARCH: HOW GOOGLE AND ITS RIVALS REWROTE THE RULES OF BUSINESS AND TRANSFORMED OUR CULTURE (2005).
[20] These "tentacles" taken together create affiliate networks, whereby the advertiser pays for each event, while the middlemen (search engines) and the site on which the ad appears share the revenue. For instance, the largest of the advertising networks are Google's AdWords/AdSense and Yahoo! Search Marketing. Other important ad networks include media companies and technology vendors.
[22] See Clement James, BBC and YouTube discuss content deal, IT News.com.au, January 25, 2007, at http://www.itnews.com.au/newsstory.aspx?Cianid=44892; See also Gates: Internet to revolutionize TV in 5 years, C|NET News, January 27, 2007, http://news.com.com/2100-1041_3-6154009.html.
[23] See Clement James, BBC and YouTube discuss content deal, IT News.com.au, January 25, 2007, at http://www.itnews.com.au/newsstory.aspx?CIaNID=44892&r=hstory. Jane Wardell, BBC signs program deal with YouTube, Associated Press, March 2, 2007, at http://news.findlaw.com/ap/f/66/03-02-2007/a0e30018d027da47.html.
[24] Among search verticals, image search enjoyed the strongest year over year growth in February 2006, increasing 91 percent. See Nielsen/Netratings, March 30, 2006, at http://www.nielsen-netratings.com/pr/pr_060330.pdf.

1.4. Legal aspects


1.4.1. Copyright in the Search Engine Context

Traditional copyright law strikes a delicate balance between an author’s control of original material and society’s interest in the free flow of ideas, information, and commerce. Such a balance is enshrined in the idea/expression dichotomy which states that only particular expressions may be covered by copyright, and not the underlying idea.[1] In US law, the balance is struck through the application of the "fair use" doctrine. This doctrine allows use of copyrighted material without prior permission from the rights holders, under a balancing test.[2] Key criteria determining whether the use is "fair" include questions as to whether it is transformative (i.e. used for a work that does not compete with the work that is copied), whether it is for commercial purposes (i.e. for profit), whether the amount copied is substantial, and whether the specific use of the work has significantly harmed the copyright owner's market or might harm the potential market of the original. This balancing exercise may be applied to any use of a work, including the use by search engines. By contrast, there is no such broad catch-all provision in the EU. The exceptions and limitations are specifically listed in the various implementing EU legislations. They only apply provided that they do not conflict with the normal exploitation of the work, and do not unreasonably prejudice the legitimate interests of the right-holder.[3] Specific exemptions may be in place for libraries, news reporting, quotation, or educational purposes, depending on the EU Member State. At the moment, there are no specific provisions for search engines, and there is some debate as to whether the list provided in the EU copyright directive is exhaustive or open-ended.[4] In view of this uncertainty, it is worth analysing specific copyright issues at each stage of the search engines' working. 
The last few years have seen a rising number of copyright cases, where leading search engines have been in dispute with major content providers. Google was sued by the US Authors' Guild for copyright infringement in relation to its book scanning project. Agence France Presse filed a suit against Google's News service in March 2005. In February 2006, the Copiepresse association (representing French and German-language newspapers in Belgium) filed a similar law suit against Google News Belgium. As search engines' interests conflict with those of copyright holders, copyright law potentially constrains search engines in two respects. First, at the information gathering stage, the act of indexing or caching may, in itself, be considered to infringe the right of reproduction, i.e. the content owners' exclusive right "to authorise or prohibit direct or indirect, temporary or permanent reproduction by any means and in any form, in whole or in part" of their works.[5] Second, at the information provision stage, some search engine practices may be considered to be in breach of the right of communication to the public, that is, the content owners' exclusive right to authorise or prohibit any communication to the public of the originals and copies of their works. This includes making their works available to the public in such a way that members of the public may access them from a place and at a time individually chosen by them.[6]

1.4.1.1. Right of reproduction
1.4.1.1.1 Indexing
Indexing renders a page or content searchable, but the index itself is not a reproduction in the strict sense of the word. However, the search engine's spidering process requires at least one initial reproduction of the content in order to be able to index the information. The question therefore arises whether the act of making that initial copy constitutes, in itself, a copyright infringement. Copyright holders may argue that this initial copy infringes the law if it is not authorized. However, the initial copy is necessary in order to index the content. Without indexing the content, no search results can be returned to the user. Hence it appears search engine operators have a strong legal argument in their favour. The initial copy made by the indexer presents some similarities with the reproduction made in the act of browsing, in the sense that it forms an integral part of the technological process of providing a certain result. In this respect, the EU Copyright Directive states in its preamble that browsing and caching ought to be considered legal exceptions to the reproduction right. The conditions for this provision to apply are, among others, that the provider does not modify the information and that the provider complies with the access conditions.[7] The next section considers these arguments with respect to the search engine's cache copy of content.

1.4.1.1.2 Caching
The legal issues relating to the inclusion of content in search engine caches are amongst the most contentious. Caching is different from indexing, as it allows users to retrieve the actual content directly from the search engines' servers. The first issues in regard to caching relate to the reproduction right. The question arises as to whether the legal provision in the EU Copyright Directive's preamble would really apply to search engines. One problem relates to the ambiguity of the term ‘cache’. The provision was originally intended for Internet Service Providers (ISPs), to speed up the transmission of information. It may give the impression that content is only temporarily stored on an engine's servers for more efficient information transmission. Search engines may argue that the copyright law exception for cache copies also applies to them. Their cache copy makes information accessible even if the original site is down, and it allows users to compare live and cached pages. However, cache copies used by search engines fulfil a slightly different function. They are more permanent than the ones used by ISPs and can, in fact, resemble an archive. Moreover, the cache copy stored by a search engine may not be the latest version of the content in question. In US law, the legal status under copyright law of this initial or intermediate copy is the subject of fierce debate at the moment.[8] For instance, in the on-going litigation against Google Print, publishers are arguing that the actual scanning of copyrighted books without prior permission constitutes a clear copyright infringement.[9] In the EU, however, the most important issue appears to relate to the use of particular content, or whether and how it is communicated to the public. 
In the Copiepresse case, the Court made clear that it is not the initial copy made for the mere purpose of temporarily storing content that is under discussion, but rather the rendering accessible of this cached content to the public at large.[10]

1.4.1.2. Right of communication to the public
1.4.1.2.1 Indexed Information
(i) Text Snippets
It is common practice for search engines to provide short snippets of text from a web page when returning relevant results. The recent Belgian Copiepresse case focused on Google's news aggregation service, which automatically scans online versions of newspapers and extracts snippets of text from each story.[11] Google News then displays these snippets along with links to the full stories on the source site. Copiepresse, an association that represents the leading Belgian newspapers in French and German, considered that this aggregation infringed their copyright. The argument is that their members, the newspapers, have not been asked whether they consent to the inclusion of their materials in the aggregation service offered by the Google News site.[12] Though it is common practice for search engines to provide short snippets of text, this practice had not raised copyright issues before. However, this may be a matter of degree, and the provision of such snippets may become problematic, from a copyright point of view, when they are pro-actively and systematically provided by the search engines. One could argue either way. Search engines may argue that thousands of snippets from thousands of different works should not be considered copyright infringement, because they do not amount to one work. On the other hand, one may argue that, rather than the amount or quantity of information disclosed, it is the quality of the information that matters. Publishers have argued that a snippet can be substantial in nature, especially if it is the title and the first paragraph, and that communicating this snippet to the public may therefore constitute copyright infringement. One might also argue that thousands of snippets amount to substantial copying in the qualitative sense. The legality of this practice has not yet been fully resolved. 
On 28th June 2006, a German publisher dropped its petition for a preliminary injunction against the Google Books Library Project after a regional Hamburg Court had opined that the practice of providing snippets did not infringe German copyright because the snippets were not substantial and original enough to meet the copyright threshold.[13] By contrast, in the above mentioned Copiepresse case, the Belgian court ruled that providing the titles and the first few lines of news articles constituted a breach of the right of communication to the public. In the court's view, some titles of newspaper articles could be sufficiently original to be covered by copyright. Similarly, short snippets of text could be sufficiently original and substantial to meet the 'copyrightability' threshold. The length of the snippets or titles was considered irrelevant in this respect, especially if the first few lines of the article were meant to be sufficiently original to catch the reader's attention. The Belgian court was moreover of the opinion that Google's syndication service did not fall within the scope of exceptions to copyright, since these exceptions have to be narrowly construed. In view of the lack of human intervention and fully automated nature of the news gathering, and the lack of criticism or opinion, this could not be considered news reporting or quotation. Google News' failure to mention the writers' name was also considered in breach of the moral rights of authors. If upheld on appeal, the repercussions of that decision across Europe may be significant. (ii) Image Thumbnails A related issue is whether the provision by search engines of copyrighted pictures in thumbnail format or with lower resolution breaches copyright law. In Arriba Soft v. Kelly,[14] a US court ruled that the use of images as thumbnails constitutes 'fair use' and was consequently not in breach of copyright law. 
Although the thumbnails were used for commercial purposes, this did not amount to copyright infringement because the use of the pictures was considered transformative. This is because Arriba’s use of Kelly’s images in the form of thumbnails did not harm their market or their value. On the contrary, the thumbnails were considered ideal for guiding people to Kelly's work rather than away from it, while the size of the thumbnails makes using them, instead of the original, unattractive. In the Perfect 10 case, the US court first considered that the provision of thumbnails of images was likely to constitute direct copyright infringement. This view was partly based on the fact that the applicant was selling reduced-size images like the thumbnails for use on cell phones.[15] However, in 2007 this ruling was reversed by the Appeals Court, in line with the ruling on the previous Arriba Soft case. The appeals court judges ruled that "Perfect 10 is unlikely to be able to overcome Google's fair use defense."[16] The reason for this ruling is the highly transformative nature of the search engine's use of the works, which outweighed the other factors. There was no evidence of downloading of thumbnail pictures to cell phones, nor of substantial direct commercial advantage gained by search engines from the thumbnails.[17] By contrast, a German Court reached the opposite conclusion on this very issue in 2003. 
It ruled that the provision of thumbnail pictures to illustrate some short news stories on the Google News Germany site did breach German copyright law.[18] The fact that the thumbnail pictures were much smaller than the originals, and had much lower resolution in terms of pixels, which ensured that enlarging the pictures would not give users pictures of similar quality, did not alter these findings.[19] The court was also of the view that the content could have been made accessible to users without showing thumbnails – for instance, by indicating in words that a picture was available. Finally, the retrieving of pictures occurred in a fully automated manner and search engines did not create new original works on the basis of the original picture through some form of human intervention.[20] The German Court stated that it could not translate flexible US fair use doctrine principles and balancing into German law. As German law does not have a fair use-type balancing test, the Court concentrated mainly on whether the works in question were covered or not by copyright.[21] Contrary to text, images are shown in their entirety, and consequently copying images is more likely to reach the substantiality threshold.[22] It may therefore be foreseen that AV search engines are more likely to be in breach of German copyright law than mere text search engines. A related argument focuses on robot exclusion protocols. The question arises as to whether search engines can treat a content owner's failure to use them as tacit consent to the indexing of the content. The court's reaction to these arguments in relation to caching is significant here. These issues are thus considered below.

1.4.1.2.2 Cached Information
The second set of issues relating to the caching of content revolves around the right of communication to the public. When displaying the cache copy, the search engine returns the full page and consequently users may no longer visit the actual web site. This may affect the advertising income of the content provider if, for instance, the advertising is not reproduced on the cache copy. Furthermore, Copiepresse publishers argue that the search engine's cache copy undermines their sales of archived news, which is an important part of their business model. The communication to the public of their content by search engines may thus constitute a breach of copyright law. The arguments have gone either way. Search engines argue that, as with indexing, information on technical standards (e.g. robot exclusion protocols) is publicly available and well known, and that this enables content providers to prevent search engines from caching their content. But one may equally argue the reverse. If search engines are really beneficial for content owners because of the traffic they bring them, then an opt-in approach might also be a workable solution since content owners, who depend on traffic, would quickly opt in. Courts on either side of the Atlantic have reached diametrically opposed conclusions. In the US, courts have decided on an opt-out approach whereby content owners need to tell search engines not to index or cache their content. Failure to do so by a site operator, who knows about these protocols and chooses to ignore them, amounts to granting a license for indexing and caching to the search engines. In Field v Google,[23] a US court held that the user was the infringer, since the search engine remained passive and mainly responded to the user's requests for material. The cache copy itself was not considered to directly infringe the copyright, since the plaintiff knew of, and wanted, his content in the search engine's cache in order to be visible. 
Otherwise, the plaintiff should have taken the necessary steps to remove it from cache. Thus the use of copyrighted materials in this case was permissible under the fair use exception to copyright. In Parker v Google,[24] a US court came to the same conclusion. It found that no direct copyright infringement could be imputed to the search engine, given that the archiving was automated. There was, in other words, no direct intention to infringe. The result has been that, according to US case law, search engines are allowed to cache freely accessible material on the Internet unless the content owners specifically forbid, by code and/or by means of a clear notice on their site, the copying and archiving of their online content.[25] In the EU, by contrast, the trend seems to be towards an opt-in approach whereby content owners are expected to specifically permit the caching or indexing of content over which they hold the copyright. In the Copiepresse case, for instance, the Belgian Court opined that one could not deduce from the absence of robot exclusion files on their sites that content owners agreed to the indexing of their material or to its caching.[26] Search engines should ask permission first. As a result, the provision without prior permission of news articles from the cache constituted copyright infringement.[27]
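
The contrast between the opt-out and opt-in approaches can be illustrated with a minimal sketch. It uses Python's standard `urllib.robotparser` module to read a robot exclusion file; the crawler name `ExampleBot`, the URLs, the `regime` labels and the `has_explicit_permission` flag are hypothetical and serve only to illustrate the default that each approach applies when a site is silent. It is a toy model of the reasoning described above, not a statement of what any court requires.

```python
from urllib.robotparser import RobotFileParser


def robots_allow(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robot exclusion file does not forbid fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_cache(robots_txt: str, user_agent: str, url: str,
              regime: str, has_explicit_permission: bool = False) -> bool:
    """Toy decision rule (an assumption for illustration only).

    "opt-out": silence counts as consent; only an exclusion rule blocks caching.
    "opt-in":  silence counts as refusal; only explicit permission allows caching.
    """
    if regime == "opt-in":
        return has_explicit_permission
    return robots_allow(robots_txt, user_agent, url)


# Hypothetical inputs: a site with no exclusion rules, and one excluding its archive.
NO_RULES = ""
ARCHIVE_EXCLUDED = "User-agent: *\nDisallow: /archive/\n"
ARCHIVE_URL = "https://news.example/archive/2007/story.html"

print(may_cache(NO_RULES, "ExampleBot", ARCHIVE_URL, "opt-out"))
print(may_cache(NO_RULES, "ExampleBot", ARCHIVE_URL, "opt-in"))
print(may_cache(ARCHIVE_EXCLUDED, "ExampleBot", ARCHIVE_URL, "opt-out"))
```

Under the opt-out rule the silent site is cached and the site with an exclusion rule is not; under the opt-in rule the silent site is not cached unless permission has been recorded separately.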


[1] For a more exhaustive analysis of copyright issues, see Boris Rotenberg & Ramón Compañó, Search Engines for Audio-visual Content: Copyright Law & Its Policy Relevance, in Justus Haucap, Peter Curwen & Brigitte Preissl, forthcoming (2008).
[2] A balancing test is any judicial test in which the importance of multiple factors is weighed against one another. Such a test allows a deeper consideration of complex issues.
[3] See Art.5.5, Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, OJ L 167, 22.6.2001.
[4] See IVIR, The Recasting of Copyright & Related Rights for the Knowledge Economy, November 2006, pp.64-65, at www.ivir.nl/publications/other/IViR_Recast_Final_Report_2006.pdf. Note, however, that Recital 32 of the EUCD provides that this list is exhaustive.
[5] See Art.2, Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, OJ L 167, 22.6.2001.
[6] Ibid., Art.3.
[7] See EUCD, supra, Recital 33.
[8] See, for instance, Frank Pasquale, Copyright in an Era of Information Overload: Toward the Privileging of Categorizers, Vanderbilt Law Review, 2007, p.151., at http://ssrn.com/abstract=888410; Emily Anne Proskine, Google Technicolor Dreamcoat: A Copyright Analysis of the Google Book Search Library Project, 21 Berkeley Technology Law Journal (2006), p.213.
[9] Note that this is essentially an information security argument. One of the concerns of the publishers is that, once the entire copy is available on the search engines’ servers, the risk exists that the book become widely available in digital format if the security measures are insufficient.
[10] See Google v. Copiepresse, Brussels Court of First Instance, February 13, 2007, at p.38.
[11] See Google v. Copiepresse, Brussels Court of First Instance, February 13, 2007, at p.36. The Copiepresse Judgment is available at http://www.copiepresse.be/copiepresse_google.pdf. See Thomas Crampton, Google Said to Violate Copyright Laws, The New York Times, February 14, 2007, at http://www.nytimes.com/2007/02/14/business/14google.html?ex=1329109200&en=7c4fe210cddd59dd&ei=5088&partner=rssnyt&emc=rss.
[12] See Latest Developments: Belgian Copyright Group Warns Yahoo, ZDNet News, January 19, 2007, at http://news.zdnet.com/2100-9595_22-6151609.html; Belgian Newspapers To Challenge Yahoo Over Copyright Issues, at http://ecommercetimes.com/story/55249.html. A group representing French- and German-language Belgian newspaper publishers has sent legal warnings to Yahoo about its display of archived news articles, the search company has confirmed. (They complain that the search engine's "cached" links offered free access to archived articles that the papers usually sell on a subscription basis.) See also Yahoo Denies Violating Belgian Copyright Law, Wall Street Journal, January 19, 2007, at http://online.wsj.com/.
[14] See Kelly v. Arriba Soft, 77 F.Supp.2d 1116 (C.D. Cal. 1999). See Gasser, Urs, Regulating Search Engines: Taking Stock and Looking Ahead, 9 Yale Journal of Law & Technology (2006) 124, p.210; at http://ssrn.com/abstract=908996.
[15] The court was of the view that the claim was unlikely to succeed as regards vicarious and contributory copyright infringement. See Perfect 10 v. Google, 78 U.S.P.Q.2d 1072 (C.D. Cal. 2006).
[17] See p. 5782 of the judgment.
[18] See the judgment of the Hamburg regional court, available at http://www.jurpc.de/rechtspr/20040146.htm, in particular on pp.15-16. See on this issue: http://www.linksandlaw.com/news-update16.htm
[19] Ibid., p.14.
[20] Ibid., p.15.
[21] Ibid., p.19.
[22] Ibid., p.16.
[23] See Field v. Google, F.Supp.2d, 77 U.S.P.Q.2d 1738 (D.Nev. 2006); judgment available at http://www.eff.org/IP/blake_v_google/google_nevada_order.pdf
[24] See Parker v. Google, Inc., No. 04 CV 3918 (E.D. Pa. 2006); judgment available at http://www.paed.uscourts.gov/documents/opinions/06D0306P.pdf.
[25] See David Miller, Cache as Cache Can for Google, March 17, 2006, at http://www.internetnews.com/bus-news/article.php/3592251.
[26] See Google v. Copiepresse, Brussels Court of First Instance, February 13, 2007, at p.35; see also the judgment of the Hamburg regional court, at http://www.jurpc.de/rechtspr/20040146.htm, p.20.
[27] See Struan Robertson, Why the Belgian Court Ruled Against Google, OUT-LAW News, February 13, 2007, at http://out-law.com/page-7759.


1.4.2. Trademark Law


1.4.2.1. Early Litigation and Importance of Trademark Law
The issue of search on trademarked terms is one of the most litigated issues in the search engine context.[1] Trademarks are important for search engines. If search engines cannot sell keywords freely, they are not worth their market valuation. If competitors are allowed to buy ad keywords that contain registered trademarked names, then the search engine may be diverting some of the income streams away from the owners of the trademarked words toward their competitors. Trademark law has a lot to say about the actual practices of search engines with regard to advertising. Google decided in 2004 to reverse its policy on trademarks. In the US and Canada it permits advertising bids on trademarked items, but forbids the use of the trademark in the text of the advertising. Outside the US and Canada, it does not permit the use of trademarked items either in the ads or for triggering the ads. Yahoo!, on the other hand, explicitly forbids this in its keyword auctions.[2]

1.4.2.2. Scenarios and Legal Questions
To be sure, we need to distinguish between three situations. There is the obvious case in which trademarked terms are being used by a competitor in the text or content of advertising on a search engine portal. However, the really contentious issues relate to situations in which the trademark remains invisible to Internet users. The trademarked item is part of the algorithm in two distinct situations. First, advertisers may use a registered trademark in the meta-tags of their web site ("meta-tagging scenario"). Search engines rely on keyword and description meta-tags for the selection of relevant results, and the risk exists that they would return a competitor's page among the main results for a user query on that specific trademarked item. The second situation concerns the case in which advertisers bid for a competitor's trademark in advertising auctions of search engines ("search engine auction scenario"). When users type the well-known trademark, the risk then exists that the competitor's advertising message will rank higher than that of the trademark owner. Only the last two situations are considered below. The main focus, however, is on the last issue since this is the only scenario in which search engines may be held liable for (enabling) trademark infringement.[3] This gives rise to three distinct legal questions. (i) Meta-tags Scenario v Auctioning Scenario The first question is thus whether the search engines' trademark practices in relation to advertising can be analogised to the meta-tagging scenario from a legal point of view. The trend in meta-tagging cases is in favour of liability of the web site provider who inserted the trademarked items in the meta-tags. 
Some courts have found that both types of conduct should be considered analogous for the purposes of trademark law, while other courts consider that the meta-tagging scenario gives rise to liability but not the keyword auctioning scenario.[4] It appears that the two situations are not totally analogous for two reasons. First, it is always possible to see the trademarked items in meta-tags, either because they are in the text on the web site, or because they appear in the source code of the web site. By contrast, users cannot see the trademarked items in search engine auctions. Second, in the meta-tagging scenario the link comes up in the organic results, while in the auctioning scenario the results come up in the advertising results. Given that consumers are more likely to expect some connection between the trademarked terms and the organic content or source, than between the trademarked term and the advertising message, one may argue that more caution and consequently stronger trademark protection is warranted in the meta-tags scenario. (ii) Trademark Infringing Use The second question is whether the search engines' keywording practice constitutes "infringing use" in the meaning of trademark law. The trademark use criterion is very complex, since there is more than one way in which one may consider that a trademark has been "used". The European Court of Justice considered that infringing use is a use by a third party that "is liable to affect the functions of the trademark, in particular its essential function of guaranteeing to consumers the origin of the goods."[5] In other words, infringing use refers to the use of a trademark in a way which would take away (some of) the goodwill created by the trademark owner. But of course it is possible to argue either way. One may claim that the concept should be broadly interpreted. 
Given that trademark owners invest huge sums of money in creating goodwill for the brand, and making it unique in the eyes of the consumer, they should also be the one reaping the benefits thereof. Conversely it is obvious that the connection between the trademark holder and the consumer is triggered or created by means of visible information. Therefore, one may equally hold that if advertisers do not display the trademark or information to consumers in any form, they cannot be said to be using the mark. In sum, the understanding of the terms "infringing use" is subject to diverging interpretations. Depending on one's view the scope of the trademark owner's rights may either be broad and relate to a number of uses of the trademark, or may be restricted to control over the purely visible or "informational" use of the mark towards the consumer. (iii) Likelihood of Consumer Confusion The third question is whether the search engines' trademark practices bring with them "likelihood of consumer confusion."[6] For a start, this criterion is not universal. While it is a necessary criterion in US law under section 32(1) of the Lanham Act, there is no such statutory requirement in many EU member States. For instance, in German law the finding of "likelihood of confusion" is presumed in certain cases, such as when an identical mark is used for goods or services that are in the same class as that for which the trademark is registered. However, due to the conceptual difficulty in determining the exact meaning of the above criterion relating to "trademark use", most jurisdictions appear to take the likelihood of consumer confusion into account, either implicitly or explicitly. In order to bring the necessary balancing elements, Efroni advocates greater reliance on the likelihood of confusion test as a presumption indicating trademark use. 
It would then be up to the advertiser to rebut the presumption that its use of the trademark infringes trademark law.[7] This would mean that not every likelihood of confusion is actionable. This approach would also lead to a much more flexible test in which a number of elements can be balanced against one another. Important elements are the interest of having free competition between advertisers, innovation in search engine advertising, the right and benefits of comparative advertising, or the right to freedom of expression in the form of advertising. (iv) Wrongful Advantage As regards the jurisdictions that rely to a large extent on the finding of infringing use, some balancing might be introduced by having regard more closely to the issue of whether search engines gain wrongful advantage from the keyword auctioning business. Obviously search engines can be said to benefit somehow from the goodwill created by the brand owners. However, evidence is needed in each specific case as to whether this advantage may be considered wrongful, so as to avoid ending up with a limitless right for trademark owners. At the same time, this wrongful advantage test may bring the necessary flexibility in the application of trademark law in the search engines context. It is important to bear in mind the fact that search engines bring great benefit to society, and that they rely to a large extent on the advertising business to offer their services from which many parties benefit (users, advertisers, and content providers).


[1] See Judge sides with Google in dispute over keywords, CNET News.com, September 29, 2006; Google loses French Trademark Lawsuit, CNET News.com, June 28, 2006; Google loses trademark dispute in France, CNET News.com, January 20, 2005; Google's ad sales tested in court, CNET News.com, February 13, 2006; Google may be liable for trademark infringement, CNET News.com, August 16, 2005.
[2] Danny Sullivan, Paid Search Ads & Trademarks: A Review of Court Cases, Legal Disputes, & Policies, Search Engine Land, September 3, 2007, at http://searchengineland.com/070903-150021.php
[3] For a good overview, see Eric Goldman, Deregulating relevancy in internet trademark law, 54 Emory Law Journal, 507 (2005); Zohar Efroni, Keywording in Search Engines as Trademark Infringement: Issues Arising from Matim Li v. Crazy Line, Max Planck Working Paper, November 2006.
[4] See for specific case law on US and Germany, Efroni, p.9.
[5] See Arsenal v Matthew Reed, ECJ, C-206/01 (12.11.2002), para.51.
[6] This question is of course irrelevant if the previous question relating to infringing use is answered negatively.
[7] Zohar Efroni, supra, p.17.

1.4.3. Data Protection Law

1.4.3.1. Increasing Data Protection Concerns
On 17th March 2006, Google, the major web search engine, won a partial victory in its legal battle against the US government. In an attempt to enforce the 1998 Child Online Protection Act, the government had asked it to provide one million web addresses or URLs that are accessible through Google, as well as 5,000 users' search queries. In Gonzales v. Google, a California District Court ruled that Google did not have to comply fully with the US government's request. Google need not disclose any search queries, and shall provide no more than 50,000 web addresses.[1] However, it soon appeared that Microsoft, AOL and Yahoo! had handed over such information requested by the government in that specific case,[2] and in the course of this case all search engines publicly admitted massive user data collection. It turns out that all major search engines are able to provide a list of IP addresses with the actual search queries made, and vice versa.[3] Not even five months later, AOL's search engine logs were responsible for yet another round of data protection concerns. There was public outcry when it became known that it had published 21 million search queries, that is, the search histories of more than 650,000 of its users. While AOL's intentions were laudable (namely supporting research in user behaviour), it appeared that making the link between the unique ID supplied for a given user and the real world identity was not all that difficult.[4] Even more recently, the Article 29 Working Party had a public exchange of views with Google about its data retention policies, i.e. the logging of user data for indefinite periods of time. The Working Party questioned the legality of this practice in light of the data protection laws.[5] In July, Google said that it would start deleting identifying information after 18 months. Other operators such as Yahoo! 
and Ask followed suit, the latter even giving its users the option to prevent their data from being stored in the first place.[6] The last news came from Google's side, when it advocated the introduction of global privacy standards based on the APEC privacy framework.[7] These cases and public debates are milestones in raising awareness of the importance of data protection as regards web search. Importantly, these cases highlight a genuine need to better understand and analyse data protection issues. This issue is especially critical in a context of increased personalisation of search engines. Personalisation for the purposes of the present paper is the ability to proactively tailor offers to the tastes of individual users, based upon their personal and preference information. Personalisation is critically dependent on two factors: the search engines' ability to acquire and process user information, and the users' willingness to share information and use personalisation services.[8]

1.4.3.2. Trends Towards Greater Personalisation
At present, search engines differentiate themselves from their competitors mainly thanks to the quality of their crawlers that gather digital information, and the volume and quality of their index, as well as by means of their algorithm which determines the relevance of search hits. One main consequence of search engine personalisation, however, is the enrichment of the latter process of defining relevance by means of a fourth component: a database containing the user profiles. Such a database is necessary for the search engine to effectively personalise the search results, or in order to rank the hits by "personalised relevance". Generally, search engines place a cookie on the user's computer during the user's first visit to the search engine site. That cookie bears a unique identifier or serial number, and is linked to the use of that browser on that particular computer. From that moment, every query made on the search engine using that particular browser software will be recorded, together with the Internet address, the browser language, and the time and date of the query. To be sure, personalisation makes sense both from a technological and economic viewpoint. There is a genuine need for user-side information. User information may be used for internal tracking, for improving the search engine's response to user queries, and for preventing click-fraud. Likewise, the emerging audio-visual or multimedia search applications hinge very much on user information, given the difficulties encountered in accurately carrying out pattern recognition. More personalised search also benefits the end-user. It helps the user remember searches made in the past. It may moreover be necessary in a context of proliferation of data. Search engines seek to cope with the explosion of data, formats and content diversity. 
Many searches are actually undertaken with some kind of answer in mind, and there is currently an imbalance between the answer we search for, and getting a list of thousands of documents. As it is unlikely that a two or three word query can unambiguously describe a user's informational goal, and as users tend to view only the first page of results,[9] personalising may be one way to provide the end-user with more relevant hits.[10] The commercial interest in having more personalised search is equally beyond doubt. Better profiling would bring the search engine operators greater advertising revenue, as it would enable the latter to better price-discriminate. Search is a critical commercially relevant behaviour that indicates near-future user action. Rather than buying bluntly against words and context, personalisation would enable advertisers to buy against people and their likely habits. Thus, more and more personal information is gradually being drawn into the search domain. The harvesting of profiles and user information may rely increasingly on client-side applications. Search functionality now extends to desktop and email, files, notes, journals, blogs, music, photographs, etc. Toolbars, for instance, essentially grant the search engines access to users' hard drives every time they launch a search, which is many times a day. In the future, search may then even become "prospective." Search engines would match a user's record against new information passing through their matching engine. In sum, personalisation of search appears to result in huge benefits for both the commercial players and for the end-user. Though the idea to personalise search has been around for some time already (e.g. with Hotbot thinking about it as far back as 1996),[11] technological advances as regards storage, processing power, and artificial intelligence[12] have meant that the drive toward increased personalisation has intensified in recent years. 
There are basically two approaches, which are often combined. The first approach is to let the user define more narrowly the settings of her search engine. This amounts to personalisation of the index or sources from which the search engine will draw results. Examples of this are Rollyo, PSS!, Yahoo! Search Builder, and Google's recently launched Custom Search Engine. In short, this approach allows you to name an engine, include search terms, and web sites you want it to search. It can be very narrow or broad. This can be shared or strictly private. You can invite others to help or just accept volunteers who learn about the search engine.[13] Personalisation is not restricted to the individual users. Thus, Eurekster's social search engine is an example of personalisation of the results ranking according to both the interests and behaviour record of a community of users. The idea is that, in line with the logic of web 2.0, users would tag their search results, make notes, and share these with other users, thus mapping the Web. The second approach, which appears to be more fruitful given that most users will not take the time to set up their customised search engines, is to automatically re-rank results provided by search engines, or to show different users different results based on their past behaviour. The most prominent example of this approach is currently A9, an Amazon service which uses the Google index. Other examples are Google's Personalized Search, or Findory, which uses fine-grained information about individual pages the user viewed. Some have argued that current efforts toward personalisation of search are the wrong way to go, on the ground that people have changing interests, and because you cannot read the mind of a user by means of a few keywords entered in a search box.[14] However, the fact is that efforts toward personalisation are currently being undertaken. 
There is moreover little doubt that the current trend of gathering a maximum amount of user data will continue. Given dramatic increases in processing power and storage capacity, there is no reason to believe that major players in the search engine market will not log all the personal information. Given that advertising is the biggest income source for many if not most search engines, and given that advertisers seem to appreciate the trend toward personalisation,[15] it makes sense to forecast an increasing reliance on personalised search. Information will thus be logged by search engines unless society makes a deliberate, concerted effort to prevent this. It is consequently necessary to understand why and how this logging activity may need to be halted.
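
The logging mechanism described in this section (a cookie bearing a unique identifier, with each query recorded alongside the Internet address, browser language, and time and date) can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any particular search engine stores its logs: the class and field names are hypothetical, and the 18-month retention period is approximated as 540 days.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

# Assumption: "18 months" approximated as 540 days for illustration.
RETENTION = timedelta(days=540)


@dataclass(frozen=True)
class LogEntry:
    cookie_id: str          # unique identifier stored in the browser cookie
    ip_address: str
    browser_language: str
    timestamp: datetime
    query: str


class QueryLog:
    """Hypothetical query log keyed by cookie identifier."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def issue_cookie(self) -> str:
        """Generate the identifier set on a browser's first visit."""
        return uuid.uuid4().hex

    def record(self, cookie_id: str, ip_address: str, browser_language: str,
               query: str, when: datetime) -> None:
        self._entries.append(
            LogEntry(cookie_id, ip_address, browser_language, when, query))

    def profile(self, cookie_id: str) -> list[str]:
        """All queries still linked to one cookie: the raw material of a profile."""
        return [e.query for e in self._entries if e.cookie_id == cookie_id]

    def strip_identifiers(self, now: datetime,
                          max_age: timedelta = RETENTION) -> None:
        """Blank the cookie ID and IP address of entries older than max_age."""
        cutoff = now - max_age
        self._entries = [
            replace(e, cookie_id="", ip_address="") if e.timestamp < cutoff else e
            for e in self._entries
        ]
```

In this sketch the profile of a browser is simply every query sharing its cookie identifier, and stripping identifiers from old entries is what breaks that link; the query text itself is retained, which, as the AOL episode suggests, may still allow re-identification.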

1.4.3.3. Data Protection Implications
Search engines conjure up the image of people being able to gain knowledge about other people's private lives using search engines.[16] This paper considers an arguably more important privacy debate. Namely, it questions whether the various search engines' logging activities are in line with EU data protection laws, and highlights the importance of this debate for media pluralism. As a starting point, it is important to bear in mind that responses to data profiling by search engines may take many forms: law, technology, social norms and market. One example of a technological response to surveillance by search engines is TrackMeNot, a tool which produces a lot of 'noise' and obfuscates the actual web searches in a cloud of false leads.[17] Another example is Tor, a technology that allows users to mask their IP address by means of a proxy server.[18] Search engine logging raises two related types of legal regulatory issues. The first type concerns privacy of communications, which covers the security and privacy of emails, and other forms of digital communication. Directive 2002/58/EC provides certain privacy protections for data gathered in the course of communications using publicly available electronic communications networks and services. In particular, recital 25 of the preamble states that cookies are legitimate provided that the users are given adequate information, and have the ability to refuse the cookie. This Directive is not particularly compelling for search engines, and is not dealt with any further here. The second type relates to information privacy, or the actual collection and handling of personal data. In this respect, the EU Data Protection Directive (Directive 95/46/EC) defines personal data as any information relating to an identified or identifiable natural person. An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number (Art.2(a)). 
The first question that arises is thus whether the data recorded by search engines constitute personal data within the meaning of EU data protection legislation. Some of the queries made by users may contain the name, telephone number or address of a given person. For instance, an increasing number of users try to see what information is available about themselves by typing their own name into the search box (vanity searches). Though all of the policies regarding users' search histories give clear indications as to how one may delete one's search history, it is not at all clear whether the information is also wiped out completely on the search engine's side.[19] Some have argued that none of the information thus recorded by search engines appears to constitute, in itself, personally identifiable information. This is because it is not possible to ascertain with a high degree of certainty who actually made the searches. Indeed, someone else might have typed your personal information into the search box, or two people might use the same browser on the same computer. Likewise, the information recorded in the digital dossier or profile will be (at best) a patchy overview of someone's life, given that the person may be using different browser software and/or search engines. In two recent court cases in France, users' IP addresses were considered not to be personally identifiable information in the sense of existing data protection legislation.[20] At the same time, the leading view across Europe [...]. In addition, it is important to note that there may sometimes be ways to link search query information to a particular person's computer by comparing the records of the search engine company with the logs of the Internet Service Provider (ISP).
All major search engines are currently encouraging users to proactively help them build the database, and they are providing other online applications and services. There is little doubt, for instance, that Google may have a reasonably good sense of a user's real world identity if that person is logged in to one of the Google applications – say, Gmail – and is simultaneously conducting search queries on the Google search engine.[21] Furthermore, the AOL case gives us a good idea of the ease with which it is possible to ascertain the real identity behind a list of search queries tied to unique ID numbers. In these circumstances, all of the above-mentioned data protection obligations would fall on the search engine operators. Finally, it is increasingly recognized that, contrary to popular belief, it is not the principle of secrecy which lies at the centre of data protection but the principle of autonomy. Data protection includes not only the right to keep personal matters out of the public eye, but also and foremost the right to be left alone, to be free from intrusion, to have some degree of autonomy over one's acts. Data and information regarding one's past activities are an important element in this debate. Data protection means that I need to have some degree of control, or autonomy, over the way my personal data are being processed. In this view, it is not so important whether one knows the real world identity of the user who entered the search terms, or whether the information can be linked to a particular real world identity.[22] Surveillance by market players is intended to induce users into (rather than deter them from) buying behaviour, but it is no less invasive of our autonomy than government control that seeks to prevent users from certain behaviour. The fact that we are often watched by machines, which seem less invasive from a secrecy point of view, does not make this less problematic from a data protection point of view.
While secrecy and autonomy were in many ways one and the same concept in physical space, this is not true in the digital environment, where my personal data may well remain secret to the search engines, which may nonetheless severely affect my autonomy. In other words, it appears increasingly clear that search engines ought to comply with various provisions enshrined in the national laws implementing the Data Protection Directive. Specifically, personal data should be processed fairly and lawfully (Art.6(1)(a)), and they are to be collected for specified and legitimate purposes (Art.6(1)(b)). In addition, the data need to be relevant, and not excessive in relation to the purpose for which they have been collected (Art.6(1)(c), Arts.7-8). Finally, the data need to be kept accurate and up-to-date, when necessary with the help of data subjects (Art.6(1)(d)), ought to be stored no longer than necessary for attainment of the objective (Art.6(1)(e)), and may be disclosed only with the consent of the data subject (Art.7(a)).[23]


[1] Broache, A.: Google Wins Porn Probe Fight. In: CNET News, March 20, 2006; available at http://news.zdnet.co.uk/internet/0,1000000097,39258371,00.htm
[2] Hampton, M.: Google in bed with US intelligence, February 22, 2006, available at http://www.homelandstupidity.us
[3] Sullivan, D., Which Search Engines Log IP Addresses & Cookies – And Why Care?, 2006; available at http://blog.searchenginewatch.com/blog/060206-150030
[4] McCullagh, D.: AOL's disturbing glimpse into users' lives. In: CNET News, August 7, 2006, available at http://news.com.com/2100-1030_3-6103098.html; Barbaro, M., T. Zeller: A Face is Exposed. In: New York Times, August 9, 2006; available at http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090
[6] See Kevin Allison, Seeking the Key to Web Privacy, Financial Times, September 23, 2007.
[8] Chellappa, R.K.; R.S. Sin: Personalization versus Privacy: An Empirical Examination of the Online Consumer Dilemma. In: Information Technology and Management, Vol. 6, 2005, pp.181-202.
[9] Machill, M.; C. Neuberger, W. Schweiger, W. Wirth: Navigating the Internet: A Study of German-Language Search Engines. In: European Journal of Communication, Vol 19, Nr. 3, 2004, p.325.
[10] Teevan, J.; S.T. Dumais, E. Horvitz: Beyond the Commons: Investigating the Value of Personalizing Web Search. In: Workshop on New Technologies for Personalized Information Access, 2005; available at http://haystack.lcs.mit.edu/papers/teevan.pia2005.pdf
[11] Gasser, U.: Regulating Search Engines: Taking Stock and Looking Ahead. In: Yale Journal of Law and Technology, Vol.9, 2006, pp.204; available at http://ssrn.com/abstract=908996
[12] Olsen, S.: Spying an Intelligent Search Engine. In: CNET News, August 18, 2006; available at http://news.com.com/Spying+an+intelligent+search+engine/2100-1032_3-6107048.html
[13] Sherman, C.: Google Launches Custom Search Engine Service, October 24, 2006; available at http://searchenginewatch.com/showPage.html?page=3623765; Hafner, K.: Google Customizes Search Tool to Cut through Web Noise. In: The New York Times, October 24, 2006; available at http://www.iht.com/articles/2006/10/24/business/google.php; Bradley, P.: Your Search, Your Way, September 19, 2006, available at http://searchenginewatch.com/showPage.html?page=3623434
[14] Valdes-Perez, R.: Why Search Personalisation is a Dead End, 2006; available at http://vivisimo.com/docs/personalization.pdf
[15] Odlyzko, A.: Privacy, Economics, and Price Discrimination on the Internet. In: ACM International Conference Proceeding Series, Vol. 50, 2003; available at http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf
[16] Tavani, H.T.: Search Engines, Personal Information and the Problem of Privacy in Public. In: International Review of Information Ethics, Vol.3, 2005, pp.39-45; available at http://www.i-r-i-e.net/inhalt/003/003_tavani.pdf
[17] See Howe, D.C.; H. Nissenbaum: TrackMeNot, 2006, available at http://mrl.nyu.edu/~dhowe/trackmenot
[18] See Christopher Soghoian, The Problem of Anonymous Vanity Searches, p.5; available at http://ssrn.com/abstract=953673
[19] Sullivan, D., (b) Private Searches Versus Personally Identifiable Searches, 2006; available at http://blog.searchenginewatch.com/blog/060123-074811
[20] See Paris Appeals Court Decision – Anthony v. SCPP (27.04.2007), at http://www.legalis.net/jurisprudence-decision.php3?id_article=1954, and Paris Appeals Court Decision – Henri v. SCPP (15.05.2007) http://www.legalis.net/jurisprudence-decision.php3?id_article=1955, discussed in http://www.edri.org/edrigram/number5.17/ip-personal-data-fr.
[21] Goldberg, M.A.: The Googling of Online Privacy: Gmail, Search-Engine Histories and the New Frontier of Protecting Private Information on the Web. In: Lewis & Clark Law Review, Vol.9, 2005, pp.253
[22] Dan Solove, The Digital Person. Technology and Privacy in the Information Age. New York, NY [NYU Press] 2004
[23] On the other hand, Google's argument to the effect that the two-year period followed from the Data Retention Directive was quickly rebutted by the Art. 29 Working Party, on the ground that the obligation to keep the data for two years applies to providers of public electronic communications networks and services, which search engines are not.

1.5. Policy Issues: Three Key Messages


1.5.1. Increasing Litigation in AV Search Era: Law as a Key Policy Lever

1.5.1.1. Sharp Tensions Surrounding Search Engine Operations
In each of the debates discussed above, it is possible to spot similar trends. The view of content providers, advertisers, and consumer and civil society organisations is straightforward. They argue that search engines are free riding on their creations, their goodwill, or users' data without appropriate remuneration, or without complying with data protection obligations. The content generated by the providers is used by search engines in two distinct ways. First, search engines can become fully-fledged information portals, directly competing with the very content providers that supply their content.[1] Second, search engines use the content providers' creations as the source upon which they base their (sometimes future) advertising income. Therefore, content providers are increasingly unwilling to allow search engines to derive benefits from listing or showing their content without remuneration. Brand owners are of the view that the goodwill they have created may be used by search engines to derive income. Users are increasingly concerned about how the information that is held about them may be used. Search engines have a diametrically opposed view. They emphasise their complementary role as mere conduits in directing web traffic to content providers, money to advertisers, and relevant content to their users. A recent report by the consulting company Hitwise shows that US newspapers' web sites receive nearly 25% of their traffic from search engines.[2] Consequently, the search engines' view is that the relationship is mutually beneficial, in that search engines indirectly pay content providers through the traffic they channel to them, provide advertisers with a unique platform for increasing brand awareness and commercial sales, and bring the most relevant content to users for free.

1.5.1.2. Unclear Legal Status
Search engines are gradually emerging as key intermediaries in the digital world, but it is no easy task to determine whether their operations, which are to a large extent automated, constitute copyright, trademark or data protection infringements. Due to their inherent modus operandi, search engines are pushing the boundaries of existing law. Issues are arising which demand a reassessment of some of the fundamentals of law. With regard to copyright law, search engines raise a flurry of novel questions: does scanning books constitute an infringement of copyright if those materials were scanned with the sole aim of making them searchable? When do text snippets become substantial enough to infringe copyright if they are reproduced without the content owners' prior permission? With regard to trademark law, it is unclear whether the use of trademarked terms to trigger ads constitutes "use of a trademark" in the sense of the law, or whether consumers are likely to be confused. With respect to data protection law, it is clear that user data are of fundamental importance in the development of improved search engines, but the question arises to what extent the data gathered by search engines constitute personal information in the sense of data protection law, and what may be the most appropriate means of balancing the various interests involved.

1.5.1.3. The Role of Technology & Market Transactions
Automation is inherent to the Internet's functioning: the question thus arises whether permission and agreement should equally be automated, or governed by technological standards. A good example comes from the copyright debate. In that context, search engines argue that if content providers prefer not to be included in the index or cache, they simply have to include robot exclusion protocols in their website. Asking each content provider for prior permission would be unfeasible in practice. Content providers, on the other hand, argue that not including robot exclusion protocols in their websites cannot be considered an implicit permission to use their content, since robot exclusion protocols cannot be regarded as law. There is currently no law in force stating that the non-use of robot exclusion protocols is equal to implicitly accepting indexing and caching. On the one hand, developments which aim to increase flexibility are welcome, because there is probably no one-size-fits-all solution to the copyright problem. Technology may fill a legal vacuum by allowing parties at distinct levels of the value chain to reach agreement on the use of particular content. This approach has the advantage of being flexible. On the other hand, the question arises as to whether society wants content providers to exert, through technological standards, total control over the use of their content by players such as search engines. Such total control over information could indeed run counter to the aims of copyright law, as it could impede many new forms of creation or use of information. This is a recurrent debate. For example, in the DRM debate, many commentators are skeptical about technology alone being capable of providing the solution. Another regulatory modality is the market, or contractual deals amongst market players. For instance, there have been a number of market deals between major content providers and major search engines.
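The robot exclusion mechanism invoked by search engines in this debate can be illustrated with a short sketch. It uses the `urllib.robotparser` module from the Python standard library; the robots.txt content, the crawler name `ExampleNewsBot` and the domain `publisher.example` are hypothetical and chosen only for illustration. The sketch shows the technical default that content providers contest: a crawler treats any path that is not explicitly disallowed as open to indexing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a news publisher might serve.
# "ExampleNewsBot" is an invented crawler name, not a real one.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /archive/

User-agent: *
Disallow:
"""


def may_index(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://publisher.example"  # hypothetical domain
    # Explicitly excluded path for the named crawler.
    print(may_index(ROBOTS_TXT, "ExampleNewsBot", base + "/archive/2007/story.html"))
    # No rule covers this path, so the crawler treats it as permitted.
    print(may_index(ROBOTS_TXT, "ExampleNewsBot", base + "/today/story.html"))
    # Crawlers not named in the file fall back to the "*" entry.
    print(may_index(ROBOTS_TXT, "OtherBot", base + "/archive/2007/story.html"))
```

Only the first check is refused. Silence in the file is read by the crawler as permission, which is precisely the inference that content providers argue has no basis in law.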
In August 2006, Google signed a licensing agreement with Associated Press. Google also signed agreements with SOFAM, which represents 4,000 photographers in Belgium, and SCAM, an audio-visual content association. Initially, both SOFAM and SCAM were also involved in the Copiepresse litigation. On 3 May 2007, the Belgian newspapers represented by Copiepresse were put back on Google News. Google agreed to use the no-archive tag so that the newspapers' material was not cached. On 6 April 2007, Google and Agence France Presse reached a licensing agreement. Consequently, as regards policy, the question arises as to whether there ought to be any legal intervention at all, since the market may already be sorting out its own problems. A German court supported this view in its decision on thumbnails.[3] As the sector is not yet consolidated and information is scarce, it is currently difficult to judge whether there is a market dysfunction or not. One of the salient facts here is that the exact terms of the deals were not made public, but in each one Google was careful to ensure that the deal was not regarded as a licence for the indexing of content. Google emphasised the fact that each deal will allow new use of the provider's content for a future product.[4] Some commentators see the risk that, while larger corporations may have plenty of bargaining power to make deals with content owners for the organisation of their content, the legal vacuum in copyright law may well erect substantial barriers to entry for smaller players who might want to engage in the organisation and categorisation of content. "In a world in which categorizers need licenses for all the content they sample, only the wealthiest and most established entities will be able to get the permissions necessary to run a categorizing site." [5] The same is true to some extent as regards branding.
Brand owners may reach exclusivity agreements with the biggest and wealthiest search engines, thereby excluding upcoming players in the sector. This may become particularly worrying for emerging players. Concrete examples are emerging methods for categorising and giving relevance to certain content, such as decentralised categorisation through user participation. Although automated, search engines are also dependent on (direct or indirect) user input. The leading search engines observe and rely heavily on user behaviour and categorisation. A famous example is Google's PageRank algorithm for sorting entries by relevance, which ranks URLs according to the link structure of the web, treating each link to a page as an indirect vote for it. There is a multitude of other sites and services emerging, whose main added value is not the creation of content but categorising it. This categorisation may involve communicating to the public content produced by other market players. Examples include shared bookmarks and web pages,[6] tag engines, tagging and searching blogs and RSS feeds,[7] collaborative directories,[8] personalized verticals or collaborative search engines,[9] collaborative harvesters,[10] and social Q&A sites.[11] This emerging market for the user-driven creation of meta-data may be highly creative, but may nonetheless be hampered by an increasing reliance on licensing contracts for the categorisation of content. In other words, law is not the only policy lever. There are other regulatory, technical and economic means of advancing the interests of the European AV content and AV search industry. However, it is clear from the above discussion that these regulatory means are influenced by copyright, trademark and data protection law, which determine the permissible uses of certain content, brand names, or user data by search engines.
Specifically, the law may have an impact on the use of certain technologies and technological standards; and the law may influence the conclusion of agreements between search engines and content providers, advertisers and users.

1.5.1.4. A Matter of Degree
As a result, we note one common pattern across the various bodies of law analysed. Issues relating to trademark law will become more acute in the audiovisual search context, given that the ads that can be served using AV search technology are likely to have a more powerful influence on consumer habits than the presently predominant text-based ads. The more audio-visual (rather than solely text-based) content is put on the Internet, the more we may expect copyright litigation problems to arise with respect to AV search engines. The reason is that premium AV content is generally more costly to produce and commercially more valuable than text-based content. Finally, given that it is already difficult to return pertinent results for text-based content, AV search engines will have to rely even more on user profiling; those user profiles will by the same token enable search engines to target users directly and thereby compete with traditional media and content owners. In short, in comparison with pure text-based search, trademark, copyright and data protection litigation in the AV search environment may be expected to increase. In sum, the analysis highlights two aspects. First, no radically new legal problems are to be expected in the AV search context, as compared to the existing text-based environment. Second, law is a key policy lever in the search engine context, whose importance may moreover be expected to increase as we move on to an AV search environment.


[1] See Google v. Copiepresse, Brussels Court of First Instance, February 13, 2007, at p.22.
[2] See Tameka Kee, Nearly 25% of Newspaper Visits Driven by Search, Online Media Daily, Thursday, May 3, 2007, at http://publications.mediapost.com/index.cfm?fuseaction=Articles.showArticleHomePage&art_aid=59741.
[3] See the judgment of the Hamburg regional court, at http://www.jurpc.de/rechtspr/20040146.htm, p.20.
[4] There is a distinction between the AFP/AP and Copiepresse cases. It is more difficult to remove AFP/AP content from Google News, since hundreds of members are posting these stories on their sites; comparatively, there are far fewer sources of Copiepresse content. In addition, AFP and AP also differ from classic news sites because they get the bulk of their revenue from service fees paid by their subscribers, and derive little direct benefit from traffic from Google.
[5] Frank Pasquale, supra, pp. 180-181.
[6] For instance, Del.icio.us, Shadows, Furl.
[7] For instance, Technorati, Bloglines.
[8] For instance, ODP, Prefound, Zimbio and Wikipedia.
[10] For instance, Digg, Netscape, Reddit and Popurl.
[11] For instance, Yahoo Answers, Answerbag.


1.5.2. Combined Effect of Laws: Need to Determine Default Liability Regime

1.5.2.1. Search Engines as Key Intermediaries
Search engines have become indispensable organisers and categorizers of data. They enable users to filter huge amounts of data and thus play an increasingly pivotal role in the information society. Search engines' main contribution is producing meta-data, for instance when indexing material. The above discussion indicates a number of unresolved issues in applying various laws to search engines. One important issue with respect to AV search engines relates to the copyright status of producers of meta-data, i.e. information (data) about particular information (data).[1]

1.5.2.2. Focusing on Individual Law is Insufficient
This section develops the following two points. First, each of the individual laws affects search engines and other emerging intermediaries in the digital environment. Second, focusing on each law individually may not yield the best result – there is a need to consider the laws together, and their combined effect on the market for those new intermediaries. Let us consider copyright law to make this point. Copyright law originates from the 'analogue era' with rather limited amounts of data. In those times, obtaining prior permission to reproduce materials or to communicate them to the public was still a viable option. Nowadays with huge amounts of data, automation is the only efficient way of enabling creation in the digital era. Automation raises intricate and unforeseen problems for copyright law. In addition, the automatic collection and categorisation of information by search engines and other meta-data producers is all-encompassing. Search engine crawlers collect any information they can find, irrespective of its creative value. They do this in a fully automated manner. The result may eventually be that search engines are forced to comply with the strictest copyright standard, even for less creative content. There are various policy dimensions here: (i) amending the law, and (ii) relying on the market.


1.5.2.2.1 Legal Regulation
Changing (slightly) the focus of EU copyright law could have positive economic effects. Today's main exceptions to copyright law are the right to quotation, review, or the special status granted to libraries. Automatic organization and filtering of data are not the focus of current copyright law. The above view suggests, however, that there is value in an efficient and competitive market for the production of meta-data, as the organisation of information is becoming increasingly critical in environments characterised by data proliferation. Some commentators consider that it would be beneficial to give incentives not only for the creation of end-user information, but also for the creation of meta-data. This could be achieved by including in copyright law a provision that takes into account new methods for categorising content (e.g. the use of snippets of text, thumbnail images, and samples of audiovisual and musical works), some of which might even be treated as additional exceptions or limitations to copyright.[2] Increasing clarity on these practices might ease the entry of smaller players into the emerging market for meta-data. Similar arguments also apply to the cultural or social dimension, where copyright can be regarded as a driver of freedom of expression through the incentives it gives people to express their intellectual work. Again, given today's information overload, categorizers of information are also important from a social point of view. First, the right to freedom of expression includes the right to receive information or ideas.[3] One may argue that, in the presence of vast amounts of data, the right to receive information can only be realised through the organization of information. Second, categorisations, such as the ones provided by search engines, are also expressions of information or ideas. Indeed, the act of giving relevance to or accrediting certain content over other content through, for instance, ranking, is also an expression of opinion.
Third, the creation or expression of new information or ideas is itself dependent on both the finding of available information and the efficient categorisation of existing information or ideas.

EU Copyright Law and the Creation of Meta-Data for AV Search


1.5.2.2.2 Commercial Deals
Content providers and search engines need each other far too much. Search is big business and brings traffic. Content providers have some interest in keeping the search engines working and directing traffic towards their own sites. But search engines are equally useless without available content.[4] The hope of the news providers in the Copiepresse case was that, if enough content and copyright owners object to being indexed without compensation, then search engines will have substantially less content to index, and will be forced to come to the negotiating table. The case has potentially international ramifications. Google faces parallel cases in France and in the US, where Agence France Presse has sued it for copyright infringement in the DC District Court in Washington. The Danish association of newspapers (Danske Dagblades Forening) has delayed the launch of Google News Denmark, arguing that Google will have to make separate agreements with each one of the publishers. The same legal and other negotiation techniques are being employed in relation to the Google Library project regarding the scanning of copyrighted books. Author and publisher organisations in many different countries are suing the search engine. These lawsuits are thus like strong positioning moves, or business negotiations that are going on in court.[5] Newspapers and other content providers want search engines to continue directing traffic, but they also want search engines to pay for the fact that they receive revenues in part thanks to their content. Besides answering in court, search engines have had two types of responses in relation to audio-visual content.
The first move is one of increased (vertical) integration with online platforms for sharing and viewing audio-visual content, such as YouTube or Google Video.[6] It appears that here too platform operators will continue to be at odds with the right-holders until they license the clips.[7] The second avenue is to conclude contractual agreements with content providers. For instance, it appears that Google sometimes agrees to pay for content. Google agreed to pay The Associated Press for stories and photographs, and settled copyright disputes with two groups in Belgium.[8] This strategy appears to bring with it a greater risk of (horizontal) concentration in the search engine sector. At present, it is still easy to switch between providers. Search personalisation has been one strategy of some search engines for tying users to their services. The risk is real that the contractual negotiations on the indexing and caching of copyrighted content may lead to increased barriers to entry.[9] Contractual negotiations are bilateral, and it is not unlikely that an agreement on the part of the search engine to pay for the indexing and caching of valuable content may come together with exclusivity clauses, as is customary in other media segments. A contractual settlement between search engines and content providers may well result in distinctions between the types of content that may be retrieved by the various search engines. In some sense, this may signal a departure from the classic horizontal and open market structure that characterises the Internet, as opposed to the broadcast model. If such were the case, this would add another significant barrier to entry, and new start-ups would be less likely to threaten the incumbents in the search engine sphere. In sum, the copyright regime has a hard task taking into account the search engines’ unique role in making information accessible.
This might be detrimental not only to a flourishing content sector, but also to the development of new search engine technology (intermediaries). The more search engines move toward content aggregation and personalisation, the more likely it is that they will be affected by copyright law. At the same time, the sole application of copyright law in this sphere, and the barriers to entry that may result from contractual negotiations between search engines and content providers, may well require us to consider more closely whether there is a need to introduce some form of media law obligations. This is a debate that may have widespread ramifications and affect the basic foundations of the Internet as a whole. A differentiation among search engines, which are widely believed to be among the key players of today’s Internet, would call into question the basic nature of the Internet as an open, horizontal communications platform.

1.5.2.3. In Search of the Default Liability Regime
Information products and services (i.e. culture) are intrinsically different in nature from—say—beans. A non-functioning media market may have catastrophic effects not only for the media players themselves but for society at large. In Europe and elsewhere, the media and their artefacts are thus recognised as deserving special regulatory attention in the interest of freedom of expression and freedom of information. This regulatory intervention takes the form of media law (or public interest regulation). The broadcasting sector, for instance, is one of the most heavily regulated sectors. Broadcasters are granted revocable conditional licences that are then tied to a set of stringent ownership requirements, media concentration rules, and content regulations. It should be stressed that by and large the great majority of media laws originate in the member States.[10] However, due to the lack of clear metrics for assessing, for instance, media pluralism or impact of certain players on audiences, correcting for perceived market failures is a highly complex exercise. Intervention needs to be carried out with caution. This is especially so in fast-paced technology markets. The important question thus arises to what extent and how traditional media laws are applicable in search engine-related questions. Generally speaking, media law does not talk about search engines; search engines are not in the media law dictionary. Despite their importance in the information society, search engines are systematically left out of sector-specific regulations.[11] This is no different at the European level. For instance, the main media regulatory instrument at the European level is currently the TV Without Frontiers Directive (TVWF).[12] The TVWF Directive explicitly excludes "communication services providing items of information or other messages on individual demand." 
While the TVWF Directive is currently in the process of being amended, the basic scope of the TVWF Directive does not change in relation to search engines. The on-going discussions seem to make clear that search engines that provide links to audiovisual content shall not be considered audiovisual media services in the sense of the Directive on AVMS.[13] Likewise, though search engines are closely related to EPGs for DTV, the Framework Directive only covers "associated facilities" that relate to the provision of DTV or digital radio as narrowly defined in the specific Directives. Search engines are not regulated under communications law either. The EU communications framework provides that it does not regulate services which provide or exercise editorial control over content transmitted over electronic communications networks. In sum, search engines seem to be beyond the scope of European laws relating to media and communications services.[14] This regulatory gap is perhaps the result of the particularly complex nature of search engines, and the question arises to what extent they can be compared to current media players. To most people, search engines appear objective because they are fully automated, give content providers the choice whether to be indexed or not, and merely respond to user queries. Search engines also like to portray themselves as such. For instance, Google stresses its objectivity and lack of bias on its own site when declaring that "our search results are generated completely objectively and are independent of the beliefs and preferences of those who work at Google."[15] This desire for complete impartiality is one of the reasons why search engines are careful when it comes to manual manipulation of, or intervention in, the results.[16] At the same time, search engines have stressed their subjectivity in their relations with web masters in regard to search engine optimisation,[17] or in disputes with content providers.
For instance, search engines have argued in recent lawsuits over ranking that ranking is a subjective statement of opinion about page quality, which falls under the right to freedom of expression.[18] Also, for reasons of search fraud, Google and other search engines cannot be totally passive conduits. They have an interest in preventing fraud, since otherwise they risk that users turn to other search engines that provide better, more relevant search results. This subjectivity is logical: very much like media players, search engines are trying to maximise user satisfaction, and thus they must include some sort of subjectivity. In other words, the difference from classic media players may be a mere matter of degree.[19] Due to the vast amounts of information, automated processes have become commonplace, and direct editorial intervention by humans as regards the results of the algorithmic selection is the exception. In this view, search engines have some degree of subjectivity like other media players, but their editorial choices are enshrined in the actual algorithm. This consideration is especially important at a time when search engines are moving toward content aggregation, proactively pushing content to the end user, and are thus acting in many ways like personalised broadcasters. A number of commentators have been debating whether some form of tailored media regulations ought to be enacted that take into account the specificities of search engines. Search engines are very similar to other media players. In fact, they are taking away large amounts of advertising income from classic media players. In recent years search engines have acquired a prominent role in granting users widespread access to information, and in giving the various advertisers even more “tailored eyeballs” than any broadcaster could offer them. 
Likewise, technologists and ethics scholars have convincingly stressed the fact that technology is not neutral, but that it has values and bias embedded in it.[20] Examples of proposed media law-type measures are increased transparency and various labelling and signalling measures by trusted third parties,[21] or public investment in alternative search engines.[22] At least one other commentator argues, on the one hand, that it is unavoidable that search engines make editorial judgments. But those editorial judgments are both desirable and necessary. This is so because search engines continually fight against spammers and fraudsters. In this view, government regulation will not be any more compelling at deciding which bias, which subjective view, should prevail in the ranking. However, this view rests on two assumptions or dynamics that would curb the bias of search engines. First, the move toward personalisation of search results moots the search engine bias since it breaks the above described snowball effect, and it caters for minority interest. Second, market forces and low switching costs between search engines mean that if a search engine’s bias degrades the relevance of search results, users will use alternative search engines.[23] The question arises whether the move towards AV search engines offers compelling reasons for re-thinking the current situation. One might need to distinguish between audio-visual and other media. Media law history has shown that the degree of media obligations increases as we move from text, to audio, to audio-visual. This may be inferred, first, from the distinct regulatory regimes that apply to radio and television broadcasters. Broadcasting regulation, for instance, is mainly a result of the cogent effects of AV programming on audiences.[24] Second, this has also transpired in some of the case law on media. 
In Jersild, for instance, the European Court of Human Rights accepted that restrictions on the right to freedom of expression may be more stringent in the case of audio-visual (as opposed to print) media when it stated that audio-visual media often have ‘a much more immediate and powerful effect.’[25] In sum, two developments stand out. First, search engines draw so much traffic that search engine web sites have become ideal candidates for advertising. In fact, search engines are key to the pay-per-click business model that is currently dominant. Second, on the basis of their indexing and recording of user queries and profiles, search engines are able to match user interests with the related content that is available on the Internet, and are increasingly converting themselves from mere conduits of information to active information gatherers pushing content to the user. It thus appears that search engines are starting to compete with traditional media players in a number of respects. However, at present few media law obligations are directly applicable to search engines. This is paradoxical, since search engines are central to the new information economy, and as a result the position of search engines in media law is a topic of intense debate. It remains to be seen whether the switch to AV search engines, and the fact that the impact of audio-visual content is considered more cogent than that of text-based information products, will alter the existing equilibrium.


[1] Metadata vary with the type of data and context of use. In a film, for instance, the metadata might include the date and the place the video was taken, the details of the camera setting, the digital rights of songs, the name of the owner, etc. The metadata may be either automatically generated or manually introduced, like the tagging of pictures in online social networks (e.g. Flickr).
[2] See Frank Pasquale, supra, p.179 (referring to Amazon’s “look inside the book” application).
[3] See Art. 10 European Convention on Human Rights.
[4] A related point is of course that powerful search technology also makes it easier for right-holders to identify content and determine whether illegal copies of copyrighted content have been posted online. See Myspace Launches Pilot To Filter Copyright Video Clips, Using System From Audible Magic, Technology Review, February 12, 2007, at http://www.technologyreview.com/read_article.aspx?id=18178&ch=infotech. See Eric Auchard, Google Sees Video Anti-Piracy Tools as Priority, Reuters, February 22, 2007, at http://today.reuters.com/news/articlenews.aspx?type=technologyNews&storyid=2007-02-23T030558Z_01_N21366907_RTRUKOC_0_US-GOOGLE-YOUTUBE.xml. The technology solution is as follows: all major content providers send their content to Audible Magic to be logged into the database. Audible Magic uses “fingerprinting technology” that can recognise content no matter how this content is tampered with. Acoustic fingerprinting technology, for instance, is about creating a unique code from an audio-wave. This is different from other content identification technologies such as hash codes because the fingerprint is not generated from the binary data in the file. As a result, the acoustic fingerprint will be the same, irrespective of whether the file has been compressed, ripped into a different lower quality format, or amended. See http://en.wikipedia.org/wiki/Acoustic_fingerprint.
[5] See for this point Jeffrey Toobin, Google's Moon Shot. The Quest for the Universal Library, The New Yorker, January 29, 2007, at http://www.newyorker.com/fact/content/articles/070205fa_fact_toobin
[6] See Michael Liedtke, Google Video suit could signal YouTube trouble ahead, The Associated Press, November 8, 2006, at http://www.usatoday.com/tech/news/2006-11-08-google-sued_x.htm; Google Faces Legal Challenges Over Video Copyright, Reuters, November 11, 2006, at http://news.com.com/Google+faces+legal+challenges+over+video+copyright/2100-1030_3-6134679.html.
[7] See Jefferson Graham, Google Takes Hits From Youtube's Use Of Video Clips, USA Today, February 13, 2007, at http://www.usatoday.com/tech/news/2007-02-12-google-youtube_x.htm. A French film producer sued Google for copyright infringement. It asked the court to order Google to provide compensation for loss of income. It alleged that Google had not acted as a simple host but as a fully responsible publisher when it made the producer's film available on Google Video. The film was downloaded 43,000 times in a very short period. Astrid Wendlandt & William Emmanuel, French Film Producer Sues Google France, Reuters, November 23, 2006, at http://today.reuters.com.
[8] See for this Aoife White, Court to Hear Google-Newspaper Fight, CBS News, November 23, 2006, at http://www.cbsnews.com/stories/2006/11/23/ap/business/mainD8LITLI00.shtml; and Google Settles Copyright Dispute with 2 Groups in Belgium, International Herald Tribune, November 24, 2006, at http://www.iht.com/articles/2006/11/24/business/google.php.
[9] Some of the other significant barriers to entry in the search engine market are hardware related. Google and its competitors are currently engaged in an arms race toward ever more powerful server capacity. Each of them is believed to have many hundreds of thousands of servers in their server farms or datacentres. This server capacity provides search engines with the capability to speedily answer user queries. Speed is considered a major competitive element for attracting users. The server base may thus be considered a major barrier to entry, as it is unlikely that new entrants could quickly deploy a similar infrastructure. See Elinor Mills, Google Says Speed Is King, C|NET News, November 9, 2006, at http://news.com.com/Google+says+speed+is+king/2100-1032_3-6134247.html.
[10] See, for instance, the debate on the proposed EU Media Pluralism Directive, which was intended to remove barriers to cross-border activities of media players by harmonising the media concentration rules across Europe. In the end, however, the Directive was never proposed, for mainly political reasons: Member States did not want to give up control over their media ownership laws. See G. Doyle, From 'Pluralism' to 'Ownership': Europe's Emergent Policy on Media Concentrations Navigates the Doldrums, Journal of Information, Law and Technology (JILT) (1997), http://elj.warwick.ac.uk/jilt/commsreg/97_3doyl/
[11] See Nico van Eijk, Search engines: Seek and Ye Shall Find? The Position of Search Engines in Law, IRIS plus (Supplement to IRIS - Legal observations of the European Audiovisual Observatory), 2006-2, at www.obs.coe.int/oea_publ/iris/iris_plus/iplus2_2006.pdf.en.
[12] See Directive 89/552/EEC of 3 October 1989 on the Coordination of Certain Provisions laid down by Law, Regulation or Administrative Action in Member States Concerning the Pursuit of Television Broadcasting Activities, O.J. L.298/23 of 17 October 1989.
[14] See Nico van Eijk, supra, p.5.
[16] See Rachel Williams, Search engine takes on web bombers, Sydney Morning Herald, January 31, 2007, at http://www.smh.com.au/articles/2007/01/30/1169919369737.html.
[17] See James Grimmelmann, supra, p.27
[18] See KinderStart v. Google, Case 5:06-cv-02057-JF (N.D. Cal., motion to dismiss granted July 13, 2006). This is reminiscent of the case law that pitted broadcasters against cable operators. Broadcasters argued that they should be granted access to the cable network on grounds of freedom of expression, while cable operators argued that they too enjoyed the right to freedom of expression, which included the right not to broadcast certain views. See Turner Broadcasting System, Inc. v. F.C.C. (93-44), 512 U.S. 622 (1994).
[19] The recent Copiepresse case also evidenced this interesting tension. In the beginning of the judgment, Google argued that Google News was a specialised search engine and not an information portal. As such it did not compete with the newspapers' sites. But when it came to the exceptions, Google argued that its service fell under the fair use exception of news reporting. This tension reflects the problems people have in classifying search engines in the media world.
[20] See Lucas Introna & Helen Nissenbaum: Shaping the Web: Why the Politics of Search Engines Matters, The Information Society, 16(3), 2000, pp. 169-186. See Frank Pasquale, Rankings, Reductionism, and Responsibility, Seton Hall Public Law Research Paper No. 888327, February 25, 2006, at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=888327.
[21] Transparency of ownership and sources is a central value of media regulation, and could become central in relation to search engines too. But this transparency should be adapted to the specificities of search engines. Some have argued that one should open the search algorithms to public scrutiny. But this stands in tension with the idea that algorithmic innovation is ensured through secrecy and trade secret protection.
[22] Media pluralism can be sub-divided into two types. Internal pluralism rules are measures that seek to ensure that each media outlet gives a fair and complete overview of the range of views on a given topic. External pluralism measures, on the other hand, seek to remedy the risk that the media sector be overly concentrated.
[23] See Eric Goldman, Search Engine Bias and the Demise of Search Engine Utopianism, 9 Yale Journal of Law and Technology (2006), pp. 188-200; at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=893892.
[24] Of course, one may argue that the main reason for this stringent regime was spectrum scarcity. But with many voices out there, it is submitted that with the resulting scarcity of attention, audio-visual media may still necessitate more careful consideration as regards regulation. Note that we will soon be witnessing a related move to audio-visual advertising. Google is expected to develop an audio version of AdSense which would allow any podcast producer to include ads in their shows. See Frank Barnako, Google Seen Powering Podcast Ad Growth, February 12, 2007, at http://internet.seekingalpha.com/article/26787.
[25] See Jersild v. Denmark, Judgment of 23 September 1994, A.298, p.23.



1.5.3. EU v. US: Law Impacts Innovation In AV Search

Analysts tend to agree that the search engine market is thriving at present. There are a number of innovation trends that can be spotted.[1] The question thus arises to what extent EU law allows for innovation, or may be seen as hampering it. The paper finds markedly different approaches to search engine regulation across the Atlantic. This is evident in copyright, trademarks and data protection law.
1.5.3.1. Copyright Law
Copyright infringement ultimately depends on the facts. Search engines may retrieve and display picture thumbnails as a result of image search, or they may do so proactively on portal-type sites such as Google news to illustrate the news stories. The copyright analysis might differ depending on particular circumstances. The analysis shows how US courts have tended to be more favourable towards search engine activities in copyright litigation. This can be seen, for instance, in the litigation on caching, the displaying of thumbnails, and the use of standardised robot exclusion protocols. The open-ended 'fair use' provision has enabled US courts to balance the pros and cons of search engine activities case by case. However, the balancing test does not confer much legal certainty. European case law shows that European courts have been rather reluctant to modify their approaches in the wake of fast-paced technological changes in the search engine sector. For instance, they have stuck more to the letter of the law, requiring express prior permission from right-holders for the caching and displaying of text and visual content. This is partly because European copyright laws do not include catch-all fair use provisions. The result is, however, that while US courts have some leeway to adapt copyright to the changing circumstances, the application of copyright law by European Courts is more predictable and confers greater legal certainty. The paper finds, first, that different courts have reached diametrically opposed conclusions on a number of issues. Second, case law appears to indicate that the closer search engines come to behaving like classic media players, the more likely it is that copyright laws will hamper their activities. Likewise, it appears that the current EU copyright laws make it hard for EU courts to account for the specificities and importance of search engines in the information economy (for instance, increased automatisation and data proliferation). 
Comparing EU and US copyright laws in general terms, we can say that EU laws tend to provide a higher degree of legal certainty, but their application to search engines may be considered more rigid. US law, on the other hand, is more flexible but may not confer as much legal certainty. The two approaches are not mutually exclusive, and a key question for policy makers is how to find a balance between conferring rather rigid legal certainty and a forward-looking, more flexible approach in such a fast-paced digital environment.
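The robot exclusion protocols referred to in the litigation discussed above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the rules, the crawler name "ExampleBot" and the example.com addresses are hypothetical, and the sketch uses the parser from Python's standard library rather than the software of any particular search engine. A content provider publishes a small text file listing the paths that crawlers should not fetch, and a well-behaved crawler consults those rules before indexing a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules published by a content provider
# (illustrative only, not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def may_index(robots_txt: str, crawler_name: str, url: str) -> bool:
    """Return True if the exclusion rules allow the named crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler_name, url)


if __name__ == "__main__":
    for url in (
        "http://news.example.com/today/story.html",
        "http://news.example.com/archive/2006/story.html",
    ):
        # "ExampleBot" is a made-up crawler name used for illustration.
        print(url, may_index(ROBOTS_TXT, "ExampleBot", url))
```

The protocol is purely declarative: it signals the content provider's wishes but does not technically prevent access, which is one reason why its legal weight has been a matter for the courts.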
1.5.3.2. Trademark Law
There has been intense litigation on this issue on both sides of the Atlantic. In the beginning it appeared, relying on the holding in Brookfield Communications,[2] that US courts would find that search engines infringed trademark rules when auctioning trademarked words. However, this ruling was about search optimisation, and the use of trademarked items in web site meta-tags. The Court had ruled that capturing initial consumer attention may still be an infringement, even though no action is completed as a result of the confusion.[3] In Playboy,[4] the court applied the Brookfield holding to rule that a clear indication on the banner ad of the actual source and sponsor name eliminated likelihood of initial interest confusion. In Geico,[5] a US court ruled that Geico had not presented sufficient factual evidence corroborating the finding that the sale of trademarked terms to third parties constituted infringement, since the ads themselves did not include the trademarked word and there was no evidence that this activity alone caused confusion. US courts came to similar conclusions in a flurry of other recent cases.[6] In the EU, on the other hand, trademark litigation seems to follow a different course. In France, the number of trademark lawsuits against Google is now more than 40, and most have gone against the U.S. Internet company. A smaller number of cases have been brought in Belgium and Germany.[7] The position of European courts seems to be that the use of trademarked terms in auctions amounts to a trademark infringement.[8] Google France said that since the case began in 2003, it has implemented a policy barring Internet advertisers from buying search listings under trademarks held by others, as well as a ban on advertising for counterfeit products. There is thus a noticeable difference between EU cases, where trademark infringement has been found, and US cases, where search engines seem to be more immune. 
In sum, it appears that jurisdictions that rely on the likelihood of confusion test, of which the US is the best-known example, have inherently more flexibility built in, and consequently more leeway for courts to conduct a balancing test. In this balancing test, courts will be able to introduce important elements into the calculus, such as the interests of competition, advertising innovation, comparative advertising or freedom of expression. In doing so, courts are also able to bear in mind the importance of search engines in the information society (for all stakeholders), and the role of keyword advertising in funding them.


[1] See Nitin Karandikar, Top 17 Search Innovations Outside Of Google, May 7, 2007, http://www.readwriteweb.com/archives/top_17_search_innovations.php; See also Giorgio Soffiato, Le 17 innovazioni che cambieranno i motori di ricerca http://sitiwebmarketing.boraso.com/motori-di-ricerca-search-marketing/le-17-innovazioni-che-cambieranno-i-motori-di-ricerca.html; See also Emre Sokullu and Richard MacManus, Search 2.0 - what's next?, December 13, 2006, http://www.readwriteweb.com/archives/search_20_what_is_next.php; See also Charles Knight, The top 100 alternative search engines, January 29, 2007, http://www.readwriteweb.com/archives/top_100_alternative_search_engines.php (updated May 1, 2007 http://www.readwriteweb.com/archives/top_100_alt_search_engines_april07.php), and The future of search, technology review, July 16, 2007, at http://www.technologyreview.com/Biztech/19050/?a=f.
[2] Brookfield Communications, Inc v. West Coast Entertainment (DC California 1998). See Internet Business Law Services (September 29, 2006), The Initial Interest Confusion – Beginning of Liability for Search Engine Companies – relying on Brookfield Communications, Inc v. West Coast Entertainment (DC California 1998).
[3] The court stated that "To capture initial consumer attention, even though no action is completed as a result of the confusion, may still be an infringement."
[4] Playboy v Netscape (2004), see on this Gasser, supra, p.211.
[6] Check n go v Google, American Blind v Google (2005); Novak v Overture (2004) ; 800-JR-Cigar v Overture (2000); Newborn v Yahoo Inc. (2005); Rescuecom v. Google (trademark infringement dismissed) (September 28, 2006)
[8] See TGI Paris, 12 juillet 2006, GIFAM et autres v. Google France http://www.juriscom.net/jpt/visu.php?ID=848; CA Paris, 28 juin 2006, SARL Google, Sté Google Inc v. SA Louis Vuitton Malletier http://www.juriscom.net/jpt/visu.php?ID=837; Le Meridien Hotels v Google




1.6. Conclusions

1. Search is an advertising-based industry, relying heavily on well-known brands for its income. It should thus come as no surprise that the first series of cases involving search engines related to trademarks, and concerned the relation between advertisers and search engines. By contrast, the first generation of search engines caused relatively few problems in terms of copyright litigation. Search engines merely retrieved text data from the web, and displayed short snippets of text in reply to a specific user query. Over time, however, search engines started organising and giving users access to more economically valuable content, and copyright infringement claims have come to the fore. Data protection concerns have arisen only in recent times, in relation to the recording and processing of user search queries and user profiling activities. Search engines are essential tools in our current information ecosystem. Each of these three debates (copyright, trademarks, data protection) is ultimately about striking the right balance for society in relation to search engines. There is a need, on the one hand, to foster the efficient categorisation and organisation of content by a wide range of players such as search engines, relying on accurate user profiles and funded by advertising. On the other hand, there is equally an interest in incentivising the creation of digital content (copyright), fostering investments in creating goodwill for certain brands (trademarks), and supporting the widespread use of search engine technology (data protection). To be sure, law is only one of several possible regulatory modalities determining whether the most appropriate balance is struck. Other essential elements in this debate are technological standardisation (e.g. robot exclusion protocols, privacy enhancing technologies), and commercial deals between market players. Far from being independent from one another, these regulatory modalities impact each other. 
For instance, copyright law determines the use of robot exclusion protocols. Similarly, the way copyright law is applied may increase or decrease the pressure on search engines to conclude licensing agreements with content owners. However, this paper claims that law is a key policy lever with regard to search engines. The wording of the law, and its application by courts, has a major influence on whether a thriving market will emerge for search engines, including the future AV search engines. Instead of focusing on increased difficulties in applying the law, the shift towards more audio-visual search offers a unique opportunity to rethink trademark law, copyright law and data protection law for the digital environment. This paper argues that the legal problems encountered so far in relation to search engines may be expected to increase as we move into the AV search era. Issues relating to trademark law will become more acute in the audiovisual search context. This is because the ads that can be served using AV search technology are likely to have a more powerful influence on consumer habits than the presently predominant text-based ads. Likewise, the more audio-visual content is put on the Internet, the more we may expect copyright litigation with respect to AV search engines. The reason is that premium AV content is generally more costly to produce, and commercially more valuable, than text-based content. Finally, it is already difficult for text-based search to return pertinent results, and AV search engines will have to rely even more on user profiling; those user profiles will by the same token enable search engines to target users directly and thereby compete with traditional media and content owners. In sum, the analysis highlights that no radically new legal problems are to be expected in the AV search context, as compared to the existing text-based environment. 
However, the degree and amount of litigation may be expected to increase as we move on to an AV search environment.
2. Consequently, the switch to AV search appears to require policy makers to put all of those legal questions into perspective. Trademark law is struggling to come to terms with the use of trademarked terms in the automated ad-triggering mechanisms of the search engine, because the use of the trademarked term takes place in the background, away from the consumer's eyes. With regard to copyright a set of completely new legal issues arises, including those surrounding the caching of content, or the scanning of books with a view to making them searchable. Data protection law has to be re-considered in view of the importance of search engine personalisation in helping users make sense of the vast amounts of information that are available on the Web. Automation and the search engine's unique functionality force us to reconsider the foundations of our current legal regime. Legal issues that could still be left aside in the text search era will now need to be addressed. Over time, we have witnessed a steady transformation of search engines. Storage, bandwidth and processing power have increased dramatically, and automation has become more efficient. Search engines have gradually shifted from a reactive response to the user ('pull') to pro-actively proposing options to the user ('push'). Future search will require increasing organisation and categorisation of all sorts of information, particularly in audio-visual (AV) format. Due to this shift from pure retrievers to categorisers, search engines are in the process of becoming fully-fledged information portals, rivalling traditional media players. As a result, the position of search engines in law goes beyond the individual laws. There is an increasing need to determine exactly which type of intermediaries search engines are considered to be, and as a result which default liability regime search engines should conform to. 
The least intrusive regulation for search engines is the liability regime laid down in the e-commerce directive. More developed liability and obligations for intermediaries exist in varying degrees in communications and media laws. This default regime is not only important as such, but it also influences the position of courts in relation to legal claims regarding, for instance, copyright, trademark, and data protection. If search engines are analogous to media enterprises, then it follows that they may more easily be held liable for copyright, trademarks, and data protection infringements. Determining the specific nature of search engines, and the default liability regime that applies to them, is a prerequisite for a concerted approach across the various other laws that apply to them. Leaving search engines in a legal vacuum may end up hampering the development of a thriving European search engines sector.
3. Implicitly, the above legal analysis forces us to re-think innovation policy in relation to the search engine context. For instance, the paper claims that copyright's main policy relevance lies in its possible effects on the emerging market for meta-data production. A basic goal of copyright law is to incentivise the creation of content. Given the proliferation of digital content, it becomes more difficult to locate specific content. It becomes comparatively more important to promote the development of methods for accurate organising of AV content than to incentivise creation. This is particularly true in the AV search context, where organising AV content for efficient retrieval is a major challenge, and where many players currently compete to provide the leading technology or method for producing accurate meta-data. Strong copyright law will force AV search engines to conclude licensing agreements over the organising of content. It supports technology's role in creating an environment of total control whereby content owners are able to enforce licences over snippets of text, images and the way they are used and categorised. By contrast, a more relaxed application of copyright law might take into account the growing importance of creating a market for AV meta-data production and meta-data technologies in an environment characterised by data proliferation. This approach would give incentives for the creation of content, while allowing the development of technologies for producing meta-data. The analysis suggests that EU and US courts appear to have drawn markedly different conclusions on the same issues as a result of the differences of the respective legal orders. Comparing EU and US copyright law in general terms, we can say that EU copyright law tends to provide a higher degree of legal certainty but its application to search engines may be considered more rigid. US law, on the other hand, is more flexible but may not confer as much legal certainty. 
Similarly, US and EU trademark law may well have yielded somewhat different results so far. This could well be a direct result of the fact that US trademark law, relying to a large extent on the "likelihood of consumer confusion" test, includes more balancing possibilities for courts than trademark law in many EU Member States with its focus on "trademark use". Finally, it appears equally important to consider the possible effect of data protection laws on innovation. The EU has a much more developed data protection regime than the US, which relies mainly on regulation by technology and regulation by contract (privacy terms and conditions). With a number of high profile debates regarding the logging of user data, and ensuing public concern, there can be little doubt about the importance of addressing this issue. However, it is important to bear in mind the need to address these issues with as little impact as possible on critical innovation (such as, for instance, search engine personalisation).



1.7. Future Research

1.7.1. Social Trends

This section will consider the social aspects of AV search, by placing search in context. The backdrop against which AV search engines will need to be developed is one of increasing user participation (coined web 2.0). The section will thus show how social aspects have always been, and are increasingly revolving around user participation. On the one hand, search engines are at the heart of all of the upcoming web 2.0 applications such as wikipedia, Flickr, or YouTube. On the other hand, search engines are fundamentally dependent on humans, from the early stages onward (e.g. Yahoo was initially a human edited directory). The leading search engines currently observe and rely on user behaviour (clicks, popular URLs, and link structure). There is a multitude of sites and services out there that can be said to offer social search. Chris Sherman sorts them into a number of categories: Shared bookmarks and web pages (Del.icio.us, Shadows, Furl); Tag engines, tagging and searching blogs and RSS feeds (Technorati, Bloglines); Collaborative directories (ODP, Prefound, Zimbio and Wikipedia); Personalized verticals or collaborative search engines (Google Custom Search, Eurekster, Rollyo); Collaborative harvesters (Digg, Netscape, Reddit and Popurl); Social Q&A sites (Yahoo Answers, Answerbag). The section will conclude by asking how and whether the current trends may be expected to increase in the AV search era. This section will place the search engines within the wider context of access to information and knowledge. Search engines are key tools that help determine to what extent information is accessible at large. It will consider the recognition of search engines' special status in current regulatory initiatives seeking to foster widespread access to knowledge, and will ponder whether this role may be expected to increase in the switch to AV search. 
At the same time, it should be remembered that AV search may exacerbate the current trend of increasing centralisation of search engines. This poses deep questions of media pluralism, as a few players seem to be becoming the main entry doors, or access points, to the digital world. The section will briefly refer to a number of recent examples, ranging from manipulation of search engines by third parties, to the deliberate intervention of search engines themselves, to cases of censorship. This section concludes by considering whether these issues warrant a more careful approach in the AV era. It revisits the history of media regulation and looks at the distinction between text, audio and video.



1.7.2. Economic Trends

The search engine landscape consists of three main parts. First, there is a large number of content providers that make their content available for indexing by the search engine's crawlers. Second, there are the advertisers that provide most of the income for the search engine activity. Finally, new players have arisen whose livelihood depends on the business model of search engines. This section will provide information on the most important players and will consider their respective interests. The predominant business model for search is currently advertising. The leading search engines generate revenue primarily by delivering online advertisements. The importance of advertising for search engines is also evident from their spending. In 2006, Google was planning to spend 70% of its resources on search and advertising related topics. A few years ago, advertising on search engine sites was very much like advertising in analogue media. This included mainly banner advertising, and sometimes paid placement, whereby ads were mixed with organic results. But many users considered these ads too intrusive, not sufficiently targeted or relevant to the search or web site topic, and not taking advantage of the interactive nature of the Web. Online advertising differs from traditional advertising in that results are easier to trace. Mainstream search engines now mainly rely on two advertising business models based on actual user behaviour: pay-per-click (the advertiser pays each time the user clicks on the ad) and, increasingly, pay-per-performance (the advertiser pays each time the user purchases, prints, or takes any other action that shows similar interest).
This section will place the current leading business model based on text advertising in context, and will ponder to what extent the switch to audio-visual search applications warrants or demands a different approach. Although dominated by three US-based giants (i.e. Google, Yahoo! and Microsoft), the search engine market is currently extremely active. The search engine space spans all sorts of information. We currently witness the deployment of search engines for health, property, news, job, person, code or patent information. They will increasingly be able to sift through information coming from a wide range of information sources (including emails, blogs, chat boxes, etc.) and devices (desktop, mobile). Search engines are able to return relevant search results according to the user's geographic location or search history. Virtually any type of information, and any type of digital device or platform, may be relevant for search engines. Search is thus an increasingly central activity that has become the default manner for many users to interact with the vast amounts of information available on the Web. This section will consider the importance of search, highlight current market trends, and ponder to what extent this is likely to change in an AV search context. The developments will be assessed against the possibility of increased market concentration (including an initial analysis of barriers to entry, switching costs and network effects).


1.7.3. Further Legal Aspects

Depending on the findings of the research on economic and social trends, a number of additional questions come up. These include, in order of priority:

1.7.3.1. Constitutional law
A. Freedom of expression [Art.10 European Convention on Human Rights]
· What is the role of search engines in fostering the right to freedom of expression (right to be included in the index) and access to information (right to have access to a diverse set of information)?
· Do search engines equally have a right to freedom of expression [cf. argument of cable network operators against TV operators]? Does this right to freedom of expression clash with other players' own right to freedom of expression [e.g. to be listed in the organic results, in the advertising results, etc.]?
· What might be considered appropriate restrictions to freedom of expression in the context of search engines? [youth protection, blasphemy, national security and terrorism, racism, violent content, etc.]
· Are restrictions more appropriate in the case of AV search, given that audio-visual content is regarded as more powerful and immediate than text [cf. Jersild case]?

B. Right to respect of Private Life [Art.8 European Convention on Human Rights]
· Search engines enable easy access to details about many persons' private lives, and at the same time search engines record a lot of personal information about their users in the act of searching.
· Is current regulation in line with the constitutional right to privacy (proportionality of means to ends)?
· What are appropriate restrictions to the right to privacy? [cybercrime, etc.]
· Are those restrictions less appropriate in the case of AV search? Should privacy be more protected in the case of AV search given the nature of AV content?

C. Right to Property [Art.1, 1st Protocol, European Convention on Human Rights]
· Is there a constitutional right to intangible property? [see EU Charter of Fundamental Rights]
· Search engines may give access to all types of information that is protected by some form of intellectual property right [EU database directive, copyright], or the way in which search engines work may enable certain players to make profits at the expense of the owner of a certain IPR [trademark].
· Consequently, what are appropriate restrictions to the right to property (fair use, etc.)? Are those restrictions applicable in the case of AV search?

1.7.3.2. Intellectual Property Rights
There appears to be more need to research the potential implications of the sui generis Database Directive (i.e. EU Directive 96/9/EC on the Legal Protection of Databases) on search engines.
· Indexes are in effect huge databases. Can it be argued that indexes fall within the sui generis Database Directive?
· What are the practical consequences of the application of this new form of intellectual property to search engines and AV search engines in particular?
· What does this mean in terms of competition?

1.7.3.3. EU Competition Law
The classic competition law analysis is to be carried out for the existing market of text-based search. The key question relating to AV search is then whether possible dominance in the existing market risks being leveraged into the newly arising AV search market.

A. Single Dominance [Art.82]
· What is the search engine market structure, and what is the relevant market?
· Is dominance likely? [study switching costs, barriers to entry]
· Will the main players in text search leverage market power into AV search? [leveraging]
· Focus on abuse of dominance: essential facilities, bundling, tying, leveraging, etc.

B. Anti-competitive agreements and joint (or collective) dominance [Art.81]
· Is there an oligopoly in the search engine sector? Is there evidence of anti-competitive agreements between market players?
· Is "joint dominance" likely in a fast-paced market characterized by technological innovation?
· What kinds of abuse may exist? Essential facilities?

1.7.3.4. Media & Communications Law

A. E-Commerce Directive
· Does the E-Commerce Directive apply and what does this imply in terms of liability of search engine providers?
· Can we identify self-regulation or co-regulation initiatives and existing codes of conduct?
· What is the relation between the E-Commerce Directive and the TVWF Directive?

B. Media Law
· Can search engines be defined as media in the sense of media law?
· Are search engines biased? If so, is it perceived by users? Are there examples of intentional bias [e.g. BMW, China]?
· Depending on the above, which principles of media law are relevant in relation to text search? [transparency and independence requirements, ownership limits, language and other quotas, etc.]
· Analyse in detail the potential application of the TV without Frontiers Directive to search engines.
· Is there any marked difference for AV search?
· What type of EU intervention is warranted/possible, if any?

C. EU Communications Law
· What is the place of search engines in the regulatory package for electronic communications?
· Can we analogise with existing regulations on APIs and EPGs? Why?
· Should we foresee some form of regulation analogous to existing universal service obligations?
· Is intervention warranted on the basis of Significant Market Power [SMP]?
· What about network neutrality? Should we impose different regulatory conditions on players who are responsible for a lot of network traffic?
· What type of standardization, if any, is legally warranted/undertaken?
· Any difference for AV search?

1.7.3.5. Law of Obligations / Liability Law

A. EU Product Liability
· What kind of tort obligations may be imposed on the search engine operator?
· Can operators be held liable for not filtering out harmful content? For not giving accurate results? Is a notice at the top of the page sufficient?
· Does the EU product liability Directive apply to search engines?

B. Consumer Protection
· Which types of EU consumer protection regulations apply to search engines?
· May search engines be held liable for spyware, malware, or other damaging software that ends up on the user's computer as a result of search engine use?
· Is there a filtering obligation on the search engine operator?

C. Anti-Spam Laws

· Web sites and content providers could use AV content such as images to attract traffic, falsely claiming to be about that content while in fact being about something totally different (cf. the discussion of invisible and incorrect metatags used on web pages to attract traffic).
· Do anti-spam laws foresee spamdexing and other techniques used by content operators to divert traffic this way? If not, should they?


ANNEX to chapter 3: Summary and goals of use cases






Latest page update: made by pointjc, Mar 17 2008, 9:07 AM EDT
