First installment of our live-blogging. All quotes are paraphrased by me.
James Grimmelmann, Introductory Remarks.
Grimmelmann noted the speakers were even more influential and accomplished than he expected. He suggested search is the harbinger of Internet law and will affect the law of the Internet and of information. Grimmelmann outlined two trends: technological convergence (search engines have few unique functions) and doctrinal convergence (it is unclear which body of law should apply to search engines). Law and technology must deal with a huge number of legal problems and ambiguities, including intellectual property, free speech,
Panel 1: The Search Space
Michael Fischer, CS, Yale University (Moderator)
Robin Sloan, Current TV. Robin says he is a professional blogger and the author of a flash movie called EPIC 2014 (just google 2014, and it’s the first result). It posits a future with a super-Google that encompasses the whole world and all information. In the story, NYT sues the super-Google, Google wins, and Google is able to do anything with information, including mixing and rewriting it. This is a future with no gatekeepers, where information gets out if you are link-worthy, which is already the present. Sloan says librarians, marketers, journalists, lawyers, and others are concerned about how search engines will affect their traditional roles and their lives. Sloan claims these questions are very public and affect society as a whole. He says the public concerns that should be part of the debate include resource pages, such as a Google Vote or Google districting page (like a guide to voter registration and other public resources), and he tied this to ads. He says an afterthought in his paper was the most interesting topic: FOIA. Suppose Google made easy FOIA requests, scanned the responses, and made them available to everyone. Or what if Google just FOIAed everything and then scanned it? What should be online? These are political decisions, not just legal or business decisions.
Stefan Bechtold, Max Planck Institute for Research on Collective Goods, suggests search engines are legally close to P2P, eBay, Friendster, Do-Not-Spam, and Usenet groups. Groups are using copyright, trespass, and contract to restrict competition in the search market. In the EU, the same efforts use the database directive and copyright law. Courts are reluctant to apply these property-like theories. Search engines also cooperate with each other on a limited scale. Search engines’ competition and innovation are creative, with efforts like Google Book Search. Thus, search engines are moving toward becoming a platform that others depend on and want to control. Search will be more difficult to regulate if it becomes decentralized, as P2P did.
Andrei Broder, Yahoo. Broder notes search technology has evolved very quickly (despite consistent external appearances) and that it will continue to evolve. He goes through a quick history of search technology: text data, non-text data (metadata, PageRank), and the developing “need behind the query” movement. He suggests the next innovation will be semantic search. The Yahoo term is FUSE (find, use, share, and expand all human knowledge). This moves from information retrieval to informational supply: there will be no search box, and the information will just come to you as you need it. He compared this to the shift from the old way of navigating with maps to the new way, which includes GPS and on-demand information about nearby destinations.
Daniel Caul, AskJeeves.com. Caul, as in-house counsel, says search is in its infancy. Caul says the Internet is now an entertainment form, a way to “waste time,” and is converging with games, movies, and TV, as in Yahoo’s TiVo tie-in. The future of search will have other media involved, and search engines might even end up being broadcasters or content providers. Personalization of the web creates more information about users and threatens privacy, a threat compounded when the information is shared. The old regulation (in the United States and UK) was that search engines had to pull down child pornography. France, Germany, and China push on different content. In the United States, this is pushing into advertising content, such as gambling or pharmaceutical ads. There has not been much content restriction, but that pressure is building (e.g. pulling down online pharmacy sites).
Stephen Arnold, Arnold Information, claims search is the new information platform. He says IBM made computers into ways of making information and not processing it (old mainframes with cards, white coat). The era of the PC changed into the era of accessing information (search engines), and future applications will be based on access. Google has evolved from search into multimedia because a change has occurred in the culture of information. Arnold is taking notes on a cell phone, which has more applications, like a PC. This kind of access is part of the Google legacy. Google’s advantage is its ability to deliver hybrid applications in any language and to many platforms. (Google won the translation competition at the NSA in Chinese, Arabic, and Korean.) Google is experimenting with “the data center on the truck,” a shipping container with a huge amount of computing power; it builds the storage and can ship it all over the world. Arnold claims Google has a huge cost advantage, which any economic analysis must take into account. Google is expanding into telecom, and is contemplating Googlebucks and Googlemart: international sales with seamless translation of goods and currency.
Ed Felten, Princeton, walks through the process of information production: observation, observation database, analyze/learn, digested observations, serve users. This matches Internet search engines as well as Book Search, specialized sites (baseball stats), or P2P, which is about helping people find what they want. Under this definition, ChoicePoint is a search engine. Search can be internal (eBay) or external (Bidder’s Edge). This debate is also in P2P, e.g. Grokster’s internal search engine and BitTorrent’s lack of a search engine, which turns out to be legally significant. The most difficult step is analyzing and learning, but the legal challenges are at the observation step, e.g. the crawling by Bidder’s Edge.
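To make Felten's five steps concrete, here is a minimal sketch of such a pipeline as a toy keyword index. It is my illustration, not anything shown on the panel: the function names (`observe`, `analyze`, `serve`), the sample pages, and the choice of an inverted index as the "digested observations" are all assumptions.

```python
from collections import defaultdict

def observe(sources):
    """Steps 1-2: collect raw observations into an observation database.

    Here "crawling" is just reading a dict of page_id -> text (hypothetical data).
    """
    return list(sources.items())

def analyze(observation_db):
    """Steps 3-4: analyze the observations and produce digested observations.

    The digest is an inverted index mapping each word to the pages containing it.
    """
    index = defaultdict(set)
    for page_id, text in observation_db:
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def serve(index, query):
    """Step 5: serve users by returning pages that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical pages standing in for whatever is being observed.
pages = {
    "a": "baseball stats for 2005",
    "b": "auction listings for baseball cards",
    "c": "book search results",
}

observation_db = observe(pages)
digested = analyze(observation_db)
print(serve(digested, "baseball"))
```

In this sketch the legally contested part would be `observe` (the crawling), while the technically hard part in a real system would be `analyze`, which is trivial here.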
Q&A
Prof Fischer: What are going to be the changes in quality in information on the Internet? Some pages are misleading and wrong and lack the accreditation of traditional media. Users need a way to accredit. How can that be done? Is that the responsibility of content providers?
Broder: The goal of a search engine is to rank quality (static ranking). There are many mechanical tools (age, maintenance, domain reliability), but the problem goes beyond the power of mechanical tools.
Felten: I’m hopeful. All of us make choices about what we believe, and there are lots of people making judgments about what is correct, and if we can aggregate that data, we will have good data about what is trustworthy. Simple statistics on large bodies of data have more power than one might think.
Broder: Using large amounts of data is valuable until it is subverted by spammers.
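A toy illustration of this exchange (mine, not the panelists'): score each page by the fraction of users who judged it trustworthy, then see what happens when fake accounts pile on. The page names, the user names, and the scoring rule are all hypothetical; real ranking systems are far more involved.

```python
from collections import defaultdict

def trust_scores(judgments):
    """Return, per page, the fraction of judgments that marked it trustworthy.

    judgments is an iterable of (user, page, trusted) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # page -> [trusted votes, all votes]
    for _user, page, trusted in judgments:
        totals[page][0] += int(trusted)
        totals[page][1] += 1
    return {page: pos / n for page, (pos, n) in totals.items()}

# Hypothetical honest judgments from three users.
honest = [
    ("u1", "good.example", True),
    ("u2", "good.example", True),
    ("u3", "good.example", False),
    ("u1", "spam.example", False),
    ("u2", "spam.example", False),
    ("u3", "spam.example", False),
]

# Hypothetical spammer: nine fake accounts all vouching for the spam page.
spam = [(f"bot{i}", "spam.example", True) for i in range(9)]

before = trust_scores(honest)
after = trust_scores(honest + spam)
print(before["spam.example"], after["spam.example"])
```

With only honest votes the spam page scores zero; once the fake accounts vote, it outscores the good page, which is the kind of subversion Broder describes.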
Q: What do you think about Digg.com, which is socially promoting news?
Sloan: Digg is a great example of technology nerds who are not well-served by NYT and have created their own scene.
Q: In all the potential futures of search, no one mentioned VoIP. Will VoIP be an element of search in the future with new legal issues?
Broder: VoIP has been around the corner for 15 years. Speech recognition is hard, and speech is poorly formatted, repeating words, saying um. Most multimedia search uses parallel text. VoIP can have a lot of recorded voice (ignoring privacy). Google won machine translation, but it took four days for a paragraph. The statistics require brute force and a large corpus. If Google can handle the data, it will be able to search voice.
Q (Bill McGeveran): Putting law aside, what do you see as the possibility of push-back from users against personalized search (my TiVo thinks I’m gay)?
Caul: Some people have no problems with the Gmail ads, some are troubled by them, so it’s hard to predict.
Broder: Hard to know what consumers find acceptable. Improving your results by remembering recent searches is probably acceptable, but long-term records might be distressing. The Amazon recommendation system shows you books you would like, and probably uses very sophisticated software, but they offer a very simple explanation that Book A means you want Book B.
Q: What impact will it have when Google indexes all the legal books and competes with Lexis?
Arnold: The issue of scanning books is over. Yahoo jumped in and libraries have committed to scan. The debate is over and the books are already available. The impact is on information producers and the outcome is not certain.
Bechtold: I am skeptical about the battle of commercial databases. They have structured information with rich tools that search engines do not have. Governments are putting statutes and court decisions online, so lawyers can use Google more, which is problematic for quality control and for Lexis.
Q: Competition is driving search engines to keep adding kinds of media (video). Will there be types of information that will lack demand?
Sloan: Who can do a better job of adding buckets that weren’t there before: videos, books…. There is stuff that should be indexed and isn’t on the web, but it lacks ad revenue. I dream search engines will have the idealism of journalists, the double mission of society and economics.
Arnold: One of our best commercial database customers is Yale. The producers are under financial pressure, Lexis is in the hole, so they raised the rates. West will add other features and minimize operational costs. The infrastructure is expensive, and if a commercial database can keep costs low it will survive, but if not (Amazon, Thomson) it will be destroyed. NIH funded Medline because the government bails out failing databases. The technology must have the correct scale. The long-term outlook is not good because the databases have not grown at all.