Robots.txt and Copyright: Not Yet
It was only a matter of time until robots minding their own business
got mixed up in copyright battles. Robots are doing the grunt work in
dealing with a network bigger than anything managed by humans before.
They are trudging through the Internet, sorting the Internet, and
archiving the Internet. It was really only a question of what the
facts would look like when a robot stepped on someone’s copyright. In
fact, I remember a poolside conversation about what this lawsuit might
look like. I have expected this lawsuit for years. But when I saw it,
I barely recognized it.
This lawsuit, Healthcare Advocates, Inc. v. Harding, Early, Follmer & Frailey, et al., was filed earlier this month. William Patry does a much better job with the facts of this lawsuit than I could, but here is my description in short. A corporation is suing the Internet Archive (IA), the non-profit that runs the Wayback Machine, for breach of contract, claiming that IA had agreed to block old archived pages once a robots.txt file was posted on the company's website.
I had outlined a series of issues about the legal status of robots, robots.txt, and the DMCA, but this just isn't the case that will decide these issues. At best, this case will show what kind of notice robots.txt provides in a contract.
Notice using robot headers matters because if robots.txt is a good way of giving notice, it could change the fair use inquiry for all kinds of robots. Owners might even be able to contract out of fair use by blocking the robots themselves (the modern 'No Trespass' sign), making IA and Google subject to those limitations with legal force.
Robot Exclusion Headers and protocols are currently voluntary standards, NOT rules, and especially not legal ones. If a robots.txt file is given the power of notice, that turns the standard into a rule, a scary prospect given how many other voluntary technical standards are run, promoted, and designed by third parties. Even the robot standards are not settled. Some commands, such as No-Email-Collection, may or may not even be considered part of the standards.
Healthcare Advocates claims IA had a contract with them about the archived pages, and that the robots.txt file invoked IA's obligation not to display old pages. Maybe I have a contract with you, formed when you visit my webpage, about what kind of robot access you can use (this is still not settled). If I do, I can claim you should have known about generic robot exclusions, or even about a particular robot exclusion command, perhaps No-Email-Collection.
Here the line between rule and standard is fuzzy. No-Email-Collection may be some sort of explicit contract term in a robot-readable header, but perhaps without special contracting (which the creators of this tag always invoke) it should have no force, because it is a novel voluntary standard that very few even know about. Generic robot exclusions have been around a few years, and that is a long time on the Internet, so we view them as settled.
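
To make the rule-versus-standard point concrete, here is a minimal sketch of how a cooperative robot might consult a robots.txt file, using Python's standard urllib.robotparser module. The example.com site, the ExampleBot name, and the syntax of the No-Email-Collection line are all assumptions made up for illustration; I am not claiming this is how IA's or Google's robots actually work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The No-Email-Collection line and its
# syntax are illustrative assumptions, not a documented directive format.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
No-Email-Collection: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the robot has already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The robot asks before fetching; nothing forces it to ask, or to obey.
archive_ok = parser.can_fetch(
    "ExampleBot", "http://example.com/archive/2003/page.html"
)
contact_ok = parser.can_fetch("ExampleBot", "http://example.com/contact.html")

print("archive page allowed:", archive_ok)
print("contact page allowed:", contact_ok)
```

In this sketch the parser honors the generic Disallow line but skips the line it does not recognize without any warning, so the novel command has no effect on what the robot does. And the only thing that makes the robot ask the question at all is its own code.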
Contracting with robots is difficult, and as a society we are going to have to figure out how to do it. Perhaps robots.txt is a good, legally binding way to communicate with robots, or maybe it's just the best we have. Either way, there need to be limits. At a minimum, those limits should rule out novel commands; a robot shouldn't be able to sign something it doesn't understand. In this case, I would have liked to see some actual notice in addition to the robots.txt file, given that Healthcare Advocates knew so much.