Downloading Legislation

Legislice is a utility for downloading the text of statutes and constitutional provisions, and then creating computable objects representing passages from those provisions. This guide will show how to get started.

Legislice depends on the AuthoritySpoke API as its source of legislative text. Currently the API serves the text of the United States Constitution, plus versions of the United States Code in effect since 2013. Provisions of the United States Code that were repealed prior to 2013 aren’t yet available through the API, and neither are any regulations or any state law.

Using an API token

To get started, make an account on authorityspoke.com. Then go to the User Profile page, where you can find your API token. The token is a 40-character string of random letters and numbers. You’ll be sending this token to AuthoritySpoke to validate each API request, and you should keep it secret as you would a password.

There are several ways for Python to access your API token. One way would be to simply define it as a Python string, like this:

>>> TOKEN = "YOU_COULD_PUT_YOUR_API_TOKEN_HERE"

However, a better option is to make your API token an environment variable, and then use Python to access that variable. Using a Python library called dotenv, you can define an environment variable in a file called .env in the root of your project directory. For instance, the contents of the file .env could look like this:

LEGISLICE_API_TOKEN=YOUR_API_TOKEN_GOES_HERE

By doing this, you can avoid having a copy of your API token in your Python working file or notebook. That makes it easier to avoid accidentally publishing the API token or sharing it with unauthorized people.

Here’s an example of loading an API token from a .env file using dotenv.

>>> import os
>>> from dotenv import load_dotenv
>>> load_dotenv()
True
>>> TOKEN = os.getenv("LEGISLICE_API_TOKEN")

Now you can use the API token to create a Legislice Client object. This object holds your API token, so you can reuse the Client without re-entering your API token repeatedly.

>>> from legislice.download import Client
>>> client = Client(api_token=TOKEN)

Fetching a provision from the API

To download legislation using the Client, we must specify a path to the provision we want, and optionally we can specify the date of the version of the provision we want. If we don’t specify a date, we’ll be given the most recent version of the provision.

The path citation format is based on the section identifiers in the United States Legislative Markup standard, which is a United States government standard used by the Office of the Law Revision Counsel for publishing the United States Code. Similar to a URL path in a web address, the path format is a series of labels connected with forward slashes. The first part identifies the jurisdiction, the second part (if any) identifies the legislative code within that jurisdiction, the third part identifies the next-level division of the code such as a numbered title, and so on.

If we don’t know the right citation for the provision we want, we can sign in to an AuthoritySpoke account and browse the directory of available provisions, where the links to each provision show the correct path for that provision. Or we can browse an HTML version of the API itself. If the error message “Authentication credentials were not provided” appears, that means we aren’t signed in, and we might want to go back to the login page.

The fetch() method makes an API call to AuthoritySpoke, and returns JSON that is been converted to a Python dict. There are fields representing the content of the provision, the start_date when the provision went into effect, and more.

Here’s an example of how to fetch the text of the Fourth Amendment using the Client.

>>> data = client.fetch(query="/us/const/amendment/IV")
>>> data
{'heading': 'AMENDMENT IV.',
'start_date': '1791-12-15',
'node': '/us/const/amendment/IV',
'text_version': {
    'id': 735706,
    'url': 'https://authorityspoke.com/api/v1/textversions/735706/',
    'content': 'The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.'},
'url': 'https://authorityspoke.com/api/v1/us/const/amendment/IV/',
'end_date': None,
'children': [],
'citations': [],
'parent': 'https://authorityspoke.com/api/v1/us/const/amendment/'}

Loading an Enactment object

If all we needed was to get a JSON response from the API, we could have used a more general Python library like requests. Legislice also lets us load the JSON response as a legislice.enactments.Enactment object, which has methods for selecting some but not all of the provision’s text. One way to load an Enactment is with the Client’s read_from_json() method.

>>> fourth_a = client.read_from_json(data)
>>> fourth_a.node
'/us/const/amendment/IV'

Instead of always using fetch() followed by read_from_json(), we can combine the two functions together with read(). In this example, we’ll use read() to load a constitutional amendment that contains subsections, to show that the structure of the amendment is preserved in the resulting Enactment object.

>>> thirteenth_a = client.read(query="/us/const/amendment/XIII")

The string representation of this provision includes both the selected text (which is the full text of the amendment) as well as a citation to the provision with its effective date.

Currently the only supported citation format is the path-style citation used in United States Legislative Markup. Future versions of Legislice may support the ability to convert to traditional statute citation styles.

>>> str(thirteenth_a)
'/us/const/amendment/XIII (1865-12-18)'

The text of the Thirteenth Amendment is all within Section 1 and Section 2 of the amendment. You can use the Enactment.children property to get a list of provisions contained within an Enactment.

>>> len(thirteenth_a.children)
2

Then we can access each child provision as its own Enactment object from the children list. Remember that lists in Python start at index 0, so if we want Section 2, we’ll find it at index 1 of the children list.

>>> str(thirteenth_a.children[1].text)
'Congress shall have power to enforce this article by appropriate legislation.'

Downloading prior versions of an Enactment

The API can be used to access specific provisions deeply nested within the United States Code, and also to access multiple date versions of the same provision. Here’s a subsection of an appropriations statute as of 2015. We can use the end_date attribute to find when this version of the statute was displaced by a new version.

>>> old_grant_objective = client.read(query="/us/usc/t42/s300hh-31/a/1", date="2015-01-01")
>>> old_grant_objective.content
'strengthening epidemiologic capacity to identify and monitor the occurrence of infectious diseases and other conditions of public health importance;'
>>> old_grant_objective.end_date
datetime.date(2019, 7, 5)

And here’s the same provision as of 2020. Its content has changed.

>>> new_grant_objective = client.read(query="/us/usc/t42/s300hh-31/a/1", date="2020-01-01")
>>> new_grant_objective.content
'strengthening epidemiologic capacity to identify and monitor the occurrence of infectious diseases, including mosquito and other vector-borne diseases, and other conditions of public health importance;'

The 2020 version of the statute has None in its end_date field because it’s still in effect.

>>> str(new_grant_objective.end_date)
'None'

Exploring the structure of a legislative code

When we query the API for a provision at a path with less than four parts (e.g., when we query for an entire Title of the United States Code), the response doesn’t include the full text of the provision’s children. Instead, it only contains URLs that link to the child nodes. These URL links might help to automate the process of navigating the API and discovering the provisions we want. Here’s an example that discovers the URLs for the articles of the US Constitution.

>>> articles = client.read(query="/us/const/article")
>>> articles.children
['https://authorityspoke.com/api/v1/us/const/article/I/', 'https://authorityspoke.com/api/v1/us/const/article/II/', 'https://authorityspoke.com/api/v1/us/const/article/III/', 'https://authorityspoke.com/api/v1/us/const/article/IV/', 'https://authorityspoke.com/api/v1/us/const/article/V/', 'https://authorityspoke.com/api/v1/us/const/article/VI/', 'https://authorityspoke.com/api/v1/us/const/article/VII/']

Downloading Enactments from cross-references

If an Enactment loaded from the API references other provisions, it may provide a list of CrossReference objects when we call its cross_references() method. You can pass one of these CrossReference objects to the fetch() or read() method of the download client to get the referenced Enactment.

>>> infringement_provision = client.read("/us/usc/t17/s109/b/4")
>>> str(infringement_provision.text)
'Any person who distributes a phonorecord or a copy of a computer program (including any tape, disk, or other medium embodying such program) in violation of paragraph (1) is an infringer of copyright under section 501 of this title and is subject to the remedies set forth in sections 502, 503, 504, and 505. Such violation shall not be a criminal offense under section 506 or cause such person to be subject to the criminal penalties set forth in section 2319 of title 18.'
>>> len(infringement_provision.cross_references())
2
>>> str(infringement_provision.cross_references()[0])
'CrossReference(target_uri="/us/usc/t17/s501", reference_text="section 501 of this title")'
>>> reference_to_title_18 = infringement_provision.cross_references()[1]
>>> referenced_enactment = client.read(reference_to_title_18)
>>> referenced_enactment.text[:239]
'Any person who violates section 506(a) (relating to criminal offenses) of title 17 shall be punished as provided in subsections (b), (c), and (d) and such penalties shall be in addition to any other provisions of title 17 or any other law.'

An important caveat for this feature is that the return value of the cross_references() method will only be populated with internal links that have been marked up in the United States Legislative Markup XML published by the legislature. Unfortunately, some parts of the United States Code don’t include any link markup when making references to other legislation.

Downloading Enactments from inbound citations

The method in the previous section finds and downloads Enactments cited by a known Enactment. But sometimes we want to discover provisions that cite to a particular provision. These “inbound” citations are not stored on the Python Enactment object. Instead, we have to go back to the download client and make an API request to get them, using the citations_to() method.

In this example, we’ll get all the citations to the provision of the United States Code cited /us/usc/t17/s501 (in other words, Title 17, Section 501). This gives us all known provisions that cite to that node in the document tree, regardless of whether different text has been enacted at that node at different times.

>>> inbound_refs = client.citations_to("/us/usc/t17/s501")
>>> str(inbound_refs[0])
'InboundReference to /us/usc/t17/s501, from (/us/usc/t17/s109/b/4 2013-07-18)'

We can examine one of these InboundReference objects to see the text creating the citation.

>>> inbound_refs[0].content
'Any person who distributes a phonorecord or a copy of a computer program (including any tape, disk, or other medium embodying such program) in violation of paragraph (1) is an infringer of copyright under section 501 of this title and is subject to the remedies set forth in sections 502, 503, 504, and 505. Such violation shall not be a criminal offense under section 506 or cause such person to be subject to the criminal penalties set forth in section 2319 of title 18.'

But an InboundReference doesn’t have all the same information as an Enactment object. Importantly, it doesn’t have the text of any subsections nested inside the cited provision. We can use the download Client again to convert the InboundReference into an Enactment.

>>> citing_enactment = client.read(inbound_refs[0])
>>> citing_enactment.node
'/us/usc/t17/s109/b/4'
>>> citing_enactment.text
'Any person who distributes a phonorecord or a copy of a computer program (including any tape, disk, or other medium embodying such program) in violation of paragraph (1) is an infringer of copyright under section 501 of this title and is subject to the remedies set forth in sections 502, 503, 504, and 505. Such violation shall not be a criminal offense under section 506 or cause such person to be subject to the criminal penalties set forth in section 2319 of title 18.'

This Enactment happens not to have any child nodes nested within it, so its full text is the same as what we saw when we looked at the InboundReference’s content attribute.

>>> citing_enactment.children
[]

Sometimes, an InboundReference has more than one citation and start date. That means that the citing text has been enacted in different places at different times. This can happen because the provisions of a legislative code have been reorganized and renumbered. Here’s an example. We’ll look for citations to Section 1301 of USC Title 2, which is a section containing definitions of terms that will be used throughout the rest of Title 2.

>>> refs_to_definitions = client.citations_to("/us/usc/t2/s1301")
>>> str(refs_to_definitions[0])
'InboundReference to /us/usc/t2/s1301, from (/us/usc/t2/s4579/a/4/A 2018-05-09) and 2 other locations'

The citations_to() method returns a list, and two of the InboundReferences in this list have been enacted in three different locations.

>>> str(refs_to_definitions[0].locations[0])
'(/us/usc/t2/s60c-5/a/2/A 2013-07-18)'

When we pass an InboundReference to read(), the download client makes an Enactment from the most recent location where the citing provision has been enacted.

>>> str(client.read(refs_to_definitions[0]))
'/us/usc/t2/s4579/a/4/A (2018-05-09)'

If we need the Enactment representing the statutory text before it was moved and renumbered, we can pass one of the CitingProvisionLocation objects to the Client instead. Note that the Enactment we get this way has the same content text, but a different citation node, an earlier start date, and an earlier end date.

>>> citing_enactment_before_renumbering = client.read(refs_to_definitions[0].locations[0])
>>> str(citing_enactment_before_renumbering)
'/us/usc/t2/s60c-5/a/2/A (2013-07-18)'
>>> citing_enactment_before_renumbering.end_date
datetime.date(2014, 1, 16)