Legislative bills have always stirred the curiosity of citizens. Citizens, journalists and researchers alike follow bills on taxes, healthcare, education, and other matters that affect their lives or businesses. The birth of a law should be interesting enough, but it can be tough to follow. Reading bills is time-consuming and tiring because of their length and sheer number. State legislatures consider approximately 70,000 bills each year, so reading them all, or tracing their authors and sources, is quite difficult.
Fortunately, the University of Chicago runs The Eric & Wendy Schmidt Data Science for Social Good Fellowship, a program that gives aspiring data scientists the opportunity to work with governments and nonprofits, learn from experts, and get involved in data science projects. A group of researchers in the program has created a tool meant to solve the time-consuming problem of going over legislative bills. Called the “Legislative Influence Detector” (LID), the tool can search thousands of state bills to find matches between documents. It is essentially an advanced plagiarism detector, but with a different purpose.
“LID helps watchdogs turn a mountain of text into digestible insights about the origin and diffusion of policy ideas and the real influence of various lobbying organizations,” said the researchers. “LID draws on more than 500,000 state bills (collected by the Sunlight Foundation) and 2,400 pieces of model legislation written by lobbyists (collected by us, ALEC Exposed, and other groups), searches for similarities, and flags them for review. LID users can then investigate the matches to look for possible lobbyist and special interest influence.”
The screenshot shows LID at work. On the left-hand side is text from Wisconsin Senate Bill 179 (2015), which bans most abortions past the 19th week of pregnancy. On the right-hand side, LID found and presented SB 179’s highest-ranked match, Louisiana Senate Bill 593 (2012). The highlighting shows that these text sections match each other almost perfectly. Where differences exist, they are usually misspellings like “neurodeveolopmental” or formatting differences like “16”/“sixteen”.
Going through mountains of bills manually is clearly not a time-efficient way to find legislative influence. Other tools can help, but they are not very effective, and most of them cover only certain parts of documents rather than complete bills, which limits the results. LID, by contrast, searches entire documents for matches and, in just a few seconds, displays only results that are bill-related.
Two important works inspired the researchers who created LID: “Tracing the Flow of Policy Ideas in Legislatures: A Text Reuse Approach” by Wilkerson, Smith, and Stramp (2015), and “Capturing Business Power Across the States with Text Reuse” by Hertel-Fernandez and Kashin.
A healthy democracy also requires a well-informed public that can understand legislation and get involved whenever needed. Government transparency is essential to that, and the program developed at the University of Chicago focuses on democratic values.
“Government transparency is key to democracy, as is the public’s ability to understand the true influences at work in the legislative systems charged with their representation and protection. LID shines a light in the dark places of the legislative process, adding transparency and accountability to state government,” concluded the research team.
How does the Legislative Influence Detector work?
Using the tool is as simple as it looks. The user picks a text to check and pastes it into the input box. LID then scans its collection and displays the most similar results, with the matching text highlighted.
Researchers’ note: LID is not yet robust enough to handle significant public traffic. We hope to make the interactive tool available to the public in the coming months. In the meantime, we ran all the documents we have through LID, stored the matches in files, and made them available for download.
We use the Smith-Waterman local-alignment algorithm to find matching text across documents. This algorithm grabs pieces of text from each document and compares each word, adding points for matches and subtracting points for mismatches. Unfortunately, the local-alignment algorithm is too slow for large sets of text, such as ours. It could take the algorithm thousands of years to finish analyzing the legislation. We improved the speed of the analysis by first limiting the number of documents that need to be compared. Elasticsearch, our database of choice for this project, efficiently calculates Lucene scores. When we use LID to search for a document, it quickly compares our document against all others and grabs the 100 most similar documents as measured by their Lucene scores. Then we run the local-alignment algorithm on those 100.
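The two-step approach the researchers describe can be illustrated with a minimal Python sketch. It is not LID's actual code. The shortlist step here ranks documents by simple word overlap (Jaccard similarity) as a stand-in for Elasticsearch's Lucene scores, and the function names, the scoring values (match=2, mismatch=-1, gap=-1) and the sample sentences are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shortlist(query, corpus, k=100):
    """Keep the k documents most similar to the query.

    Stand-in for the Lucene-score step: ranks by Jaccard word overlap,
    which is an assumption for this sketch, not what Elasticsearch computes.
    """
    q = set(tokenize(query))

    def overlap(doc):
        d = set(tokenize(doc))
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(corpus, key=overlap, reverse=True)[:k]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two lists of words.

    Returns (best_score, aligned_slice_of_a, aligned_slice_of_b).
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table: add points for matching words, subtract for
    # mismatches and gaps, and never let a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


if __name__ == "__main__":
    # Hypothetical sample texts, invented for this sketch.
    query = "the unborn child is capable of feeling pain at sixteen weeks"
    corpus = [
        "evidence shows the unborn child is capable of feeling pain at 16 weeks",
        "an act relating to highway funding",
    ]
    for doc in shortlist(query, corpus, k=1):
        best, span_a, span_b = smith_waterman(tokenize(query), tokenize(doc))
        print(best, " ".join(span_a), "|", " ".join(span_b))
```

Because the alignment table grows with the product of the two document lengths, running it against every stored bill is impractical, which is why a cheap ranking step trims the candidates first. A single mismatch such as “sixteen” versus “16” only lowers the score slightly, so the alignment continues through it.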
Even though it is not yet available for public use, this new tool from the researchers in The Eric & Wendy Schmidt Data Science for Social Good Fellowship deserves admiration. The “Legislative Influence Detector” makes an important but time-consuming process simpler and more accurate, bringing us closer to a healthy democracy.