22 million public comments on proposed federal rules are now openly accessible via the Registry of Open Data on AWS

BETHLEHEM, PA – October 22, 2024 – Today, Moravian University announced that the Mirrulations dataset is now openly accessible on the Amazon Web Services (AWS) cloud.

The Mirrulations project began as a summer research project with Professor Ben Coleman and six undergraduate computer science majors in collaboration with Fred Trotter from CareSet. For the past six years, 81 additional Moravian University students have worked on the project as part of CSCI 334 System Design and Implementation, the senior capstone experience for the computer science major. Pete Lega (’85), Michael Turnbach (’17), and Jason Victor (not affiliated with Moravian) have also advised on the project.

The research community can now access Mirrulations on AWS without paying to store their own copies of the dataset; researchers pay only for the computing services they use. AWS, through its Open Data Sponsorship Program, is covering the costs of storing and transferring the data so that researchers around the world can access and analyze it in the cloud.

“This data is valuable for data journalism. As a public record of the rule-making process, it contains insights into the workings of federal agencies,” said Ben Coleman, professor of computer science at Moravian University. “This dataset provides a single text corpus of the federal rule-making process and allows researchers to understand who has influenced this critically important process.”

The regulations.gov website allows users to view proposed rules and supporting documents for the federal rule-making process. In addition, users can post and view comments about those proposed rules. The site contains about 27 million pieces of text and binary data, but the API that provides access allows a user to obtain only one thousand items per hour. As a result, it would take a single user approximately three years to download all the data.
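The three-year estimate can be checked with a short calculation. The sketch below is illustrative only: it assumes a single API key downloading around the clock with no interruptions, treats the 27 million figure as exact, and uses a 365-day year. The function name is hypothetical and is not part of the Mirrulations code.

```python
def years_to_download(total_items: int, items_per_hour: int) -> float:
    """Estimate years needed to fetch every item at a fixed hourly rate.

    Assumes continuous, uninterrupted downloading with one API key
    and a 365-day year (both are simplifying assumptions).
    """
    hours = total_items / items_per_hour
    days = hours / 24
    return days / 365


# Figures quoted in the release: about 27 million items, 1,000 items per hour.
estimate = years_to_download(27_000_000, 1_000)
print(f"{estimate:.1f} years")
```

At these rates the download takes 27,000 hours, or 1,125 days, which is consistent with the figure of approximately three years.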

Mirrulations (MIRRor of regULATIONS.gov) is a system that uses a collection of donated API keys to create a mirror of the data. In addition, for each PDF in the dataset, the system extracts the available text and stores it as a separate text file. It has successfully downloaded all historical data and continues to run, collecting new data each hour.

The Mirrulations project was inspired by data journalist Fred Trotter at CareSet and implemented by Professor Ben Coleman and his students at Moravian University. CareSet performs data journalism and analytics to help make the healthcare system more transparent, and it advocates for open and accessible healthcare data to improve patient outcomes. Moravian University, located in Bethlehem, Pennsylvania, is the nation’s sixth-oldest university and offers undergraduate and graduate degrees that blend liberal arts with professional programs.

The AWS Open Data Sponsorship Program covers the cost of storage and egress for publicly available, high-value, cloud-optimized datasets. AWS works with data providers to democratize access to data by making it available for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through the program, AWS has democratized access to petabytes of data, including satellite imagery, climate and weather data, genomic data, and data used for natural language processing. The full list of publicly available datasets is available on the Registry of Open Data on AWS.

To learn more and access the Mirrulations dataset, visit https://registry.opendata.aws/mirrulations/.

Moravian University
Michael Corr
Assistant Vice President of Marketing and Communications
Email: corrm@moravian.edu
Ph: 610.861.1365

About Moravian University
Moravian University, located in Bethlehem, Pennsylvania, is the nation’s sixth-oldest university and offers undergraduate and graduate degrees and certificates. For more than 280 years, the university has been preparing students for reflective lives, fulfilling careers, and transformative leadership in a world of change. Moravian University is a member of The New American Colleges and Universities (NACU), a national consortium of private comprehensive colleges and universities working together to graduate extraordinary professionals for a global workforce and society.

In 2024, Moravian University became part of a World Heritage designation when the Moravian Church Settlements of Bethlehem, Gracehill (Northern Ireland, UK), and Herrnhut (Saxony, Germany) joined Christiansfeld (Denmark) as a single World Heritage Site, Moravian Church Settlements. Moravian is just the second university in the United States to be part of a UNESCO World Heritage Site designation and only the eighth university in the world to have this recognition.

Visit moravian.edu to learn more about how Moravian University prepares its students for lifelong success.

###