PDAP Newsletter

Archive

We're training our computers to identify police data

Machine learning update

Disclaimer: we were interested in machine learning at least a week before the recent hype started. With that out of the way…

What if there were tens of thousands of criminal justice agencies, each with dozens of potentially useful data sources sprawled across a handful of websites and subdomains? Actually…this is not hypothetical. Finding useful gems on the internet is an ongoing project of ours. Humans are good at identifying whether a web page is about the police, whether it’s useful, and in which format it’s published. Can we teach a computer to do the same thing?

One of our creative volunteers used Common Crawl to generate a list of 50,000 URLs featuring potential data about the police. They also set up a text classification pipeline for adding labels. Another spectacular volunteer created a working machine learning model trained on these labels, and we’re starting to be able to identify URLs. Exciting stuff!

#21
March 13, 2023
Read more

Help needed with urgent data requests

Data requests in Michigan and Arkansas

We're spreading the word that we can help people access data, and we got two recent requests that could use some extra hands.

One is for investigating the results of calls for service from the Detroit Police Department, to help local advocates develop policy for the upcoming city budget negotiations. We'll need help with data collection and analysis.

The other is for data related to juvenile justice in five majority-Black counties in Southeast Arkansas to help evaluate the equity of the systems there. We're gearing up for a records request push: if you can use a web browser, you can help!

#20
February 1, 2023
Read more

A new way to request police data

Happy new year! We're about halfway through our first grant period, and some of the seeds we've been planting are starting to sprout.

New: request data from PDAP

We published a new form for requesting data. If you have a question about your local police system, give it a try. We have been informally connecting our staff and volunteers with projects in need of data assistance, doing work like locating records and contributing data analysis code. Now, we'll track these requests in a database and try to fulfill as many as we can. Your project could be next!

If you're in Discord and you gave yourself a role in the #welcome channel, you'll start getting pinged as we get more requests. If you'd like to be notified when we need help with scraping, or to find data in your area, reply to this email.

#19
January 10, 2023
Read more

New development: Use cases for call records

We're often asked how we think all this data about the police will be used. The answer is that it's being used every day, but there's a steep learning curve for finding and accessing it...that's where we come in!

Using calls for service to study police activity

Across the country, communities are reimagining public safety where they live. One of the first steps of this process is understanding what kind of work we ask police to do when we call them. Who shows up to help when people contact 911 about a mental health issue? How about a found animal, or an abandoned vehicle?

Some places publish a description of these calls somewhere on the internet. Here are some we know about.

#18
November 22, 2022
Read more

PDAP Newsletter | Early progress!

Hi there,

Thanks for subscribing to email updates from the Police Data Accessibility Project!

An update

Since I last wrote in August, we’ve made a working prototype and real progress toward the first of our three major milestones.

#17
October 11, 2022
Read more

PDAP Newsletter | Open community call this Tuesday!

Hi there,

Thanks for subscribing to email updates from the Police Data Accessibility Project!

Most timely: we’re having our first Community Call since we took on full-time staff, this Tuesday the 23rd at 2pm ET. It’s held in our Discord, and you can RSVP at the event here. We’ll ask people to share what they see in the police data ecosystem, and give our own updates. I hope to see you there!

If you’ve never used Discord, have no fear: it’s a chat forum like Slack but more fitting for our particular community. We’ll all join a group audio channel, where people can listen in or participate in the interactive elements!

#16
August 20, 2022
Read more

PDAP Newsletter | First Employees

As planned, we're using the money from our first grant award to cover two full-time salaries! For the first time, we will have a small full-time staff dedicated to making the PDAP vision of a transparent police system a reality.

Who we hired

Josh Chamberlain has been volunteering with PDAP for about a year and a half, using their design background to crystallize what PDAP is building and why. They also do Executive Director things, like keeping PDAP legally established and looking for funding.

Jacob Quinn Sanders now has responsibility for our code, working with the PDAP community to create the best possible open-source utilities for accessing police data. He has a fantastic background as a news reporter and editor, often working on the municipal data problems we're trying to solve here. He has since established himself as a programmer, data expert, and all around great fit for the role at PDAP.

#14
July 26, 2022
Read more

PDAP Newsletter | Big grant news!

Hi there,

Thanks for subscribing to email updates from the Police Data Accessibility Project!

I write with good news: we received a grant of $250,000 from The Heinz Endowments! This year’s arc is fundamentally changed—we’re closer than ever to making a positive impact on our systems and communities using better police data. You can find details in our blog post here.

The ask: this grant will take us to new heights, but doesn’t cover our whole budget. Contribute here to help us cover things like legal fees, grant writing help, and infrastructure. Help us turn volunteers into paid employees! We’re tiny, and just getting started, so a little goes a long way. If you can’t donate, consider sharing this with a friend.

#15
May 20, 2022
Read more

PDAP Newsletter | First Grant Awarded

🎉🎉🎉

On May 12, we received some excellent news: we've been awarded a grant of $250,000 by The Heinz Endowments to make progress on our app. Until now, most work on PDAP (except for Data Bounties) has been done pro bono by a small cast of folks. We've been lucky to receive help from talented, generous volunteers, and we're proud of our steady progress.

The short version: we get to hire staff to work on PDAP.

Grant details

#13
May 17, 2022
Read more

PDAP Newsletter | New Year Recap

Happy New Year!

In brief, the story is this: we have a small group of people making excellent progress toward our mission—but this is a big project. If you're reading this, you can help us.

What happened in 2021?

  • We added dozens of new Scrapers to the repo.

  • We achieved non-profit status, opening the door for funding and grant opportunities.

  • We ran our first DoltHub bounty, making real progress on our database of police Datasets.

  • We launched a simple app to help contributors write new Scrapers.

  • We started making Instagram posts. Go follow us!

  • We had dozens of open working sessions. Some of them were quiet, with one or two people making progress. Some of them were brainstorms, where people disagreed and learned from each other. These happen in Discord, and you should join one! Look for the calendar icon and "Events" in the upper left.

#12
January 2, 2022
Read more

PDAP Newsletter | 501c3 Approval

We did it!

In January, we filed for 501c3 status with the IRS. On August 14, 2021, our application was accepted! Police Data Accessibility Project Inc is now a registered nonprofit. Here's our GuideStar profile.

The biggest change is that your donations may be tax deductible!

What's next?

#11
September 18, 2021
Read more

PDAP Newsletter | Bounty Retro

Background

Our first Data Bounty with DoltHub was intended to give data scrapers a cash reward for the public data they submitted. DoltHub funded and supported the endeavor. Thank you Tim, Katie, and the rest of the DoltHub team for giving our project this huge leap forward!

Gains

Our goal was to complete our Agencies table, which is a critical piece of infrastructure. It allows us to draw a line directly from a department's website to the processed, accessible police data that was collected from it.

#10
July 14, 2021
Read more

PDAP Newsletter | Slack → Discord

For a variety of reasons, we have decided to archive our old Slack channel and merge with our existing Discord! There was no need to have two messaging channels for the same project, so this consolidates them into one. While some Slack features, such as threads, will be missed, the much-more-used Discord will be a fantastic new home. We will be spending the next few days making adjustments to make the community more welcoming and help guide new users through this new experience!

#9
June 7, 2021
Read more

PDAP Newsletter | Bounty Update 4

  • Our first dolt bounty PR for agencies has been accepted with over 68,000 edits!!

  • Katie (Dolt) has a bash script that currently clones the repo and checks for whitespace and duplicates, and runs some general domain keyword searches to ensure someone is not passing a fake URL just to get an edit count.

    • She will turn it into a Python script with unit tests, and also flesh it out to check datasets as well. This script will be useful for us too, as we can use it as part of our own pipeline for PR verification in the future!
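The checks described above could look roughly like the following in Python. This is only a minimal sketch, not Katie's actual script: the function name `check_urls`, the keyword list, and the normalization rules are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the real script's keywords are not given here.
DOMAIN_KEYWORDS = ("police", "sheriff", ".gov")


def check_urls(urls):
    """Return a dict mapping each problem type to the offending URLs."""
    problems = {"whitespace": [], "duplicate": [], "no_keyword": []}
    seen = set()
    for url in urls:
        # Flag leading/trailing whitespace or embedded spaces.
        if url != url.strip() or " " in url:
            problems["whitespace"].append(url)

        # Compare URLs case-insensitively and ignore a trailing slash.
        normalized = url.strip().lower().rstrip("/")
        if normalized in seen:
            problems["duplicate"].append(url)
        seen.add(normalized)

        # A URL without a scheme has an empty host and is flagged too.
        host = urlparse(normalized).netloc
        if not any(keyword in host for keyword in DOMAIN_KEYWORDS):
            problems["no_keyword"].append(url)
    return problems
```

Keeping each check as a separate list in the result makes it easy to add unit tests per check, and to extend the same function to dataset rows later.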

#7
June 2, 2021
Read more

PDAP Newsletter | Bounty Update 3

  • We're still waiting to start merging PRs for the bounty. Katie (Dolt) has a script she will use to verify the data integrity of bounty PRs (see edit at bottom). If it works well, PRs will start being merged tomorrow!

  • At the time of writing this post, we have 8 open PRs for the bounty that reflect:

    • over 50,000 row updates on the agencies table (populating lat/lng, city, zip, fips & homepage_url)!

    • 111 new datasets!

    • 4 new data types!

  • The Dolt team is aware of a bug that prevents NULL from being inserted in the CSV import and is looking into it now (see attached)

  • As mentioned in the above thread, I removed the UNIQUE constraint on URLs

Not bounty related:

  • We have a few bounty participants who are very interested in actually loading the data and are hoping for another bounty for data intake

  • One participant, Alexis, has a DoltHub repo here with data she has scraped from the FBI for missing persons / wanted persons, along with source code here! An excellent addition that we will look into!

#6
June 1, 2021
Read more

PDAP Newsletter | Dolt → PostgreSQL

Richard was hard at work and created a foreign data wrapper (FDW) to access a cloned Dolt instance and copy the data into our own PostgreSQL instance that our applications run on.

The current implementation is that Dolt fires information over a webhook when a branch is pushed / merged. We can use this information to see exactly when a PR is merged into master and trigger a dolt pull and restart of our dolt sql-server instance. Then, a stored procedure will activate to load the new data into our PostgreSQL instance so we always have a stable, up-to-date, local copy.
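The webhook-triggered flow above can be sketched as a small handler. This is an illustration under stated assumptions, not the project's code: the payload keys (`event`, `ref`) are hypothetical, since the webhook format is not described here, and the restart and stored-procedure steps are left to a caller-supplied `after_pull` callback because they depend on the deployment. The only external command used is `dolt pull`.

```python
import subprocess


def is_merge_to_master(payload):
    """Decide whether a webhook payload describes a push/merge to master.

    The payload shape is hypothetical: {"event": "push", "ref": "refs/heads/master"}.
    """
    return (
        payload.get("event") == "push"
        and payload.get("ref") == "refs/heads/master"
    )


def handle_webhook(payload, repo_dir, run=subprocess.run, after_pull=None):
    """Pull the cloned repo when master changes; return True if a sync ran."""
    if not is_merge_to_master(payload):
        return False

    # Update the local clone so the SQL server can serve the new data.
    run(["dolt", "pull"], cwd=repo_dir, check=True)

    # Deployment-specific follow-up, e.g. restarting the dolt sql-server
    # and calling the stored procedure that loads data into PostgreSQL.
    if after_pull is not None:
        after_pull()
    return True
```

Passing `run` as a parameter keeps the handler testable without a Dolt installation.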

#6
May 28, 2021
Read more

PDAP Newsletter | Bounty Update 2

Our bounty is off to an exciting start! We already have one PR where someone has essentially filled out the entire agencies table (14,711 rows modified!) with city, zip, fips, lat, lng, and homepage_url. We are continuing to work with the Dolt team and will soon be accepting the data into our master branch!

#5
May 28, 2021
Read more

PDAP Newsletter | Dolt Bounty Start

As of 13:31 (PST) / 16:31 (EST), our Dolt Bounty is officially live! Anyone who adds data into the datasets table will get $0.33 per row (maximum cap is $5,000 for the entire bounty). The bounty is running until July 7th @ 1500 (PST) / 1800 (EST)! You can find more information here about the bounty.

  • Only Dolt bounty related PRs can be approved (so we cannot make our own commits or PRs to master for the next 6 weeks)

  • Only Katie (Dolt Team) can make the PR approvals on the pdap/datasets repo during the active bounty (she needs to use a special internal framework to attribute credit to the bounties)

  • If we need to make any schema changes or PRs, we will have to create a separate branch and hold all the changes there. It cannot be merged into master until the bounty is over

  • It will be very beneficial to have us monitor the #data-bounties channel in their Discord. If they have any questions about our dataset, it will help us understand how to improve.
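As a rough sense of scale for the bounty terms above, the per-row rate and the overall cap can be worked through in a few lines. This is a simplified sketch: it assumes the total payout simply stops growing at the cap, and it does not model how the money is attributed to individual participants, which is not described here. Amounts are kept in whole cents to avoid floating-point rounding.

```python
RATE_CENTS_PER_ROW = 33   # $0.33 per row, as stated in the bounty terms
CAP_CENTS = 500_000       # $5,000 cap for the entire bounty


def total_payout_cents(rows):
    """Total paid out for a number of accepted rows, limited by the cap."""
    return min(rows * RATE_CENTS_PER_ROW, CAP_CENTS)


def rows_to_reach_cap():
    """Smallest number of rows at which the cap is reached (ceiling division)."""
    return -(-CAP_CENTS // RATE_CENTS_PER_ROW)
```

Under these assumptions, `rows_to_reach_cap()` returns 15152, so the cap is reached after a little over 15,000 accepted rows.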

#4
May 27, 2021
Read more

PDAP Newsletter | ETL Prototype

The GitHub PR is here.

The data that was loaded is from the USA/CA/butte_county/college/chico scraper. I chose this as a starting point because it has two different types of data, with two differing formats. This allowed me to verify the library reads from the schema.json properly and can load and map no matter the data output. You can find that PR here. It also created 2 new datasets here.

We currently have it set to not auto-commit so someone reviews before committing each time. But it does load data from files, use the schema.json file and (mostly) works!
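To illustrate the idea of mapping differently formatted scraper output through a schema file, here is a minimal sketch. The actual layout of `schema.json` is not shown in this post, so the `"mapping"` key, the column names, and both function names are hypothetical.

```python
import json


def load_mapping(schema_text):
    """Parse schema JSON and return its source-to-target column mapping.

    Assumes a hypothetical layout: {"mapping": {"Source Column": "target_column"}}.
    """
    schema = json.loads(schema_text)
    return schema["mapping"]


def map_record(record, mapping):
    """Rename a scraped record's fields to the target column names.

    Fields not listed in the mapping are dropped; missing fields become None.
    """
    return {target: record.get(source) for source, target in mapping.items()}
```

Because the mapping lives in data rather than code, two scrapers with different output formats can share the same loader and differ only in their schema files.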

Current Process:

#3
May 26, 2021
Read more

PDAP Newsletter | DoltHub Bounty

Volunteers usually gather, or "scrape," data for PDAP, but right now we're also running a 6-week data bounty generously sponsored by Dolt. The bounty pays people to gather traceable, approved data. Since we can focus effort by putting a price on specific data, we're able to get a massive head start on our Dataset Catalogue.

From there, we'll be able to scale and consolidate our scraping efforts into an app that can be run to gather data from any dataset. Every time a scraper is run, we add more data to the database!

The current bounty will get us a list of URLs for thousands of police agencies on our Agencies and Datasets tables. Read more about it here.

#2
May 20, 2021
Read more

PDAP Newsletter | Alpha App Launched

We published a Django app that shows a map of the US. When you click a state, it shows all the agencies on the map. Right now, it connects to a PostgreSQL instance using static Dolt data. Future plans:

  • getting a sync from dolt for the data

  • fleshing it out to show more information about the current statuses / datasets for each agency

  • having a way to download the data easily

  • sprucing up the intake tools to aid in importing data into Dolt

  • sprucing up the UI (probably aligning with our Gatsby front-end framework)

#1
May 7, 2021
Read more