This repository has been archived by the owner on Mar 3, 2022. It is now read-only.
Political ads can have many different purposes, including:
listbuilding: finding potential supporters and getting their contact info, so you can claim them as supporters and also so you can ask them for money
fundraising: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- for money
mobilization: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- to do stuff, like vote early or volunteer
persuasion: communicating to people -- who are not your supporters but who probably aren't your opponent's supporters either -- about specifically-chosen issues/messages to persuade them to vote for you (or at least to not vote for your opponent)
(I realize this is a somewhat simplified ontology. Ideas on how to come up with -- and operationalize -- a different ontology are totally welcome.)
It'd be amazing to come up with a machine learning model that could come up with a decent guess as to which category a given political ad falls into. You might be able to figure this out just from the text of the ad. (In a perfect world, we could also extract interesting features from the ad images/video, but that's out of scope.)
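As a starting point before any real model, here is a minimal sketch of a text-only baseline that guesses one of the four categories by counting cue phrases in the ad text. The `CUES` phrases and the `guess_category` name are hypothetical and picked by hand for illustration; they are not derived from the ad data, and a trained model would presumably replace them.

```python
import re
from collections import Counter

# Hypothetical cue phrases per category: illustrative guesses only,
# not derived from the ad dataset.
CUES = {
    "listbuilding": ["sign the petition", "take the survey", "add your name", "sign up"],
    "fundraising": ["donate", "chip in", "contribute", "deadline"],
    "mobilization": ["vote early", "volunteer", "polling place", "election day"],
    "persuasion": ["voted against", "stands for", "record on", "will fight for"],
}


def guess_category(message):
    """Return the category whose cue phrases occur most often in the text,
    or None when no cue phrase matches at all."""
    # Lowercase and collapse whitespace so multi-word cues match across line breaks.
    text = re.sub(r"\s+", " ", message.lower())
    scores = Counter(
        {cat: sum(text.count(cue) for cue in cues) for cat, cues in CUES.items()}
    )
    category, hits = scores.most_common(1)[0]
    return category if hits > 0 else None
```

Something this crude is mostly useful as a sanity check, or as a way to pre-sort ads before someone labels them by hand.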
I can talk endlessly about this idea. Let me know if you're interested. Reply here or email me at jeremy dot merrill at propublica dot org.
Hi @yinleon, thanks for your interest! We have about 54,000 ads; you can download them here. That page has the schema too. The text content of the ads (message) is probably the most predictive, but the targeting methods (parsed into targets; raw from Facebook in targetings) and any links in the raw HTML content of the ad (body) might also be predictive.
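To make those three signals concrete, here is a rough sketch of pulling them out of a single ad record. It assumes each ad is a dict with `message`, `targets` and `body` keys, following the description above, and it assumes `targets` is a list of dicts that each carry a `"target"` key; the downloadable schema may differ, so treat the field shapes and the `ad_features` name as hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def ad_features(ad):
    """Build simple features from one ad dict (field shapes are assumptions)."""
    collector = LinkCollector()
    collector.feed(ad.get("body") or "")
    collector.close()
    # Keep only the host part of each absolute link, e.g. a donation platform.
    domains = {urlparse(href).netloc.lower() for href in collector.hrefs}
    domains.discard("")  # relative links have no host
    # Assumed shape: [{"target": "Age", "segment": "18 and older"}, ...]
    target_types = {t["target"] for t in (ad.get("targets") or []) if "target" in t}
    return {
        "words": (ad.get("message") or "").lower().split(),
        "link_domains": sorted(domains),
        "target_types": sorted(target_types),
    }
```

The link domains may turn out to be a cheap signal: an ad that links to a donation platform is plausibly a fundraising ad, though that is a guess to check against the data.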
We have an image from each ad (either the main image or a still from the video). We don't have any data extracted from the images, whether by image recognition, text OCR or anything like that. There's likely-predictive data in here: often listbuilding ads contain a "survey" (e.g. this one) that's not actually collecting any data other than email addresses.
The biggest problem is that we don't have a labeled subset for training. The dataset is unbalanced; it's mostly fundraising and listbuilding ads, with fewer persuasive and mobilization ads.
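Once a small hand-labeled sample exists, one common way to offset the imbalance is to weight each class inversely to its frequency during training. A minimal sketch is below; the formula `n_samples / (n_classes * count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`, and the `balanced_class_weights` name is just illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up label counts, only to show the shape of the output.
example_labels = (
    ["fundraising"] * 6 + ["listbuilding"] * 4 + ["persuasion"] + ["mobilization"]
)
weights = balanced_class_weights(example_labels)
# fundraising -> 0.5, listbuilding -> 0.75, persuasion -> 3.0, mobilization -> 3.0
```

Weighting does not remove the need for labels; it only keeps a model from ignoring the persuasion and mobilization ads once some labels exist.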
Would love to hear your thoughts! I'm always looking to hear from folks with more experience doing ML... Let me know if you have more questions about the dataset or about my ontology.
Just for recordkeeping, here's an example of a mobilization ad: https://projects.propublica.org/facebook-ads/ad/23842873784130638. Danny O'Connor, a Dem special election candidate for US House in OH-12, is asking a custom audience to check his list of changed precincts for the election.