-
-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate integrating SEC 10-K EX-21 subsidiary owner info with PUDL #2225
Comments
Do let me know when you attempt to process Constellation Energy Corp's 2022 report from whatever sources you can find.
Either they know how to deliver gas that burns 22% more efficiently than EPA knows about, or ??? Will be very interested to see how this ties to other disclosures and/or mass/energy/volume identities. |
Hokay. here is some learnings from a preliminary investigation: the original PDF formsFrom what i can tell, the 10k filer report all of their subsidiaries and the percentage ownership for each of those subsidiaries. seems pretty good! Corpwatch Accessit's pretty simple! they have bulk downloads of both CSV and MySQL versions. I've only looked into the CSVs, but it seems like they both have the same structure. There is also api access! Which i assume would be nice for more one-off questions. Columns shared with EIA
This is actually a fair amount of information for a record linkage since location is so specific. Data CoverageIt looks like some tables go back to 2001 but most go back to 2003 👍🏻 The one bummer in here is that it doesn't seem like the ownership % is reported in this corpwatch bulk download :-( |
That is really too bad about the percentage ownership not being available. 😢 Looking at the CorpWatch API GitHub repo, it seems like most of the codebase is like 13+ years old, and written in Perl. I wonder how far automated PDF parsing has come since then, and how hard it would be to re-write this system to use e.g. Layout Parser (GitHub repo) which @TrentonBush and @katie-lamb have been talking about for that other project. Especially if there's a well defined document structure that's shared across all of the EX-21 forms? Edit: of course there is no standard document layout. That would be too easy. A few examples: |
There's also TabulaPDF, but it hasn't made a release since 2018. |
The Corpwatch DB looks like it tracks all the companies -- both parents and subsidiaries -- and counts up how many parents each subsidiary has, and how many subsidiaries each parent has, so even without the actual percentages there's some measure of how concentrated ownership is. It might be that in most cases the utility subsidiaries are wholly owned, so you could assume it's just 100%. Big caveats on the data quality though -- it might be that misspellings or name changes mean that the counts of parents / subsidiaries are off, since they aren't using unique universal IDs. Depending on what fraction of these relationships are simple for the electric utilities, it might be tractable to try and fill in the missing ones by hand. Or just the number of owners of a subsidiary could be taken as a gross estimate of how distributed the ownership is (assume each of 2 owners owns 50%, each of 3 owners owns 33% etc.) Wouldn't be great, but would be a lot easier than trying to parse out the wide variety of tables with nesting relationships that respondents are using to report this information. |
@jrea-rmi do you want to talk about what we might do next? |
FWIW, there are PRs for Pint (hgrecco/pint#1615) and a Pint-Pandas branch (https://github.com/MichaelTiemannOSC/pint-pandas/tree/ducks-unlimited) that make it possible to use uncertainties, so you can make them 50/50 +/- 50%. |
In our reverse engineering of ERM with EIA860 ownership information, we see that ~90% of power plant owners are wholly owned by a single parent. About a third of the multiple owner cases are 50-50 ownership. utility_id_eia_to_parent_name.csv In our use of these relationships, we do need the fraction owned so we can calculate the amount of generation and emissions attributable to a ticker. 90% is pretty good, but I definitely don't want to go backwards with data quality from what we already have. The main benefits I see in CorpWatch are:
Overall at the moment, it'd be amazing to have a source I'm not convinced that the data quality or coverage will be good enough that we'll use it. It's hard to guess what that will be until record linkages have been attempted. So if you're feeling optimistic then I'm interested in seeing what comes of your attempt, but if not then maybe this stays on the backburner... |
A summary comment: I'm very curious, but the lack of % ownership is an issue and I'd want to understand the coverage after record linkage better (measured by the fraction of capacity/generation/emissions covered) and if there are gaps in coverage, would Catalyst be willing to maintain overrides/additions to sub-parent ownership information? |
If the data you currently have indicates that 90% of the owners that show up in EIA have a single parent, and we can confirm that the Corpwatch data agrees (based on linking SEC companies to EIA companies), it seems like checking the remaining 10% with multiple owners manually might be tractable. It would be several hundred records that need to be updated once a year right? So it seems like the potential weak point is the record linkage between the EIA-860 utilities and the SEC companies. The location data is an extremely specific discriminator, and so potentially very helpful in doing the linkage, but it seems like it's probably a legal address (HQ, incorporation address of record, mailing address, etc), rather than a permanent physical location like we have with the plants, so we might get a significant number of false negatives -- where different addresses have been reported to SEC vs. EIA for some reason. Many record linkage systems are designed with deduplicating addresses in mind, so they should work well when the reported addresses represent the same place even if they're not totally identical. I don't think we'll know to what extent the addresses (or names) being reported to the two agencies are just flat out different without trying to do the linkage. But @katie-lamb and @cmgosnell have gotten good at doing initial drafts of the record linkages. |
yes, around 500 records to check per year |
Many utilities identified in FERC, EIA, and EPA data are subsidiaries owned by some larger utility holding company. Understanding these ownership relationships can be helpful in understanding the economic incentives of utilities, e.g. which electric utilities are intimately linked to natural gas companies through being owned by the same parent.
@MichaelTiemannOSC pointed me at CorpWatch which has scraped data describing these relationships from the SEC 10-K filings, where it shows up in Exhibit 21, aka EX-21. They make the data freely available for download in bulk or via a RESTful API. The 10-K is reported in XBRL, so it might be possible to download the bulk data directly from SEC and create our own database.
The text was updated successfully, but these errors were encountered: