rename org to nonprofit (so we can use 'org' to refer to an organization member is a user of)
austinhallock committed Jul 1, 2020
1 parent 186287d commit 2708b35
Showing 25 changed files with 234 additions and 231 deletions.
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2020 Austin Hallock
+Copyright (c) 2020 TechBy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
26 changes: 13 additions & 13 deletions README.md
@@ -22,7 +22,7 @@ Everything is just a route. This was an easy way for me to deploy to a bunch of
http://localhost:3000/loadAllForYear?year=YYYY (eg 2017)
- if you leave year blank, it'll pull in a small sample of eins from /data/sample_index.json

-http://localhost:3000/processUnprocessedOrgs
+http://localhost:3000/processUnprocessedNonprofits
- this will take a while
- if I remember right, I spun had at least 4 pods w/ 4 CPU each processing jobs for this

@@ -31,30 +31,30 @@ http://localhost:3000/processUnprocessedFunds (990PF)

http://localhost:3000/setNtee
- this will also take a while
-- sets the ntee for every org
+- sets the ntee for every nonprofit

http://localhost:3000/lastYearContributions
-- this set a field in ES for how much the org/fund contributed to other orgs the prior year
+- this set a field in ES for how much the nonprofit/fund contributed to other nonprofits the prior year
- this was used for following route (parseGrantMakingWebsites)

http://localhost:3000/parseGrantMakingWebsites
-- this went through every grant-giving org that gave a decent amount of grants, and pulled in all keywords from their site, to allow searching by keywords (I was trying to find data-driven grant-giving websites)
+- this went through every grant-giving nonprofit that gave a decent amount of grants, and pulled in all keywords from their site, to allow searching by keywords (I was trying to find data-driven grant-giving websites)
- you can tweak the fns for your own purpose

### Examples
-Types: IrsOrg, IrsFund, IrsOrg990, IrsFund990, IrsPerson
+Types: IrsNonprofit, IrsFund, IrsNonprofit990, IrsFund990, IrsPerson

-Get schema for a type (IrsOrg for example) so you know all fields you can specify (or just look at the type.graphql files)
+Get schema for a type (IrsNonprofit for example) so you know all fields you can specify (or just look at the type.graphql files)
```
{
-"query": "{ __type(name: \"IrsOrg\") { name fields { name type { name kind ofType { name kind } } } } }"
+"query": "{ __type(name: \"IrsNonprofit\") { name fields { name type { name kind ofType { name kind } } } } }"
}
```

-Get an org
+Get an nonprofit
```
POST http://localhost:3000/graphql {
-"query": "query ($ein: String!) {irsOrg(ein: $ein) {ein, name, assets} }",
+"query": "query ($ein: String!) {irsNonprofit(ein: $ein) {ein, name, assets} }",
"variables": { "ein": "586347523" }
}
```
@@ -67,10 +67,10 @@ POST http://localhost:3000/graphql {
}
```

-Get 990s for an org
+Get 990s for an nonprofit
```
POST http://localhost:3000/graphql {
-"query": "query ($ein: String!) {irsOrg990s(ein: $ein) {ein, year} }",
+"query": "query ($ein: String!) {irsNonprofit990s(ein: $ein) {ein, year} }",
"variables": { "ein": "586347523" }
}
```
@@ -84,7 +84,7 @@ POST http://localhost:3000/graphql {
}
```

-Get all people at an org
+Get all people at an nonprofit
```
POST http://localhost:3000/graphql {
"query": "query ($ein: String!) {irsPersons(ein: $ein) {ein, name, title, compensation} }",
@@ -96,7 +96,7 @@ If you're not familiar with elasticsearch's query DSL, [here's a good guide](htt

```
POST http://localhost:3000/graphql {
-"query": "irsOrg990s(query:$query) {name, revenue { contributionsAndGrants }}",
+"query": "irsNonprofit990s(query:$query) {name, revenue { contributionsAndGrants }}",
"variables": { "query": {"range": {"revenue.contributionsAndGrants": {"gte": 5000000}}} }
}
```
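
The range query in the README example above can also be built programmatically. This is a minimal sketch, assuming the `$query` variable is declared as `ESQuery!` (the type name used by `defaultQuery` in index.js) and that the selection set shown in the README is valid for `irsNonprofit990s`. `buildRangeRequest` is a hypothetical helper, not part of the repo.

```javascript
// Hypothetical helper: builds the JSON body for
// POST http://localhost:3000/graphql using the renamed irsNonprofit990s field.
function buildRangeRequest (minContributions) {
  return {
    // Assumption: the variable type is ESQuery!, as in index.js defaultQuery.
    query: `query ($query: ESQuery!) {
      irsNonprofit990s(query: $query) { name, revenue { contributionsAndGrants } }
    }`,
    variables: {
      // Elasticsearch range clause, same shape as the README example.
      query: {
        range: { 'revenue.contributionsAndGrants': { gte: minContributions } }
      }
    }
  }
}

// Serialize for use as the request body.
const body = JSON.stringify(buildRangeRequest(5000000))
```

Passing the Elasticsearch clause through `variables` rather than inlining it keeps the GraphQL document itself constant across searches.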
2 changes: 1 addition & 1 deletion graphql/irs_contribution/model.js
@@ -19,7 +19,7 @@ class IrsContributionModel extends Base {
toCity: 'text',
toState: 'text',
amount: 'bigint',
-type: 'text', // org | person
+type: 'text', // nonprofit | person
nteeMajor: { type: 'text', defaultFn () { return '?' } },
nteeMinor: { type: 'text', defaultFn () { return '?' } },
relationship: 'text',
6 changes: 4 additions & 2 deletions graphql/irs_org/model.js → graphql/irs_nonprofit/model.js
@@ -1,6 +1,8 @@
import { Base, cknex } from 'backend-shared'

-class IrsOrgModel extends Base {
+// TODO: rename tables from org -> nonprofit
+
+class IrsNonprofitModel extends Base {
getScyllaTables () {
return [
{
@@ -95,4 +97,4 @@ class IrsOrgModel extends Base {
}
}

-export default new IrsOrgModel()
+export default new IrsNonprofitModel()
51 changes: 51 additions & 0 deletions graphql/irs_nonprofit/resolvers.js
@@ -0,0 +1,51 @@
+import _ from 'lodash'
+import { GraphqlFormatter, Loader } from 'backend-shared'
+
+import IrsNonprofit from './model.js'
+import IrsNonprofit990 from '../irs_nonprofit_990/model.js'
+
+const nonprofitLoader = Loader.withContext((eins, context) => {
+  return IrsNonprofit.getAllByEins(eins)
+    .then((irsNonprofits) => {
+      irsNonprofits = _.keyBy(irsNonprofits, 'ein')
+      return _.map(eins, ein => irsNonprofits[ein])
+    })
+})
+
+export default {
+  Query: {
+    irsNonprofit (rootValue, { ein }) {
+      return IrsNonprofit.getByEin(ein)
+    },
+
+    irsNonprofits (rootValue, { query, sort, limit }) {
+      return IrsNonprofit.search({ query, sort, limit })
+        .then(GraphqlFormatter.fromElasticsearch)
+    }
+  },
+
+  IrsNonprofit: {
+    async yearlyStats (irsNonprofit) {
+      let irs990s = await IrsNonprofit990.getAllByEin(irsNonprofit.ein)
+      irs990s = _.orderBy(irs990s, 'year')
+      return {
+        years: _.map(irs990s, irs990 => ({
+          year: irs990.year,
+          assets: irs990.assets?.eoy,
+          employeeCount: irs990.employeeCount,
+          volunteerCount: irs990.volunteerCount
+        }))
+      }
+    }
+  },
+
+  IrsPerson: {
+    async irsNonprofit (irsPerson, __, context) {
+      if (irsPerson.entityType === 'nonprofit') {
+        return await nonprofitLoader(context).load(irsPerson.ein)
+      } else {
+        return null
+      }
+    }
+  }
+}
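
The `nonprofitLoader` in this file keys the fetched rows by `ein` and then maps over the requested `eins`, so the result array lines up with the requested keys even when the store returns rows in a different order or omits some. A standalone sketch of that ordering step, with lodash swapped for a plain `Map` and a hypothetical in-memory `fetchByEins` standing in for `IrsNonprofit.getAllByEins`:

```javascript
// Hypothetical stand-in for IrsNonprofit.getAllByEins: returns matching rows
// in storage order (not request order) and skips EINs it cannot find.
async function fetchByEins (eins) {
  const table = [
    { ein: '222', name: 'B' },
    { ein: '111', name: 'A' }
  ]
  return table.filter(row => eins.includes(row.ein))
}

// Batch function: exactly one result per requested key, in request order.
// Missing keys come back as undefined.
async function batchLoadNonprofits (eins) {
  const rows = await fetchByEins(eins)
  const byEin = new Map(rows.map(row => [row.ein, row]))
  return eins.map(ein => byEin.get(ein))
}
```

Batching loaders generally pair results with keys by position, which is why the re-keying step matters.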
File renamed without changes.
@@ -1,10 +1,10 @@
import { Base, cknex } from 'backend-shared'

-class IrsOrg990Model extends Base {
+class IrsNonprofit990Model extends Base {
getScyllaTables () {
return [
{
-name: 'irs_org_990s_by_ein_and_year',
+name: 'irs_nonprofit_990s_by_ein_and_year',
keyspace: 'irs_990_api',
fields: {
ein: 'text',
@@ -60,7 +60,7 @@ class IrsOrg990Model extends Base {
getElasticSearchIndices () {
return [
{
-name: 'irs_org_990s',
+name: 'irs_nonprofit_990s',
mappings: {
ein: { type: 'keyword' },
year: { type: 'integer' },
@@ -100,7 +100,7 @@ class IrsOrg990Model extends Base {

getAllByEin (ein) {
return cknex().select('*')
-.from('irs_org_990s_by_ein_and_year')
+.from('irs_nonprofit_990s_by_ein_and_year')
.where('ein', '=', ein)
// TODO: order with withClusteringOrderBy instead of this
.orderBy('year', 'desc')
@@ -109,4 +109,4 @@ class IrsOrg990Model extends Base {
}
}

-export default new IrsOrg990Model()
+export default new IrsNonprofit990Model()
24 changes: 24 additions & 0 deletions graphql/irs_nonprofit_990/resolvers.js
@@ -0,0 +1,24 @@
+import { GraphqlFormatter } from 'backend-shared'
+
+import IrsNonprofit990 from './model.js'
+
+export default {
+  Query: {
+    irsNonprofit990s (rootValue, { ein, query, limit }) {
+      if (ein) {
+        return IrsNonprofit990.getAllByEin(ein, { limit })
+          .then(GraphqlFormatter.fromScylla)
+      } else {
+        return IrsNonprofit990.search({ query, limit })
+          .then(GraphqlFormatter.fromElasticsearch)
+      }
+    }
+  },
+
+  IrsNonprofit: {
+    irsNonprofit990s (irsNonprofit, { limit }) {
+      return IrsNonprofit990.getAllByEin(irsNonprofit.ein, { limit })
+        .then(GraphqlFormatter.fromScylla)
+    }
+  }
+}
File renamed without changes.
51 changes: 0 additions & 51 deletions graphql/irs_org/resolvers.js

This file was deleted.

24 changes: 0 additions & 24 deletions graphql/irs_org_990/resolvers.js

This file was deleted.

6 changes: 3 additions & 3 deletions graphql/irs_person/resolvers.js
@@ -26,9 +26,9 @@ export default {
}
},

-IrsOrg: {
-irsPersons (irsOrg, { limit }) {
-return IrsPerson.getAllByEin(irsOrg.ein, { limit })
+IrsNonprofit: {
+irsPersons (irsNonprofit, { limit }) {
+return IrsPerson.getAllByEin(irsNonprofit.ein, { limit })
.then(IrsPerson.groupByYear)
.then(GraphqlFormatter.fromScylla)
}
24 changes: 12 additions & 12 deletions index.js
@@ -15,10 +15,10 @@ import { setup, childSetup } from './services/setup.js'
import { setNtee } from './services/irs_990_importer/set_ntee.js'
import { loadAllForYear } from './services/irs_990_importer/load_all_for_year.js'
import {
-processUnprocessedOrgs, processEin, fixBadFundImports, processUnprocessedFunds
+processUnprocessedNonprofits, processEin, fixBadFundImports, processUnprocessedFunds
} from './services/irs_990_importer/index.js'
import { parseWebsitesByNtee } from './services/irs_990_importer/parse_websites.js'
-import IrsOrg990 from './graphql/irs_org_990/model.js'
+import IrsNonprofit990 from './graphql/irs_nonprofit_990/model.js'
import * as directives from './graphql/directives.js'
import config from './config.js'

@@ -45,7 +45,7 @@ app.use(bodyParser.urlencoded({ extended: true })) // Kiip uses
app.get('/', (req, res) => res.status(200).send('ok'))

const validTables = [
-'irs_orgs', 'irs_org_990s', 'irs_funds', 'irs_fund_990s',
+'irs_nonprofits', 'irs_nonprofit_990s', 'irs_funds', 'irs_fund_990s',
'irs_persons', 'irs_contributions'
]
app.get('/tableCount', function (req, res) {
@@ -59,7 +59,7 @@ app.get('/tableCount', function (req, res) {
})

app.get('/unprocessedCount', function (req, res) {
-return IrsOrg990.search({
+return IrsNonprofit990.search({
trackTotalHits: true,
limit: 1, // 16 cpus, 16 chunks
query: {
@@ -153,17 +153,17 @@ app.get('/loadAllForYear', function (req, res) {
// faster ES node seems to help a little, but not much...
// cheapest / best combo seems to be 4vcpu/8gb for ES, 8x 2vcpu/2gb for api.
// ^^ w/ 2 job concurrencyPerCpu, that's 32. 32 * 300 (chunk) = 9600 (limit)
-// seems to be sweet spot w/ ~150-250 orgs/s (2-3 hours total)
+// seems to be sweet spot w/ ~150-250 nonprofits/s (2-3 hours total)
// could probably go faster with more cpus (bottleneck at this point is irsx)
// might need to increase thread_pool.write.queue_size to 1000
-app.get('/processUnprocessedOrgs', function (req, res) {
-processUnprocessedOrgs(req.query)
-return res.send('processing orgs')
+app.get('/processUnprocessedNonprofits', function (req, res) {
+processUnprocessedNonprofits(req.query)
+return res.send('processing nonprofits')
})

app.get('/processEin', function (req, res) {
processEin(req.query.ein, { type: req.query.type })
-return res.send('processing org')
+return res.send('processing nonprofit')
})

app.get('/fixBadFundImports', function (req, res) {
@@ -172,8 +172,8 @@ app.get('/fixBadFundImports', function (req, res) {
})

// chunkConcurrency=10
-// chunkConcurrency = how many orgs of a chunk to process simultaneously...
-// doesn't matter for orgs, but for funds it does (since there's an es fetch)
+// chunkConcurrency = how many nonprofits of a chunk to process simultaneously...
+// doesn't matter for nonprofits, but for funds it does (since there's an es fetch)
// sweet spot is 1600&chunkSize=50&chunkConcurrency=3 (slow)
// even with that, scylla might fail upserts for large funds
// so maybe run at chunk 1 concurrency 1 for assets > 100m
@@ -199,7 +199,7 @@ const serverPromise = schemaPromise.then((schema) => {

const defaultQuery = `
query($query: ESQuery!) {
-irsOrgs(query: $query) {
+irsNonprofits(query: $query) {
nodes {
name
employeeCount