Product: KPDB
- The Known Projects DB contains information on future housing starts. It gathers data from 9-11 data sources (depending on use case, see below), compares the information in each data source to ensure that units existing in multiple sources are not counted more than once, and then aggregates these sources. It incorporates granular information at the project-level, including project statuses, estimated project phasing, and borough planner input on projects where available.
- The Known Projects DB presents housing information at the project-level. Therefore, it can be used for growth planning citywide as well as granular looks at neighborhood-level expected growth.
- The Known Projects DB is currently used to create the School Construction Authority's (SCA) Housing Pipeline.
Disclaimer: This information does NOT represent a housing projection produced by DCP, nor can DCP attest to the certainty that each of these developments will lead to future housing starts.
The Known Projects DB does not represent future as-of-right growth nor future growth from projects that have not yet materialized in the data sources below.
- DCP Housing Developments Database - This database is created by DCP using DOB permit and certificate of occupancy data. It includes permits and applications for new buildings, building alterations, and demolitions.
- HPD Projected Closings - This is confidential data on HPD's pipeline of closings on privately-owned sites. Provided by HPD.
- HPD Request for Proposals - City-owned sites for which HPD has issued or designated RFPs for residential development. Provided by HPD.
- EDC Projected Projects - Projected projects expected by EDC to develop residential units with reasonable certainty. Provided by EDC.
- DCP Applications - Discretionary actions through DCP that facilitate residential development. This data is generated by DCP. Several processing steps are required to identify projects facilitating residential development.
- Empire State Development Projected Projects - Known residential development projects by Empire State Development. These projects are collected by DCP.
- Neighborhood study affordable housing commitments - Affordable housing commitments made by the current administration in neighborhood rezoning areas. These are collected from each adopted neighborhood study's Points of Agreement.
- Future City-Sponsored RFPs/RFEIs - Additional future Requests for Proposals or Requests for Expressions of Interest for affordable housing, provided by City Hall. Note that many of these can be highly speculative.
- DCP Planner-Added Projects - Additional projects identified by Department of City Planning borough planners which are not yet in the above data sources. Note that many of these can be highly speculative conversations with developers, and are not comprehensively provided by DCP planners.
- Neighborhood study projected developments - The count of units each adopted neighborhood study from the current administration is expected to develop. These projections are not site-specific and are highly speculative. Depending on need, these projects should not necessarily be included for planning purposes.
- Future neighborhood studies - The count of units each future neighborhood study is projected to develop. These projections are not site-specific and are highly speculative. Because these rezonings have not yet been adopted, we include a certainty discount factor and do not deduplicate. These projects should not necessarily be included for planning purposes.
Field | Long Form Name | Definition |
---|---|---|
project_id | Project ID | ID that groups records together into a single project |
source | Source | The name of the dataset the project came from |
record_id | Record ID | The ID of the project from the source dataset |
record_name | Record name | The name of the project as provided by the source |
borough | Borough | The borough the project is located within |
status | Status | The current phase of the project |
type | Type | The type of project. For DOB jobs this would be NB, Alt, etc. |
date | Date | The date of the project |
date_type | Date Type | The type of date for the project, based on the project source (Build Year, Certified Referred, Date Filed, Date Permitted, Effective Date, Month Closed, Projected Fiscal, Year Range) |
units_gross | Units gross | The proposed number of units associated with the project as reported in the source data |
units_net | Units net | The number of future units that should be counted as part of the project after removing completed DOB units and deduplication |
prop_within_5_years | Proportion built within 5 years | Proportion of units that should be completed within the next 5 years |
prop_5_to_10_years | Proportion built within 5-10 years | Proportion of units that should be completed within 5 to 10 years from now |
prop_after_10_years | Proportion built after 10 years | Proportion of units that should be completed sometime after 10 years from this year |
within_5_years | Units built within 5 years | The number of units that should be completed within the next 5 years |
from_5_to_10_years | Units built within 5-10 years | The number of units that should be completed within 5 to 10 years from now |
after_10_years | Units built after 10 years | The number of units that should be completed sometime after 10 years from this year |
phasing_rationale | Phasing assumption rationale | The planner's rationale for the timeline of a project |
phasing_known | Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
nycha | NYCHA | Boolean. True if this is at least partially a NYCHA project |
classb | Group Quarters | Boolean. True if this is a project with Class B units (can be partial) |
senior_housing | Senior Housing | Boolean. True if this is at least partially a Senior Housing project |
inactive | Inactive | Flag indicating whether the project is inactive. If we do not include DOB inactive jobs, this field will be excluded |
geom | Geometry | geometry(Geometry,4326) |
The following source tables are pre-processed and loaded from DigitalOcean. Raw data for all non-ZAP sources, along with their preprocessed versions, are available in the private repo db-knownprojects-data.
- `dcp_projects`
- `dcp_projectactions`
- `dcp_projectbbls`
- `dcp_housing`: A copy of the housing database
- `dcp_knownprojects`: KPDB from the previous release
- `dcp_n_study`: Neighborhood study rezoning commitment input data
- `dcp_n_study_future`: Input data for future neighborhood studies
- `dcp_n_study_projected`: Input data for Neighborhood Study Projected Development Sites
- `dcp_planneradded`: Contains projects manually identified by planners. For more information, see the planner-added project instructions.
- `edc_projects`: EDC projected projects source data
- `esd_projects`: Empire State Development projected projects source data
- `hpd_pc`: HPD projected closings source data
- `hpd_rfp`: HPD Requests for Proposals source data. Has one record per BBL.
- `dcp_application`: Pulls fields from the three ZAP input tables to create all fields necessary for KPDB, and includes several flags to aid in the ZAP review process. ZAP records are excluded if they have a status of "record closed," "terminated," or "withdrawn." Includes projects from after 2010 or with an unspecified date. Projects with a clear indication of residential components are flagged for review. Other flags indicate whether a project existed in the previous version of KPDB, and whether the project was explicitly added/removed via the `corrections_zap` table.
- `dcp_housing_poly`: Housing database with formatted dates, BBL geometries, phasing, and boolean flags. Created in dcp_housing.sql.
  - Excludes demolitions and withdrawn jobs.
  - For class A, excludes jobs with no residential unit change, alterations removing units, and jobs where the proposed residential units are 0.
  - For class B, any record with positive initial or proposed units is included. These records only provide context for DOB match review, and get excluded from the final KPDB output.
  - Polygon geometries come primarily from a join with MapPLUTO on BBL, and secondarily from a spatial join.
- `combined`: Contains a combination of input data with a shared schema. For several data sources, this involves joining on BBL with `dcp_mappluto_wi` to get lot-level polygon geometry. For some input datasets, rows get collapsed to the level of a project. In these cases, the unique IDs of the input data rows get stored in the field `record_id_input` as an array. If no collapsing was necessary (i.e., each project is a single record in the given source data table), the `record_id_input` array only contains the `record_id`. In cases where there is no unique project-level ID, the `record_id` gets assigned from a hash of the `uids` in `record_id_input`. This table contains standardized statuses, initial phasing assumptions, and boolean flags. Created in combine.sql.
- `_project_record_ids`: This is the initial table of project inputs (groups of record_ids that refer to the same project). They are identified by finding spatial overlaps between non-DOB records, forming arrays of individual `record_ids`, called `project_record_ids`. Two of the neighborhood study sources are not included, as units from these sources do not deduplicate with other sources. The table is initially created in _project_record_ids.sql. If there are corrections in `corrections_project`, these get applied to move `record_ids` between projects after the initial identification of spatial overlaps, in correct_projects.sql. This table then gets used to create `review_project` (see below) for review.
- `project_record_ids`: This is the final table of `project_record_ids`. It includes everything in `_project_record_ids`, with DOB records added. The `dcp_housing` `record_ids` get added to the arrays in `_project_record_ids`, as long as they haven't been tagged for removal in `corrections_dob`. Additional DOB-to-non-DOB matches get added if they're flagged as `add` in the `corrections_dob` table. This table also includes stand-alone projects (an array consisting of a single `record_id`). These include `dcp_housing` records and `combined` records that did not match with other records in the two stages of identifying spatial overlaps. Created in project_record_ids.sql.
- `deduped_units`: This table contains results from the automatic deduplication of units in dedup_units.py. Units get subtracted within a project based on the following hierarchy, where units from lower-numbered sources get subtracted from units of higher-numbered sources:

  ```python
  hierarchy = {
      "DOB": 1,
      "HPD Projected Closings": 2,
      "HPD RFPs": 3,
      "EDC Projected Projects": 4,
      "DCP Application": 5,
      "Empire State Development Projected Projects": 6,
      "Neighborhood Study Rezoning Commitments": 7,
      "Neighborhood Study Projected Development Sites": 8,
      "DCP Planner-Added Projects": 9,
  }
  ```

- `_kpdb`: Contains all records from `combined` except for DOB records containing only class B units. Joined onto these records are the deduplicated `units_net` from `deduped_units`. This field gets multiplied by phasing proportions to calculate phased unit counts. Created in create_kpdb.sql.
- `review_project`: Contains records from `combined`, excluding those from the sources DOB, Neighborhood Study Rezoning Commitments, and Future Neighborhood Studies. Joined onto these records are the `project_record_ids` identified in the table `_project_record_ids`. This table has a boolean flag indicating cases where records are part of a multi-record project (identified from spatial overlap). It also includes a boolean flag indicating missing geometry, because geometries do not get exported in the CSV versions of this table. Created in review_project.sql.
- `_review_dob`: Contains spatially-overlapping DOB and non-DOB records from `combined` (excluding those from Neighborhood Study Rezoning Commitments and Future Neighborhood Studies). In cases where the non-DOB record is part of a multi-record project, all other records associated with that project are also included. There are several boolean flags in this table to aid in review. The field `dob_multimatch` is 1 for DOB records that overlap with more than one project. All non-DOB records in these projects get flagged with `project_has_dob_multi`. Created in review_dob.sql.
- `review_dob`: The same as `_review_dob`, with the array values transformed to strings for export.
- `corrections_applied`: Contains all successfully-applied corrections from `corrections_main`.
- `corrections_not_applied`: Contains corrections from `corrections_main` that could not be applied, either because the field wasn't valid or because the old_value did not match any existing records.
- `corrections_zap`: Corrections table for adding and removing ZAP records from KPDB. This table is downloaded separately. For more information, see the ZAP review instructions.
- `corrections_main`: Corrections table for modifying any value other than `units_net`. This table is intended for all corrections that do not rely on relationships between records. For more information, see the main review instructions.
- `corrections_project`: Corrections table for modifying which records refer to the same project. The default is to assume that spatially overlapping records refer to the same project. For more information, see the project review instructions.
- `corrections_dob`: Corrections table for either adding or removing matches between DOB and non-DOB records. This table is necessary because a DOB record cannot be associated with more than one project. For more information, see the DOB review instructions.
- `kpdb`: Final output of KPDB, created in _export.sql. The only difference from `_kpdb` is which fields are included.
This is the process to identify related projects from different source tables using spatial overlap. Project relations are created in _project_record_ids.sql.

Note: `DOB`, `Neighborhood Study Rezoning Commitments`, and `Future Neighborhood Studies` are excluded from the spatial-overlap process and will be added later.
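The overlap-based clustering can be pictured as a union-find over pairwise geometry intersections. The sketch below is an illustrative stand-in: axis-aligned boxes replace real lot polygons, and the production pipeline does this with PostGIS spatial queries, not Python:

```python
# Sketch: cluster records whose geometries overlap into project groups.
# Axis-aligned boxes (minx, miny, maxx, maxy) stand in for real polygons;
# the actual pipeline uses PostGIS spatial joins.
def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cluster_records(geoms):
    """geoms: dict of record_id -> box. Returns a list of record_id sets."""
    parent = {rid: rid for rid in geoms}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    ids = list(geoms)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if boxes_overlap(geoms[a], geoms[b]):
                parent[find(a)] = find(b)  # union overlapping records

    clusters = {}
    for rid in ids:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted(clusters.values(), key=min)
```

Records whose geometries touch transitively end up in the same cluster; isolated records become stand-alone projects.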
Corrections for non-DOB records can be recorded in corrections_project.csv, in the corrections folder of the db-knownprojects-data repo. The files in the KPDB output that need to be reviewed are:

- review_project.shp, review_project.csv: take a close look at the `dummy_id`, which refers to the cluster ID of the project, and the `record_id`, which refers to the record that will need to be reassigned or combined.
- corrections_project.csv: the file in which any necessary corrections are recorded and then applied in the pipeline.
The purpose of the manual cluster review is to determine whether records from multiple data sources are indeed referring to the same project and are complementary in their information, or whether the spatial overlap between the records is simply coincidental, in which case the records should be separated into different clusters (or become new stand-alone clusters). In the second case, we say a cluster reassignment has occurred. In some cases, the overlapping geometry is due to an incorrect geometry or BBL; those geometries must then be corrected, either by fixing the source files or by updating the record in ZAP. The cluster review is a crucial step for reviewers to ensure, because later in the KPDB process units are deduplicated within clusters to avoid over-counting.

Redundant information should also be removed in this process. Possible redundancies include, but are not limited to: 1. A planner-added project for which an official application has now been filed. 2. Multiple DCP applications coexisting for the same project; often the most updated record should be removed in this case.

- Remove projects that do NOT have any residential component, e.g. a parking rezoning.
- The overlapping geometry may be a larger rezoning that envelops multiple independent projects. In this case, the projects should be separated, i.e. "declustered."
record_id | action | record_id_match |
---|---|---|
9 | reassign | 11 |
7 | reassign | NULL |
8 | combine | 12 |
3, 4 | reassign | NULL |
1, 2 | reassign | 10 |
Field | Value |
---|---|
record_id | A record_id, or comma-separated list of record_ids, to get reassigned to or combined with another project |
action | `combine` / `reassign` |
record_id_match | Any record_id from the project that you want to either combine with or reassign to (can be NULL) |

Note: if record_id_match is NULL (left blank in the csv or excel file), a new stand-alone project will be created.
project_record_ids |
---|
{1,2,3,4} |
{5,6,7,8} |
{9,10,11} |
For example
record_id | action | record_id_match |
---|---|---|
7 | reassign | 2 |
results in:
project_record_ids |
---|
{1,2,3,4,7} |
{5,6,8} |
{9,10,11} |
This can also be used to separate out a record into a stand-alone project. For example
record_id | action | record_id_match |
---|---|---|
7 | reassign | NULL |
results in:
project_record_ids |
---|
{1,2,3,4} |
{5,6,8} |
{9,10,11} |
{7} |
This action can also reassign multiple record_ids from a given project simultaneously. For example
record_id | action | record_id_match |
---|---|---|
7, 8 | reassign | 2 |
results in:
project_record_ids |
---|
{1,2,3,4,7,8} |
{5,6} |
{9,10,11} |
If record_id_match is NULL, this action can also be used to separate out multiple records into a new stand-alone project. For example
record_id | action | record_id_match |
---|---|---|
7, 8 | reassign | NULL |
results in:
project_record_ids |
---|
{1,2,3,4} |
{5,6} |
{9,10,11} |
{7,8} |
Combine: Merging two distinct groups of matched records, meaning that they refer to the same project.
For example
record_id | action | record_id_match |
---|---|---|
7 | combine | 2 |
results in:
project_record_ids |
---|
{1,2,3,4,5,6,7,8} |
{9,10,11} |
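The reassign and combine actions above can be modeled in a few lines. This is illustrative only — the real corrections are applied in correct_projects.sql, and the helper below is hypothetical:

```python
# Illustrative model of corrections_project actions; the real logic is
# implemented in SQL (correct_projects.sql).
def apply_correction(projects, record_ids, action, record_id_match=None):
    """projects: list of sets of record_ids. Returns the updated list."""
    if action == "reassign":
        # Pull the records out of whatever projects they currently belong to.
        for proj in projects:
            proj.difference_update(record_ids)
        if record_id_match is None:
            projects.append(set(record_ids))  # new stand-alone project
        else:
            # Move them into the project containing record_id_match.
            next(p for p in projects if record_id_match in p).update(record_ids)
    elif action == "combine":
        # Merge the project containing record_ids[0] into the one
        # containing record_id_match.
        src = next(p for p in projects if record_ids[0] in p)
        dst = next(p for p in projects if record_id_match in p)
        dst.update(src)
        projects.remove(src)
    return [p for p in projects if p]  # drop any emptied projects
```

Running the reassign example from above — `apply_correction([{1,2,3,4},{5,6,7,8},{9,10,11}], [7], "reassign", 2)` — reproduces the `{1,2,3,4,7}`, `{5,6,8}`, `{9,10,11}` result.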
There are two kinds of reviews that we are asking you to do in this phase. The review_dob
table contains every match we found between a DOB record and a non-DOB record (or a project containing multiple non-DOB records). These matches are based on where a DOB record's BBL geometry overlaps with the non-DOB record geometry, as long as the dates are close enough.
Note: `Neighborhood Study Rezoning Commitments` and `Future Neighborhood Studies` are excluded from this process and will be added later.
Sometimes, a DOB record will spatially overlap with only one thing. In this case, we can assume that the match is accurate and move forward with the unit adjustments. In these cases, we set `project_has_dob_multi` to 0. Because we're still developing our workflow, though, we don't want to make too many assumptions. That is why we are including these matches for you to verify.

In other cases, a DOB record will spatially overlap with more than one non-DOB record or verified project. In these cases, we cannot make any assumptions about how to automatically count up the units, since we don't know which match is the accurate one (remember that a DOB record cannot accurately match with more than one project). Because there is no automated way for us to move forward, all projects with `project_has_dob_multi = 1` need review. In these cases, we need you to do more than verify that the match we found is correct. We need you to tell us which of the matches is correct, and which are incorrect.
If a DOB record matched with more than one project:

- `project_has_dob_multi` will be 1 for all records in that project. This makes it easy for you to identify projects with conflicts. For example, the following query will identify every project that matched with a DOB record where that DOB record also matched with a different project. This gives you the context you need to tell us which match is the correct one.

SELECT * FROM review_dob
WHERE project_has_dob_multi = 1

- `dob_multimatch` will be 1 for just the DOB record in that project. Because the DOB records are the only ones you are changing, this lets you easily identify where we need you to make edits. The following query would give you every DOB match where the DOB record matched with more than one project. You should be editing every record returned by this query. It does not include the other records from the project, however, so refer to the previous query for context.

SELECT * FROM review_dob
WHERE dob_multimatch = 1
You will resolve all cases where `dob_multimatch = 1` through the `corrections_dob` table. In this table, you can either add or remove matches between a DOB record and a project consisting of one or more non-DOB records. As in the `corrections_project` table, you only need to specify one `record_id` from the group of `record_ids` that make up a project in order to either add or remove a match.
record_id | action | record_id_dob |
---|---|---|
1 | add | 54321 |
1 | remove | 12345 |
In the following example, the DOB record with ID 12345 matched with two different projects. One project consists of non-DOB records with IDs 1, 2, and 3. The second project consists of non-DOB records with IDs 4 and 5.
record_id | project_record_ids | record_id_dob |
---|---|---|
1 | {1,2,3} | 12345 |
2 | {1,2,3} | 12345 |
3 | {1,2,3} | 12345 |
4 | {4,5} | 12345 |
5 | {4,5} | 12345 |
Remove a match: A DOB record matched with a verified group of non-DOB records, but they are not referring to the same project.
If the DOB record with ID 12345 should be part of the same project as the non-DOB records with IDs 4 and 5 (and not the project with IDs 1, 2, and 3), you would specify that by removing the incorrect match. You would do so with a correction that looks like:
record_id | action | record_id_dob |
---|---|---|
1 | remove | 12345 |
This results in:
record_id | project_record_ids | record_id_dob |
---|---|---|
4 | {4, 5} | 12345 |
5 | {4, 5} | 12345 |
Add a match: A DOB record did not match with a verified group of non-DOB records, but it is actually referring to the same project.
If a DOB record does not overlap with a non-DOB record but they refer to the same project, you can also add a match. For example, if the DOB record with ID 54321 does not overlap with records 1, 2, and 3, but still refers to the same project, you would add a correction like:
record_id | action | record_id_dob |
---|---|---|
1 | add | 54321 |
This results in:
record_id | project_record_ids | record_id_dob |
---|---|---|
1 | {1,2,3} | 12345 |
2 | {1,2,3} | 12345 |
3 | {1,2,3} | 12345 |
1 | {1,2,3} | 54321 |
2 | {1,2,3} | 54321 |
3 | {1,2,3} | 54321 |
4 | {4, 5} | 12345 |
5 | {4, 5} | 12345 |
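How add/remove corrections update the match table can be pictured with a small model. This is a hypothetical helper for illustration; the production logic is implemented in SQL:

```python
# Illustrative model of corrections_dob: each match links a DOB record_id
# to a project (a frozenset of non-DOB record_ids). Not the production SQL.
def apply_dob_corrections(matches, corrections, projects):
    """matches: set of (project, dob_id) pairs; corrections: list of
    (record_id, action, dob_id) tuples; projects: list of frozensets."""
    def project_of(record_id):
        # One record_id is enough to identify the whole project.
        return next(p for p in projects if record_id in p)

    for record_id, action, dob_id in corrections:
        pair = (project_of(record_id), dob_id)
        if action == "remove":
            matches.discard(pair)
        elif action == "add":
            matches.add(pair)
    return matches
```

Applying the two example corrections above (remove 12345 from project {1,2,3}, add 54321 to it) leaves project {4,5} matched with 12345 and project {1,2,3} matched with 54321.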
Sometimes source data will be incorrect or incomplete. Before publishing KPDB, research teams conduct an extensive manual research effort to review, verify, and correct the data.

Records are added to the corrections table to edit an attribute of an existing record. The `corrections_main` table is also used to exclude records. By capturing the old attribute value, it is easy to keep track of the original, pre-edited data. Additionally, a value is only overwritten if the old value matches the current value for the specified field; therefore, if the source data is updated between versions, erroneous corrections won't be applied.
record_id | field | old value | new value | editor | date | record_name | Notes |
---|---|---|---|---|---|---|---|
Unique ID of the project | Field where the value needs to be updated | Value that needs to be changed | New value | Name of person who made the edit | Date the edit was made | Name of the record | Notes from the editor |
CREATE TABLE corrections_main(
record_id text,
field text,
old_value text,
new_value text,
editor text,
date text,
record_name text,
notes text
);
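The old-value guard described above — apply a correction only when the recorded old value still matches the record's current value — could be sketched like this (hypothetical helper, not the production code):

```python
# Sketch of the old_value guard: a correction is applied only when the
# stored old_value still matches the record's current value, so stale
# corrections are skipped (and would land in corrections_not_applied).
def apply_main_corrections(records, corrections):
    """records: dict of record_id -> dict of field values.
    corrections: list of dicts with record_id/field/old_value/new_value."""
    applied, not_applied = [], []
    for corr in corrections:
        rec = records.get(corr["record_id"])
        field = corr["field"]
        if rec is not None and field in rec and rec[field] == corr["old_value"]:
            rec[field] = corr["new_value"]
            applied.append(corr)
        else:
            not_applied.append(corr)  # invalid field or stale old_value
    return applied, not_applied
```

A correction whose old_value no longer matches (because the source data changed between versions) is simply not applied, which is the behavior the `corrections_applied` / `corrections_not_applied` tables record.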
It is important that the editor inputs the `record_id`, field, and old value exactly as they appear in KPDB, including any capitalization and special characters. KPDB field names and descriptions can be found here. Additionally, record_id + field must be unique within the corrections table. If there are instances where record_id + field is not unique, we will not ingest the data.
Below describes what types of corrections can be made via the corrections_main
table, and gives an example of how to make each correction. If you'd like to make a correction that is not described, please reach out to Data Engineering.
A value is incorrect or missing and you know the correct information, so you'd like it to be reflected in the final output.
record_id | field | old value | new value | editor | date | record_name | Notes |
---|---|---|---|---|---|---|---|
P2015R0339 | units_gross | 200 | 180 | baiyue | 4/20/2020 | | some comments |
P2015R0339 | units_gross | | 180 | baiyue | 4/20/2020 | | filling in missing unit count |
You do not want the record to be included in KPDB or any clustering logic.
Note: as of 2021, all zap add/removals should be recorded in
corrections_zap
instead ofcorrections_main
record_id | field | old value | new value | editor | date | record_name | Notes |
---|---|---|---|---|---|---|---|
12123424 | remove | | | baiyue | 4/20/2020 | Stanley Avenue and Glen Avenue | some comments |
Sometimes the value in one field is dependent on another value. It's important to keep these dependencies in mind as you're updating values. For related fields that aren't automatically adjusted, you will want to also provide corrections to these fields so that they're in-sync.
Corrected fields | Fields affected downstream | Related fields that aren't automatically adjusted |
---|---|---|
Date | Phasing proportions | |
Status | Phasing proportions | |
Phasing proportions | Counts of units in each phase | |
Geometry | Assignment of record to a project (see note), borough | |
Gross units | Net units, counts of units in each phase |
NOTE: If a geometry correction results in a new overlap between records, the default assumption is that these records are part of the same project. If you correct a geometry in a way that causes an overlap, but the record refers to a different project than the overlapping records, you will need to account for this exception in the corrections_project table. For example, let's say you modify geometry in a way that causes an overlap, but the record getting the new geometry is actually a stand-alone project. You would add a row to corrections_project where
record_id
is the record with the new geometry,action
isreassign
, andrecord_id_match
isNULL
.
If the spatial data for a record is incorrect or missing, you can correct or create the geometry by supplying a shapefile. The shapefile should have two components: record_id
and corrected geometry.
There are several tools that you can use to create this shapefile, including: geojson.io, Carto, and ArcMap.
The geometry from this file will overwrite the geometry of the record with a matching `record_id` in the original file, regardless of whether the geometry has changed in the source data between versions. If, after editing the geometry of a record, the record should NOT be assigned to the same project as overlapping records, do not forget to specify this exception in `corrections_project`.
The table `corrections_main` is just one of the five research tables. For complete descriptions of the others, see:
After the research is complete and the five tables are returned to Data Engineering, we will rerun the KPDB build-process with the new corrections data. This will:
- Include any new planner-added projects or ZAP projects newly flagged as `add` in `corrections_zap`
- Remove any records flagged as `remove` in either `corrections_main` or `corrections_zap`
- Apply the new values and geometries from `corrections_main` and the shapefile
- Identify projects by finding overlapping non-DOB records, then make adjustments to the lists of overlapping records using `corrections_project`. In cases where geometries got corrected in a way that creates overlaps, we reassign records to the same project as overlapping records unless otherwise specified in `corrections_project`
- Identify overlaps between DOB records and non-DOB records, then use `corrections_dob` to assign these DOB records to projects. DOB records will get assigned to a project if a match is not flagged as `remove` in `corrections_dob`, or if a new match is specified as an `add` in `corrections_dob`
- Calculate the borough field based on corrected geometries
- Run the units_net calculation based on corrected units_gross and corrected project assignments
- Overwrite the units_net calculation using any records in `corrections_main` where `field = 'units_net'`
- Calculate the phased units by multiplying the corrected phasing proportions by the corrected units_net
The new workflow deprecates the existing method of adding and removing records with the `corrections_zap` table, instead giving the research team more control over which ZAP projects are included in KPDB. The `zap_record_ids` list is the exhaustive list of records to include: any record not on the list gets dropped after the filtering logic is applied, and any record dropped by the filtering logic gets added back if it is part of this list.

The Housing team should receive a list of new ZAP projects added since the last KPDB update. This is currently done manually but could be automated. The filtering is based on the `createdon` date and should normally look back to the last build. For example, if the 2021 version of KPDB was built in November 2021 and a new build is happening in November 2022, then the new records created between November 2021 and November 2022 should be provided to Housing.
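The lookback filter described above might be sketched as follows. This is a hypothetical helper: the field name `createdon` comes from the text, but the date handling and function shape are assumptions:

```python
from datetime import date

# Sketch: select ZAP records created since the last KPDB build, based on
# the createdon field. Field name from the docs; date format is assumed.
def new_records_since(records, last_build, this_build):
    """records: list of dicts with a 'createdon' date. Returns the records
    created after the last build and up to the current build date."""
    return [
        r for r in records
        if last_build < r["createdon"] <= this_build
    ]
```

For a November 2021 to November 2022 window, only records whose `createdon` falls inside that window would be handed to the Housing team.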
There are instances when a planner knows about a future housing project that is not captured in any of the input sources. When this happens, a record is created within the Planner-Added Projects table. The layout of the Planner-Added Projects table is outlined below. It is important that the `record_id` is unique throughout the table. If there is a corresponding geometry in the geometry corrections shapefile, make sure that the geometry record and the Planner-Added Projects record have the same `record_id` so we can link them. Additionally, it is important that every relevant component is filled out for a record.
Only new records should be added to and edited within the Planner-Added Projects table. Planner-Added Project records that are already within KPDB should be considered "locked in." Edits to these existing records should be made in the `corrections_main` table. Only new records in the Planner-Added Projects table will be appended to the final KPDB.
Field | Definition |
---|---|
Record ID | The unique ID of the record |
Name | The name of the project as provided by the source |
Borough | The borough the project is located within |
Status | The current phase of the project |
Units gross | The proposed number of units associated with the project as reported in the source data |
Portion built within 5 years | The portion of the project that should be completed within the next 5 years |
Portion built within 5-10 years | The portion of the project that should be completed within 5 to 10 years from now |
Portion built after 10 years | The portion of the project that should be completed sometime after 10 years from this year |
Phasing assumption rationale | The planner's rationale for the timeline of a project |
Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
NYCHA | Boolean. True if this is a NYCHA project |
Group Quarters | Boolean. True if this is a Group Quarters project |
Senior Housing | Boolean. True if this is a Senior Housing project |
Assisted Living | Boolean. True if this is an Assisted Living project |
Inactive | Flag to indicate if the project is inactive or not. If we do not include DOB inactive jobs we will exclude this field |
Project ID | ID that groups records together into a single project |
Omit | Boolean. True if this project should not be included in KPDB |
Omission reason | Reason for omitting the record |
Lead planner name | Name of the lead planner |
Editor | Name of person who added the record into the planner added projects table |
Planner notes | Additional notes from the planner |
Date | Date the project was added |
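For illustration, a hypothetical Planner-Added Projects record with every relevant component filled out might look like the following. All names, IDs, and values are invented, and the dictionary keys mirror the field table above rather than the actual schema:

```python
# Hypothetical Planner-Added Projects record illustrating the fields above.
# All values are invented; keys mirror the field table, not the real schema.
record = {
    "record_id": "PAP-0001",
    "name": "Example Ave Development",
    "borough": "Brooklyn",
    "status": "Potential",
    "units_gross": 250,
    "portion_built_within_5_years": 0.4,
    "portion_built_within_5_to_10_years": 0.6,
    "portion_built_after_10_years": 0.0,
    "phasing_assumption_rationale": "Planner expects a two-phase buildout",
    "phasing_known": False,    # True if phasing is known rather than assumed
    "nycha": False,
    "group_quarters": False,
    "senior_housing": False,
    "assisted_living": False,
    "inactive": False,
    "project_id": "PRJ-0001",  # groups related records into one project
    "omit": False,
    "omission_reason": None,
    "lead_planner_name": "Jane Planner",
    "editor": "Sam Editor",
    "planner_notes": "",
    "date": "2020-06-01",
}

# Simple completeness check: the three phasing portions should cover the
# whole project for an included record.
portions = (
    record["portion_built_within_5_years"]
    + record["portion_built_within_5_to_10_years"]
    + record["portion_built_after_10_years"]
)
assert abs(portions - 1.0) < 1e-9
```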
First, navigate to the KPDB data repo main page.
Then click on the Issues tab to see this page, and click "New issue".
Do not use the issue template; instead choose "Open a blank issue".
Give your issue a title and a description, then click "Submit new issue".
You should see a new issue created immediately. One key piece of information here is the #206
number, which is your issue number and the number your Pull Request should reference.
Once your issue is created and you have its number, go back to the KPDB data repo home page. The home page should automatically take you to the main branch, indicated right under the issue tab. This is the default, so it shouldn't require a change, but it is good to confirm. Then navigate to the corrections
folder.
Once you are inside the corrections folder, open the Add file
dropdown and choose the Upload files
option.
Choose your new corrections spreadsheet to upload, making sure its name exactly matches the file it will replace. Add a commit message, then select the option to Create a new branch for this commit and start a pull request. We highly recommend naming the branch with the issue number (in this case 206
) included. This helps connect the issue with the Pull Request that addresses it. When everything looks good, click Propose changes
, which takes you to the final Pull Request creation page.
Before finalizing the Pull Request, use the Reviewers
panel on the right to tag the appropriate EDM member to review the changes, which can be viewed by scrolling toward the bottom of this page. Once reviewers are tagged, click Create pull request
.
You now have a Pull Request for the changes you want to make, and GitHub should notify the tagged EDM members with a request to review. At least early on in the process, do not click Merge pull request
until at least one member of the Data Engineering team has approved the Pull Request.
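For those who prefer the command line, the same issue-and-PR flow can be sketched with the GitHub CLI. The file path, branch name, titles, and reviewer handle below are placeholders, not the actual values:

```shell
# Sketch of the issue-and-PR flow above using the GitHub CLI (gh).
# File path, branch name, and reviewer handle are placeholders.
gh issue create --title "Update corrections spreadsheet" \
  --body "Replace the corrections spreadsheet with the latest planner edits"

# Suppose this created issue #206; include it in the branch name and PR body.
git checkout -b 206-corrections-update
git add corrections/corrections_main.csv   # placeholder path
git commit -m "Update corrections spreadsheet (#206)"
git push -u origin 206-corrections-update

# "Closes #206" links the PR to the issue and closes it on merge.
gh pr create --title "Update corrections (#206)" \
  --body "Closes #206" --reviewer some-edm-member
```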
You can build KPDB without any software (or data) in your local environment by using GitHub Actions.
From the GitHub Actions page for KPDB, navigate to the Build
workflow using the panel on the left, which brings you to the view below.
Use the dropdown menu to first choose the branch you want to run the production build from. You will then be asked whether to use the latest datasets (default "no"), which include any source data updates and manual corrections merged into the main
branch of the db-knownproject-data repo. The last option asks whether to send the output to db-knownproject-data (default "yes"); accessing this output is covered in the next section. Click Run workflow
to kick off a production run.
When a run is kicked off, it will appear in the list of runs like below. You can click on a specific run to check its status and details.
Remember that db-knownproject-data is a private repo, so you may need to request access to it.
If the option to send the output to db-knownproject-data was set to "yes" in the last step, a new batch of output will be sent to db-knownproject-data. The output branch is timestamped with the date and hour the production run finished. Click on the name of the output branch you want to access.
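The Build workflow can also be dispatched from the command line with the GitHub CLI, assuming `gh` is installed and authenticated. The workflow file name, input names, and repo slug below are assumptions; check the repo's `.github/workflows/` folder for the actual values:

```shell
# Trigger the KPDB Build workflow without the Actions web UI.
# Workflow file name, input names, and repo slug are assumptions --
# verify them in .github/workflows/ before running.
gh workflow run build.yml \
  --repo NYCPlanning/db-knownprojects \
  --ref my-feature-branch \
  -f latest_data=no \
  -f send_output=yes

# Follow the status of the run that was just kicked off
gh run watch --repo NYCPlanning/db-knownprojects
```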
Source | Project type | Phasing provided/assumed | Proposed phasing 2020 |
---|---|---|---|
DOB | Complete | Assumed | All counted as complete |
DOB | Permit issued | Assumed | Next 5 years |
DOB | Application filed | Assumed | Next 5 years |
DOB | In progress/In progress (last disapproved) | Assumed | Next 5 years |
DOB | Inactive (not withdrawn) | Assumed | 5-10 years |
DOB | Withdrawn | Assumed | exclude |
HPD Projected Closings | n/a | Provided | 2.5 years after financing close (assumption that financing closes around the same time a permit is issued) |
HPD RFPs | n/a | Provided | Next 5 years |
EDC | n/a | Provided | EDC-provided build year |
DCP Application | All non-ULURP | Assumption | 5-10 years |
DCP Application | Status = on-hold | Assumption | 5-10 years |
DCP Application | Status = active, Pre-PAS | Assumption | 5-10 years |
DCP Application | Status = active, certified in or before 2019 | Assumption | 5-10 years |
DCP Application | Status = active, certifying after 2019 | Assumption | 5-10 years |
DCP Application | Status = active, Public review | Assumption | 5-10 years |
DCP Application | Status = complete | Assumption | 5-10 years |
DCP Application | Status = Record Closed | Assumption | exclude |
Empire State Development Projects | n/a | n/a | One project: Atlantic Yards, provided by planner input |
Neighborhood Study Rezoning Commitments | Downtown Far Rockaway/Inwood | Provided | Estimates provided by EDC |
Neighborhood Study Rezoning Commitments | Jerome and Bay Street Corridor | Provided | Estimates provided by planner input |
Neighborhood Study Rezoning Commitments | ENY and East Harlem | Assumption | Next 5 years (based on rezoning effective date) |
Neighborhood Study Projected Development Sites | n/a | Assumption | Remaining units evenly distributed across each year of the estimated buildout timeline |
Future Neighborhood Studies | n/a | Assumption | Units evenly distributed across each year of the estimated buildout timeline. |
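As a rough sketch, the assumed-phasing rules in the table above can be expressed as a lookup. The function below covers only a few sources and is purely illustrative; the production logic lives in KPDB's SQL, not here:

```python
from typing import Optional

def assumed_phasing(source: str, status: str) -> Optional[str]:
    """Return the phasing bucket per the table above, or None if excluded.

    Illustrative sketch only; covers DOB, DCP Application, and HPD RFPs.
    """
    if source == "DOB":
        if status == "Complete":
            return "complete"
        if status == "Withdrawn":
            return None                 # excluded
        if status.startswith("Inactive"):
            return "5-10 years"         # inactive but not withdrawn
        return "next 5 years"           # permit issued, filed, in progress
    if source == "DCP Application":
        if status == "Record Closed":
            return None                 # excluded
        return "5-10 years"             # all other DCP Application statuses
    if source == "HPD RFPs":
        return "next 5 years"
    raise ValueError(f"no assumption coded here for source {source!r}")

assert assumed_phasing("DOB", "Permit issued") == "next 5 years"
assert assumed_phasing("DCP Application", "Record Closed") is None
```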
Original status | Source | Proposed status |
---|---|---|
Active, Initiation | DCP Applications | DCP 1: Expression of interest |
On-Hold, Initiation | DCP Applications | DCP 1: Expression of interest |
Active, Pre-Cert | DCP Applications | DCP 2: Application in progress |
Active, Pre-PAS | DCP Applications | DCP 2: Application in progress |
On-Hold, Pre-Cert | DCP Applications | DCP 2: Application in progress |
On-Hold, Pre-PAS | DCP Applications | DCP 2: Application in progress |
Active, Certified | DCP Applications | DCP 3: Certified/Referred |
Complete | DCP Applications | DCP 4: Zoning Implemented |
1. Filed Application | DOB | DOB 1. Filed Application |
2. Approved Application | DOB | DOB 2. Approved Application |
3. Permitted for Construction | DOB | DOB 3. Permitted for Construction |
4. Partially Completed Construction | DOB | DOB 4. Partially Completed Construction |
5. Completed Construction | DOB | DOB 5. Completed Construction |
Potential | DCP Planner-Added Projects | Potential |
Projected | EDC Projected Projects | Potential |
Projected | Empire State Development Projected Projects | Potential |
Rezoning Commitment | Neighborhood Study Rezoning Commitments | Potential |
Projected Development | Neighborhood Study Projected Development Sites | Potential |
Future Rezoning | Future Neighborhood Studies | Potential |
RFP issued; financing not closed | HPD RFPs | HPD 1: RFP Issued |
RFP designated; financing not closed | HPD RFPs | HPD 2: RFP Designated |
RFP designated; financing closed | HPD RFPs | HPD 4: Financing Closed |
Projected | HPD Projected Closings | HPD 3: Projected Closing |
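The status recoding above amounts to a lookup keyed on source and original status. A few rows are shown below as an illustrative sketch, not the production implementation:

```python
# A few rows of the status recoding above, keyed on (source, original status).
# Illustrative sketch only; the full mapping is in the table above.
STATUS_MAP = {
    ("DCP Applications", "Active, Initiation"): "DCP 1: Expression of interest",
    ("DCP Applications", "Active, Pre-Cert"): "DCP 2: Application in progress",
    ("DCP Applications", "Active, Certified"): "DCP 3: Certified/Referred",
    ("DCP Applications", "Complete"): "DCP 4: Zoning Implemented",
    ("DOB", "3. Permitted for Construction"): "DOB 3. Permitted for Construction",
    ("EDC Projected Projects", "Projected"): "Potential",
    ("HPD RFPs", "RFP issued; financing not closed"): "HPD 1: RFP Issued",
    ("HPD Projected Closings", "Projected"): "HPD 3: Projected Closing",
}

def proposed_status(source: str, original: str) -> str:
    """Map a record's original status into KPDB's unified status vocabulary."""
    return STATUS_MAP[(source, original)]
```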
The current implementation of the SCA join is a series of SQL queries run with CARTO Batch. This requires both the username for the CP CARTO account and an API key.
The credentials to log on to the CP division CARTO account are stored in the DE secrets file.
Use those credentials to log on to the portal and obtain the API key needed to run batch jobs.
Producing these three outputs is the next step. The four ingredients required to run the SQL queries are the final KPDB table, nyc_school_districts, doe_schoolsubdistricts, and doe_school_zones_es. Make sure those datasets are present in CP's CARTO account and updated to their latest versions.
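A minimal sketch of submitting one of these queries to CARTO's SQL Batch API follows. The API endpoint shape is CARTO's documented Batch API; the username, API key, and query are placeholders, with real credentials coming from the DE secrets file:

```python
# Minimal sketch of running a query via CARTO's SQL Batch API.
# Username, API key, and query are placeholders; real credentials are in
# the DE secrets file.
import json
import urllib.request

def batch_job_url(username: str, job_id: str = "") -> str:
    """Endpoint for CARTO's SQL Batch API for a given account."""
    base = f"https://{username}.carto.com/api/v2/sql/job"
    return f"{base}/{job_id}" if job_id else base

def submit_batch_query(username: str, api_key: str, query: str) -> str:
    """Submit a long-running SQL query as a Batch job; returns the job id."""
    req = urllib.request.Request(
        f"{batch_job_url(username)}?api_key={api_key}",
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

The returned job id can then be polled at the same endpoint (`GET .../sql/job/{job_id}`) until its status reaches `done` or `failed`.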
Follow Emily's documentation to produce the three CSVs, then QAQC these outputs by comparing them with previous versions using a Jupyter notebook.
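A minimal QAQC sketch for comparing a new output CSV against the previous version is shown below. The `district` and `units` column names are assumptions for illustration; use the actual column names from the SCA outputs:

```python
# Minimal QAQC sketch: compare a new output CSV against the previous version
# and report per-district unit changes. Column names are assumptions.
import csv
from io import StringIO

def unit_totals(csv_text: str, key: str = "district", value: str = "units") -> dict:
    """Sum units per district from CSV text."""
    totals = {}
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row[key]] = totals.get(row[key], 0) + int(row[value])
    return totals

def compare(old_csv: str, new_csv: str) -> dict:
    """Return the per-district unit change between two output versions."""
    old, new = unit_totals(old_csv), unit_totals(new_csv)
    return {k: new.get(k, 0) - old.get(k, 0) for k in set(old) | set(new)}

old = "district,units\n1,100\n2,200\n"
new = "district,units\n1,120\n2,200\n3,50\n"
diff = compare(old, new)
# district 1 gained 20 units, district 2 is unchanged, district 3 is new
```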
The final deliverable to SCA is created from the three tables produced in the previous step. It is generated with Excel formulas and combined into a single xlsx file before being sent to SCA.
The spreadsheets used to perform the calculations for the previous years are in this folder.
The "READ ME for HED" tab in Emily's KPDB OneNote contains the most detailed step-by-step instructions for the calculation that outputs the final ES Zone and MS Zone allocations.