
Remove NATing from BOSH networks #35

Open
cirocosta opened this issue Nov 4, 2019 · 2 comments

Comments

@cirocosta
Member

Hey,

We've recently been receiving complaints that resources like docker-image and
registry-image have been failing with "429 Too Many Requests".

While we did introduce retries at the resource-type level for registry-image (see
concourse/registry-image-resource#69), those using docker-image (or trying to
reach Docker Hub directly) would still suffer from the limit being placed on our IP.

My hypothesis is that by removing the NAT machine that we have in the bosh
network (which ends up making every request from any of the 40+ machines we
have go out through that single IP), we can get rid of the problems we're
currently facing with regard to limits on the number of requests (aside from
removing one hop and a single point of failure).

Last week, I naively tried just removing the routes that we have set at the network level:

prod/iaas/bosh.tf (lines 135 to 153 in 92cf177):

resource "google_compute_route" "internal_nat" {
name = "internal-nat-route"
dest_range = "0.0.0.0/0"
network = "${google_compute_network.bosh.name}"
next_hop_instance = "${google_compute_instance.nat.name}"
next_hop_instance_zone = "${google_compute_instance.nat.zone}"
priority = 800
tags = ["internal"]
}
resource "google_compute_route" "vault_nat" {
name = "vault-nat-route"
dest_range = "0.0.0.0/0"
network = "${google_compute_network.bosh.name}"
next_hop_instance = "${google_compute_instance.nat.name}"
next_hop_instance_zone = "${google_compute_instance.nat.zone}"
priority = 800
tags = ["vault"]
}

but that didn't really work as expected, as the machines that we create in the
bosh network are not assigned ephemeral external IPs:

"The instance must have an external IP address. An external IP can be assigned
to an instance when it is created or after it has been created."

(from https://cloud.google.com/vpc/docs/vpc#internet_access_reqs)

- name: private
  type: dynamic
  subnets:
  - azs: [z1, z2]
    cloud_properties:
      network_name: bosh
      subnetwork_name: internal
      tags: [internal]

Given that we're on GCP, we can overcome that by using the ephemeral_external_ip
property - see https://bosh.io/docs/google-cpi/#networks.
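For illustration, a minimal sketch (my assumption of how it would look, going off the
google CPI network docs linked above) of our private network definition with that
property set:

- name: private
  type: dynamic
  subnets:
  - azs: [z1, z2]
    cloud_properties:
      network_name: bosh
      subnetwork_name: internal
      ephemeral_external_ip: true   # assumption: each VM gets its own ephemeral external IP
      tags: [internal]

With that in place, outbound traffic from each VM would leave via its own IP instead of
being funnelled through the single NAT instance.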

Should we do that? I think so - if we don't have a hard requirement of keeping those
machines completely unreachable from the outside (which we don't, really), I think we
should just drop the NAT.

Thanks!

@xtreme-sameer-vohra
Contributor

We could use firewall rules & tags to ensure only outbound requests are allowed from the workers.

However, we don't have any way of enforcing that those remain in place. For example, someone could remove those rules or inadvertently change the tags/network name etc., and we wouldn't know about it.
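To make that concrete, a rough sketch (the names and tags here are my assumption, not
anything we have today) of what such an egress-only posture could look like in Terraform:

# Hypothetical example: allow all egress from VMs tagged "internal",
# and explicitly deny all ingress to them from outside the VPC.
resource "google_compute_firewall" "workers_egress" {
  name      = "workers-allow-egress"
  network   = "${google_compute_network.bosh.name}"
  direction = "EGRESS"

  allow {
    protocol = "all"
  }

  target_tags        = ["internal"]
  destination_ranges = ["0.0.0.0/0"]
}

resource "google_compute_firewall" "workers_no_ingress" {
  name      = "workers-deny-ingress"
  network   = "${google_compute_network.bosh.name}"
  direction = "INGRESS"

  deny {
    protocol = "all"
  }

  target_tags   = ["internal"]
  source_ranges = ["0.0.0.0/0"]
}

As noted above though, nothing enforces that rules like these stay in place or that the
tags don't drift, so they complement rather than replace hardening the endpoints themselves.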

@cirocosta
Member Author

cirocosta commented Nov 4, 2019

However, we don't have any way of enforcing that those remain in place

Yeah, while I do agree that that's true and easy to misconfigure, I think a move towards "protect the endpoints as if you were already compromised" is inevitable for us, and this can be a motivator for getting better at it (with, e.g., issues like concourse/concourse#2415 and not exposing endpoints without auth in general) 🤔

(my point being that by forcing ourselves to rely less on a "perimeter of protection", we can be even more motivated to get our infra protected against any scenario)
