Configure S3 to store Airflow Logs #4470

Closed
robert-bryson opened this issue Sep 27, 2023 · 5 comments

Comments

@robert-bryson
Contributor

robert-bryson commented Sep 27, 2023

Airflow can store logs in a variety of ways. If we go with the deploy scenario of Airflow on Cloud Foundry, we will want to store the logs externally. Airflow has functionality for writing logs to S3, which uses an S3 connection to handle the auth.
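For reference, a minimal sketch of what the S3 logging setup could look like on Cloud Foundry, expressed as environment variables. The `AIRFLOW__LOGGING__*` names are Airflow's standard remote logging settings. Everything else here is an assumption for illustration: the helper name `airflow_s3_logging_env`, the connection id `s3_logs`, the `airflow-logs` prefix, and the idea that the bucket is bound under the `s3` service label with `access_key_id`, `secret_access_key`, `bucket` and `region` in its credentials.

```python
import json
from urllib.parse import quote


def airflow_s3_logging_env(vcap_services: str, conn_id: str = "s3_logs") -> dict:
    """Build Airflow remote-logging env vars from a VCAP_SERVICES JSON string."""
    services = json.loads(vcap_services)
    # Assumption: the bucket is bound under the "s3" label and exposes
    # access_key_id, secret_access_key, bucket and region.
    creds = services["s3"][0]["credentials"]
    # Percent-encode the credentials because secrets may contain "/" or "+".
    conn_uri = "aws://{}:{}@/?region_name={}".format(
        quote(creds["access_key_id"], safe=""),
        quote(creds["secret_access_key"], safe=""),
        quote(creds["region"], safe=""),
    )
    return {
        "AIRFLOW__LOGGING__REMOTE_LOGGING": "True",
        # "airflow-logs" is a hypothetical key prefix inside the bucket.
        "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER": "s3://{}/airflow-logs".format(
            creds["bucket"]
        ),
        "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID": conn_id,
        # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> env vars.
        "AIRFLOW_CONN_" + conn_id.upper(): conn_uri,
    }
```

The returned values would need to be set on every Airflow component (webserver, scheduler, workers), since each one reads or writes task logs.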

See additional context #4434 and GSA/datagov-harvester#1.

How to reproduce

  1. View the logs on a dag run (for example) on the https://test-airflow-webserver.app.cloud.gov/ deployment.
  2. See error in the cf app logs:
    image

Expected behavior

Populated logs

image

Actual behavior

No logs

image

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

  • Check the s3 service is set up correctly to allow Airflow to connect
  • Check the s3 connection in Airflow is set up correctly
  • ???
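
A rough sketch of how the first two checks could be automated from inside the app container. It only inspects environment variables and does not contact S3. The function name `check_remote_logging_env` is hypothetical, and it assumes the connection is supplied as an `AIRFLOW_CONN_<ID>` variable rather than through the metadata database or a secrets backend.

```python
from urllib.parse import urlparse


def check_remote_logging_env(env: dict) -> list:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if env.get("AIRFLOW__LOGGING__REMOTE_LOGGING", "").lower() != "true":
        problems.append("remote logging is not enabled")
    folder = urlparse(env.get("AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER", ""))
    if folder.scheme != "s3" or not folder.netloc:
        problems.append("remote base log folder is not an s3://bucket/... URL")
    conn_id = env.get("AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID", "")
    if not conn_id:
        problems.append("no remote log connection id is set")
    elif "AIRFLOW_CONN_" + conn_id.upper() not in env:
        # Only covers connections defined through environment variables.
        problems.append("no AIRFLOW_CONN_%s variable found" % conn_id.upper())
    return problems
```

Calling `check_remote_logging_env(dict(os.environ))` (after `import os`) in a `cf ssh` session would list anything missing. A passing result does not prove the bucket is reachable or that the credentials are valid.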
@robert-bryson robert-bryson added the bug Software defect or bug label Sep 27, 2023
@hkdctol hkdctol moved this to New Dev in data.gov team board Sep 28, 2023
@btylerburton btylerburton added the H2.0/Harvest-General General Harvesting 2.0 Issues label Oct 13, 2023
@btylerburton btylerburton removed the bug Software defect or bug label Nov 27, 2023
@btylerburton
Contributor

Question for the team: if we have logs in New Relic, do we also want them in S3?

@robert-bryson
Contributor Author

I suppose it doesn't make sense to double store the logs. I believe the idea behind the s3 connector is that all the various airflow components can drop logs in one place and then you can use whatever you'd like to aggregate them from there. Since our team is already using New Relic for this, it probably isn't necessary. Should we icebox or close this?

@btylerburton
Contributor

I'm going to close it. If we need to revive it, it should be easy enough since it has the h20 label.

@github-project-automation github-project-automation bot moved this from New Dev to ✔ Done in data.gov team board Dec 5, 2023
@btylerburton
Contributor

When the airflow configuration changes (worker scaling, etc), the logs are not guaranteed to be accessible, from what I've seen, so I believe this is worth reopening and revisiting in the future.

Image

@btylerburton btylerburton reopened this Dec 6, 2023
@github-project-automation github-project-automation bot moved this from ✔ Done to 📟 Sprint Backlog [7] in data.gov team board Dec 6, 2023
@btylerburton btylerburton moved this from 📟 Sprint Backlog [7] to New Dev in data.gov team board Dec 6, 2023
@btylerburton btylerburton changed the title Airflow on cloud foundry logs to s3 auth Configure S3 to store Airflow Logs Dec 6, 2023
@nickumia

nickumia commented Dec 7, 2023

It seems like there isn't a good consensus on operational and maintenance procedures. If the logs are in NR, then creating a process for linking to those logs might work (e.g. doing an API call to fetch the logs in whatever viewer makes sense, OR just going to NR and being an expert in finding logs haha...). The alternative of having "duplicate" logs only hurts if there's a heavy cost or maintenance burden involved, neither of which sounds like the case. NR only stores 3 months of logs as is, so S3 would give you longer log storage too.

@btylerburton btylerburton moved this from New Dev to 📔 Product Backlog in data.gov team board Dec 7, 2023
@gujral-rei gujral-rei moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Dec 7, 2023
@btylerburton btylerburton removed their assignment Dec 12, 2023
@btylerburton btylerburton added H2.0/orchestrator and removed H2.0/Harvest-General General Harvesting 2.0 Issues labels Dec 13, 2023
@FuhuXia FuhuXia moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jan 2, 2024
@FuhuXia FuhuXia self-assigned this Jan 2, 2024
@btylerburton btylerburton moved this from 🏗 In Progress [8] to 📟 Sprint Backlog [7] in data.gov team board Jan 2, 2024
@btylerburton btylerburton moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jan 4, 2024
@btylerburton btylerburton moved this from 🏗 In Progress [8] to 📔 Product Backlog in data.gov team board Jan 9, 2024
@btylerburton btylerburton moved this from 📔 Product Backlog to 🧊 Icebox in data.gov team board Jan 10, 2024
@hkdctol hkdctol closed this as completed Mar 28, 2024
@github-project-automation github-project-automation bot moved this from 🧊 Icebox to ✔ Done in data.gov team board Mar 28, 2024
@gujral-rei gujral-rei moved this from ✔ Done to 🗄 Closed in data.gov team board Mar 28, 2024