Fix grammar and remove duplicate words (apache#14647)
* chore: fix grammar and remove duplicate words
jbampton authored Mar 7, 2021
1 parent e1ff59e commit 6dc24c9
Showing 24 changed files with 30 additions and 30 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-images-workflow-run.yml
Original file line number Diff line number Diff line change
@@ -556,5 +556,5 @@ jobs:
cancelMode: self
notifyPRCancel: true
notifyPRCancelMessage: |
Building images for the PR has failed. Follow the the workflow link to check the reason.
Building images for the PR has failed. Follow the workflow link to check the reason.
sourceRunId: ${{ github.event.workflow_run.id }}
2 changes: 1 addition & 1 deletion BREEZE.rst
@@ -2283,7 +2283,7 @@ This is the current syntax for `./breeze <./breeze>`_:
update-breeze-file update-extras update-local-yml-file update-setup-cfg-file
version-sync yamllint
You can pass extra arguments including options to to the pre-commit framework as
You can pass extra arguments including options to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'breeze static-check mypy' or
2 changes: 1 addition & 1 deletion IMAGES.rst
@@ -436,7 +436,7 @@ Customizing the image

Customizing the image is an alternative way of adding your own dependencies to the image.

The easiest way to build the image image is to use ``breeze`` script, but you can also build such customized
The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
image by running appropriately crafted docker build in which you specify all the ``build-args``
that you need to add to customize it. You can read about all the args and ways you can build the image
in the `<#ci-image-build-arguments>`_ chapter below.
4 changes: 2 additions & 2 deletions PULL_REQUEST_WORKFLOW.rst
@@ -237,7 +237,7 @@ As explained above the approval and matrix tests workflow works according to the
:align: center
:alt: Full tests are needed for the PR

4) If this or another committer "request changes" in in a previously approved PR with "full tests needed"
4) If this or another committer "request changes" in a previously approved PR with "full tests needed"
label, the bot automatically removes the label, moving it back to "run only default set of parameters"
mode. For PRs touching core of airflow once the PR gets approved back, the label will be restored.
If it was manually set by the committer, it has to be restored manually.
@@ -248,7 +248,7 @@ As explained above the approval and matrix tests workflow works according to the
for the PRs and they provide good "notification" for the committer to act on a PR that was recently
approved.

The PR approval workflow is possible thanks two two custom GitHub Actions we've developed:
The PR approval workflow is possible thanks to two custom GitHub Actions we've developed:

* `Get workflow origin <https://github.com/potiuk/get-workflow-origin/>`_
* `Label when approved <https://github.com/TobKed/label-when-approved-action>`_
2 changes: 1 addition & 1 deletion airflow/jobs/scheduler_job.py
@@ -1463,7 +1463,7 @@ def _do_scheduling(self, session) -> int:
By "next oldest", we mean hasn't been examined/scheduled in the most time.
The reason we don't select all dagruns at once because the rows are selected with row locks, meaning
that only one scheduler can "process them", even it it is waiting behind other dags. Increasing this
that only one scheduler can "process them", even it is waiting behind other dags. Increasing this
limit will allow more throughput for smaller DAGs but will likely slow down throughput for larger
(>500 tasks.) DAGs
2 changes: 1 addition & 1 deletion airflow/models/dag.py
@@ -1087,7 +1087,7 @@ def topological_sort(self, include_subdag_tasks: bool = False):
# using the items() method for iterating, a copy of the
# unsorted graph is used, allowing us to modify the unsorted
# graph as we move through it. We also keep a flag for
# checking that that graph is acyclic, which is true if any
# checking that graph is acyclic, which is true if any
# nodes are resolved during each pass through the graph. If
# not, we need to exit as the graph therefore can't be
# sorted.
2 changes: 1 addition & 1 deletion airflow/models/dagrun.py
@@ -576,7 +576,7 @@ def _emit_true_scheduling_delay_stats_for_finished_state(self, finished_tis):
started task within the DAG and calculate the expected DagRun start time (based on
dag.execution_date & dag.schedule_interval), and minus these two values to get the delay.
The emitted data may contains outlier (e.g. when the first task was cleared, so
the second task's start_date will be used), but we can get rid of the the outliers
the second task's start_date will be used), but we can get rid of the outliers
on the stats side through the dashboards tooling built.
Note, the stat will only be emitted if the DagRun is a scheduler triggered one
(i.e. external_trigger is False).
@@ -50,7 +50,7 @@ CREATE TABLE toTwitter_A(id BIGINT, id_str STRING
alter table toTwitter_A SET serdeproperties ('skip.header.line.count' = '1');
```

When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming incoming and outgoing tweets, and hence they are kept separated in this example.
When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming and outgoing tweets, and hence they are kept separated in this example.
Final step is a running the broker script, brokerapi.py, which will run queries in Hive and store the summarized data to MySQL in our case. To connect to Hive, pyhs2 library is extremely useful and easy to use. To insert data into MySQL from Python, sqlalchemy is also a good one to use.
I hope you find this tutorial useful. If you have question feel free to ask me on [Twitter](https://twitter.com/EkhtiarSyed).<p>
-Ekhtiar Syed
@@ -132,7 +132,7 @@ def transfertodb():
# The following tasks are generated using for loop. The first task puts the eight
# csv files to HDFS. The second task loads these files from HDFS to respected Hive
# tables. These two for loops could be combined into one loop. However, in most cases,
# you will be running different analysis on your incoming incoming and outgoing tweets,
# you will be running different analysis on your incoming and outgoing tweets,
# and hence they are kept separated in this example.
# --------------------------------------------------------------------------------

@@ -28,7 +28,7 @@
.. warning::
You need to provide a large enough set of data so that operations do not execute too quickly.
Otherwise, DAG will fail.
* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket bucket to which files are copied
* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied
* WAIT_FOR_OPERATION_POKE_INTERVAL - interval of what to check the status of the operation
A smaller value than the default value accelerates the system test and ensures its correct execution with
smaller quantities of files in the source bucket
@@ -25,7 +25,7 @@
* GCP_PROJECT_ID - Google Cloud Project to use for the Google Cloud Transfer Service.
* GCP_TRANSFER_FIRST_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied from AWS.
It is also a source bucket in next step
* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket bucket to which files are copied
* GCP_TRANSFER_SECOND_TARGET_BUCKET - Google Cloud Storage bucket to which files are copied
"""

import os
8 changes: 4 additions & 4 deletions airflow/providers/google/cloud/operators/dataflow.py
@@ -84,7 +84,7 @@ class DataflowConfiguration:
account from the list granting this role to the originating account (templated).
:type impersonation_chain: Union[str, Sequence[str]]
:param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
instead of canceling during during killing task instance. See:
instead of canceling during killing task instance. See:
https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
:type drain_pipeline: bool
:param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be
@@ -729,7 +729,7 @@ class DataflowStartFlexTemplateOperator(BaseOperator):
domain-wide delegation enabled.
:type delegate_to: str
:param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
instead of canceling during during killing task instance. See:
instead of canceling during killing task instance. See:
https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
:type drain_pipeline: bool
:param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be
@@ -863,7 +863,7 @@ class DataflowStartSqlJobOperator(BaseOperator):
domain-wide delegation enabled.
:type delegate_to: str
:param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
instead of canceling during during killing task instance. See:
instead of canceling during killing task instance. See:
https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
:type drain_pipeline: bool
"""
@@ -1006,7 +1006,7 @@ class DataflowCreatePythonJobOperator(BaseOperator):
JOB_STATE_RUNNING state.
:type poll_sleep: int
:param drain_pipeline: Optional, set to True if want to stop streaming job by draining it
instead of canceling during during killing task instance. See:
instead of canceling during killing task instance. See:
https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
:type drain_pipeline: bool
:param cancel_timeout: How long (in seconds) operator should wait for the pipeline to be
2 changes: 1 addition & 1 deletion airflow/providers/google/cloud/operators/dataproc.py
@@ -610,7 +610,7 @@ def execute(self, context) -> dict:
# Check if cluster is not in ERROR state
self._handle_error_state(hook, cluster)
if cluster.status.state == cluster.status.State.CREATING:
# Wait for cluster to be be created
# Wait for cluster to be created
cluster = self._wait_for_cluster_in_creating_state(hook)
self._handle_error_state(hook, cluster)
elif cluster.status.state == cluster.status.State.DELETING:
2 changes: 1 addition & 1 deletion airflow/providers/google/suite/hooks/sheets.py
@@ -271,7 +271,7 @@ def batch_update_values(
"""
if len(ranges) != len(values):
raise AirflowException(
"'Ranges' and and 'Lists' must be of equal length. \n \
"'Ranges' and 'Lists' must be of equal length. \n \
'Ranges' is of length: {} and \n \
'Values' is of length: {}.".format(
str(len(ranges)), str(len(values))
2 changes: 1 addition & 1 deletion airflow/providers/google/suite/transfers/gcs_to_gdrive.py
@@ -30,7 +30,7 @@

class GCSToGoogleDriveOperator(BaseOperator):
"""
Copies objects from a Google Cloud Storage service service to Google Drive service, with renaming
Copies objects from a Google Cloud Storage service to a Google Drive service, with renaming
if requested.
Using this operator requires the following OAuth 2.0 scope:
2 changes: 1 addition & 1 deletion airflow/www/templates/airflow/graph.html
@@ -643,7 +643,7 @@
// Is there a better way to get node_width and node_height ?
const [node_width, node_height] = [rect[0][0].attributes.width.value, rect[0][0].attributes.height.value];

// Calculate zoom scale to fill most of the canvas with the the node/cluster in focus.
// Calculate zoom scale to fill most of the canvas with the node/cluster in focus.
const scale = Math.min(
Math.min(width / node_width, height / node_height),
1.5, // cap zoom level to 1.5 so nodes are not too large
2 changes: 1 addition & 1 deletion breeze
@@ -2012,7 +2012,7 @@ ${CMDNAME} static-check [FLAGS] static_check [-- <EXTRA_ARGS>]
${FORMATTED_STATIC_CHECKS}
You can pass extra arguments including options to to the pre-commit framework as
You can pass extra arguments including options to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'${CMDNAME} static-check mypy' or
4 changes: 2 additions & 2 deletions chart/values.yaml
@@ -166,7 +166,7 @@ secret: []
# Extra secrets that will be managed by the chart
# (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
# The format is "key/value" where
# * key (can be templated) is the the name the secret that will be created
# * key (can be templated) is the name of the secret that will be created
# * value: an object with the standard 'data' or 'stringData' key (or both).
# The value associated with those keys must be a string (can be templated)
extraSecrets: {}
@@ -185,7 +185,7 @@ extraSecrets: {}
# Extra ConfigMaps that will be managed by the chart
# (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
# The format is "key/value" where
# * key (can be templated) is the the name the configmap that will be created
# * key (can be templated) is the name of the configmap that will be created
# * value: an object with the standard 'data' key.
# The value associated with this keys must be a string (can be templated)
extraConfigMaps: {}
2 changes: 1 addition & 1 deletion dev/provider_packages/prepare_provider_packages.py
@@ -1026,7 +1026,7 @@ def make_sure_remote_apache_exists_and_fetch(git_update: bool):
Make sure that apache remote exist in git. We need to take a log from the apache
repository - not locally.
Also the the local repo might be shallow so we need to unshallow it.
Also the local repo might be shallow so we need to unshallow it.
This will:
* check if the remote exists and add if it does not
@@ -180,7 +180,7 @@ To execute a streaming Dataflow job, ensure the streaming option is set (for Pyt
source, such as Pub/Sub, in your pipeline (for Java).

Setting argument ``drain_pipeline`` to ``True`` allows to stop streaming job by draining it
instead of canceling during during killing task instance.
instead of canceling during killing task instance.

See the `Stopping a running pipeline
<https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline>`_.
@@ -36,7 +36,7 @@ Prerequisite Tasks
Manage GKE cluster
^^^^^^^^^^^^^^^^^^

A cluster is the foundation of GKE - all workloads run on on top of the cluster. It is made up on a cluster master
A cluster is the foundation of GKE - all workloads run on top of the cluster. It is made up on a cluster master
and worker nodes. The lifecycle of the master is managed by GKE when creating or deleting a cluster.
The worker nodes are represented as Compute Engine VM instances that GKE creates on your behalf when creating a cluster.

2 changes: 1 addition & 1 deletion docs/apache-airflow/dag-run.rst
@@ -22,7 +22,7 @@ A DAG Run is an object representing an instantiation of the DAG in time.
Each DAG may or may not have a schedule, which informs how DAG Runs are
created. ``schedule_interval`` is defined as a DAG argument, which can be passed a
`cron expression <https://en.wikipedia.org/wiki/Cron#CRON_expression>`_ as
a ``str``, a ``datetime.timedelta`` object, or one of of the following cron "presets".
a ``str``, a ``datetime.timedelta`` object, or one of the following cron "presets".

.. tip::
You can use an online editor for CRON expressions such as `Crontab guru <https://crontab.guru/>`_
2 changes: 1 addition & 1 deletion docs/apache-airflow/production-deployment.rst
@@ -230,7 +230,7 @@ dependencies that are not needed in the final image. You need to use Airflow Sou
from the `official distribution folder of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
released versions, or checked out from the GitHub project if you happen to do it from git sources.

The easiest way to build the image image is to use ``breeze`` script, but you can also build such customized
The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
image by running appropriately crafted docker build in which you specify all the ``build-args``
that you need to add to customize it. You can read about all the args and ways you can build the image
in the `<#production-image-build-arguments>`_ chapter below.
4 changes: 2 additions & 2 deletions docs/apache-airflow/upgrading-to-2.rst
@@ -299,7 +299,7 @@ When DAGs are initialized with the ``access_control`` variable set, any usage of
If you previously used non-RBAC UI, you have to switch to the new RBAC-UI and create users to be able
to access Airflow's webserver. For more details on CLI to create users see :doc:`cli-and-env-variables-ref`

Please note that that custom auth backends will need re-writing to target new FAB based UI.
Please note that custom auth backends will need re-writing to target new FAB based UI.

As part of this change, a few configuration items in ``[webserver]`` section are removed and no longer applicable,
including ``authenticate``, ``filter_by_owner``, ``owner_mode``, and ``rbac``.
@@ -1110,7 +1110,7 @@ and there is no need for it to be accessible from the CLI interface.

If the DAGRun was triggered with conf key/values passed in, they will also be printed in the dag_state CLI response
ie. running, {"name": "bob"}
whereas in in prior releases it just printed the state:
whereas in prior releases it just printed the state:
ie. running

**Deprecating ignore_first_depends_on_past on backfill command and default it to True**