Add documentation for with_columns_renamed() (mrpowers-io#219)
* Add documentation for with_columns_renamed()

* Separate blocks for output screen and code block
kunaljubce authored Apr 14, 2024
1 parent 55c6753 commit 8433b68
Showing 2 changed files with 49 additions and 1 deletion.
48 changes: 48 additions & 0 deletions README.md
@@ -241,6 +241,54 @@ quinn.sort_columns(df=source_df, sort_order="asc", sort_nested=True)

### DataFrame Helpers

**with_columns_renamed()**

Rename all, or several, columns in a DataFrame by applying one common renaming function to the column names.

Suppose you have the following two DataFrames for orders coming from source A and source B:

```
order_a_df.show()
+--------+---------+--------+
|order_id|order_qty|store_id|
+--------+---------+--------+
|     001|       23|    45AB|
|     045|        2|    98HX|
|     021|      142|    09AA|
+--------+---------+--------+
order_b_df.show()
+--------+---------+--------+
|order_id|order_qty|store_id|
+--------+---------+--------+
|     001|       23|    47AB|
|     985|        2|    54XX|
|    0112|       12|    09AA|
+--------+---------+--------+
```

Now suppose you need to join these two DataFrames. In Spark, joining two DataFrames with identical column names can lead to ambiguous column name errors, because the result contains multiple columns with the same name. It is therefore good practice to rename the columns first so that each name reflects the DataFrame it came from:

```python
def add_suffix(s):
    return s + '_a'

order_a_df_renamed = quinn.with_columns_renamed(add_suffix)(order_a_df)

order_a_df_renamed.show()
```
```
+----------+-----------+----------+
|order_id_a|order_qty_a|store_id_a|
+----------+-----------+----------+
|       001|         23|      45AB|
|       045|          2|      98HX|
|       021|        142|      09AA|
+----------+-----------+----------+
```
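
To finish the example, source B needs the same treatment with a different suffix before the join. The sketch below is one possible way to build both renaming functions from a single helper. `make_suffixer` is a hypothetical name and not part of quinn, and the Spark calls appear only as comments (they assume the `order_a_df` and `order_b_df` DataFrames shown above) so that the snippet runs without a Spark session:

```python
def make_suffixer(suffix):
    """Return a function that appends `suffix` to a column name (hypothetical helper)."""
    def add_suffix(name):
        return name + suffix
    return add_suffix


# Plain-Python check of what the renaming functions produce.
columns = ["order_id", "order_qty", "store_id"]
renamed_a = [make_suffixer("_a")(c) for c in columns]
renamed_b = [make_suffixer("_b")(c) for c in columns]
print(renamed_a)
print(renamed_b)

# With Spark available, the same helper could be passed to quinn as in the example above:
# order_a_df_renamed = quinn.with_columns_renamed(make_suffixer("_a"))(order_a_df)
# order_b_df_renamed = quinn.with_columns_renamed(make_suffixer("_b"))(order_b_df)
# joined_df = order_a_df_renamed.join(
#     order_b_df_renamed,
#     order_a_df_renamed["order_id_a"] == order_b_df_renamed["order_id_b"],
# )
```

After the rename, every column in the joined result has a unique name, so no column reference is ambiguous.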

**column_to_list()**

Converts a column in a DataFrame to a list of values.
2 changes: 1 addition & 1 deletion quinn/transformations.py
@@ -11,7 +11,7 @@


 def with_columns_renamed(fun: Callable[[str], str]) -> Callable[[DataFrame], DataFrame]:
-    """Ffunction designed to rename the columns of a `Spark DataFrame`.
+    """Function designed to rename the columns of a `Spark DataFrame`.
     It takes a `Callable[[str], str]` object as an argument (``fun``) and returns a
     `Callable[[DataFrame], DataFrame]` object.
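
The docstring describes a function that takes a renaming callable and returns a second callable that is then applied to a DataFrame. As a rough illustration of that shape only, and not quinn's actual implementation, the sketch below relies on nothing but the `columns` attribute and `withColumnRenamed` method that PySpark DataFrames provide. `FakeFrame` and `rename_all_sketch` are hypothetical names, and `FakeFrame` is a stand-in so the snippet runs without Spark:

```python
from typing import Callable, List


class FakeFrame:
    """Hypothetical stand-in exposing only `columns` and `withColumnRenamed`."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)

    def withColumnRenamed(self, existing: str, new: str) -> "FakeFrame":
        # Return a new frame, as PySpark does, rather than mutating in place.
        return FakeFrame([new if c == existing else c for c in self.columns])


def rename_all_sketch(fun: Callable[[str], str]):
    """Illustrative only: take a renaming function, return a frame transformer."""

    def inner(df):
        # df.columns is read once, before any renaming takes place.
        for name in df.columns:
            df = df.withColumnRenamed(name, fun(name))
        return df

    return inner


renamed = rename_all_sketch(lambda s: s + "_a")(
    FakeFrame(["order_id", "order_qty", "store_id"])
)
print(renamed.columns)
```

Returning a function instead of taking the DataFrame directly is what allows the call style `with_columns_renamed(fun)(df)` used in the README example.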