Implement a method for date feature extraction in csv data preprocessor #34

Open
Ask149 opened this issue Apr 4, 2021 · 5 comments
Labels
data-preprocessing Data preprocessing gssoc21 GirlScript Summer of Code'21 Level3 Very Hard Level Difficulty

Comments

@Ask149
Contributor

Ask149 commented Apr 4, 2021

Description

a. Write a method to identify the columns of type date (this may include iterating over the list of columns and using an appropriate strategy to identify if a column has values of type date)

b. Implement another method that converts each date column into a single fixed format (for example, YYYY-MM-DD) and splits it into separate columns holding the following attributes:

  1. Day of the month (for example, 28 for '2021-12-28')
  2. Month (numerical)
  3. Year
  4. Day of the week

c. Appropriate test methods should be implemented in the date_format_tests file
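
As a rough illustration of the four attributes in (b), here is a minimal sketch for a single value that is already in YYYY-MM-DD form. The function name `split_date` and the dictionary keys are hypothetical, and the issue does not say how the day of the week should be encoded; this sketch assumes ISO numbering (Monday = 1, Sunday = 7).

```python
from datetime import date


def split_date(iso_text: str) -> dict:
    """Split a 'YYYY-MM-DD' string into day, month, year and weekday."""
    d = date.fromisoformat(iso_text)
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        # Assumption: ISO weekday numbers, Monday = 1 ... Sunday = 7.
        "day_of_week": d.isoweekday(),
    }


print(split_date("2021-12-28"))
# {'day': 28, 'month': 12, 'year': 2021, 'day_of_week': 2}
```

If a weekday name is preferred over a number, `d.strftime("%A")` would be one alternative.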

Assumptions

The following assumptions can be made during implementation:

  1. The input dates contain no time component.
  2. The data frame has column names.
  3. The input dates follow one of a known list of formats, for example:
    input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]
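
One possible way to work with this list using only the standard library is to rewrite each pattern as a `datetime.strptime` directive and try them in turn. The names `STRPTIME_FORMATS` and `matching_formats` below are hypothetical. Note that some patterns overlap: a value such as 01/02/2021 is valid as both DD/MM/YYYY and MM/DD/YYYY, so an implementation would need a rule for resolving that (the issue does not specify one).

```python
from datetime import datetime

# The issue's patterns rewritten as strptime directives (same order).
STRPTIME_FORMATS = [
    "%d/%m/%Y", "%Y/%d/%m", "%m/%d/%Y", "%Y/%m/%d",
    "%d-%m-%Y", "%Y-%d-%m", "%m-%d-%Y", "%Y-%m-%d",
]


def matching_formats(text: str) -> list:
    """Return every candidate format under which `text` is a valid date."""
    matches = []
    for fmt in STRPTIME_FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
        matches.append(fmt)
    return matches


print(matching_formats("28/12/2021"))  # ['%d/%m/%Y']
print(matching_formats("01/02/2021"))  # ['%d/%m/%Y', '%m/%d/%Y'] (ambiguous)
```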

Input (Method-1)

None

Output (Method-1)

list of column names with values of type date

Method details

Use the data frame from the self.df variable.
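
A minimal sketch of one detection strategy for Method-1, under stated assumptions: the columns are passed in as a plain mapping of column name to list of string values (with pandas, `self.df.to_dict("list")` would produce such a mapping), and a column counts as a date column only if every value parses under one of the known formats. The names `is_date` and `get_date_columns` are hypothetical, and handling of missing values is left out.

```python
from datetime import datetime

CANDIDATE_FORMATS = [
    "%d/%m/%Y", "%Y/%d/%m", "%m/%d/%Y", "%Y/%m/%d",
    "%d-%m-%Y", "%Y-%d-%m", "%m-%d-%Y", "%Y-%m-%d",
]


def is_date(value) -> bool:
    """True if `value` is a string that parses under any candidate format."""
    if not isinstance(value, str):
        return False
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def get_date_columns(columns: dict) -> list:
    """Return the names of non-empty columns whose values are all dates."""
    return [
        name
        for name, values in columns.items()
        if len(values) > 0 and all(is_date(v) for v in values)
    ]


sample = {
    "name": ["a", "b"],
    "joined": ["28/12/2021", "2021-01-15"],
    "score": ["10", "20"],
}
print(get_date_columns(sample))  # ['joined']
```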

Input (Method-2)

The target format that the input dates should be converted to

Output (Method-2)

None

Method details

Use the data frame from the self.df variable.

Implement this method, with an appropriate name and parameters, in the csv_preprocess.py file.

In the implementation, use the convert_date_format method to convert each date into the target format, and Method-1 above to get the list of date columns.
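
A rough sketch of how Method-2 could look, again on a plain mapping of column name to list of strings rather than on `self.df`. The signature of convert_date_format is not shown in this issue, so the sketch calls `strftime` directly in its place; `parse_date`, `split_date_column`, and the derived column suffixes are hypothetical, and the target format is assumed to be given as a strftime directive. Ambiguous values are resolved by taking the first format that matches, which is only one possible rule.

```python
from datetime import date, datetime

CANDIDATE_FORMATS = [
    "%d/%m/%Y", "%Y/%d/%m", "%m/%d/%Y", "%Y/%m/%d",
    "%d-%m-%Y", "%Y-%d-%m", "%m-%d-%Y", "%Y-%m-%d",
]


def parse_date(text: str) -> date:
    """Parse `text` with the first candidate format that fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def split_date_column(columns: dict, name: str,
                      target_format: str = "%Y-%m-%d") -> None:
    """Rewrite columns[name] in target_format and add four derived columns.

    Modifies `columns` in place and returns None, as Method-2 specifies.
    """
    parsed = [parse_date(v) for v in columns[name]]
    columns[name] = [d.strftime(target_format) for d in parsed]
    columns[f"{name}_day"] = [d.day for d in parsed]
    columns[f"{name}_month"] = [d.month for d in parsed]
    columns[f"{name}_year"] = [d.year for d in parsed]
    # Assumption: ISO weekday numbers, Monday = 1 ... Sunday = 7.
    columns[f"{name}_weekday"] = [d.isoweekday() for d in parsed]


data = {"joined": ["28/12/2021", "2021/15/01"]}
split_date_column(data, "joined")
print(data["joined"])  # ['2021-12-28', '2021-01-15']
```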

Note

The use of standard Python libraries is highly recommended.

JOIN THE SLACK CHANNEL HERE if you wish to contribute to this issue.

@Ask149 Ask149 added gssoc21 GirlScript Summer of Code'21 data-preprocessing Data preprocessing Level3 Very Hard Level Difficulty labels Apr 4, 2021
@asimaries

I would like to work on this issue

@mehak6569

I want to work on this issue. Is it still available to work on?

@Ask149
Contributor Author

Ask149 commented Apr 21, 2021

@mehak6569, please find the first steps in the slack channel

@mehak6569

@mehak6569, please find the first steps in the slack channel

Ok, Thank you!

@HarshKumarChoudary

HarshKumarChoudary commented Feb 25, 2022

I want to work on this issue. Please assign it to me. I have joined the Slack channel.
