Support WEEK_YEAR for date time formatter #10930

Open
wants to merge 3 commits into
base: main

Contributor

@ccat3z ccat3z commented Sep 5, 2024

Add WEEK_YEAR support to the date time formatter. getWeekYear() is ported from JDK 8 and supports both the ISO 8601 and the Java SimpleDateFormat week-year conventions.

The test cases were generated with the Scala script below.

def gen_week_years_more() = {
  import java.util.Calendar
  import java.util.GregorianCalendar

  for {
    y <- (2017 to 2022);
    (m, d) <- Seq(
      (Calendar.JANUARY, 1),
      (Calendar.JANUARY, 2),
      (Calendar.JANUARY, 3),
      (Calendar.JANUARY, 4),
      (Calendar.JANUARY, 5),
      (Calendar.JANUARY, 6),
      (Calendar.JANUARY, 7),
      (Calendar.DECEMBER, 25),
      (Calendar.DECEMBER, 26),
      (Calendar.DECEMBER, 27),
      (Calendar.DECEMBER, 28),
      (Calendar.DECEMBER, 29),
      (Calendar.DECEMBER, 30),
      (Calendar.DECEMBER, 31),
    );
    (fd, md) <- Seq(
      (Calendar.SUNDAY, 1), // SimpleDateFormat
      (Calendar.MONDAY, 4)  // ISO
    )
  } yield {
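    // Use GregorianCalendar explicitly so the default locale cannot select
    // another calendar system.
    val cal = new GregorianCalendar()
    cal.setFirstDayOfWeek(fd)
    cal.setMinimalDaysInFirstWeek(md)

    cal.set(Calendar.YEAR, y)
    cal.set(Calendar.MONTH, m)
    cal.set(Calendar.DAY_OF_MONTH, d)
    cal.getTime // force the calendar fields to be recomputed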

    val wy = cal.getWeekYear
    val woy = cal.get(Calendar.WEEK_OF_YEAR)

    f"  std::make_tuple(${y}%4d, ${m+1}%02d, $d%02d, ${fd-1}, ${md}, ${wy}%4d)," ++
    f" // ${wy}W${woy}"
  }
}
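
For reference, the week year that the script records for each (firstDayOfWeek, minimalDaysInFirstWeek) pair can be reproduced without the JDK. The following is a minimal C++ sketch, not the code in this PR. It assumes the proleptic Gregorian calendar (no Julian cutover), Calendar-style 1-based weekdays (1 = Sunday), and the helper names `daysFromCivil`, `weekdayOf`, `weekOneStart`, and `weekYear` are hypothetical. It locates the first day of week 1 of a year and compares the date against the week-1 starts of the current and the following year.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (month is 1-12).
inline int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Weekday of a day count: 0 = Sunday ... 6 = Saturday.
inline int weekdayOf(int64_t days) {
  return static_cast<int>(((days + 4) % 7 + 7) % 7);
}

// Day count of the first day of week 1 of `year`.
// firstDayOfWeek: 1 = Sunday ... 7 = Saturday. minimalDaysInFirstWeek: 1-7.
inline int64_t weekOneStart(
    int64_t year, int firstDayOfWeek, int minimalDaysInFirstWeek) {
  const int64_t jan1 = daysFromCivil(year, 1, 1);
  // Distance from the start of the week containing Jan 1 to Jan 1 itself.
  const int offset = (weekdayOf(jan1) - (firstDayOfWeek - 1) + 7) % 7;
  // Number of days of that week that fall inside `year`.
  const int daysInYear = 7 - offset;
  const int64_t weekStart = jan1 - offset;
  return daysInYear >= minimalDaysInFirstWeek ? weekStart : weekStart + 7;
}

inline int64_t weekYear(
    int64_t y, unsigned m, unsigned d,
    int firstDayOfWeek, int minimalDaysInFirstWeek) {
  assert(firstDayOfWeek >= 1 && firstDayOfWeek <= 7);
  assert(minimalDaysInFirstWeek >= 1 && minimalDaysInFirstWeek <= 7);
  const int64_t days = daysFromCivil(y, m, d);
  if (days >= weekOneStart(y + 1, firstDayOfWeek, minimalDaysInFirstWeek)) {
    return y + 1; // the date already belongs to week 1 of the next year
  }
  if (days < weekOneStart(y, firstDayOfWeek, minimalDaysInFirstWeek)) {
    return y - 1; // the date still belongs to the last week of the previous year
  }
  return y;
}
```

Under these assumptions, `weekYear(2017, 1, 1, 2, 4)` (ISO) gives 2016, while `weekYear(2017, 1, 1, 1, 1)` (Sunday, 1) gives 2017.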

The Presto test cases were generated by running the following query on Presto 0.289:

with
  dates as (select
    date_parse(cast(year as varchar) || '-' || cast(month as varchar) || '-' || cast(day as varchar), '%Y-%m-%d') as d
  from
    (select * from unnest(sequence(2017, 2022)) as years(year)),
    (select * from (values
      (1, 1),
      (1, 2),
      (1, 3),
      (1, 4),
      (1, 5),
      (1, 6),
      (1, 7),
      (12, 25),
      (12, 26),
      (12, 27),
      (12, 28),
      (12, 29),
      (12, 30),
      (12, 31)
    ) as monthdays(month, day))
  )
select date_format(d, '%Y-%m-%d'), date_format(d, '%x') from dates
where day(d) in (1, 31) or year(d) != year_of_week(d);

The Spark SQL test cases were generated by running the following query on Spark 3.5.2:

SET spark.sql.legacy.timeParserPolicy=LEGACY;
with
  dates as (select
    cast(year as string) || '-' || cast(month as string) || '-' || cast(day as string) as d
  from
    (select * from (values
      (2017),
      (2018),
      (2019),
      (2020),
      (2021),
      (2022)
    ) as years(year)),
    (select * from (values
      (1, 1),
      (1, 2),
      (1, 3),
      (1, 4),
      (1, 5),
      (1, 6),
      (1, 7),
      (12, 25),
      (12, 26),
      (12, 27),
      (12, 28),
      (12, 29),
      (12, 30),
      (12, 31)
    ) as monthdays(month, day))
  )
select date_format(d, 'yyyy-MM-dd') as date, date_format(d, 'Y') as weekyear from dates
where day(d) in (1, 31) or date_format(d, 'yyyy') != date_format(d, 'YYYY')
order by d;

@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 5, 2024

netlify bot commented Sep 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f31a643
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/672c8b69ba65ef0008ef6781

@ccat3z
Contributor Author

ccat3z commented Sep 10, 2024

@rui-mo

Collaborator

@rui-mo rui-mo left a comment

Thanks!

static_cast<int>(calDate.year()),
static_cast<uint32_t>(calDate.month()),
static_cast<uint32_t>(calDate.day()),
2, // (ISO 8601) Monday = 2
Collaborator

How do we plan to support the legacy Spark behavior, where this value should be 1 (Sunday)?

Contributor Author

It could be read from java.util.Calendar on the Java side and then passed in through QueryConf.
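
As a rough illustration of that idea, the sketch below reads the two week-numbering parameters from a plain string map and falls back to the ISO 8601 values when they are absent. Everything here is an assumption made for illustration: the `WeekDefinition` struct, the `weekDefinitionFromConfig` function, and both key names are hypothetical, and the map only stands in for a query configuration. It is not the Velox QueryConfig API.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical carrier for the two week-numbering parameters.
struct WeekDefinition {
  int firstDayOfWeek;         // 1 = Sunday ... 7 = Saturday
  int minimalDaysInFirstWeek; // 1-7
};

// Hypothetical key names, invented for this sketch.
const std::string kFirstDayOfWeekKey = "example.first_day_of_week";
const std::string kMinimalDaysKey = "example.minimal_days_in_first_week";

using ConfigMap = std::unordered_map<std::string, std::string>;

// Returns the integer stored under `key`, or `defaultValue` if the key is unset.
inline int readInt(
    const ConfigMap& config, const std::string& key, int defaultValue) {
  const auto it = config.find(key);
  return it == config.end() ? defaultValue : std::stoi(it->second);
}

inline WeekDefinition weekDefinitionFromConfig(const ConfigMap& config) {
  // Defaults follow ISO 8601: weeks start on Monday (2), week 1 needs 4 days.
  const WeekDefinition def{
      readInt(config, kFirstDayOfWeekKey, 2),
      readInt(config, kMinimalDaysKey, 4)};
  if (def.firstDayOfWeek < 1 || def.firstDayOfWeek > 7 ||
      def.minimalDaysInFirstWeek < 1 || def.minimalDaysInFirstWeek > 7) {
    throw std::invalid_argument("week definition values must be in [1, 7]");
  }
  return def;
}
```

With such a scheme, the legacy Spark case would set both values to 1 on the Java side, and an unset configuration would keep the ISO behavior.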

@ccat3z
Contributor Author

ccat3z commented Sep 13, 2024

@rui-mo I have updated the code. Could you please review it?

Collaborator

@rui-mo rui-mo left a comment

Thanks.

/// (1582-10-15) would yield mismatched results.
///
/// The algorithm refers to the weekyear algorithm in jdk:
/// https://github.com/openjdk/jdk8/blob/6a383433a9f4661a96a90b2a4c7b5b9a85720031/jdk/src/share/classes/java/util/GregorianCalendar.java#L2077
Collaborator

This link points to an implementation in JDK 8. Is there any difference in the implementations among JDK versions?

Contributor Author

@ccat3z ccat3z force-pushed the weekyear-fb branch 2 times, most recently from 032c621 to 31bb656 on September 24, 2024 04:31
@ccat3z
Contributor Author

ccat3z commented Sep 24, 2024

@rui-mo I have updated the code. Could you please review it?

@NEUpanning
Contributor

@ccat3z Now that the simple date time formatter has been introduced in #10966, could you also add support for WEEK_YEAR (specifier 'Y') to it? Thanks.

@ccat3z
Contributor Author

ccat3z commented Sep 25, 2024

> @ccat3z Now that the simple date time formatter has been introduced in #10966, could you also add support for WEEK_YEAR (specifier 'Y') to it? Thanks.

Sure, I will add it in this PR.

Collaborator

@rui-mo rui-mo left a comment

Thanks.

/// `minimalDaysInFirstWeek` is 1.
///
/// The algorithm refers to the getWeekYear algorithm in openjdk:
/// https://github.com/openjdk/jdk/blob/d9c67443f7d7f03efb2837b63ee2acc6113f737f/src/java.base/share/classes/java/util/GregorianCalendar.java#L2058
Collaborator

⚠️ The upstream code is licensed under GPL 2.0 and while APIs and algorithms don't fall under copyright, afaik the implementation does. So this should probably not be merged without checking with some form of legal professional. @pedroerp @mbasmanova

Contributor

@ccat3z @rui-mo Is this different from the vendored date library we have at:

https://github.com/facebookincubator/velox/blob/main/velox/external/date/iso_week.h

We already use its iso_week functionality in a few places. If it's the same logic, it would be nice to consolidate the usage.

Contributor Author

@ccat3z ccat3z Oct 21, 2024

iso_week only supports the ISO 8601 week year (iso_week.h#L1519). This implementation exists to support non-ISO week-year conventions, which Spark requires.
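
To make the difference concrete, here is a small sketch that uses the "pivot day" formulation: a week belongs to the year that contains the day at index (7 - minimalDaysInFirstWeek) of that week, counting from 0. For ISO 8601 (Monday, 4) this is the familiar Thursday rule; for the (Sunday, 1) setting it is the Saturday. The helper names `toDays`, `yearFromDays`, `pivotWeekYear`, and `conventionsDisagree` are hypothetical, the proleptic Gregorian calendar is assumed, and none of this is the API of the vendored date library or the code in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (month is 1-12).
inline int64_t toDays(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Calendar year of a day count (inverse of toDays, year part only).
inline int64_t yearFromDays(int64_t z) {
  z += 719468;
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  const unsigned doe = static_cast<unsigned>(z - era * 146097);
  const unsigned yoe =
      (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
  const unsigned mp = (5 * doy + 2) / 153;
  const unsigned month = mp < 10 ? mp + 3 : mp - 9;
  return static_cast<int64_t>(yoe) + era * 400 + (month <= 2);
}

// firstDayOfWeek: 1 = Sunday ... 7 = Saturday. minimalDaysInFirstWeek: 1-7.
inline int64_t pivotWeekYear(
    int64_t y, unsigned m, unsigned d,
    int firstDayOfWeek, int minimalDaysInFirstWeek) {
  assert(firstDayOfWeek >= 1 && firstDayOfWeek <= 7);
  assert(minimalDaysInFirstWeek >= 1 && minimalDaysInFirstWeek <= 7);
  const int64_t days = toDays(y, m, d);
  const int weekday = static_cast<int>(((days + 4) % 7 + 7) % 7); // 0 = Sunday
  // Position of the date inside its week (0 = first day of the week).
  const int offset = (weekday - (firstDayOfWeek - 1) + 7) % 7;
  // The pivot day decides which year the whole week belongs to.
  const int64_t pivot = days - offset + (7 - minimalDaysInFirstWeek);
  return yearFromDays(pivot);
}

// True when ISO 8601 (Monday, 4) and the (Sunday, 1) setting give
// different week years for the same date.
inline bool conventionsDisagree(int64_t y, unsigned m, unsigned d) {
  return pivotWeekYear(y, m, d, 2, 4) != pivotWeekYear(y, m, d, 1, 1);
}
```

Under these assumptions, 2017-01-01 falls in week year 2016 for ISO 8601 but 2017 for (Sunday, 1), so an ISO-only helper cannot reproduce both results.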

Contributor

@assignUser @ccat3z @rui-mo I asked for guidance from license experts on this. Give me a day or two until they get back to me.

Contributor Author

@pedroerp Any updates?

@ccat3z ccat3z force-pushed the weekyear-fb branch 2 times, most recently from 5d7e994 to 174b4bc on October 16, 2024 04:07
@ccat3z
Contributor Author

ccat3z commented Oct 16, 2024

@rui-mo I have added support for the simple date time formatter. Could you please review it?
cc @NEUpanning

Collaborator

@rui-mo rui-mo left a comment

Thanks for iterating. Just some nits.

private:
std::unique_ptr<char[]> literalBuf_;
size_t bufSize_;
std::vector<DateTimeToken> tokens_;
DateTimeFormatterType type_;

/// The first day-of-week varies by culture.
/// firstDayOfWeek is is a 1-based weekday number starting with Sunday. It
Contributor

nit: "is is" repeated

@ccat3z
Contributor Author

ccat3z commented Nov 5, 2024

@rui-mo I have updated the code. Could you please review it again?

Collaborator

@rui-mo rui-mo left a comment

Thanks! Looks good to me overall, though we might need the community's feedback on #10930 (comment).

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

6 participants