Merge branch 'dev' into update_value_types_string
JPryce-Aklundh authored Jan 22, 2024
2 parents c882abf + b55982d commit 943503a
Showing 10 changed files with 528 additions and 179 deletions.
4 changes: 2 additions & 2 deletions antora.yml
@@ -7,5 +7,5 @@ nav:
asciidoc:
attributes:
neo4j-version: '5'
- neo4j-version-minor: '5.16'
- neo4j-version-exact: '5.16.0'
+ neo4j-version-minor: '5.17'
+ neo4j-version-exact: '5.17.0'
29 changes: 29 additions & 0 deletions modules/ROOT/pages/clauses/where.adoc
@@ -338,6 +338,35 @@ The `name` and `age` for `Peter` are returned because his name contains "ete
|===


[[match-string-is-normalized]]
=== Checking if a `STRING` `IS NORMALIZED`

The `IS NORMALIZED` operator (introduced in Neo4j 5.17) is used to check whether the given `STRING` is in the `NFC` Unicode normalization form:

.Query
[source, cypher]
----
MATCH (n:Person)
WHERE n.name IS NORMALIZED
RETURN n.name AS normalizedNames
----

Because all the `name` values contain only normalized Unicode characters, every matched `name` property is returned.
For more information, see the section about the xref:syntax/operators.adoc#match-string-is-normalized[normalization operator].

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| normalizedNames
| 'Andy'
| 'Timothy'
| 'Peter'
1+d|Rows: 3
|===

Note that the `IS NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NORMALIZED` returns `null`.
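
As a hedged sketch building on the query above, the negated form of the operator (`IS NOT NORMALIZED`, as listed in the operator syntax) could be used to find `name` values that are not in the `NFC` form.
The `Person` label and `name` property are taken from the example graph, and the `nonNormalizedNames` alias is illustrative.
With the example data no rows would be expected, since all the names are already normalized.

.Query
[source, cypher]
----
// Find names that are NOT in the NFC normalization form.
// Nodes without a name evaluate to null and are filtered out by WHERE.
MATCH (n:Person)
WHERE n.name IS NOT NORMALIZED
RETURN n.name AS nonNormalizedNames
----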

[[match-string-negation]]
=== String matching negation

@@ -16,6 +16,47 @@ New features are added to the language continuously, and occasionally, some feat
This section lists all of the features that have been removed, deprecated, added, or extended in different Cypher versions.
Replacement syntax for deprecated and removed features is also indicated.

[[cypher-deprecations-additions-removals-5.17]]
== Neo4j 5.17

=== New features

[cols="2", options="header"]
|===
| Feature
| Details

a|
label:functionality[]
label:new[]

[source, cypher, role=noheader]
----
RETURN normalize("string", NFC)
----

| Introduction of a xref::functions/string.adoc#functions-normalize[normalize()] function.
This function normalizes a `STRING` according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

a|
label:functionality[]
label:new[]

[source, cypher, role=noheader]
----
IS [NOT] [NFC \| NFD \| NFKC \| NFKD] NORMALIZED
----

[source, cypher, role=noheader]
----
RETURN "string" IS NORMALIZED
----

| Introduction of an xref::syntax/operators.adoc#match-string-is-normalized[IS NORMALIZED] operator.
The operator can be used to check if a `STRING` is normalized according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

[[cypher-deprecations-additions-removals-5.16]]
== Neo4j 5.16

@@ -146,7 +187,7 @@ label:updated[]
MATCH (n:Label) WHERE $param IS :: STRING NOT NULL AND n.prop = $param
----

- | `IS :: STRING NOT NULL` is now an xref:indexes/search-performance-indexes/using-indexes.adoc#text-indexes-type-predicate-expressions[index-compatible predicate].
+ | `IS :: STRING NOT NULL` is now an xref:indexes/search-performance-indexes/using-indexes.adoc#text-indexes-type-predicate-expressions[index-compatible predicate].

|===

8 changes: 7 additions & 1 deletion modules/ROOT/pages/functions/index.adoc
@@ -505,6 +505,12 @@ These functions are used to manipulate `STRING` values or to create a `STRING` r
| `ltrim(input :: STRING) :: STRING`
| Returns the given `STRING` with leading whitespace removed.

1.2+| xref::functions/string.adoc#functions-normalize[`normalize()`]
| `normalize(input :: STRING) :: STRING`
| Returns the given `STRING` normalized according to the normalization form `NFC`. label:new[Introduced in 5.17]
| `normalize(input :: STRING, normalForm = NFC :: [NFC, NFD, NFKC, NFKD]) :: STRING`
| Returns the given `STRING` normalized according to the specified normalization form. label:new[Introduced in 5.17]

1.1+| xref::functions/string.adoc#functions-replace[`replace()`]
| `replace(original :: STRING, search :: STRING, replace :: STRING) :: STRING`
| Returns a `STRING` in which all occurrences of a specified search `STRING` in the given `STRING` have been replaced by another (specified) replacement `STRING`.
@@ -773,7 +779,7 @@ Graph functions provide information about the constituent graphs in composite da
|===
| Function | Signature | Description
1.1+| xref:functions/graph.adoc#functions-graph-by-elementid[`graph.byElementId()`] | `USE graph.byElementId(elementId :: STRING)` | Resolves the constituent graph to which a given element id belongs.
- label:new[Introduced in Neo4j 5.13]
+ label:new[Introduced in 5.13]
1.1+| xref:functions/graph.adoc#functions-graph-byname[`graph.byName()`] | `USE graph.byName(name :: STRING)` | Resolves a constituent graph by name.
1.1+| xref:functions/graph.adoc#functions-graph-names[`graph.names()`] | `graph.names() :: LIST<STRING>` | Returns a list containing the names of all graphs in the current composite database.
1.1+| xref:functions/graph.adoc#functions-graph-names[`graph.propertiesByName()`] | `graph.propertiesByName(name :: STRING) :: MAP` | Returns a map containing the properties associated with the given graph.
174 changes: 174 additions & 0 deletions modules/ROOT/pages/functions/string.adoc
@@ -148,6 +148,180 @@ RETURN ltrim(' hello')
======



[[functions-normalize]]
== normalize()

_This feature was introduced in Neo4j 5.17._

`normalize()` returns the given `STRING` normalized using the `NFC` Unicode normalization form.

[NOTE]
====
Unicode normalization is a process that transforms different representations of the same string into a standardized form.
For more information, see the documentation for link:https://unicode.org/reports/tr15/#Norm_Forms[Unicode normalization forms].
====

The `normalize()` function is useful for converting `STRING` values into comparable forms.
When comparing two `STRING` values, it is their Unicode codepoints that are compared.
In Unicode, characters that look the same may be represented by two or more different codepoints.
For example, the character `<` can be represented as `\uFE64` (﹤) or `\u003C` (<).
To the human eye, the characters may appear identical.
However, if compared, Cypher will return `false` as `\uFE64` does not equal `\u003C`.
Using the `normalize()` function with a compatibility normalization form such as `NFKC`, it is possible to normalize the codepoint `\uFE64` to `\u003C`, creating a single codepoint representation and allowing them to be successfully compared.
The default `NFC` form leaves `\uFE64` unchanged, because the two codepoints are compatibility equivalent rather than canonically equivalent.
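
The comparison described above can be sketched as follows, assuming `normalize()` follows the standard Unicode normalization forms.
The raw comparison and the default `NFC` comparison would be expected to evaluate to `false`, because `\uFE64` only has a compatibility mapping to `\u003C`, while the `NFKC` comparison would be expected to evaluate to `true`.
The column aliases are illustrative.

.Query
[source, cypher]
----
RETURN
  // Raw codepoint comparison: the two codepoints differ.
  '\uFE64' = '\u003C' AS rawComparison,
  // Default form (NFC): canonical normalization only.
  normalize('\uFE64') = '\u003C' AS nfcComparison,
  // Compatibility form (NFKC): maps the small less-than sign to '<'.
  normalize('\uFE64', NFKC) = '\u003C' AS nfkcComparison
----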

*Syntax:*

[source, syntax, role="noheader"]
----
normalize(input)
----

*Returns:*

|===

| `STRING`

|===

*Arguments:*

[options="header"]
|===
| Name | Description

| `input`
| An expression that returns a `STRING`.

|===

*Considerations:*

|===

| `normalize(null)` returns `null`.

|===


.+normalize()+
======
.Query
[source, cypher, indent=0]
----
RETURN normalize('\u212B') = '\u00C5' AS result
----
.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| +result+
| +true+
1+d|Rows: 1
|===
======

To check if a `STRING` is normalized, use the xref:syntax/operators.adoc#match-string-is-normalized[`IS NORMALIZED`] operator.

[[functions-normalize-with-normal-form]]
== normalize(), with specified normal form

_This feature was introduced in Neo4j 5.17._

`normalize()` returns the given `STRING` normalized using the specified normalization form.
The normalization form can be of type `NFC`, `NFD`, `NFKC` or `NFKD`.

There are two main types of normalization forms:

* *Canonical equivalence*: The `NFC` (default) and `NFD` are forms of canonical equivalence.
This means that codepoints that represent the same abstract character will
be normalized to the same codepoint (and have the same appearance and behavior).
The `NFC` form will always give the *composed* canonical form (in which the combined codepoints are replaced with a single representation, if possible).
The `NFD` form gives the *decomposed* form (the opposite of the composed form, which converts the combined codepoints into a split form if possible).

* *Compatibility normalization*: `NFKC` and `NFKD` are forms of compatibility normalization.
All canonically equivalent sequences are compatible, but not all compatible sequences are canonical.
This means that a character normalized in `NFC` or `NFD` should also be normalized in `NFKC` and `NFKD`.
Other characters with only slight differences in appearance should be compatibly equivalent.

For example, the Greek Upsilon with Acute and Hook Symbol `ϓ` can be represented by the Unicode codepoint: `\u03D3`.

* Normalized in `NFC`: `\u03D3` Greek Upsilon with Acute and Hook Symbol (ϓ)
* Normalized in `NFD`: `\u03D2\u0301` Greek Upsilon with Hook Symbol + Combining Acute Accent (ϓ)
* Normalized in `NFKC`: `\u038E` Greek Capital Letter Upsilon with Tonos (Ύ)
* Normalized in `NFKD`: `\u03A5\u0301` Greek Capital Letter Upsilon + Combining Acute Accent (Ύ)

In the compatibility normalization forms (`NFKC` and `NFKD`) the character is visibly different as it no longer contains the hook symbol.
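
The four results listed above can be checked with a query along these lines.
This is a sketch rather than a tested example: each comparison restates one of the listed codepoint sequences, so every column would be expected to be `true` if `normalize()` follows the standard Unicode forms.
The column aliases are illustrative.

.Query
[source, cypher]
----
RETURN
  // Composed canonical form: unchanged.
  normalize('\u03D3', NFC) = '\u03D3' AS nfc,
  // Decomposed canonical form: base character plus combining acute accent.
  normalize('\u03D3', NFD) = '\u03D2\u0301' AS nfd,
  // Composed compatibility form: the hook symbol is lost.
  normalize('\u03D3', NFKC) = '\u038E' AS nfkc,
  // Decomposed compatibility form: capital upsilon plus combining acute accent.
  normalize('\u03D3', NFKD) = '\u03A5\u0301' AS nfkd
----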

*Syntax:*

[source, syntax, role="noheader"]
----
normalize(input, normalForm)
----

*Returns:*

|===

| `STRING`

|===

*Arguments:*

[options="header"]
|===
| Name | Description

| `input`
| An expression that returns a `STRING`.


| `normalForm`
| A keyword specifying the normalization form: `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

*Considerations:*

|===

| `normalize(null, NFC)` returns `null`.

|===


.+normalize()+
======
.Query
[source, cypher, indent=0]
----
RETURN normalize('\uFE64', NFKC) = '\u003C' AS result
----
.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| +result+
| +true+
1+d|Rows: 1
|===
======

To check if a `STRING` is normalized in a specific Unicode normal form, use the xref:syntax/operators.adoc#match-string-is-normalized-specified-normal-form[`IS NORMALIZED`] operator with a specified normalization form.
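
As a minimal sketch of that check, the query below applies the `IS [NOT] [NFC | NFD | NFKC | NFKD] NORMALIZED` syntax to the Greek Upsilon with Acute and Hook Symbol (`\u03D3`) used earlier.
Based on the normalization results listed for that character, only the `NFC` check would be expected to be `true`, since the other forms change the codepoint sequence.
The column aliases are illustrative.

.Query
[source, cypher]
----
RETURN
  '\u03D3' IS NFC NORMALIZED AS isNfc,
  '\u03D3' IS NFD NORMALIZED AS isNfd,
  '\u03D3' IS NFKC NORMALIZED AS isNfkc,
  '\u03D3' IS NFKD NORMALIZED AS isNfkd
----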

[[functions-replace]]
== replace()
