Skip to content

Latest commit

 

History

History
 
 

case_sensitivity

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Case sensitivity

In Cassandra

Cassandra identifiers, such as keyspace, table and column names, are case-insensitive by default. For example, if you create the following table:

cqlsh> CREATE TABLE test.FooBar(k int PRIMARY KEY);

Cassandra actually stores the table name as lower-case:

cqlsh> SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'test';

 table_name
------------
     foobar

And you can use whatever case you want in your queries:

cqlsh> SELECT * FROM test.FooBar;
cqlsh> SELECT * FROM test.foobar;
cqlsh> SELECT * FROM test.FoObAr;

However, if you enclose an identifier in double quotes, it becomes case-sensitive:

cqlsh> CREATE TABLE test."FooBar"(k int PRIMARY KEY);
cqlsh> SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'test';

 table_name
------------
     FooBar

You now have to use the exact, quoted form in your queries:

cqlsh> SELECT * FROM test."FooBar";

If you forget to quote, or use the wrong case, you'll get an error:

cqlsh> SELECT * FROM test.Foobar;
InvalidRequest: Error from server: code=2200 [Invalid query] message="table foobar does not exist"

cqlsh> SELECT * FROM test."FOOBAR";
InvalidRequest: Error from server: code=2200 [Invalid query] message="table FOOBAR does not exist"

In the driver

When we deal with identifiers, we use the following definitions:

  • CQL form: how you would type it in a CQL query. In other words, case-sensitive if it's quoted, case-insensitive otherwise;
  • internal form: how it is stored in system tables. In other words, never quoted and always in its exact case.

In previous driver versions, identifiers were represented as raw strings. The problem is that this does not capture the form; when a method processed an identifier, it always had to know where it came from and what form it was in, and possibly convert it. This led a lot of internal complexity, and recurring bugs.

To address this issue, driver 4+ uses a wrapper: CqlIdentifier. Its API methods are always explicit about the form:

CqlIdentifier caseInsensitiveId = CqlIdentifier.fromCql("FooBar");
System.out.println(caseInsensitiveId.asInternal());             // foobar
System.out.println(caseInsensitiveId.asCql(/*pretty=*/ false)); // "foobar"
System.out.println(caseInsensitiveId.asCql(true));              // foobar

// Double-quotes need to be escaped inside Java strings
CqlIdentifier caseSensitiveId = CqlIdentifier.fromCql("\"FooBar\"");
System.out.println(caseSensitiveId.asInternal());               // FooBar
System.out.println(caseSensitiveId.asCql(true));                // "FooBar"
System.out.println(caseSensitiveId.asCql(false));               // "FooBar"

CqlIdentifier caseSensitiveId2 = CqlIdentifier.fromInternal("FooBar");
assert caseSensitiveId.equals(caseSensitiveId2);

Side note: as shown above, asCql has a pretty-printing option that omits the quotes if they are not necessary. This looks nicer, but is slightly more expensive because it requires parsing the string.

The driver API uses CqlIdentifier whenever it produces or consumes an identifier. For example:

  • getting the keyspace from a table's metadata: CqlIdentifier keyspaceId = tableMetadata.getKeyspace();
  • setting the keyspace when building a session: CqlSession.builder().withKeyspace(keyspaceId).

For "consuming" methods, string overloads are also provided for convenience, for example SessionBuilder.withKeyspace(String).

  • getters and setters of "data container" types Row, UdtValue, and BoundStatement follow special rules described here (these methods are treated apart because they are generally invoked very often, and therefore avoid to create CqlIdentifier instances internally);
  • in other cases, the string is always assumed to be in CQL form, and converted on the fly with CqlIdentifier.fromCql.

Good practices

As should be clear by now, case sensitivity introduces a lot of extra (and arguably unnecessary) complexity.

The Java Driver team's recommendation is:

Always use case-insensitive identifiers in your data model.

You'll never have to create CqlIdentifier instances in your application code, nor think about CQL/internal forms. When you pass an identifier to the driver, use the string-based methods. When the driver returns an identifier and you need to convert it into a string, use asInternal().

If you worry about readability, use snake case (shopping_cart), or simply stick to camel case (ShoppingCart) and ignore the fact that Cassandra lower-cases everything internally.

The only reason to use case sensitivity should be if you don't control the data model. In that case, either pass quoted strings to the driver, or use CqlIdentifier instances (stored as constants to avoid creating them over and over).