Skip to content
tpyssysa edited this page Mar 18, 2018 · 35 revisions

String handling and value types

Explanation of the contents of a topic page @ Week 1 Topic 1

Back to Week 1

Objective: Basic string manipulation (min: QString + QChar + QCodec, creating, concatenation + light-weight use with QLatin1String (or QStringView this one may be omitted)

Comment: There is no String type in Qt, so should be written on lower-case (E: I'll leave this here for reference. Actually I'll just leave all the comments here for reference, comment on them myself so the dialogue stays somewhere. I didn't do that for 1.00 but it will probably be a better idea to just leave all the comments on the topic content pages for future references.)

Comment: This is a huge topic and I'd try to avoid losing attendees' interest at the beginning. The minimum: QString + QChar + QCodec, creating, concatenation + light-weight use with QLatin1String or QStringView (the latter can be omitted). String literals could be already another topic. And QByteArray is relevant in cases, where coding is not needed. I'm afraid this is too much for one section, isn't it.

E: Ill bump String Literals and QStringView into the expert section and include them if it seems feasible after everything else has been seen to.

Comment: The important message is that we do have literals and it does make sense to use mutable strings in general, if literals are ok. Like file names.

E: This page including the next ones will mostly be copy and paste from the Qt documentation and/or wiki, I'll start reading and editing them in the next sprint (20.3 ->)

Comment: This is very detailed now and explains a lot of details about string handling. If I was a student, I'd be exhausted in reading this all and possibly I'd would be at least little confused, what I need in there exercise and what not. This is related to the questions, I have asked a couple of times that how exercise creation and material creation are synched. This is definitely one way to create the material, but I see a risk that at least our customers start thinking about quitting, if the amount of reading is this large. You may see in my slides that a lot of details are just not mentioned. This section can be as it is, but I'd really consider omitting some details in further sections not to mention too many details.

Beginner

Intermediate

  • What is QString?
  • QString vs QByteArray?
  • How does std::string/c string compare to QString?
  • How do you compare and manipulate strings efficiently (QLatin1String, QStringRef, ((QStringView)))?
  • What are value types in Qt?

Expert

  • What is QStringLiteral?
  • What is QStringView?

Course material content

As our next topic we're going to take a look at some basic string manipulation, interesting string facts and some value types. We'll also take a brief look at how to compare and manipulate strings efficiently.
To start, there is no String type in Qt, so in the case of writing, it should be written on lower-case. If you should walk away from this topic with something in mind, it is that Qt does have literals and that it does make sense to use mutable strings in situations where literals are ok, such as file names.

Comment: I'd change the wording little bit: "there is no String type". Sounds like you cannot handle strings. Something, like Qt has its own type, replacing standard String, QString... I would start comparing any APIs, but perhaps worth mentioning is that it's implicitly shared and reference to the place, where we discuss that sharing. That should explain, why there is a separate type.

We will start with string manipulation, after which we will take a comparative look at QString and QByteArray. After that, we will take a look at how std::string and c string compares to QString, followed by a brief discussion about how to compare and manipulate strings efficiently. Lastly, we will discuss value types in Qt.

String manipulation (QString, QChar, QCodec)

http://doc.qt.io/qt-5/qstring.html
http://doc.qt.io/qt-5/qchar.html
http://doc.qt.io/qt-5/qtextcodec.html

Basic string manipulation (min: QString + QChar + QCodec, creating, concatenation + light-weight use with QLatin1String

QChar

The QChar class provides a 16-bit Unicode character.

In Qt, Unicode characters are 16-bit entities without any markup or structure. This class represents such an entity. It is lightweight, so it can be used everywhere. Most compilers treat it like an unsigned short.

QChar provides a full complement of testing/classification functions, converting to and from other formats, converting from composed to decomposed Unicode, and trying to compare and case-convert if you ask it to.

The classification functions include functions like those in the standard C++ header (formerly <ctype.h>), but operating on the full range of Unicode characters, not just for the ASCII range. They all return true if the character is a certain type of character; otherwise they return false. These classification functions are isNull() (returns true if the character is '\0'), isPrint() (true if the character is any sort of printable character, including whitespace), isPunct() (any sort of punctation), isMark() (Unicode Mark), isLetter() (a letter), isNumber() (any sort of numeric character, not just 0-9), isLetterOrNumber(), and isDigit() (decimal digits). All of these are wrappers around category() which return the Unicode-defined category of each character. Some of these also calculate the derived properties (for example isSpace() returns true if the character is of category Separator_* or an exceptional code point from Other_Control category).

QChar also provides direction(), which indicates the "natural" writing direction of this character. The joiningType() function indicates how the character joins with its neighbors (needed mostly for Arabic or Syriac) and finally hasMirrored(), which indicates whether the character needs to be mirrored when it is printed in it's "unnatural" writing direction.

Composed Unicode characters (like ring) can be converted to decomposed Unicode ("a" followed by "ring above") by using decomposition().

In Unicode, comparison is not necessarily possible and case conversion is very difficult at best. Unicode, covering the "entire" world, also includes most of the world's case and sorting problems. operator==() and friends will do comparison based purely on the numeric Unicode value (code point) of the characters, and toUpper() and toLower() will do case changes when the character has a well-defined uppercase/lowercase equivalent. For locale-dependent comparisons, use QString::localeAwareCompare().

The conversion functions include unicode() (to a scalar), toLatin1() (to scalar, but converts all non-Latin-1 characters to 0), row() (gives the Unicode row), cell() (gives the Unicode cell), digitValue() (gives the integer value of any of the numerous digit characters), and a host of constructors.

QChar provides constructors and cast operators that make it easy to convert to and from traditional 8-bit chars. If you defined QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII, as explained in the QString documentation, you will need to explicitly call fromLatin1(), or use QLatin1Char, to construct a QChar from an 8-bit char, and you will need to call toLatin1() to get the 8-bit value back.

QString

The QString class provides a Unicode character string.

QString stores a string of 16-bit QChars, where each QChar corresponds one Unicode 4.0 character. (Unicode characters with code values above 65535 are stored using surrogate pairs, i.e., two consecutive QChars.)

Unicode is an international standard that supports most of the writing systems in use today. It is a superset of US-ASCII (ANSI X3.4-1986) and Latin-1 (ISO 8859-1), and all the US-ASCII/Latin-1 characters are available at the same code positions.

Behind the scenes, QString uses implicit sharing (copy-on-write) to reduce memory usage and to avoid the needless copying of data. This also helps reduce the inherent overhead of storing 16-bit characters instead of 8-bit characters.

In addition to QString, Qt also provides the QByteArray class to store raw bytes and traditional 8-bit '\0'-terminated strings. For most purposes, QString is the class you want to use. It is used throughout the Qt API, and the Unicode support ensures that your applications will be easy to translate if you want to expand your application's market at some point. The two main cases where QByteArray is appropriate are when you need to store raw binary data, and when memory conservation is critical (like in embedded systems).

Initializing a String

One way to initialize a QString is simply to pass a const char * to its constructor. For example, the following code creates a QString of size 5 containing the data "Hello":

QString str = "Hello"; QString converts the const char * data into Unicode using the fromUtf8() function.

In all of the QString functions that take const char * parameters, the const char * is interpreted as a classic C-style '\0'-terminated string encoded in UTF-8. It is legal for the const char * parameter to be 0.

You can also provide string data as an array of QChars:

 static const QChar data[4] = { 0x0055, 0x006e, 0x10e3, 0x03a3 };
 QString str(data, 4);

QString makes a deep copy of the QChar data, so you can modify it later without experiencing side effects. (If for performance reasons you don't want to take a deep copy of the character data, use QString::fromRawData() instead.)

Another approach is to set the size of the string using resize() and to initialize the data character per character. QString uses 0-based indexes, just like C++ arrays. To access the character at a particular index position, you can use operator. On non-const strings, operator returns a reference to a character that can be used on the left side of an assignment. For example:

 QString str;
 str.resize(4);

 str[0] = QChar('U');
 str[1] = QChar('n');
 str[2] = QChar(0x10e3);
 str[3] = QChar(0x03a3);

For read-only access, an alternative syntax is to use the at() function:

 QString str;

 for (int i = 0; i < str.size(); ++i) {
      if (str.at(i) >= QChar('a') && str.at(i) <= QChar('f'))
           qDebug() << "Found character in range [a-f]";
 }

The at() function can be faster than operator, because it never causes a deep copy to occur. Alternatively, use the left(), right(), or mid() functions to extract several characters at a time.

A QString can embed '\0' characters (QChar::Null). The size() function always returns the size of the whole string, including embedded '\0' characters.

After a call to the resize() function, newly allocated characters have undefined values. To set all the characters in the string to a particular value, use the fill() function.

QString provides dozens of overloads designed to simplify string usage. For example, if you want to compare a QString with a string literal, you can write code like this and it will work as expected:

 QString str;

 if (str == "auto" || str == "extern"
          || str == "static" || str == "register") {
      // ...
 }

You can also pass string literals to functions that take QStrings as arguments, invoking the QString(const char *) constructor. Similarly, you can pass a QString to a function that takes a const char * argument using the qPrintable() macro which returns the given QString as a const char *. This is equivalent to calling .toLocal8Bit().constData().

Manipulating String Data

QString provides the following basic functions for modifying the character data: append(), prepend(), insert(), replace(), and remove(). For example:

 QString str = "and";
 str.prepend("rock ");     // str == "rock and"
 str.append(" roll");        // str == "rock and roll"
 str.replace(5, 3, "&");   // str == "rock & roll"

If you are building a QString gradually and know in advance approximately how many characters the QString will contain, you can call reserve(), asking QString to preallocate a certain amount of memory. You can also call capacity() to find out how much memory QString actually allocated.

The replace() and remove() functions' first two arguments are the position from which to start erasing and the number of characters that should be erased. If you want to replace all occurrences of a particular substring with another, use one of the two-parameter replace() overloads.

A frequent requirement is to remove whitespace characters from a string ('\n', '\t', ' ', etc.). If you want to remove whitespace from both ends of a QString, use the trimmed() function. If you want to remove whitespace from both ends and replace multiple consecutive whitespaces with a single space character within the string, use simplified().

If you want to find all occurrences of a particular character or substring in a QString, use the indexOf() or lastIndexOf() functions. The former searches forward starting from a given index position, the latter searches backward. Both return the index position of the character or substring if they find it; otherwise, they return -1. For example, here's a typical loop that finds all occurrences of a particular substring:

 QString str = "We must be <b>bold</b>, very <b>bold</b>";
 int j = 0;

 while ((j = str.indexOf("<b>", j)) != -1) {
      qDebug() << "Found <b> tag at index position" << j;
      ++j;
 }

QString provides many functions for converting numbers into strings and strings into numbers. See the arg() functions, the setNum() functions, the number() static functions, and the toInt(), toDouble(), and similar functions.

To get an upper- or lowercase version of a string use toUpper() or toLower().

Lists of strings are handled by the QStringList class. You can split a string into a list of strings using the split() function, and join a list of strings into a single string with an optional separator using QStringList::join(). You can obtain a list of strings from a string list that contain a particular substring or that match a particular QRegExp using the QStringList::filter() function.

Querying String Data If you want to see if a QString starts or ends with a particular substring use startsWith() or endsWith(). If you simply want to check whether a QString contains a particular character or substring, use the contains() function. If you want to find out how many times a particular character or substring occurs in the string, use count().

QStrings can be compared using overloaded operators such as operator<(), operator<=(), operator==(), operator>=(), and so on. Note that the comparison is based exclusively on the numeric Unicode values of the characters. It is very fast, but is not what a human would expect; the QString::localeAwareCompare() function is a better choice for sorting user-interface strings.

To obtain a pointer to the actual character data, call data() or constData(). These functions return a pointer to the beginning of the QChar data. The pointer is guaranteed to remain valid until a non-const function is called on the QString.

Converting Between 8-Bit Strings and Unicode Strings QString provides the following three functions that return a const char * version of the string as QByteArray: toUtf8(), toLatin1(), and toLocal8Bit().

toLatin1() returns a Latin-1 (ISO 8859-1) encoded 8-bit string. toUtf8() returns a UTF-8 encoded 8-bit string. UTF-8 is a superset of US-ASCII (ANSI X3.4-1986) that supports the entire Unicode character set through multibyte sequences. toLocal8Bit() returns an 8-bit string using the system's local encoding. To convert from one of these encodings, QString provides fromLatin1(), fromUtf8(), and fromLocal8Bit(). Other encodings are supported through the QTextCodec class.

As mentioned above, QString provides a lot of functions and operators that make it easy to interoperate with const char * strings. But this functionality is a double-edged sword: It makes QString more convenient to use if all strings are US-ASCII or Latin-1, but there is always the risk that an implicit conversion from or to const char * is done using the wrong 8-bit encoding. To minimize these risks, you can turn off these implicit conversions by defining the following two preprocessor symbols:

QT_NO_CAST_FROM_ASCII disables automatic conversions from C string literals and pointers to Unicode. QT_RESTRICTED_CAST_FROM_ASCII allows automatic conversions from C characters and character arrays, but disables automatic conversions from character pointers to Unicode. QT_NO_CAST_TO_ASCII disables automatic conversion from QString to C strings. One way to define these preprocessor symbols globally for your application is to add the following entry to your qmake project file:

 DEFINES += QT_NO_CAST_FROM_ASCII \
            QT_NO_CAST_TO_ASCII

You then need to explicitly call fromUtf8(), fromLatin1(), or fromLocal8Bit() to construct a QString from an 8-bit string, or use the lightweight QLatin1String class, for example:

 QString url = QLatin1String("http://www.unicode.org/");

Similarly, you must call toLatin1(), toUtf8(), or toLocal8Bit() explicitly to convert the QString to an 8-bit string. (Other encodings are supported through the QTextCodec class.)

Note for C Programmers Due to C++'s type system and the fact that QString is implicitly shared, QStrings may be treated like ints or other basic types. For example:

 QString Widget::boolToString(bool b)
 {
      QString result;
      if (b)
           result = "True";
      else
           result = "False";
      return result;
 }

The result variable, is a normal variable allocated on the stack. When return is called, and because we're returning by value, the copy constructor is called and a copy of the string is returned. No actual copying takes place thanks to the implicit sharing.

Distinction Between Null and Empty Strings For historical reasons, QString distinguishes between a null string and an empty string. A null string is a string that is initialized using QString's default constructor or by passing (const char *)0 to the constructor. An empty string is any string with size 0. A null string is always empty, but an empty string isn't necessarily null:

 QString().isNull();               // returns true
 QString().isEmpty();              // returns true

 QString("").isNull();             // returns false
 QString("").isEmpty();            // returns true

 QString("abc").isNull();          // returns false
 QString("abc").isEmpty();         // returns false

All functions except isNull() treat null strings the same as empty strings. For example, toUtf8().constData() returns a pointer to a '\0' character for a null string (not a null pointer), and QString() compares equal to QString(""). We recommend that you always use the isEmpty() function and avoid isNull().

Argument Formats

In member functions where an argument format can be specified (e.g., arg(), number()), the argument format can be one of the following:

  • e - format as [-]9.9e[+|-]999
  • E - format as [-]9.9E[+|-]999
  • f - format as [-]9.9
  • g - use e or f format, whichever is the most concise
  • G - use E or f format, whichever is the most concise

A precision is also specified with the argument format. For the 'e', 'E', and 'f' formats, the precision represents the number of digits after the decimal point. For the 'g' and 'G' formats, the precision represents the maximum number of significant digits (trailing zeroes are omitted).

More Efficient String Construction

Many strings are known at compile time. But the trivial constructor QString("Hello"), will copy the contents of the string, treating the contents as Latin-1. To avoid this one can use the QStringLiteral macro to directly create the required data at compile time. Constructing a QString out of the literal does then not cause any overhead at runtime.

A slightly less efficient way is to use QLatin1String. This class wraps a C string literal, precalculates it length at compile time and can then be used for faster comparison with QStrings and conversion to QStrings than a regular C string literal.

Using the QString '+' operator, it is easy to construct a complex string from multiple substrings. You will often write code like this:

 QString foo;
 QString type = "long";

 foo->setText(QLatin1String("vector<") + type + QLatin1String(">::iterator"));

 if (foo.startsWith("(" + type + ") 0x"))
          ...

There is nothing wrong with either of these string constructions, but there are a few hidden inefficiencies. Beginning with Qt 4.6, you can eliminate them.

First, multiple uses of the '+' operator usually means multiple memory allocations. When concatenating n substrings, where n > 2, there can be as many as n - 1 calls to the memory allocator.

In 4.6, an internal template class QStringBuilder has been added along with a few helper functions. This class is marked internal and does not appear in the documentation, because you aren't meant to instantiate it in your code. Its use will be automatic, as described below. The class is found in src/corelib/tools/qstringbuilder.cpp if you want to have a look at it.

QStringBuilder uses expression templates and reimplements the '%' operator so that when you use '%' for string concatenation instead of '+', multiple substring concatenations will be postponed until the final result is about to be assigned to a QString. At this point, the amount of memory required for the final result is known. The memory allocator is then called once to get the required space, and the substrings are copied into it one by one.

Additional efficiency is gained by inlining and reduced reference counting (the QString created from a QStringBuilder typically has a ref count of 1, whereas QString::append() needs an extra test).

There are two ways you can access this improved method of string construction. The straightforward way is to include QStringBuilder wherever you want to use it, and use the '%' operator instead of '+' when concatenating strings:

 #include <QStringBuilder>

 QString hello("hello");
 QStringRef el(&hello, 2, 3);
 QLatin1String world("world");
 QString message =  hello % el % world % QChar('!');

A more global approach which is the most convenient but not entirely source compatible, is to this define in your .pro file:

 DEFINES *= QT_USE_QSTRINGBUILDER

and the '+' will automatically be performed as the QStringBuilder '%' everywhere.

QCodec

The QTextCodec class provides conversions between text encodings.

Qt uses Unicode to store, draw and manipulate strings. In many situations you may wish to deal with data that uses a different encoding. For example, most Japanese documents are still stored in Shift-JIS or ISO 2022-JP, while Russian users often have their documents in KOI8-R or Windows-1251.

Qt provides a set of QTextCodec classes to help with converting non-Unicode formats to and from Unicode. You can also create your own codec classes.

The supported encodings are:

Big5
Big5-HKSCS
CP949
EUC-JP
EUC-KR
GB18030
HP-ROMAN8
IBM 850
IBM 866
IBM 874
ISO 2022-JP
ISO 8859-1 to 10
ISO 8859-13 to 16
Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml
KOI8-R
KOI8-U
Macintosh
Shift-JIS
TIS-620
TSCII
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
Windows-1250 to 1258

If Qt is compiled with ICU support enabled, most codecs supported by ICU will also be available to the application.

QTextCodecs can be used as follows to convert some locally encoded string to Unicode. Suppose you have some string encoded in Russian KOI8-R encoding, and want to convert it to Unicode. The simple way to do it is like this:

 QByteArray encodedString = "...";
 QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
 QString string = codec->toUnicode(encodedString);

After this, string holds the text converted to Unicode. Converting a string from Unicode to the local encoding is just as easy:

 QString string = "...";
 QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
 QByteArray encodedString = codec->fromUnicode(string);

To read or write files in various encodings, use QTextStream and its setCodec() function. See the Codecs example for an application of QTextCodec to file I/O.

Some care must be taken when trying to convert the data in chunks, for example, when receiving it over a network. In such cases it is possible that a multi-byte character will be split over two chunks. At best this might result in the loss of a character and at worst cause the entire conversion to fail.

The approach to use in these situations is to create a QTextDecoder object for the codec and use this QTextDecoder for the whole decoding process, as shown below:

 QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
 QTextDecoder *decoder = codec->makeDecoder();

 QString string;
 while (new_data_available()) {
      QByteArray chunk = get_new_data();
      string += decoder->toUnicode(chunk);
 }
 delete decoder;

The QTextDecoder object maintains state between chunks and therefore works correctly even if a multi-byte character is split between chunks.

If you need to create your own codec class, you can check out the instructions for doing so at http://doc.qt.io/qt-5/qtextcodec.html#creating-your-own-codec-class.

QString vs QByteArray

QByteArray

http://doc.qt.io/qt-5/qbytearray.html

The QByteArray class provides an array of bytes.

QByteArray can be used to store both raw bytes (including '\0's) and traditional 8-bit '\0'-terminated strings. Using QByteArray is much more convenient than using const char *. Behind the scenes, it always ensures that the data is followed by a '\0' terminator, and uses implicit sharing (copy-on-write) to reduce memory usage and avoid needless copying of data.

In addition to QByteArray, Qt also provides the QString class to store string data. For most purposes, QString is the class you want to use. It stores 16-bit Unicode characters, making it easy to store non-ASCII/non-Latin-1 characters in your application. Furthermore, QString is used throughout in the Qt API. The two main cases where QByteArray is appropriate are when you need to store raw binary data, and when memory conservation is critical (e.g., with Qt for Embedded Linux).

One way to initialize a QByteArray is simply to pass a const char * to its constructor. For example, the following code creates a byte array of size 5 containing the data "Hello":

 QByteArray ba("Hello");

Although the size() is 5, the byte array also maintains an extra '\0' character at the end so that if a function is used that asks for a pointer to the underlying data (e.g. a call to data()), the data pointed to is guaranteed to be '\0'-terminated.

QByteArray makes a deep copy of the const char * data, so you can modify it later without experiencing side effects. (If for performance reasons you don't want to take a deep copy of the character data, use QByteArray::fromRawData() instead.)

Another approach is to set the size of the array using resize() and to initialize the data byte per byte. QByteArray uses 0-based indexes, just like C++ arrays. To access the byte at a particular index position, you can use operator. On non-const byte arrays, operator returns a reference to a byte that can be used on the left side of an assignment. For example:

 QByteArray ba;
 ba.resize(5);
 ba[0] = 0x3c;
 ba[1] = 0xb8;
 ba[2] = 0x64;
 ba[3] = 0x18;
 ba[4] = 0xca;

For read-only access, an alternative syntax is to use at():

 for (int i = 0; i < ba.size(); ++i) {
      if (ba.at(i) >= 'a' && ba.at(i) <= 'f')
           cout << "Found character in range [a-f]" << endl;
 }

at() can be faster than operator, because it never causes a deep copy to occur.

To extract many bytes at a time, use left(), right(), or mid().

A QByteArray can embed '\0' bytes. The size() function always returns the size of the whole array, including embedded '\0' bytes, but excluding the terminating '\0' added by QByteArray. For example:

 QByteArray ba1("ca\0r\0t");
 ba1.size();                     // Returns 2.
 ba1.constData();                // Returns "ca" with terminating \0.

 QByteArray ba2("ca\0r\0t", 3);
 ba2.size();                     // Returns 3.
 ba2.constData();                // Returns "ca\0" with terminating \0.

 QByteArray ba3("ca\0r\0t", 4);
 ba3.size();                     // Returns 4.
 ba3.constData();                // Returns "ca\0r" with terminating \0.

 const char cart[] = {'c', 'a', '\0', 'r', '\0', 't'};
 QByteArray ba4(QByteArray::fromRawData(cart, 6));
 ba4.size();                     // Returns 6.
 ba4.constData();                // Returns "ca\0r\0t" without terminating \0.

If you want to obtain the length of the data up to and excluding the first '\0' character, call qstrlen() on the byte array.

After a call to resize(), newly allocated bytes have undefined values. To set all the bytes to a particular value, call fill().

To obtain a pointer to the actual character data, call data() or constData(). These functions return a pointer to the beginning of the data. The pointer is guaranteed to remain valid until a non-const function is called on the QByteArray. It is also guaranteed that the data ends with a '\0' byte unless the QByteArray was created from a raw data. This '\0' byte is automatically provided by QByteArray and is not counted in size().

QByteArray provides the following basic functions for modifying the byte data: append(), prepend(), insert(), replace(), and remove(). For example:

 QByteArray x("and");
 x.prepend("rock ");         // x == "rock and"
 x.append(" roll");          // x == "rock and roll"
 x.replace(5, 3, "&");       // x == "rock & roll"

The replace() and remove() functions' first two arguments are the position from which to start erasing and the number of bytes that should be erased.

When you append() data to a non-empty array, the array will be reallocated and the new data copied to it. You can avoid this behavior by calling reserve(), which preallocates a certain amount of memory. You can also call capacity() to find out how much memory QByteArray actually allocated. Data appended to an empty array is not copied.

A frequent requirement is to remove whitespace characters from a byte array ('\n', '\t', ' ', etc.). If you want to remove whitespace from both ends of a QByteArray, use trimmed(). If you want to remove whitespace from both ends and replace multiple consecutive whitespaces with a single space character within the byte array, use simplified().

If you want to find all occurrences of a particular character or substring in a QByteArray, use indexOf() or lastIndexOf(). The former searches forward starting from a given index position, the latter searches backward. Both return the index position of the character or substring if they find it; otherwise, they return -1. For example, here's a typical loop that finds all occurrences of a particular substring:

 QByteArray ba("We must be <b>bold</b>, very <b>bold</b>");
 int j = 0;
 while ((j = ba.indexOf("<b>", j)) != -1) {
      cout << "Found <b> tag at index position " << j << endl;
      ++j;
 }

If you simply want to check whether a QByteArray contains a particular character or substring, use contains(). If you want to find out how many times a particular character or substring occurs in the byte array, use count(). If you want to replace all occurrences of a particular value with another, use one of the two-parameter replace() overloads.

QByteArrays can be compared using overloaded operators such as operator<(), operator<=(), operator==(), operator>=(), and so on. The comparison is based exclusively on the numeric values of the characters and is very fast, but is not what a human would expect. QString::localeAwareCompare() is a better choice for sorting user-interface strings.

For historical reasons, QByteArray distinguishes between a null byte array and an empty byte array. A null byte array is a byte array that is initialized using QByteArray's default constructor or by passing (const char *)0 to the constructor. An empty byte array is any byte array with size 0. A null byte array is always empty, but an empty byte array isn't necessarily null:

 QByteArray().isNull();          // returns true
 QByteArray().isEmpty();         // returns true

 QByteArray("").isNull();        // returns false
 QByteArray("").isEmpty();       // returns true

 QByteArray("abc").isNull();     // returns false
 QByteArray("abc").isEmpty();    // returns false

All functions except isNull() treat null byte arrays the same as empty byte arrays. For example, data() returns a pointer to a '\0' character for a null byte array (not a null pointer), and QByteArray() compares equal to QByteArray(""). We recommend that you always use isEmpty() and avoid isNull().

Functions that perform conversions between numeric data types and strings are performed in the C locale, irrespective of the user's locale settings. Use QString to perform locale-aware conversions between numbers and strings.

In QByteArray, the notion of uppercase and lowercase and of which character is greater than or less than another character is locale dependent. This affects functions that support a case insensitive option or that compare or lowercase or uppercase their arguments. Case insensitive operations and comparisons will be accurate if both strings contain only ASCII characters. (If $LC_CTYPE is set, most Unix systems do "the right thing".) Functions that this affects include contains(), indexOf(), lastIndexOf(), operator<(), operator<=(), operator>(), operator>=(), toLower() and toUpper().

This issue does not apply to QStrings since they represent characters using Unicode.

std::string/c string vs QString

http://2016-aalto-c.mooc.fi/fi/Module_2/index.html#04_strings

Comparing and manipulating strings efficiently

http://doc.qt.io/qt-5/qlatin1string.html
http://doc.qt.io/qt-5/qstringref.html
https://doc.qt.io/qt-5/qstringview.html

QLatin1String

The QLatin1String class provides a thin wrapper around an US-ASCII/Latin-1 encoded string literal.

Many of QString's member functions are overloaded to accept const char * instead of QString. This includes the copy constructor, the assignment operator, the comparison operators, and various other functions such as insert(), replace(), and indexOf(). These functions are usually optimized to avoid constructing a QString object for the const char * data. For example, assuming str is a QString,

if (str == "auto" || str == "extern"
          || str == "static" || str == "register") {
      ...
 }

is much faster than

 if (str == QString("auto") || str == QString("extern")
          || str == QString("static") || str == QString("register")) {
      ...
 }

because it doesn't construct four temporary QString objects and make a deep copy of the character data.

Applications that define QT_NO_CAST_FROM_ASCII (as explained in the QString documentation) don't have access to QString's const char * API. To provide an efficient way of specifying constant Latin-1 strings, Qt provides the QLatin1String, which is just a very thin wrapper around a const char *. Using QLatin1String, the example code above becomes

 if (str == QLatin1String("auto")
          || str == QLatin1String("extern")
          || str == QLatin1String("static")
          || str == QLatin1String("register") {
      ...
  }

This is a bit longer to type, but it provides exactly the same benefits as the first version of the code, and is faster than converting the Latin-1 strings using QString::fromLatin1().

Thanks to the QString(QLatin1String) constructor, QLatin1String can be used everywhere a QString is expected. For example:

 QLabel *label = new QLabel(QLatin1String("MOD"), this);

Note: If the function you're calling with a QLatin1String argument isn't actually overloaded to take QLatin1String, the implicit conversion to QString will trigger a memory allocation, which is usually what you want to avoid by using QLatin1String in the first place. In those cases, using QStringLiteral may be the better option.

QStringRef

Value types in Qt

http://doc.qt.io/qt-5/custom-types.html


Exhaustive reference material mentioned in this topic

Further reading topics/links:

Clone this wiki locally