Chromium String usage

Types of Strings
In the Chromium code base, we use std::string and string16. WebKit uses WTF::String instead, which is patterned on std::string but is a slightly different class (see the WebKit docs for their guidelines; we’ll only talk about Chromium here). We also have a StringPiece class, which is basically a pointer to a string that is owned elsewhere, plus a length saying how many characters of that string form this “token”. Finally, there are also WebCString and WebString, which are used by the WebKit glue layer.
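
To make the "pointer plus length" idea concrete, here is a minimal sketch of a non-owning view over part of a string. TokenView and MakeToken are hypothetical names invented for this illustration; they are not Chromium's StringPiece and only mirror the idea described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a string-piece type: a non-owning
// (pointer, length) pair that refers to characters owned elsewhere.
struct TokenView {
  const char* data;
  std::size_t size;

  // Copies the viewed characters into an owning std::string.
  std::string ToString() const { return std::string(data, size); }
};

// Views `length` characters of `owner` starting at `offset`, without copying.
// The view is only valid while `owner` is alive and unmodified.
inline TokenView MakeToken(const std::string& owner, std::size_t offset,
                           std::size_t length) {
  assert(offset <= owner.size() && length <= owner.size() - offset);
  return TokenView{owner.data() + offset, length};
}
```

Creating a token costs no allocation. The trade-off is that the caller must keep the owning string alive for as long as the token is in use.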

String Encodings
We use a variety of encodings in the code base. UTF-8 is most common, but we also use UTF-16, UCS-2, and others.

  • UTF-8 is an encoding where characters are one or more bytes in length (up to 4 in the current standard; the original design allowed up to 6). The first byte of a character indicates how many bytes follow. ASCII text (common in HTML, CSS, and JavaScript) uses one byte per character.
  • UTF-16 is an encoding where all characters are at least 2 bytes long. There are also 4 byte UTF-16 characters (a pair of two 16-bit code units, called a surrogate pair). While they are somewhat rare, 4 byte characters can occur in Chinese, not just in languages like ancient Sumerian and Linear B. Most emoji characters are also represented in 4 bytes.
  • UCS-2 is an older format that is very similar to UTF-16 (think of UTF-16 with 2 byte characters only, no 4 byte characters).
  • ASCII is the older 7-bit encoding which includes 0-9, a-z, A-Z, and a few common punctuation characters, but not much else. ASCII is always one byte per character.
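
The byte counts above can be sketched in a few lines of standard C++. Utf8SequenceLength and EncodeUtf16 are hypothetical helper names for this illustration, not Chromium functions, and the UTF-8 check looks only at the bit pattern of the first byte rather than validating the whole sequence.

```cpp
#include <cassert>
#include <string>

// Number of bytes in the UTF-8 sequence introduced by `lead`.
// Returns 0 for a continuation byte (10xxxxxx) or a byte that matches no
// lead pattern. Only the bit pattern is checked; nothing is validated.
inline int Utf8SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Encodes one Unicode code point as UTF-16: a single 16-bit code unit up to
// U+FFFF, or a surrogate pair (two code units, 4 bytes) above that.
inline std::u16string EncodeUtf16(char32_t code_point) {
  assert(code_point <= 0x10FFFF);
  assert(code_point < 0xD800 || code_point > 0xDFFF);  // not a surrogate
  std::u16string out;
  if (code_point < 0x10000) {
    out.push_back(static_cast<char16_t>(code_point));
  } else {
    const char32_t v = code_point - 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
    out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
  }
  return out;
}
```

For example, the emoji U+1F600 comes out as the two code units 0xD83D and 0xDE00, while the letter "A" stays a single code unit.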

When to use which encoding
The most important rule here is the meta-rule: code in the style of the surrounding code. In the frontend, we use std::string/char for UTF-8 and string16/char16 for UTF-16 on all platforms. Even though std::string is encoding agnostic, we only put UTF-8 into it. std::wstring/wchar_t is banned in cross-platform code (in part because it's differently-sized on different platforms), and only allowed in Windows-specific code where appropriate to interface with native APIs (which often take wchar_t* or similar). Most UI strings are UTF-16. URLs are generally UTF-8. Strings in the WebKit glue layer are typically UTF-16, with several exceptions.

The GURL class and strings
One common data type using strings is the GURL class. The constructor takes a std::string in UTF-8 for the URL itself. If you have a GURL, you can use the spec() method to get the std::string for the entire URL, or you can use component methods to get parsed parts, such as scheme(), host(), port(), path(), query(), and ref(), all of which return a std::string. All the parts of the GURL, with the exception of the ref string, will be pure ASCII; the ref string may contain UTF-8 characters that are not ASCII.

Guidelines for string use in our codebase

  • Use std::string from the C++ standard library for normal use with strings
  • Length checking - if checking for empty, prefer “string.empty()” to “string.length() == 0”
  • When you make a string constant at the top of the file, use char[] instead of a std::string:
    • ex) const char kFoo[] = "foo";
    • This is part of our style guidelines. It also makes faster code because there are no destructors, and more maintainable code because there are no shutdown order dependencies.
  • There are many handy routines which operate on strings. You can use IntToString() if you want the equivalent of itoa(), and StringPrintf() if you need the full power of printf. You can use WriteInto() to make a C++ string writeable by a C API. StringPiece makes it easy and efficient to write functions that take both C++ and C style strings.
  • For function input parameters, prefer to pass a string by const reference instead of making a new copy.
  • For function output parameters, it is OK to either return a new string or pass a pointer to a string. Performance-wise, there isn’t much difference.
  • Often, efficiency is not paramount, but sometimes it is - when working in an inner loop, pay special attention to minimize the amount of string construction, and the number of temporary copies made.
    • When you use std::string, you can end up constructing lots of temporary string objects if you aren’t careful, or copying the string lots of times. Each copy makes a call to malloc, which needs a lock, and slows things down. Try to minimize how many temporaries get constructed.
    • When building a string, prefer “string1 += string2; string1 += string3;” to “string1 = string1 + string2 + string3;”  Better still, if you are doing lots of this, consider a string builder class.
  • For localization, we have the ICU library, with many useful helpers to do things like find word boundaries or convert to lowercase or uppercase correctly for the current locale.
  • We try to avoid repeated conversions between string encoding formats, as converting them is not cheap. It's generally OK to convert once, but if we have code that toggles the encoding six times as a string goes through some pipeline, that should be fixed.
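
Several of these guidelines can be combined in one short sketch: a char[] constant, an input passed by const reference, and a result built with += rather than chained +. JoinParts and kSeparator are hypothetical names for this illustration, not Chromium APIs, and the code uses only the C++ standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// File-level constant as a char array: no constructor or destructor runs,
// so there is no startup or shutdown order to worry about.
const char kSeparator[] = ", ";

// The input is passed by const reference, so the caller's vector is not copied.
inline std::string JoinParts(const std::vector<std::string>& parts) {
  const std::size_t separator_length = sizeof(kSeparator) - 1;  // drop the NUL
  std::size_t total = 0;
  for (const std::string& part : parts) {
    total += part.size() + separator_length;  // slight overestimate
  }

  std::string result;
  result.reserve(total);  // reserve once instead of growing repeatedly
  bool first = true;
  for (const std::string& part : parts) {
    if (!first) result += kSeparator;  // += appends in place, no temporary
    result += part;
    first = false;
  }
  return result;
}
```

Writing the loop body as "result = result + kSeparator + part" would build a temporary string on every iteration, which is the pattern the guidelines above suggest avoiding in hot code.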
