C++ Basics (9): Using and Implementing the string Class

_OP_CHEN

Published 2026-01-14 09:47:51


Collected in the column: C++

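Preface

Strings are among the most commonly used data types in C++ programs. C handles strings with character arrays and the str family of library functions, which makes memory management tedious and mistakes easy. The string class in the C++ standard library wraps string operations in an object-oriented interface, which simplifies code and removes many of those error sources. This article starts with why string is worth learning, walks through its common interfaces and the differences between its underlying implementations, and then examines the core issues that arise when implementing it yourself, such as shallow and deep copies. Let's get started.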

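1. Why learn the C++ string class?

Before looking at string, let's review how C handles strings; the comparison shows what string improves.

1.1 Limitations of C strings

In C, a string is a character array terminated by '\0', for example char str[] = "hello";. To operate on strings, the C standard library provides functions such as strlen (length), strcpy (copy), strcat (concatenate), and strcmp (compare), but these have clear drawbacks:

Separated from the data: the library functions and the character array are independent, which does not fit the object-oriented idea of packaging data with its operations. For example, strcpy needs the array addresses passed in by hand; there is no "object.method" form.
Tedious memory management: when the length is not known in advance, the user must allocate the storage (malloc) and release it (free), and a small slip causes a memory leak. Growing a string means computing the new size, allocating memory, copying the data, and freeing the old block.
Low safety: there is no bounds checking, so out-of-bounds access is easy. For example, if the destination array is too small, strcat writes past its end and overwrites adjacent memory, which is undefined behavior and may crash the program; strcpy overflows in the same way when the source string is longer than the destination can hold.

These limitations make string handling in C laborious and bug-prone. The C++ string class addresses most of them.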

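1.2 Advantages of the string class

string is one of the core classes of the C++ standard library. It packages the string's data (the character sequence) together with its operations (insert, erase, modify, search) and offers these advantages:

No manual memory management: string allocates and releases its storage internally, so the user never calls malloc/free. This removes leaks and the buffer overflows caused by undersized destination arrays (indexing with operator[] is still unchecked; see section 3.3).
A rich, easy interface: many member functions (size for the length, append for concatenation, find for searching) and overloaded operators (+=, ==) are available directly as object.method, which keeps code short.
Good interoperability: it converts to a C string through c_str() and works with newer C++11 features such as range-based for and auto.
Widely used in practice: string is the default choice for string handling in online-judge (OJ) problems and in production code. LeetCode problems such as "Add Strings" and "Longest Palindromic Substring" use string for input and output, and it speeds up everyday work on configuration files and log messages.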
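To make the contrast concrete, here is a minimal sketch that joins two words both ways. The function names joinCStyle and joinWithString are illustrative only and do not come from any library.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// C style: the caller sizes the buffer by hand
// (both parts + one space + the terminating '\0') and must free it.
std::string joinCStyle(const char* a, const char* b) {
    std::size_t len = std::strlen(a) + 1 + std::strlen(b) + 1;
    char* buf = new char[len];
    std::strcpy(buf, a);
    std::strcat(buf, " ");
    std::strcat(buf, b);
    std::string result(buf);  // copy out so the buffer can be released here
    delete[] buf;
    return result;
}

// std::string: allocation, growth and release are handled by the class.
std::string joinWithString(const std::string& a, const std::string& b) {
    return a + " " + b;
}
```

Both functions return the same text for joinCStyle("hello", "world") and joinWithString("hello", "world"); only the first one can overflow if the length computation is wrong.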

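2. Two C++11 features: auto and range-based for

Before covering the string interface, here are two C++11 features, auto and range-based for. They simplify traversing a string and declaring variables, and the later examples use them often.

2.1 The auto keyword: automatic type deduction

In C++11, auto changed from a storage class specifier (the default for local variables) into a type placeholder: the compiler deduces the variable's type from its initializer. This is especially useful for long type names such as iterators.

The core rules of auto are:

1. An initializer is required: a variable declared with auto must be initialized, otherwise the compiler has nothing to deduce from. For example:

auto a; // error: no initializer, the type cannot be deduced
auto b = 10; // OK: b is int
auto c = 'a'; // OK: c is char
auto d = string("hello"); // OK: d is string

2. Pointers versus references:

For pointers, auto and auto* behave the same (the compiler recognizes the pointer type);
For references, & must be written explicitly; otherwise a value type is deduced.

int x = 10;
auto y = &x; // y is int* (pointer)
auto* z = &x; // z is also int* (same as auto)
auto& m = x; // m is int& (reference; modifying m changes x)

3. All variables in one declaration must deduce to the same type: the compiler deduces a type from each initializer and rejects the declaration if the results differ. For example:

auto a = 1, b = 2; // OK: a and b are both int
auto c = 3, d = 4.0; // error: c deduces int, d deduces double

4. Not allowed for function parameters (before C++20) or array declarations:

In C++11, auto cannot be a function parameter type (C++14 allows it for lambda parameters, and C++20 allows it for ordinary functions as an abbreviated function template);
auto cannot declare an array directly (an array type needs an explicit element type).

// Error before C++20: auto as a function parameter
void func(auto a) {} 
// Error: auto cannot declare an array
auto arr[] = {1, 2, 3};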
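Rule 2 is the one that causes surprises in practice, so here is a small sketch that checks it at compile time with std::is_same. The function name autoCopyIsIndependent is made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Shows that plain auto drops the reference, while auto& keeps it.
bool autoCopyIsIndependent() {
    int x = 10;
    int& ref = x;

    auto copy = ref;    // deduced as int: a separate copy
    auto& alias = ref;  // deduced as int&: another name for x

    static_assert(std::is_same<decltype(copy), int>::value, "auto drops the reference");
    static_assert(std::is_same<decltype(alias), int&>::value, "auto& keeps the reference");

    copy = 20;                  // does not touch x
    bool unchanged = (x == 10);
    alias = 30;                 // writes through to x
    return unchanged && x == 30;
}
```

The same pitfall appears in range-based for loops: for (auto e : s) works on copies, while for (auto& e : s) modifies the string itself.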

Iterator types of string and the STL containers are often long (for example map<string, string>::iterator); auto shortens such declarations considerably:

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main() {
    map<string, string> dict = {{"apple", "苹果"}, {"orange", "橙子"}};
    // Traditional form: the type name is long
    map<string, string>::iterator it1 = dict.begin();
    // With auto: concise
    auto it2 = dict.begin();
    // Traverse the map
    while (it2 != dict.end()) {
        cout << it2->first << ":" << it2->second << endl;
        ++it2;
    }
    return 0;
}

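2.2 Range-based for: simpler traversal

The range-based for loop introduced in C++11 traverses a collection with a known range (arrays, string, STL containers) without manual index or iterator handling. Its syntax is:

for (loop variable : collection) {
    // loop body
}

The core rules of range-based for are:

Automatic iteration: the loop visits every element from first to last; no end condition is written by hand.
Type of the loop variable:
- To read elements only, declare it by value (auto e), or by const auto& e to avoid copying large elements;
- To modify elements, declare it as a reference (auto& e); otherwise only a temporary copy is modified.
Where it applies: arrays, string, vector, list and the other STL containers. It does not work on a bare pointer, which carries no range information.

The following example traverses a string with a traditional for loop and with range-based for: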

#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str = "hello world";

    // Traditional for loop: the index is managed by hand
    for (size_t i = 0; i < str.size(); ++i) {
        cout << str[i] << " ";
    }
    cout << endl;

    // Range-based for: traverses automatically
    for (auto ch : str) { // read-only access (by value)
        cout << ch << " ";
    }
    cout << endl;

    // Range-based for that modifies elements (by reference)
    for (auto& ch : str) {
        ch = static_cast<char>(toupper(static_cast<unsigned char>(ch))); // to uppercase
    }
    cout << str << endl; // Output: HELLO WORLD
    return 0;
}

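Range-based for is essentially syntactic sugar over iterators: the compiler rewrites it into "initialize the iterators, test for the end, dereference, increment", which can be observed in the generated assembly.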
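The rewrite can be sketched by hand. The two functions below (names chosen for this illustration) should behave identically; the first spells out roughly what the compiler generates for the second.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hand-written equivalent of the range-based loop below.
std::string upperWithIterators(std::string s) {
    for (std::string::iterator it = s.begin(), last = s.end(); it != last; ++it) {
        char& ch = *it;  // corresponds to "auto& ch" in the range-based form
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}

// The same loop written with range-based for.
std::string upperWithRangeFor(std::string s) {
    for (auto& ch : s) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}
```

Because the loop is built on begin() and end(), any class that provides those two functions can be used in a range-based for, including a hand-written String.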

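3. Common interfaces of the standard string class

string has a very large interface. This article focuses on the most frequently used parts, in the order construction, capacity, access, modification.

3.1 Constructors

Constructors create string objects. The four most common forms are:

Constructor prototype	Description
string()	Default constructor; creates an empty string (length 0)
string(const char* s)	Constructs from a C-style string such as "hello"
string(size_t n, char c)	Creates a string of n copies of c (for example 3, 'a' gives "aaa")
string(const string& s)	Copy constructor; creates a new object from an existing string s

The following code demonstrates them: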

#include <iostream>
#include <string>
using namespace std;

void TestStringConstructor() {
    // 1. Default constructor: empty string
    string s1;
    cout << "s1: " << s1 << " (size: " << s1.size() << ")" << endl; // Output: s1:  (size: 0)
    
    // 2. From a C-style string
    string s2("hello bit");
    cout << "s2: " << s2 << " (size: " << s2.size() << ")" << endl; // Output: s2: hello bit (size: 9)
    
    // 3. n copies of character c
    string s3(5, 'x');
    cout << "s3: " << s3 << " (size: " << s3.size() << ")" << endl; // Output: s3: xxxxx (size: 5)
    
    // 4. Copy constructor
    string s4(s2);
    cout << "s4: " << s4 << " (size: " << s4.size() << ")" << endl; // Output: s4: hello bit (size: 9)
}

int main() {
    TestStringConstructor();
    return 0;
}

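3.2 Capacity operations

Capacity operations report a string's length and storage size, or adjust the storage. The common ones are:

Member function	Description
size_t size()	Returns the number of valid characters (excluding the terminating '\0'); identical to length(). Preferred because it matches the other containers
size_t length()	Equivalent to size(); the older, string-specific name (size() was added for consistency with the STL containers)
size_t capacity()	Returns how many characters the current storage can hold without reallocating (excluding the terminating '\0'), unused space included
bool empty()	Tests whether the string is empty (0 valid characters); returns true if so, otherwise false
void clear()	Removes all valid characters (size becomes 0); mainstream implementations keep the storage (capacity unchanged)
void reserve(size_t n)	Reserves room for at least n characters; never changes size. Since C++20 it never shrinks; before that, a smaller n was a non-binding shrink request
void resize(size_t n, char c)	Sets the number of valid characters to n: if n > size, the new positions are filled with c (default '\0'); if n < size, the string is truncated and capacity stays the same

Notes

size versus capacity:
- size: the number of valid characters actually stored (5 for "hello");
- capacity: the number of characters the allocated storage can hold (for "hello" it may be 15). The spare room reduces the cost of later growth.
clear() does not release storage: after string s("hello"); s.clear();, s.size() is 0 but s.capacity() is still at least 5 in mainstream implementations; the underlying buffer is not freed.
How reserve grows:
- If n > capacity: the capacity grows to at least n. Implementations may round up, and when a string grows through appends, VS typically grows by about 1.5x and GCC by 2x;
- If n ≤ capacity: nothing happens in C++20 and in mainstream implementations (before C++20 the standard allowed, but did not require, shrinking).
resize versus reserve:
- resize changes size (the number of valid characters) and may grow the storage;
- reserve changes only capacity (the storage) and never size.

An example of the capacity operations: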

#include <iostream>
#include <string>
using namespace std;

void TestStringCapacity() {
    string s("hello");
    cout << "Initial state:" << endl;
    cout << "size: " << s.size() << ", capacity: " << s.capacity() << endl; // Output: size: 5, capacity: 15 (VS)
    
    // 1. empty(): test for emptiness
    cout << "Is empty: " << (s.empty() ? "yes" : "no") << endl; // Output: no
    
    // 2. clear(): remove the valid characters
    s.clear();
    cout << "After clear:" << endl;
    cout << "size: " << s.size() << ", capacity: " << s.capacity() << endl; // Output: size: 0, capacity: 15 (unchanged)
    
    // 3. reserve(): reserve storage
    s.reserve(20);
    cout << "After reserve(20):" << endl;
    cout << "size: " << s.size() << ", capacity: " << s.capacity() << endl; // Output: size: 0, capacity >= 20 (implementations round up, typically 31 on VS)
    
    // 4. resize(): change the number of valid characters
    s.resize(10, 'a'); // fill with 'a'; size becomes 10
    cout << "After resize(10, 'a'):" << endl;
    cout << "s: " << s << ", size: " << s.size() << ", capacity: " << s.capacity() << endl; // Output: s: aaaaaaaaaa, size: 10, capacity unchanged
    
    s.resize(5); // truncate to 5 characters
    cout << "After resize(5):" << endl;
    cout << "s: " << s << ", size: " << s.size() << ", capacity: " << s.capacity() << endl; // Output: s: aaaaa, size: 5, capacity unchanged
}

int main() {
    TestStringCapacity();
    return 0;
}
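The benefit of reserve can be measured by counting how often the capacity changes while characters are appended. This is a small sketch; countCapacityChanges is a hypothetical helper written for this article, and the exact count without reserve depends on the implementation's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n characters one by one and counts how many times capacity() changed.
// Every change corresponds to a reallocation plus a copy of the existing data.
std::size_t countCapacityChanges(std::size_t n, bool reserveFirst) {
    std::string s;
    if (reserveFirst) {
        s.reserve(n);  // one allocation up front
    }
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last) {
            ++changes;
            last = s.capacity();
        }
    }
    return changes;
}
```

With reserveFirst set to true the function returns 0 for any n, because the capacity is already at least n. Without it, the result is a small number that grows roughly with the logarithm of n.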

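3.3 Access and traversal

string offers several ways to access characters and traverse the string:

Member function / operator	Description
char& operator[](size_t pos)	Accesses the character at pos (read/write); if pos > size(), the behavior is undefined (VS debug builds trigger an assertion)
const char& operator[](size_t pos) const	Access for const objects (read-only)
iterator begin()	Returns an iterator to the first character
iterator end()	Returns an iterator to the position after the last character (marks the end of traversal)
reverse_iterator rbegin()	Returns a reverse iterator to the last character (for backward traversal)
reverse_iterator rend()	Returns a reverse iterator to the position before the first character
Range-based for	C++11 feature; traverses every character (built on iterators)

Notes

Using iterators:
- begin() and end() form a half-open range [begin, end); the loop condition is it != end();
- rbegin() refers to the last character and rend() to the position before the first, so traversal runs from back to front.
Bounds checking in operator[]: operator[] performs no bounds check (speed first). Reading s[size()] yields the terminating '\0' since C++11, but any pos > size() is undefined behavior. For checked access use at(pos), which throws std::out_of_range when pos >= size().

Example: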

#include <iostream>
#include <string>
using namespace std;

void TestStringAccess() {
    string s("hello world");
    
    // 1. operator[]: access a single character
    cout << "Third character (index 2): " << s[2] << endl; // Output: l
    s[2] = 'L'; // modify the character
    cout << "After modification: " << s << endl; // Output: heLlo world
    
    // 2. Forward traversal with an iterator
    cout << "Forward iterator: ";
    string::iterator it = s.begin();
    while (it != s.end()) {
        cout << *it << " ";
        ++it;
    }
    cout << endl; // Output: h e L l o   w o r l d 
    
    // 3. Backward traversal with a reverse iterator
    cout << "Reverse iterator: ";
    string::reverse_iterator rit = s.rbegin();
    while (rit != s.rend()) {
        cout << *rit << " ";
        ++rit;
    }
    cout << endl; // Output: d l r o w   o l L e h 
    
    // 4. Range-based for (C++11)
    cout << "Range-based for: ";
    for (auto ch : s) {
        cout << ch << " ";
    }
    cout << endl; // Output: h e L l o   w o r l d 
}

int main() {
    TestStringAccess();
    return 0;
}
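The difference between operator[] and at() is easiest to see with an index that is out of range. The helper below is a sketch; the name charAtOr is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at pos, or fallback when pos is out of range.
// at() throws std::out_of_range for pos >= size(); operator[] would not check.
char charAtOr(const std::string& s, std::size_t pos, char fallback) {
    try {
        return s.at(pos);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

For example, charAtOr("hello", 1, '?') returns 'e', while charAtOr("hello", 5, '?') returns '?' because index 5 is one past the last character.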

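3.4 Modification operations

This group covers appending, concatenation, searching and taking substrings, and is among the most heavily used parts of string:

Member function / operator	Description
void push_back(char c)	Appends a single character c
string& append(const string& str)	Appends the string str (overloads also accept a C-style string or n copies of c)
string& operator+=(const string& str)	Overloaded +=; appends a string or a character (the most concise form, recommended)
const char* c_str() const	Returns a C-style string (terminated by '\0') for use with C interfaces such as printf
size_t find(const string& str, size_t pos=0) const	Searches forward from pos for str and returns the start index of the first occurrence; returns string::npos if not found (the largest size_t value, i.e. size_t(-1))
size_t rfind(const string& str, size_t pos=npos) const	Searches backward for str, considering only matches that start at or before pos; returns the start index of the last such occurrence, or string::npos if there is none
string substr(size_t pos=0, size_t len=npos) const	Returns a new string of len characters starting at pos; if len is omitted, takes everything up to the end

Notes

Appending:
- push_back(c): appends exactly one character;
- append(str): appends a string, so the characters must be copied;
- operator+=: accepts a single character (s += 'a') or a string (s += "abc"); it is the most readable form and a good default.
find and npos: string::npos is a static constant of type size_t that means "not found". Store the result in a size_t and compare it with string::npos. Comparing against -1 happens to work, because -1 converts to the largest size_t value, but it is obscure and triggers sign-comparison warnings; storing the result in a narrower type such as unsigned int breaks the check on 64-bit platforms.
substr: if pos > size(), it throws std::out_of_range (pos == size() yields an empty string); if len exceeds the remaining characters, the result simply stops at the end.

Example: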

#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

void TestStringModify() {
    string s("hello");
    
    // 1. Appending
    s.push_back(' '); // append a space: "hello "
    s.append("world"); // append a string: "hello world"
    s += "!!!"; // operator+=: "hello world!!!"
    cout << "After appending: " << s << endl; // Output: hello world!!!
    
    // 2. c_str(): convert to a C-style string
    printf("C-style output: %s\n", s.c_str()); // Output: hello world!!! (printf needs a C-style string)
    
    // 3. find(): search
    size_t pos1 = s.find("world");
    if (pos1 != string::npos) {
        cout << "Start index of \"world\": " << pos1 << endl; // Output: 6
    }
    
    size_t pos2 = s.find("test");
    if (pos2 == string::npos) {
        cout << "\"test\" not found" << endl; // Output: "test" not found
    }
    
    // 4. rfind(): search from the back (this overload takes a single char)
    size_t pos3 = s.rfind('l');
    cout << "Index of the last 'l': " << pos3 << endl; // Output: 9 (the last 'l' in "hello world!!!")
    
    // 5. substr(): take a substring
    string sub1 = s.substr(6, 5); // 5 characters starting at index 6
    cout << "sub1: " << sub1 << endl; // Output: world
    
    string sub2 = s.substr(12); // from index 12 to the end
    cout << "sub2: " << sub2 << endl; // Output: !!
}

int main() {
    TestStringModify();
    return 0;
}
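find and substr are often combined to split a string on a delimiter. The following sketch shows that pattern; split is a helper written for this article, not a standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits s on every occurrence of delim; empty fields are kept.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(delim, start);       // next delimiter at or after start
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));         // last field: take the rest
            break;
        }
        parts.push_back(s.substr(start, pos - start)); // field between start and pos
        start = pos + 1;                               // continue after the delimiter
    }
    return parts;
}
```

Note that the result of find is kept in a std::size_t and compared with std::string::npos, as recommended above. When the input ends with the delimiter, start equals size() on the last pass, which substr accepts and turns into an empty field.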

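3.5 Non-member functions

Non-member functions are free functions outside the string class, used for input, output and comparison:

Non-member function	Description
ostream& operator<<(ostream& os, const string& str)	Overloaded <<; writes a string (as in cout << s)
istream& operator>>(istream& is, string& str)	Overloaded >>; reads a string, skipping leading whitespace and stopping at the next space or newline
istream& getline(istream& is, string& str)	Reads a whole line (spaces included) up to the newline; the newline is not stored in str
bool operator==(const string& s1, const string& s2)	Overloaded comparison operators (==, !=, <, >, <=, >=); compare lexicographically

Notes

cin >> versus getline:
- cin >> s: skips leading whitespace (spaces, newlines) and stops at the next whitespace character; for the input "hello world", s holds only "hello";
- getline(cin, s): reads the whole line (spaces included) up to the newline; for the input "hello world", s holds "hello world".
Lexicographic comparison: operator< compares character codes position by position, for example "apple" < "banana" ('a' has a smaller code than 'b') and "abc" < "abd" (the first two characters match, then 'c' < 'd').

Example: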

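#include <iostream>
#include <limits>
#include <string>
using namespace std;

void TestStringNonMember() {
    string s1, s2;
    
    // 1. operator>> (stops at whitespace)
    cout << "Enter s1 (words separated by spaces): ";
    cin >> s1; // for the input "hello world", s1 holds only "hello"
    cout << "s1: " << s1 << endl;
    
    // Note: cin >> leaves the rest of the line, including the newline, in the buffer.
    // Discard everything up to and including the newline, or getline would read the leftovers.
    cin.ignore(numeric_limits<streamsize>::max(), '\n');
    
    // 2. getline (reads a whole line)
    cout << "Enter s2 (may contain spaces): ";
    getline(cin, s2); // for the input "hello world", s2 holds "hello world"
    cout << "s2: " << s2 << endl;
    
    // 3. Comparison operators
    string s3("apple"), s4("banana");
    cout << "s3 == s4? " << (s3 == s4 ? "yes" : "no") << endl; // Output: no
    cout << "s3 < s4? " << (s3 < s4 ? "yes" : "no") << endl; // Output: yes ('a' < 'b')
}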

int main() {
    TestStringNonMember();
    return 0;
}
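The difference between >> and getline can also be checked without typing input, by reading from a std::istringstream instead of cin. The two helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads one whitespace-delimited token, as cin >> s would.
std::string firstToken(const std::string& input) {
    std::istringstream in(input);
    std::string word;
    in >> word;  // skips leading whitespace, stops at the next whitespace
    return word;
}

// Reads one full line, as getline(cin, s) would.
std::string firstLine(const std::string& input) {
    std::istringstream in(input);
    std::string line;
    std::getline(in, line);  // keeps spaces, drops the newline
    return line;
}
```

For the input "hello world\nnext", firstToken returns "hello" and firstLine returns "hello world".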

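4. The underlying structure of string in different compilers

The implementation of string differs between standard libraries. The classic comparison is Microsoft VS versus GCC (on Linux), which differ in memory layout and allocation strategy. The analysis below assumes a 32-bit platform (4-byte pointers).

4.1 string in VS (32-bit)

The VS string (part of the MSVC standard library) uses small string optimization (SSO). In a 32-bit debug build the object is 28 bytes (a release build omits the debugging pointer and is 24 bytes), and contains:

A union _Bx: holds the string data, 16 bytes (_BUF_SIZE is 16):
- When the length is less than 16: the characters live in the internal fixed array _Buf[16] (no heap allocation, which is fast);
- When the length is 16 or more: the pointer _Ptr refers to heap memory that holds the characters.
Two size_t fields: 4 bytes each, 8 bytes in total:
- _Mysize: the number of valid characters (the value returned by size());
- _Myres: the current capacity (the value returned by capacity(), excluding '\0').
An extra pointer: 4 bytes, used for internal bookkeeping (in debug builds it supports iterator debugging).

Total size: 16 (union) + 4 (_Mysize) + 4 (_Myres) + 4 (extra pointer) = 28 bytes.

SSO pays off because most strings in real programs are short (file names, configuration keys and so on); they need no heap allocation, which reduces allocation cost and fragmentation.

4.2 string in GCC (32-bit)

The GCC string (part of GNU libstdc++) used copy-on-write (COW) before GCC 5, and still does when the old ABI is selected. The object is very small, only 4 bytes:

It contains a single pointer _M_p that points to the character data in a heap block; the _Rep header sits immediately before the characters.

The _Rep structure (on the heap) contains:

_M_length: the number of valid characters (size_t, 4 bytes);
_M_capacity: the capacity (size_t, 4 bytes);
_M_refcount: the reference count (_Atomic_word, 4 bytes), used to implement copy-on-write;
The character data: follows _Rep directly and ends with '\0'.

The idea behind copy-on-write:

Copying a string does not copy the data immediately; the objects share one heap block and only the reference count is incremented (_M_refcount++);
When an object is about to modify the string (for example through a write via operator[]), it first checks the reference count. If the block is shared, it copies the data to a new heap block, decrements the old count, and then modifies the new copy, so the other objects are unaffected.

COW avoids unnecessary copies and saves memory, but in multithreaded code the atomic reference-count operations cost performance. In addition, the reference and iterator invalidation rules of C++11 effectively rule out COW for std::string, so libstdc++ switched to a non-COW string with SSO by default starting with GCC 5.

4.3 Comparison of the two implementations

Property	VS (SSO)	GCC (COW, older versions)
Object size (32-bit)	28 bytes (debug build)	4 bytes
Short strings (<16)	Fixed array inside the object (no heap allocation)	Heap memory (shared)
Long strings	Heap memory	Heap memory (shared)
Copy cost	Fast for short strings, slower for long ones	Fast to copy (only the count changes), possibly slow on first write
Thread safety	Better (nothing shared)	Worse (the count needs atomic operations)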
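Whether a given library stores short strings inside the object can be probed by checking where data() points. This is only a heuristic sketch, and storedInsideObject is a name invented for this article; the result for short strings depends on the standard library in use.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true when the character buffer lies inside the string object itself,
// which is what small string optimization (SSO) does for short strings.
bool storedInsideObject(const std::string& s) {
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd = objBegin + sizeof(std::string);
    std::less<const char*> before;  // well-defined ordering for unrelated pointers
    return !before(s.data(), objBegin) && before(s.data(), objEnd);
}
```

On an SSO implementation, storedInsideObject(std::string("hi")) is expected to be true, while a 100-character string always returns false because it cannot fit in the object. Printing sizeof(std::string) on your own toolchain is the simplest way to compare it with the 28-byte and 4-byte figures above.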

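5. Implementing a string class

With the interface covered, the next step is to implement the core of string ourselves, a frequent interview topic.

5.1 A broken implementation: the shallow copy problem

Start with a minimal string implementation (named String to distinguish it from the standard one):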

#include <iostream>
#include <cstring>
#include <cassert>
using namespace std;

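class String {
public:
    // Constructor: initialize from a C-style string
    String(const char* str = "") {
        if (str == nullptr) { // reject nullptr
            assert(false);
            return;
        }
        // Allocate strlen(str) + 1 bytes (+1 for the '\0')
        _str = new char[strlen(str) + 1];
        strcpy(_str, str); // copy the characters
    }

    // Destructor: release the storage
    ~String() {
        if (_str) {
            delete[] _str; // free the heap memory
            _str = nullptr; // avoid leaving a dangling pointer
        }
    }

private:
    char* _str; // points to the heap buffer that holds the string
};

// Test (deliberately broken: it demonstrates the double free)
void TestString() {
    String s1("hello");
    String s2(s1); // calls the compiler-generated copy constructor
}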

int main() {
    TestString();
    return 0;
}

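The String class above defines no copy constructor, so the compiler generates one. The generated copy constructor performs a shallow (memberwise) copy: it copies the value of the _str pointer into the new object, not the characters it points to.

This causes the following problem:

s1 and s2 hold _str pointers to the same heap block;
When TestString returns, s2 is destroyed first: ~String() frees the memory s2._str points to;
Then s1 is destroyed: ~String() frees the same block again, a double free, which is undefined behavior and usually crashes the program.

In short, a shallow copy makes several objects share one resource (heap memory), and they conflict when the resource is released.

5.2 Solution 1: deep copy (traditional version)

A deep copy allocates a separate resource (heap memory) for the new object and copies the original's contents into it, so the objects do not interfere with each other. This requires an explicit copy constructor and copy assignment operator. The traditional implementation: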

#include <iostream>
#include <cstring>
#include <cassert>
using namespace std;

class String {
public:
    // 1. Constructor
    String(const char* str = "") {
        if (str == nullptr) {
            assert(false);
            return;
        }
        _str = new char[strlen(str) + 1];
        strcpy(_str, str);
    }

    // 2. Copy constructor (deep copy)
    String(const String& s) {
        // Allocate separate storage for the new object
        _str = new char[strlen(s._str) + 1];
        strcpy(_str, s._str); // copy the contents
    }

    // 3. Copy assignment operator (deep copy)
    // Returns String& to support chained assignment (s1 = s2 = s3)
    // Takes const String& to avoid a copy and protect the source object
    String& operator=(const String& s) {
        // Guard against self-assignment (s1 = s1)
        if (this != &s) {
            // Step 1: release the current object's old storage
            delete[] _str;
            // Step 2: allocate new storage and copy the contents
            _str = new char[strlen(s._str) + 1];
            strcpy(_str, s._str);
        }
        return *this;
    }

    // 4. Destructor
    ~String() {
        if (_str) {
            delete[] _str;
            _str = nullptr;
        }
    }

    // Helper: expose the characters (for testing)
    const char* c_str() const {
        return _str;
    }

private:
    char* _str;
};

// Test
void TestString() {
    String s1("hello");
    String s2(s1); // calls the deep-copy constructor
    cout << "s1: " << s1.c_str() << endl; // Output: hello
    cout << "s2: " << s2.c_str() << endl; // Output: hello

    String s3("world");
    s3 = s1; // calls the deep-copy assignment operator
    cout << "s3: " << s3.c_str() << endl; // Output: hello
}

int main() {
    TestString();
    return 0;
}

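The code above makes two key improvements:

Copy constructor: allocates new heap memory for s2 and copies the contents of s1._str, so s1 and s2 point to different buffers and are destroyed independently.
Copy assignment operator:
- It checks for self-assignment first (this != &s); without the check, self-assignment would free _str and then read from the dangling pointer during the copy;
- It frees the old storage before allocating the new one, so the old buffer is not leaked.

5.3 Solution 2: deep copy (modern version)

The traditional assignment operator has a latent problem: if new char[] fails (throws), the current object's _str has already been freed and is left dangling. The modern version uses "constructor + swap", which avoids this exception-safety problem and shortens the code: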

#include <iostream>
#include <cstring>
#include <cassert>
#include <utility> // std::swap
using namespace std;

class String {
public:
    // 1. Constructor (same as the traditional version)
    String(const char* str = "") {
        if (str == nullptr) {
            assert(false);
            return;
        }
        _str = new char[strlen(str) + 1];
        strcpy(_str, str);
    }

    // 2. Copy constructor (modern version)
    String(const String& s)
        : _str(nullptr) { // start from nullptr so the temporary holds nullptr after the swap
        String strTmp(s._str); // build a temporary from s._str (allocates new storage)
        swap(_str, strTmp._str); // exchange _str between this object and the temporary
    }

    // 3. Copy assignment operator (modern version 1: parameter passed by value)
    String& operator=(String s) { // s is a copy of the argument (made by the copy constructor)
        swap(_str, s._str); // exchange _str between this object and s
        return *this;
    }

    /*
    // Copy assignment operator (modern version 2: const reference, temporary built inside)
    String& operator=(const String& s) {
        if (this != &s) {
            String strTmp(s); // build a temporary
            swap(_str, strTmp._str);
        }
        return *this;
    }
    */

    // 4. Destructor (same as the traditional version)
    ~String() {
        if (_str) {
            delete[] _str;
            _str = nullptr;
        }
    }

    // Helper
    const char* c_str() const {
        return _str;
    }

private:
    char* _str;
};

// Test
void TestStringModern() {
    String s1("hello");
    String s2(s1);
    cout << "s1: " << s1.c_str() << endl; // Output: hello
    cout << "s2: " << s2.c_str() << endl; // Output: hello

    String s3("world");
    s3 = s1;
    cout << "s3: " << s3.c_str() << endl; // Output: hello
}

int main() {
    TestStringModern();
    return 0;
}

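The modern deep copy works as follows:

Copy constructor:
- It first initializes the current object's _str to nullptr;
- It builds the temporary strTmp (new storage, contents copied);
- It swaps _str with strTmp._str: the current object takes over the new storage, and strTmp receives the original _str (nullptr);
- When the function returns, strTmp is destroyed; its pointer is nullptr, so nothing is freed and nothing leaks.
Copy assignment operator (parameter by value):
- The parameter s is a copy of the argument (made by the copy constructor, with its own storage);
- It swaps _str with s._str: the current object takes over s's storage, and s receives the current object's old storage;
- When the function returns, s is destroyed and frees the old storage, so nothing leaks;
- No self-assignment check is needed: on self-assignment, s is a copy of the current object, so after the swap the current object holds the copy (same contents) and s's destructor frees the old buffer. It costs one redundant copy but is correct.

The modern version is shorter and is exception-safe by construction: if new fails, the temporary is never built and the current object's _str is untouched.

5.4 Extension: copy-on-write (for reference)

Copy-on-write (COW) is a deferred-copy strategy that combines the speed of a shallow copy with the safety of a deep copy; its core is a reference count. The idea:

Add a reference count to the heap block, recording how many objects currently share it;
Copying an object only increments the count (shallow copy); no data is copied;
When an object is about to modify the data, it checks the count first:

If the count > 1: copy the data to a new heap block, decrement the count of the old block, and point the current object at the new block (deep copy);
If the count = 1: modify the data directly (no copy needed).

An implementation: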

#include <iostream>
#include <cstring>
#include <cassert>
using namespace std;

class String {
private:
    // Reference-count header: lives on the heap together with the characters
    struct RefCount {
        size_t count; // reference count
        char data[1]; // trailing array ("struct hack"): its real size is set at allocation time; widely supported but not strictly standard C++
    };

    RefCount* _pRef; // points to the RefCount block

public:
    // Constructor: create a new RefCount block and copy the characters
    String(const char* str = "") {
        if (str == nullptr) str = "";
        size_t len = strlen(str);
        // Allocate: size of RefCount + string length + 1 ('\0')
        _pRef = (RefCount*)new char[sizeof(RefCount) + len + 1];
        _pRef->count = 1; // the count starts at 1
        strcpy(_pRef->data, str); // copy the characters
    }

    // Copy constructor: share the block and increment the count (shallow copy)
    String(const String& s) {
        _pRef = s._pRef;
        _pRef->count++; // count + 1
    }

    // Copy assignment: drop the current block, then share the new one
    String& operator=(const String& s) {
        if (this != &s) {
            // Decrement the current block's count; free it if it reaches 0
            Release();
            // Share s's block and increment its count
            _pRef = s._pRef;
            _pRef->count++;
        }
        return *this;
    }

    // Destructor: decrement the count and free the block if needed
    ~String() {
        Release();
    }

    // operator[]: a possible write triggers the copy (the core of COW)
    char& operator[](size_t pos) {
        assert(pos < strlen(_pRef->data));
        // If the block is shared, make a private deep copy first
        if (_pRef->count > 1) {
            size_t len = strlen(_pRef->data);
            // Allocate a new block
            RefCount* newRef = (RefCount*)new char[sizeof(RefCount) + len + 1];
            newRef->count = 1;
            strcpy(newRef->data, _pRef->data);
            // Leave the shared block (its count was > 1, so it stays alive)
            _pRef->count--;
            _pRef = newRef;
        }
        // Return a modifiable reference into the private block
        return _pRef->data[pos];
    }

    // Helper: expose the characters
    const char* c_str() const {
        return _pRef->data;
    }

private:
    // Release: decrement the count and delete the block when it reaches 0
    void Release() {
        _pRef->count--;
        if (_pRef->count == 0) {
            delete[] (char*)_pRef; // free the whole block
            _pRef = nullptr;
        }
    }
};

// Test copy-on-write
void TestCOW() {
    String s1("hello");
    String s2(s1); // shallow copy, count = 2
    cout << "s1: " << s1.c_str() << ", s2: " << s2.c_str() << endl; // Output: s1: hello, s2: hello

    s2[0] = 'H'; // modifying s2 triggers the deep copy; both counts become 1
    cout << "s1: " << s1.c_str() << ", s2: " << s2.c_str() << endl; // Output: s1: hello, s2: Hello
}

int main() {
    TestCOW();
    return 0;
}

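Copy-on-write does have limitations:

Thread safety: changes to the reference count must be atomic (for example with atomic_int); otherwise the count can go wrong under multithreading;
Overhead on element access: every call to the non-const operator[] must check the count, and because it cannot tell a read from a write, even reading a shared string through it triggers a deep copy. Code that copies and modifies strings often can end up slower than with a plain deep copy;
Abandoned since C++11: the invalidation and concurrency requirements that C++11 places on std::string effectively rule out COW. libstdc++ has defaulted to a non-COW string with SSO since GCC 5, and current versions of the MSVC standard library also use SSO.

6. string in practice: classic OJ problems

With the interface covered, OJ problems help consolidate it. The four classic problems below cover reversing, searching, adding and validating strings.

6.1 Problem 1: Reverse Only Letters (LeetCode 917)

Problem: given a string s, reverse all the letters in it while leaving every non-letter character in place. For example:

Input: "ab-cd", output: "dc-ba";
Input: "a-bC-dEf-ghIj", output: "j-Ih-gfE-dCba".

Approach

Two pointers: begin points to the start of the string and end to its last character;
Advance begin to the next letter, move end back to the previous letter, and swap the two;
Stop when begin >= end.

Implementation: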

#include <iostream>
#include <string>
using namespace std;

class Solution {
public:
    // Test whether ch is a letter
    bool isLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    string reverseOnlyLetters(string s) {
        if (s.empty()) return s;
        // Signed indices: --end must be able to go below 0 without wrapping around
        int begin = 0;
        int end = static_cast<int>(s.size()) - 1;
        while (begin < end) {
            // Find the next letter from the left
            while (begin < end && !isLetter(s[begin])) {
                ++begin;
            }
            // Find the next letter from the right
            while (begin < end && !isLetter(s[end])) {
                --end;
            }
            // Swap
            swap(s[begin], s[end]);
            ++begin;
            --end;
        }
        return s;
    }
};

int main() {
    Solution sol;
    string s1 = "ab-cd";
    cout << sol.reverseOnlyLetters(s1) << endl; // Output: dc-ba

    string s2 = "a-bC-dEf-ghIj";
    cout << sol.reverseOnlyLetters(s2) << endl; // Output: j-Ih-gfE-dCba
    return 0;
}

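6.2 Problem 2: First Unique Character in a String (LeetCode 387)

Problem: given a string s, find its first non-repeating character and return its index. If there is none, return -1. For example:

Input: "leetcode", output: 0 ('l' is the first non-repeating character);
Input: "loveleetcode", output: 2 ('v' is the first non-repeating character).

Approach

Counting: use an array of size 256 (one slot for every possible char value) to record how often each character occurs;
First pass over the string: count the occurrences of each character;
Second pass over the string: find the first character whose count is 1 and return its index.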

#include <iostream>
#include <string>
using namespace std;

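class Solution {
public:
    int firstUniqChar(string s) {
        // Counting array (one slot per possible char value)
        int count[256] = {0};
        // First pass: count occurrences (cast so a negative char cannot index out of range)
        for (char ch : s) {
            count[static_cast<unsigned char>(ch)]++;
        }
        // Second pass: find the first character with count 1
        for (size_t i = 0; i < s.size(); ++i) {
            if (count[static_cast<unsigned char>(s[i])] == 1) {
                return static_cast<int>(i);
            }
        }
        // No character occurs exactly once
        return -1;
    }
};

int main() {
    Solution sol;
    string s1 = "leetcode";
    cout << sol.firstUniqChar(s1) << endl; // Output: 0

    string s2 = "loveleetcode";
    cout << sol.firstUniqChar(s2) << endl; // Output: 2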
    return 0;
}

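6.3 Problem 3: Add Strings (LeetCode 415)

Problem: given two non-negative integers num1 and num2 represented as strings, return their sum, also as a string. For example:

Input: num1 = "11", num2 = "123", output: "134";
Input: num1 = "456", num2 = "77", output: "533".

Approach

Simulate addition by hand: start from the end of each string (the ones digit) and track the carry;
end1 and end2 point to the last characters of num1 and num2, and next holds the carry (initially 0);
Loop: compute value1 + value2 + next, then derive the current digit and the new carry;
Append each digit to a temporary string and reverse it at the end (the digits were produced from least to most significant).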

#include <iostream>
#include <string>
#include <algorithm> // 用于reverse函数
using namespace std;

class Solution {
public:
    string addStrings(string num1, string num2) {
        int end1 = static_cast<int>(num1.size()) - 1;
        int end2 = static_cast<int>(num2.size()) - 1;
        int next = 0; // carry
        string result;

        // Continue while either string has digits left or a carry remains
        while (end1 >= 0 || end2 >= 0 || next > 0) {
            // Current digit of each number (0 once a string is exhausted)
            int value1 = (end1 >= 0) ? (num1[end1--] - '0') : 0;
            int value2 = (end2 >= 0) ? (num2[end2--] - '0') : 0;

            // Sum for this position
            int sum = value1 + value2 + next;
            next = sum / 10; // new carry
            int current = sum % 10; // digit for this position

            // Append the digit (the result is reversed afterwards)
            result += static_cast<char>(current + '0');
        }

        // Reverse: digits were stored from least to most significant
        reverse(result.begin(), result.end());
        return result;
    }
};

int main() {
    Solution sol;
    string num1 = "11", num2 = "123";
    cout << sol.addStrings(num1, num2) << endl; // Output: 134

    num1 = "456", num2 = "77";
    cout << sol.addStrings(num1, num2) << endl; // Output: 533
    return 0;
}

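6.4 Problem 4: Valid Palindrome (LeetCode 125)

Problem: given a string, decide whether it is a palindrome, considering only letters and digits and ignoring case. For example:

Input: "A man, a plan, a canal: Panama", output: true (without the other characters it reads "amanaplanacanalpanama", a palindrome);
Input: "race a car", output: false (without the other characters it reads "raceacar", not a palindrome).

Approach

Preprocess: convert all lowercase letters to uppercase (or the reverse), so that case no longer matters;
Two pointers: begin starts at the front and end at the back; both skip characters that are not letters or digits, then the two characters are compared;
If they differ, return false; if the pointers meet without a mismatch, return true.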

#include <iostream>
#include <string>
#include <cctype> // isalnum, islower, toupper
using namespace std;

class Solution {
public:
    // Test whether ch is a letter or a digit
    bool isLetterOrNumber(char ch) {
        // The <cctype> functions expect a value representable as unsigned char
        return isalnum(static_cast<unsigned char>(ch)) != 0;
    }

    bool isPalindrome(string s) {
        // Preprocess: convert lowercase letters to uppercase
        for (char& ch : s) {
            if (islower(static_cast<unsigned char>(ch))) {
                ch = static_cast<char>(toupper(static_cast<unsigned char>(ch)));
            }
        }

        int begin = 0;
        int end = static_cast<int>(s.size()) - 1;
        while (begin < end) {
            // Find the next letter/digit from the left
            while (begin < end && !isLetterOrNumber(s[begin])) {
                ++begin;
            }
            // Find the next letter/digit from the right
            while (begin < end && !isLetterOrNumber(s[end])) {
                --end;
            }
            // Compare
            if (s[begin] != s[end]) {
                return false;
            }
            ++begin;
            --end;
        }
        return true;
    }
};

int main() {
    Solution sol;
    string s1 = "A man, a plan, a canal: Panama";
    cout << (sol.isPalindrome(s1) ? "true" : "false") << endl; // Output: true

    string s2 = "race a car";
    cout << (sol.isPalindrome(s2) ? "true" : "false") << endl; // Output: false
    return 0;
}