首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >用JNI编写用C和Java编写的多线程Conway生命游戏实现

用JNI编写用C和Java编写的多线程Conway生命游戏实现
EN

Code Review用户
提问于 2023-03-27 20:53:21
回答 1查看 168关注 0票数 4

去年我在学校有过一些多线程C编程经验,但我想要的更多,所以我试图从零开始编写一个Conway的生命游戏实现。

代码主要是用C编写的,但通过GameOfLifeMultithread.java类的JNI调用来调用。java代码还负责维护本机代码出于性能原因使用的“查找表”。

这段代码的主要目标是尽可能快地运行,这使得代码变得有点复杂和清晰。据我所知,这里没有bug,但我正在尝试学习C(以及在较小程度上学习Java)文档和代码样式规则,因此我非常希望能就如何在这方面重构提供建议。

特别是,我不太了解初始化和去初始化资源模式。我的代码使用了一堆错误,我正在努力解决如何最好地处理部分故障。

代码的入口点是GameOfLifeMultithread#getNGeneration(int ),它运行n代,变异板并返回更改。

本机C入口点是Java_game_1of_1life_GameOfLifeMultithread_getNGenerationNative (对不起,我不知道这些方法的名称)

本机代码的步骤如下:

  1. 初始化本机游戏板和脏位板,从Java数组复制到本机数组: initialize_all_resources()、initialize_lookup_table()、initialize_dirty_bits()
  2. 将代码拆分为线程do_all_thread_work(),thread_do_work()
  3. 让每个线程在生成结束时使用synch_thread()运行n代同步
  4. 将本机数组复制回Java数组并释放所有资源free_all_resources()

每一代(即第3步):

  1. 通过咨询dirty_bit板,检查其值或相邻值是否在上一个gen中更改,如果没有更改,则跳过此gen
  2. 使用查找表计算新值
  3. 如果新值不同,则设置板上的脏位。

所有这些都是通过一种方法完成的: perform_single_line()

以下是我的Java代码:

代码语言:javascript
复制
package game_of_life;

public class GameOfLifeMultithread {
    static {
        System.loadLibrary("binary/" + "native_multithread");
    }
    /*-
     * Indexing will look like this (P is padding, C is center:                 
     *      PPPPPP
     *      PCCCCP
     *      PPPPPP
     *  However, when indexing these values will be concatenated giving the following 
     *  bit string:
     *      PPPPPPPCCCCPPPPPPP
     */

    // Size of the value returned by a lookup
    public static final int LOOKUP_LEN = 4;
    // The row length should be the lookup length +2 for padding on either side
    public static final int ROW_LEN = LOOKUP_LEN + 2;
    // Additionally, pad the top and bottom
    public static final int TOTAL_LOOKUP_SIZE = ROW_LEN * 3;

    public static final int threadcount = 8;
    private final static String name = "Multithreaded Technique";
    private final static String description = "Uses a lookuptable and dirty bits with multithreading";
    private static byte[] lookuptable;

    private final boolean[][] board;

    public GameOfLifeMultithread(boolean[][] board) {
        this.board = board;
    }

    public boolean[][] getNGeneration(int n) {
        if (lookuptable == null) {
            lookuptable = generateLookup();
        }
        getNGenerationNative(threadcount, lookuptable, n, this.board);
        return this.board;
    }

    private native void getNGenerationNative(int threadcount, byte[] lookuptable, int n, boolean[][] array);

    private static byte[] generateLookup() {
        byte[] lookup = new byte[1 << (TOTAL_LOOKUP_SIZE)];
        // Adjacency count for center cells, give +2 padding, 1 for each side, not
        // because we need it, but to simplify indexing the array and avoid checking for
        // under/over-flow
        byte[] count = new byte[ROW_LEN + 2];
        // Skip 0 because this doesn't work through underflow
        // An all 0 bitfield should return 0 either way, so no special logic is needed
        for (int i = 1; i < lookup.length; i++) {
            // Track changes between the current bit-field and the previous one
            int change = i ^ (i - 1);
            for (int j = 0; j < TOTAL_LOOKUP_SIZE; j++) {
                // Check if a given position was changed between this one and the last
                if ((change & (1 << j)) != 0) {
                    // If it was changed, add 1 to all adjacent if it was "born" or -1 if it "died"
                    int delta = ((i & (1 << j)) != 0) ? 1 : -1;
                    count[(j) % ROW_LEN] += delta;
                    // In GOL, you don't count the cell itself in the live count, so if we're in the
                    // center row, don't increment the current cell's count, only the adjacent cells
                    if (j < ROW_LEN || j > TOTAL_LOOKUP_SIZE - ROW_LEN) {
                        count[(j + 1) % ROW_LEN] += delta;
                    }
                    count[(j + 2) % ROW_LEN] += delta;
                }
            }
            byte result = 0;
            for (int j = LOOKUP_LEN; j > 0; j--) {
                result <<= 1;
                // Because we gave count array padding, we need to +1 to get the actual value
                if (count[j + 1] == 3 || (count[j + 1] == 2 && (i & (1 << (j + ROW_LEN))) != 0)) {
                    result++;
                }
            }
            lookup[i] = result;
        }
        return lookup;
    }

    public boolean[][] getBoard() {
        return this.board;
    }
}

头文件(由Java自动创建):

代码语言:javascript
复制
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class game_of_life_GameOfLifeMultithread */

#ifndef _Included_game_of_life_GameOfLifeMultithread
#define _Included_game_of_life_GameOfLifeMultithread
#ifdef __cplusplus
extern "C" {
#endif
#undef game_of_life_GameOfLifeMultithread_threadcount
#define game_of_life_GameOfLifeMultithread_threadcount 8L
/*
 * Class:     game_of_life_GameOfLifeMultithread
 * Method:    getNGenerationNative
 * Signature: (I[BI[[Z)V
 */
JNIEXPORT void JNICALL Java_game_1of_1life_GameOfLifeMultithread_getNGenerationNative
  (JNIEnv *, jobject, jint, jbyteArray, jint, jobjectArray);

#ifdef __cplusplus
}
#endif
#endif

C代码(尽我最大努力记录所使用的技术,但我不确定细节):

代码语言:javascript
复制
/**
 * @brief
 * JNI implementation for a hyper-optimized Conway's Game of Life
 * 
 * @author Shmuel Newmark <https://github.com/synewmark>
 * Except barrier_wait() which was taken from User Tsyvarev on StackOveflow: 
 * https://stackoverflow.com/questions/33598686/spinning-thread-barrier-using-atomic-builtins
 * 
 * @details
 * Code packs 8 booleans into a single char value (@see pack_8(), and unpack_8() methods)
 * 
 * For example a boolean array like this:
 * [0] [1] [2] [3] [4] [5] [6] [7] [8] ... 
 * ..0 ..0 ..1 ..0 ..1 ..0 ..1 ..1 ..1 ...
 * Will turn into:
 *   [0]        [1]
 * 00101011  1.......
 * 
 * #IMPORTANT# Throughout code, "leftmost" bit is the highest bit and "rightmost" is lowest
 * However, the lowest value on the array is left and highest is right
 * Because numerical values are usually read most sig to least, this simplifies
 * the intuition when it comes to merging values but must be kept consistent
 * 
 * Code uses a parallel "dirty bit" array of the same size as the board to track changes
 * If the board is unchanged for a generation the corresponding value in the dirty bit array is cleared 
 * If a cell is changed all adjacent cells' dirty bits are set allowing changes to propogate
 * Note: because we pack cells each char cell (i.e. 8 normal cells) share a dirty bit value
 * 
 * Code uses a "lookup table" to calculate values
 * Lookup is done 18 bits at a time as a 6x3.
 * Because we're storing values packed in 8 bits this requires 2 lookups
 * For example suppose the following section of a life board:
 * ... ........ ...
 * ..0 00101011 1..
 * ..1 10100101 1..
 * ..1 01010010 0..
 * ... ........ ...
 * If we wanted to lookup the value of the middle row we'd need to follow 2 steps
 * Step 1. split the row in half and attach the adjacent values on either side
 * For the left value that would look like:
 * 1 10100
 * But we also need to the row on top and bottom so,
 * 0 00101
 * 1 10100
 * 1 01010
 * Step 2. we'd need to combine those into a single bit string: 000101110100101010
 * Same deal for the right:
 * 01011 1
 * 00101 1
 * 10010 0
 * As a string: 010111001011100100
 * We'll use the int value of both of these bit strings as indices on our lookup table
 * 
 * Code favors speed over clarity for all innermost loop functions:
 * @see get_integral_val_left(), get_integral_val_right(), perform_single_line(), thread_do_work()
 */

#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include "game_of_life_GameOfLifeMultithread.h"

#define is_alive(char, pos) ((1 << (pos)) & (char))
#define set_alive(char, pos) (char |= (1 << (pos)))
// Don't use mod for wrapping because division is too expensive
#define get_low(n, len) (n-1 >= 0 ? n-1 : len-1)
#define get_high(n, len) (n+1 < len ? n+1 : 0)
// Step 1 for left: get the 5 left bits and the rightmost bit on the one to the left
#define get_integral_row_left(left, center) (((left & 1) << 5)) | (center >> 3)
// Step 1 for right: get the 5 right bits and the leftmost bit on the one to the right
#define get_integral_row_right(center, right) (((center & 0x1F) << 1) | (right & (1 << 7)) >> 7)
#define swap_board(x, y) ({unsigned char** _temp = x; x = y; y =_temp;})


jbyte* lookuptable = NULL;
unsigned char** board1 = NULL;
unsigned char** board2 = NULL;
unsigned char** dirty_bit1 = NULL;
unsigned char** dirty_bit2 = NULL;


int xlen;
int ylen;
int ylenpacked;

int global_thread_count = 8;

struct thread_work{
    int genlength;
    int start;
    int end;
};

static void barrier_wait();

static void* safe_calloc(size_t NumOfElements, size_t SizeOfElements) {
  // malloc family can return NULL for 0 allocs
  // Want to distinguish between a failure and a "succesful" 0 allocation
  if (!NumOfElements || !SizeOfElements) {
    return NULL;
  }
  errno = 0;
  void* result = calloc(NumOfElements, SizeOfElements);
  if (!result) {
    printf("Calloc of size %lld failed with code: %d\n", NumOfElements*SizeOfElements, errno);
  }
  return result;
}

static void free2d(int x, unsigned char** array) {
  if (!array) {
    return;
  }
  for (int i = 0; i < x; i++) {
    if (!array[i]) {
      return;
    }
    free(array[i]);
  }
  free(array);
}

static unsigned char** malloc2d(int x, int y) {
  unsigned char** board = safe_calloc(x, sizeof(*board));
  if (!board) {
    return NULL;
  }
  for (int i = 0; i < x; i++) {
    if (!(board[i] = safe_calloc(y, sizeof(*board[i])))) {
      free2d(x, board);
      return NULL;
    }
  }
  return board;
}

static int set_length_values(JNIEnv * env, jobjectArray array, int* xlen_store, int* ylen_store, int* ylenpacked_store) {
  // Caller ensures the array is not of 0 length and is a square 
  // so we just need the x and any y lengths
  int xlen = (*env)->GetArrayLength(env, array);
  jobjectArray dim1 = (*env)->GetObjectArrayElement(env, array, 0);
  int ylen = (*env)->GetArrayLength(env, dim1);
  if (ylen % 8) {
    return -1;
  }
  *xlen_store = xlen;
  *ylen_store = ylen;
  *ylenpacked_store = ylen/8;
  return 0;
}

static int pack_8(JNIEnv * env, jobjectArray array, unsigned char** board) {
  for (int i = 0; i < xlen; i++) {
      jbooleanArray boolArrayi = (*env)->GetObjectArrayElement(env, array, i);
      jboolean isCopy = JNI_FALSE;
      //entering critical
      jboolean* boolElementsi = (*env)->GetPrimitiveArrayCritical(env, boolArrayi, &isCopy);
      if (isCopy) {
          return -1;
      }
      //pack 8 boolean values into each char
      for (int j = 0; j < ylenpacked; j++) {
        unsigned char c = 0;
        // This places the lowest bit as the most significant
        // @see @details
        for (int k = 0; k < 8; k++) {
          c<<=1;
          c+=(boolElementsi[j*8+k]);
        }
        board[i][j] = c;
      }
    (*env)->ReleasePrimitiveArrayCritical(env, boolArrayi, boolElementsi, 0);
    //exiting critical
  }
  return 0;
}

static void unpack_8(JNIEnv* env, jobjectArray array, unsigned char** board){
  for (int i = 0; i < xlen; i++) {
      jbooleanArray boolArrayi = (*env)->GetObjectArrayElement(env, array, i);
      jboolean* boolElementsi = (*env)->GetPrimitiveArrayCritical(env, boolArrayi, 0);
      for (int j = 0; j < ylenpacked; j++) {
        for (int k = 0; k < 8; k++) {
          // We cannot simply mask and place back into the array because Java only reliably treats values with the lowest bit set as true
          boolElementsi[j*8+(7-k)] = is_alive(board[i][j], k) ? JNI_TRUE : JNI_FALSE;
          // We want to put the value in k-7 because we reversed the bits when packing
          // So we need to un-reverse the bits when unpacking
          // @see @details
        }
      }
      (*env)->ReleasePrimitiveArrayCritical(env, boolArrayi, boolElementsi, 0);
    }
}

// Very performance sensitive: Runs g*x*(y/8) times
static inline unsigned int get_integral_val_left(unsigned int nw, unsigned int n, unsigned int w, unsigned int c, unsigned int sw, unsigned int s) {
    unsigned int top = get_integral_row_left(nw, n);
    unsigned int mid = get_integral_row_left(w, c);
    unsigned int bot = get_integral_row_left(sw, s);
    // Step 2: combine all 3 rows into a single bit string
    return ((top << 12) | (mid << 6) | bot);
}
// Very performance sensitive: Runs g*x*(y/8) times
static inline unsigned int get_integral_val_right(unsigned int n, unsigned int ne, unsigned int c, unsigned int e, unsigned int s, unsigned int se) {
    unsigned int top = get_integral_row_right(n, ne);
    unsigned int mid = get_integral_row_right(c, e);
    unsigned int bot = get_integral_row_right(s, se);
    // Step 2: combine all 3 rows into a single bit string
    return ((top << 12) | (mid << 6) | bot);
}
// Very performance sensitive: runs g*x times
static inline void perform_single_line(int xpos, unsigned char** board1, unsigned char** board2, unsigned char** dirty_bit1, unsigned char** dirty_bit2) {
  // inner loop runs: runs g*x*y times
  for (int i = 0; i < ylenpacked; i++) {
        // If dirty bit is clear we don't run
        if(!dirty_bit1[xpos][i]) {
          continue;
        }
        // Watch for over and underflow when wrapping
        int up = get_low(xpos, xlen);
        int down = get_high(xpos, xlen);
        int left = get_low(i, ylenpacked);
        int right = get_high(i, ylenpacked);
        // Split the lookup into 2 request: left and right
        // Each request is 18 bits, or 6 for each row
        unsigned int lookupvalleft = get_integral_val_left(board1[up][left], board1[up][i], board1[xpos][left], board1[xpos][i], board1[down][left], board1[down][i]);
        unsigned char newvalleft = lookuptable[lookupvalleft];
        
        unsigned int lookupvalright = get_integral_val_right(board1[up][i], board1[up][right], board1[xpos][i], board1[xpos][right], board1[down][i], board1[down][right]);
        unsigned char newvalright = lookuptable[lookupvalright];

        unsigned char newval = (newvalleft << 4) | newvalright;
        char xc = newval^board1[xpos][i];
        
        // Set all dirty bits around the just changed value
        // Even though we may have a race condition on access it's safe 
        // because char access is atomic and we're only ever setting them, never clearing

        if (xc) {
          // If there's any difference between the curr and prev value we must set the dirty on the ones above and below
          dirty_bit2[down][i] = 1;
          dirty_bit2[xpos][i] = 1;
          dirty_bit2[up][i] = 1;
          // But we only need to set the left and right dirty values if the left and right-most values were changed
          if (is_alive(xc, 0)) {
            dirty_bit2[down][right] = 1;
            dirty_bit2[xpos][right] = 1;
            dirty_bit2[up][right] = 1;
          }

          if (is_alive(xc, 7)) {
            dirty_bit2[down][left] = 1;
            dirty_bit2[xpos][left] = 1;
            dirty_bit2[up][left] = 1;
          }
        }
        // Note: we need to set the next board to this value regardless of whether it was changed on *this* iteration
        // However, if the dirty bit was not set and it wasn't changed on the *prev* iteration it will already be the correct val
        board2[xpos][i] = newval;
      }
}

void* thread_do_work(void* threadwork) {
  struct thread_work* work = (struct thread_work*) threadwork;
  unsigned char** cache_board1 = board1;
  unsigned char** cache_board2 = board2;
  unsigned char** cache_dirtybit1 = dirty_bit1;
  unsigned char** cache_dirtybit2 = dirty_bit2;
  // Somewhat performace sensitive: each barrier_wait runs g times
  for (int i = 0; i < work->genlength; i++) {
    for (int j = work->start; j < work->end; j++) {
        perform_single_line(j, cache_board1, cache_board2, cache_dirtybit1, cache_dirtybit2);
    }
    barrier_wait();
    swap_board(cache_board1, cache_board2);
    swap_board(cache_dirtybit1, cache_dirtybit2);
    for (int j = work->start; j < work->end; j++) {
      // clear all dirty bits after reading them
      // The big drawback of the approach is that it requires 2 barrier waits
      // Otherwise we risk clearing the bits while another thread is reading them
      memset(cache_dirtybit2[j], 0, ylenpacked);
    }
    barrier_wait();
  }
  return NULL;
}

static int initialize_boards(JNIEnv* env, jobjectArray array) {
  board1 = malloc2d(xlen, ylenpacked);
  if (!board1) {        
    printf("Indeterminate error, terminating\n");
    return -1;
  }
  if (pack_8(env, array, board1)) {
    printf("Indeterminate error, terminating\n");
    return -1;
  }
  board2 = malloc2d(xlen, ylenpacked);
  if (!board2) {
    printf("Indeterminate error, terminating\n"); 
    return -1;
  }
  return 0;
}

static int initialize_dirty_bits(int xlen, int ylenpacked) {
  dirty_bit1 = malloc2d(xlen, ylenpacked);
  if (!dirty_bit1) {
    return -1;
  }
  // turn on all dirty bits on the first run so nothing gets skipped
  for (int i = 0; i < xlen; i++) {
    memset(dirty_bit1[i], 0xFF, ylenpacked);
  }

  dirty_bit2 = malloc2d(xlen, ylenpacked);
  if (!dirty_bit2) {
    return -1;
  }
  return 0;
}

static void initialize_lookup_table(JNIEnv* env, jarray array) {
  lookuptable = (*env)->GetPrimitiveArrayCritical(env, array, NULL);
}

static void free_all_resources() {
  free2d(xlen, board1);
  free2d(xlen, board2);
  free2d(xlen, dirty_bit1);
  free2d(xlen, dirty_bit2);
}

static int do_all_thread_work(int threadcount, int runlength) {
  // Each thread works on it's own x range, no reason to make more threads than xlen
  global_thread_count = xlen < threadcount ? xlen : threadcount;
  struct thread_work workpool[global_thread_count];
  int startwork = 0;
  for (int i = 0; i < global_thread_count-1; i++) {
    // If the x value is not divisble by 8 
    int amountofwork = (xlen/global_thread_count) + ((xlen%global_thread_count) > i);
    workpool[i] = (struct thread_work) {runlength, startwork, startwork+amountofwork};
    // The possibility of some threads being created succesfully, but choking on the rest is unhandled
    int error;
    if ((error = (pthread_create(NULL, 0, thread_do_work, workpool+i)))) {
      printf("Creating thread: %d failed with code: %d\n", i+1, error);
      return -1;
    }
    startwork+=amountofwork;
  }
  int amountofwork = (xlen/global_thread_count);
  struct thread_work finalwork = (struct thread_work) {runlength, startwork, startwork+amountofwork};
  thread_do_work(&finalwork);
  return 0;
}

static int initialize_all_resources(JNIEnv* env, jint threadcount, jbyteArray lookup, jint runlength, jobjectArray array) {
  if (set_length_values(env, array, &xlen, &ylen, &ylenpacked)) {
    printf("Array must have a y size divisable by 8\n");
    return -1;
  }
  if (initialize_boards(env, array)) {
    return -1;
  }
  if (initialize_dirty_bits(xlen, ylenpacked)) {
    return -1;
  }
  initialize_lookup_table(env, lookup);
  do_all_thread_work(threadcount, runlength);
  return 0;
}

JNIEXPORT void JNICALL Java_game_1of_1life_GameOfLifeMultithread_getNGenerationNative
  (JNIEnv* env, jobject object, jint threadcount, jbyteArray lookup, jint runlength, jobjectArray array) {
  printf("Running for %d generations!\n", runlength);
  if (!initialize_all_resources(env, threadcount, lookup, runlength, array)) {
    unsigned char** finalboard = runlength % 2 ? board2 : board1;  
    unpack_8(env, array, finalboard);
  }
  free_all_resources();
}

// Barrier code is taken from User Tsyvarev: 
// https://stackoverflow.com/questions/33598686/spinning-thread-barrier-using-atomic-builtins

// Because we're saturating the cores and denying multitasking, there's no reason to sleep on waits
// So we're busy-waiting instead to squeeze out a bit more performance

int bar = 0; // Counter of threads, faced barrier.
volatile int passed = 0; // Number of barriers, passed by all threads.

// Due to __sync primitives, code is GCC/Clang dependant
static void barrier_wait()
{
    int passed_old = passed; // Should be evaluated before incrementing *bar*!

    if(__sync_fetch_and_add(&bar,1) == (global_thread_count - 1))
    {
        // The last thread, faced barrier.
        bar = 0;
        // *bar* should be reseted strictly before updating of barriers counter.
        __sync_synchronize(); 
        passed++; // Mark barrier as passed.
        // return 1;
    }
    else
    {
        // Not the last thread. Wait others.
        while(passed == passed_old) {};
        // Need to synchronize cache with other threads, passed barrier.
        __sync_synchronize();
        // return 0;
    }
}

正如注释中所指出的,我的代码是不可移植的,因为它依赖GCC/Clang原语作为屏障。如果有人对如何以便携方式实现有任何建议,将不胜感激。

不过,我主要喜欢关于如何简化步骤1、2和4的建议,即使代价是性能,因为它们只运行一次。

此外,任何关于如何简化步骤3而不花费太多性能的建议将不胜感激。

此外,如上所述,关于如何处理失败的嵌套malloc调用或线程创建失败的建议部分通过。

EN

回答 1

Code Review用户

发布于 2023-04-25 17:04:23

safe_calloc()弱点

  • errno = 0;最好留给调用代码。
  • calloc()肯定不会在分配失败时设置errno。因此,使用errno来表示某些东西在这里是不可移植的。
  • NumOfElements*SizeOfElements可能会溢出size_t的数学。如果想要unsigned long long数学,可以使用(unsigned long long) NumOfElements*SizeOfElements等。IAC,乘法并不是真正需要的,因为如果分配失败,调试将希望知道传递给calloc()的确切值,而不是派生值。
  • NumOfElements*SizeOfElementssize_t产品。这与"%lld"不匹配。
  • “成功”拼写错误。对代码运行拼写检查程序。
  • 风格问题:我发现与==的比较比!更清楚。避免否定:我不知道你们中的一半和我想要的一样好;我喜欢你们中不到一半的人,一半是你应得的。
  • stderr上写入错误消息。

候选人重写:

代码语言:javascript
复制
static void* calloc_log(size_t NumOfElements, size_t SizeOfElements) {
  if (NumOfElements == 0 || SizeOfElements == 0) {
    return NULL;
    // Or adjust parameters so a return value of NULL is always an error.
    NumOfElements = SizeOfElements = 1; 
  }
  void* result = calloc(NumOfElements, SizeOfElements);
  if (result == NULL) {
    fprintf(stderr, "calloc(%zu, %zu) failed.\n", NumOfElements, SizeOfElements);
  }
  return result;
}

safe_calloc()的使用也是错误的,因为代码忽略了负的x问题,并悄悄地将这样的值更改为可能成功分配的大型无符号值。

代码语言:javascript
复制
static unsigned char** malloc2d(int x, int y) {
  unsigned char** board = safe_calloc(x, sizeof(*board));  // Weak code

考虑使用size_tunsigned x, y或测试负值。

票数 2
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/284190

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档