首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在Oracle中,限制字符串中字符的最快方法是什么?

在Oracle中,限制字符串中字符的最快方法是什么?
EN

Stack Overflow用户
提问于 2013-11-05 19:30:25
回答 3查看 1.8K关注 0票数 2

我有很多文本字段需要处理。为了处理它们,我需要做的第一件事就是规范我所处理的一组字符。我需要我的输出字符串包含以下内容;

A-Z,0-9和空格,我要把所有小写转换成大写.

因此,我在pl/sql中使用以下内容;

代码语言:javascript
复制
X := UPPER(TRIM(REGEXP_REPLACE
(REGEXP_REPLACE(X, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));

这太慢了。什么会更快?

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2013-11-06 19:19:06

您可以尝试这种方法,基于一些(非常)松散的测试,这种方法看起来要快得多。这是一个本地编译的函数:

代码语言:javascript
复制
CREATE OR REPLACE function clean_string(
    in_string    in varchar2)
 return varchar2 AS
   out_string varchar2(4000) := '';
   in_length number;
   cnt number := 0;
   in_char char(1);
   out_char char(1);
   dec_char number;
   prev_space boolean := false;
begin
   --dbms_output.put_line('In string: ' || in_string);
   in_length := LENGTH(in_string);
   while cnt < in_length
   LOOP
     cnt := cnt + 1;
     in_char := substr(in_string, cnt, 1);
     dec_char := ascii(in_char);
     -- blank out non alphanumerics
     IF (
       (dec_char >= 48 AND dec_char <= 57) OR
       (dec_char >= 65 AND dec_char <= 90) OR
       (dec_char >= 97 AND dec_char <= 122)
       ) THEN
         --keep it
         out_char := in_char;
     ELSE
       out_char := ' ';
     END IF;

     IF (NOT(prev_space AND out_char = ' ')) THEN
       out_string := out_string || out_char;
     END IF;

     <<endloop>>
     IF (out_char = ' ') THEN
       prev_space := true;
     ELSE
       prev_space := false;
     END IF;

   END LOOP;
   return trim(upper(out_string));
end;

ALTER SESSION SET PLSQL_CODE_TYPE=NATIVE;
ALTER function clean_string COMPILE;

为了进行测试,我从一个表中提取了500万行,并清理了一些字符串:

代码语言:javascript
复制
set serveroutput on
declare
    cursor sel_cur1 is
    select name, clean_string(name) as cln_name,
        address1, clean_string(address1) as cln_addr1,
        address2, clean_string(address2) as cln_addr2,
        city, clean_string(city) as cln_city,
        state, clean_string(state) as cln_state,
        postalcode, clean_string(postalcode) as cln_zip
    from my_table
    where rownum <= 5000000;

    cursor sel_cur2 is
    select name,
        address1,
        address2,
        city,
        state,
        postalcode
    from my_table
    where rownum <= 5000000;

    l_cnt integer := 0;
    l_cln_name varchar2(100);
    l_cln_addr1 varchar2(100);
    l_cln_addr2 varchar2(100);
    l_cln_city varchar2(100);
    l_cln_state varchar2(100);
    l_cln_zip varchar2(100);

    l_interval interval day to second(4);
    l_start timestamp;
    l_end timestamp;
begin
   l_start := systimestamp;
   for rec in sel_cur2
   loop
         l_cnt := l_cnt + 1;
         l_cln_name := clean_string(rec.name);
         l_cln_addr1 := clean_string(rec.address1);
         l_cln_addr2 := clean_string(rec.address2);
         l_cln_city := clean_string(rec.city);
         l_cln_state := clean_string(rec.state);
         l_cln_zip := clean_string(rec.postalcode);
   end loop;
    l_end := systimestamp;
    l_interval := l_end - l_start;
    dbms_output.put_line('Procedural approach timing: ' || l_interval);
   -------------------------------------------------
   l_cnt := 0;
   l_start := systimestamp;
   for rec in sel_cur1
   loop
         -- cleaning already done in SQL
         l_cnt := l_cnt + 1;
   end loop;
   l_end := systimestamp;
   l_interval := l_end - l_start;
   dbms_output.put_line('SQL approach timing: ' || l_interval);

   -------------------------------------------------
   l_cnt := 0;
   l_start := systimestamp;
   for rec in sel_cur2
   loop
         l_cnt := l_cnt + 1;
         l_cln_name := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.name, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_addr1 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address1, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_addr2 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address2, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_city := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.city, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_state := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.state, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_zip := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.postalcode, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
   end loop;
   l_end := systimestamp;
   l_interval := l_end - l_start;
   dbms_output.put_line('Existing approach timing: ' || l_interval);
end;

产出如下:

代码语言:javascript
复制
Procedural approach timing: +00 00:02:04.0320
SQL approach timing: +00 00:02:49.4326
Existing approach timing: +00 00:05:50.1607

此外,本机编译似乎只有助于对处理进行过程化处理(而不是从SQL查询调用函数),但似乎比regexp_replace解决方案快得多。希望这能有所帮助。

票数 1
EN

Stack Overflow用户

发布于 2013-11-08 19:54:42

首先,让我说我并没有真正回答我自己的问题,但我接受了tbone的回答。提供这个答案的原因,是这些评论不让我发布我真正想要的东西。

我通过几次调整创建了一个与tbone函数几乎相同的函数,通过更改处理小写字符范围的方式去掉了上面的函数,并将数字更改为binary_integers。

代码语言:javascript
复制
  FUNCTION CLEAN_STRING(IN_STRING in VARCHAR2) RETURN VARCHAR2 
  AS
    OUT_STRING VARCHAR2(32767) := '';
    IN_LENGTH BINARY_INTEGER;
    CNT BINARY_INTEGER := 0;
    IN_CHAR CHAR(1);
    OUT_CHAR CHAR(1);
    DEC_CHAR BINARY_INTEGER;
    PREV_SPACE BOOLEAN := FALSE;
  BEGIN
    IN_LENGTH := LENGTH(IN_STRING);
    WHILE CNT < IN_LENGTH
    LOOP
      CNT := CNT + 1;
      IN_CHAR := SUBSTR(IN_STRING, CNT, 1);
      DEC_CHAR := ASCII(IN_CHAR);
      -- blank out non alphanumerics
      IF ((DEC_CHAR >= 48 AND DEC_CHAR <= 57) OR
         (DEC_CHAR >= 65 AND DEC_CHAR <= 90)) 
      THEN
      --keep it
      OUT_CHAR := IN_CHAR;
      ELSE
        IF (DEC_CHAR >= 97 AND DEC_CHAR <= 122)
        THEN
          OUT_CHAR := CHR(DEC_CHAR - 32);
        ELSE
         OUT_CHAR := ' ';
        END IF;
      END IF;

      IF (NOT(PREV_SPACE AND OUT_CHAR = ' ')) 
      THEN
       OUT_STRING := OUT_STRING || OUT_CHAR;
      END IF;

      <<endloop>>
      IF (OUT_CHAR = ' ') THEN
       PREV_SPACE := TRUE;
      ELSE
       PREV_SPACE := FALSE;
      END IF;

    END LOOP;
    RETURN TRIM(OUT_STRING);
  END CLEAN_STRING;

然后,我像tbone一样创建了一个简单的测试平台,但是我测试了三个不同的例程。首先,我验证它们是否都返回相同的结果,然后对每个例程进行计时。这是试验台;

代码语言:javascript
复制
set serveroutput on
DECLARE
  CURSOR PATHMAST_CURS
  IS
    SELECT PATHMAST_TEXT_DIAGNOSIS FROM PATHMAST WHERE ROWNUM < 100000;
  DUMMY CLOB;
  DUMMY_1 CLOB;
  DUMMY_2 CLOB;
  l_interval interval day to second(4);
  l_start timestamp;
  l_end timestamp;
  diff_count_1 binary_integer := 0;
  diff_count_2 binary_integer := 0;
BEGIN

  FOR PATH_REC IN PATHMAST_CURS
  LOOP
    DUMMY := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(NVL(PATH_REC.PATHMAST_TEXT_DIAGNOSIS,' '), '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
    DUMMY_1 := pathmast_utility_3.CLEAN_STRING(NVL(PATH_REC.PATHMAST_TEXT_DIAGNOSIS,' '));
    DUMMY_2 := regexp_replace(trim(translate(NVL(PATH_REC.PATHMAST_TEXT_DIAGNOSIS,' '),'abcdefghijklmnopqrstuvwxyz`~!@#$%^&*()''_+-={[}]|/\":;,.<>?µ’±€'||chr(9),'ABCDEFGHIJKLMNOPQRSTUVWXYZ                                     ')),'( )* ',' ');
    IF DUMMY_1 != DUMMY
    THEN
      diff_count_1 := diff_count_1 + 1;
    END IF;
    IF DUMMY_2 != DUMMY
    THEN
      diff_count_2 := diff_count_2 + 1;
      dbms_output.put_line('Regexp: ' || DUMMY);
      dbms_output.put_line('Translate: ' || DUMMY_2);
    END IF;
  END LOOP;
  dbms_output.put_line('CLEAN_STRING differences: ' || diff_count_1);
  dbms_output.put_line('Translate differences: ' || diff_count_2);


  l_start := systimestamp;
  FOR PATH_REC IN PATHMAST_CURS
  LOOP
    DUMMY := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(PATH_REC.PATHMAST_TEXT_DIAGNOSIS, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
  END LOOP;
  l_end := systimestamp;
  l_interval := l_end - l_start;
  dbms_output.put_line('Regexp approach timing: ' || l_interval);
  -------------------------------------------------    
  l_start := systimestamp;
  FOR PATH_REC IN PATHMAST_CURS
  LOOP
    DUMMY := pathmast_utility_3.CLEAN_STRING(PATH_REC.PATHMAST_TEXT_DIAGNOSIS);
  END LOOP;
  l_end := systimestamp;
  l_interval := l_end - l_start;
  dbms_output.put_line('CLEAN_STRING approach timing: ' || l_interval);
  -------------------------------------------------  
  l_start := systimestamp;
  FOR PATH_REC IN PATHMAST_CURS
  LOOP
    DUMMY := regexp_replace(trim(translate(NVL(PATH_REC.PATHMAST_TEXT_DIAGNOSIS,' '),'abcdefghijklmnopqrstuvwxyz`~!@#$%^&*()''_+-={[}]|/\":;,.<>?µ’±€'||chr(9),'ABCDEFGHIJKLMNOPQRSTUVWXYZ                                     ')),'( )* ',' ');
  END LOOP;
  l_end := systimestamp;
  l_interval := l_end - l_start;
  dbms_output.put_line('TRANSLATE approach timing: ' || l_interval);
  -------------------------------------------------  
END;

以下是结果;

代码语言:javascript
复制
anonymous block completed
CLEAN_STRING differences: 0
Translate differences: 0
Regexp approach timing: +00 00:00:52.9160
CLEAN_STRING approach timing: +00 00:00:05.5220
TRANSLATE approach timing: +00 00:00:13.4320

这一切都是不编译本机的。所以tbone是大赢家。谢谢你tbone。

如果出于任何原因,您希望/需要使用翻译版本,则应该以编程方式构建转换字符串,以获取所有特殊字符。

票数 1
EN

Stack Overflow用户

发布于 2013-11-06 06:37:10

也许,您可以使用TRANSLATE而不是regex来删除特殊字符,并将小写转换为大写。

代码语言:javascript
复制
regexp_replace(
               trim(
                    translate(x,
                              'abcdefghijklmnopqrstuvwxyz`~!@#$%^&*()_+-={[}]|/\"'':;,.<>?',
                              'ABCDEFGHIJKLMNOPQRSTUVWXYZ                                '
                             )
                   ),
                   ' {2,}',
                   ' '
              )

在具有1000行的表和具有从1到4000之间的任意字符的列上尝试它。结果大约减少了35%的时间(没有在PLSQL中尝试)。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/19797255

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档