PHP/LLVM/MYSQL/BSD regex library Heap Buffer Overflow

2015.02.08
Credit: guidovranken
Risk: High
Local: Yes
Remote: Yes
CVE: N/A
CWE: CWE-119

Introduction

The following document describes a heap overflow vulnerability in Henry Spencer's regex library, affecting 32-bit systems only. This library, or variations on and derivations of it, is used in such software as:

PHP
LLVM
MySQL server
Bionic libc

as well as various *BSD libc implementations:

FreeBSD
NetBSD

The above applications are listed here merely to point out that they include the library. I have NOT tested them for being vulnerable and thus cannot give any guarantee that they are; they are listed to show that the library has been disseminated widely, that the vulnerability MAY be exploitable outside 'laboratory setting' cases, and that the danger of it MAY permeate deeply into software stacks.

Exploiting the vulnerability requires a significant amount of control over the input to one of the library's functions, and the condition is unlikely to arise in a general programming context, since it requires a string of ~683 megabytes to be constructed. However, allocations of such a size are certainly feasible in some contexts. An additional factor that limits the overall feasibility of an attack is that the attacker can control the data written outside the bounds of the heap buffer only to a certain extent, as opposed to a fully arbitrary mutation of memory.

Technical description

Source code excerpts that follow are taken from https://codeload.github.com/garyhouston/rxspencer/tar.gz/alpha3.8.g5 (as referenced on http://www.arglist.com/regex/).

The vulnerability lies in the regcomp function:

    int /* 0 success, otherwise REG_something */
    regcomp(preg, pattern, cflags)
    regex_t *preg;
    const char *pattern;
    int cflags;
    {

This function compiles the regex given in string form by 'const char *pattern'. The vulnerable code:

    len = strlen((char *)pattern);
    ...
    p->ssize = len/(size_t)2*(size_t)3 + (size_t)1; /* ugh */
    p->strip = (sop *)malloc(p->ssize * sizeof(sop));

Here 'len' is enlarged to such an extent that the computation of the allocation size (the multiplication and addition above, followed by the multiplication by the 4-byte sizeof(sop)) overflows a 32-bit register/variable. Formally, the smallest value of 'len' that causes an overflow is:

    ((1 << 32) / 4 - 1) / 3 * 2 = 0x2AAAAAAA

Conversely:

    (0x2AAAAAAA / 2 * 3 + 1) * 4 = 0x100000000

But since this is too large a value for a 32-bit register to hold, it is truncated:

    0x100000000 & 0xFFFFFFFF = 0x00000000

The smallest 'len' value that results in a nonzero value being passed to malloc is:

    ((0x2AAAAAAC / 2 * 3 + 1) * 4) & 0xFFFFFFFF = 0x0000000C

This corresponds to a pattern of about 0x2AAAAAAC / 1024 / 1024 = 682.7 megabytes.

The 'p->ssize' variable, however, does not overflow. It contains the number of elements purportedly allocated by malloc, and is therefore an unreliable indicator to the library of the size of the allocated buffer:

    /* deal with undersized strip */
    if (p->slen >= p->ssize)
        enlarge(p, (p->ssize+1) / 2 * 3); /* +50% */

I discovered this vulnerability only recently, so my research into its actual exploitability has been limited. At present I am mainly concerned with pointing it out rather than exploiting it.
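The arithmetic above can be reproduced on any host with a small sketch. It is an illustration under stated assumptions, not code from the library: size_t is modelled as uint32_t, sizeof(sop) is assumed to be 4 bytes, and all names (SOP_SIZE_32, strip_elems_32, strip_bytes_32, strip_bytes_exact, strip_size_fits_32) are hypothetical. The last helper shows one possible shape of a pre-allocation check; it is not the fix adopted by any of the projects listed.

```c
#include <stdint.h>

/* Assumption: sizeof(sop) on the affected 32-bit targets. Hypothetical name. */
#define SOP_SIZE_32 4u

/* Element count as computed in regcomp: len/2*3 + 1, in 32-bit arithmetic. */
static uint32_t strip_elems_32(uint32_t len)
{
    return len / 2u * 3u + 1u;
}

/* Byte count that reaches malloc once the product wraps modulo 2^32. */
static uint32_t strip_bytes_32(uint32_t len)
{
    return (uint32_t)(strip_elems_32(len) * SOP_SIZE_32);
}

/* The same size computed in 64 bits, i.e. without truncation. */
static uint64_t strip_bytes_exact(uint32_t len)
{
    return ((uint64_t)len / 2u * 3u + 1u) * SOP_SIZE_32;
}

/* Hypothetical guard: 1 if the byte count is representable in 32 bits. */
static int strip_size_fits_32(uint32_t len)
{
    return strip_bytes_exact(len) <= UINT32_MAX;
}
```

With these helpers, strip_bytes_32(0x2AAAAAAA) evaluates to 0 and strip_bytes_32(0x2AAAAAAC) to 0xC, while strip_elems_32 still reports 0x40000000 and 0x40000003 elements respectively, which is the mismatch between p->ssize and the real buffer size described above.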
Mutation of the heap-allocated memory that p->strip points to is mainly performed by the doemit function:

    doemit(p, op, opnd)
    register struct parse *p;
    sop op;
    size_t opnd;
    {
        /* avoid making error situations worse */
        if (p->error != 0)
            return;

        /* deal with oversize operands ("can't happen", more or less) */
        assert(opnd < 1<<OPSHIFT);

        /* deal with undersized strip */
        if (p->slen >= p->ssize)
            enlarge(p, (p->ssize+1) / 2 * 3); /* +50% */
        assert(p->slen < p->ssize);

        /* finally, it's all reduced to the easy case */
        p->strip[p->slen++] = SOP(op, opnd);
    }

A simple grep of the invocations of doemit() in regcomp.c:

    #define EMIT(op, sopnd) doemit(p, (sop)(op), (size_t)(sopnd))
    EMIT(OEND, 0);
    EMIT(OEND, 0);
    EMIT(OOR2, 0); /* offset is very wrong */
    EMIT(OLPAREN, subno);
    EMIT(ORPAREN, subno);
    EMIT(OBOL, 0);
    EMIT(OEOL, 0);
    EMIT(OANY, 0);
    EMIT(OOR2, 0); /* offset very wrong... */
    EMIT(OBOL, 0);
    EMIT(OEOL, 0);
    EMIT(OANY, 0);
    EMIT(OLPAREN, subno);
    EMIT(ORPAREN, subno);
    EMIT(OBACK_, i);
    EMIT(O_BACK, i);
    EMIT(OBOW, 0);
    EMIT(OEOW, 0);
    EMIT(OANYOF, freezeset(p, cs));
    EMIT(OCHAR, (unsigned char)ch);
    EMIT(OOR2, 0);
    EMIT(OOR2, 0); /* offset very wrong... */
    EMIT(op, opnd); /* do checks, ensure space */

where (regex2.h):

    #define OPSHIFT (26)

    #define SOP(op, opnd) ((op)|(opnd))

    #define OEND (1<<OPSHIFT) /* endmarker - */
    #define OCHAR (2<<OPSHIFT) /* character unsigned char */
    #define OBOL (3<<OPSHIFT) /* left anchor - */
    #define OEOL (4<<OPSHIFT) /* right anchor - */
    #define OANY (5<<OPSHIFT) /* . - */
    #define OANYOF (6<<OPSHIFT) /* [...] set number */
    #define OBACK_ (7<<OPSHIFT) /* begin d paren number */
    #define O_BACK (8<<OPSHIFT) /* end d paren number */
    #define OPLUS_ (9<<OPSHIFT) /* + prefix fwd to suffix */
    #define O_PLUS (10<<OPSHIFT) /* + suffix back to prefix */
    #define OQUEST_ (11<<OPSHIFT) /* ? prefix fwd to suffix */
    #define O_QUEST (12<<OPSHIFT) /* ? suffix back to prefix */
    #define OLPAREN (13<<OPSHIFT) /* ( fwd to ) */
    #define ORPAREN (14<<OPSHIFT) /* ) back to ( */
    #define OCH_ (15<<OPSHIFT) /* begin choice fwd to OOR2 */
    #define OOR1 (16<<OPSHIFT) /* | pt. 1 back to OOR1 or OCH_ */
    #define OOR2 (17<<OPSHIFT) /* | pt. 2 fwd to OOR2 or O_CH */
    #define O_CH (18<<OPSHIFT) /* end choice back to OOR1 */
    #define OBOW (19<<OPSHIFT) /* begin word - */
    #define OEOW (20<<OPSHIFT) /* end word - */

Given the way doemit works (OR-ing the first and second parameters of EMIT and writing the result to p->strip), someone exploiting this has only a limited amount of control over which values are written.
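To make the shape of the written values concrete, the following sketch rebuilds the words that doemit stores. It is a simplified model rather than library code: sop is represented as uint32_t (assuming a 32-bit unsigned long on the affected targets), the macros are copied from the regex2.h excerpt with fixed-width constants, and emitted_word and is_emittable are hypothetical helpers. is_emittable assumes that only the opcodes 1 through 20 shown in the excerpt occur and that the operand assertion holds.

```c
#include <stdint.h>

/* Constants mirrored from the regex2.h excerpt, using 32-bit literals. */
#define OPSHIFT 26
#define SOP(op, opnd) ((op) | (opnd))
#define OEND  (UINT32_C(1)  << OPSHIFT)
#define OCHAR (UINT32_C(2)  << OPSHIFT)
#define OEOW  (UINT32_C(20) << OPSHIFT)

/* Hypothetical helper: the 32-bit word doemit stores for (op, opnd). */
static uint32_t emitted_word(uint32_t op, uint32_t opnd)
{
    return SOP(op, opnd);
}

/* Hypothetical helper: 1 if the top 6 bits hold an opcode from the
 * excerpt (1..20); the low 26 bits are the operand and are not checked. */
static int is_emittable(uint32_t word)
{
    uint32_t opcode = word >> OPSHIFT;
    return opcode >= 1u && opcode <= 20u;
}
```

For example, emitted_word(OCHAR, 'A') gives 0x08000041: the attacker chooses the low byte through the pattern character, but the upper 6 bits are fixed by the opcode. Under this model a value such as 0xDEADBEEF, whose top 6 bits encode 55, cannot be produced.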

References:

https://guidovranken.wordpress.com/2015/02/04/full-disclosure-heap-overflow-in-h-spencers-regex-library-on-32-bit-systems/
http://www.arglist.com/regex/

