mark Posted May 26, 2014 Share Posted May 26, 2014 As mentioned in another posting there are some code pieces I consider problematic due to the usage of C functions instead of their C++ counterparts. So far I found the following C functions in gemrb/core: - malloc (including memory overflow possibilities and magic numbers) and malloc+memset - instead of new, - unsafe C string functions: strcpy, strcat, sprintf, sscanf and strlen(*) - instead of C++ strings and streams which avoid memory allocation/length checking/terminating issues. (*) if it cannot be guaranteed that the strings really terminate with \0, which I did not check for For the lulz I also tried some more pedantic compiler checks to see what happens if more safety-checks are enforced. It caused the build to fail (because gemrb treats warnings as errors, which isn't a bad thing to do): CXXFLAGS = -g -Werror -Wall -W -pedantic -funsigned-char -Wmissing-declarations -Wcast-qual -Wcast-align -Wformat-security -Wformat-nonliteral -Wmissing-format-attribute -Winline (probably -funsigned-char should be replaced by -fno-unsigned-char depending on what you expect a char to be by default, signed or unsigned) Replacing the C functions and fixing the pedantic warnings will improve the quality and stability of the sources (as well as the security - though probably nobody is going to exploit a game... but actually that is no excuse to use strcat and partners ^^). Out of curiosity: Was gemrb a C program in it's early days? Link to comment
lynx Posted May 26, 2014 Share Posted May 26, 2014 Those functions are unsafe if used improperly, not by definition. I'm sure you could find relevant places where we're not careful enough (there are some remainders in our coverity scan reports too), but those should be few. Patches welcome. Oh, what's a good replacement for sscanf? I'm pretty sure it started as c++, though I wasn't here back then. It's light on c++ magic due to portability (horrible compilers), debugging and, I guess, original data formats, which use char. Link to comment
SyntaxError Posted May 26, 2014 Share Posted May 26, 2014 none of that is news to us the best way to fix most of it (IMO) isn't to simply replace the use of functions. For example, the ieResRef would be best stuffed into a wrapper class with overloaded operators to safely do the copies/comparison/whatever. most of the uses of strcpy et al are to do with the resrefs. (I have already done this and started using it in my branch btw). Anyway, this is hardly the worst thing about the GemRB codebase Link to comment
Avenger Posted May 26, 2014 Share Posted May 26, 2014 Just looked at strcat - the only place where it could be unsafe is in ParseGameDate of SaveGameIterator, in all other places we use it with unexploitable literals. Link to comment
mark Posted May 29, 2014 Author Share Posted May 29, 2014 I found lines like MapSet = (unsigned short *) malloc(sizeof(unsigned short) * Width * Height); which I think are interesting: Without checking for overflows, gemrb will crash if someone writes a mod for a game with unexpected content or if someone damaged his original files (e.g. if large mapsizes or whatever cause Width or Height to be too large). In this particular case there is also no check if MapSet != NULL which will cause gemrb to crash without information if the malloc function did not suceed. The usage of the new operator would throw a bad_alloc exception instead. (plus, malloc isn't typesafe but I think that is clear) Another thing are those malloc + memset to zero calls I have seen which could be replaced by calloc ([1] for security aspects of mem-functions and [2] for possible performance reasons) or new which also sets the memory to zero (e.g. with "new int[10]();"). There is a nice, short article about integer/memory allocation overflows [3] and another one [4] with examples how to handle memory allocation. [1] http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=202 [2] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc [3] http://undeadly.org/cgi?action=article&sid=20060330071917 [4] http://drj11.wordpress.com/2008/06/04/calloc-when-multiply-overflows/ Link to comment
lynx Posted May 29, 2014 Share Posted May 29, 2014 The MapSet one is again a more academic affair. Why would anyone create a tilemap with so many cells in any directions is beyond me, not to mention the amount of space it would need for storage. Any related crash would likely happen sooner, during tilemap construction. Again, patches are welcome. I trivially grepped only 6 of 218 memset uses where calloc could be used, so it would be a nice and small change to get started. Link to comment
Avenger Posted May 29, 2014 Share Posted May 29, 2014 I don't quite understand why using new is safer than malloc, or using calloc instead of malloc+memset. If it crashes due to a bad allocation, it will crash from an unhandled or even handled exception. The only difference is that during normal functioning (99.9% of the time) it will be slower. Link to comment
mark Posted June 1, 2014 Author Share Posted June 1, 2014 Long story short: The code works, so it is not _really_ necessary to replace all the C thingies but I could C++ify these parts if you like. Consistency never hurts and replacing quasi-deprecated functions with security problems never hurts either (one C paradigm was: trust the user - hmm, no!) ^^ Some things regarding new/delete vs free/malloc: new/delete should be preferred in C++ because new does not need typecasts. (You don't need those in C so it's some kind of a step backwards in C++). Typecasting errors can be very weird like the error I had in the RNG code where a signed int has been automatically and unexpectedly cast to an unsigned int which caused the app to crash - sometimes. new is an operator (part of the language), malloc is a function (part of the OS-libraries). This might introduce problems e.g. if some Linux user decides to use another libc like uclibc or dietlibc which works differently than glibc, especially in the case if malloc(X*Y) overflows (where OpenBSD e.g. suggests to use reallocarray() instead which is part of the standard library there - but only there). As every library is different, you don't need to worry about that with new. If the programmer uses malloc+memset, two functions calls are needed which means that the programmer potentionally may create more errors, especially as the usage of memset is pretty bug-prone (yeah yeah, the code works). Also, depending on the platform, calloc might check for overflows, while memset just doesn't. I also found discussions that stated that calloc is much faster than malloc+memset while doing the same thing. I didn't benchmark this and guess that rather depends on the library (again). Faster and safer is good, I guess ^^ Also, if new fails you get an exception you can use to debug. Even if you don't use exceptions. With malloc simply "bad stuff happens" and you don't know what happened. The example of NULL-unchecked malloc function calls tells us that the programmer intended to use new in this case :-) There are other differences but I guess they don't apply here (calling constructors etc). Link to comment
Avenger Posted June 1, 2014 Share Posted June 1, 2014 Ever looked at how a compiled program looks like, from the inside? Here is a snippet: .idata:004182BC ; void *__cdecl _malloc(size_t Size) .idata:004182BC extrn __imp__malloc:dword ; CODE XREF: _wmain+22p .idata:004182BC ; DATA XREF: _wmain+22r ... .idata:004182C0 extrn __imp___decode_pointer:dword ; CODE XREF: __onexit+38p .idata:004182C0 ; __onexit+74p ... .idata:004182C4 ; __declspec(dllimport) void * __cdecl operator new(unsigned int) .idata:004182C4 extrn __imp_??2@YAPAXI@Z:dword .idata:004182C4 ; DATA XREF: operator new(uint)r To me, it looks like both new, and malloc are handled exactly the same way. (I reached this external reference for new from an actual program through a few redirections) Original C++ program: int *tmp1 = (int *) malloc(10*sizeof(int)); int *tmp2 = new int[10]; - note - this is microsoft visual C++, other compilers might have different ways, but generally 'new' is just another operator that is handled internally as a function. Since it is polymorphic, it either has an overhead or it is generated multiple times for the different objects. Link to comment
Avenger Posted June 1, 2014 Share Posted June 1, 2014 Even more damning evidence about the new operator (now in a static linked exe): Look at that _malloc label inside? That's what would be called if i used malloc directly!!! So, don't ever tell me that 'new' is faster. It definitely has all the overhead (obviously, it should throw the exception in case _malloc returns null). .text:004011A6 ; void *__cdecl operator new(unsigned int size) .text:004011A6 ??2@YAPAXI@Z proc near ; CODE XREF: _wmain+9p .text:004011A6 .text:004011A6 this = std::bad_alloc ptr -0Ch .text:004011A6 size = dword ptr 8 .text:004011A6 .text:004011A6 mov edi, edi .text:004011A8 push ebp .text:004011A9 mov ebp, esp .text:004011AB sub esp, 0Ch .text:004011AE jmp short loc_4011BD .text:004011B0 ; --------------------------------------------------------------------------- .text:004011B0 .text:004011B0 loc_4011B0: ; CODE XREF: operator new(uint)+22j .text:004011B0 push [ebp+size] ; size .text:004011B3 call __callnewh .text:004011B8 pop ecx .text:004011B9 test eax, eax .text:004011BB jz short loc_4011CC .text:004011BD .text:004011BD loc_4011BD: ; CODE XREF: operator new(uint)+8j .text:004011BD push [ebp+size] ; size .text:004011C0 call _malloc .text:004011C5 pop ecx .text:004011C6 test eax, eax .text:004011C8 jz short loc_4011B0 .text:004011CA leave .text:004011CB retn .text:004011CC ; --------------------------------------------------------------------------- .text:004011CC .text:004011CC loc_4011CC: ; CODE XREF: operator new(uint)+15j .text:004011CC test byte ptr $S1, 1 .text:004011D3 mov esi, offset nomem .text:004011D8 jnz short loc_4011F3 .text:004011DA or $S1, 1 .text:004011E1 mov ecx, esi ; this .text:004011E3 call ??0bad_alloc@std@@QAE@XZ ; std::bad_alloc::bad_alloc(void) .text:004011E8 push offset _operator_new____6____dynamic_atexit_destructor_for__nomem__ ; func .text:004011ED call _atexit Long story short: if you handle the null return value from _malloc, and you free what you have taken, you are golden. Link to comment
AnonymousHero Posted June 2, 2014 Share Posted June 2, 2014 Long story short: if you handle the null return value from _malloc, and you free what you have taken, you are golden. Not quite: You'd need to add "as long as your casts are always correct" (EDIT: "...and you're always using the correct sizeof, and you don't overflow the size") which is a non-trivial thing to guarantee and which "new" guarantees automatically. (Also, I'm not quite sure, but I would suspect that the spec would guarantee that integer overflow must be handled as a failure by the new operator, i.e. it should throw if N_ELEMS * sizeof T would overflow.) You did also specify "if you handle the null return value", but that's also something that the compiler cannot/won't check for you. Therefore it is preferable to have an uncaught exception thrown. (Uncaught exceptions are guaranteed to result in an abort() rather than undefined behavior. You want the former, not the latter.) On why anyone would try to cause N_ELEMS * sizeof T to overflow: This is basic security -- if somebody downloads a "poisoned" savegame you really don't want their account to get owned. This kind of thing has been done to compromise users of vulnerable MP3 players. (I realize that GemRB is probably not a realistic target, but that's beside the point.) EDIT: Of course what one should really do is use std::unique_ptr, std::weak_ptr and std::shared_ptr instead of raw pointers. Link to comment
Recommended Posts
Archived
This topic is now archived and is closed to further replies.