
Pointers out of bounds legal???


Someone who thinks it's bad. [Edit: link deleted] My code that dereferences NULL. Actually, it's code that uses what I think is a bad pointer. Optimizations are off and volatile is used just in case; the asm was unoptimized when I checked it.
  volatile int arr[1] = {99};
  volatile int offset = (int)arr;
  volatile char *ptr = 0;
  volatile int x = ptr[offset];




Asm:
  volatile int arr[1] = {99};
0083233F  mov         dword ptr [arr],63h 
  volatile int offset = (int)arr;
00832346  lea         eax,[arr] 
00832349  mov         dword ptr [offset],eax 
  volatile char *ptr = 0;
0083234C  mov         dword ptr [ptr],0 
  volatile int x = ptr[offset];
00832353  mov         eax,dword ptr [ptr] 
00832356  add         eax,dword ptr [offset] 
00832359  movsx       ecx,byte ptr [eax] 
0083235C  mov         dword ptr [x],ecx 




The above code executed just fine (x ends up == 99). Are there other addresses I can't dereference? Question: is this OK to do on a 32-bit Windows OS, with any CPU brand, compiled as a protected-mode exe? Bonus: does this also work on other devices, like handhelds, 64-bit systems, etc.?

Quote:
Original post by KodeWarrior
My code that dereferences NULL.

Your code doesn't dereference NULL. Its behaviour is undefined, though.
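
For comparison, a minimal sketch of a well-defined way to read the same byte: point a char pointer at the object itself instead of reconstructing its address from a null pointer (accessing an object's bytes through a char pointer is allowed).

volatile int arr[1] = {99};
const volatile char *p = reinterpret_cast<const volatile char*>(arr);
volatile int x = p[0]; // 99 on a little-endian target; which byte you get is implementation-defined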

Is it undefined, though? The final line is just like writing
x = *(0 + arr);
Also, if I remember correctly, the address and the index are interchangeable, so that:

int i = 0;

i[arr] and arr[i] are the same.
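
i.e. something like this compiles without any cast, as far as I can tell (quick sketch):

int arr[3] = {10, 20, 30};
int i = 1;
int a = arr[i]; // *(arr + i) == 20
int b = i[arr]; // *(i + arr) == 20; subscripting commutes by definition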

It is undefined. It would likely crash if you tried a dereference like that; it might not, because that's the nature of undefined behavior: you can't be too sure about what will happen.

i[arr] would require a cast. It is also undefined, but has a tendency to work.

Remember also that a null pointer does not necessarily hold a value whose bit pattern is integer zero, even though you write it as 0 in the source.

offset is being cast between a pointer type and an integral type. While this will tend to work on current 32-bit platforms, it's certainly not well-defined in general, and absolutely a very bad idea.

This code, unlike that in the OP, actually does dereference NULL, and it crashes on every platform I have access to at the moment (Win32, an old ia32 Linux, and an ancient PalmOS build):

int* p = NULL;
*p = 42;



Either way, dereferencing NULL isn't really the issue at hand here, but rather accessing out of defined array boundaries. The linked FAQ is absolutely correct.


int* array = new int[100];
array[101] = 0;


This code also has undefined behaviour; it may work, or it may blow up, or it may cause demons to fly out of your printer and assail you with loud, drunken renditions of I've Been Working on the Railroad while beating you about the head with a rubber balloon.

Pointer voodoo is, in general, a bad idea, and should be avoided as much as possible; if nothing else it is to be done with exceeding care. If you even begin to suspect that you may be wandering into Undefined Land, bail out as soon as possible.

Looks like perfectly valid C to me, and no pointers are being dereferenced out of bounds. The link is correct in that you can get wraparound, which leads to undefined behavior.
Example:

int arr0[100];
int* ptrA = &arr0[100];
int* ptrB = &arr0[-1];
ptrA[-1] = 0;
ptrB[1] = 2;

C will generate the same address for all these statements:
ptr[offset] = 0[offset] = offset[0] = arr[0];
But if the pointer type was different from the offset type then you get undefined behavior.

Dereferencing is another problem, which may lead to an undefined result. It depends on the type/size of the offset and on the pointer type. This leads to nasty casts, which are generally not a good idea.

To that point, you're only storing the first byte of 'arr' in x, since you're dereferencing a 'char' pointer when it needs to be an int pointer.

Should look like this:

volatile int x = *(reinterpret_cast<volatile int*>(&ptr[offset]));


[Edit]

[Edited by - Jack Sotac on March 26, 2007 9:25:57 PM]

Quote:
Original post by ApochPiQ
This code also has undefined behaviour; it may work, or it may blow up, or it may cause demons to fly out of your printer and assail you with loud, drunken renditions of I've Been Working on the Railroad while beating you about the head with a rubber balloon.


[lol]

I don't think this would be such a problem if compiler writers bothered to use the bound instruction [1], but they don't. My guess is that they would have if a bounds violation threw int 3 instead of int 5.

Quote:

i[arr] would require a cast. It is also undefined, but has a tendency to work.


I don't have access to any books at the moment, but why would it require a cast? Was I wrong to assume the address and index are interchangeable?

[edit]It seems Jack Sotac's post confirms what I thought.

Quote:
Original post by Jack Sotac
Looks like perfectly valid C to me, and no pointers are being dereferenced out of bounds.

It might be valid, but it isn't defined. Pointers are being dereferenced out of bounds, and that's why it's undefined.
Quote:

C will generate the same address for all these statements:
ptr[offset] = 0[offset] = offset[0] = arr[0];

1. ptr[offset] is *(ptr + offset).

The value of offset is not defined. There is no guarantee that, if cast back to int*, it would point at the same object. For example, a 64-bit pointer can't reliably be converted to a 32-bit int and back. Real systems exist with such data types.

Furthermore, the value of ptr is not defined. It's possible that the null pointer might actually point at the last addressable byte of memory, not the first. (i.e. it might have the integer value ~0.) This might be the case on a platform where the data at byte 0 needs to be accessible.

On x86 systems the interrupt table is stored starting from byte 0, so for system software a pointer to byte 0 is a reasonable pointer upon which pointer arithmetic should have defined results. However, the standard requires that a null pointer not compare equal to any other valid pointer, including a pointer to byte 0.

Even if offset does contain the offset of the array from ptr, pointer arithmetic is only defined for pointers into an array (or one-past-the-end of an array), which the null pointer is not. (See the sketch below.)

2. 0[offset] and offset[0]

These are invalid. One argument must have pointer type.
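
To put point 1 in code, a quick sketch of where the arithmetic rules draw the line:

int a[4];
int *p = a + 4;  // defined: one-past-the-end of a
int *q = a + 5;  // undefined: beyond one-past-the-end, even though q is never dereferenced
char *n = 0;
char *r = n + 1; // undefined: a null pointer does not point into any array
// And per point 2: given int i, j; the expression i[j] does not even compile,
// because neither operand has pointer type.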
Quote:

Dereferencing is the only problem which may lead to an undefined result.

This is so wrong I'm sure you can't possibly mean what you seem to mean. C and C++ have tons of undefined behaviour that doesn't involve dereferencing.

Quote:
Original post by LessBread
I don't think this would be such a problem if compiler writers bothered to use the bound instruction [1], but they don't. My guess is that they would have if a bounds violation threw int 3 instead of int 5.

I don't think it has much to do with what interrupt "bounds" triggers.

Storing and checking array bounds means increased memory and processing overhead. Each pointer needs to be associated with not only the address it is currently pointing at, but also the address of the start of the array and the address of the end of the array: the memory footprint of each pointer is tripled.

Dereferencing pointers becomes more expensive: because one-past-the-end pointers are valid, every dereference must be checked to ensure it isn't one-past-the-end. Full bounds checking needs to be performed when doing pointer arithmetic. (We could have cheap pointer arithmetic with no checking and more expensive dereferencing which checks both bounds, but I'm assuming dereferences occur more often than arithmetic.)
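
To make that concrete, a bounds-checked pointer would have to carry something like this hypothetical representation (a sketch; no real compiler ABI is implied):

struct checked_ptr {
    int *cur;   // the address the pointer currently holds
    int *lower; // first element of the underlying array
    int *upper; // one-past-the-end of the underlying array
};              // three times the footprint of a raw int*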

Quote:

volatile int offset = (int)arr;


This line is misleading. An offset is the distance between two pointers within the same block of memory; it is of the aptly named type ptrdiff_t and is obtained by subtracting one pointer from another. Converting a pointer to an integer most certainly does not yield an offset, and nothing in the standard defines its behaviour as such.

Since the null pointer belongs to no block of memory, there can be no offset between it and another pointer, and therefore no operation in the C or C++ languages is defined to yield a non-null pointer by adding an offset to the null pointer. The mere addition (not even the dereference) is a cause for undefined behavior, so writing this would also be incorrect:

volatile char *ptr2 = ptr + offset;

As for safety, it might work on a particular machine, with a particular compiler, in a given compiler version, and with certain compilation settings. There is no way, however, to predict what it will do if any of these parameters change, which is why it's unsafe to rely on. You will certainly be much better off if you redesign your code so that it stays within the bounds of the language.
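
For contrast, here is what a genuine offset looks like (sketch):

#include <cstddef>

int a[10];
int *p = &a[2];
int *q = &a[7];
std::ptrdiff_t off = q - p; // defined: both point into the same array; off == 5
int *r = p + off;           // defined: r == q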

Quote:
Original post by Nathan Baum
Quote:
Original post by LessBread
I don't think this would be such a problem if compiler writers bothered to use the bound instruction [1], but they don't. My guess is that they would have if a bounds violation threw int 3 instead of int 5.

I don't think it has much to do with what interrupt "bounds" triggers.

Storing and checking array bounds means increased memory and processing overhead. Each pointer needs to be associated with not only the address it is currently pointing at, but also the address of the start of the array and the address of the end of the array: the memory footprint of each pointer is tripled.

Dereferencing pointers becomes more expensive: because one-past-the-end pointers are valid, every dereference must be checked to ensure it isn't one-past-the-end. Full bounds checking needs to be performed when doing pointer arithmetic. (We could have cheap pointer arithmetic with no checking and more expensive dereferencing which checks both bounds, but I'm assuming dereferences occur more often than arithmetic.)


I agree with the extra processing argument, but with only 8 extra bytes needed to invoke the instruction the increased memory argument doesn't hold up so well. Here's a snippet of code designed to purposefully throw an array bounds exceeded exception.



/*
  bound index, mem

  where mem contains two signed dwords { lowerbound, upperbound }.
  "Array Bounds Exceeded" (interrupt 5, #BR) is raised when index
  falls outside the bounds. The index must be in a register.
*/

void ArrayBoundsExceeded(void)
{
    struct {
        int lower;
        int upper;
    } bounds = { 0, 0 };

    /* Check index 1 against the bounds { 0, 0 }: 1 > upper, so #BR fires. */
    __asm__ ("movl $1, %%ecx \n\t"
             "boundl %%ecx, %0"
             :
             : "m" (bounds)
             : "ecx");
}




Applying this to every pointer seems like overkill. It should be limited to arrays. The processing overhead suggests that it should be limited to debug mode, where int3 would be more helpful.

Quote:
Original post by LessBread
Quote:
Original post by Nathan Baum
Quote:
Original post by LessBread
I don't think this would be such a problem if compiler writers bothered to use the bound instruction [1], but they don't. My guess is that they would have if a bounds violation threw int 3 instead of int 5.

I don't think it has much to do with what interrupt "bounds" triggers.

Storing and checking array bounds means increased memory and processing overhead. Each pointer needs to be associated with not only the address it is currently pointing at, but also the address of the start of the array and the address of the end of the array: the memory footprint of each pointer is tripled.

Dereferencing pointers becomes more expensive: because one-past-the-end pointers are valid, every dereference must be checked to ensure it isn't one-past-the-end. Full bounds checking needs to be performed when doing pointer arithmetic. (We could have cheap pointer arithmetic with no checking and more expensive dereferencing which checks both bounds, but I'm assuming dereferences occur more often than arithmetic.)


I agree with the extra processing argument, but with only 8 extra bytes needed to invoke the instruction the increased memory argument doesn't hold up so well.

In modern computers, perhaps. I'm pretty sure it's at least part of the reason C didn't do array bounds checking from the beginning.

Note that whilst tripling the size of a pointer is unlikely to significantly increase net memory usage, tripling the amount of memory touched by any pointer-using function may well cause troubles.

A lot of people have some kind of moral objection to C doing anything more than the bare minimum necessary to process a strictly conforming and perfectly bug-free program. This is probably also part of the reason.
Quote:

Applying this to every pointer seems like overkill. It should be limited to arrays.

I can't see any way to apply bounds checking to pointers without either having to apply it to every pointer or rejecting some valid programs. C provides no general means to distinguish between a pointer to a complete object and a pointer to an element of an array.

e.g.


/// A.c

int some_function (int *p)
{
    return *p + 2;
}



This function doesn't perform pointer arithmetic upon p, so in principle no bounds checking is needed, provided the caller guarantees that p is a valid pointer. But:


/// B.c

extern int some_function (int *);

void other_function ()
{
    int x[] = { 1, 2, 3 };
    some_function(&x[3]);
}



Unless the compiler has special knowledge of some_function, it has no way to know that this isn't perfectly valid.

Either the compiler assumes that all pointer arguments must be dereferencable - which would break lots of interesting code which relies upon being able to point one-past-the-end (e.g. the STL) - or it assumes that all pointer arguments might point one-past-the-end - which would mean that a function would always have to check its arguments.

A currently viable solution not involving great amounts of ambiguity would be to compile multiple entry-points for each function. In this example, some_function would have an entry point for a known-dereferencable argument, and one for a not-known-dereferencable argument.

I don't think that would have been considered viable earlier in C's history, however.

Quote:
Original post by Nathan Baum
In modern computers, perhaps. I'm pretty sure it's at least part of the reason C didn't do array bounds checking from the beginning.


That beginning was more than 30 years ago. The bound instruction has been around at least since the 286. I have no idea if a similar instruction existed on earlier CPUs.

Quote:
Original post by Nathan Baum
Note that whilst tripling the size of a pointer is unlikely to significantly increase net memory usage, tripling the amount of memory touched by any pointer-using function may well cause troubles.


I don't see how that follows.

Quote:
Original post by Nathan Baum
A lot of people have some kind of moral objection to C doing anything more than the bare minimum necessary to process a strictly conforming and perfectly bug-free program. This is probably also part of the reason.


It seems like a strange arena to inject morals into, but I agree the purists would likely pitch a fit.

Quote:
Original post by Nathan Baum
Quote:

Applying this to every pointer seems like overkill. It should be limited to arrays.

I can't see any way to apply bounds checking to pointers without either having to apply it to every pointer or rejecting some valid programs. C provides no general means to distinguish between a pointer to a complete object and a pointer to an element of an array.


I'm not saying that bounds checking should become part of the standard. It just seems to me like a good thing for a compiler to implement in debug mode.

Quote:
Original post by Nathan Baum
e.g.

*** Source Snippet Removed ***

This function doesn't perform pointer arithmetic upon p, so in principle no bounds checking is needed, provided the caller guarantees that p is a valid pointer. But:

*** Source Snippet Removed ***

Unless the compiler has special knowledge of some_function, it has no way to know that this isn't perfectly valid.

Either the compiler assumes that all pointer arguments must be dereferencable - which would break lots of interesting code which relies upon being able to point one-past-the-end (e.g. the STL) - or it assumes that all pointer arguments might point one-past-the-end - which would mean that a function would always have to check its arguments.

A currently viable solution not involving great amounts of ambiguity would be to compile multiple entry-points for each function. In this example, some_function would have an entry point for a known-dereferencable argument, and one for a not-known-dereferencable argument.

I don't think that would have been considered viable earlier in C's history, however.


void other_function ()
{
    int x[] = { 1, 2, 3 };
    some_function(&x[3]);
}

I don't see why special knowledge of some_function is required since the arguments are pushed inside other_function. If x[n] is out of bounds, then bing. Of course, I'd want to review assembler addressing modes before staking my life on that [smile].

Quote:
Original post by LessBread
Quote:
Original post by Nathan Baum
In modern computers, perhaps. I'm pretty sure it's at least part of the reason C didn't do array bounds checking from the beginning.

That beginning was more than 30 years ago. The bound instruction has been around at least since the 286. I have no idea if a similar instruction existed on earlier CPUs.

Is the date of birth of the bound instruction relevant? Whatever instructions are used, a bounds-checked pointer requires that the bounds be stored.
Quote:

Quote:
Original post by Nathan Baum
Note that whilst tripling the size of a pointer is unlikely to significantly increase net memory usage, tripling the amount of memory touched by any pointer-using function may well cause troubles.

I don't see how that follows.

Extra memory reads, even if that memory is likely to be in the cache, will slow things down. And tripling the size of pointers means it is less likely to fit in the cache.

That depends upon context. For example, on an Athlon (64 KB L1 data cache with 64-byte lines, i.e. 1024 lines), an array of 4096 four-byte pointers fills 256 cache lines; add bounds to each pointer and it takes up 768. For some data access patterns, the penalties would be significant.

I know this is a performance issue but, the way I read it, you implied that because computers have lots of system RAM in comparison to how many pointers the average program has, the "2 * sizeof(void*)" extra bytes are essentially "free" and we shouldn't worry about them. My point is that they may not be "free".
Quote:

I'm not saying that bounds checking should become part of the standard. It just seems to me like a good thing for a compiler to implement in debug mode.

Yes, but you were saying that bounds checking only needs to be applied to arrays. Ideally, that would be true but, debug mode or not, the C standard makes it hard to make that distinction.
Quote:

void other_function ()
{
    int x[] = { 1, 2, 3 };
    some_function(&x[3]);
}

I don't see why special knowledge of some_function is required since the arguments are pushed inside other_function. If x[n] is out of bounds, then bing. Of course, I'd want to review assembler addressing modes before staking my life on that [smile].

I think you misunderstand.

&x[3] is not out of bounds. It's one-past-the-end, and is a valid place for a pointer to be pointing at. (I note that although x[3] is defined to be *(x + 3), which makes it look like it's dereferencing an undereferencable pointer, the standard explicitly states that &*E is equivalent to E, for all E. So in this case &x[3] is equivalent to x + 3.)
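
In code terms (sketch):

int x[3] = { 1, 2, 3 };
int *end = &x[3]; // fine: &*(x + 3) collapses to x + 3, so nothing is dereferenced
                  // *end itself would be undefined; forming and comparing end is not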

Because the argument is a valid pointer, the compiler must, unless it has special knowledge, assume that some_function will accept that pointer. This means that it falls to some_function to check the pointer for dereferencability.

While we're on the topic of pointers, I've been wondering about this for a while:

Contrived example:


class ClassOne
{
public:
    static const ClassOne* static_class()
    {
        return &static_instance;
    }

private:
    static ClassOne static_instance;
};

class ClassTwo
{
public:
    explicit ClassTwo(const ClassOne* ptr) :
        classOnePtr(ptr)
    {
    }

    static const ClassTwo* static_class()
    {
        return &static_instance;
    }

private:
    const ClassOne* classOnePtr;
    static ClassTwo static_instance;
};

ClassTwo ClassTwo::static_instance(ClassOne::static_class());
ClassOne ClassOne::static_instance;



I know the standard doesn't guarantee anything with regards to static initialization order across different translation units, but I am wondering if


ClassTwo ClassTwo::static_instance(ClassOne::static_class());



has well-defined behavior if ClassOne is in a different translation unit from ClassTwo and has not been initialized before ClassTwo gets initialized. That is, will ClassTwo::static_class()->classOnePtr be a valid pointer, even if the instance itself has not been initialized yet? With valid I mean: will the address of ClassOne::static_instance stay the same before and after initialization? I really don't care about what's inside the instance, only the address is important.

I've tested the code above under both VC++ and GCC numerous times, and while it works perfectly there, I don't know if that is simply because the two static instances are in the same translation unit, whether I'm getting lucky all the time, or if it is compiler-dependent, etc. As said above, the code is pretty contrived, but I am not sure how I would go about testing this across different translation units.

Quote:
Original post by 97
Will ClassTwo::static_class()->classOnePtr be a valid pointer, even if the instance itself has not been initialized yet? With valid I mean: will the address of ClassOne::static_instance stay the same before and after initialization?

The address will remain the same.
Quote:

I really don't care about what's inside the instance, only the address is important.

Nonetheless:

If ClassTwo is in a separate translation unit, then the instance might be zero-initialized. This means that each primitive object it contains will be set to 0. With unions, the first member will be set to 0, which might not mean the other members are 0.

It may also be that any dynamic initializer (this includes an implied default initializer) gets to run first.

If they're in the same translation unit, then they are initialized in the order in which their initializers appear. In the example, therefore, ClassTwo::static_instance is certainly initialized before ClassOne::static_instance.

Also in the example, ClassOne contains no data, so initialized and uninitialized versions of it are identical. Given that you said you don't care about the data inside, that may have been intentional.
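
For instance, a two-file sketch of that ordering (hypothetical layout; f stands in for any dynamic initializer):

// tu1.cpp
extern int f(); // defined in some other translation unit
int a = f();    // dynamic initialization; order relative to other TUs is unspecified
int b = 42;     // static (constant) initialization; guaranteed complete before
                // any dynamic initializer such as f() runs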

Quote:
Original post by Nathan Baum
Quote:
Original post by LessBread
Quote:
Original post by Nathan Baum
In modern computers, perhaps. I'm pretty sure it's at least part of the reason C didn't do array bounds checking from the beginning.

That beginning was more than 30 years ago. The bound instruction has been around at least since the 286. I have no idea if a similar instruction existed on earlier CPUs.

Is the date of birth of the bound instruction relevant? Whatever instructions are used, a bounds-checked pointer requires that the bounds be stored.


Had such an instruction been available back when, perhaps bounds checking might have been put into the language from the start.

Quote:
Original post by Nathan Baum
Quote:

Quote:
Original post by Nathan Baum
Note that whilst tripling the size of a pointer is unlikely to significantly increase net memory usage, tripling the amount of memory touched by any pointer-using function may well cause troubles.

I don't see how that follows.

Extra memory reads, even if that memory is likely to be in the cache, will slow things down. And tripling the size of pointers means it is less likely to fit in the cache.

That depends upon context. For example, on an Athlon (64 KB L1 data cache with 64-byte lines, i.e. 1024 lines), an array of 4096 four-byte pointers fills 256 cache lines; add bounds to each pointer and it takes up 768. For some data access patterns, the penalties would be significant.

I know this is a performance issue but, the way I read it, you implied that because computers have lots of system RAM in comparison to how many pointers the average program has, the "2 * sizeof(void*)" extra bytes are essentially "free" and we shouldn't worry about them. My point is that they may not be "free".


The Intel docs say that "The bounds limit data structure is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds." So what we're talking about is "2 * sizeof(void*)" extra bytes prepended onto the array, not necessarily onto every pointer stored in it.
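
In struct form, the layout the manual describes would look something like this (a sketch with hypothetical names and sizes):

struct bounded_int_array {
    int lower;     // lowest valid index (typically 0)
    int upper;     // highest valid index
    int data[100]; // the array itself, at a constant offset from its bounds
};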

Quote:
Original post by Nathan Baum
Quote:

I'm not saying that bounds checking should become part of the standard. It just seems to me like a good thing for a compiler to implement in debug mode.

Yes, but you were saying that bounds checking only needs to be applied to arrays. Ideally, that would be true but, debug mode or not, the C standard makes it hard to make that distinction.


What I said was it should be limited to arrays. That's a bit different than saying it only needs to be applied to arrays. It seems to me that a compiler would be able to tell the difference between an array and a pointer, that is, between a constant pointer and a non-constant pointer. Perhaps bounds checking would be better served at compile time than at run time.

Quote:
Original post by Nathan Baum
Quote:

void other_function ()
{
    int x[] = { 1, 2, 3 };
    some_function(&x[3]);
}

I don't see why special knowledge of some_function is required since the arguments are pushed inside other_function. If x[n] is out of bounds, then bing. Of course, I'd want to review assembler addressing modes before staking my life on that [smile].

I think you misunderstand.

&x[3] is not out of bounds. It's one-past-the-end, and is a valid place for a pointer to be pointing at. (I note that although x[3] is defined to be *(x + 3), which makes it look like it's dereferencing an undereferencable pointer, the standard explicitly states that &*E is equivalent to E, for all E. So in this case &x[3] is equivalent to x + 3.)

Because the argument is a valid pointer, the compiler must, unless it has special knowledge, assume that some_function will accept that pointer. This means that it falls to some_function to check the pointer for dereferencability.


Would it fall to some_function to check any other pointer for dereferencability as well? The Intel docs also say that "The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes." If I understand that correctly, it covers &x[3] but not &x[7]. The other_function could check if 3 is in bounds before pushing x+3.

Quote:
Original post by Nathan Baum
Quote:
Original post by 97
Will ClassTwo::static_class()->classOnePtr be a valid pointer, even if the instance itself has not been initialized yet? With valid I mean: will the address of ClassOne::static_instance stay the same before and after initialization?

The address will remain the same.
Quote:

I really don't care about what's inside the instance, only the address is important.

Nonetheless:

If ClassTwo is in a separate translation unit, then the instance might be zero-initialized. This means that each primitive object it contains will be set to 0. With unions, the first member will be set to 0, which might not mean the other members are 0.

It may also be that any dynamic initializer (this includes an implied default initializer) gets to run first.

If they're in the same translation unit, then they are initialized in the order in which their initializers appear. In the example, therefore, ClassTwo::static_instance is certainly initialized before ClassOne::static_instance.

Also in the example, ClassOne contains no data, so initialized and uninitialized versions of it are identical. Given that you said you don't care about the data inside, that may have been intentional.


Thank you. That is perfect for what I need.

Quote:
Original post by LessBread
Had such an instruction been available back when, perhaps bounds checking might have been put into the language from the start.

Quite possibly, but I wasn't talking about that in the bit you quoted. I was just talking about memory usage. Whether or not there is an equivalent to the "bound" instruction, bounds checking necessarily leads to increased memory usage.
Quote:

The Intel docs say that "The bounds limit data structure is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds." So what we're talking about is "2 * sizeof(void*)" extra bytes prepended onto the array, not necessarily onto every pointer stored in it.

I said array of pointers for a reason. The bounds checking I was imagining would be to check that the pointers in the array were valid, not pointers into the array of pointers. When iterating over an array in an idiomatic way, a competent compiler would know to check the bounds only once, so it wouldn't even be an issue.


for (int i = 0; i < size; ++i) {
    // A competent compiler knows that if array[0] and array[size - 1] are valid,
    // then any use of array[i] is valid.
    do_something(array[i]);
}




Quote:

What I said was it should be limited to arrays. That's a bit different than saying it only needs to be applied to arrays. It seems to me that a compiler would be able to tell the difference between an array and a pointer, that is, between a constant pointer and a non-constant pointer. Perhaps bounds checking would be better served at compile time than at run time.

It's not obvious to me how bounds checking can possibly be applied, in the general case, at compile time in C.


void foo (int *array, int size)
{
    for (int i = 0; i < size; ++i)
        bar(array[i]);
}





How can you check at compile time that foo won't go outside the bounds of array?
Quote:

Would it fall to some_function to check any other pointer for dereferencability as well?

Without cleverness, yes.

Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable, would allow a compiler to make a call site use the version which doesn't check the pointer whenever it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2^(number of pointer arguments).

That also only solves it for pointer arguments. Functions would still need to check every non-argument pointer they dereference, and pointers accessed via pointer arguments. i.e. in "int do_something (Foo *foo) { return foo->bar->baz; }" the compiler could compile a version of do_something which assumes "foo" is valid, but can it assume bar is a valid pointer? Only if the caller is required to check first. Either way, the check must be performed, because C provides no way to decorate "Foo::bar" to let all functions know it will always point at a valid object.
Quote:

The Intel docs also say that "The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes." If I understand that correctly, it covers &x[3] but not &x[7]. The other_function could check if 3 is in bounds before pushing x+3.

&x[3] is not out of bounds. It is a perfectly valid pointer. Only some_function knows that it uses that pointer in a way which has undefined consequences. Only if C enabled a function's signature to distinguish between dereferencable-pointers and dereferencable-and-one-past-the-end-pointers would it be possible to declare some_function in a way which allowed it to assume that all pointers were valid.

[Edited by - Nathan Baum on March 27, 2007 5:27:18 AM]

An example problem: I need a cos lookup table, and for the sake of this example I want it to be memory efficient. We will assume that indexes are in degrees and that the first 45 degrees will never be used, for some reason.


//Ugly solution?
static union {
    struct {
        float pad[45];
        float cosTable[360 - 45];
    };

    struct {
        //useful variables; must not exceed 45 * sizeof(float) with packing
    };
};


//Preferred but undefined solution?
float memory[360 - 45], *cosTable = memory - 45;


//Speed inefficient
float cosTable[360 - 45]; //lookup becomes cosTable[degrees - 45];

//Suggested by ToohrVyk (requires <new>, and the storage must be suitably aligned)
char memory[360 * sizeof(float)];
float *cosTable = new (memory + 45 * sizeof(float)) float[360 - 45];





//Even more stupidity
#pragma const_seg
const char str[] = "asdf";

void f()
{
    //Works
    char *constPtr = const_cast<char*>(str);
    char arr[1] = {99};
    int offset = arr - str;
    int x = constPtr[offset];
    constPtr[offset] = 0;

    //Does not work. Exception thrown
    const_cast<char*>(str)[0] = 'b';
}






Comments?

[Edited by - KodeWarrior on March 27, 2007 12:53:53 PM]

Quote:
float memory[360 - 45], *cosTable = memory - 45;


You're right, it's undefined. Avoid.

Quote:
float cosTable[360 - 45];


It becomes efficient if cosTable is global (which will probably be the case), because the compiler may be able to optimize the -45 part at compile time.

Your padding approach works, although it is highly suggested that you use placement new instead of a union, if at all possible.

My reaction to this would be, however, to hide this detail behind a clean math library interface, profile the second method, and if necessary reimplement the function and array in assembler (where these kinds of limitations do not apply).
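
For instance, something along these lines (a sketch; the names and the explicit init call are mine):

#include <cmath>

namespace {
    const int kFirstDegree = 45;
    float cosTable[360 - kFirstDegree];
}

// Call once at startup to fill the table.
void initCosTable()
{
    for (int d = kFirstDegree; d < 360; ++d)
        cosTable[d - kFirstDegree] = std::cos(d * 3.14159265f / 180.0f);
}

// The lookup callers see; the -45 is an implementation detail that the
// compiler can fold into the addressing computation.
inline float cosDeg(int degrees)
{
    return cosTable[degrees - kFirstDegree];
}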



Quote:
Original post by Nathan Baum
Quote:
Original post by LessBread
The Intel docs say that "The bounds limit data structure is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds." So what we're talking about is "2 * sizeof(void*)" extra bytes prepended onto the array, not necessarily onto every pointer stored in it.

I said array of pointers for a reason. The bounds checking I was imagining would be to check that the pointers in the array were valid, not pointers into the array of pointers. When iterating over an array in an idiomatic way, a competent compiler would know to check the bounds only once, so it wouldn't even be an issue.


The bounds checking that I've been imagining would only concern itself with constant pointers. However, your code snippet does go to my speculation about compile time checking.

Quote:
Original post by Nathan Baum
Quote:

What I said was it should be limited to arrays. That's a bit different than saying it only needs to be applied to arrays. It seems to me that a compiler would be able to tell the difference between an array and a pointer, that is, between a constant pointer and a non-constant pointer. Perhaps bounds checking would be better served at compile time than at run time.

It's not obvious to me how bounds checking can possibly be applied, in the general case, at compile time in C.

*** Source Snippet Removed ***

How can you check at compile time that foo won't go outside the bounds of array?


You couldn't, but the array wouldn't be checked unless it was a constant pointer. But if the compiler saw the [] operator in that situation it would insert the runtime bounds check as part of the debug code (with the bounds prepended to the pointer address per the Intel doc).

Quote:
Original post by Nathan Baum
Quote:

Would it fall to some_function to check any other pointer for dereferencability as well?

Without cleverness, yes.

Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable, would allow a compiler to make a call site use the version which doesn't check the pointer whenever it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2^(number of pointer arguments).

That also only solves it for pointer arguments. Functions would still need to check every non-argument pointer they dereference, and pointers accessed via pointer arguments. i.e. in "int do_something (Foo *foo) { return foo->bar->baz; }" the compiler could compile a version of do_something which assumes "foo" is valid, but can it assume bar is a valid pointer? Only if the caller is required to check first. Either way, the check must be performed, because C provides no way to decorate "Foo::bar" to let all functions know it will always point at a valid object.


Is that what happens now? Wouldn't that only be needed for functions accepting constant pointers? And would only dereferenced constant pointers need checking?

Quote:
Original post by Nathan Baum
Quote:

The Intel docs also say that "The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes." If I understand that correctly, it covers &x[3] but not &x[7]. The other_function could check if 3 is in bounds before pushing x+3.

&x[3] is not out of bounds. It is a perfectly valid pointer. Only some_function knows that it uses that pointer in a way which has undefined consequences. Only if C enabled a function's signature to distinguish between dereferencable-pointers and dereferencable-and-one-past-the-end-pointers would it be possible to declare some_function in a way which allowed it to assume that all pointers were valid.


If I read the docs correctly, &x[7] would be out of bounds. The other_function could check if N is in bounds before pushing x+N. Might not the const pointer in the signature be used to indicate whether checking should be applied?

Quote:
Original post by ToohrVyk
Quote:
float cosTable[360 - 45];


It becomes efficient if cosTable is global (which will probably be the case), because the compiler may be able to optimize the -45 part at compile time.

On x86 systems, if the degree is in a register (which will probably be the case) then "cosTable[degree - literal]" can be done in a single instruction even if cosTable isn't global. I haven't done any timing, but I'd guess there's no speed penalty.
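
E.g. (sketch): the literal folds into the displacement, so the lookup is a single indexed load.

float lookup(const float *cosTable, int degree)
{
    return cosTable[degree - 45]; // one load: effectively [cosTable + degree*4 - 180]
}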
Quote:
Original post by LessBread
The bounds checking that I've been imagining would only concern itself with constant pointers.

Why would you not apply bounds checking to non-constant pointers?
Quote:
Original post by LessBread
Quote:

Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable, would allow a compiler to make a call site use the version which doesn't check the pointer whenever it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2^(number of pointer arguments).

Is that what happens now?

Most compilers don't produce code which checks bounds, so I think it follows that they don't use any clever tricks to make bounds checking more efficient.
Quote:
Original post by LessBread
If I read the docs correctly, &x[7] would be out of bounds.

Well, yes, it should be. &x[4] should also be out of bounds.

My docs say that bound does


IF (ArrayIndex < LowerBound OR ArrayIndex > UpperBound)
THEN #BR; FI;


There's nothing about adding the operand size to the upper bound, so the bounds as passed to bound would be {x, x + 3}. That would mean x[0], x[1], x[2], and x[3] would be considered in bounds. I also note that bound is not available in 64-bit mode.

Note that when dereferencing a pointer, there's no need to use "bound". If the system is designed so that a pointer is always checked after modification - which makes sense because this alerts us to bad pointers at the earliest opportunity - then a pointer "in the wild" must always either point at a valid element or one-past-the-end. When dereferencing, we only need to compare the pointer to the upper bound, which is one-past-the-end. If they're equal, we can trap.
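
A sketch of that cheaper dereference check (hypothetical; assumes the arithmetic checks already guarantee p is within [lower, upper]):

#include <cstdlib>

int derefChecked(const int *p, const int *upper /* one-past-the-end */)
{
    if (p == upper)
        std::abort(); // the only invalid case left to catch at dereference time
    return *p;
}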
Quote:

The other_function could check if N is in bounds before pushing x+N.

Yes, but x+3 is not out of bounds. A pointer is allowed to point at x+3. x+3 is one-past-the-end, which is an explicitly permitted place for a pointer to point at. That means some_function has to assume that any pointer arguments it has might be pointing one-past-the-end.
Quote:

Might not the const pointer in the signature be used to indicate whether checking should be applied?

Not unless either non-constant or constant pointers were unable to go out of bounds. As far as I know, that isn't the case.
