Pointers out of bounds legal???

Started by
23 comments, last by Nathan Baum 17 years ago
Quote:Original post by LessBread
Had such an instruction been available back when, perhaps bounds checking might have been put into the language from the start.

Quite possibly, but I wasn't talking about that in the bit you quoted. I was just talking about memory usage. Whether or not there is an equivalent to the "bound" instruction, bounds checking necessarily leads to increased memory usage.
Quote:
The Intel docs say that The bounds limit data structure is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds. So what we're talking about is "2 * sizeof(void*)" extra bytes prepended onto the array, not necessarily to every pointer stored in it.

I said array of pointers for a reason. The bounds checking I was imagining would be to check that the the pointers in the array were valid, not pointers into the array of pointers. When iterating over an array in an idiomatic way, a competent compiler would know only to check the bounds once, so it wouldn't even be an issue.

for (int i = 0; i < size; ++i) {  // A competent compiler knows that if array[0] and array[size - 1] are valid, then  // any use of array is valid.  do_something(array);}

Quote:
What I said was it should be limited to arrays. That's a bit different than saying it only needs to be applied to arrays. It seems to me that a compiler would be able to tell the difference between an array and a pointer, that is, between a constant pointer and a non-constant pointer. Perhaps bounds checking would be better served at compile time than run time.

It's not obvious to me how bounds checking can possibly be applied, in the general case, at compile time in C.

int foo (int *array, int size){  for (int i = 0; i < size; ++i)    bar(array);}


How can you check at compile time that foo won't go outside the bounds of array?
Quote:
Would it fall to some_function to check any other pointer for dereferencability as well?

Without cleverness, yes.

Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable would allow a compiler to make a function call the one which doesn't check the pointer if it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2number of pointer arguments.

That also only solves it for pointer arguments. Functions would still need to check every non-argument pointer they dereference, and pointers accessed via pointer arguments. i.e. in "int do_something (Foo *foo) { return foo->bar->baz; }" the compiler could compile a version of do_something which assumes "foo" is valid, but can it assume bar is a valid pointer? Only if the caller is required to check first. Either way, the check must be performed, because C provides no way to decorate "Foo::bar" to let all functions know it will always point at a valid object.
Quote:
The Intel docs also say that The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If I understand that correctly, it covers &x[3] but not &x[7]. The other_function could check if 3 is in bounds before pushing x+3.

&x[3] is not out of bounds. It is a perfectly valid pointer. Only some_function knows that it uses that pointer in a way which has undefined consequences. Only if C enabled a function's signature to distinguish between dereferencable-pointers and dereferencable-and-one-past-the-end-pointers would it be possible to declare some_function in a way which allowed it to assume that all pointers were valid.

[Edited by - Nathan Baum on March 27, 2007 5:27:18 AM]
Advertisement
An example problem. I need a cos lookup table and for the sake of this example I want it to be memory efficient. We will assume for this example that indexes are in degrees and that the first 45 degrees will never be used for some reason.

//Ugly solution?static union{  struct{    float pad[45];    float cosTable[360 - 45];  };  struct{    //useful variables. must not exceed 45 * sizeof(float) with packing  };};//Preferred but undefined solution?float memory[360 - 45], *cosTable = memory - 45;//Speed inefficientfloat cosTable[360 - 45];//lookup becomes cosTable[degees - 45];//Suggested by ToohrVykchar memory[360 * sizeof(float)];float *cosTable = new(memory + 45 * sizeof(float)) float[360 - 45];


//Even more stupidity#pragma const_seg  const char str[] = "asdf";void f(){  //Works  char *constPtr = const_cast<char*>(str);  char arr[1] = {99};  int offset = arr - str;  int x = constPtr[offset];  constPtr[offset] = 0;  //Does not work. Exception thrown  const_cast<char*> (str)[0] = 'b';}


Comments?

[Edited by - KodeWarrior on March 27, 2007 12:53:53 PM]
Quote:float memory[360 - 45], *cosTable = memory - 45;


You're right, it's undefined. Avoid.

Quote:float cosTable[360 - 45];


It becomes efficient if cosTable is global (which will probably be the case), because the compiler may be able to optimize the -45 part at compile time.

Your padding approach works, although it is highly suggested that you use placement new instead of an union, if at all possible.

My reaction to this would be, however, to hide this detail behind a clean math library interface, profile the second method, and if necessary reimplement the function and array in assembler (where these kinds of limitations do not apply).



Quote:Original post by Nathan Baum
Quote:Original post by LessBread
The Intel docs say that The bounds limit data structure is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds. So what we're talking about is "2 * sizeof(void*)" extra bytes prepended onto the array, not necessarily to every pointer stored in it.

I said array of pointers for a reason. The bounds checking I was imagining would be to check that the the pointers in the array were valid, not pointers into the array of pointers. When iterating over an array in an idiomatic way, a competent compiler would know only to check the bounds once, so it wouldn't even be an issue.


The bounds checking that I've been imagining would only concern itself with constant pointers. However, your code snippet does go to my speculation about compile time checking.

Quote:Original post by Nathan Baum
Quote:
What I said was it should be limited to arrays. That's a bit different than saying it only needs to be applied to arrays. It seems to me that a compiler would be able to tell the difference between an array and a pointer, that is, between a constant pointer and a non-constant pointer. Perhaps bounds checking would be better served at compile time than run time.

It's not obvious to me how bounds checking can possibly be applied, in the general case, at compile time in C.

*** Source Snippet Removed ***

How can you check at compile time that foo won't go outside the bounds of array?


You couldn't, but the array wouldn't be checked unless it was a constant pointer. But if the compiler saw the [] operator in that situation it would insert the runtime bounds check as part of the debug code (with the bounds prepended to the pointer address per the Intel doc).

Quote:Original post by Nathan Baum
Quote:
Would it fall to some_function to check any other pointer for dereferencability as well?

Without cleverness, yes.

Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable would allow a compiler to make a function call the one which doesn't check the pointer if it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2number of pointer arguments.

That also only solves it for pointer arguments. Functions would still need to check every non-argument pointer they dereference, and pointers accessed via pointer arguments. i.e. in "int do_something (Foo *foo) { return foo->bar->baz; }" the compiler could compile a version of do_something which assumes "foo" is valid, but can it assume bar is a valid pointer? Only if the caller is required to check first. Either way, the check must be performed, because C provides no way to decorate "Foo::bar" to let all functions know it will always point at a valid object.


Is that what happens now? Wouldn't that only be needed for functions accepting constant pointers? And would only dereferenced constant pointers need checking?

Quote:Original post by Nathan Baum
Quote:
The Intel docs also say that The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If I understand that correctly, it covers &x[3] but not &x[7]. The other_function could check if 3 is in bounds before pushing x+3.

&x[3] is not out of bounds. It is a perfectly valid pointer. Only some_function knows that it uses that pointer in a way which has undefined consequences. Only if C enabled a function's signature to distinguish between dereferencable-pointers and dereferencable-and-one-past-the-end-pointers would it be possible to declare some_function in a way which allowed it to assume that all pointers were valid.


If I read the docs correctly, &x[7] would be out of bounds. The other_function could check if N is in bounds before pushing x+N. Might not the const pointer in the signature be used to indicate whether checking should be applied?
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Quote:Original post by ToohrVyk
Quote:float cosTable[360 - 45];


It becomes efficient if cosTable is global (which will probably be the case), because the compiler may be able to optimize the -45 part at compile time.

On x86 systems, if the degree is in a register (which will probably be the case) then "cosTable[degree - literal]" can be done in a single instruction even if cosTable isn't global. I haven't done any timing, but I'd guess there's no speed penalty.
Quote:Original post by LessBread
The bounds checking that I've been imagining would only concern itself with constant pointers.

Why would you not apply bounds checking to non-constant pointers?
Quote:Original post by LessBread
Quote:
Compiling some_function twice, once for a pointer known to be dereferencable and once for a pointer not known to be dereferencable would allow a compiler to make a function call the one which doesn't check the pointer if it knows the pointer is dereferencable. You need to do that for every function which accepts a pointer, and the number of versions you need is 2number of pointer arguments.

Is that what happens now?

Most compilers don't produce code which checks bounds, so I think it follows that they don't use any clever tricks to make bounds checking more efficient.
Quote:Original post by LessBread
If I read the docs correctly, &x[7] would be out of bounds.

Well, yes, it should be. &x[4] should also be out of bounds.

My docs say that bound does

IF (ArrayIndex < LowerBound OR ArrayIndex > UpperBound)  THEN #BR; FI;


There's nothing about adding the operand size to the upper bound, so the bounds as passed to would be {x, x + 3}. That would mean x[0], x[1], x[2], x[3] would be considered in bounds. I also note that bound is not available in 64-bit mode.

Note that when dereferencing a pointer, there's no need to use "bound". If the system is designed so that a pointer is always checked after modification - which makes sense because this alerts us to bad pointers at the earliest opportunity - then a pointer "in the wild" must always either point at a valid element or one-past-the-end. When dereferencing, we only need to compare the pointer to the upper bound, which is one-past-the-end. If they're equal, we can trap.
Quote:
The other_function could check if N is in bounds before pushing x+N.

Yes, but x+3 is not out of bounds. A pointer is allowed to point at x+3. x+3 is one-past-the-end, which is an explicitly permitted place for a pointer to point at. That means some_function has to assume that any pointer arguments it has might be pointing one-past-the-end.
Quote:
Might not the const pointer in the signature be used to indicate whether checking should be applied?

Not unless one of non-constant or constant pointers are unable to be out of bounds. As far as I know, that isn't the case.

This topic is closed to new replies.

Advertisement