Discussion:
inlining functions
(too old to reply)
Mike King
2008-05-16 19:47:26 UTC
Permalink
How can I get the C++Builder 2006 compiler to inline the function Comp (so
it's not recursive)?

#include <stdio.h>

template <unsigned int T>
class Test {
unsigned int buf[T];
public:
unsigned int Comp ( unsigned int sz = T ) {
if (sz) return buf[sz] | Comp(sz-1);
}
};

int main(int argc, char* argv[]) {
Test<2> test;
printf("Hello, %d", test.Comp());
return 0;
}

00401264 /$ 55 PUSH EBP ;
Test<2>::Comp()
00401265 |. 8BEC MOV EBP,ESP
00401267 |. 53 PUSH EBX
00401268 |. 56 PUSH ESI
00401269 |. 8B75 0C MOV ESI,DWORD PTR SS:[EBP+C]
0040126C |. 8B5D 08 MOV EBX,DWORD PTR SS:[EBP+8]
0040126F |. 85F6 TEST ESI,ESI
00401271 |. 74 12 JE SHORT Project1.00401285
00401273 |. 8BC6 MOV EAX,ESI
00401275 |. 48 DEC EAX
00401276 |. 50 PUSH EAX ; /Arg2
00401277 |. 53 PUSH EBX ; |Arg1
00401278 |. E8 E7FFFFFF CALL Project1.00401264 ;
\Project1.00401264
0040127D |. 83C4 08 ADD ESP,8
00401280 |. 8B14B3 MOV EDX,DWORD PTR DS:[EBX+ESI*4]
00401283 |. 0BC2 OR EAX,EDX
00401285 |> 5E POP ESI
00401286 |. 5B POP EBX
00401287 |. 5D POP EBP
00401288 \. C3 RETN
Mike King
2008-05-16 21:36:52 UTC
Permalink
The previous code was buggy.

#include <stdio.h>

template <unsigned int T>
class Test {
unsigned int buf[T];
public:
inline unsigned int Comp ( unsigned int sz = T-1 ) {
if (sz) return Comp(sz-1) | buf[sz];
else return buf[0];
}
};

int main(int argc, char* argv[])
{
Test<2> test;
printf("Hello, %d", test.Comp());
return 0;
}

00401268 /$ 55 PUSH EBP ;
Test<2>::Comp()
00401269 |. 8BEC MOV EBP,ESP
0040126B |. 53 PUSH EBX
0040126C |. 56 PUSH ESI
0040126D |. 57 PUSH EDI
0040126E |. 8B7D 0C MOV EDI,DWORD PTR SS:[EBP+C]
00401271 |. 8B5D 08 MOV EBX,DWORD PTR SS:[EBP+8]
00401274 |. 85FF TEST EDI,EDI ;
Switch (cases 0..1)
00401276 |. 74 20 JE SHORT Project1.00401298
00401278 |. 8BF7 MOV ESI,EDI
0040127A |. 4E DEC ESI
0040127B |. 85F6 TEST ESI,ESI
0040127D |. 74 12 JE SHORT Project1.00401291
0040127F |. 8BC6 MOV EAX,ESI ;
Default case of switch 00401274
00401281 |. 48 DEC EAX
00401282 |. 50 PUSH EAX ; /Arg2
00401283 |. 53 PUSH EBX ; |Arg1
00401284 |. E8 DFFFFFFF CALL Project1.00401268 ;
\Project1.00401268
00401289 |. 83C4 08 ADD ESP,8
0040128C |. 0B04B3 OR EAX,DWORD PTR DS:[EBX+ESI*4]
0040128F |. EB 02 JMP SHORT Project1.00401293
00401291 |> 8B03 MOV EAX,DWORD PTR DS:[EBX] ; Case
1 of switch 00401274
00401293 |> 0B04BB OR EAX,DWORD PTR DS:[EBX+EDI*4]
00401296 |. EB 02 JMP SHORT Project1.0040129A
00401298 |> 8B03 MOV EAX,DWORD PTR DS:[EBX] ; Case
0 of switch 00401274
0040129A |> 5F POP EDI
0040129B |. 5E POP ESI
0040129C |. 5B POP EBX
0040129D |. 5D POP EBP
0040129E \. C3 RETN
unknown
2008-05-17 02:11:27 UTC
Permalink
Post by Mike King
if (sz) return Comp(sz-1) | buf[sz];
else return buf[0];
If you don't want recursion, use while() or for().

That asm is pretty wild. Translates to:

switch( sz )
{case 0:
return buf[0];
case 1;
return buf[0] | buf[1];
default:
return Comp(this, sz-2) | buf[sz-1] | buf[sz];
};
Which is pretty much optimized for your Test<2>
For the compiler to remove the "deafult:" code,
you would need a mechanism to assure the compiler
that it can never be called with a value larger than T-1.
Ahh, perhaps?
if( sz > T-1 ) return 0;
Post by Mike King
if (sz)
{ return Comp(sz-1) | buf[sz];
Post by Mike King
}else{ return buf[0];
Another way to write that switch is:
int ret =0;
switch( sz )
{case T-1:
ret |= buf[T-1];
case T-2:
ret |= buf[T-2];
};
return ret;
Or, in general
int ret =0;
switch( sz )
{
<expand while T > 0 >
case T-1:
ret |= buf[sz--];
<end macro expansion>
default: // handle too big sz
};
return ret;
Or, for a slight bump:
int ret = buf[0];
switch( sz )
{
<expand while T > 1 >
case T-1:
ret |= buf[sz--];
<end macro expansion>
default:
};
return ret;

Now, how you do that <expand>, is up to the template gurus to explain.
Mike King
2008-05-19 14:32:12 UTC
Permalink
Thank you for your help. I wasn't sure if anyone would respond to my
message. I'm trying to give the compiler the maximum amount of information
so it can inline the function at the call site. I'm normally not too
concerned with performance but this type of code will be used in an
application that typically has a long runtime. Here's an example. In one
of my test cases, the main search function gets called approx. 500 billons
times, so any optimizations that I can find will greatly lowers the runtime.

The compiler seems to be generating code for a generic Comp function (not
specialized for sz==4). Because this type of code will be literally called
more than a trillion times, I think having an inlined version at the call
site would be faster. I tried the Microsoft compiler and it does inline the
function. I'm not pushing to use the Microsoft compiler but I'm wondering
if I can get the Borland compiler to do the same. I would like to stick
with one compiler (the Borland one).

#include <stdio.h>

template <unsigned int T>
struct Test {
unsigned int buf[T];
public:
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}
};

int main(int argc, char* argv[]) {
Test<4> test;
printf("Hello, %d", test.Comp());
return 0;
}

//==========================================================
//================== Borland Compiler ======================
//==========================================================
00401200 /. 55 PUSH EBP ; int
main(int argc, char* argv[])
00401201 |. 8BEC MOV EBP,ESP
00401203 |. 83C4 F0 ADD ESP,-10
00401206 |. 6A 04 PUSH 4 ; /Arg2
= 00000004
00401208 |. 8D45 F0 LEA EAX,DWORD PTR SS:[EBP-10] ; |
0040120B |. 50 PUSH EAX ; |Arg1
0040120C |. E8 17000000 CALL Project2.00401228 ;
\Project2.00401228
00401211 |. 83C4 08 ADD ESP,8
00401214 |. 50 PUSH EAX ; /Arg2
00401215 |. 68 ACB04000 PUSH Project2.0040B0AC ; |Arg1
= 0040B0AC ASCII "Hello, %d"
0040121A |. E8 D52F0000 CALL Project2.004041F4 ;
\Project2.004041F4
0040121F |. 83C4 08 ADD ESP,8
00401222 |. 33C0 XOR EAX,EAX
00401224 |. 8BE5 MOV ESP,EBP
00401226 |. 5D POP EBP
00401227 \. C3 RETN

00401228 /$ 55 PUSH EBP ;
Test<2>::Comp()
00401229 |. 8BEC MOV EBP,ESP
0040122B |. 53 PUSH EBX
0040122C |. 56 PUSH ESI
0040122D |. 8B75 0C MOV ESI,DWORD PTR SS:[EBP+C]
00401230 |. 8B5D 08 MOV EBX,DWORD PTR SS:[EBP+8]
00401233 |. 33C0 XOR EAX,EAX
00401235 |. 8BD6 MOV EDX,ESI
00401237 |. 4A DEC EDX ;
Switch (cases 1..4)
00401238 |. 74 12 JE SHORT Project2.0040124C
0040123A |. 4A DEC EDX
0040123B |. 74 0C JE SHORT Project2.00401249
0040123D |. 4A DEC EDX
0040123E |. 74 06 JE SHORT Project2.00401246
00401240 |. 4A DEC EDX
00401241 |. 75 0D JNZ SHORT Project2.00401250
00401243 |. 0B43 0C OR EAX,DWORD PTR DS:[EBX+C] ; Case
4 of switch 00401237
00401246 |> 0B43 08 OR EAX,DWORD PTR DS:[EBX+8] ; Case
3 of switch 00401237
00401249 |> 0B43 04 OR EAX,DWORD PTR DS:[EBX+4] ; Case
2 of switch 00401237
0040124C |> 0B03 OR EAX,DWORD PTR DS:[EBX] ; Case
1 of switch 00401237
0040124E |. EB 11 JMP SHORT Project2.00401261
00401250 |> 8BCE MOV ECX,ESI ;
Default case of switch 00401237
00401252 |. 49 DEC ECX
00401253 |. 51 PUSH ECX ; /Arg2
00401254 |. 53 PUSH EBX ; |Arg1
00401255 |. E8 CEFFFFFF CALL Project2.00401228 ;
\Project2.00401228
0040125A |. 83C4 08 ADD ESP,8
0040125D |. 0B44B3 FC OR EAX,DWORD PTR DS:[EBX+ESI*4-4]
00401261 |> 5E POP ESI
00401262 |. 5B POP EBX
00401263 |. 5D POP EBP
00401264 \. C3 RETN

//==========================================================
//================== Microsoft Compiler ====================
//==========================================================
004017E0 /$ 8B4424 FC MOV EAX,DWORD PTR SS:[ESP-4]
004017E4 |. 0B4424 F8 OR EAX,DWORD PTR SS:[ESP-8]
004017E8 |. 8B4C24 F0 MOV ECX,DWORD PTR SS:[ESP-10]
004017EC |. 0B4424 F4 OR EAX,DWORD PTR SS:[ESP-C]
004017F0 |. 83EC 10 SUB ESP,10
004017F3 |. 0BC8 OR ECX,EAX
004017F5 |. 51 PUSH ECX ; /<%d>
004017F6 |. 68 F4204000 PUSH test.004020F4 ;
|format = "Hello, %d"
004017FB |. FF15 A0204000 CALL DWORD PTR DS:[<&MSVCR80.printf>] ;
\printf
00401801 |. 33C0 XOR EAX,EAX
00401803 |. 83C4 18 ADD ESP,18
00401806 \. C3 RETN
Post by unknown
Post by Mike King
if (sz) return Comp(sz-1) | buf[sz];
else return buf[0];
If you don't want recursion, use while() or for().
switch( sz )
return buf[0];
case 1;
return buf[0] | buf[1];
return Comp(this, sz-2) | buf[sz-1] | buf[sz];
};
Which is pretty much optimized for your Test<2>
For the compiler to remove the "deafult:" code,
you would need a mechanism to assure the compiler
that it can never be called with a value larger than T-1.
Ahh, perhaps?
if( sz > T-1 ) return 0;
Post by Mike King
if (sz)
{ return Comp(sz-1) | buf[sz];
Post by Mike King
}else{ return buf[0];
int ret =0;
switch( sz )
ret |= buf[T-1];
ret |= buf[T-2];
};
return ret;
Or, in general
int ret =0;
switch( sz )
{
<expand while T > 0 >
ret |= buf[sz--];
<end macro expansion>
default: // handle too big sz
};
return ret;
int ret = buf[0];
switch( sz )
{
<expand while T > 1 >
ret |= buf[sz--];
<end macro expansion>
};
return ret;
Now, how you do that <expand>, is up to the template gurus to explain.
Chris Uzdavinis (TeamB)
2008-05-19 15:57:06 UTC
Permalink
Post by Mike King
The compiler seems to be generating code for a generic Comp function (not
specialized for sz==4). Because this type of code will be literally called
more than a trillion times, I think having an inlined version at the call
site would be faster. I tried the Microsoft compiler and it does inline the
function. I'm not pushing to use the Microsoft compiler but I'm wondering
if I can get the Borland compiler to do the same. I would like to stick
with one compiler (the Borland one).
#include <stdio.h>
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}
};
Perhaps you could hand-eliminate the recursion for the special case of
T > 4.


// primary template as you wrote it:
// Non-optimized case
template <unsigned int T>
struct Test {
unsigned int buf[T];
public:

inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}
};


// specialize the class for size of 4, then overload Comp, so when we
// have a "default" argument size of T, then we get the optimized
// version.

template <>
struct Test<4> {
unsigned int buf[T];
public:

inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}

// optimized case
inline unsigned int Comp () {
return buf[3] | buf[2] | buf[1] | buf[0];
}
};



Note, I haven't actually tested this, but it seems like a way to
handle the general case and optimize one particular case that you're
interested in optimizing.
--
Chris (TeamB);
Chris Uzdavinis (TeamB)
2008-05-19 15:59:49 UTC
Permalink
Post by Chris Uzdavinis (TeamB)
template <>
struct Test<4> {
unsigned int buf[T];
...
Post by Chris Uzdavinis (TeamB)
inline unsigned int Comp ( unsigned int sz = T ) {
oops. T isn't known, so replace the buf declaration with buf[4], and
we also should get rid of the default value of =T in the non-optimized
Comp, since it's overloaded with a nullary version.

template <>
struct Test<4> {
unsigned int buf[4];

...


// no default size
inline unsigned int Comp ( unsigned int sz) // 1 arg
{...}

inline unsigned int Comp () // no args
{...}
--
Chris (TeamB);
Mike King
2008-05-19 17:03:03 UTC
Permalink
Post by Chris Uzdavinis (TeamB)
// specialize the class for size of 4, then overload Comp, so when we
// have a "default" argument size of T, then we get the optimized
// version.
template <>
struct Test<4> {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}
// optimized case
inline unsigned int Comp () {
return buf[3] | buf[2] | buf[1] | buf[0];
}
};
I foresee sz being either 2, 3, or 4. I can make three specialized classes
if that's what I have to do but is there another way that's cleaner.
Chris Uzdavinis (TeamB)
2008-05-19 18:28:11 UTC
Permalink
Post by Mike King
Post by Chris Uzdavinis (TeamB)
// specialize the class for size of 4, then overload Comp, so when we
// have a "default" argument size of T, then we get the optimized
// version.
template <>
struct Test<4> {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
}
return ret;
}
// optimized case
inline unsigned int Comp () {
return buf[3] | buf[2] | buf[1] | buf[0];
}
};
I foresee sz being either 2, 3, or 4. I can make three specialized classes
if that's what I have to do but is there another way that's cleaner.
Why does the class have a size template parameter if the function can
have different values in its paramter? It's a harder problem because
you must handle so many cases.

As Bob Gondor noted, whenever sz > T, it is an error condition.

Recognizing that:

template <>
struct Test<4> {
unsigned int buf[4];
public:

inline unsigned int Comp ( unsigned int sz = 4 ) {

assert(sz <= 4); // <<<<<<<<< New test

switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
}
return ret;
}
};


If you're worried about the small stuff--and you don't trust the
optimizer--you could work around some of the intermediate assignments
by doing this:

inline unsigned int Comp ( unsigned int sz = 4 ) {
assert(sz <= 4);
switch (sz) {
case 4: return buf[0] | buf[1] | buf[2] | buf[3];
case 3: return buf[0] | buf[1] | buf[2];
case 2: return buf[0] | buf[1];
case 1: return buf[0];
}
return 0;
}
--
Chris (TeamB);
Mike King
2008-05-19 19:34:18 UTC
Permalink
Post by Chris Uzdavinis (TeamB)
Why does the class have a size template parameter if the function can
have different values in its paramter?
I should have removed it because I'm not calling Comp recursively anymore.
Post by Chris Uzdavinis (TeamB)
If you're worried about the small stuff--and you don't trust the
optimizer
I would like to trust the optimizer, I normally do, but it appears to not
optimize as well as the Microsoft compiler. It really likes to generate
"jumpy" and non-inlined code. I'm just looking for help. I appreciate both
yours and Bob Gonder help.
Chris Uzdavinis (TeamB)
2008-05-19 19:40:53 UTC
Permalink
Post by Mike King
I would like to trust the optimizer, I normally do, but it appears to not
optimize as well as the Microsoft compiler.
That does not really surprise me. MS has a pretty aggressive
optimizer.
Post by Mike King
It really likes to generate "jumpy" and non-inlined code. I'm just
looking for help. I appreciate both yours and Bob Gonder help.
No problem. When you decide upon a final solution, would you mind
summarizing what it turned out to be?
--
Chris (TeamB);
Mike King
2008-05-19 19:38:03 UTC
Permalink
Here's the implemention of some of the suggestions.

#include <stdio.h>

template <unsigned int T>
struct Test {
unsigned int* _buf;
unsigned int _sz;
public:
Test (unsigned int sz) { _sz = sz; _buf = new unsigned int[sz]; }
inline unsigned int Comp ( ) {
unsigned int sz = _sz;
unsigned int ret=0;
do { ret |= _buf[sz]; } while(sz--);
return ret;
}
};

template <>
struct Test<2> {
unsigned int buf[2];
public:
inline unsigned int Comp ( ) {
return buf[1] | buf[0];
}
};

template <>
struct Test<3> {
unsigned int buf[3];
public:
inline unsigned int Comp ( ) {
return buf[2] | buf[1] | buf[0];
}
};

template <>
struct Test<4> {
unsigned int buf[4];
public:
inline unsigned int Comp ( ) {
return buf[3] | buf[2] | buf[1] | buf[0];
}
};

int main(int argc, char* argv[]) {
unsigned int value_determined_at_runtime = 2;
unsigned int result;
switch (value_determined_at_runtime) {
case 2:
Test<2> test2;
result = test2.Comp();
break;
case 3:
Test<3> test3;
result = test3.Comp();
break;
case 4:
Test<4> test4;
result = test4.Comp();
break;
default:
Test<5> testn(value_determined_at_runtime); // I chose five because
I had to choose something other than 2,3, or 4
result = testn.Comp();
break;
}
printf("Hello, %d", result);
return 0;
}
unknown
2008-05-19 16:25:30 UTC
Permalink
Post by Mike King
The compiler seems to be generating code for a generic Comp function (not
specialized for sz==4).
Yes, well, it is written as a generic with an optimizer for 4.
Post by Mike King
Because this type of code will be literally called
more than a trillion times, I think having an inlined version at the call
site would be faster. I tried the Microsoft compiler and it does inline the
function. I'm not pushing to use the Microsoft compiler but I'm wondering
if I can get the Borland compiler to do the same. I would like to stick
with one compiler (the Borland one).
I haven't been able to get TCE2006 (free version) to inline functions.
Recompiling with the old BCB 5.5 inlines many functions that TCE won't.
Post by Mike King
#include <stdio.h>
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
This is incorrect for Test<4>.
default: is an error condition.
At worst you should return Comp(T)
Post by Mike King
switch (sz) {
default: // fall through
Post by Mike King
case 4: ret |= buf[3];
Why not
unsigned int ret = buf[0];
if( sz > T ) sz = T;
while( --sz )ret |= buf[sz];
return ret;
Mike King
2008-05-19 17:11:01 UTC
Permalink
Post by unknown
Post by Mike King
The compiler seems to be generating code for a generic Comp function (not
specialized for sz==4).
Yes, well, it is written as a generic with an optimizer for 4.
Then, how should I write it so it's not generic.
dhoke
2008-05-19 20:49:40 UTC
Permalink
Post by unknown
Post by Mike King
#include <stdio.h>
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
This is incorrect for Test<4>.
default: is an error condition.
How is it incorrect for Test<4>??? I can see it being incorrect for a call
of Comp(0), and I can see that Test< (with n > 4)> probably would keep it
from inlining most anywhere... but incorrect for Test<4> how, in what sense?
Post by unknown
At worst you should return Comp(T)
Post by Mike King
switch (sz) {
default: // fall through
Post by Mike King
case 4: ret |= buf[3];
Why not
unsigned int ret = buf[0];
if( sz > T ) sz = T;
while( --sz )ret |= buf[sz];
return ret;
This does not seem to me like it would generate very efficient code unless
Borland is unrolling loops... But give the possible variance of "sz", I
think you'd have a lot more jumps than Mike's original switch() falling thru
the various cases... Although in his switch, I would've just returned from
the case 0, rather than break'ing and then return'ing...

Also, in the past some code I'm aware of was found to be more efficiently
generated if it was coded as *(buf+n) rather than using buf[n]...
Mike King
2008-05-19 21:18:35 UTC
Permalink
Post by dhoke
This does not seem to me like it would generate very efficient code unless
Borland is unrolling loops... But give the possible variance of "sz", I
think you'd have a lot more jumps than Mike's original switch() falling
thru the various cases... Although in his switch, I would've just
returned from the case 0, rather than break'ing and then return'ing...
how about this?

class Test {
unsigned int buf[4];
public:
// I know if sz > 4 then a memory access exception will occurr (this isn't
production code)
inline unsigned int Comp ( unsigned int sz ) {
unsigned int ret=0;
switch (sz) {
default: do { ret |= buf[--sz]; } while(sz);
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
ret |= buf[0]; // sz will never be <=1
}
return ret;
}
};
Post by dhoke
Also, in the past some code I'm aware of was found to be more efficiently
generated if it was coded as *(buf+n) rather than using buf[n]...
How did you find out that it was more efficient?
Chris Uzdavinis (TeamB)
2008-05-19 21:39:27 UTC
Permalink
Post by Mike King
class Test {
unsigned int buf[4];
// I know if sz > 4 then a memory access exception will occurr (this isn't
production code)
inline unsigned int Comp ( unsigned int sz ) {
unsigned int ret=0;
switch (sz) {
default: do { ret |= buf[--sz]; } while(sz);
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
ret |= buf[0]; // sz will never be <=1
}
Still, why have the default case if it's certain to be an error? When
it finishes, then cases 4,3,2, and the invisible case 1 will all be
re-applied anyway.
Post by Mike King
Post by dhoke
Also, in the past some code I'm aware of was found to be more efficiently
generated if it was coded as *(buf+n) rather than using buf[n]...
How did you find out that it was more efficient?
I remember hearing long ago that incrementing a pointer was faster
than indexing into an array, but optimizers should have improved
beyond that stage years ago. In fact, it was probably like 10 years
ago when I heard from a friend who looked into generated code that the
pointer/indexing code was 100% identical. (But he was using MS's
compiler, and I've never been motivated to see how other compilers do
it.)

But as optimizers go, I'm pretty impressed with the one in g++ 4.x. I
discovered that code like this:

std::string s;
//...
s += "abcdefg";

has a pretty awesome optimization. I expected the operator += to do a
strlen on the string literal, then append n bytes. I was astounded to
see that the generated code was equivalent to this:

s.append("abcdefg", 7);

Apparently,

1) operator+=(char const * str) calls append(str)

2) append(char const * str) figures the length, then forwards it on:
size_t len = strlen(str);
append(str, len);

3) append(str,len) implements the real work.


The inliner was smart enough to trace through the code and know that
operator+= called append() with a string literal, and then the call to
strlen on the string literal was recognized as something that could be
done at compile time, and that value was inlined, and then the call to
append(str, 7) was inlined as well.

I had no idea that compilers would trace string literals through char*
parameter arguments. (In g++ 3.x series, it didn't.)

Anyway, that was my most recent moment of awe. :)
--
Chris (TeamB);
dhoke
2008-05-20 13:10:33 UTC
Permalink
Post by Mike King
Post by dhoke
This does not seem to me like it would generate very efficient code
unless Borland is unrolling loops... But give the possible variance of
"sz", I think you'd have a lot more jumps than Mike's original switch()
falling thru the various cases... Although in his switch, I would've
just returned from the case 0, rather than break'ing and then
return'ing...
how about this?
class Test {
unsigned int buf[4];
// I know if sz > 4 then a memory access exception will occurr (this isn't
production code)
inline unsigned int Comp ( unsigned int sz ) {
unsigned int ret=0;
You hope the compiler does it, but doesn't hurt to tell it just in case...
register unsigned int ret=0 ;
Post by Mike King
switch (sz) {
default: do { ret |= buf[--sz]; } while(sz);
That still has the loop (and thus jump, unless unrolled), and doesn't matter
if you're only going to have 2-4 anyway...
But you would need a break on this (default) case unless you otherwise
adjust size and still fall thru the other four cases...
Post by Mike King
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
ret |= buf[0]; // sz will never be <=1
and might change this to simply:
case 2: return ret | buf[1] | buf[2] ;
Post by Mike King
}
return ret;
}
};
Post by dhoke
Also, in the past some code I'm aware of was found to be more efficiently
generated if it was coded as *(buf+n) rather than using buf[n]...
How did you find out that it was more efficient?
Well I'm taking someone's word for it, but I believe they actually tried it
both ways in a particular situation, looking at the generated code, and
timing it. That would have been done with BCB6 or earlier, but when I
mentioned this more recently in one of these groups, someone else seemed to
concur.
dhoke
2008-05-20 13:39:29 UTC
Permalink
Post by dhoke
case 2: return ret | buf[1] | buf[2] ;
duh - edit as required:
case 2: return ret | buf[1] | buf[0] ;
unknown
2008-05-20 03:47:42 UTC
Permalink
Post by dhoke
Post by unknown
Post by Mike King
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
This is incorrect for Test<4>.
default: is an error condition.
How is it incorrect for Test<4>??? I can see it being incorrect for a call
of Comp(0), and I can see that Test< (with n > 4)> probably would keep it
from inlining most anywhere... but incorrect for Test<4> how, in what sense?
Incorrect in that you should handle errors instead of throwing
(or worse, not throwing) memory access errors.
For Test<4>, buf is buf[4], so when you get to the default:
case ( <=0 or >=5 ), you are accessing beyond the bounds of buf.
dhoke
2008-05-20 13:19:37 UTC
Permalink
Post by unknown
Post by dhoke
Post by unknown
Post by Mike King
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
This is incorrect for Test<4>.
default: is an error condition.
How is it incorrect for Test<4>??? I can see it being incorrect for a call
of Comp(0), and I can see that Test< (with n > 4)> probably would keep it
from inlining most anywhere... but incorrect for Test<4> how, in what sense?
Incorrect in that you should handle errors instead of throwing
(or worse, not throwing) memory access errors.
case ( <=0 or >=5 ), you are accessing beyond the bounds of buf.
But you're only going to get to default for Comp(sz) when sz<=0, which I did
and do acknowledge would be an error...

But I don't see it for Test<4>, Comp(4)...

With sz == 4, 3, 2, or 1, you'll take one of those branches, break and
return...

With initial sz > 4, you will hit default until sz == 4, on which recursion
will then hit the other cases, which will not ever reach the default as they
will take case 4: and fall thru, breaking... ??? I think...
unknown
2008-05-20 15:43:43 UTC
Permalink
Post by dhoke
Post by unknown
Post by Mike King
template <unsigned int T>
struct Test {
unsigned int buf[T];
inline unsigned int Comp ( unsigned int sz = T ) {
unsigned int ret=0;
switch (sz) {
case 4: ret |= buf[3];
case 3: ret |= buf[2];
case 2: ret |= buf[1];
case 1: ret |= buf[0]; break;
default: return buf[sz-1] | Comp(sz-1);
This is incorrect for Test<4>.
default: is an error condition.
With initial sz > 4, you will hit default until sz == 4, on which recursion
will then hit the other cases, which will not ever reach the default as they
will take case 4: and fall thru, breaking... ??? I think...
Take another look at sz=5:
default: return buf[4] | Comp(4);
buf[4] does not exist for a Test<4>.

default: should either
1) return 0; //ignore error return minimum
2) return Comp(T); // ignore error, return max
3) return -1; // error code -1 not likely result
4) throw. // error state
dhoke
2008-05-20 16:06:54 UTC
Permalink
Post by unknown
default: return buf[4] | Comp(4);
buf[4] does not exist for a Test<4>.
Won't argue with that, I just wasn't thinking of calling Comp(5) for an item
defined as Test<4>

Loading...