The industrial-strength framework
Dear MacApper,
I've just returned from this year's C++ World Conference and have
to say that it was everything I had hoped and more. My desire had
been to get totally immersed in C++, from the basics to the minutiae,
with the hope that I would be able to validate recent decisions
relating to the use of C++ in MacApp and other products, and have
guidance in making future decisions.
If you attended, please share your comments with the rest of
us.
These notes are in SOC order (stream-of-consciousness). I'm not
writing a book, after all!
Oh Well, Dylan
I've been secretly dreaming that some day the Dylan language would
come into the mainstream and I would be able to employ it in some
serious work. After this conference I've decided to retire my copy of
Dylan and put those foolish dreams aside. C++, while it is no Dylan,
has come a very long way and with the just-announced standard is
truly, IMHO, the best language for general development purposes.
Use of 'string' (or 'wstring')
One thing I have been concerned about for quite a while is the
presence of custom string classes in MacApp, namely CPascalStr_AC and
CChar_AC. Well, yes, we do need pascal type strings for interfacing
with the Toolbox but there are few other requirements for pascal
strings.
Use of 'string' in MacApp would have several benefits. First, it
reduces the amount of maintenance by removing a significant chunk of
code. Second, the chance of compatibility with third party class
libraries will increase dramatically. And third, use of string
provides better Unicode support.
The approach I'm currently considering is to retain CStr255_AC
(but probably with a different name) but reduce it to nothing more
than a class that can be constructed from a 'string' (or 'wstring')
and that provides conversion operators to a 'string' (or 'wstring').
All occurrences of MacApp's custom string classes will be replaced
and the pascal string class only used in Toolbox calls.
This will not be a simple change because of various streaming
issues and the introduction of 'locales' -- and the view structure
may have to be modified -- so a change is not necessarily
imminent. A change to 'string' needs far more thought.
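As a rough sketch of the approach described above, the remaining Pascal string class could be little more than a converter. The name PascalStr255 and its members are hypothetical, not MacApp's, and truncating at 255 characters is an assumption about how overlong strings would be handled.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical wrapper: holds a length-prefixed (Pascal) string for Toolbox calls.
class PascalStr255
{
public:
    // Construct from a standard string; anything past 255 characters is dropped.
    PascalStr255(const std::string& s)
    {
        std::size_t n = s.size() < 255 ? s.size() : 255;
        fData[0] = static_cast<unsigned char>(n);   // length byte
        std::memcpy(fData + 1, s.data(), n);        // character data
    }

    // Convert back to a standard string.
    operator std::string() const
    {
        return std::string(reinterpret_cast<const char*>(fData + 1), fData[0]);
    }

    // Pointer to pass to a routine that expects a length-prefixed string.
    const unsigned char* ToolboxPtr() const { return fData; }

private:
    unsigned char fData[256];   // fData[0] is the length, fData[1..] the text
};
```

Everything outside the Toolbox boundary would then deal only in 'string', with a PascalStr255 created as a temporary at the call site.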
Use the Standard Library Containers
Using the various collection classes in the Standard Library
appears to have many benefits and is attractive. But there seem to be
few places in MacApp's design where these containers could actually
be used effectively.
Care has to be taken to prevent memory allocation problems with
the use of <vector> and <deque>. It was suggested that
the 'reserve' function can be used to control memory allocation.
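A minimal sketch of the 'reserve' suggestion follows; MakeSequence is a made-up name. Note that 'reserve' is a member of vector only (deque has no such function), so this applies to the vector case.

```cpp
#include <cstddef>
#include <vector>

// Fill a vector with n values, asking for the storage once up front.
std::vector<int> MakeSequence(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);                           // capacity grows; size() is still 0
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));   // no reallocation while size() <= n
    return v;
}
```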
The <list> class might be valuable for use in places where
we currently employ a linked list: behavior lists, subview lists,
adorner lists, etc. There is a cost to this: there would undoubtedly
be more code generated; but the benefit is that this is code we don't
have to maintain. (It's interesting to note that behaviors in MacApp
are already managed using auto_ptrs.)
It's possible that there are places where <set> and
<map> could be used effectively. For example, MacApp's param
text implementation could benefit from a <map> reducing the
complexity of the current implementation. One potential problem with
<map> is reclamation of memory.
In behavior management, for example, if we were to use an STL
container the following code:
void TEventHandler::RemoveBehavior(TBehavior* theBehavior)
{
    if (fBehavior)
    {
        if (fBehavior == theBehavior)
        {
            fBehavior.release();
            fBehavior = theBehavior->GetNextBehavior();
            if (fBehavior)
                fBehavior->SetPreviousBehavior(NULL);
            // remove theBehavior from the linked list
            theBehavior->SetNextBehavior(NULL);
            theBehavior->SetPreviousBehavior(NULL);
            theBehavior->SetOwner(NULL);
        }
        else
            fBehavior->RemoveBehavior(theBehavior);
    }
}
would become something like:
void TEventHandler::RemoveBehavior(TBehavior* theBehavior)
{
    // Assuming fBehaviorList is a list<TBehavior*>.
    list<TBehavior*>::iterator iter = fBehaviorList.begin();
    while (iter != fBehaviorList.end()
           && *iter != theBehavior)
        ++iter;
    if (iter != fBehaviorList.end())
        fBehaviorList.erase(iter);
}
We should probably templatize this into something like a
'SubcomponentList' and use it for subview, adorner and behavior
collections. If so, then we can make the above even cleaner by adding
a 'find' to the <list> resulting in:
void TEventHandler::RemoveBehavior(TBehavior* theBehavior)
{
    // 'iterator' here is the SubcomponentList's own iterator type.
    iterator iter = fBehaviorList.find(theBehavior);
    if (iter != fBehaviorList.end())
        fBehaviorList.erase(iter);
}
Further, the SubcomponentList template could have a 'deleteAll'
function called from its destructor. This would eliminate the need to
delete the sub-components manually. Synchronization issues would have
to be considered, of course.
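To make the idea concrete, here is one possible shape for such a template. It is only a sketch: the member names are invented for illustration, 'find' is built on the std::find algorithm, and it ignores the synchronization issues mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>

// Hypothetical owning list of sub-components (behaviors, adorners, subviews).
template<class T>
class SubcomponentList
{
public:
    typedef typename std::list<T*>::iterator iterator;

    SubcomponentList() {}
    ~SubcomponentList() { deleteAll(); }

    void add(T* item) { fItems.push_back(item); }

    iterator begin() { return fItems.begin(); }
    iterator end() { return fItems.end(); }

    // Linear search for a given element.
    iterator find(T* item) { return std::find(fItems.begin(), fItems.end(), item); }

    // Unlink the item without deleting it; returns false if it was not present.
    bool remove(T* item)
    {
        iterator iter = find(item);
        if (iter == fItems.end())
            return false;
        fItems.erase(iter);
        return true;
    }

    // Delete every remaining element, then empty the list.
    void deleteAll()
    {
        for (iterator iter = fItems.begin(); iter != fItems.end(); ++iter)
            delete *iter;
        fItems.clear();
    }

    std::size_t size() const { return fItems.size(); }

private:
    std::list<T*> fItems;
    SubcomponentList(const SubcomponentList&);             // not copyable
    SubcomponentList& operator=(const SubcomponentList&);  // not assignable
};
```

With this in place, RemoveBehavior reduces to a single call to 'remove', and anything still in the list when its owner is destroyed is deleted automatically.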
Some other notes about Standard Library collections:
- It can be difficult to enforce constness on collections.
- 'const_iterator' is not the same as 'const iterator'!
- Don't test <map> membership using the index operator
[]. Use 'find' instead because [] will create an
entry in the map if it doesn't already exist!
- Be sure you know what you're doing when you 'pre-allocate'
space when creating a vector such as in 'vector<X*>
xtr(100);'! (That creates 100 null-pointer elements; it does not
merely reserve space.)
- Beware of the default 'less<x>' comparison function used
for <map> -- you'll probably want to write your own.
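The map membership note above is easy to demonstrate. In this sketch CountMap and both function names are made up for illustration; the first function also shows a 'const_iterator' in use.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> CountMap;

// Membership test that leaves the map untouched.
bool Contains(const CountMap& m, const std::string& key)
{
    // const_iterator: the iterator can move, the element it refers to is read-only.
    CountMap::const_iterator iter = m.find(key);
    return iter != m.end();
}

// Membership test written with []. It compiles only for a non-const map
// and inserts a zero-valued entry whenever the key is missing.
bool ContainsTheWrongWay(CountMap& m, const std::string& key)
{
    return m[key] != 0;
}
```

Calling ContainsTheWrongWay with an absent key returns false, as hoped, but the map is one entry larger afterwards.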
Can't get the following to compile?
vector<auto_ptr<X>> vx;
The compiler thinks the '>>' is a shift operator so put a
space between them:
vector<auto_ptr<X> > vx;
Here's a handy template for deleting the contents of a container
of pointers (since calling 'clear' doesn't delete the objects the
elements point to):
template<class InIterator>
void DeleteAll(InIterator first, InIterator last)
{
    while (first != last)
    {
        delete *first;
        ++first;
    }
}
This can then be used in the following:
typedef vector<X*> XPVec;
XPVec vx;
... // Put a bunch of stuff into vx.
DeleteAll(vx.begin(), vx.end());
vx.clear();
All STL containers have copy semantics defined. So if you
write:
typedef map<string, X> Xmap;
Xmap theOneAndOnlyXmap;
...
and then write:
Xmap mx = theOneAndOnlyXmap; // Create alias.
You don't get an alias -- you get a second copy!
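If an alias is what you want, declare a reference. The sketch below uses map<string, int> rather than the 'X' above so that it stands alone; the names are illustrative.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> IntMap;

void Demo(IntMap& theOneAndOnlyMap)
{
    IntMap copy = theOneAndOnlyMap;     // a second, independent map
    copy["copy"] = 1;                   // theOneAndOnlyMap never sees this entry

    IntMap& alias = theOneAndOnlyMap;   // a reference: another name for the same map
    alias["alias"] = 2;                 // theOneAndOnlyMap now has this entry
}
```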
Why does the following crash?
vector<double> x; // x has no elements.
copy(v.begin(), v.end(), x.begin());
This is probably one of the most common mistakes programmers make
when they start using containers from the standard library. One can
only copy elements into other elements that already exist. So one of
the following should be used:
1. vector<double> x(v.size());
copy(v.begin(), v.end(), x.begin());
2. vector<double> x;
...
x.resize(v.size());
copy(v.begin(), v.end(), x.begin());
3. vector<double> x;
...
copy(v.begin(), v.end(), back_inserter(x));
Standard Library Algorithms
The Standard Library is chock full of useful utilities but I have
not been able to take a thorough look at all of them to see if there
are any which could be used effectively in MacApp beyond the
iterator-related algorithms; but I'm sure there are.
One thing that impressed me about the algorithms in the Standard
Library was their similarity to the generic algorithms in Dylan. For
example, there are two versions of 'sort': one version may change the
order of the collection (for items that have matching keys) and the
other guarantees that the order is not affected.
BTW, all of the container-related algorithms also work for
built-in arrays, i.e. plain pointers serve as the iterators, so you
can use them with 'double dArray[100];'.
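The two sorts are 'sort' and 'stable_sort'. A small sketch, using built-in arrays throughout (the Entry type and the function names are made up for illustration):

```cpp
#include <algorithm>

struct Entry
{
    int key;
    char tag;
};

bool KeyLess(const Entry& a, const Entry& b) { return a.key < b.key; }

// Sort a built-in array by key; entries with equal keys keep their original order.
void SortByKeyStable(Entry* first, Entry* last)
{
    std::stable_sort(first, last, KeyLess);
}

// Sort a built-in array of doubles in place; plain pointers act as the iterators.
void SortDoubles(double* array, int count)
{
    std::sort(array, array + count);
}
```

Had SortByKeyStable used 'sort' instead, the relative order of entries with matching keys would be unspecified.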
Standard Template Library
STL now includes auto_ptr and reference counted pointer templates
that would be worthy replacements for our custom CAutoPtr_AC in most
places. It would be worth some consideration to see if MacApp's
auto-pointer classes should be replaced with STL's. The same is true
for reference counted pointers.
There is going to be a need for thread-safe reference counted
pointers. The STL does not currently address this need.
iostream
Many years ago I got hooked on Borland's 'ostrstream'. The
Standard Library now has a decent implementation of this (called
'ostringstream'). They work well with the 'string' class and we
should use them where possible.
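A typical use is building a message from mixed types without sizing a buffer by hand. DescribeView is a made-up example, not a MacApp function.

```cpp
#include <sstream>
#include <string>

// Build a message by streaming mixed types into a string.
std::string DescribeView(const std::string& name, int width, int height)
{
    std::ostringstream out;
    out << name << ": " << width << " x " << height;
    return out.str();   // the accumulated text as a 'string'
}
```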
Dimensional Analysis
Scott Meyers, in his excellent talk 'Three Cool Things in C++',
discussed Barton and Nackman's unique use of templates to provide
compile-time type checking.
Meyers gave an example of how compilers have no understanding of
units of measurement. If you have the following in your
code:
double t; // Time - in seconds.
double a; // Acceleration - in meters/second^2
double d; // Distance - in meters
and then you do the following:
cout << d/(t*t) - a; // Okay, subtracts meters/sec^2
cout << d/t - a; // Huh? What does it mean to
// subtract acceleration from d/t?
The compiler has no concept of 'units' and so the second line
above is perfectly fine even though it makes no sense at all.
To correct this problem some programmers try using 'typedefs':
typedef double Acceleration;
typedef double Time;
typedef double Distance;
Time t;
Acceleration a;
Distance d;
cout << d/t - a; // Compiler still doesn't complain.
Can we introduce the concept of 'units' at no cost in terms of
performance or memory? Yes, with templates.
What we need are 'types' corresponding to the various units such
as time, distance, and mass. But there are an unlimited number of
unit types possible since 't^2' is a totally different type than 't'.
This led to the conclusion that counting the exponents was all that
was needed to differentiate one type of unit from another.
Here's how Barton & Nackman solved the problem (as simplified
by Meyers):
template<int m, // Exponent for mass
         int d, // Exponent for distance
         int t> // Exponent for time
class Units
{
public:
    Units(): val(1.0) {}
    double value() const { return val; }
    double& value() { return val; }
    ...
private:
    double val;
};
Now we can say:
Units<1, 0, 0> m; // m is type 'mass'.
Units<0, 1, 0> d; // d is type 'distance'.
Units<0, 0, 1> t; // t is type 'time'.
m = t; // ERROR! Type mis-match!
While adding mass to distance is a type mismatch, multiplying them
isn't. But what you end up with is a new type:
mass * distance -> Units<1, 1, 0>
So an operator* template is provided which simply adds the
exponents. Note, though, that these exponents are not stored
anywhere! They exist only in the new type, which is created at
compile time.
There was a lot more of great interest relating to this specific
example but, to summarize, using templates in this manner required
absolutely no overhead and performance was the same as the original
'double' implementation.
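The operators might look something like the following. This is a reconstruction from the description above, not the code shown in the talk; the constructor taking a value and the operator/ and operator+ are additions made so the example stands alone.

```cpp
template<int m, int d, int t>
class Units
{
public:
    explicit Units(double v = 1.0) : val(v) {}
    double value() const { return val; }
private:
    double val;   // the only data stored; the exponents live in the type
};

// Addition is defined only when both operands have identical exponents.
template<int m, int d, int t>
Units<m, d, t> operator+(const Units<m, d, t>& a, const Units<m, d, t>& b)
{
    return Units<m, d, t>(a.value() + b.value());
}

// Multiplication adds the exponents, producing a new type.
template<int m1, int d1, int t1, int m2, int d2, int t2>
Units<m1 + m2, d1 + d2, t1 + t2>
operator*(const Units<m1, d1, t1>& a, const Units<m2, d2, t2>& b)
{
    return Units<m1 + m2, d1 + d2, t1 + t2>(a.value() * b.value());
}

// Division subtracts the exponents.
template<int m1, int d1, int t1, int m2, int d2, int t2>
Units<m1 - m2, d1 - d2, t1 - t2>
operator/(const Units<m1, d1, t1>& a, const Units<m2, d2, t2>& b)
{
    return Units<m1 - m2, d1 - d2, t1 - t2>(a.value() / b.value());
}
```

With a distance 'dist', a time 'secs' and an acceleration 'accel', the expression 'dist / (secs * secs) + accel' compiles because both sides are Units<0, 1, -2>, while 'dist / secs + accel' is rejected because no operator+ matches the two different types.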
For more information on this really cool use of templates please
refer to: John J. Barton and Lee R. Nackman, "Dimensional Analysis."
C++ Report, January 1995.
Trap Doors
Think that private data member is really private? Think again if
you have declared a non-inline 'friend' function! These provide trap
door capability where a programmer can substitute their own inline
version masquerading as the friend function. This works in many environments
because the illegal duplicate inline functions don't get caught by
the linker.
Friend or member templates and explicit template specializations
also provide a trap door mechanism that another programmer can
exploit to get to private data members and functions.
Defensive Programming
There were a few sessions on 'defensive programming' of which I
caught a couple. I was pleased to find that MacApp employs almost
all of the defensive techniques mentioned. There were four major
categories discussed:
- 1. Coding approaches to avoid.
- 2. Habits to get into.
- 3. Platform and target specification.
- 4. Preventive and diagnostic instrumentation.
The last category, preventive and diagnostic instrumentation, is
worthy of being further broken down:
- 4a. Compile time assertions and expectations
- 4b. Runtime assertions and expectations
- 4c. Memory management and inventorying
- 4d. Runtime analysis and tuning
Here are some summary notes about these categories.
1. Coding approaches to avoid.
- Complex language features.
- Automatically generated interfaces. (A default ctor or copy
ctor could cause problems if you didn't expect one to be there.
See 4a.)
- Spreading conditionalizations based on platform or target
throughout the code. (It should be managed in a central place. See
3.)
- Conversion operators. (Use explicit conversion function unless
"it just makes sense." Besides, they're pretty hard to search for
in your favorite source code editor.)
- Don't make 'workarounds' conditional - just put them in and
explain why it's done that way.
- NEVER ignore compiler warnings!
2. Habits to get into.
- Follow the standards, use the standard language features,
don't roll your own.
- Use encapsulation.
- Always initialize. Variables of built-in type are _not_
initialized to zero. Pointers are _not_ initialized to NULL. bools
are _not_ initialized to false. Only class instances are
initialized but even then only so far as their constructors do any
initialization. (Assignment is NOT initialization!)
- Use '#if' instead of '#ifdef' and '#ifndef'. (A forgotten
condition will be diagnosed this way.)
3. Platform and target specification.
- Use classes and specialization wherever possible.
- Centralize platform specializations into their own
headers.
- Centralize target specializations into their own
headers.
4. Preventive and diagnostic instrumentation.
- "Writing more code now!" ensures "Writing less code
later!"
4a. Compile time assertions and expectations
- Use '#error' or '#pragma message' to diagnose mismatched
external switches.
- Use messages to identify incomplete code, code needing a
revisit, etc.
- Prevent access to illegal automatic functions such as ctors.
(They can be made private for debug and eliminated for non-debug.
This way the compiler will catch an unexpected reference during
debug builds but there will be no overhead for non-debug.)
4b. Runtime assertions and expectations
- Validate pre- and post-conditions and invariants.
- Test logical constraints.
- Test for expectations. (Non-fatal assertion.)
- All of this should compile out for non-debug builds.
- Logic should not change between debug and non-debug.
- Check for what you expect to be unreachable code or
unimplemented code.
- Add trace points.
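A bare-bones version of such instrumentation might look like this. DEBUG_BUILD, ASSERT_THAT, EXPECT_THAT and Clamp are all hypothetical names chosen for the sketch; MacApp's own macros are not shown here.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical build switch: 1 for debug builds, 0 for non-debug builds.
#ifndef DEBUG_BUILD
#define DEBUG_BUILD 1
#endif

int gFailedExpectations = 0;   // count of non-fatal failures seen so far

#if DEBUG_BUILD
// Fatal check: report the failed condition and stop.
#define ASSERT_THAT(cond)                                            \
    do {                                                             \
        if (!(cond)) {                                               \
            std::fprintf(stderr, "Assertion failed: %s (%s:%d)\n",   \
                         #cond, __FILE__, __LINE__);                 \
            std::abort();                                            \
        }                                                            \
    } while (0)
// Non-fatal check: report, count, and carry on.
#define EXPECT_THAT(cond)                                            \
    do {                                                             \
        if (!(cond)) {                                               \
            ++gFailedExpectations;                                   \
            std::fprintf(stderr, "Expectation failed: %s (%s:%d)\n", \
                         #cond, __FILE__, __LINE__);                 \
        }                                                            \
    } while (0)
#else
// Non-debug builds: the checks compile away entirely.
#define ASSERT_THAT(cond) ((void)0)
#define EXPECT_THAT(cond) ((void)0)
#endif

int Clamp(int value, int low, int high)
{
    ASSERT_THAT(low <= high);                       // precondition
    EXPECT_THAT(value >= low && value <= high);     // expectation, not a requirement
    int result = value < low ? low : (value > high ? high : value);
    ASSERT_THAT(result >= low && result <= high);   // postcondition
    return result;
}
```

The conditions have no side effects, so the function's logic is the same whether or not the checks are compiled in.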
4c. Memory management and inventorying
This essentially matched TidyHeap, with the exception that the
speaker advised more tools for analyzing the usage statistics. For
example, he has found it very useful to analyze memory allocations
and deallocations against time or code location with a 3D graphing
tool.
One thing which his test harness does is to detect mismatches
between vector/scalar allocation and the subsequent deallocation.
4d. Runtime analysis and tuning
And this is essentially a profiling tool. His solution required
source code modification. I prefer a profiling tool integrated with
the environment so that source code does not have to be modified.
Efficiency/Performance
Which of the following two sets of code is more efficient?
#1: Point3D p1, p2;
...
Point3D p3 = p1 + p2;
#2: Point3D p1, p2, p3;
...
p3 = p1 + p2;
#1 is more efficient because p3 is simply initialized with the
result. #2 requires both default initialization and a subsequent
assignment. The moral is that it's better to use initialization
rather than assignment where possible.
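One way to see the difference is to instrument the class. In this sketch the Point3D members and the assignment counter are invented for illustration only.

```cpp
struct Point3D
{
    double x, y, z;
    static int sAssignments;    // instrumentation for this sketch only

    Point3D() : x(0), y(0), z(0) {}
    Point3D(double ax, double ay, double az) : x(ax), y(ay), z(az) {}
    Point3D(const Point3D& o) : x(o.x), y(o.y), z(o.z) {}
    Point3D& operator=(const Point3D& o)
    {
        ++sAssignments;
        x = o.x; y = o.y; z = o.z;
        return *this;
    }
};
int Point3D::sAssignments = 0;

Point3D operator+(const Point3D& a, const Point3D& b)
{
    return Point3D(a.x + b.x, a.y + b.y, a.z + b.z);
}

// #1: p3 is initialized directly from the sum.
Point3D SumByInitialization(const Point3D& p1, const Point3D& p2)
{
    Point3D p3 = p1 + p2;
    return p3;
}

// #2: p3 is default-constructed, then overwritten by assignment.
Point3D SumByAssignment(const Point3D& p1, const Point3D& p2)
{
    Point3D p3;
    p3 = p1 + p2;
    return p3;
}
```

SumByInitialization never calls operator=, while SumByAssignment pays for a default construction and then one assignment.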
Is the following efficient?
...
Matrix m1, m2;
if (!arrayMat)
return 0;
if (cacheItem(arrayMat))
return 1;
// Okay, do the real work here.
...
No. m1 and m2 get constructed before the optimization checks are
performed. Then they get destructed in three possible places where
there are returns. But they are never used before two of those
returns. Suppose this is a particularly effective caching technique
and 90% of the time one of these two returns is taken -- then you're
getting potentially lengthy construction and destruction operations
that are wasted that 90% of the time. The moral is that it is better
to postpone the declaration of your variables until they are
needed.
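Rearranged, the fragment above might read as follows. Matrix, CacheItem and Process are stand-ins invented for this sketch, and the construction counter exists only to show the effect.

```cpp
struct Matrix
{
    static int sConstructed;    // instrumentation for this sketch only
    Matrix() { ++sConstructed; }
};
int Matrix::sConstructed = 0;

// Hypothetical stand-in for the cache lookup.
bool CacheItem(const double* arrayMat) { return arrayMat[0] == 0.0; }

int Process(const double* arrayMat)
{
    if (!arrayMat)
        return 0;
    if (CacheItem(arrayMat))
        return 1;

    // Only now are the matrices needed, so only now are they constructed.
    Matrix m1, m2;
    // ... the real work with m1 and m2 would go here ...
    return 2;
}
```

Neither early return constructs or destroys a Matrix.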
Which of the following is more efficient?
#1: Point3D p1;
while (mumble) {
p1 = GetSomePoint();
...
}
#2: while (mumble) {
Point3D p1 = GetSomePoint();
...
}
The more efficient approach is #2 because it removes the need to
create and construct a temporary which is then assigned to p1 and
then destructed. In #2 p1 is simply initialized with the result of
GetSomePoint().
Lippman, in his tests, was able to improve the run time of his
test harness from 31.37 seconds to 11.88 seconds simply by removing
unnecessary copy constructors, assignment operators and destructors!
And all this time we've been told to always include these functions
even if they are empty! Lippman says that the compiler is far better
at understanding what needs to be done in most circumstances. (BTW,
the best he could do in improving the efficiency of his test harness
was 6.94 seconds.)
Resource Management
One topic in which I am quite interested is resource management.
Consider that any of the following are 'resources': memory, reference
counts, critical sections of code, file handles, and objects. In
other words, a resource is anything that can be acquired and then
released.
A couple of the talks I attended discussed this issue in detail
while almost all other talks at least mentioned the issue. Of
particular interest was one talk by Bartosz Milewski where he
introduced the concept of a StrongPointer.
A strong pointer is similar to a smart pointer in behavior except
that it adds value semantics. With a strong pointer ownership is
always transferred on copy. auto_ptr is somewhat heavier in
implementation and has some confusing use of const.
In any case, whether we were to use smart pointers or strong
pointers, there are places in MacApp where it would be appropriate to
employ better resource management techniques.
Writing Exception-safe C++
Herb Sutter gave an excellent, two-hour talk about writing
exception-safe C++. Unfortunately, I was only able to hear the last
20 minutes of his talk because I went to another (real yawner) talk.
Fortunately, I got more out of that 20 minutes than I got out of the
entire other talk!
The twenty minutes of Herb's talk I heard boils down to
essentially the following recommendation:
- Make a copy of the object you are going to change.
- Make the change to the copy.
- Swap the copy with the original.
The essence of this is that anything that can fail does so
_before_ the final commitment has been made. So it's best to postpone
any actions which will change the original object until the point
in time when you know it's going to work.
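The three steps above can be sketched as follows. NameList is a made-up class, not something from the talk; the point is that the original is untouched unless every step before the swap succeeds.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class NameList
{
public:
    void Add(const std::string& name) { fNames.push_back(name); }
    const std::vector<std::string>& Names() const { return fNames; }

    // Prefix every name; either all of them change or none do.
    void PrefixAll(const std::string& prefix)
    {
        std::vector<std::string> work(fNames);              // 1. copy (may throw)
        for (std::size_t i = 0; i < work.size(); ++i)
        {
            if (work[i].empty())
                throw std::invalid_argument("empty name");  // failure leaves fNames alone
            work[i] = prefix + work[i];                     // 2. change the copy (may throw)
        }
        fNames.swap(work);                                  // 3. commit; vector swap does not throw
    }

private:
    std::vector<std::string> fNames;
};
```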
Exceptions
When throwing an exception throw the exception you have created
rather than a base class of the exception otherwise the exception
gets sliced. For example, if 'derived' derives from the 'base'
exception class, consider the following:
derived except(...);
base* pBase = &except;
throw *pBase;
In this case an exception of type 'base' has been thrown and it
has been 'sliced'.
And, of course, if your 'catch' doesn't specify the exception by
reference then the exception will get sliced there as well (if the
catch is receiving a 'base' by value but was sent a 'derived').
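Both kinds of slicing, and the fix, can be seen in a few lines. The class and function names below are illustrative only.

```cpp
#include <string>

class Base
{
public:
    virtual ~Base() {}
    virtual std::string Name() const { return "base"; }
};

class Derived : public Base
{
public:
    virtual std::string Name() const { return "derived"; }
};

// Throwing through a base reference: the thrown object is a sliced copy of type Base.
std::string ThrowThroughBaseReference()
{
    std::string seen;
    Derived except;
    Base& asBase = except;
    try { throw asBase; }
    catch (const Base& e) { seen = e.Name(); }
    return seen;
}

// Throwing the derived object but catching by value: sliced at the catch.
std::string CatchByValue()
{
    std::string seen;
    try { throw Derived(); }
    catch (Base e) { seen = e.Name(); }
    return seen;
}

// Throwing the derived object and catching by reference: nothing is sliced.
std::string CatchByReference()
{
    std::string seen;
    try { throw Derived(); }
    catch (const Base& e) { seen = e.Name(); }
    return seen;
}
```

Only CatchByReference sees the derived class's Name; the other two report the base class.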
Exception class member functions, as we all know, should
never throw so it is wise to add a throw specification in the
definition of those functions by appending 'throw()'.
You knew, didn't you, that you can never call 'new' in any of an
exception class's member functions?
And the final word on exceptions is that any exception thrown
during the copy construction of the temporary created for a parameter
passed by value is never caught by a function try block of the
function being called.
Other Things
We need to continue the effort to make all of MacApp's data
members private or protected.
You can use either of the following for initialization:
int i = 0;
int j(0);
The second technique is used ubiquitously in templates if it
involves one of the template parameters because one never knows if
it's going to be a class or a built-in type.
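For example, in the sketch below (Sum and Counter are made-up names) the 'T total(0);' form works for 'int' and 'double' and also for a class whose constructor is declared 'explicit', where 'T total = 0;' would not compile.

```cpp
// Sum a range into a T, where T may be a built-in type or a class.
template<class T, class InIterator>
T Sum(InIterator first, InIterator last)
{
    T total(0);          // constructor-style initialization
    for (; first != last; ++first)
        total += *first;
    return total;
}

// A class that can only be initialized with the constructor-style form.
class Counter
{
public:
    explicit Counter(int start) : fValue(start) {}
    Counter& operator+=(int n) { fValue += n; return *this; }
    int Value() const { return fValue; }
private:
    int fValue;
};
```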
Bjarne Stroustrup -- C++ as a New Language
This was an excellent, albeit short, talk where Bjarne
demonstrated with several examples how C++ was indeed far more
effective than C as a programming language.
He started with the oft cited "Hello World!" sample. The specs of
the test were:
- Ask for the person's first name.
- Input the name.
- Print "Hello " followed by the person's first name and then an
exclamation mark.
Of course, the first C program was pretty straightforward. But
then we found out that if the person typed in too much the buffer
would overflow so some limit checking was added. Then we found out
that the exclamation mark wouldn't print in the right place unless we
added some code to handle the white space. And even when the input
buffer was expanded to handle "reasonably sized" names there was
still no protection if someone decided to type in the first chapter
of War and Peace.
Well, the end result was that in order to write a robust C program
dozens of lines of code had to be written, many seemingly unrelated
concepts had to be understood, and the resulting code was pretty much
unreadable; and there were still conditions that it couldn't
handle.
The C++ version of the program was only 10 lines of code of which
6 were scaffolding: 4 lines did the real work. And it was more
robust than the C version!
But then Bjarne asked the important question: "Well, this is fine
for this little problem but how well does it scale to real production
systems?"
So he gave another example: read an arbitrary set of numbers, sort
them, then output mean and median. In C you have to allocate the
buffer and manage its memory and then explain how qsort works.
In C++ one uses the standard library 'sort' algorithm with a
standard container class: there is no need to understand compare
functions, buffer pointers or memory management. Use of the standard
C++ utilities avoids the need to get into other subjects. And
C++'s sort runs about 7 times faster than qsort because the compare
function can be inlined!
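The C++ side of that example might look roughly like this. It is a sketch of the idea rather than the code shown in the talk; Stats and ReadStats are invented names, and taking the median of an even count as the average of the two middle values is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding in test input
#include <vector>

struct Stats
{
    double mean;
    double median;
};

// Read numbers until input fails, then compute the mean and median.
// Returns false if no numbers were read.
bool ReadStats(std::istream& in, Stats& out)
{
    std::vector<double> values;
    double x;
    while (in >> x)
        values.push_back(x);        // the vector grows as needed

    if (values.empty())
        return false;

    std::sort(values.begin(), values.end());

    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        sum += values[i];

    std::size_t mid = values.size() / 2;
    out.mean = sum / values.size();
    out.median = (values.size() % 2 == 1)
                     ? values[mid]
                     : (values[mid - 1] + values[mid]) / 2.0;
    return true;
}
```

There is no buffer to size, no memory to free and no compare function to write.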
Here's another example: given text of arbitrary length produce a
word count. It took three lines of C++ code (not counting code to
print the results). Printing adds 8 lines of code. At this point we
have finally had to introduce some new concepts because this solution
uses a template function to define the output stream operator. He
then showed a second solution that simplified the code a bit more but
introduced yet another new concept. The point was that one can get
into a productive environment much more easily and then learn other
concepts that make one even more productive.
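The counting part of the word-count example can indeed be written in about three lines using a map. What follows is a guess at its general shape, not Stroustrup's actual code; CountWords and WordCounts are made-up names and the printing code is omitted.

```cpp
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, handy for feeding in test input
#include <string>

typedef std::map<std::string, int> WordCounts;

// Count whitespace-separated words read from a stream.
WordCounts CountWords(std::istream& in)
{
    WordCounts counts;
    std::string word;
    while (in >> word)
        ++counts[word];   // [] creates a zero entry the first time a word is seen
    return counts;
}
```

Here the entry-creating behavior of [] noted earlier is exactly what is wanted.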
He claims that it is fundamentally easier to teach the new 'style'
of programming which C++ introduces and that productivity can be
quadrupled.
Summary
I had a great time! The immersion was successful! It was the best
conference I've been to since going to a Microdata get-together back
in 1979!
My only complaint was that some of the advertised 'advanced'
sessions weren't very advanced. In my feedback I indicated that my
attendance next year would depend on their having a truly 'advanced'
track.
I tried to speak with all of the great C++ers there (Lippman,
Lajoie, Meyers, Koenig, Milewski, Sutter, etc.) and at least see if
they had heard of MacApp. These are for the most part "Really Smart
People." With one exception they were all aware of MacApp and pretty
much knew its place in object oriented history. (And I got some
interesting gossip about the father of MacApp, Larry Tesler!) All
were pleased to hear that MacApp was being constantly improved.
I had two sessions with Stanley Lippman. By the second session he
was picking on me, in a friendly way, throughout the session. So
Apple and MacApp got some good exposure.
Please accept my apologies for taking so much of your time. It is
my hope that we will be able to take advantage of the new C++
Standard in MacApp.
Sincerely,
Mike R.
MacApp Engineering