On 2003-11-28T15:43:00,
Kashif Shaikh <kshaikh at consensys.com> said:
Hi Kashif, I'd suggest moving the heartbeat development related thread
to the linux-ha-dev list. As that's the bulk of this mail, except for the
SGI license question, I'm cc'ing linux-ha-dev directly.
>> SGI owns most of the files and would have to reevaluate the copyright
>> decisions.
> Maybe someone from SGI can comment here? Otherwise someone (me?) could just
> strip out GPL symbols from the LGPL header files.
I'm not sure whether that would be helpful for anyone, actually.
What is your intention?
>> However, if you are comparing FailSafe to heartbeat, I can tell you that
>> as of today, it's like comparing a space ship to a car ;-) FailSafe is much
>> further advanced; it's not a reasonable comparison.
> The two benefits heartbeat has over FailSafe are less complexity and
> tighter security. Yes, FailSafe tries to be a space-ship, but ends up
> being the kitchen sink and over-engineered.
You are preaching to the choir ;-) Yes, FailSafe is much too complex in
my opinion too. If I thought otherwise, I'd try to revive it
instead of taking the good ideas from it and reimplementing them (and
hopefully not too many of the problems). I was only involved in porting
it, not in writing it, and to this day I still don't grasp all the
interactions.
The design papers I have read were all very good. Alas, during
implementation some of the nice and clean separation into modules has
been compromised. And it's way too difficult to debug because of the
thread model and the complexity of the subcomponents. (And the lack of
good documentation on them.)
> The nicest thing FailSafe has is the cms/gcs layer, and IMO the rest
> is crap;
I would disagree with that. The CMS layer is nice, but the algorithm
used is pure horror: it has N! complexity, will easily saturate a
100 Mbit/s network and use substantial bandwidth on GigE, and that's
ignoring its latency issues. The problem is that it maps an
algorithm designed for a ring topology onto a broadcast-based network,
which does imply a certain overhead.
The other parts do have their good sides. The CDB is a very, very nice
concept (albeit the implementation sucks). The GUI is probably the
nicest GUI I've ever seen for a cluster, even though it is implemented
in Java.
> the cdb should be placed over the cms/gcs layer, but it's pointless
> since cdb uses rpc for communication and would be difficult to move
> onto gcs.
Yes. The CDB is a sucky implementation. The original papers called for
the cdb to run on top of cms/gcs, but I guess then they found out that
they needed info from the CDB to bootstrap the startup of cms/gcs, and
thus the layers blurred and complexity exploded.
The security model of the autodiscovery of cluster nodes is also
dubious, to say the least; the networks need to be completely secure,
because FailSafe is highly prone to attacks.
> FSD/SRM is hell -- both trying to keep the same states and similar
> logic results in many bugs due to 'assumptions' and whatnot.
Yup.
SRM in itself makes sense: keeping track of local resources, starting
and stopping them et al. is clearly separate from the rest. But
again, I think they were married too closely.
> The scary part is that Heartbeat's proposed CRM/CIB/SRM/CRS looks very
> similar to FailSafe's CHOAS layer (no offense, I know you wrote the crm
> docs)...and that's when complexity will shoot through the roof
> again...only good for the 'enterprise' market. The reason heartbeat
> gets adopted in a 'heartbeat' is because it's easy to configure, with
> down-to-earth configuration and only a single daemon to manage.
Well, the division into such components is just natural. Most of the more
powerful solutions have roughly the same structure, just as almost every
cluster package has a membership layer at a low level and some GUI on
top.
Rest assured that I do want to keep complexity as low as possible for several
reasons. One of them being that I got lost in FailSafe and don't want
that to happen again. This will be addressed by several measures, one of
them is better documentation of the software itself (FailSafe has
excellent docs, but only from the user perspective), and the other is of
course "KISS" - I have no other choice anyway, as we have an aggressive
schedule and too little budget as always and there will be little time
for getting too academic ;-)
However, moving the local resource bookkeeping out of heartbeat's core
actually lowers complexity. Right now the resource bookkeeping in
heartbeat is already a mess; adding any little feature is getting
dangerous. Alan has planned this anyway, and it's a good thing to
tackle. You can't tell me that you call the mess of combining
heartbeat + mon for resource monitoring a down-to-earth configuration!
That belongs 3ft _under_ the earth! ;-)
The "CRM" is also already there in heartbeat as it stands; however, it's
spread out over hb_resource.c, various other places and even shell
scripts. Moving that into a clear framework will, I believe, also help
make it more manageable. And note that the "CRM" is already going for the
simplest version: a master node is elected and resources are
coordinated from there.
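The mail does not say how the master is chosen. Purely as an illustrative sketch, assuming every node sees the same membership list from heartbeat's membership layer, a deterministic rule such as "lowest node name wins" lets each node work out the master locally, without an extra round of messages. The names `elect_master` and `is_master` and the rule itself are hypothetical, not heartbeat's actual election.

```python
from typing import Iterable, Optional


def elect_master(members: Iterable[str]) -> Optional[str]:
    """Pick a coordinator deterministically from the current membership.

    Hypothetical rule for illustration: the lexicographically smallest
    node name wins. Because the rule depends only on the membership list,
    every node that sees the same list picks the same master.
    """
    ordered = sorted(set(members))
    return ordered[0] if ordered else None


def is_master(local_node: str, members: Iterable[str]) -> bool:
    """True if this node should coordinate resources for the cluster."""
    return elect_master(members) == local_node


if __name__ == "__main__":
    view = ["node-c", "node-a", "node-b"]
    print(elect_master(view))                  # node-a
    print(is_master("node-b", view))           # False
    # After node-a leaves, the survivors re-run the rule and agree on node-b.
    print(elect_master(["node-c", "node-b"]))  # node-b
```

A real implementation would also have to cope with membership views that briefly disagree during a transition; this sketch ignores that.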
The "Policy Engine" is meant to separate out the complexity of these
decisions. It will be clearly separated to facilitate testing, so it can
be tried without an actual cluster. This will allow us to pinpoint
errors more quickly. It doesn't need to be all that fancy to start
with.
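One way to get that testability is to make the Policy Engine a pure function: it takes a snapshot of cluster state and returns a list of actions, with no I/O, so a test can feed it hand-written states. The data model below (`ClusterState`, `Action`) and the policy ("restart anything stopped or lost on the first online node") are hypothetical illustrations, not the CRM's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Action:
    verb: str       # only "start" is produced in this sketch
    resource: str
    node: str


@dataclass
class ClusterState:
    online_nodes: List[str]
    # resource name -> node it currently runs on, or None if stopped
    placement: Dict[str, Optional[str]] = field(default_factory=dict)


def compute_actions(state: ClusterState) -> List[Action]:
    """Decide what to do; never touches the network or the resources.

    Hypothetical policy: a resource whose node is no longer online is
    treated as lost, and every lost or stopped resource is started on
    the first online node (sorted by name).
    """
    actions: List[Action] = []
    if not state.online_nodes:
        return actions  # nowhere to run anything
    target = sorted(state.online_nodes)[0]
    for resource, node in sorted(state.placement.items()):
        if node is None or node not in state.online_nodes:
            actions.append(Action("start", resource, target))
    return actions


if __name__ == "__main__":
    # node-c has failed: "ip" was running there, "web" was never started.
    state = ClusterState(online_nodes=["node-b", "node-a"],
                         placement={"ip": "node-c", "db": "node-a", "web": None})
    for action in compute_actions(state):
        print(action)
```

Because the function only reads its argument, a test suite can cover failure scenarios such as a lost node simply by constructing the corresponding `ClusterState`.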
Now, the CIB _is_ adding complexity. I am not disagreeing. Let me try to
convince you that it is necessary complexity. If you look at the
postings to the mailing lists, a substantial number of bugs are due to
configuration mismatches between the two nodes, and users being unsure
about how to change the configuration online. This alone needs
addressing, but I'm very afraid of how often users would get it wrong
with even 3 or 4 nodes. And there are reasonably sane setups where 3 or
4 nodes make quite a bit of sense.
The CRM now could just always use the 'local' configuration on the
elected master node; that already would prevent such configuration
mismatches. All the CIB adds is a small step on top of that: figuring out
the most recent consistent configuration version, retrieving it from the
node that holds it, and replicating it. If that fails, hey, the worst that
can happen is that the cluster will revert to a slightly older configuration.
Little harm done, and _much_ better than operating on different
configuration versions at once.
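The selection step described above could look roughly like this. It assumes each node reports its local configuration together with a version number and a checksum (a hypothetical `ConfigCopy` record; the mail does not say how versions or consistency are tracked). The master discards copies whose checksum does not match their content, takes the highest remaining version, and lists every node holding anything else as needing that copy replicated to it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ConfigCopy:
    version: int
    body: str       # the configuration text itself
    checksum: str   # hex SHA-1 of body as recorded by the node (hypothetical)

    def is_consistent(self) -> bool:
        return hashlib.sha1(self.body.encode()).hexdigest() == self.checksum


def make_copy(version: int, body: str) -> ConfigCopy:
    """Helper that builds a correctly checksummed copy."""
    return ConfigCopy(version, body, hashlib.sha1(body.encode()).hexdigest())


def pick_config(copies: Dict[str, ConfigCopy]) -> Tuple[Optional[ConfigCopy], List[str]]:
    """Return the newest consistent copy and the nodes that must receive it."""
    valid = [c for c in copies.values() if c.is_consistent()]
    if not valid:
        return None, []
    best = max(valid, key=lambda c: c.version)
    stale = sorted(node for node, c in copies.items() if c != best)
    return best, stale


if __name__ == "__main__":
    copies = {
        "node-a": make_copy(7, "ip=10.0.0.1"),
        "node-b": make_copy(8, "ip=10.0.0.2"),
        "node-c": ConfigCopy(9, "ip=10.0.0.3", "bogus"),  # corrupted copy
    }
    best, stale = pick_config(copies)
    print(best.version, stale)
```

Here the corrupted version 9 is ignored, so the cluster falls back to version 8 and pushes it to node-a and node-c: exactly the "slightly older configuration" worst case. Two different valid copies with the same version would be a genuine conflict, which this sketch does not resolve.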
That's all the CIB really is meant to be: Making sure the users screw up
less often. It's really reasonably straightforward. And it _will_ run on
top of the messaging/membership layers, so that can of worms from
FailSafe will stay closed.
The crm.txt gets a little bit fancy by calling the CIB a distributed
database with weak transactional semantics; I think I had some very good
chocolate when I wrote that ;-)
Another design goal I absolutely intend to stick with is 'security'.
heartbeat's messaging & membership layers are pretty secure, and
building on top of these will give that to the 'bigger' solution too.
We obviously also will reuse the rather good and helpful IPC layer.
All in all I agree that heartbeat Next Generation will likely be
more complex than heartbeat is right now, definitely internally. But by
dividing it into manageable pieces with separate test suites,
documentation etc., I hope to actually make it more manageable and more
easily customized than heartbeat is right now.
And of course, I will move the implementation and design discussions out
into the public on the linux-ha-dev list, so that even in the future,
one will be able to figure out why something was done that way. That was
a pain with FailSafe, not knowing why it was done as it was. Typically
there have been very good reasons, but I didn't know, and couldn't find
out! This discussion for one is already an important part: Figuring out
whether the approach is worth it.
The new design allows for more features; proper support for replicated
resources for example, resource monitoring etc. We'll see.
Of course, the very important part is to keep the complexity for the
user down. I believe this will be possible, even though some paradigms
will naturally change, but I believe that they actually change for the
better. Again, documentation plays a key role here.
(I _have_ noticed that FailSafe gets this wrong, don't worry. It's easy
to configure on the surface, but it has lots of hidden dependencies (the
most obvious one being how anal-retentive it is about /etc/hosts).)
I hope this addresses some of your concerns.
Next week I will have to defend^Wexplain my design to the project team
for implementing the CRM suite. If you see any big holes in it which
need addressing, or any other concerns, please do voice them. It is
highly appreciated!
Sincerely,
Lars Marowsky-Brée <lmb at suse.de>
--
High Availability & Clustering \ ever tried. ever failed. no matter.
SUSE Labs | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ -- Samuel Beckett