No subject

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

release, containing the minor revisions of each stable release.

At each minor revision, the specific branch will be tagged accordingly. The
development snapshots will also be tagged in this manner.

This allows access to each minor release of the Linux FailSafe code base for
comparison and review, and also allows us to check in important bugfixes to
older stable releases.

3.1. Branch and tag naming conventions

The stable branch names are of the format "STABLE_n", where n corresponds to
the major revision number of the stable release.

The minor revision tag names will be named "STABLE_x_y" (STABLE_2_0,
STABLE_4_3 etc), where x corresponds to the major revision number and y to the
minor revision number.

The development snapshots will be named "DEVEL_x_y", where x will be odd and
correspond to the major revision number, and y to the minor revision number
accordingly.

Since the development code lives in the main line of the CVS repository, there
are no branches for the development code, so no "DEVEL_x" branches exist.
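
The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the FailSafe tree: classify_name is a hypothetical helper, and the messages it prints are made up for illustration. It only encodes what this section states (STABLE_n branches, STABLE_x_y tags, DEVEL_x_y tags with an odd x).

```shell
#!/bin/sh
# classify_name is a hypothetical helper; it encodes only the naming
# conventions described in section 3.1.
classify_name() {
    name=$1
    if printf '%s\n' "$name" | grep -Eq '^STABLE_[0-9]+$'; then
        echo "stable branch"
    elif printf '%s\n' "$name" | grep -Eq '^STABLE_[0-9]+_[0-9]+$'; then
        echo "stable release tag"
    elif printf '%s\n' "$name" | grep -Eq '^DEVEL_[0-9]+_[0-9]+$'; then
        major=${name#DEVEL_}   # strip the prefix, leaving "x_y"
        major=${major%%_*}     # keep only "x"
        # A number is odd exactly when its last digit is odd.
        case "$major" in
            *[13579]) echo "development snapshot tag" ;;
            *)        echo "invalid: development major revision must be odd" ;;
        esac
    else
        echo "unrecognized"
    fi
}
```

For example, classify_name STABLE_4_3 reports a stable release tag, while classify_name DEVEL_2_0 is rejected because the major revision is even.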

4. Read only access to the CVS repository.

Read only access to the CVS repository is available via anonymous CVS to
everybody. Please see http://oss.sgi.com/projects/failsafe/source.html for
details.

BE AWARE that the default code base you will get when checking out without
specifying any branch will be the latest _development_ snapshot. Please be
sure to specify the -r STABLE_2 (or appropriate) option to access the latest
stable code if that is what you want.
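
As a rough illustration of the difference, the sketch below only prints the cvs command line for a given branch or tag instead of running it. The repository root and module name are placeholders, not the real values; take those from the source access page mentioned above. checkout_cmd is a hypothetical helper.

```shell
#!/bin/sh
# CVSROOT_URL and MODULE are placeholders (assumptions), not the real
# FailSafe repository location or module name.
CVSROOT_URL=${CVSROOT_URL:-":pserver:anonymous@cvs.example.org:/cvs"}
MODULE=${MODULE:-"failsafe"}

# Print the checkout command for an optional branch or tag ($1).
# Without an argument the main line (development code) is checked out.
checkout_cmd() {
    if [ -n "${1:-}" ]; then
        echo "cvs -d $CVSROOT_URL checkout -r $1 $MODULE"
    else
        echo "cvs -d $CVSROOT_URL checkout $MODULE"
    fi
}
```

Calling checkout_cmd STABLE_2 prints a command that uses -r STABLE_2, while checkout_cmd with no argument prints the plain checkout that yields the development code.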

While we strive to make Linux FailSafe as reliable as possible, the Linux
FailSafe core team does not accept any legal liability for potential problems,
as stated by the (L)GPL under which Linux FailSafe is distributed. Please see
the LICENSE file in the distribution for details.

The FailSafe CVS repository makes use of CVS branches and tags to provide
access to not only the development branch, but also to every release we put
out. This is outlined below.

5. Write access to the CVS repository.

Write access to the CVS repository is restricted to members of the FailSafe
core team. The members of the core team are selected on a trust basis.

Please see the core team document for an explanation of how the core team
operates.

We welcome help and patches from everyone. The core team members will be happy
to review your patch and help you get it incorporated into the tree. If you
become an active contributor, we will approach you about joining the core
team. If you are an interested documenter or developer, please contribute!

Different branches of the CVS tree have different policies for checking in new
code. Please see the explanation below.

5.1. Checkin guidelines for the stable branches

Users expect that the stable branches are stable and do not lose their data.
The stable branches thus have stricter rules for code which goes into them.

i) The production branches should always be as stable as possible. Each stable
release has to be approved by the core team.

ii) At least one other core team member has to review each patch before
checkin to a stable branch.

iii) Every patch checked in must correspond to an entry in bugzilla
and must say so in the check-in message.

iv) Resolved bugs and isolated features from the development tree could
be migrated into this branch by core team consensus.

v) Care should be taken to also apply the bugfix to the development tree if
appropriate.

5.2. Checkin guidelines for the development code

The latest development code will contain new features and enhancements. To
facilitate rapid development, the checkin guidelines for the development tree
are considerably less strict.

i) New features should be checked into the development tree.

ii) Basic sanity testing: it should at least compile and if possible some unit
testing should be done.

iii) When in doubt, please run your patch by other core team members first.

iv) Every patch checked in must correspond to an entry in bugzilla and must
say so in the check-in message.

v) The development branch should always be a super-set of the production
branch. Please make sure that you also apply patches applied to a stable
branch to the development branch if and when appropriate.

vi) Major rewrites, which aren't yet fully done and may impact other parts of
the system, should be disabled via a compile time option if at all possible.
When in doubt, consult other core team members.

Sincerely,
Lars Marowsky-Brée <lmb at suse.de>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

break [n]
Exit from within a for, while, or until loop. If n
is specified, break n levels. n must be >= 1. If
n is greater than the number of enclosing loops,
all enclosing loops are exited. The return value
is 0 unless the shell is not executing a loop when
break is executed.

continue [n]
Resume the next iteration of the enclosing for,
while, or until loop. If n is specified, resume at
the nth enclosing loop. n must be >= 1. If n is
greater than the number of enclosing loops, the
last enclosing loop (the `top-level' loop) is
resumed. The return value is 0 unless the shell is
not executing a loop when continue is executed.

Regards,

-- Raju

scriptlib2 has a main loop, roughly along these lines:

    for res in resources; do
        call process_resource $res action
    done

So, somewhere in process_resource, or potentially even a few
calls deeper, I want to continue with the next resource.

Juri> Well, I'm not a programmer but I don't think you will find a
Juri> programming language where it is possible to jump directly
Juri> from some nested calls to the outside call (except with an
Juri> exception maybe). So IMO you have to exit each call as
Juri> quickly as possible and check the return value in the outer
Juri> call whether to continue or abort this call immediately,
Juri> giving back a return value that the outer call can check.

Juri> Does this make sense?

Juri> Juri
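
The return-value approach Juri describes could look roughly like the sketch below. All names (SKIP_RESOURCE, check_resource, main_loop) and the value 10 are hypothetical and are not taken from scriptlib2; the resource called "bad" merely stands in for whatever condition should trigger the skip.

```shell
#!/bin/sh
# Hypothetical status code meaning "stop working on this resource".
SKIP_RESOURCE=10

# Deepest call: asks for a skip when the resource is named "bad".
check_resource() {
    [ "$1" = "bad" ] && return "$SKIP_RESOURCE"
    return 0
}

# Intermediate call: leaves at once and passes the status upward.
process_resource() {
    check_resource "$1" || return $?
    echo "processed $1"
}

# The loop owner inspects the status and issues the continue itself.
main_loop() {
    for res in "$@"; do
        process_resource "$res"
        rc=$?
        if [ "$rc" -eq "$SKIP_RESOURCE" ]; then
            echo "skipped $res"
            continue
        elif [ "$rc" -ne 0 ]; then
            return "$rc"
        fi
    done
}
```

With this layout, continue is only ever executed inside the function that contains the loop, so nothing depends on how a particular shell treats continue inside a called function.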

--
Raju Mathur raju at kandalaya.org http://kandalaya.org/

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

much further along, like having fixed the runaway cdbd processes et al. He
said he already has a few productive deployments out there ;-) Are the changes
going to be ported soon?

Sincerely,
Lars Marowsky-Brée <lmb at suse.de>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

Pidof finds the process id's (pids) of the named programs.
It prints those id's on the standard output. This program
is on some systems used in run-level change scripts,
especially when the system has a System-V like rc structure.
In that case these scripts are located in /etc/rc?.d,
where ? is the runlevel. If the system has a
start-stop-daemon (8) program that should be used instead.

OK.

What are the operations we need to perform?

One is the start/stop/daemon program...
http://www.linux.gr/cgi-bin/man2html/usr/share/man/man8/start-stop-daemon.8.gz

It looks like you can coerce it to give you status -- although it would be
nice if it could give it to you more directly...

In the optimal case, the scripts would even be portable between heartbeat and
FailSafe etc, since the operations they need to perform are very similar. But
a common "resource script API" might be too much to ask for.

I think this would be wonderful! I also don't think it's even that
difficult.

The paradigm for these scripts is that they perform the following set of
actions for a given instance of a resource.
stop - make the resource unavailable
start - make the resource available
status - is the resource activated?
monitor - is the service provided by the resource instance actually
working?
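
A skeleton for such a script might dispatch on the action name as sketched below. This is only an illustration: the resource_* functions and the state file are placeholders invented here, and a real script would start, stop and probe an actual service instead of touching a file.

```shell
#!/bin/sh
# Placeholder state file standing in for a real resource (assumption).
STATE_FILE=${STATE_FILE:-/tmp/example_resource.state}

resource_start()   { : > "$STATE_FILE"; }     # make the resource available
resource_stop()    { rm -f "$STATE_FILE"; }   # make the resource unavailable
resource_status()  { [ -e "$STATE_FILE" ]; }  # is the resource activated?
resource_monitor() { resource_status; }       # a real check would probe the service

# Dispatch one of the four actions; unknown actions return 2.
resource_action() {
    case "$1" in
        start)   resource_start ;;
        stop)    resource_stop ;;
        status)  resource_status ;;
        monitor) resource_monitor ;;
        *) echo "usage: resource_action {start|stop|status|monitor}" >&2
           return 2 ;;
    esac
}
```

Keeping the dispatch in one function means the same script body could be called from different cluster managers, as long as they agree on the action names.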

There are several things which ought to be common for these scripts.

First, all the scripts should be Bourne shell scripts or, at worst, bash
scripts...

A file to include via "." to provide common functions, environment
settings, etc.

Included in this environment are variables to define the location of
log files, pid files, and other OS-dependent things...
Including the path to the start-stop-daemon program ;-)

A function GetInstanceParameters which will give the script the parameters
for this instance of the given resource.
It will take at least these arguments:
--prefix: prefix all the parameters with the given string
--listparms: list all the parameter names for this rsc/inst
--stdout: provide parameters in name=value format, one per line
--assign: assign all the parameters to shell variables
The default is --assign.
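
To make the proposal concrete, here is one possible sketch of such a function. It is an assumption-laden mock, not an existing implementation: the parameter store is a hard-coded list (instance_params, with made-up ip and netmask values), the prefix is assumed to contain only identifier characters, and applying the prefix to --stdout and --assign but not to --listparms is a choice made here for illustration.

```shell
#!/bin/sh
# Hypothetical parameter store; a real one would query the cluster database.
instance_params() {
    cat <<'EOF'
ip=10.0.0.1
netmask=255.255.255.0
EOF
}

# Sketch of the proposed interface. Variables are global because
# plain Bourne shell has no "local".
GetInstanceParameters() {
    mode=assign
    prefix=
    while [ $# -gt 0 ]; do
        case "$1" in
            --prefix)    prefix=$2; shift ;;
            --listparms) mode=list ;;
            --stdout)    mode=stdout ;;
            --assign)    mode=assign ;;
            *) echo "unknown option: $1" >&2; return 2 ;;
        esac
        shift
    done
    case "$mode" in
        list)   instance_params | sed 's/=.*//' ;;
        stdout) instance_params | sed "s/^/$prefix/" ;;
        assign)
            # Read from a here-document rather than a pipe so that the
            # assignments happen in the current shell, not a subshell.
            while IFS='=' read -r name value; do
                eval "${prefix}${name}=\$value"
            done <<EOF
$(instance_params)
EOF
            ;;
    esac
}
```

After GetInstanceParameters --prefix RSC_, a script could refer to $RSC_ip and $RSC_netmask directly.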

Their behavior will conform to the LSB definition for start/stop/status
behavior.

As an aside, I've asked them to change the behavior of status because it is
ambiguous, particularly when extending the function to things like
"monitor". So, I hope they fix it ;-)

What else is needed?

-- Alan Robertson
alanr at unix.sh

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

from it, unless it has seriously changed in the last months. I remember there
was this notion that UDP packets were a reliable means of communication. Has
this changed?

An alternative would be drbd, which I intend to test later.

Joachim is working on a drbd plugin for FailSafe. Joachim, could you commit it
to CVS, so that people could start working on it?

Sincerely,
Lars Marowsky-Brée <lmb at suse.de>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

ones from SGI or recompile.

However, when I tried to start the FailSafe Admin CLI, I got the following error:
libreadline.so.4: cannot open shared object file: No such file or directory
What's wrong, and what do I need to install to get libreadline.so.4?

You are missing a shared library ;-) You will need to install the readline
package; on my system, I am at readline 2.05.
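
To see every shared library a binary is missing, not just the first one the loader complains about, the output of ldd can be filtered. A minimal sketch follows; missing_libs is a hypothetical helper, and the path in the usage comment is a placeholder, not the real location of the FailSafe CLI.

```shell
#!/bin/sh
# Read ldd output on stdin and print the names of libraries that
# ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (placeholder path):
#   ldd /path/to/binary | missing_libs
```

Each name printed then has to be mapped to the package that provides it on the distribution in use.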

Sincerely,
Lars Marowsky-Brée <lmb at suse.de>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

this. (i.e. XFS, xfsdump, and xfsrestore all claim to support ACLs). XFS also
journals all metadata changes. (It does not journal data changes.)

If reiserfs, or some other filesystem, is a possible candidate, please let me
know.

BTW: ext2/ext3 do have a set of patches available to support ACLs, but my
impression is that they are not yet widely supported enough that I would want
to use them in production. (http://acl.bestbits.at)

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

Basically, your name resolution should be okay. You must have two names for
each machine (machine and machineHB).

This I do not understand; why two names? I have never done this.

Sincerely,
Lars Marowsky-Brée <lmb at suse.de>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

Bye, Martin

b***@does.not.exist.com

2013-06-04 18:16:57 UTC

because all processes are starting up on the sec node, but the primary
node doesn't seem to get the heartbeats from the secondary node so it marks
it as "down", could I be right about this?

My configuration looks like this:
Node 1: hostname tore.perfekt
nodeID: 1
Contr Net Ipaddr: 192.168.2.133 //public n/w, address given by dhcp
Contr Net HB: true
Contr Net Control: true
Contr Net Priority: 2
Contr Net Ipaddr: 10.0.0.1 //private n/w
Contr Net HB: true
Contr Net Control: true
Contr Net Priority: 1

Node 2: hostname tyson.perfekt
nodeID: 1
Contr Net Ipaddr: 192.168.2.108 //public n/w , address given by dhcp
Contr Net HB: true
Contr Net Control: true
Contr Net Priority: 2
Contr Net Ipaddr: 10.0.0.3 //private n/w
Contr Net HB: true
Contr Net Control: true
Contr Net Priority: 1

And my cluster:
Cluster name: Perfekt
Cluster is failsafe: true
Cluster mode: normal

Cluster Perfekt has following two machine(s)
tyson
tore

I don't have any serial connection between them, but that shouldn't matter,
right?
I can ping my private network from the cluster nodes but not from the
192 network, and that is how it should be, isn't it?
I don't know what to do now, so again I turn to you guys for some
advice on the matter.
I would be very grateful for any opinions and advice.

Sincerely,
Daniel

-------------------------------------------
Daniel Berg
Väktargatan 44B nb
754 22 Uppsala
tel: 018-25 75 30
mob: 070-634 5556


b***@does.not.exist.com

2013-06-04 18:16:57 UTC

cmgr> admin reset node HA2
Server error: no response from system controller at HA2
or a remote server

Failed to admin:
reset

admin command failed

If I enter the following command on cluster node "HA1",
node HA2 successfully restarts.

/usr/local/sbin/stonith -t rps10 -p "/dev/ttyS0 HA2 0" HA2

Any ideas on what I am doing wrong in my node configuration for
failsafe to have problems contacting the system controller?
Here is my node configuration, I followed the suggestions on
other posts for the syntax (rps10,/dev/ttyS0,0).

cmgr> show node HA2
Logical Machine Name: HA2
Hostname: HA2
Is FailSafe: true
Nodeid: 2
Reset type: powerCycle
System Controller: stonith
System Controller status: enabled
System Controller owner: HA1
System Controller owner device: rps10,/dev/ttyS0,0
System Controller owner type: tty
ControlNet Ipaddr: 172.27.5.23
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 2
ControlNet Ipaddr: 192.168.1.2
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 1

cmgr> show node HA1
Logical Machine Name: HA1
Hostname: HA1
Is FailSafe: true
Nodeid: 1
Reset type: powerCycle
System Controller: stonith
System Controller status: enabled
System Controller owner: HA2
System Controller owner device: rps10,/dev/ttyS0,HA2,0
System Controller owner type: tty
ControlNet Ipaddr: 192.168.1.1
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 1
ControlNet Ipaddr: 172.27.5.22
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 2

Any suggestions greatly appreciated, Thanks.

Tabitha


b***@does.not.exist.com

2013-06-04 18:16:57 UTC

and could not resolve tie membership. In other words, FailSafe doesn't
know whether db2 is really down or whether this is a cluster partition. If the
reset had succeeded, the RGs on db2 would have moved to db1.

In the interest of avoiding data corruption, RGs on db2 (if any) are NOT
failed over. You should read the FailSafe sysadmin manual and get your
reset net properly configured and operational. Look at the source for cmsd
in FailSafe for more info.

Kashif