From: softlib.cs.rice.edu
Last mod: December 23, 1995
fastlink, version 3.0P

P4 Parallel Processing Library


This file describes installation of the P4 Parallel Processing Library in conjunction with parallel FASTLINK. See README for an overview of all the documentation. See README.parallel for an overview of how FASTLINK can run in parallel.

P4, Background

P4 is a portable parallel processing library developed at Argonne National Laboratory. P4 supports both shared-memory and message-passing models of parallel computation. Shared-memory programs can be run on advanced parallel supercomputers and on single shared-memory multiprocessors. Additionally, P4 can simulate shared-memory multiprocessing on uniprocessors that support SYSV IPC (UNIX System V Interprocess Communication).

The P4 version of FASTLINK utilizes the shared-memory model. It has been successfully run on Sun multiprocessor workstations running Solaris, DEC Alphas running OSF, and SGI Challenges running IRIX. It will likely run with minor modifications on other platforms supported by P4. We are eager to provide portability assistance if you can provide us access to your shared-memory multiprocessor. Please note that while P4 message-passing programs can also be run on networks of workstations, P4 FASTLINK will not. If you are interested in running parallel FASTLINK on a network of UNIX workstations, please refer to the file README.parallel for more information.

FTP Instructions

The P4 distribution can be obtained by anonymous ftp from info.mcs.anl.gov in the directory pub/p4. The current version of p4 is 1.4 (even though the README in the ftp directory says it's 1.3). Here are specific instructions for obtaining p4.
  ftp info.mcs.anl.gov
Login as "anonymous", and use your full e-mail address as password.
  cd pub/p4
Remember to use binary mode for transfer. To retrieve the distribution:
  bin
  get p4-1.4.tar.Z

Building P4

The distribution is a compressed tar file. To unpack it, use:
  uncompress p4-1.4.tar.Z
  tar xvf p4-1.4.tar
or
  zcat p4-1.4.tar.Z | tar xvf -
Then, go to the directory created.
  cd p4-1.4
The distribution contains all the source code, installation instructions, a complete reference manual, and a number of sample programs. Refer to the README in the directory doc for details about the documentation.

To build P4, you will need to specify the architecture of your machine. Type:

  make P4ARCH=< machine >
where < machine > is one of the machines listed in the file util/machines. If you are not quite sure which machine to choose, the file doc/p4.txt describes these names in more detail. For example, to build P4 on a Sun workstation running Solaris, you would type:
  make P4ARCH=SUN_SOLARIS
The default makefile uses the cc compiler. If you wish to use another compiler, you can edit the file util/defs.all. Look for the section corresponding to your architecture, and change the variables CC and CLINKER. If you have trouble compiling P4, please consult your system administrator for assistance.

To install, type:

  make install INSTALLDIR=< dir >
where < dir > is the directory where you want P4 to be installed. The directory you choose here will be the P4_HOME_DIR that you will need to supply in the FASTLINK Makefile.

Building P4, Using SYSV_IPC

Depending on your system, P4 should be built with or without SYSV_IPC (Unix System V Interprocess Communication). The file OPTIONS in the P4 distribution directory contains the line:
  /* #define SYSV_IPC */
Since this line is commented out, by default P4 is always built *without* SYSV_IPC. If your system requires SYSV_IPC, you should change the above line to:
  #define SYSV_IPC
This change, of course, must be made *before* compiling P4.
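The edit to OPTIONS can also be scripted. The following is a minimal sketch, assuming the line appears in OPTIONS on a single line in the commented-out form shown above. The function name enable_sysv_ipc is ours and is not part of P4.

```shell
# enable_sysv_ipc FILE
# Rewrites the line "/* #define SYSV_IPC */" in FILE as "#define SYSV_IPC".
# Hypothetical helper, not part of the P4 distribution.
# All other lines, including other commented-out defines, are left alone.
enable_sysv_ipc() {
    options_file=$1
    tmp_file="$options_file.tmp.$$"
    sed 's|^[[:space:]]*/\*[[:space:]]*#define[[:space:]]\{1,\}SYSV_IPC[[:space:]]*\*/[[:space:]]*$|#define SYSV_IPC|' \
        "$options_file" > "$tmp_file" && mv "$tmp_file" "$options_file"
}
```

From the P4 distribution directory, this would be called as "enable_sysv_ipc OPTIONS" before running make. Running it a second time leaves the file unchanged.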

The P4 documentation states that in certain cases, you can build P4 with *or* without SYSV_IPC. From our experience, however, FASTLINK has specific requirements for each system that in each case mandate only one of these options. In general, with the exception of SOLARIS, it seems that shared-memory multiprocessors require SYSV_IPC in order to run FASTLINK properly. Refer to the following sections on building P4 for SOLARIS, IRIX, and OSF for details.

A final note on SYSV_IPC: due to the way this protocol allocates shared data, if a P4 run crashes before completion, some of the data may not be properly de-allocated. If a run crashes, you should use the "ipcs" utility to determine if there is any unclaimed data. The command:

  ipcs -c
will tell you if you have any stray message queues, shared memory, or semaphores lying around. You can use the "ipcrm" command to remove any, should they exist. Consult your local documentation (man pages) for details.
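The cleanup can be partly automated. The sketch below reads ipcs output and prints, without running them, the ipcrm commands for objects owned by a given user. It assumes System V style rows of the form "T ID KEY MODE OWNER GROUP", where T is q, m, or s; ipcs output differs between systems, so check the column layout on yours first. The function name stray_ipc_commands is hypothetical.

```shell
# stray_ipc_commands USER
# Reads "ipcs" output on standard input and prints (does not run) one
# ipcrm command per message queue (q), shared memory segment (m), or
# semaphore (s) owned by USER.
# ASSUMPTION: rows look like "T ID KEY MODE OWNER GROUP", so the type is
# field 1, the id is field 2, and the owner is field 5. Adjust the field
# numbers if your ipcs prints a different layout.
stray_ipc_commands() {
    awk -v user="$1" '
        $1 ~ /^[qms]$/ && $5 == user { print "ipcrm -" $1 " " $2 }
    '
}
```

A typical use would be:
  ipcs | stray_ipc_commands "$LOGNAME"
Review the printed commands before running any of them, since they will also list objects belonging to other programs of yours that are still running.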

Building P4 for SOLARIS

P4 should compile pretty much "out of the box" for SOLARIS. Note that for SOLARIS, P4 must be built *without* SYSV_IPC, although this should be done automatically for you. The make variable P4ARCH should be defined as SUN_SOLARIS.

On some systems, make might crash while building in the directory "alog", due to a header file that can have different names on different systems. Since FASTLINK does not make use of the alog package, you should just be able to restart make (using the same P4ARCH flag as before), and it should proceed smoothly until the end.

Make sure to read the notes in the section "Hooking FASTLINK and P4 Together" below that relate to SOLARIS.

Building P4 for IRIX

IRIX is an implementation of UNIX for SGI workstations. P4 will run under IRIX, but our experience has been that some simple modifications must be made to the source code in order to get it to compile properly.

First, note that for IRIX, P4 must be built *with* SYSV_IPC. Refer to the section "Building P4, Using SYSV_IPC" above for details. The make variable P4ARCH should be defined as SGI.

Next, from our experience, the linking flag -lsun automatically included in the Makefile did not work. You can disable this by editing the file defs.all in the P4 util directory. Look for the line:

  #   BEGIN SGI
in defs.all. Eight lines below this, you will see the line:
  MDEP_LIBS = -lsun

Comment this line out, by changing it to:

  #MDEP_LIBS = -lsun
Make sure that the line you comment out comes before the line:
  #   END SGI
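The same edit can be made with sed. This is a sketch only, assuming the section markers and the MDEP_LIBS line appear in defs.all exactly as quoted above. The function name disable_sgi_lsun is ours.

```shell
# disable_sgi_lsun FILE
# Comments out "MDEP_LIBS = -lsun", but only between the lines
# "#   BEGIN SGI" and "#   END SGI" of FILE.
# Hypothetical helper, not part of the P4 distribution.
disable_sgi_lsun() {
    defs_file=$1
    tmp_file="$defs_file.tmp.$$"
    sed '/^#[[:space:]]*BEGIN SGI[[:space:]]*$/,/^#[[:space:]]*END SGI[[:space:]]*$/ s/^\([[:space:]]*MDEP_LIBS[[:space:]]*=[[:space:]]*-lsun\)/#\1/' \
        "$defs_file" > "$tmp_file" && mv "$tmp_file" "$defs_file"
}
```

It would be called as "disable_sgi_lsun util/defs.all" from the P4 distribution directory. Restricting the substitution to the SGI section leaves any -lsun settings for other architectures untouched.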
Finally, you must modify the source code for compilation. There are two ways you can do this:

First, if you have access to the UNIX "patch" utility, you can use the file p4_patchfile.IRIX found in the directory 3.0P/irix of the FASTLINK distribution to make these changes. Copy this file into the P4 distribution directory p4-1.4/lib, cd to that directory, and type:

  patch < p4_patchfile.IRIX
In the event that you don't have patch available, you can make the modifications by hand. The files you need to change are in the lib directory in the P4 distribution. They are p4_secure.c and p4_sock_cr.c. Only three lines of code need to be changed.
  1. In p4_secure.c, change line 336 from:
      char *getpw(host, name)
    
    to:
      char *p4_getpw(host, name)
    

  2. In p4_sock_cr.c, change line 149 from:
      char *getpw();
    
    to:
      char *p4_getpw();
    

  3. Also in p4_sock_cr.c, change line 182 from:
      rc = start_slave(host, username, pgm, serv_port, am_slave_c, getpw);
    
    to:
      rc = start_slave(host, username, pgm, serv_port, am_slave_c, p4_getpw);
    
In either case, you should then continue with the compilation process as described above.

Make sure to read the notes in the section "Hooking FASTLINK and P4 Together" below that relate to IRIX.

Building P4 for OSF

OSF is an implementation of UNIX used on most DEC Alpha workstations. Thanks to Garret Taylor at DEC Ireland for the following information on compiling P4 for OSF.

For OSF, P4 must be built *with* SYSV_IPC. Refer to the section "Building P4, Using SYSV_IPC" above for details. The make variable P4ARCH should be defined as ALPHA.

When running P4 FASTLINK, you may have to increase the maximum allowed size of a shared-memory block to run the code. This can be done by editing the /etc/sysconfigtab file and adding the following entry:

  ipc:
          shm-max = 16777216  (or whatever size you need, in bytes)
and rebooting.
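The shm-max value is given in bytes; the 16777216 above is 16 megabytes (16 * 1024 * 1024). The sketch below shows the conversion, plus rounding an arbitrary byte count up to a whole number of megabytes. Both function names are hypothetical, and whether your limit must be a whole number of megabytes is an assumption made only to get a round figure.

```shell
# mb_to_bytes MEGABYTES
# Converts a size in megabytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# round_up_to_mb BYTES
# Rounds a byte count up to the next whole number of megabytes.
round_up_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 * 1048576 ))
}

mb_to_bytes 16           # prints 16777216
round_up_to_mb 9301986   # prints 9437184
```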

Hooking FASTLINK and P4 Together

Please refer to the files README.install, README.makefile, and README.parallel for general details on building FASTLINK.

When building the P4 version of FASTLINK, you will need to examine the following variables in the FASTLINK Makefile:

PARLIB
This variable must be set to "-DIS_P4=1", which tells FASTLINK to use its P4-specific code. Uncomment the line in the Makefile that reads:
    PARLIB = -DIS_P4=1
Make sure that the other definition of PARLIB is commented out.

PARINCLPATH
This variable tells make where to find the P4 include files. Uncomment the line in the Makefile that reads:
    PARINCLPATH = -I$(P4_INCLDIR)
Again, make sure that the other definition of PARINCLPATH is commented out.

P4_HOME_DIR
Set this variable to the directory you specified when installing as above. For example, if you installed with /usr/lib/p4-1.4 as the INSTALLDIR, set P4_HOME_DIR to:
    P4_HOME_DIR = /usr/lib/p4-1.4

P4_MDEP_LD, SYSDEP
If you are running Solaris, you must uncomment the line:
    P4_MDEP_LD = -lsocket -lnsl -lthread
as well as the line:
    SYSDEP = -DSOLARIS
in order to successfully compile FASTLINK. If your compiler complains about "undefined symbol"s, or about "EXIT_FAILURE redefined", it is likely you have forgotten to uncomment one of these lines.

After installing P4, and setting these variables, you may proceed to building FASTLINK as described in the files README.install, README.makefile, and README.parallel. The target P4 executables are ilink.p4, mlink.p4, and linkmap.p4. They can be built individually with:
  make ilink.p4
  make linkmap.p4
  make mlink.p4
respectively. Alternately, you can build all three with:
  make installp4
These will put the corresponding executables wherever the BINDIR flag (described in README.makefile) points to.

Running P4 FASTLINK

You will typically want to make a soft link in your data directory from the regular program name (without the .p4 extension) to the actual executable. For example:
  ln -s ../bin/ilink.p4 ilink
There are some compilation flags you may want to set to prepare for a run. See README.makefile or the Makefile itself for instructions.
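The soft links described above can be created for all three programs in one step. This is a small sketch; the function name link_p4_programs is ours, and the directory you pass should be wherever BINDIR put the executables.

```shell
# link_p4_programs BINDIR
# In the current (data) directory, creates soft links ilink, mlink, and
# linkmap that point to BINDIR/ilink.p4, BINDIR/mlink.p4, and
# BINDIR/linkmap.p4. Names that already exist are left untouched.
# Hypothetical helper, not part of FASTLINK.
link_p4_programs() {
    bin_dir=$1
    for prog in ilink mlink linkmap; do
        if [ -e "$prog" ] || [ -L "$prog" ]; then
            echo "skipping $prog: already exists" >&2
        else
            ln -s "$bin_dir/$prog.p4" "$prog"
        fi
    done
}
```

From the data directory, "link_p4_programs ../bin" makes the same ilink link as the command above, along with the other two.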

Running P4 FASTLINK, Specifying number of processors

One small modification is needed either in the command line (if you call ilink, linkmap, or mlink directly) or in the lcp-produced shell script. At the line where the main program is invoked, the string:
  -n < n >
must be appended, where < n > is the number of processors you wish to run on. If you do not specify the number of processors, FASTLINK will default to a 4 processor run. For example, when running LINKMAP on 8 processors,
  linkmap
becomes
  linkmap -n 8
When modifying lcp-produced scripts, be careful: the first occurrence of the string ilink, linkmap, or mlink is a parameter to lsp and should not be altered. It is the second occurrence that is the actual call to the program, and that is where the flag must be set.
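For scripts with many runs, the flags can be appended mechanically. The sketch below assumes that each call to ilink, linkmap, or mlink starts its own line in the script (optionally with a directory prefix), while the lsp line starts with lsp and only mentions the program name as a parameter. Check your lcp-produced script against that assumption before relying on it. The function name add_run_flags is hypothetical.

```shell
# add_run_flags FLAGS
# Reads a script on standard input and writes it to standard output with
# FLAGS appended to every line whose first word is ilink, mlink, or
# linkmap (with or without a leading path). Lines that merely mention
# one of those names later on, such as the lsp line, are copied as-is.
# ASSUMPTION: the program call is the first word on its line.
add_run_flags() {
    awk -v flags="$1" '
        {
            n = split($1, parts, "/")
            base = parts[n]
        }
        base == "ilink" || base == "mlink" || base == "linkmap" {
            print $0 " " flags
            next
        }
        { print }
    '
}
```

For example, "add_run_flags '-n 8' < pedin > pedin.p4" would write a modified copy of a script named pedin (the file names here are only illustrations). Compare the copy with the original before running it.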

Running P4 FASTLINK, Specifying maxworkingset

You may also specify a value for the variable maxworkingset, which represents the maximum number of people active during the analysis (sometimes known as maximum cutset). In FASTLINK 3.0P, maxworkingset is estimated automatically at runtime. In some pedigrees with loops, the estimate is unnecessarily high, so you may wish to override the estimate with a different value.

If you use the automatic estimate of maxworkingset and the code complains that this estimate is too low, you have hit a bug and should report it (see README.bugreport). However, you can still use the -w flag to work around the bug, while I fix it.

For example, to run ILINK with maxworkingset defined to 40, you would type:

  ilink -w 40
The error message you would encounter if maxworkingset is too low will report what the current value is. You may try incrementally larger values until the run succeeds.
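Trying incrementally larger values can be done in a loop. The sketch below assumes that the program exits with a nonzero status when maxworkingset is too low; that behavior is an assumption, not something stated in this file, so verify it on your system. The function name try_workingset is hypothetical.

```shell
# try_workingset START STEP MAX COMMAND [ARGS...]
# Runs COMMAND with "-w START", then "-w START+STEP", and so on, until
# the command succeeds or the value would exceed MAX. On success, prints
# the value that worked as the last line of standard output and returns
# 0. Returns 1 if no value up to MAX succeeded.
# ASSUMPTION: COMMAND exits nonzero when maxworkingset is too low.
try_workingset() {
    w=$1
    step=$2
    max=$3
    shift 3
    while [ "$w" -le "$max" ]; do
        echo "trying -w $w" >&2
        if "$@" -w "$w"; then
            echo "$w"
            return 0
        fi
        w=$((w + step))
    done
    return 1
}
```

For example, "try_workingset 40 5 100 ilink -n 8" would try ilink -n 8 -w 40, then -w 45, and so on up to -w 100. If a failure has some other cause, the loop will simply run to MAX, so read the error message from the first attempt.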

Running P4 FASTLINK, Specifying memory usage

Additionally, P4 allows you to specify the amount of global memory to allocate with the flag
  -p4gm < nbytes >
where < nbytes > is the number of bytes of global memory to allocate. Note that this value is given in bytes, not in kilobytes or megabytes.

FASTLINK has been tuned to calculate in advance the amount of memory it will need, and to request the proper amount for you. In general, it will request a bit more memory than is actually needed, just to be safe. If, for some reason, a run fails because it does not have sufficient shared memory, or if your machine refuses to grant the amount FASTLINK specifies, you can override FASTLINK's calculation and specify the amount of global memory you would like with this flag. You can use the -m flag (described in the next section) to see how much memory FASTLINK is requesting, and try manually to request a bit less.

Running P4 FASTLINK, Analyzing memory usage

Memory usage becomes important when using P4 FASTLINK, because P4 allocates all of the shared memory it will need for the entire run at the beginning of the run. Depending on your system configuration, you may not be able to complete arbitrarily large (in terms of memory usage) runs. As described in README.makefile, there are different levels of memory usage for FASTLINK. A sample run of LINKMAP with the command:
  linkmap -n 2 -m
yielded:
  LINKMAP is currently compiled with PRECOMPUTE=1.
  Shared memory usage for this run will be as follows:

          memory usage          total request
          calculated            (including fudge factor)
          ----------------------------------------------
          7441589     bytes     9301986     bytes

  Recompiling with PRECOMPUTE=0 would yield:

          memory usage          total request
          calculated            (including fudge factor)
          ----------------------------------------------
          6360069     bytes     7950086     bytes

  Please refer to the README.makefile and README.p4 for details.
As was mentioned above in the section on "Specifying memory usage", FASTLINK calculates how much memory a run is likely to require, and then adds in a little extra (the fudge factor). You can see from this message both the original amount calculated, and the amount of the final request. You can use the amount calculated as a lower bound on reasonable requests with -p4gm.
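In the sample above, each total request equals the calculated amount plus 25%, rounded down (7441589 * 1.25 = 9301986.25). Whether that ratio is the same for every run is not stated here, so treat it as an observation about this sample only. The sketch below extracts the two figures for the current build from saved -m output, assuming the layout shown above; the function name memory_figures is hypothetical.

```shell
# memory_figures
# Reads saved "-m" output on standard input and prints the calculated
# usage and the total request, in bytes, for the current build (the
# first row of figures in the report).
# ASSUMPTION: that row has the form "N bytes M bytes", as in the sample.
memory_figures() {
    awk '$2 == "bytes" && $4 == "bytes" { print $1, $3; exit }'
}

# The fudge factor seen in the sample, using integer arithmetic:
echo $(( 7441589 * 5 / 4 ))   # prints 9301986
echo $(( 6360069 * 5 / 4 ))   # prints 7950086
```

The first number printed by memory_figures is the lower bound for a -p4gm value, and the second is what FASTLINK would request on its own.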

You can also see from this message the difference in memory usage between compiling with PRECOMPUTE=0 and PRECOMPUTE=1. This can be useful when determining whether or not you can expect to compute a given run on your system.

As with the -n flag above, -w, -p4gm, and -m would need to be added to your lcp-produced script if you use them in conjunction with a script.

Parallel FASTLINK, Running times

Due to the way theta evaluations are done in parallel, unlike the sequential version, output will not appear on the screen after each theta is complete. However, you will see periodic reports of execution times for each group of thetas as they are evaluated. A sample run might show:
  Execution time (!parallelThetas) =   0.099
  Execution time (parallelThetas) for 1 =   0.052
  Execution time (parallelThetas) for 2 =   0.896
  Execution time (parallelThetas) for 3 =   0.096
  Execution time (!parallelThetas) =   0.086
  Elapsed time: 1.28 seconds
Each of the "Execution time" statements will appear one at a time as computation progresses. The difference between parallelThetas and !parallelThetas has to do with whether all processors are working together on a single theta, or whether they are working independently on different thetas. The "Elapsed time" statement shows the total execution time for the entire run.
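To compare the per-group times with the elapsed time, the "Execution time" lines can be summed from saved output. This sketch assumes the line format shown in the sample above; the function name sum_execution_times is hypothetical.

```shell
# sum_execution_times
# Reads saved program output on standard input and prints the sum of
# all "Execution time ... = X" values, to three decimal places.
# ASSUMPTION: each such line contains exactly one "=" followed by the
# time, as in the sample output.
sum_execution_times() {
    awk -F= '/Execution time/ { sum += $2 } END { printf "%.3f\n", sum }'
}
```

For the sample above, the sum is 1.229, slightly less than the reported elapsed time of 1.28 seconds.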

P4, Troubleshooting

Unfortunately, some of the error messages that P4 produces are not very descriptive. Many times, the particular message produced under similar circumstances can differ from system to system. As a result, it is generally difficult to determine what the problem really is.

In general, error messages produced by P4 will look like:

  p4_error: < message >
where < message > is the error being reported, or:
  p0_xxxxx:  p4_error: < message >
where xxxxx is the pid (process id) of the FASTLINK program. Almost any other error message you see is likely a FASTLINK error message.

If you encounter a P4 error message near the beginning of a run (i.e., before the first theta evaluation is complete), and the program quits, it is likely that you do not have enough shared memory available for this run. Likewise, if you encounter a FASTLINK error message with the phrase "MALLOC ERROR", you need more memory. See the above section on "Analyzing memory usage" for possible remedies.
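The two memory-related symptoms above can be checked for in saved output. The sketch below is a rough heuristic: it treats "no Execution time line seen yet" as a stand-in for "before the first theta evaluation is complete", which is our assumption. The function name diagnose_run and the wording of its messages are hypothetical.

```shell
# diagnose_run
# Reads saved program output on standard input and prints one line:
# a hint if a "p4_error:" appears before any "Execution time" line, or
# if "MALLOC ERROR" appears anywhere; otherwise a note that neither
# pattern was found.
# ASSUMPTION: an "Execution time" line marks a completed evaluation.
diagnose_run() {
    awk '
        /Execution time/ { seen_theta = 1 }
        /p4_error:/ && !seen_theta {
            print "P4 error before first theta: check shared memory"
            found = 1
            exit
        }
        /MALLOC ERROR/ {
            print "FASTLINK malloc error: more memory needed"
            found = 1
            exit
        }
        END { if (!found) print "no memory-related error recognized" }
    '
}
```

A hint from this check is only a starting point; the -m report is still the way to see how much memory the run asks for.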

Another common problem is getting different results for a run on 1 processor, and the same run on more than 1 processor. If this happens to you, it is likely that you have not defined SYSV_IPC properly for your system. Please refer to the section above on "Using SYSV_IPC" for details.

If you encounter an error that you feel is not of the flavor described above, please send us as precise a description of the problem as possible (including output, etc.) so we can try to help discern the cause of the problem. Please refer to README.bugreport for guidelines on submitting a bug report.

P4, Misc

If you have any other problems compiling or building P4, please refer to the documentation included in the P4 distribution, or ask your system administrator for help. Specific problems or questions may be directed to p4@mcs.anl.gov.