MATLAB® Distributed Computing Server™
System Administrator’s Guide
R2013b
MATLAB Distributed Computing Server Product Description
Perform MATLAB® and Simulink® computations on clusters, clouds, and grids
MATLAB Di
Step 3: Validate Cluster Profile
In this step you validate your cluster profile, and thereby your installation.
1 If it is not a
Note If your validation fails any stage, contact the MathWorks install support team.
If your validation passed, you no
4 Admin Center
• “Start Admin Center” on page 4-2
• “Set Up Resources” on page 4-3
• “Test Connectivity” on page 4-11
• “Export and Import Sessions” on
Start Admin Center
Admin Center is a graphical user interface with which you can control and monitor the MATLAB Distributed Computing S
Set Up Resources
In this section...
“Add Hosts” on page 4-3
“Start mdce Service” on page 4-4
“Start an MJS” on page 4-5
“Start Workers” on
Start mdce Service
A host must be running the mdce service if an MJS or worker is to run on that host. Normally, you set this up with Adm
A dialog box leads you through the procedure of starting the mdce service on the selected hosts. There are five steps to the procedure
In the New MATLAB Job Scheduler dialog box, provide a name for the MJS, and select a host to run it on.
Alternative methods for startin
Start Workers
To start MATLAB workers, click Start in the Workers module.
In the Start Workers dialog box, specify the numbers of worke
Product Overview
In this section...
“Parallel Computing Concepts” on page 1-3
“Determining Product Installation and Versions” on page 1-
Alternative methods for starting workers include selecting the pull-down Workers > Start, or right-clicking a listed host or MJS
To get more information on any host, MJS, or worker listed in Admin Center, right-click its name in the display and select Propertie
Move a Worker
To move a worker from one host to another, you must completely shut it down, then start a new worker on the desired host:
1
Test Connectivity
Admin Center lets you test communications between your MJS node, worker nodes, and the node where Admin Center is r
When the tests are complete, the Running Tests dialog box automatically closes, and Admin Center displays the test results in the Conne
Tests that include failures or other results might look like the following figure.
Double-click any of the symbols in the test results
Export and Import Sessions
By default, Admin Center saves the cluster definition, process status, and test results, so the next time t
Prepare for Cluster Profiles
Admin Center does not create cluster profiles, but the information displayed in Admin Center
5 Control Scripts: Alphabetical List
[Figure: a MATLAB client (Parallel Computing Toolbox) connected through a scheduler to MATLAB workers (MATLAB Distributed Computing Server)]
admincenter
Purpose Start Admin Center GUI
Syntax admincenter
Description admincenter opens the MATLAB Distributed Computing Server Admin Center. When set
createSharedSecret
Purpose Create shared secret for secure communication
Syntax createSharedSecret
createSharedSecret -file <filename>
Description c
mdce
Purpose Install, start, stop, or uninstall mdce service
Syntax mdce install
mdce uninstall
mdce start
mdce stop
mdce console
mdce restart
mdce ... -mdced
mdce stop stops running the mdce service. This automatically stops all job managers and workers on the computer, but leaves their checkpoint infor
nodestatus
Purpose Status of mdce processes running on node
Syntax nodestatus
nodestatus -flags
Description nodestatus displays the status of the mdce ser
Flag  Operation
-baseport <port_number>  Specifies the base port that the mdce service on the remote host is using. You need to specify thi
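As a rough illustration of the -baseport flag, the following dry-run sketch only prints nodestatus commands for a list of hosts; it does not contact any node. The host names and the port value are placeholders, and the use of -remotehost is an assumption based on the flag description, so check the script's own help before relying on it.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one nodestatus command per host.
# The port and host names passed in are placeholders for illustration only.
print_nodestatus_cmds() {
    base_port=$1    # base port the remote mdce service is assumed to use
    shift
    for host in "$@"; do
        echo "nodestatus -remotehost $host -baseport $base_port"
    done
}

print_nodestatus_cmds 40000 node1 node2
```

Printing the commands first makes it easy to review them before running them from the folder that holds the control scripts.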
remotecopy
Purpose Copy file or folder to or from one or more remote hosts using transport protocol
Syntax remotecopy <flags> <protocol options>
Flags and Options  Operation
-quiet  Prevent remotecopy from prompting for missing information. The command fails if all required information is no
Retrieve folders of the same name from two hosts to the local machine. (Enter command on a single line.)
remotecopy -local C:\temp\log -from -
remotemdce
Purpose Execute mdce command on one or more remote hosts by transport protocol
Syntax remotemdce <mdce options> <flags> <protoco
Toolbox and Server Components
In this section...
“Schedulers, Workers, and Clients” on page 1-5
“Third-Party Schedulers” on
Flags and Options  Operation
-protocol <type>  Force the usage of a particular protocol type. Specifying a protocol type with all its require
Start mdce in a clean state on two UNIX operating system machines from a Windows operating system machine, using the ssh protocol.
Enter th
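The scenario above (starting mdce in a clean state on several hosts over ssh) can be sketched as a dry run that assembles, but does not execute, a single remotemdce command line. The host names are placeholders, and the comma-separated -remotehost list and the -clean option are assumptions about the syntax; a real invocation will likely need further options, such as the remote MATLAB location, that are omitted here.

```shell
#!/bin/sh
# Join all arguments with commas (runs in a subshell so IFS is not changed
# for the caller).
join_hosts() (
    IFS=,
    echo "$*"
)

# Dry-run sketch: print a remotemdce command for the given hosts.
# Flag usage is assumed for illustration; verify with the script's help.
print_remotemdce_start_clean() {
    echo "remotemdce start -clean -remotehost $(join_hosts "$@") -protocol ssh"
}

print_remotemdce_start_clean unixHost1 unixHost2
```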
startjobmanager
Purpose Start job manager process
Syntax startjobmanager
startjobmanager -flags
Description startjobmanager starts a job manager process
Flag  Operation
-clean  Deletes all checkpoint information stored on disk from previous instances of this job manager before starting. Thi
startworker
Purpose Start MATLAB worker session
Syntax startworker
startworker -flags
Description startworker starts a MATLAB worker process under the
Flag  Operation
-jobmanagerhost <job_manager_hostname>  Specifies the host on which the job manager is running. The worker contacts the job
Start two workers, named worker1 and worker2, on the host WorkerHost, registering with the job manager MyJobManager that is running on th
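The two-worker example can be sketched as a dry run that prints one startworker command per worker instead of executing anything. WorkerHost and MyJobManager come from the example text; JMHost is a placeholder for the job manager host, and the -jobmanager, -name, and -remotehost flags are assumed from the flag tables, so confirm them against the script's help.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one startworker command per worker.
# JMHost is a hypothetical job manager host name used only for illustration.
print_startworker_cmds() {
    jm_host=$1      # host running the job manager
    jm_name=$2      # name of the job manager
    worker_host=$3  # host that will run the workers
    count=$4        # number of workers to start
    i=1
    while [ "$i" -le "$count" ]; do
        echo "startworker -jobmanagerhost $jm_host -jobmanager $jm_name -name worker$i -remotehost $worker_host"
        i=$((i + 1))
    done
}

print_startworker_cmds JMHost MyJobManager WorkerHost 2
```

Generating the names worker1, worker2, and so on in a loop keeps each worker name unique on the host.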
stopjobmanager
Purpose Stop job manager process
Syntax stopjobmanager
stopjobmanager -flags
Description stopjobmanager stops a job manager that is running
Flag  Operation
-baseport <port_number>  Specifies the base port that the mdce service on the remote host is using. You need to specify
stopworker
Purpose Stop MATLAB worker session
Syntax stopworker
stopworker -flags
Description stopworker stops a MATLAB worker process that is running und
[Figure: clients send jobs to the scheduler and receive all results; the scheduler sends tasks to workers and collects their results]
Interactions of Parallel Computing
Flag  Operation
-baseport <port_number>  Specifies the base port that the mdce service on the remote host is using. You need to specify thi
Glossary
CHECKPOINTBASE
The name of the parameter in the mdce_def file that defines the location of the checkpoint directories for the MATLAB job schedu
distributed application
The same application that runs independently on several nodes, possibly with different input parameters. There is no co
homogeneous cluster
A cluster of identical machines, in terms of both hardware and software.
independent job
A job composed of independent tasks
mdce_def file
The file that defines all the defaults for the mdce processes by allowing you to set preferences or definitions in the form of pa
spmd (single program multiple data)
A block of code that executes simultaneously on multiple workers in a parallel pool. Each worker can oper
Index
A
admincenter control script 5-2
administration
  network 2-1
C
checkpoint folder
  locating 2-18
clean state
  starting services 2-16
client
  process 1-5
con
R
remotecopy control script 5-8
remotemdce control script 5-11
requirements 2-3
S
scheduler
  third-party 1-6
security 2-4
startjobmanager control script 5
scheduler, PBS Pro scheduler, TORQUE scheduler, mpiexec, or a generic scheduler.
Choosing Between a Scheduler and MJS
Yo
• Who administers your cluster?
The person administering your cluster might have a preference for how jobs are scheduled.
Components on Mi
Using Parallel Computing Toolbox Software
A typical Parallel Computing Toolbox client session includes the f
2 Network Administration
This chapter provides information useful for network administration of Parallel Computing Toolbox software and MATLAB
How to Contact MathWorks
www.mathworks.com  Web
comp.soft-sys.matlab  Newsgroup
www.mathworks.com/contact_TS.html  Technical Support
[email protected]  Pro
Prepare for Parallel Computing
In this section...
“Plan Your Network Layout” on page 2-2
“Network Requirements” on page 2-3
“Full
running on all machines that run job manager sessions or workers that are registered with a job manager. (The mdce servic
Security Considerations
The parallel computing products do not provide any security measures. Therefore, be aware of the follow
Install and Configure
To find the most up-to-date instructions for installing and configuring the current or past versions of the p
Use Different MPI Builds on UNIX Systems
In this section...
“Build MPI” on page 2-6
“Use Your MPI Build” on page 2-6
Build MPI
To
1 Test your build by running the mpiexec executable. The build should be ready to test if its bin/mpiexec and li
any), together. Set the configuration’s MpiexecFileName property to /opt/mpich2/mpich2-1.4.1p1/bin/mpiexec.
• If you are usin
Shut Down a Job Manager Cluster
In this section...
“UNIX and Macintosh Operating Systems” on page 2-9
“Microsoft Windows O
If you have more than one worker session running, you can stop each of them individually by host and name.
stopworker -name worker1 -remoteh
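Stopping several named workers on one host can be sketched as a dry run that prints a stopworker command for each worker name. The host and worker names are placeholders, and -remotehost is assumed to be the full form of the truncated flag in the example above; nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one stopworker command per worker.
# Host and worker names are placeholders for illustration only.
print_stopworker_cmds() {
    worker_host=$1   # host on which the workers are running
    shift
    for name in "$@"; do
        echo "stopworker -name $name -remotehost $worker_host"
    done
}

print_stopworker_cmds WorkerHost worker1 worker2
```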
Microsoft Windows Operating Systems
Stop the Job Manager and Workers
Enter the commands of this section at the prompt in
Revision History
November 2005  Online only  New for Version 2.0 (Release 14SP3+)
December 2005  Online only  Revised for Version 2.0 (Release 14SP3+)
March
service while leaving the machine on, enter the following commands at a DOS command prompt:
cd matlabroot\toolbox\distcomp\b
Custom Startup Parameters
In this section...
“Define Script Defaults” on page 2-13
“Override Script Defaults” on page 2-15
The M
Note If you want to run more than one job manager on the same machine, they must all have unique names. Specify the names us
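The uniqueness requirement in the note above can be checked before any job manager is started. The following sketch is a hypothetical helper, not part of the product: it takes the planned job manager names as arguments and reports any name that appears more than once.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if every planned job manager
# name for this machine is distinct.
check_unique_names() {
    # uniq -d prints each line that occurs more than once in sorted input
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate job manager name(s): $dups" >&2
        return 1
    fi
    return 0
}

check_unique_names MyJobManager1 MyJobManager2 && echo "job manager names are unique"
```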
Privilege: SeServiceLogonRight
Purpose: Required to log on using the service logon type.
Local Security Settings Policy: Log on as a se
Alternatively, you can make a copy of this file, modify the copy, and specify that this copy be used for the default parameter
Access Service Record Files
In this section...
“Locate Log Files” on page 2-17
“Locate Checkpoint Folders” on page 2-18
The
Locate Checkpoint Folders
Checkpoint folders contain information related to persistence data, which the server services use to
Set MJS Cluster Security
In this section...
“Set the Security Level” on page 2-19
“Local, MJS, and Network Passwords” on page 2-2
Security Level  Description  User Requirements
• Tasks run as the user who started the mdce process on the worker machines (typica
your system/network user name and password, because the worker must log you in to ru
You must also provide a value for the SHARED_SECRET_FILE parameter in the mdce_def file, identifying where the file can be fo
Troubleshoot Common Problems
In this section...
“License Errors” on page 2-23
“Memory Errors on UNIX Operating Systems” on pa
• If you receive this error when starting a worker with MATLAB Distributed Computing Server software:
- You may be calling the
- If you installed only the Parallel Computing Toolbox product, and you are attempting to run a worker on the same machine
With Third-Party Scheduler
Before the worker processes start, you can control the range of ports used by the workers for commun
Ephemeral TCP Ports with Job Manager
If you use the job manager on a cluster of nodes running Windows operating systems, you
With Command-Line Interface
First, be sure that the machines in question agree on their IP resolutions. The IP address for a pa
Verify Multicast Communications
Note Multicast is required on the head node running the MATLAB job scheduler (MJS) and on th
The following example shows how to use the Java class inside MATLAB.
Start MATLAB on two machines (e.g., host1name and host2
3 Product Installation
• “Install Products and Choose Cluster Configuration” on page 3-2
• “Configure for an MJS” on page 3-5
• “Configure for HPC Server” on pa
Contents
1 Introduction
MATLAB Distributed Computing Server Product Description ... 1-2
Key Features ...
Install Products and Choose Cluster Configuration
In this section...
“Cluster Description” on page 3-2
“Install Products” on pa
[Figure: MDCS cluster and client node with PCT]
Product Installations on Client Nodes
Install Products
On the Cluster Node
Configure Your Cluster
When the cluster and client installations are complete, you can proceed to configure the products for
Configure for an MJS
In this section...
“Configure Cluster to Use a MATLAB Job Scheduler (MJS)” on page 3-5
“Configure Windows Firew
Step 1: Set Up Windows Cluster Hosts
If this is the first installation of MATLAB Distributed Computing Server on a cluster of W
matlabroot\toolbox\distcomp\bin\mdce_def.bat
2 Find the line for setting the MDCEUSER parameter, and provide a value in the form do
cd oldmatlabroot\toolbox\distcomp\bin
3 Stop and uninstall the old mdce service and remove its associated files by typing the
Using Admin Center GUI.
Note To use Admin Center, you must run it on a computer that has direct network connectivity to all the n
b Click Add or Find.
The Add or Find Hosts dialog box opens.
c Select Enter Hostnames, then list your hosts in the text box. Y
Keep the check to start mdce service.
d Click OK to open the Start mdce service dialog box. Proceed through the steps clicking Next
Use Your MPI Build ... 2-6
Shut Down a Job Manager Cluster ... 2-9
UNIX and Macintosh Operating Systems ...
It might take a moment for Admin Center to communicate with all the nodes, start the services, and acquire the status of all of
If any of the connectivity tests fail, double-click the icon that indicates a failure to get information about that specific te
a To start an MJS (job manager), click Start in the MJS module. (This is one of several ways to open the New MJS dialog b
e Click OK to start the workers and return to the Admin Center dialog box. It might take a moment for Admin Center to initialize al
If you encounter any problems or failures, contact the MathWorks install support team.
For more information about Admin Center fu
Command Window, and select Run as Administrator. This option is available only if you are running User Account Control (UAC).
ii If you
2 Start the MJS
To start the MATLAB job scheduler (MJS), enter the following commands in a DOS command window. You do not have t
cd matlabroot\toolbox\distcomp\bin
b Start the workers on each node, using the text for <MyMJS> that identifies the name of th
indicate protocol, platform (such as in a mixed environment), or other information, see the help for remotemdce by typing
./remo
cd matlabroot/toolbox/distcomp/bin
b Start the workers on each node, using the text for <MyMJS> that identifies the name of th
Configure Cluster to Use a MATLAB Job Scheduler (MJS) ... 3-5
Configure Windows Firewalls on Client ...
Debian, Fedora Platforms. On each cluster node, register the mdce service as a known service and configure it to start automatic
4 Look in /etc/inittab for the default run level. Create a link in the rc folder associated with that run level. For example, if
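Finding the default run level can be sketched as follows. This is a minimal sketch, assuming the classic SysV inittab entry of the form id:N:initdefault:; systems that do not use /etc/inittab, or that leave this entry out, will simply produce no output. The function name is hypothetical.

```shell
#!/bin/sh
# Print the default run level recorded in an inittab-style file.
# Assumes an entry such as "id:5:initdefault:"; comment lines are ignored
# because they do not start with "id:".
default_runlevel() {
    sed -n 's/^id:\([0-9A-Za-z]*\):initdefault:.*$/\1/p' "$1"
}

# Only query the real file when it exists and is readable.
if [ -r /etc/inittab ]; then
    default_runlevel /etc/inittab
fi
```

The printed value identifies which rc folder should receive the link to the mdce script.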
sudo ln -s matlabroot/toolbox/distcomp/bin/mdce /usr/sbin/mdce
3 Copy the launchd .plist file for mdce to /Library/LaunchDae
1 On the client computer where Parallel Computing Toolbox is installed, open a DOS command window (for Windows software) or a shell (for UNIX so
5 Click Done to save your cluster profile.
Step 3: Validate the Cluster Profile
In this step you validate your cluster profile,
Note If your validation does not pass, contact the MathWorks install support team.
If your validation passed, you now have a valid
Configure for HPC Server
In this section...
“Configure Cluster for Microsoft Windows HPC Server” on page 3-28
“Configure Client Co
Note If you need to override the script default values, modify the values defined in MicrosoftHPCServerSetup.xml before running
Note If you need to override the default values of the script, modify the values defined in MicrosoftHPCServerSetup.xml before running Microsof
b Set the NumWorkers field to the number of workers you want to run the validation tests on, within the limitation of you
Test Connectivity ... 4-11
Export and Import Sessions ... 4-14
Prepare for Cluster Profiles ...
5 Click Done to save your cluster profile.
Step 2: Validate the Configuration
In this step you validate your cluster profile, a
Note If your validation does not pass, contact the MathWorks install support team.
If your validation passed, you now have a
Configure for PBS Pro, Platform LSF, TORQUE
In this section...
“Configure Platform LSF Scheduler on Windows Cluster” on page 3-
To use mpiexec to distribute a job, the smpd service must be running on all nodes that will be used for run
4 If you are using Windows firewalls on your cluster nodes, execute the following in a DOS command window.
matlabroot\toolbox\dis
shared installation), execute the following command in a DOS command window.
matlabroot\bin\matlab.bat -in
1 Start the Cluster Profile Manager from the MATLAB desktop by selecting on the Home tab in the Environment area Parallel >
5 Click Done to save your cluster profile.
Step 2: Validate the Cluster Profile
In this step you verify you
Note If your validation does not pass, contact the MathWorks install support team.
If your validation passed, you now have a va
Configure for a Generic Scheduler
In this section...
“Interfacing with Generic Schedulers” on page 3-42
“Configure Gene
1 Introduction
• “MATLAB® Distributed Computing Server™ Product Description” on page 1-2
• “Product Overview” on page 1-3
• “Toolbox and Server Components”
Interfacing with Generic Schedulers
• “Support Scripts” on page 3-42
• “Submission Mode” on page 3-42
Support Scripts
To support us
Before using the support scripts, decide which submission mode describes your particular network setup.
Configure Gener
2 Start smpd by typing in a DOS command window one of the following, as appropriate:
matlabroot\bin\win32\smpd -install
or
matlabro
8 Repeat all these steps on all Windows nodes in your cluster.
Using Passwordless Delegation
1 Log in as a user with a
Configure Sun Grid Engine on Linux Cluster
To run communicating jobs with MATLAB Distributed Computing Server and Sun™ Grid Engin
qconf -mq all.q
This will bring up a text editor for you to make changes: search for the line pe_list, and add matlab.
5 En
Note The remainder of this chapter illustrates only the case of using LSF in a nonshared file system. For other schedulers or a
In this type of configuration, job data is copied from the client host running a Windows operating system to a host on
2 Start the Cluster Profile Manager from the MATLAB desktop by selecting Parallel > Manage Cluster Profiles.
3 Create a new p
g Set the OperatingSystem to the operating system of your cluster worker machines.
h Set HasSharedFilesystem to false,