I. Installing munge from source

1. Download munge

Download: https://github.com/dun/munge/releases

2. Build and install

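tar -Jxvf munge-0.5.15.tar.xz
cd munge-0.5.15
# ./bootstrap regenerates configure; it is needed for a git checkout and can be skipped for a release tarball
./bootstrap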
./configure --prefix=/usr/local/munge \
--sysconfdir=/usr/local/munge/etc \
--localstatedir=/usr/local/munge/local \
--with-runstatedir=/usr/local/munge/run \
--libdir=/usr/local/munge/lib64
make
make install

3. Create the user and set directory permissions

useradd -s /sbin/nologin -u 601 munge
# Create the runtime directory as root; the chown below then hands the whole tree to munge
mkdir -p /usr/local/munge/run/munge
# sudo -u munge mkdir /usr/local/munge/var/munge /usr/local/munge/var/run
# sudo -u munge mkdir /usr/local/munge/var/run/munge
chown -R munge:munge /usr/local/munge/
chmod 700 /usr/local/munge/etc/
chmod 711 /usr/local/munge/local/
chmod 755 /usr/local/munge/run
chmod 711 /usr/local/munge/lib
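
The modes and owner set above are easy to get wrong, so it can help to verify them before starting the daemon. The following is a minimal sketch, assuming GNU stat is available; check_perm and check_munge_tree are hypothetical helper names, not part of munge, and the expected values are simply the ones used in the chmod and chown commands above.

```shell
#!/usr/bin/env bash
# check_perm (hypothetical helper): compare a path's octal mode and owner
# against expected values and print OK or FAIL.
check_perm() {
    local path=$1 want_mode=$2 want_owner=$3
    local mode owner
    mode=$(stat -c '%a' "$path") || return 2
    owner=$(stat -c '%U' "$path") || return 2
    if [ "$mode" = "$want_mode" ] && [ "$owner" = "$want_owner" ]; then
        echo "OK   $path ($mode $owner)"
    else
        echo "FAIL $path: have $mode $owner, want $want_mode $want_owner"
        return 1
    fi
}

# check_munge_tree (hypothetical helper): check the four directories that
# step 3 adjusts. The prefix defaults to the one used in this guide.
check_munge_tree() {
    local prefix=${1:-/usr/local/munge} rc=0
    check_perm "$prefix/etc"   700 munge || rc=1
    check_perm "$prefix/local" 711 munge || rc=1
    check_perm "$prefix/run"   755 munge || rc=1
    check_perm "$prefix/lib"   711 munge || rc=1
    return $rc
}
```

After sourcing the file, run check_munge_tree /usr/local/munge as root; a non-zero exit status means at least one directory differs from step 3.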

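4. Create the munge.key file

The command below generates munge.key under /usr/local/munge/etc/munge/. Afterwards, restrict the permissions on munge.key.

sudo -u munge /usr/local/munge/sbin/mungekey --verbose
chmod 600 /usr/local/munge/etc/munge/munge.key

[Note]: With multiple servers, copy the server's munge.key to every client. Clients must not generate their own key.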
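
Because every node must hold the same key, a quick digest comparison after copying can catch a client that still has a locally generated key. This is a sketch only: key_digest and keys_match are hypothetical helpers, and client1 in the comments is a placeholder hostname.

```shell
#!/usr/bin/env bash
# key_digest (hypothetical helper): print the SHA-256 digest of a key file.
key_digest() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    sha256sum "$1" | cut -d' ' -f1
}

# keys_match (hypothetical helper): succeed only when both files hash the same.
keys_match() {
    local a b
    a=$(key_digest "$1") || return 2
    b=$(key_digest "$2") || return 2
    [ "$a" = "$b" ]
}

# Example workflow, run as root on the server (client1 is a placeholder):
#   scp -p /usr/local/munge/etc/munge/munge.key client1:/usr/local/munge/etc/munge/
#   key_digest /usr/local/munge/etc/munge/munge.key
#   ssh client1 sha256sum /usr/local/munge/etc/munge/munge.key
# The two digests should be identical; remember to chown/chmod the key on the client too.
```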

5. Link the unit file and start the service

ln -s /usr/local/munge/lib/systemd/system/munge.service /usr/lib/systemd/system/munge.service
# Alternatively, copy the unit file instead of linking it:
# cp /usr/local/munge/lib/systemd/system/munge.service /usr/lib/systemd/system/
systemctl daemon-reload
systemctl start munge
systemctl status munge
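
Once the service reports active, a credential round trip confirms that the daemon and key actually work. The sketch below assumes the munge and unmunge clients were installed under /usr/local/munge/bin by the build above; munge_selftest is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# munge_selftest (hypothetical helper): encode an empty credential with
# "munge -n" and decode it again with unmunge through the local daemon.
munge_selftest() {
    local bin=${1:-/usr/local/munge/bin}
    if [ ! -x "$bin/munge" ] || [ ! -x "$bin/unmunge" ]; then
        echo "munge or unmunge not found in $bin" >&2
        return 127
    fi
    if "$bin/munge" -n | "$bin/unmunge" >/dev/null; then
        echo "munge round trip OK"
    else
        echo "munge round trip FAILED" >&2
        return 1
    fi
}

# Cross-node variant (client1 is a placeholder hostname); it only succeeds
# when both nodes share the same munge.key and have roughly synchronized clocks:
#   /usr/local/munge/bin/munge -n | ssh client1 /usr/local/munge/bin/unmunge
```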

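6. Problems you may hit during installation

(1) configure fails

[Fix]: apt -y install libssl-dev openssl
(on yum-based systems the development package is named openssl-devel)
This build uses OpenSSL as the cryptographic library. If OpenSSL itself was built from source, select the library at configure time with --with-crypto-lib=openssl,
and point configure at the source-built installation with --with-openssl-prefix=/usr/local/openssl

(2) Wrong file permissions or ownership

The permissions or owner of /usr/local are wrong.

[Fix]: reset the permissions and owner of /usr/local. Apply this to /usr/local itself; a recursive chown/chmod would also undo the munge ownership and modes set in step 3.

chown root:root /usr/local
chmod 755 /usr/local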

II. Installing slurm from source

apt-get install make hwloc libhwloc-dev libmunge-dev libmunge2 munge mariadb-server libmysqlclient-dev -y

1. Download and unpack the source

Download: https://www.schedmd.com/downloads.php

tar -jxvf slurm-22.05.8.tar.bz2
cd slurm-22.05.8
# Optional: refresh config.guess/config.sub from the system copies
# find . -name "config.guess"
# cp /usr/share/misc/config.* auxdir/
# cp /usr/share/libtool/build-aux/config.* .

2. Build and install

./configure --prefix=/usr/local/slurm \
--with-munge=/usr/local/munge \
--sysconfdir=/usr/local/slurm/etc \
--localstatedir=/usr/local/slurm/local \
--runstatedir=/usr/local/slurm/run \
--libdir=/usr/local/slurm/lib64

Check config.log for errors (for example with vim config.log).

If the MySQL check in config.log reports "no", rerun ./configure and add --with-mysql_config=/usr/bin

make -j"$(nproc)"
make install
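
Searching config.log by hand is tedious, so the check described above can be scripted. The sketch below relies only on the standard autoconf log layout ("configure:N: checking ..." followed by "configure:N: result: ..."); show_check is a hypothetical helper, and using "mysql" as the keyword is an assumption about how the relevant check is named in this Slurm release.

```shell
#!/usr/bin/env bash
# show_check (hypothetical helper): print every "checking ..." line in an
# autoconf config.log that mentions the keyword, plus the result that follows.
show_check() {
    local log=$1 keyword=$2
    awk -v kw="$keyword" '
        /: checking / && index(tolower($0), tolower(kw)) { print; want = 1; next }
        want && /: result: /                              { print; want = 0 }
    ' "$log"
}

# Usage, from the Slurm source directory after ./configure:
#   show_check config.log mysql
# A "result: no" line suggests rerunning ./configure with --with-mysql_config=/usr/bin
```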

3. Configure the database

Run the following statements in the mysql client as the database root user.

-- Create the slurm user, which will operate on the slurm_acct_db database; its password is 123456
create user 'slurm'@'localhost' identified by '123456';
-- Create the accounting database slurm_acct_db
create database slurm_acct_db;
-- Grant slurm, logging in from localhost with password 123456, all privileges on every table in slurm_acct_db
grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by '123456' with grant option;
-- Grant slurm, logging in from system0 with password 123456, all privileges on every table in slurm_acct_db
grant all on slurm_acct_db.* TO 'slurm'@'system0' identified by '123456' with grant option;
-- Create the job completion database slurm_jobcomp_db
create database slurm_jobcomp_db;
-- Grant slurm, logging in from localhost with password 123456, all privileges on every table in slurm_jobcomp_db
grant all on slurm_jobcomp_db.* TO 'slurm'@'localhost' identified by '123456' with grant option;
-- Grant slurm, logging in from system0 with password 123456, all privileges on every table in slurm_jobcomp_db
grant all on slurm_jobcomp_db.* TO 'slurm'@'system0' identified by '123456' with grant option;
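
For anything beyond a test machine, the fixed password 123456 is better replaced with one supplied at run time. The following sketch generates equivalent statements for a chosen user, host, and password; gen_slurm_sql is a hypothetical helper, it uses CREATE ... IF NOT EXISTS so it can be rerun, it omits WITH GRANT OPTION, and it does not escape quotes, so the password must not contain a single quote.

```shell
#!/usr/bin/env bash
# gen_slurm_sql (hypothetical helper): print SQL that creates the account and
# both Slurm databases, then grants the account full access to them.
# Limitation: values are inserted verbatim, so avoid single quotes in them.
gen_slurm_sql() {
    local user=$1 host=$2 pass=$3 db
    printf "CREATE USER IF NOT EXISTS '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    for db in slurm_acct_db slurm_jobcomp_db; do
        printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
        printf "GRANT ALL ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
    done
}

# Usage (DB_PASS is a variable you set yourself; mysql prompts for the root password):
#   gen_slurm_sql slurm localhost "$DB_PASS" | mysql -u root -p
```

The same password then has to be entered as StoragePass in slurmdbd.conf and JobCompPass in slurm.conf.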

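4. Edit the configuration files (example files are in etc/ in the source tree)

mkdir -p /usr/local/slurm/etc
cp etc/slurm.conf.example /usr/local/slurm/etc/slurm.conf
cp etc/slurmdbd.conf.example /usr/local/slurm/etc/slurmdbd.conf
cp etc/cgroup.conf.example /usr/local/slurm/etc/cgroup.conf
chmod 600 /usr/local/slurm/etc/slurmdbd.conf
mkdir -p /usr/local/slurm/run /usr/local/slurm/slurm /usr/local/slurm/log

[Note]: Clients only need a copy of the slurm.conf edited on the server. The full configuration is listed at the end of this document.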

5. Set environment variables

Create /etc/profile.d/slurm.sh (for example with vim /etc/profile.d/slurm.sh) with the following content:

export PATH=$PATH:/usr/local/slurm/bin:/usr/local/slurm/sbin
export LD_LIBRARY_PATH=/usr/local/slurm/lib64:$LD_LIBRARY_PATH

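6. Start the services (unit files are in etc/ in the source tree)

# cp etc/slurmctld.service etc/slurmdbd.service etc/slurmd.service /etc/systemd/system/
cp etc/slurmctld.service etc/slurmdbd.service etc/slurmd.service /usr/lib/systemd/system/
systemctl daemon-reload
# Start slurmdbd first: slurmctld connects to it for accounting
systemctl start slurmdbd
systemctl start slurmctld
systemctl start slurmd

[Note]: Clients only need slurmd.
When everything works, systemctl status shows a green "active" state. If a service fails, run the daemon in the foreground to see its errors:

slurmctld -Dvvvvv
slurmdbd -Dvvvvv
slurmd -Dvvvvv

If a node is still in the down state after startup, bring it back with:

scontrol update nodename=sw01 state=idle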
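
With more than one node, typing the scontrol command per node gets repetitive. The sketch below lists nodes whose state starts with "down" and applies the same update to each; down_nodes and resume_down_nodes are hypothetical helpers, and the sinfo format string '%N %T' (node name, extended state) is the only Slurm-specific assumption.

```shell
#!/usr/bin/env bash
# down_nodes (hypothetical filter): read "name state" pairs on stdin and print
# the names whose state starts with "down" (for example down or down*).
down_nodes() {
    awk '$2 ~ /^down/ { print $1 }'
}

# resume_down_nodes (hypothetical helper): feed live sinfo output through the
# filter and set each matching node back to idle, as in the command above.
resume_down_nodes() {
    local node
    sinfo -h -N -o '%N %T' | down_nodes | sort -u | while read -r node; do
        echo "resuming $node"
        scontrol update nodename="$node" state=idle
    done
}
```

Check the slurmd log on a node before resuming it; a node that went down because slurmd cannot start will simply go down again.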

7. Miscellaneous

Restart the slurmctld service:

systemctl restart slurmctld

Copy the installation to a compute node (test10 here):

scp -r /usr/local/slurm test10:/usr/local/
scp /etc/profile.d/slurm.sh test10:/etc/profile.d/
scp /usr/lib/systemd/system/slurmd.service test10:/usr/lib/systemd/system/

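III. Installing openssl from source

1. Check the current version: openssl version

2. Download the matching openssl release

Download: https://www.openssl.org/source/old/
tar -zxvf openssl-1.1.1s.tar.gz
cd openssl-1.1.1s
./config --prefix=/usr/local/openssl
./config -t
make && make install

3. Test (/usr/local/openssl/bin/openssl version)

If the version number prints correctly, the installation succeeded. On some operating system versions the following error appears:
openssl: symbol lookup error: openssl: undefined symbol: EVP_mdc2, version OPENSSL_1_1_0

# In that case, register the new library directory with the dynamic linker:
# echo "/usr/local/openssl/lib" >> /etc/ld.so.conf.d/libc.conf && ldconfig
# Then put /usr/local/openssl/bin/openssl on the system path:
# ln -s /usr/local/openssl/bin/openssl /bin/openssl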
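
A symbol lookup error like the one above usually means the new binary is loading an older system copy of libssl or libcrypto. Inspecting the ldd output shows which files the dynamic linker actually picks. This is a small sketch; ssl_lib_paths is a hypothetical filter over standard ldd output.

```shell
#!/usr/bin/env bash
# ssl_lib_paths (hypothetical filter): from ldd output on stdin, print the
# libssl/libcrypto entries and the path each one resolves to.
ssl_lib_paths() {
    awk '$1 ~ /^lib(ssl|crypto)\.so/ && $2 == "=>" {
        print $1, ($3 == "not" ? "NOT-FOUND" : $3)
    }'
}

# Usage:
#   ldd /usr/local/openssl/bin/openssl | ssl_lib_paths
# Both libraries should resolve under /usr/local/openssl/lib once ldconfig has run.
```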

4. Switch the openssl version

# mv /usr/bin/openssl /usr/bin/openssl.bak
# mv /usr/include/openssl /usr/include/openssl.bak
# ln -s /usr/local/openssl/bin/openssl /usr/bin/openssl
# ln -s /usr/local/openssl/include/openssl /usr/include/openssl
# echo "/usr/local/openssl/lib" >> /etc/ld.so.conf
# ldconfig -v
# ln -s /usr/local/openssl/lib/libssl.so.1.1 /usr/lib64/libssl.so.1.1
# ln -s /usr/local/openssl/lib/libcrypto.so.1.1 /usr/lib64/libcrypto.so.1.1
# [Note]: do not simply delete the existing symlinks.
# To develop against the new version, repoint the existing symlinks, that is, replace the old shared libraries, to complete the upgrade.
# Replace the corresponding shared libraries wherever they exist under /lib(lib64), /usr/lib(lib64) and /usr/local/lib(lib64):
# ln -sf /usr/local/openssl/lib/libssl.so.1.1 /usr/lib64/libssl.so
# ln -sf /usr/local/openssl/lib/libcrypto.so.1.1 /usr/lib64/libcrypto.so

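IV. Installing munge from packages

1. Add the munge user

groupadd -g 972 munge
useradd -g 972 -u 972 munge

2. Install munge

apt-get install munge -y

3. Run the following command to create the munge.key file:

create-munge-key

4. Fix permissions

The command generates munge.key under /etc/munge/. Change the key's permissions and make munge its owner (/etc and /usr themselves must remain owned by root).

chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /var/run/munge/
chmod 400 /etc/munge/munge.key

To stop a running munged process, look up its PID and kill it (16730 is an example PID; use the one ps reports):

ps -ef | grep munge
kill -9 16730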

V. slurm configuration files

(1) slurm.conf configuration file

########################################################
# Configuration file for Slurm - 2021-08-20T10:27:23 #
########################################################
#
#
#
################################################
# CONTROL #
################################################
ClusterName=Sunway # cluster name
SlurmUser=root # management account on the control node
SlurmctldHost=sw01 # control node hostname
#SlurmctldHost=psn2 # backup controller hostname
SlurmctldPort=6817
SlurmdPort=6818
SlurmdUser=root
#
################################################
# LOGGING & OTHER PATHS #
################################################
SlurmctldLogFile=/usr/local/slurm/log/slurmctld.log # control node log file
SlurmdLogFile=/usr/local/slurm/log/slurmd.log # compute node log file
SlurmdPidFile=/usr/local/slurm/run/slurmd.pid # compute node PID file
SlurmdSpoolDir=/usr/local/slurm/slurm/d # compute node state directory
#SlurmSchedLogFile=
SlurmctldPidFile=/usr/local/slurm/run/slurmctld.pid # controller PID file
StateSaveLocation=/usr/local/slurm/slurm/state # controller state directory
#
################################################
# ACCOUNTING #
################################################
#AccountingStorageBackupHost=psn2 # backup slurmdbd host
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=sw01 # control node
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
#AccountingStoreJobComment=Yes
AcctGatherEnergyType=acct_gather_energy/none
AcctGatherFilesystemType=acct_gather_filesystem/none
AcctGatherInterconnectType=acct_gather_interconnect/none
AcctGatherNodeFreq=0
#AcctGatherProfileType=acct_gather_profile/none
ExtSensorsType=ext_sensors/none
ExtSensorsFreq=0
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#
################################################
# SCHEDULING & ALLOCATION #
################################################
PreemptMode=OFF
PreemptType=preempt/none
PreemptExemptTime=00:00:00
PriorityType=priority/basic
#SchedulerParameters=
SchedulerTimeSlice=30
SchedulerType=sched/backfill
#SelectType=select/cons_tres
SelectType=select/linear
#SelectTypeParameters=CR_CPU
SlurmSchedLogLevel=0
#
################################################
# TOPOLOGY #
################################################
TopologyPlugin=topology/none
#
################################################
# TIMERS #
################################################
BatchStartTimeout=10
CompleteWait=0
EpilogMsgTime=2000
GetEnvTimeout=2
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=60
SlurmdTimeout=60
WaitTime=0
#
################################################
# POWER #
################################################
#ResumeProgram=
ResumeRate=300
ResumeTimeout=60
#SuspendExcNodes=
#SuspendExcParts=
#SuspendProgram=
SuspendRate=60
SuspendTime=NONE
SuspendTimeout=30
#
################################################
# DEBUG #
################################################
DebugFlags=NO_CONF_HASH
SlurmctldDebug=info
SlurmdDebug=info
#
################################################
# EPILOG & PROLOG #
################################################
#Epilog=/usr/local/etc/epilog
#Prolog=/usr/local/etc/prolog
#SrunEpilog=/usr/local/etc/srun_epilog
#SrunProlog=/usr/local/etc/srun_prolog
#TaskEpilog=/usr/local/etc/task_epilog
#TaskProlog=/usr/local/etc/task_prolog
#
################################################
# PROCESS TRACKING #
################################################
ProctrackType=proctrack/pgid
#
################################################
# RESOURCE CONFINEMENT #
################################################
#TaskPlugin=task/none
#TaskPlugin=task/affinity
#TaskPlugin=task/cgroup
#TaskPluginParam=
#
################################################
# OTHER #
################################################
#AccountingStorageExternalHost=
#AccountingStorageParameters=
AccountingStorageTRES=cpu,mem,energy,node,billing,fs/disk,vmem,pages
AllowSpecResourcesUsage=No
#AuthAltTypes=
#AuthAltParameters=
#AuthInfo=
AuthType=auth/munge
#BurstBufferType=
#CliFilterPlugins=
#CommunicationParameters=
CoreSpecPlugin=core_spec/none
#CpuFreqDef=
CpuFreqGovernors=Performance,OnDemand,UserSpace
CredType=cred/munge
#DefMemPerNode=
#DependencyParameters=
DisableRootJobs=No
EioTimeout=60
EnforcePartLimits=NO
#EpilogSlurmctld=
#FederationParameters=
FirstJobId=1
#GresTypes=
GpuFreqDef=high,memory=high
GroupUpdateForce=1
GroupUpdateTime=600
#HealthCheckInterval=0
#HealthCheckNodeState=ANY
#HealthCheckProgram=
InteractiveStepOptions=--interactive
#JobAcctGatherParams=
JobCompHost=localhost
JobCompLoc=slurm_jobcomp_db # with jobcomp/mysql this is the database name
JobCompPort=0
JobCompType=jobcomp/mysql
JobCompUser=slurm
JobCompPass=123456
JobContainerType=job_container/none
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobDefaults=
JobFileAppend=0
JobRequeue=1
#JobSubmitPlugins=
#KeepAliveTime=
KillOnBadExit=0
#LaunchParameters=
LaunchType=launch/slurm
#Licenses=
LogTimeFormat=iso8601_ms
#MailDomain=
#MailProg=/bin/mail
MaxArraySize=1001
MaxDBDMsgs=20012
MaxJobCount=10000 # maximum number of jobs
MaxJobId=67043328
MaxMemPerNode=UNLIMITED
MaxStepCount=40000
MaxTasksPerNode=512
MCSPlugin=mcs/none
#MCSParameters=
MessageTimeout=10
MpiDefault=pmi2 # enable MPI
#MpiParams=
#NodeFeaturesPlugins=
OverTimeLimit=0
PluginDir=/usr/local/slurm/lib64/slurm
#PlugStackConfig=
#PowerParameters=
#PowerPlugin=
#PrEpParameters=
PrEpPlugins=prep/script
#PriorityParameters=
#PrioritySiteFactorParameters=
#PrioritySiteFactorPlugin=
PrivateData=none
#PrologEpilogTimeout=65534
#PrologSlurmctld=
#PrologFlags=
PropagatePrioProcess=0
PropagateResourceLimits=ALL
#PropagateResourceLimitsExcept=
#RebootProgram=
#ReconfigFlags=
#RequeueExit=
#RequeueExitHold=
#ResumeFailProgram=
#ResvEpilog=
ResvOverRun=0
#ResvProlog=
ReturnToService=0
RoutePlugin=route/default
#SbcastParameters=
#ScronParameters=
#SlurmctldAddr=
#SlurmctldSyslogDebug=
#SlurmctldPrimaryOffProg=
#SlurmctldPrimaryOnProg=
#SlurmctldParameters=
#SlurmdParameters=
#SlurmdSyslogDebug=
#SlurmctldPlugstack=
SrunPortRange=0-0
SwitchType=switch/none
TCPTimeout=2
TmpFS=/tmp
#TopologyParam=
TrackWCKey=No
TreeWidth=50
UsePam=No
#UnkillableStepProgram=
UnkillableStepTimeout=60
VSizeFactor=0
#X11Parameters=
#
################################################
# NODES #
################################################
#NodeName=Intel Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=480000
#NodeName=Dell Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=100000
#NodeName=swa CPUS=16 CoresPerSocket=1 ThreadsPerCore=1 Sockets=16 RealMemory=48000 State=UNKNOWN
#NodeName=swb CPUS=64 CoresPerSocket=32 ThreadsPerCore=1 Sockets=2 RealMemory=100000 State=UNKNOWN
#NodeName=sw5a0[1-3] CPUS=4 Sockets=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=1
NodeName=sw01 CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
#
################################################
# PARTITIONS #
################################################
#PartitionName=x86 AllowGroups=all MinNodes=0 Nodes=Dell Default=YES State=UP
#PartitionName=multicore AllowGroups=all MinNodes=0 Nodes=swa,swb,swc,swd State=UP
#PartitionName=manycore Default=YES AllowGroups=all MinNodes=0 Nodes=sw5a0[1-3] State=UP
PartitionName=Manycore AllowGroups=all MinNodes=0 Nodes=sw01 State=UP Default=YES

(2) slurmdbd.conf configuration file

#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info: DbdHost is the management server that runs slurmdbd; it must match AccountingStorageHost in slurm.conf
DbdHost=sw01
#DbdBackupAddr=172.17.0.2
#DbdBackupHost=mn02
DbdPort=6819
SlurmUser=root
MessageTimeout=30
DebugLevel=7
#DefaultQOS=normal,standby
LogFile=/usr/local/slurm/log/slurmdbd.log
PidFile=/usr/local/slurm/run/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
PrivateData=jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
StorageHost=localhost
#StorageBackupHost=mn02
StoragePort=3306
StoragePass=123456
StorageUser=slurm
StorageLoc=slurm_acct_db
CommitDelay=1
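
Since DbdHost in slurmdbd.conf has to agree with AccountingStorageHost in slurm.conf, a small consistency check can catch a mismatch before the daemons are started. This is a sketch under simple assumptions: conf_get and check_dbd_host are hypothetical helpers, and the parser only handles single "Key=Value # comment" lines, which is enough for these two keys.

```shell
#!/usr/bin/env bash
# conf_get (hypothetical helper): print the value of KEY from a Slurm-style
# config file, skipping commented-out lines and trailing "# ..." comments.
conf_get() {
    local key=$1 file=$2
    awk -v key="$key" '
        /^[[:space:]]*#/ { next }
        {
            line = $0
            sub(/#.*/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (tolower(k) == tolower(key)) { print v; exit }
        }
    ' "$file"
}

# check_dbd_host (hypothetical helper): compare the two host settings.
check_dbd_host() {
    local slurm_conf=$1 dbd_conf=$2 a b
    a=$(conf_get AccountingStorageHost "$slurm_conf")
    b=$(conf_get DbdHost "$dbd_conf")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "OK: both files name $a"
    else
        echo "MISMATCH: AccountingStorageHost='$a' DbdHost='$b'"
        return 1
    fi
}

# Usage with the paths from this guide:
#   check_dbd_host /usr/local/slurm/etc/slurm.conf /usr/local/slurm/etc/slurmdbd.conf
```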

  
