Rocks Cluster Notes: Common Problems with Rocks Installation

Common problems and commands
1. Permanently disable the firewall: rocks run host "chkconfig iptables off"
2. Adding environment variables: put system-wide variables in /etc/profile;
put per-user variables in ~/.bashrc.
3. Setting the system time
date -s 20071215
date -s 15:35
To also update the BIOS (hardware) clock, run:
clock -w
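The steps above can be combined, and the frontend's time can be pushed to every compute node in one pass. A minimal sketch, assuming the standard Rocks `rocks run host` tooling and passwordless root ssh between nodes:

```shell
# Set date and time on the frontend, then write it to the hardware clock
date -s "2007-12-15 15:35"
clock -w

# Push the frontend's current time to all compute nodes
rocks run host compute "date -s \"$(date '+%Y-%m-%d %H:%M:%S')\" && clock -w"
```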

After all nodes have been installed:

When ssh'ing to other nodes you may see:
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Fix: edit the /etc/ssh/ssh_config file and append at the end:
ForwardX11Trusted yes (add this on every node, and copy the head node's key files over)
scp /root/.ssh/* compute:/root/.ssh/
Then log out and run rocks sync config.
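The whole fix can be scripted from the frontend. A sketch, assuming the node set is named `compute` and key-based root ssh already works (the node name compute-0-0 is an example):

```shell
# Append the option on the frontend if it is not there yet
grep -q '^ForwardX11Trusted' /etc/ssh/ssh_config || \
  echo 'ForwardX11Trusted yes' >> /etc/ssh/ssh_config

# Append it on every compute node as well
rocks run host compute \
  "grep -q '^ForwardX11Trusted' /etc/ssh/ssh_config || echo 'ForwardX11Trusted yes' >> /etc/ssh/ssh_config"

# Copy the head node's key files to a node and resync
scp /root/.ssh/* compute-0-0:/root/.ssh/
rocks sync config
```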

4. Reinstalling nodes
If a compute node in the cluster needs to be reinstalled, run on that node:
/boot/kickstart/cluster-kickstart
to reinstall its system. Alternatively, run on the frontend:
rocks run host '/boot/kickstart/cluster-kickstart'
to reinstall all compute nodes.
To reinstall all compute nodes and let them resume the computing jobs interrupted by the reinstall, use SGE control and run:
/opt/gridengine/examples/jobs/sge-reinstall.sh
5. How do I remove a compute node from the cluster?
On your frontend end, execute:
# rocks remove host "[your compute node name]"
For example, if the compute node's name is compute-0-1, you'd execute:
# rocks remove host compute-0-1
# rocks sync config
The compute node has been removed from the cluster.
6. How do I export a new directory from the frontend to all the compute nodes that is accessible under /home?
Execute this procedure:
• Add the directory you want to export to the file /etc/exports.
For example, if you want to export the directory /export/disk1, add the following to /etc/exports:
/export/disk1 10.0.0.0/255.0.0.0(rw)
• Restart NFS:
# /etc/rc.d/init.d/nfs restart
• Add an entry to /etc/auto.home.
For example, say you want /export/disk1 on the frontend machine (named frontend-0) to be mounted as
/home/scratch on each compute node.
Add the following entry to /etc/auto.home:
scratch frontend-0:/export/disk1
• Inform 411 of the change:
make -C /var/411
Now when you login to any compute node and change your directory to /home/scratch, it will be automounted.
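The four steps above, collected into one sequence (directory and host names are the examples from the text):

```shell
# 1. Export the directory to the private cluster network
echo '/export/disk1 10.0.0.0/255.0.0.0(rw)' >> /etc/exports

# 2. Restart NFS so the new export is served
/etc/rc.d/init.d/nfs restart

# 3. Map the export to /home/scratch through the automounter
echo 'scratch frontend-0:/export/disk1' >> /etc/auto.home

# 4. Tell 411 to push the updated map to all nodes
make -C /var/411
```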
7. Note: every time you run a rocks command that modifies the database configuration (for example, removing a compute node), run:  rocks sync config
afterwards to write the updated database information into the nodes' system configuration files; otherwise other management commands may fail with puzzling errors.

8. VASP job submission
1) (Zhou Jian) Script name: vasp.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash

mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
Every job script must include the lines above.
Entries which start with #$ will be treated as SGE options.
• -cwd  means to execute the job for the current working directory.
• -j y means to merge the standard error stream into the standard output stream instead of having two separate error and output streams.
• -S /bin/bash specifies the interpreting shell for this job to be the Bash shell.
-n $NSLOTS specifies how many processor cores to use for the computation; it is followed by the path to the application binary.
To submit: qsub -pe mpich 4 vasp.sh
2)
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpich 16
(Optionally add export PATH=$PATH:<path>)
mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
(Alternatively, with MPICH directly:
MPI_DIR=/opt/mpich/gnu
$MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./cpi )

Every job script must include the lines above.
#$ -pe mpich 16   sets the script's parallel environment to mpich and requests 16 processor cores for the computation.
Adjust these options as needed for each application.
To submit: qsub vasp.sh
3) Run qstat to check job status.
Job states: qw means the job is waiting, r means it is running. The slots column shows how many processor cores the job is currently using.
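A typical submit-and-monitor sequence might look like this (the job id 42 is hypothetical):

```shell
qsub -pe mpich 16 vasp.sh   # submit; SGE prints the new job id
qstat                       # state qw = waiting, r = running; slots = cores in use
qstat -f                    # full view, including per-queue slot usage
qdel 42                     # remove job 42 if it is no longer needed
```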

9. Software installation
Rename a group: groupmod -n <new-group> <old-group>
Change a user's primary group: usermod -g <group> <user>
Rename a user: usermod -l <new-user> <old-user>
Change the login directory: usermod -d <login-directory> <user>
Delete a user and its home directory: userdel -r <user>
Create a group: groupadd cluster
10. Adding users
(When the cluster group does not exist)
adduser -g root mu
adduser -g root soft
passwd mu
rocks sync users
make -C /var/411/ force
rocks sync config
By default, creating the user mu creates the directory /export/home/mu. This directory is shared with the other compute nodes as /home/mu (including the head node; software can be installed under /export/home/mu/soft/).

2) As root, create the user soft:   useradd soft
3) As root, delete its password:   passwd -d soft
chmod a+rwx /export/home/soft
Sync the accounts:   rocks sync users
Publish the password information:   make -C /var/411 force
4) Use XFTP to copy the program into the soft account:
as root, copy it under /export/home/soft/src,
then change the owner: chown -v soft:soft <file-or-directory>
(the arguments are user:group)
5) rocks run host compute-0-0 command="hostname"
rocks run host n "reboot"
Runs the command ('reboot', or e.g. 'ls /tmp/') on all n nodes.
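Steps 2)-5) above can be collected into one root session on the frontend. A sketch; the account name soft and the src path follow the text, and chown is run recursively here instead of per file:

```shell
# Create the shared software account and clear its password
useradd soft
passwd -d soft
chmod a+rwx /export/home/soft

# Sync the account to all nodes and publish the login files
rocks sync users
make -C /var/411 force

# After uploading sources into /export/home/soft/src, hand them to soft
chown -Rv soft:soft /export/home/soft/src
```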

11. ERROR: unable to send message to qmaster using port 536 on host "cluster.local": got send error
Luca Clementi luca.clementi at gmail.com
Wed Sep 12 19:34:35 PDT 2012
On Wed, Sep 12, 2012 at 4:42 PM, 杨燚 <yang_yi at neusoft.com> wrote:
> I just delete some users and stop some services. and then the rocks sync
> config doesn't work any more
>
>
>
> [root@cluster ~]# rocks sync config
>
> error: commlib error: got select error (Connection refused)
>
> ERROR: unable to send message to qmaster using port 536 on host
> "cluster.local": got send error
>
>

I would think it’s an sge problem.
Can you restart it from the init script?
/etc/init.d/sgemaster.zhaoming start
/etc/init.d/sgeexecd.sten start
Luca

12.Problems with X11 forwarding and qlogin
Anoop Rajendra anoop.rajendra at gmail.com
Fri Oct 9 17:19:37 PDT 2009
On your frontend, add the line
ForwardX11Trusted       yes
to your /etc/ssh/ssh_config
and let us know if that solves your problem.

Note: in a Rocks cluster, this line must be added to /etc/ssh/ssh_config on every node.

13. Installing software on all nodes

See section 5.1, "Adding Packages to Compute Nodes", in the Rocks 6.2 manual.
After installing software with yum on the frontend, the downloaded packages sit under /var/cache/yum. Copy all the packages from that directory into /export/rocks/install/contrib/6.2/arch/RPMS, then follow the steps below to install the frontend's software on all compute nodes.
Put the package you want to add in:
/export/rocks/install/contrib/6.2/arch/RPMS
Where arch is your architecture ("i386" or "x86_64").
Create a new XML configuration file that will extend the current compute.xml configuration file:
# cd /export/rocks/install/site-profiles/6.2/nodes
# cp skeleton.xml extend-compute.xml
If you use extend-compute.xml your packages will be installed only on your compute nodes. If you
want your packages to be installed on all other appliances (e.g. login nodes, nas nodes, etc.), you should use
extend-base.xml instead of extend-compute.xml.
Inside extend-compute.xml, add the package name by changing the section from:
<package> <!-- insert your package name here --> </package>
to:
<package> your package </package>

<package>rsh-server </package>

It is important that you enter the base name of the package in extend-compute.xml and not the full
name.
For example, if the package you are adding is named XFree86-100dpi-fonts-4.2.0-6.47.i386.rpm, input
XFree86-100dpi-fonts as the package name in extend-compute.xml.
<package>XFree86-100dpi-fonts</package>
If you have multiple packages you’d like to add, you’ll need a separate <package> tag for each. For example, to
add both the 100 and 75 dpi fonts, the following lines should be in extend-compute.xml:
<package>XFree86-100dpi-fonts</package>
<package>XFree86-75dpi-fonts</package>
Also, make sure that you remove any package lines which do not have a package in them. For example, the file
should NOT contain any lines such as:
<package> <!-- insert your package name here --> </package>
Now build a new Rocks distribution. This will bind the new package into a RedHat compatible distribution in the
directory /export/rocks/install/rocks-dist/….
# cd /export/rocks/install
# rocks create distro
Now, reinstall your compute nodes.
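The whole workflow from this section can be scripted. A sketch for an x86_64 Rocks 6.2 frontend; the copy of cached RPMs assumes yum kept its downloaded packages under /var/cache/yum, as the text describes:

```shell
# Collect the RPMs that yum downloaded on the frontend
find /var/cache/yum -name '*.rpm' \
  -exec cp -v {} /export/rocks/install/contrib/6.2/x86_64/RPMS/ \;

# Create the extension node file and add <package> entries to it by hand
cd /export/rocks/install/site-profiles/6.2/nodes
cp skeleton.xml extend-compute.xml

# Rebuild the distribution so the new packages are bound in
cd /export/rocks/install
rocks create distro

# Finally, reinstall the compute nodes over the network
rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'
```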

14. Reinstalling all compute nodes over the network
After your frontend completes its installation, the last step is to force a re-installation of all of your compute
nodes. The following will force a PXE (network install) reboot of all your compute nodes.
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'

15. [Rocks-Discuss] installing rsh in rocks cluster 5.3
See the previous section: installing software on all nodes.
Go Yoshimura go-yoshimura at sstc.co.jp
Mon May 17 23:41:14 PDT 2010
Hi Leo!
base-rsh.xml
- We are not sure about base-rsh.xml but you can create it by hand.
- Perhaps /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml may be the answer (I'm not sure).
[root@panrocks53 nodes]# cat /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml | grep package
<package>rsh</package>
<package>rsh-server</package>
- We usually install rsh-server, telnet-server, vsftpd by specifying
<package>rsh-server</package>
<package>telnet-server</package>
<package>vsftpd</package>
in a node file.
– About node file and graph file, http://www.rocksclusters.org/rocksapalooza/2009/customizing.pdf is helpful.
RPM
- We pick up RPMs from the CentOS 5.4 ISO file.
thank you
go
-----
Leo P. wrote:
>Hi everyone,
>
>I am trying to install rsh in rocks cluster 5.3. I tried using the old way specified here
>
>http://www.rocksclusters.org/rocks-documentation/4.2/customization-rsh.html
>
>But I can't find the base-rsh.xml and the RPM in the repository.
>
>So can anyone please tell me how I can install rsh in rocks cluster 5.3.
>
>I need rsh to run an old software and can not use ssh instead 🙂
>
>
>Leo
>


16. About partitioning
/export is linked to the remaining free space on the disk.
Under the share directory, create a new symlink apps pointing to the apps folder under /export.
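One way to realize this layout, assuming the spare space is mounted at /state/partition1 (the usual Rocks location, but verify on your system):

```shell
# Software lives on the big partition, visible under /export
mkdir -p /state/partition1/apps
ln -s /state/partition1/apps /export/apps   # skip if /export already maps there

# Expose the same tree cluster-wide under /share
ln -s /export/apps /share/apps
```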

17. Building the frontend's restore ISO and upgrading the nodes
See section 3.4, "Upgrade or Reconfigure Your Existing Frontend", in the Rocks 6.2 manual.
This procedure describes how to use a Restore Roll to upgrade or reconfigure your existing Rocks cluster.
Let’s create a Restore Roll for your frontend. This roll will contain site-specific info that will be used to quickly
reconfigure your frontend (see the section below for details).
# cd /export/site-roll/rocks/src/roll/restore
# make roll
The above command will output a roll ISO image that has the name of the form:
hostname-restore-date-0.arch.disk1.iso. For example, on the i386-based frontend with the FQDN of
rocks-45.sdsc.edu, the roll will be named like:
rocks-45.sdsc.edu-restore-2006.07.24-0.i386.disk1.iso
Burn your restore roll ISO image to a CD.
Reinstall the frontend by putting the Rocks Boot CD in the CD tray (generally, this is the Kernel/Boot Roll) and
reboot the frontend.
At the boot: prompt type:
build
At this point, the installation follows the same steps as a normal frontend installation (See the section: Install
Frontend) — with two exceptions:
1. On the first user-input screen (the screen that asks for 'local' and 'network' rolls), be sure to supply the
Restore Roll that you just created.
2. You will be forced to manually partition your frontend’s root disk.
You must reformat your / partition, your /var partition and your /boot partition (if it exists).
Also, be sure to assign the mountpoint of /export to the partition that contains the users’ home areas.
Do NOT erase or format this partition, or you will lose the user home directories. Generally, this is the
largest partition on the first disk.
After your frontend completes its installation, the last step is to force a re-installation of all of your compute
nodes. The following will force a PXE (network install) reboot of all your compute nodes.
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'
