
Administrator Operation Manual

The administrator role has a dedicated management interface for managing all cluster resources, such as users, computing power, images, and containers. The entry to the administrator interface is in the drop-down menu in the upper-right corner.

After entering, you can see the following interface:

This section first introduces common management scenarios and then walks through the administrator's basic operations one by one.

User Roles

HyperAI currently has three user roles:

  1. Ordinary user
  2. Automated Modeling Administrator
  3. System administrator

"Ordinary user" is the default role, held by every user in the HyperAI system. "Automated Modeling Administrator" users can manage automated modeling templates; for more information, see Automatic modeling. "System administrator" users have system management capabilities and can manage all resources in the cluster.

By default, the HyperAI system has one administrator account with the username admin and the password 123. Change this password as soon as possible after the first login.

Modify user roles

On the administrator's "User" management page, you can see a list of all users. Click any user to view their details, including basic user information, the user's resources, and the user's parallel restrictions. The details also include the "User Role"; click "Update" to change the user's role.

User management

User browsing and filtering

After entering the administrator interface, you will see the user list page, which displays all users in the cluster. Use the search box to filter users by username, email, or phone number.

Create User

Click the "Create a new user" button at the bottom of the page to create a new user directly. After the user is created, a verification email is sent automatically, reminding the user to verify their email address and log in.

Manage user resources

Click any user on the browsing page to view their key information, including "Basic information", "User Resources", and "User Parallel Restrictions".

Viewing user resources from the user's perspective

The user information page has a "Switch to the current user's perspective" button. After clicking it, the administrator views resources from that user's perspective: navigation links on the left, such as "Computing power container" and "Dataset Management", open the user's own pages by default, so the administrator can easily inspect the user's resources.

Click "Sign out" in the prompt bar at the top to return to the administrator's perspective.

View and add user resources

The "User resource usage status" section shows the user's current resource usage and balance, including computing resources and storage resources. Click "User Resource Settings" to add computing or storage resources for the user and to choose how long the added resources remain valid.

The default validity period is "permanent"; you can also choose a period such as "one month" or "three months", during which the resource is valid. Computing resources can be used during that period, and any amount not used up by the end of it can no longer be used. When a storage resource expires, the user's total storage capacity drops back to the limit it had before the addition. For example, suppose a user's default storage is 10 GB and the administrator adds 50 GB of storage valid for three months. During those three months the user has 60 GB of storage; after three months, the user's storage limit is adjusted back to 10 GB.
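The storage example above can be modeled as a small calculation. The following Python sketch is illustrative only: the names `StorageGrant` and `storage_limit_gb`, and approximating "three months" as 90 days, are assumptions and not HyperAI's implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class StorageGrant:
    # Hypothetical record of storage added by an administrator.
    size_gb: int
    start: date
    expires: Optional[date]  # None means "permanent"


def storage_limit_gb(base_gb: int, grants: list[StorageGrant], on: date) -> int:
    """Return the user's total storage limit on a given day.

    A grant counts from its start date until (but not including) its
    expiry date; permanent grants never expire.
    """
    total = base_gb
    for g in grants:
        if g.start <= on and (g.expires is None or on < g.expires):
            total += g.size_gb
    return total


# The example from the text: 10 GB default plus 50 GB valid for about three months.
start = date(2024, 1, 1)
grant = StorageGrant(size_gb=50, start=start, expires=start + timedelta(days=90))
print(storage_limit_gb(10, [grant], start + timedelta(days=30)))   # 60
print(storage_limit_gb(10, [grant], start + timedelta(days=120)))  # 10
```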

View and update user resource restrictions

The user's resource restrictions include the following:

  1. The number of parallel containers for each computing power type
  2. The number of containers the user can create, covering both public and private containers
  3. The number of data warehouses the user can create, covering both public and private data warehouses
  4. The number of workspace caches, i.e. the number of cache containers that can be used for quick startup
  5. The workspace cache time, i.e. how long after a workspace is closed its cache is automatically deleted
  6. The upload limit, i.e. the maximum amount of data a user can upload at once. It is determined by the user's storage and cannot be adjusted by administrators
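Item 1 limits how many containers of each computing power type a user can run at once. A minimal Python sketch of such a check follows; the function name `can_start_container`, the type names `cpu` and `t4`, and treating an unlisted type as a limit of 0 are all assumptions for illustration.

```python
from collections import Counter


def can_start_container(
    running: list[str], limits: dict[str, int], compute_type: str
) -> bool:
    """Return True if the user may start one more container of compute_type.

    running: compute types of the user's currently running containers.
    limits: hypothetical per-type parallel limits, e.g. {"cpu": 2, "t4": 1}.
    A type missing from limits is treated as not allowed (limit 0).
    """
    counts = Counter(running)
    return counts[compute_type] < limits.get(compute_type, 0)


limits = {"cpu": 2, "t4": 1}
running = ["cpu", "t4"]
print(can_start_container(running, limits, "cpu"))  # True: 1 of 2 cpu slots used
print(can_start_container(running, limits, "t4"))   # False: 1 of 1 t4 slots used
```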

Container management & Dataset Management

Administrators can view and browse all containers in the cluster and filter them by "state" and "type", which makes it easy to see containers in different states across the cluster.

Click a container link on the page to open the container details page, where you can view the container's logs and shut the container down.

Dataset management works similarly to container management and is not described in detail here.

GPU

Set the total number of each GPU type in the cluster. This number must be less than or equal to the number of GPUs actually available in the cluster; if it exceeds the actual number, the corresponding resources will fail to run.
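The rule that configured totals must not exceed the physical GPU count can be expressed as a simple check. This Python sketch assumes you already have a per-type count of the GPUs physically present (however your cluster reports it); the function name `over_configured` and the GPU type labels are hypothetical.

```python
def over_configured(configured: dict[str, int], actual: dict[str, int]) -> dict[str, int]:
    """Return, per GPU type, how many configured GPUs exceed the physical count.

    An empty result means every configured total is within what the cluster has.
    A type that is configured but not physically present counts as fully excess.
    """
    excess = {}
    for gpu_type, count in configured.items():
        extra = count - actual.get(gpu_type, 0)
        if extra > 0:
            excess[gpu_type] = extra
    return excess


print(over_configured({"T4": 10, "A100": 4}, {"T4": 10, "A100": 2}))  # {'A100': 2}
```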

Computing Resource Management

When creating a container, users must select a "Computing power". The "Computing power" options available for selection are created and maintained by administrators. Each computing power is defined along four dimensions: CPU, GPU, memory, and storage.

The total computing power of a cluster is determined by its machine nodes, but how that capacity is divided into "Computing power" options must be tailored to the usage scenario. If each "Computing power" is configured too small, it cannot meet users' needs; if each is configured too large, a small number of users can occupy too many resources, leaving the cluster underutilized.

For example, in a computing cluster with 10 T4 GPUs, an administrator can bind every two T4 GPUs into one computing power resource, creating 5 computing power quotas; the cluster can then run at most 5 containers of this type at the same time. Alternatively, each T4 GPU can be allocated as its own resource, creating 10 quotas, so that the cluster can run 10 containers of this type at the same time.

The allocation of cluster resources must match the actual hardware. For example, a cluster with 10 T4 GPUs cannot run 11 single-GPU containers. If the administrator allocates 11 single-GPU resources, the 11th container cannot be scheduled.
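The arithmetic in these examples can be sketched in a few lines of Python. This is a simplification under stated assumptions: all GPUs are of one type, every container of the option uses the same number of GPUs, and CPU, memory, storage, and how GPUs are split across nodes are ignored. The function names are hypothetical.

```python
def max_concurrent_containers(total_gpus: int, gpus_per_spec: int) -> int:
    """How many containers of one GPU computing power option can run at once."""
    if gpus_per_spec <= 0:
        raise ValueError("gpus_per_spec must be positive")
    return total_gpus // gpus_per_spec


def schedulable(requested_quotas: int, total_gpus: int, gpus_per_spec: int) -> int:
    """Number of requested containers that can actually be scheduled."""
    return min(requested_quotas, max_concurrent_containers(total_gpus, gpus_per_spec))


print(max_concurrent_containers(10, 2))  # 5: two T4s per quota
print(max_concurrent_containers(10, 1))  # 10: one T4 per quota
print(schedulable(11, 10, 1))            # 10: the 11th single-GPU container cannot be scheduled
```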

caution

A newly created "Computing power" has a total of 0 and is labeled "Not available" by default. Administrators must manually set the total and mark it as "available" before users can select it.
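The default state described in the note can be illustrated with a small data model. In this Python sketch, the class `ComputePower`, its field names, the resource values, and the rule that an option is selectable only when it is marked available *and* has a total above 0 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputePower:
    # Hypothetical model of one "Computing power" option and its four dimensions.
    name: str
    cpu: int
    gpu: int
    memory_gb: int
    storage_gb: int
    total: int = 0           # newly created options start with a total of 0
    available: bool = False  # and are labeled "Not available"

    def selectable(self) -> bool:
        """Assumed rule: users can pick the option only once both settings are made."""
        return self.available and self.total > 0


spec = ComputePower("t4-single", cpu=4, gpu=1, memory_gb=16, storage_gb=50)
print(spec.selectable())  # False: total is 0 and it is not available
spec.total, spec.available = 10, True
print(spec.selectable())  # True
```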

Image Management

An image provides the base software configuration that users need. For example, for machine learning, HyperAI provides CPU and GPU images for different versions of TensorFlow, PyTorch, and MXNet. Administrators can also add custom images as needed.

Custom Image

As shown in the figure above, the "Administrator Console" - "Image" page displays each runtime image's name, corresponding framework, supported device types, and image location.

During system initialization, only the system preset images are available, and they cannot be deleted:

To add a custom image, you must build it from a preset image. The dependencies of all images are listed under Runtime environment.

For example, to add some custom dependencies on top of PyTorch-1.6.0, you can write a Dockerfile like the following:

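```dockerfile
FROM uhub.service.ucloud.cn/hyperairuntimes/pytorch:1.6.0-py36-cu101.47

# -y skips conda's interactive confirmation prompt, which cannot be answered during docker build
RUN conda install -y seaborn
```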

Then build the image and push it to your image registry:

```shell
docker build . -t <your-registry>/<your-image-name>:<your-tag>
docker push <your-registry>/<your-image-name>:<your-tag>
```

When creating a custom image, pay special attention to the following points:

  1. Starting a HyperAI container depends on common configuration and dependencies in the preset images, so custom images must be built on an original HyperAI preset image. Otherwise, the container may fail to start.
  2. As the list under Runtime environment shows, each framework version needs both a CPU image and a GPU image. When adding custom images, you therefore usually need to add a CPU/GPU pair so that the image can be launched on both types of computing power. If the image for a device type is missing, creating a container on the corresponding computing power will fail.
  3. Make sure the image is accessible from the cluster. Using the cluster's internal-network image registry is recommended.
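Point 2 can be checked before registering custom images. This Python sketch assumes you keep your own list of (framework, version, device) entries for the images you plan to add; it does not query HyperAI, and the image names and the function `missing_device_images` are hypothetical.

```python
from collections import defaultdict


def missing_device_images(
    images: list[tuple[str, str, str]],
) -> dict[tuple[str, str], set[str]]:
    """Report framework versions that lack a CPU or a GPU image.

    images: (framework, version, device) tuples, device being "cpu" or "gpu".
    Returns {(framework, version): {missing devices}}.
    """
    required = {"cpu", "gpu"}
    devices = defaultdict(set)
    for framework, version, device in images:
        devices[(framework, version)].add(device)
    return {key: required - found for key, found in devices.items() if required - found}


images = [
    ("pytorch-seaborn", "1.6.0", "cpu"),
    ("pytorch-seaborn", "1.6.0", "gpu"),
    ("tensorflow-custom", "2.3", "gpu"),
]
print(missing_device_images(images))  # {('tensorflow-custom', '2.3'): {'cpu'}}
```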

Add a new image

After building the image, go to "Image" - "Create a new image" and configure the image name and image location:

You can then select this custom image when creating a container.

Invitation

Invitations are used to manage user registration. For private clusters, they mainly make it easier to manage personnel across departments.

Batch invitation code

A "Batch invitation code" provides the following functions:

  1. Set the number of uses, i.e. the maximum number of users who can register with this invitation code.
  2. Set the group that users registered with this invitation code will join. If not set, they are added to the default group.
  3. Grant resources, including computing and storage resources, that users registered through this invitation code receive immediately.
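The three settings above can be summarized as a small data model. The following Python sketch is an illustration, not HyperAI's implementation: the class `InvitationCode`, its fields, and recording grants as hours per computing power type (storage grants are omitted) are assumptions; the default group name and the A100 / rtx-3090 hours follow this document.

```python
from dataclasses import dataclass, field


@dataclass
class InvitationCode:
    # Hypothetical model of a batch invitation code.
    max_uses: int
    group: str = "default"  # users join the default group if none is set
    grants: dict[str, float] = field(default_factory=dict)  # compute type -> hours
    used: int = 0

    def register(self, username: str) -> dict:
        """Register one user, returning their group and initial resources."""
        if self.used >= self.max_uses:
            raise ValueError("invitation code has no remaining uses")
        self.used += 1
        return {"user": username, "group": self.group, "resources": dict(self.grants)}


code = InvitationCode(max_uses=2, grants={"A100": 10, "rtx-3090": 20})
print(code.register("alice"))
# {'user': 'alice', 'group': 'default', 'resources': {'A100': 10, 'rtx-3090': 20}}
```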

For example, users who register through an invitation code can be given 10 hours of A100 resources and 20 hours of rtx-3090 resources:

Grouping

The main purposes of grouping are:

  1. Bind groups during registration to distinguish users from different channels, making it easy to manage users by department, laboratory, and so on;
  2. Bind free products to groups so that users in a group receive the corresponding resources every week. See [Create a periodic supply plan](#Create a periodic supply plan);

info

By default, users are added to the default group.

Modify user grouping

An administrator can change a user's group on that user's management page:

Periodic supply plan

To help computing resources be used as evenly as possible, HyperAI provides periodic supply plans.

  1. The administrator creates a "commodity", sets it as free, and sets its validity period to one week.
  2. The administrator binds this "commodity" to a "grouping".

At a fixed time each week, HyperAI allocates the resources of this "commodity" to all users in the group.

For example:

  1. The administrator creates a free commodity containing 50 hours of CPU-type computing resources, sets its validity period to one week, and names it free-cpu-50h;

  2. The administrator binds this commodity to group xx.

Every Monday morning, all users in group xx receive 50 hours of CPU computing resources. These resources are valid for only one week and are reclaimed the following Monday. In other words, if users in group xx do not use up the 50 hours this week, the unused hours are forfeited; users cannot accumulate computing resources by receiving them every week.
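The non-accumulating behavior can be illustrated with a short simulation. This Python sketch assumes a single one-week grant per cycle and that each week's usage is drawn only from that week's grant; the function name `forfeited_hours` is hypothetical.

```python
def forfeited_hours(weekly_hours: float, used_per_week: list[float]) -> list[float]:
    """Unused hours that expire at the end of each week under a one-week plan.

    Each week starts from a fresh grant of weekly_hours; leftovers from the
    previous week are reclaimed instead of carried over.
    """
    forfeited = []
    for used in used_per_week:
        balance = weekly_hours            # fresh grant; any old balance is gone
        balance -= min(used, balance)     # usage cannot exceed this week's grant
        forfeited.append(balance)         # whatever is left expires next Monday
    return forfeited


# Plan free-cpu-50h: a user who uses 20, 60 (capped at 50), then 0 hours.
print(forfeited_hours(50, [20, 60, 0]))  # [30, 0, 50]
```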

Check expired computing resources

In the user's "Resource utilization status" - "Resource Change Details", you can see computing resources that have an expiration date:

info
  1. "Commodity" is a fairly generic term. For private deployment users, a more accurate term is "plan", since it mainly targets periodic resource supply. Therefore, "periodic supply of goods" here can also be understood as "periodic supply plan";
  2. By default, periodic supply plans in the cluster are refreshed at 8:00 am every Monday;
  3. A newly created periodic supply plan first takes effect in the next cycle, i.e. at 8:00 am the following Monday;
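The default schedule in the notes above determines when a new plan first delivers resources. This Python sketch computes the next Monday 08:00 strictly after a given moment; the function name `next_supply_time`, the use of naive datetimes in the cluster's local time, and treating "the next cycle" as the next refresh after creation are assumptions.

```python
from datetime import datetime, timedelta


def next_supply_time(now: datetime) -> datetime:
    """Next Monday 08:00 strictly after `now` (naive local time assumed)."""
    # weekday() is 0 for Monday; step back to this week's Monday at 08:00.
    monday = (now - timedelta(days=now.weekday())).replace(
        hour=8, minute=0, second=0, microsecond=0
    )
    # If that moment is not in the future, the next refresh is a week later.
    if monday <= now:
        monday += timedelta(days=7)
    return monday


# A plan created on Wednesday, 3 January 2024, first takes effect the following Monday.
print(next_supply_time(datetime(2024, 1, 3, 15, 30)))  # 2024-01-08 08:00:00
```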

Modify the periodic supply plan

The "grouping" details page shows the bound periodic supply plans:

Here, you can delete periodic supply plans that are no longer needed:

You can also add new periodic supply plans:

All modifications will take effect in the next cycle.