Introduction to the HyperAI Automatic Modeling Data Format Specification

Introduction

The HyperAI data format is a set of dataset formatting standards defined by HyperAI for use with HyperAI automatic modeling and related products. Once a dataset has been formatted according to this specification, automatic modeling can use it to build deep learning models automatically.

In the HyperAI data format, meta.csv is the main format file of the dataset. The body of the file is in CSV format:

  • The first line gives the field type and field name of each column, in the format [type]_[name].
  • The second line and every subsequent line each hold one data sample.
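The header convention above can be sketched in Python as follows. This is a minimal illustration, not HyperAI's own parser: the function name parse_header is hypothetical, and splitting on the first underscore is an assumption that holds when field names contain only letters. The sample content is the object detection example from this document.

```python
import csv
import io


def parse_header(header_row):
    """Split each '[type]_[name]' header cell into a (type, name) pair."""
    fields = []
    for cell in header_row:
        # Split on the first underscore only (assumption: types contain no underscore).
        field_type, sep, field_name = cell.strip().partition("_")
        if not sep or not field_type or not field_name:
            raise ValueError(f"header cell {cell!r} is not of the form [type]_[name]")
        fields.append((field_type, field_name))
    return fields


# meta.csv content copied from the object detection example in this document.
sample = "json_Label,image_Source\nlabels/001.json,images/001.jpg\n"

reader = csv.reader(io.StringIO(sample))
fields = parse_header(next(reader))  # first line: field types and names
rows = list(reader)                  # remaining lines: one data sample each

print(fields)
print(rows)
```

Reading the first line separately from the rest mirrors the two-part layout of meta.csv: one header line, then one sample per line.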

Field name

  1. Field names consist of uppercase and lowercase English letters.
  2. Fields that start with “*” are ignored by the automatic modeling training process.
  3. Label is a reserved field name that refers specifically to the labels in the training data. Only one field may be named Label.
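The three naming rules can be checked with a short Python sketch. The function name active_fields is hypothetical. Because the specification does not say whether the asterisk precedes the whole header cell or only the name part, the sketch assumes either position marks the field as ignored. It also assumes that ignored fields do not count toward the Label limit.

```python
import re

# Rule 1: names use uppercase and lowercase English letters only.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def active_fields(headers):
    """Return (type, name) pairs for the fields that training would use."""
    kept = []
    for header in headers:
        field_type, _, name = header.partition("_")
        # Rule 2 (assumption): an asterisk at the start of the cell or of the
        # name part marks the field as ignored.
        if header.startswith("*") or name.startswith("*"):
            continue
        if not LETTERS_ONLY.fullmatch(name):
            raise ValueError(f"field name {name!r} must contain only English letters")
        kept.append((field_type, name))
    # Rule 3: at most one field may be named Label.
    if sum(1 for _, name in kept if name == "Label") > 1:
        raise ValueError("only one field may be named Label")
    return kept


print(active_fields(["int_Age", "*txt_Note", "category_Label"]))
```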

Field type

The field type indicates the data type of the column. There are two kinds of field types. Simple fields are int, float, category, and txt; the value of a simple field is stored directly in the corresponding row and column of meta.csv. Complex fields are text, image, video, and json; their values cannot be represented inside meta.csv, so the value of a complex field is a relative path that points to the file in the dataset holding the field's value.

  • int - Integer value
  • float - Floating point number
  • category - Categorical value
  • txt - Short text value
  • text - Text file. The value is the entire content of the file
  • image - Image file. Supported formats include jpg, png, and tif
  • video - Video file. Supported formats include mp4
  • json - Complex annotation data, with a corresponding definition for each type of problem
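The split between simple and complex fields can be illustrated with a small Python sketch that converts one raw cell of meta.csv according to its field type. The function name convert_value is hypothetical. Treating the listed image and video extensions as exhaustive is an assumption made for illustration only, since the specification says the formats "include" them.

```python
from pathlib import PurePosixPath

COMPLEX_TYPES = {"text", "image", "video", "json"}

# Assumption: the extensions listed in the specification are treated as the
# full set here; the real product may accept others.
ALLOWED_EXTENSIONS = {"image": {".jpg", ".png", ".tif"}, "video": {".mp4"}}


def convert_value(field_type, raw):
    """Convert one raw meta.csv cell according to its field type."""
    if field_type == "int":
        return int(raw)
    if field_type == "float":
        return float(raw)
    if field_type in ("category", "txt"):
        return raw  # simple string values are used as written
    if field_type in COMPLEX_TYPES:
        path = PurePosixPath(raw)
        if path.is_absolute():
            raise ValueError(f"{raw!r} must be a relative path")
        allowed = ALLOWED_EXTENSIONS.get(field_type)
        if allowed and path.suffix.lower() not in allowed:
            raise ValueError(f"{raw!r} is not a listed {field_type} format")
        return path  # relative to the dataset root
    raise ValueError(f"unknown field type {field_type!r}")


print(convert_value("int", "3"))
print(convert_value("image", "images/001.jpg"))
```

Simple values come back as Python values, while complex values come back as relative paths that still have to be resolved against the dataset directory.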

Data formats for various types of problems

Object detection

Because the Label of an object detection sample contains multiple pieces of content, a separate JSON file is used as the annotation. Here 001.jpg is the original image, and 001.json holds the annotations and corresponding types of the objects in that image.

json_Label,image_Source
labels/001.json,images/001.jpg

For a detailed description, please refer to the object detection documentation.

Semantic segmentation

001_mask.jpg and 001.jpg are two images of the same size. Each pixel in 001_mask.jpg is the annotation for the pixel at the corresponding position in 001.jpg.

image_Label,image_Source
images/001_mask.jpg,images/001.jpg
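A meta.csv like the one above could be generated with a sketch such as the following. The function name build_segmentation_meta is hypothetical, and the "<stem>_mask" naming and the images directory are taken from this one example; the specification does not state that they are required.

```python
import csv
import io


def build_segmentation_meta(image_names, image_dir="images"):
    """Build meta.csv text pairing each source image with its mask image."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # First line: field type and field name for each column.
    writer.writerow(["image_Label", "image_Source"])
    for name in image_names:
        stem, dot, ext = name.rpartition(".")
        if not dot:
            raise ValueError(f"{name!r} has no file extension")
        # Assumption: the mask file is named '<stem>_mask.<ext>', as in the example.
        writer.writerow([f"{image_dir}/{stem}_mask.{ext}", f"{image_dir}/{name}"])
    return buffer.getvalue()


meta_text = build_segmentation_meta(["001.jpg"])
print(meta_text)
```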

Instance segmentation

Instance segmentation

FAQ

  1. In annotation file names and file contents, it is best to use only English letters, digits, and underscores. Avoid Chinese characters to prevent unexpected encoding issues.
  2. All coordinates in the annotation specification are relative position coordinates. As the accompanying figure shows, a point at pixel (X, Y) in an image 800 pixels wide and 600 pixels high has the coordinates (X/800, Y/600).
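The relative coordinate rule can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that 800 and 600 in the example are the image width and height in pixels.

```python
def to_relative(x, y, width, height):
    """Convert a pixel position to relative coordinates (x/width, y/height)."""
    return x / width, y / height


def to_pixels(rel_x, rel_y, width, height):
    """Convert relative coordinates back to a pixel position."""
    return rel_x * width, rel_y * height


# The centre of an 800 x 600 image.
print(to_relative(400, 300, 800, 600))  # (0.5, 0.5)
print(to_pixels(0.5, 0.5, 800, 600))    # (400.0, 300.0)
```

Relative coordinates stay valid when an image is resized, which is one likely reason for storing them instead of pixel positions.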