gem5art artifact package

This package contains the Artifact type and an artifact database for use with gem5art.

Please cite the gem5art paper when using the gem5art packages. This documentation can be found on the gem5 website.

gem5art artifacts

All unique objects used during gem5 experiments are termed “artifacts” in gem5art. Examples of artifacts include: gem5 binary, gem5 source code repo, Linux kernel source repo, Linux binary, disk image, and packer binary (used to build the disk image). The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the same set of artifacts when the experiment needs to be performed again in the future.

The description of an artifact serves as the documentation of how that artifact was created. One of the goals of gem5art is for these artifacts to be self-contained. With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact. (We are still working toward this goal. For instance, we are looking into using Docker to create artifacts, to separate artifact creation from the host platform it is run on.)

Each artifact is characterized by a set of attributes, described below:

  • command: command used to build this artifact
  • typ: type of the artifact e.g. binary, git repo etc.
  • name: name of the artifact
  • cwd: current working directory, where the command to build the artifact is run
  • path: actual path of the location of the artifact
  • inputs: a list of the artifacts used to build the current artifact
  • documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact

Additionally, each artifact also has the following implicit information.

  • hash: an MD5 hash for a binary artifact or a git hash for a git artifact
  • time: time of the creation of an artifact
  • id: a UUID associated with the artifact
  • git: a dictionary containing the origin, current commit and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)

These attributes are not specified by the user; gem5art generates them automatically when the Artifact object is created for the first time.
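To make the implicit fields concrete, here is a minimal sketch of how values like these could be derived for a file artifact using only the Python standard library. The helper names `md5_of_file` and `implicit_fields` are hypothetical and are not part of gem5art. The dictionary layout and the use of a Unix timestamp for `time` are assumptions for illustration only.

```python
import hashlib
import time
import uuid


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def implicit_fields(path):
    """Hypothetical helper: build the implicit metadata for a non-git file."""
    return {
        "hash": md5_of_file(path),  # MD5 hash for a binary artifact
        "time": time.time(),        # creation time (assumed: Unix timestamp)
        "id": uuid.uuid4(),         # freshly generated UUID
        "git": {},                  # empty for non-git artifacts
    }
```

For a git artifact, the hash would instead be the git commit hash and the `git` dictionary would hold the origin, current commit, and repo name.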

An example of how a user would create a gem5 binary artifact using gem5art is shown below. In this example, the type, name, and documentation are up to the user of gem5art. You're encouraged to use names that are easy to remember when you later query the database. The documentation attribute should be used to completely describe the artifact that you are saving.

gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path =  'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = '''
      Default gem5 binary compiled for the X86 ISA.
      This was built from the main gem5 repo (gem5.googlesource.com) without
      any modifications. We recently updated to the current gem5 master
      which has a fix for memory channel address striping.
    '''
)

Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through a centralized database. Whenever a user tries to create a new artifact, the database is searched to check whether the same artifact already exists. If it does, the user can download the matching artifact for use. Otherwise, the newly created artifact is uploaded to the database for later use. The database also helps avoid running identical experiments: gem5art generates an error message if a user tries to execute a run that already exists in the database.

Creating artifacts

To create an Artifact, you must use registerArtifact, as shown in the example above. This is a factory method that creates the artifact.

When you call registerArtifact, the artifact is automatically added to the database. If it already exists, a pointer to the existing artifact is returned.
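The register-or-reuse behavior described above can be sketched with a small in-memory stand-in for the database. This is not gem5art's implementation: the class `InMemoryRegistry` and its `register` method are hypothetical, and treating the hash as the key for detecting an existing artifact is an assumption made for illustration.

```python
import uuid


class InMemoryRegistry:
    """Hypothetical stand-in for the artifact database, keyed by content hash."""

    def __init__(self):
        self._by_hash = {}

    def register(self, name, content_hash):
        """Return the existing record for this hash, or create and store a new one."""
        existing = self._by_hash.get(content_hash)
        if existing is not None:
            # The artifact is already known: hand back the stored record.
            return existing
        record = {"_id": uuid.uuid4(), "name": name, "hash": content_hash}
        self._by_hash[content_hash] = record
        return record


registry = InMemoryRegistry()
first = registry.register("gem5", "abc123")
second = registry.register("gem5", "abc123")  # same hash, so no new record
```

In this sketch, `first` and `second` refer to the same record, which mirrors the idea that registering an already-stored artifact returns the stored one rather than creating a duplicate.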

The parameters to the registerArtifact function are meant for documentation, not as explicit directions to create the artifact from scratch. Support for rebuilding an artifact from these parameters may be added to gem5art in the future.

Note: When you create new artifacts, gem5art checks them for similarity against existing artifacts, and warning messages may appear saying that certain attributes (other than hash and id) of two artifacts don't match. Make sure you understand the reason for any such warning.

Using artifacts from the database

You can create an artifact with just a UUID if it is already stored in the database. The behavior will be the same as when creating an artifact that already exists. All of the properties of the artifact will be populated from the database.

ArtifactDB

The particular database used in this work is MongoDB. We use MongoDB because it can easily store large files (e.g., disk images), is tightly integrated with Python through pymongo, and has an interface that stays flexible as the needs of gem5art change.

Currently, a running database is required to use gem5art. However, we are planning to change this so that gem5art can be used standalone as well.

gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at mongodb://localhost:27017. You can use the environment variable GEM5ART_DB to specify the default database to connect to when running simple scripts. Additionally, you can specify the location of the database when calling getDBConnection in your scripts.

If no database exists, or if you want your own database, you can create one by making a new directory and running the MongoDB Docker image. See the MongoDB Docker documentation or the MongoDB documentation for more information.

`docker run -p 27017:27017 -v <absolute path to the created directory>:/data/db --name mongo-<some tag> -d mongo`

This uses the official MongoDB Docker image to run the database at the default port on the localhost. If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.

Connecting to an existing database

By default, gem5art will assume the database is running at mongodb://localhost:27017, which is MongoDB's default on the localhost.

The environment variable GEM5ART_DB can override this default.

Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the getDBConnection function.
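The lookup order described above can be sketched as a small function. The name `resolve_db_uri` is hypothetical and is not part of gem5art. The precedence shown here (explicit argument first, then GEM5ART_DB, then the built-in default) is an assumption based on the text, not a statement about gem5art's internals.

```python
import os

# Default taken from the text: MongoDB's default port on the localhost.
DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(explicit_uri=None, environ=os.environ):
    """Hypothetical helper: choose which database URI to connect to."""
    if explicit_uri:
        # A URI passed in by the script takes priority (assumed precedence).
        return explicit_uri
    # Otherwise fall back to GEM5ART_DB, then to the default.
    return environ.get("GEM5ART_DB", DEFAULT_URI)
```

Passing `environ` as a parameter keeps the function easy to exercise without modifying the real process environment.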

Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.

Searching the Database

gem5art provides a few convenience functions for searching and accessing the database. These functions can be found in artifact.common_queries.

Specifically, we provide the following functions:

  • getByName: Returns all objects matching name in the database.
  • getDiskImages: Returns a generator of disk images (type = disk image).
  • getLinuxBinaries: Returns a generator of Linux kernel binaries (type = kernel).
  • getgem5Binaries: Returns a generator of gem5 binaries (type = gem5 binary).

Downloading from the Database

You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell. You can search the database with the functions provided by the artifact module (e.g., getByName, getByType, etc.). Then, once you’ve found the ID of the artifact you’d like to download, you can call downloadFile. See the example below.

$ python
Python 3.6.8 (default, Oct  7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
    id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
ubuntu
    id: c54b8805-48d6-425d-ac81-9b1badba206e
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...

vmlinux-5.2.3
    id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
    id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')

For another example, assume there is a disk image named npb (containing NAS Parallel Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:

import gem5art.artifact

db = gem5art.artifact.getDBConnection()

disks = gem5art.artifact.getByName(db, 'npb')

for disk in disks:
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')

Here, we assume that there can be multiple disk images/artifacts with the name npb and that we are only interested in downloading the npb disk image with a particular documentation string (‘npb disk image created on Nov 20’). Also, note that there is more than one way to download files from the database, although all of them eventually use the downloadFile function.

The dual of the downloadFile method used above is upload.

Database schema

Alternatively, you can use the pymongo Python module or the MongoDB command-line interface to interact with the database. See the MongoDB documentation for more information on how to query the MongoDB database.

gem5art has two collections. artifact_database.artifacts stores all of the metadata for the artifacts and artifact_database.fs is a GridFS store for all of the files. The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.

You can list all of the details of all of the artifacts by running the following in Python.

#!/usr/bin/env python3

from pymongo import MongoClient

db = MongoClient().artifact_database
for i in db.artifacts.find():
    print(i)

gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:

import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getDiskImages(db):
    print(i)

Other similar methods include getLinuxBinaries() and getgem5Binaries().

You can use the getByName() method to search the database for artifacts by their name attribute. For example, to search for artifacts named gem5:

import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)

Artifacts API Documentation

Artifact Module
---------------
.. automodule:: gem5art.artifact
    :members:

Artifact
--------
.. automodule:: gem5art.artifact.artifact
    :members:
    :undoc-members:

Artifact
--------
.. autoclass:: gem5art.artifact.artifact.Artifact
    :members:
    :undoc-members:

Helper Functions for Common Queries
-----------------------------------
.. automodule:: gem5art.artifact.common_queries
    :members:
    :undoc-members:

ArtifactDB
----------
This is mostly internal.

.. automodule:: gem5art.artifact._artifactdb
    :members:
    :undoc-members: