layout: documentation
title: Artifacts
doc: gem5art
parent: main
permalink: /documentation/gem5art/main/artifacts
Authors:
All unique objects used during gem5 experiments are termed “artifacts” in gem5art. Examples of artifacts include the gem5 binary, the gem5 source code repo, the Linux kernel source repo, the Linux binary, the disk image, and the packer binary (used to build the disk image). The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the same set of artifacts when that experiment needs to be performed again in the future.
The description of an artifact serves as the documentation of how that artifact was created. One of the goals of gem5art is for these artifacts to be self-contained. With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact. (We are still working toward this goal. For instance, we are looking into using Docker to create artifacts, to separate artifact creation from the host platform it runs on.)
Each artifact is characterized by a set of attributes, described below:
Additionally, each artifact also has the following implicit information.
These attributes are not specified by the user, but are generated by gem5art automatically (when the `Artifact` object is created for the first time).
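As an illustration of the kind of information that can be derived without user input, the sketch below computes a content digest and a fresh UUID for a file. The function name `implicit_info`, the field names, and the choice of MD5 are assumptions made for this example; they are not taken from gem5art's implementation.

```python
import hashlib
import uuid
from pathlib import Path


def implicit_info(path):
    """Return a hypothetical {'hash', 'id'} record for the file at `path`."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # Read in chunks so large files (e.g., disk images) are not
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # The id is random, so it differs on every call; the hash depends
    # only on the file contents.
    return {"hash": digest.hexdigest(), "id": uuid.uuid4()}
```

Two files with identical contents would get the same `hash` but different `id` values in this sketch, which is one way a tool could recognize duplicates.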
An example of how a user would create a gem5 binary artifact using gem5art is shown below. In this example, the type, name, and documentation are up to the user of gem5art. You're encouraged to use names that are easy to remember when you later query the database. The documentation attribute should be used to completely describe the artifact that you are saving.
```python
from gem5art.artifact import Artifact

# gem5_repo is assumed to be an artifact registered earlier
# (the gem5 source code repo this binary was built from).
gem5_binary = Artifact.registerArtifact(
    command='scons build/X86/gem5.opt',
    typ='gem5 binary',
    name='gem5',
    cwd='gem5/',
    path='gem5/build/X86/gem5.opt',
    inputs=[gem5_repo],
    documentation='''
        Default gem5 binary compiled for the X86 ISA.
        This was built from the main gem5 repo (gem5.googlesource.com) without
        any modifications. We recently updated to the current gem5 master
        which has a fix for memory channel address striping.
    '''
)
```
Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through a centralized database. Whenever a user tries to create a new artifact, the database is searched to check whether the same artifact already exists. If it does, the user can download the matching artifact for use. Otherwise, the newly created artifact is uploaded to the database for later use. Using a database also avoids running identical experiments: gem5art generates an error message if a user tries to execute a run that is identical to one already in the database.
To create an `Artifact`, you must use `registerArtifact`, as shown in the example above. This is a factory method which will initially create the artifact. When you call `registerArtifact`, the artifact will automatically be added to the database. If it already exists, a pointer to that artifact will be returned.
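The register-or-reuse behavior of such a factory can be sketched with an in-memory dictionary standing in for the database. Everything here (`InMemoryStore`, `register`, keying records by a content hash) is hypothetical and only illustrates the pattern, not gem5art's internals.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for the artifact database."""

    def __init__(self):
        self._by_hash = {}

    def register(self, name, content_hash):
        """Return the stored record for `content_hash`, creating it if needed."""
        existing = self._by_hash.get(content_hash)
        if existing is not None:
            # The same artifact was registered before: hand back that record.
            return existing
        record = {"_id": uuid.uuid4(), "name": name, "hash": content_hash}
        self._by_hash[content_hash] = record
        return record


store = InMemoryStore()
first = store.register("gem5", "abc123")
second = store.register("gem5", "abc123")  # same object as `first`
```

Registering the same content twice yields one record, so every caller ends up referring to the same artifact.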
The parameters to the `registerArtifact` function are meant for documentation, not as explicit directions to create the artifact from scratch. In the future, this feature may be added to gem5art.
Note: While creating new artifacts, you might see warning messages showing that certain attributes (other than hash and id) of two artifacts don't match. These appear when artifact similarity is checked in the code. Make sure you understand the reason for any such warning.
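To make the kind of comparison behind such warnings concrete, here is a minimal sketch that reports which attributes differ between two artifact records while ignoring the hash and id. The function `mismatched_attributes` and the dictionary representation are assumptions for illustration; gem5art's own check may work differently.

```python
def mismatched_attributes(a, b, ignore=("hash", "_id")):
    """Return sorted names of attributes whose values differ between a and b."""
    keys = (set(a) | set(b)) - set(ignore)
    return sorted(k for k in keys if a.get(k) != b.get(k))


# Two hypothetical records for the same file, registered from different places.
old = {"_id": 1, "hash": "abc", "name": "gem5", "cwd": "gem5/"}
new = {"_id": 2, "hash": "abc", "name": "gem5", "cwd": "gem5-new/"}

for attribute in mismatched_attributes(old, new):
    print(f"warning: attribute '{attribute}' does not match")
```

Here only `cwd` is reported, since `name` matches and the hash and id are excluded from the comparison.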
You can create an artifact with just a UUID if it is already stored in the database. The behavior will be the same as when creating an artifact that already exists. All of the properties of the artifact will be populated from the database.
The particular database used in this work is MongoDB. We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through pymongo, and has an interface that is flexible as the needs of gem5art change.
Currently, you must run a database to use gem5art. However, we plan to change this so that gem5art can also be used standalone.
gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at `mongodb://localhost:27017`. You can use the environment variable `GEM5ART_DB` to specify the default database to connect to when running simple scripts, e.g., `GEM5ART_DB=mongodb://<remote>:27017`. Additionally, you can specify the location of the database when calling `getDBConnection` in your scripts.
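The lookup order described above can be sketched as a small helper. The function `resolve_db_uri` is hypothetical, and the precedence shown (explicit argument first, then `GEM5ART_DB`, then the localhost default) is an assumption about how these options combine rather than a statement of gem5art's actual logic.

```python
import os

DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(explicit_uri=None, environ=None):
    """Pick a URI: explicit argument, then GEM5ART_DB, then the default."""
    # Allow a custom mapping to be passed in so the function is easy to test.
    environ = os.environ if environ is None else environ
    if explicit_uri:
        return explicit_uri
    return environ.get("GEM5ART_DB", DEFAULT_URI)
```

A script could then pass the result to `getDBConnection`, so that the same code works both on a laptop and against a shared remote database.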
If no database exists, or if you want your own database, you can create one by making a new directory and running the MongoDB Docker image. See the MongoDB Docker documentation or the MongoDB documentation for more information.
```sh
docker run -p 27017:27017 -v <absolute path to the created directory>:/data/db --name mongo-<some tag> -d mongo
```
This uses the official MongoDB Docker image to run the database at the default port on the localhost. If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.
By default, gem5art will assume the database is running at `mongodb://localhost:27017`, which is MongoDB's default on the localhost.
The environment variable `GEM5ART_DB` can override this default.
Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the `getDBConnection` function.
Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.
gem5art provides a few convenience functions for searching and accessing the database. These functions are available from the `gem5art.artifact` module.
Specifically, we provide the following functions:
- `getByName`: Returns all objects matching the given name.
- `getDiskImages`: Returns a generator of disk images (type = `disk image`).
- `getLinuxBinaries`: Returns a generator of Linux kernel binaries (type = `kernel`).
- `getgem5Binaries`: Returns a generator of gem5 binaries (type = `gem5 binary`).
You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell. You can search the database with the functions provided by the `artifact` module (e.g., `getByType`, etc.). Then, once you've found the ID of the artifact you'd like to download, you can call `downloadFile`. See the example below.
```sh
$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
    id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
ubuntu
    id: c54b8805-48d6-425d-ac81-9b1badba206e
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...
vmlinux-5.2.3
    id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
    id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')
```
For another example, assume there is a disk image named `npb` (containing the NAS Parallel Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:
```python
import gem5art.artifact

db = gem5art.artifact.getDBConnection()

disks = gem5art.artifact.getByName(db, 'npb')

for disk in disks:
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')
```
Here, we assume that there can be multiple disk images/artifacts with the name `npb` and we are only interested in downloading the npb disk image with a particular documentation ('npb disk image created on Nov 20'). Also, note that there are other ways to download files from the database (although they will eventually use the `downloadFile` function).
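The selection step in the example above can be factored into a small reusable filter. This is a sketch only: `select_artifacts` is a hypothetical helper, and the stand-in objects below merely mimic the `type` and `documentation` attributes used in the example so that the snippet runs without a database.

```python
from types import SimpleNamespace


def select_artifacts(artifacts, typ, documentation):
    """Yield artifacts whose `type` and `documentation` both match exactly."""
    for artifact in artifacts:
        if artifact.type == typ and artifact.documentation == documentation:
            yield artifact


# Stand-in objects; with gem5art these would come from getByName(db, 'npb').
candidates = [
    SimpleNamespace(type="disk image", documentation="npb disk image created on Nov 20"),
    SimpleNamespace(type="disk image", documentation="npb disk image created on Dec 1"),
    SimpleNamespace(type="kernel", documentation="npb disk image created on Nov 20"),
]
matches = list(
    select_artifacts(candidates, "disk image", "npb disk image created on Nov 20")
)
```

Matching on the exact documentation string is brittle; a unique ID is a more reliable way to pick one artifact once you know which one you want.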
The dual of the `downloadFile` method used above is
Alternatively, you can use the pymongo Python module or the MongoDB command line interface to interact with the database. See the MongoDB documentation for more information on how to query the MongoDB database.
gem5art has two collections. `artifact_database.artifacts` stores all of the metadata for the artifacts and `artifact_database.fs` is a GridFS store for all of the files. The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.
You can list all of the details of all of the artifacts by running the following in Python.
```python
#!/usr/bin/env python3

from pymongo import MongoClient

db = MongoClient().artifact_database
for i in db.artifacts.find():
    print(i)
```
gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:
```python
import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getDiskImages(db):
    print(i)
```
Other similar methods include `getLinuxBinaries` and `getgem5Binaries`.
You can use the `getByName()` method to search the database for artifacts by their name attribute. For example, to search for artifacts named `gem5`:
```python
import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)
```