---
layout: documentation
title: Artifacts
doc: gem5art
parent: main
permalink: /documentation/gem5art/main/artifacts
Authors:
- Ayaz Akram
- Jason Lowe-Power
---
# Artifacts
## gem5art artifacts
All unique objects used during gem5 experiments are termed "artifacts" in gem5art.
Examples of artifacts include: gem5 binary, gem5 source code repo, Linux kernel source repo, Linux binary, disk image, and packer binary (used to build the disk image).
The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the set of used artifacts when the same experiment needs to be performed in the future.
The description of an artifact serves as the documentation of how that artifact was created.
One of the goals of gem5art is for these artifacts to be self-contained.
With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact.
(We are still working toward this goal.
For instance, we are looking into using Docker to create artifacts, which would separate artifact creation from the host platform it runs on.)
Each artifact is characterized by a set of attributes, described below:
- command: command used to build this artifact
- typ: type of the artifact (e.g., binary, git repo)
- name: name of the artifact
- cwd: current working directory, where the command to build the artifact is run
- path: actual path of the location of the artifact
- inputs: a list of the artifacts used to build the current artifact
- documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact
Additionally, each artifact has the following implicit information:
- hash: an MD5 hash for a binary artifact or a git hash for a git artifact
- time: time of the creation of an artifact
- id: a UUID associated with the artifact
- git: a dictionary containing the origin, current commit and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)
These attributes are not specified by the user but are generated automatically by gem5art when the `Artifact` object is created for the first time.
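To make the implicit fields concrete, here is a minimal sketch of how such values could be derived for a binary (non-git) artifact using only the Python standard library. The function name `implicit_fields` and the dictionary layout are assumptions made for illustration; gem5art's own implementation may compute or store these values differently.

```python
import hashlib
import time
import uuid
from pathlib import Path


def implicit_fields(path: Path) -> dict:
    """Illustrative only: derive hash, time, id, and git for a file."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        # Read in chunks so large files (e.g., disk images) are not
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return {
        "hash": md5.hexdigest(),  # MD5 hash of the file contents
        "time": time.time(),      # creation time of this record
        "id": uuid.uuid4(),       # fresh UUID for the artifact
        "git": {},                # empty for non-git artifacts
    }
```

For a git artifact, the hash would instead be the repository's current commit hash, and the `git` dictionary would hold the origin, commit, and repo name described above.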
An example of how a user would create a gem5 binary artifact using gem5art is shown below.
In this example, the type, name, and documentation are up to the user of gem5art.
You're encouraged to use names that are easy to remember when you later query the database.
The documentation attribute should be used to completely describe the artifact that you are saving.
```python
gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = '''
        Default gem5 binary compiled for the X86 ISA.
        This was built from the main gem5 repo (github.com/gem5/gem5) without
        any modifications. We recently updated to the current gem5 master
        which has a fix for memory channel address striping.
    '''
)
```
Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through the use of a centralized database.
Whenever a user tries to create a new artifact, gem5art searches the database to check whether the same artifact already exists there.
If it does, the user can download the matching artifact for use.
Otherwise, the newly created artifact is uploaded to the database for later use.
Using the database also avoids running identical experiments: gem5art generates an error message if a user tries to execute a run that already exists in the database.
### Creating artifacts
To create an `Artifact`, you must use `registerArtifact`, as shown in the example above.
This is a factory method that creates the artifact the first time it is registered.
Calling `registerArtifact` automatically adds the artifact to the database.
If the artifact already exists, a pointer to the existing artifact is returned.
The parameters to the `registerArtifact` function are meant for *documentation*, not as explicit directions to create the artifact from scratch.
Support for recreating artifacts from these parameters may be added to gem5art in the future.
Note: While creating new artifacts, you might see warning messages indicating that certain attributes (other than hash and id) of two artifacts don't match. These appear when gem5art checks artifact similarity. Make sure that you understand the reason for any such warning.
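The register-or-reuse behavior and the mismatch warnings described above can be sketched with an in-memory stand-in for the database. Everything here is hypothetical: `Record`, `InMemoryRegistry`, and the assumption that artifacts are matched by hash are illustrative choices, not gem5art's actual classes or matching logic.

```python
import uuid
import warnings
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    """Hypothetical stand-in for an artifact's metadata."""
    name: str
    typ: str
    hash: str
    documentation: str = ""
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class InMemoryRegistry:
    """Hypothetical stand-in for the artifact database, keyed by hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Record] = {}

    def register(self, candidate: Record) -> Record:
        existing = self._by_hash.get(candidate.hash)
        if existing is None:
            # First time this hash is seen: store the new record.
            self._by_hash[candidate.hash] = candidate
            return candidate
        # Same hash: reuse the stored record, but warn about any other
        # attribute (hash and id excluded) that differs.
        for attr in ("name", "typ", "documentation"):
            old, new = getattr(existing, attr), getattr(candidate, attr)
            if old != new:
                warnings.warn(f"{attr} differs: {old!r} != {new!r}")
        return existing
```

Registering two records with the same hash returns the first one both times. A differing `name` or `documentation` only produces a warning, which is why such warnings deserve a second look.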
### Using artifacts from the database
You can create an artifact with just a UUID if it is already stored in the database.
The behavior will be the same as when creating an artifact that already exists.
All of the properties of the artifact will be populated from the database.
## ArtifactDB
The particular database used in this work is [MongoDB](https://www.mongodb.com/).
We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through [pymongo](https://api.mongodb.com/python/current/), and has an interface that is flexible enough to adapt as the needs of gem5art change.
Currently, it's required to run a database to use gem5art.
However, we are planning on changing this default to allow gem5art to be used standalone as well.
gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at `mongodb://localhost:27017`.
You can use the environment variable `GEM5ART_DB` to specify the default database to connect to when running simple scripts, e.g. `GEM5ART_DB=mongodb://<remote>:27017`.
Additionally, you can specify the location of the database when calling `getDBConnection` in your scripts.
If no database exists, or if you want your own database, you can create one by creating a new directory and running the MongoDB Docker image.
See the [MongoDB docker documentation](https://hub.docker.com/_/mongo) or the [MongoDB documentation](https://docs.mongodb.com/) for more information.
```sh
docker run -p 27017:27017 -v <absolute path to the created directory>:/data/db --name mongo-<some tag> -d mongo
```
This uses the official [MongoDB Docker image](https://hub.docker.com/_/mongo) to run the database at the default port on the localhost.
If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.
### Connecting to an existing database
By default, gem5art will assume the database is running at `mongodb://localhost:27017`, which is MongoDB's default on the localhost.
The environment variable `GEM5ART_DB` can override this default.
Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the `getDBConnection` function.
Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.
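The lookup order described in this section can be sketched as a small helper. The function `resolve_db_uri` is hypothetical and not part of gem5art. It also assumes that an explicitly passed URI takes precedence over `GEM5ART_DB`, which the text above implies but does not state outright.

```python
import os
from typing import Mapping, Optional

# Default from the text above: MongoDB's default port on the localhost.
DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(
    uri: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the database URI: explicit argument, then GEM5ART_DB, then default."""
    if uri:
        return uri
    return env.get("GEM5ART_DB", DEFAULT_URI)
```

The `env` parameter exists only so that the sketch can be exercised without modifying the real process environment.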
### Searching the Database
gem5art provides a few convenience functions for searching and accessing the database.
These functions can be found in `artifact.common_queries`.
Specifically, we provide the following functions:
- `getByName`: Returns all objects matching `name` in the database.
- `getDiskImages`: Returns a generator of disk images (type = disk image).
- `getLinuxBinaries`: Returns a generator of Linux kernel binaries (type = kernel).
- `getgem5Binaries`: Returns a generator of gem5 binaries (type = gem5 binary).
### Downloading from the Database
You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell.
You can search the database with the functions provided by the `artifact` module (e.g., `getByName`, `getByType`, etc.).
Then, once you've found the ID of the artifact you'd like to download, you can call `downloadFile`.
See the example below.
```sh
$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
ubuntu
id: c54b8805-48d6-425d-ac81-9b1badba206e
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...
vmlinux-5.2.3
id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')
```
For another example, assume there is a disk image named `npb` (containing [NAS Parallel](https://www.nas.nasa.gov/) Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection()
disks = gem5art.artifact.getByName(db, 'npb')
for disk in disks:
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')
```
Here, we assume that there can be multiple disk images/artifacts with the name `npb` and we are only interested in downloading the npb disk image with a particular documentation ('npb disk image created on Nov 20'). Also, note that there are other ways to download files from the database (although they will eventually use the `downloadFile` function).
The dual of the `downloadFile` method used above is `upload`.
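When several artifacts share a name, it can help to fail loudly if the filter does not match exactly one of them. The helper below is a sketch of that idea. `select_unique` is a hypothetical function, and it only assumes that each artifact exposes `type` and `documentation` attributes, as in the `npb` example above.

```python
from typing import Any, Iterable


def select_unique(artifacts: Iterable[Any], typ: str, documentation: str) -> Any:
    """Return the single artifact matching both fields, or raise LookupError."""
    matches = [
        a for a in artifacts
        if a.type == typ and a.documentation == documentation
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]
```

With the `disks` generator from the `npb` example, the download would then be `db.downloadFile(select_unique(disks, 'disk image', 'npb disk image created on Nov 20')._id, 'npb')`.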
#### Database schema
Alternatively, you can use the pymongo Python module or the MongoDB command-line interface to interact with the database.
See the [MongoDB documentation](https://docs.mongodb.com/) for more information on how to query the MongoDB database.
gem5art has two collections.
`artifact_database.artifacts` stores all of the metadata for the artifacts and `artifact_database.fs` is a [GridFS](https://docs.mongodb.com/manual/core/gridfs/) store for all of the files.
The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.
You can list all of the details of all of the artifacts by running the following in Python.
```python
#!/usr/bin/env python3
from pymongo import MongoClient
db = MongoClient().artifact_database
for i in db.artifacts.find():
    print(i)
```
gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getDiskImages(db):
    print(i)
```
Other similar methods include `getLinuxBinaries()` and `getgem5Binaries()`.
You can use the `getByName()` method to search the database for artifacts by their name attribute. For example, to search for artifacts named `gem5`:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)
```