
# Using dfDewey

```shell
usage: dfdewey [-h] [-c CONFIG] [--no_base64] [--no_gzip] [--no_zip] [--reparse] [--reindex] [--delete] [--highlight] [-s SEARCH] [--search_list SEARCH_LIST] case [image]

positional arguments:
  case                  case ID
  image                 image file (default: 'all')

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        datastore config file
  --no_base64           don't decode base64
  --no_gzip             don't decompress gzip
  --no_zip              don't decompress zip
  --reparse             reparse filesystem (will delete existing filesystem mapping)
  --reindex             recreate index (will delete existing index)
  --delete              delete image (filesystem mapping and index)
  --highlight           highlight search term in results
  -s SEARCH, --search SEARCH
                        search query
  --search_list SEARCH_LIST
                        file with search queries
```

## Docker

If using OpenSearch and PostgreSQL in Docker, they can be started using
[docker-compose](https://docs.docker.com/compose/install/) from the `docker`
folder.

```shell
docker-compose up -d
```

Note: Java memory for OpenSearch is set high to improve performance when
indexing large volumes of data. If running on a system with limited resources,
you can change the setting in `docker/docker-compose.yml`.

To shut the containers down again (and purge the data), run:

```shell
docker-compose down
```

### Running dfDewey in Docker

The `docker` folder also contains a `Dockerfile` to build dfDewey and its
dependencies into a Docker image.

To build the image (must be run from the root of the repo):

```shell
docker build -t <docker_name> -f ./docker/Dockerfile .
```

When running dfDewey within a Docker container, the container needs access to
the host network so that it can reach OpenSearch and PostgreSQL in their
respective containers. A host folder also needs to be mounted into the
container to give dfDewey access to the image to be processed. For example:

```shell
docker run --network=host -v ~/images/:/mnt/images <docker_name> dfdewey -h
```
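
As a rough sketch, the `docker run` invocation above could be wrapped in a
small shell function so that it does not have to be retyped. The names
`dfdewey_docker`, `DFDEWEY_IMAGE`, and `HOST_IMAGE_DIR` are hypothetical and
not part of dfDewey; set `DFDEWEY_IMAGE` to whatever `<docker_name>` was used
when building the image. Image paths passed to dfDewey are resolved inside the
container, so they should start with `/mnt/images` rather than the host path.

```shell
# Assumption: the image was built with the tag "dfdewey"; override as needed.
DFDEWEY_IMAGE="${DFDEWEY_IMAGE:-dfdewey}"
# Host folder holding the disk images (mounted at /mnt/images in the container).
HOST_IMAGE_DIR="${HOST_IMAGE_DIR:-$HOME/images}"

# Hypothetical wrapper: forwards all arguments to dfdewey inside the container.
dfdewey_docker() {
  docker run --network=host \
    -v "$HOST_IMAGE_DIR:/mnt/images" \
    "$DFDEWEY_IMAGE" dfdewey "$@"
}
```

With the defaults above, `dfdewey_docker testcase /mnt/images/image.dd` would
process the host file `~/images/image.dd`.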

## Processing an Image

To process an image in dfDewey, you need to supply a `CASE` and `IMAGE`.

```shell
dfdewey testcase /path/to/image.dd
```

By default, dfDewey has bulk_extractor decode base64 data and decompress gzip
and zip data. Each of these can be disabled with the corresponding flag:
`--no_base64`, `--no_gzip`, or `--no_zip`.

If an image has already been processed, you can reparse the filesystem with
`--reparse` and recreate the index with `--reindex`. Each flag first deletes
the corresponding existing data (the filesystem mapping or the index).

You can also delete the data for a given image from the datastores by adding
the `--delete` flag.
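
When a case contains several images, the commands above can be repeated in a
loop. The following is a minimal sketch, assuming the images share a `.dd`
extension and sit in one folder; `process_images` is a hypothetical helper and
not part of dfDewey. Any extra arguments (for example `--reparse --reindex`)
are passed through to each `dfdewey` call unchanged.

```shell
# Hypothetical helper: process_images CASE IMAGE_DIR [DFDEWEY_FLAGS...]
process_images() {
  case_id="$1"
  image_dir="$2"
  shift 2
  for image in "$image_dir"/*.dd; do
    # Skip the unexpanded pattern when the folder has no .dd files.
    [ -e "$image" ] || continue
    dfdewey "$case_id" "$image" "$@"
  done
}
```

For example, `process_images testcase /path/to/images --reparse --reindex`
would reprocess every `.dd` image in that folder under the case `testcase`.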

## Searching

To search the index for a single image, you need to supply a `CASE`, `IMAGE`,
and `SEARCH`.

```shell
dfdewey testcase /path/to/image.dd -s 'foo'
```

If an `IMAGE` is not provided, dfDewey will search all images in the given case.

dfDewey can also search for a list of terms at once. Place the terms in a text
file, one per line, and pass the file with `--search_list`. In this case, only
the number of results for each term is returned.

```shell
dfdewey testcase /path/to/image.dd --search_list search_terms.txt
```
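
As an illustration, a terms file can be created with a here-document. Because
`--search_list` reports only counts, one possible follow-up is to run a
regular `-s` search for each term to see the full results. This is only a
sketch: the terms are placeholders and `search_each_term` is a hypothetical
helper, not part of dfDewey.

```shell
# Write one search term per line (placeholder terms).
cat > search_terms.txt <<'EOF'
foo
bar
baz
EOF

# Hypothetical helper: search_each_term CASE IMAGE TERMS_FILE
# Runs a full `-s` search for every non-empty line in TERMS_FILE.
search_each_term() {
  case_id="$1"
  image="$2"
  terms_file="$3"
  while IFS= read -r term || [ -n "$term" ]; do
    [ -n "$term" ] || continue
    # Redirect stdin so the command cannot consume the rest of the file.
    dfdewey "$case_id" "$image" -s "$term" < /dev/null
  done < "$terms_file"
}
```

For example: `search_each_term testcase /path/to/image.dd search_terms.txt`.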