---
title: "From R to CFF"
subtitle: "Crosswalk"
description: >
  A comprehensive description of the internal mappings performed by the **cffr**
  package.
author: Diego Hernangómez
bibliography: REFERENCES.bib
link-citations: true
toc: true
vignette: >
  %\VignetteIndexEntry{From R to CFF}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
tbl-cap-location: bottom
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    code-fold: true
    code-summary: "**Example**"
---

```{r}
#| include: false
library(cffr)
```

The goal of this vignette is to provide an explicit map between the metadata
fields used by **cffr** and each valid key in the [Citation File Format schema
version
1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#valid-keys).

## Summary {#summary}

This section summarizes the fields that **cffr** can coerce and the original
source of information for each one. Details for each key are presented in the
next section. The assessment of fields is based on the [Guide to Citation File
Format schema version
1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#valid-keys)
[@druskat_citation_2021].

```{r}
#| label: tbl-summary
#| echo: false
#| tbl-cap: "cffr and R packages: Conversion table"
keys <- cff_schema_keys(sorted = TRUE)
origin <- vector(length = length(keys))
origin[keys == "cff-version"] <- "function argument"
origin[keys == "type"] <- "Fixed value: 'software'"
origin[keys == "identifiers"] <- "DESCRIPTION/CITATION files"
origin[keys == "references"] <- "DESCRIPTION/CITATION files"

origin[
  keys %in%
    c(
      "message",
      "title",
      "version",
      "authors",
      "abstract",
      "repository",
      "repository-code",
      "url",
      "date-released",
      "contact",
      "keywords",
      "license",
      "commit"
    )
] <- "DESCRIPTION file"

origin[
  keys %in%
    c(
      "doi",
      "preferred-citation"
    )
] <- "CITATION file"

origin[origin == "FALSE"] <- "Ignored by cffr"

df <- data.frame(
  key = paste0("<a href='#", keys, "'>", keys, "</a>"),
  source = origin
)

knitr::kable(df, escape = FALSE)
```

## Details

### abstract

This key is extracted from the `"Description"` field of the `DESCRIPTION` file.

```{r}
#| label: abstract
library(cffr)

# Create a `cff` object for YAML.

cff_obj <- cff_create("rmarkdown")

# Get the DESCRIPTION of **rmarkdown** to check.

pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION"))

cat(cff_obj$abstract)

cat(pkg$get("Description"))
```

Back to @tbl-summary.

### authors

This key is coerced from the `"Authors"` or `"Authors@R"` field of the
`DESCRIPTION` file. By default, persons with the role `"aut"` or `"cre"` are
considered, but this can be modified with the `authors_roles` argument.

```{r}
#| label: authors
# An example DESCRIPTION.
path <- system.file("examples/DESCRIPTION_many_persons", package = "cffr")
pkg <- desc::desc(path)

# See the listed persons.
pkg$get_authors()

# Default behavior: use authors and creators (maintainers).
cff_obj <- cff_create(path)
cff_obj$authors

# Use copyright holders and maintainers.
cff_obj_alt <- cff_create(path, authors_roles = c("cre", "cph"))
cff_obj_alt$authors
```

Back to @tbl-summary.

### cff-version

This key can be set with the arguments of the `cff_create()`/`cff_write()`
functions:

```{r}
#| label: cffversion
cff_objv110 <- cff_create("jsonlite", cff_version = "v1.1.0")

cat(cff_objv110$`cff-version`)
```

Back to @tbl-summary.

### commit

This key is extracted from the `"RemoteSha"` field of the `DESCRIPTION` file.
This is the case of packages installed using the
[r-universe](https://r-universe.dev/search) or packages such as **remotes** or
**pak**.

```{r}
#| label: commit
# An example DESCRIPTION.
path <- system.file("examples/DESCRIPTION_r_universe", package = "cffr")
pkg <- desc::desc(path)

# See `RemoteSha`.
pkg$get("RemoteSha")

cff_read(path)
```

Back to @tbl-summary.

### contact

This key is coerced from the `"Authors"` or `"Authors@R"` field of the
`DESCRIPTION` file. Only persons with the role `"cre"` (that is, maintainers)
are considered.

```{r}
#| label: contact
cff_obj <- cff_create("rmarkdown")
pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION"))

cff_obj$contact

pkg$get_author()
```

Back to @tbl-summary.

### date-released

This key is extracted from the `DESCRIPTION` file following this logic:

- The `"Date"` field.
- If not present, `"Date/Publication"`. This field is present in packages built
  on **CRAN** and **Bioconductor**.
- If not present, `"Packaged"`, which is present in packages built by the
  [r-universe](https://r-universe.dev/search).

```{r}
#| label: date-released
# From an installed package.

cff_obj <- cff_create("rmarkdown")
pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION"))

cat(pkg$get("Date/Publication"))

cat(cff_obj$`date-released`)

# A DESCRIPTION file without a `Date`.
nodate <- system.file("examples/DESCRIPTION_basic", package = "cffr")
tmp <- tempfile("DESCRIPTION")

# Create a temporary file.
file.copy(nodate, tmp)

pkgnodate <- desc::desc(tmp)
cffnodate <- cff_create(tmp)

# This will not appear.
cat(cffnodate$`date-released`)

pkgnodate

# Add a `Date`.

desc::desc_set("Date", "1999-01-01", file = tmp)

cat(cff_create(tmp)$`date-released`)
```

Back to @tbl-summary.

### doi {#doi}

This key is coerced from the `"doi"` field of the
[preferred-citation](#preferred-citation) object. If not present and the package
is on **CRAN**, it is populated with the DOI provided by **CRAN** (e.g.
<https://doi.org/10.32614/CRAN.package.cffr>).

```{r}
#| label: doi
cff_doi <- cff_create("cffr")

cat(cff_doi$doi)

cat(cff_doi$`preferred-citation`$doi)
```

Back to @tbl-summary.

### identifiers

This key includes all the possible identifiers of the package:

- From the `DESCRIPTION` field, it includes all the URLs not included in
  [url](#url) or [repository-code](#repository-code).
- From the `CITATION` file, it includes all the DOIs not included in [doi](#doi)
  and the identifiers (if any) not included in the `"identifiers"` key of
  [preferred-citation](#preferred-citation).
- If the package is on **CRAN** and it has a `CITATION` file providing a DOI,
  the DOI provided by **CRAN** is added as well.

```{r}
#| label: identifiers
file <- system.file("examples/DESCRIPTION_many_urls", package = "cffr")

pkg <- desc::desc(file)

cat(pkg$get_urls())

cat(cff_create(file)$url)

cat(cff_create(file)$`repository-code`)

cff_create(file)$identifiers
```

Back to @tbl-summary.

### keywords

This key is extracted from the `DESCRIPTION` file. The keywords should appear in
the `DESCRIPTION` as:

```         
...
X-schema.org-keywords: keyword1, keyword2, keyword3
```

```{r}
#| label: keyword
# A DESCRIPTION file without keywords
nokeywords <- system.file("examples/DESCRIPTION_basic", package = "cffr")
tmp2 <- tempfile("DESCRIPTION")

# Create a temporary file.
file.copy(nokeywords, tmp2)

pkgnokeywords <- desc::desc(tmp2)
cffnokeywords <- cff_create(tmp2)

# This will not appear.
cat(cffnokeywords$keywords)

pkgnokeywords

# Adding keywords.

desc::desc_set(
  "X-schema.org-keywords",
  "keyword1, keyword2, keyword3",
  file = tmp2
)

cat(cff_create(tmp2)$keywords)
```

Additionally, if the source code of the package is hosted on GitHub, **cffr**
can retrieve the topics of the repository with the [GitHub
API](https://docs.github.com/en/rest) and include those topics as keywords. This
option is controlled with the `gh_keywords` argument:

```{r}
#| label: ghkeyword
# Get a `cff` object from **jsonvalidate**.

jsonval <- cff_create("jsonvalidate")

# Keywords are retrieved from the GitHub repository.

jsonval

# Check keywords.
jsonval$keywords

# The repository.
jsonval$`repository-code`
```

Back to @tbl-summary.

### license

This key is extracted from the `"License"` field of the `DESCRIPTION` file.

```{r}
#| label: license
cff_obj <- cff_create("yaml")

cat(cff_obj$license)

pkg <- desc::desc(file.path(find.package("yaml"), "DESCRIPTION"))

cat(pkg$get("License"))
```

Back to @tbl-summary.

### license-url

This key is not extracted from the metadata of the package. See the description
on the [Guide to CFF schema
v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#license-url).

> - **description**: The URL of the license text under which the software or
>   dataset is licensed (only for non-standard licenses not included in the
>   [SPDX License
>   List](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#definitionslicense-enum)).
>
> - **usage**:
>
>   ``` yaml
>   license-url: "https://obscure-licenses.com?id=1234"
>   ```

Back to @tbl-summary.

### message

This key is extracted from the `DESCRIPTION` field, specifically as:

```{r}
#| eval: false
#| code-fold: false
msg <- paste0(
  'To cite package "',
  "NAME_OF_THE_PACKAGE",
  '" in publications use:'
)
```

```{r}
#| label: message
cat(cff_create("jsonlite")$message)
```

Back to @tbl-summary.

### preferred-citation {#preferred-citation}

This key is extracted from the `CITATION` file. If several references are
provided, it selects the first citation as the `"preferred-citation"` and the
rest of them as [references](#references).

```{r}
#| label: preferred-citation
cffobj <- cff_create("rmarkdown")

cffobj$`preferred-citation`

citation("rmarkdown")[1]
```

Back to @tbl-summary.

### references {#references}

This key is extracted from the `CITATION` file if several references are
provided. The first citation is considered as the
[preferred-citation](#preferred-citation) and the rest of them as
`"references"`. It also extracts package dependencies and adds them to this
field using `citation(auto = TRUE)` on each dependency.

```{r}
#| label: references
cffobj <- cff_create("rmarkdown")

cffobj$references

citation("rmarkdown")[-1]
```

Back to @tbl-summary.

### repository

This key is extracted from the `"Repository"` field of the `DESCRIPTION` file.
Usually, this field is automatically populated when a package is hosted on a
repository (like **CRAN** or the [r-universe](https://r-universe.dev/search)).
For packages without this field in the `DESCRIPTION` (which is the typical case
for an in-development package), **cffr** tries to find the package in any of the
default repositories specified in `options("repos")`.

In the case of [Bioconductor](https://bioconductor.org/) packages, those are
identified if a
["biocViews"](https://contributions.bioconductor.org/description.html#biocviews)
is present in the `DESCRIPTION` file.

If **cffr** detects that the package is available on **CRAN**, it returns the
canonical URL form of the package (that is,
<https://CRAN.R-project.org/package=jsonlite>).

```{r}
#| label: repository
# Installed package.

inst <- cff_create("jsonlite")

cat(inst$repository)

# Demo file downloaded from the r-universe.

runiv <- system.file("examples/DESCRIPTION_r_universe", package = "cffr")
runiv_cff <- cff_create(runiv)

cat(runiv_cff$repository)

desc::desc(runiv)$get("Repository")

# For an in-development package.

norepo <- system.file("examples/DESCRIPTION_basic", package = "cffr")

# No repository.
norepo_cff <- cff_create(norepo)

cat(norepo_cff[["repository"]])

# Change the name to a known package on CRAN: ggplot2.

tmp <- tempfile("DESCRIPTION")
file.copy(norepo, tmp)

# Change name.
desc::desc_set("Package", "ggplot2", file = tmp)

cat(cff_create(tmp)[["repository"]])
```

Back to @tbl-summary.

### repository-artifact

This key is not extracted from the metadata of the package. See the description
on the [Guide to CFF schema
v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#repository-artifact).

> - **description**: The URL of the work in a build artifact/binary repository
>   (when the work is software).
>
> - **usage**:
>
>   ``` yaml
>   repository-artifact: "https://search.maven.org/artifact/org.corpus-tools/cff-maven-plugin/0.4.0/maven-plugin"
>   ```

Back to @tbl-summary.

### repository-code {#repository-code}

This key is extracted from the `"BugReports"` or `"URL"` fields on the
`DESCRIPTION` file. **cffr** tries to identify the URL of the source on the
following repositories:

- [GitHub](https://github.com/).
- [GitLab](https://about.gitlab.com/).
- [R-Forge](https://r-forge.r-project.org/).
- [Bitbucket](https://bitbucket.org/).
- [Codeberg](https://codeberg.org/).

```{r}
#| label: repository-code
# Installed package on GitHub.

cat(cff_create("jsonlite")$`repository-code`)

# GitLab

gitlab <- system.file("examples/DESCRIPTION_gitlab", package = "cffr")

cat(cff_create(gitlab)$`repository-code`)

# Codeberg
codeberg <- system.file("examples/DESCRIPTION_codeberg", package = "cffr")

cat(cff_create(codeberg)$`repository-code`)
```

Back to @tbl-summary.

### title

This key is extracted from the `"Description"` field of the `DESCRIPTION` file.

```{r}
#| eval: false
#| code-fold: false
title <- paste0("NAME_OF_THE_PACKAGE", ": ", "TITLE_OF_THE_PACKAGE")
```

```{r}
#| label: title
# Installed package.

cat(cff_create("testthat")$title)
```

Back to @tbl-summary.

### type

Fixed value equal to `"software"`. The other possible value is `"dataset"`. See
the description on the [Guide to CFF schema
v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#type).

Back to @tbl-summary.

### url {#url}

This key is extracted from the `"BugReports"` or `"URL"` fields on the
`DESCRIPTION` file. It corresponds to the first URL that is different from
[repository-code](#repository-code).

```{r}
#| label: url
# Many URLs.
manyurls <- system.file("examples/DESCRIPTION_many_urls", package = "cffr")

cat(cff_create(manyurls)$url)

# Check.

desc::desc(manyurls)
```

Back to @tbl-summary.

### version

This key is extracted from the `"Version"` field on the `DESCRIPTION` file.

```{r}
#| label: version
#| code-fold: false
# Should be (>= 3.0.0).
cat(cff_create("testthat")$version)
```

Back to @tbl-summary.

## References
