@@ -110,6 +110,24 @@ The `abstract` key is written as character strings and encoded in UTF-8, a chara
The `annotation` key consists of two parts: `text` and `source`. The `text` part contains the string of the annotation text, while the `source` part denotes the origin of the annotation, which is indicated by its publication key value (`pub_id`).
### Step 3: Add Content from Published Files (Optional)
This step requires a folder named **archive** to be located in the package's top-level directory. Within this folder, the published files of the INLPO should be organized into subfolders based on the year of publication and the publication identifier. For example: `2005/KnobelOthers2005/ofr20051223.pdf`. The file names must be specified in the publications metadata under the `files` key in the publication entry (Step 2).
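The folder layout described above can be sketched in base R as follows. This is a minimal illustration only: `archive_path` is a hypothetical helper, not a function provided by the package, and it assumes the working directory is the package's top-level directory.

```r
# Hypothetical helper (an assumption, not part of the package): build the
# expected location of a published file inside the archive folder,
# organized as <root>/<year>/<pub_id>/<file>.
archive_path <- function(year, pub_id, file, root = "archive") {
  file.path(root, as.character(year), pub_id, file)
}

# Expected location of the example file from Knobel and others (2005).
path <- archive_path(2005, "KnobelOthers2005", "ofr20051223.pdf")
print(path)

# Confirm that the file is in place before building the package datasets.
file.exists(path)
```

A check like this can catch a misplaced file or a mismatch with the `files` key before the datasets are built.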
The text within a published file (such as a PDF document) is extracted and stored in the package folder `data-raw/corpus`. When the package datasets are created, this text is included in the package corpus. The corpus is the collection of text from all published files and is used for analysis, research, and other processing tasks within the package.
The cover image for a publication is extracted and stored in the package folder `vignettes`. The image extraction process is manual. For example, the cover image for Knobel and others (2005) can be extracted using the following R command:
To run the R script from a terminal, use the command `make datasets`. See the [Makefile](../Makefile) at the root of the package repository for details. The Makefile contains the directives for building the package: it defines the dependencies among files and the commands required to build them.