diff --git a/data-raw/README.md b/data-raw/README.md
index 1600529255a5dfa10771c3bbe00d5ce007534566..a673dc43b031f23e8abfeb954f920112742a8b3b 100644
--- a/data-raw/README.md
+++ b/data-raw/README.md
@@ -110,6 +110,24 @@ The `abstract` key is written as character strings and encoded in UTF-8, a chara
 
 The `annotation` key consists of two parts: `text` and `source`. The `text` part contains the string of the annotation text, while the `source` part denotes the origin of the annotation, which is indicated by its publication key value (`pub_id`).
 
+### Step 3: Add Content from Published Files (Optional)
+
+This step requires a folder named **archive** in the package's top-level directory. Within this folder, the published files of the INLPO should be organized into subfolders by year of publication and then by publication identifier, for example, `2005/KnobelOthers2005/ofr20051223.pdf`. The file names must be listed under the `files` key of the corresponding publication entry in the publication metadata (Step 2).
+
+The text in each published file (such as a PDF document) is extracted and stored in the package folder `data-raw/corpus`. When the package datasets are created, this text is included in the package corpus: the collection of text from all published files, used for analysis, research, and other processing tasks within the package.
+
+The cover image of a publication is extracted manually and stored in the package folder `vignettes`. For example, the cover image for Knobel and others (2005) can be extracted using the following R command:
+
+```r
+inlpubs::add_content("KnobelOthers2005", type = "image", destdir = "vignettes")
+```
+
+To extract cover images for all 2005 publications, use:
+
+```r
+inlpubs::add_content(year = 2005, type = "image", destdir = "vignettes")
+```
+
 ## Execute Script
 To execute the R script from a terminal, you can utilize the command `make datasets`.
 Please refer to the [Makefile](../Makefile) located at the root of the package repository for more details. The Makefile is a document that houses a collection of directives for constructing the package. It delineates the interdependencies among files and outlines the requisite commands for their compilation.
diff --git a/vignettes/pub-TreinenOthers2024.jpg b/vignettes/pub-TreinenOthers2024a.jpg
similarity index 100%
rename from vignettes/pub-TreinenOthers2024.jpg
rename to vignettes/pub-TreinenOthers2024a.jpg