Commit b9361b02 authored by Fisher, Jason C.

add step 3 in readme

parent 3cc92f3b
3 merge requests: !82 Merge in develop and bump version, !81 Add Zingre (2024) data release, !80 Add TreinenOthers2024b and rename 2024a
@@ -110,6 +110,24 @@ The `abstract` key is written as character strings and encoded in UTF-8, a chara
The `annotation` key consists of two parts: `text` and `source`. The `text` part contains the string of the annotation text, while the `source` part denotes the origin of the annotation, which is indicated by its publication key value (`pub_id`).
### Step 3: Add Content from Published Files (Optional)
This step requires a folder named **archive** to be located in the package's top-level directory. Within this folder, the published files of the INLPO should be organized into subfolders based on the year of publication and the publication identifier. For example: `2005/KnobelOthers2005/ofr20051223.pdf`. The file names must be specified in the publications metadata under the `files` key in the publication entry (Step 2).
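As a minimal sketch of the folder layout described above, the following base R snippet builds the expected path of a published file and reports whether it is present. The helper `archive_path` and its `archive_dir` argument are hypothetical names used only for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of inlpubs): build the expected archive path
# for a published file from its publication year, identifier, and file name.
archive_path <- function(year, pub_id, file, archive_dir = "archive") {
  file.path(archive_dir, as.character(year), pub_id, file)
}

# Path for the example given above, relative to the package's top-level directory.
path <- archive_path(2005, "KnobelOthers2005", "ofr20051223.pdf")
print(path)

# Report a missing file rather than stopping, since this step is optional.
if (!file.exists(path)) {
  message("Missing published file: ", path)
}
```

Running a check like this from the package's top-level directory before creating the datasets may help catch a misplaced or misnamed file early.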
The text within a published file (such as a PDF document) will be extracted and stored in the package folder `data-raw/corpus`. When the package datasets are created, this text is included in the package corpus. The corpus is a collection of all the published text data, used for analysis, research, and various other processing tasks within the package.
The cover image for a publication is extracted and stored in the package folder `vignettes`. The image extraction process is manual. For example, the cover image for Knobel and others (2005) can be extracted using the following R command:
```r
inlpubs::add_content("KnobelOthers2005", type = "image", destdir = "vignettes")
```
To extract cover images for all 2005 publications, use:
```r
inlpubs::add_content(year = 2005, type = "image", destdir = "vignettes")
```
## Execute Script
To run the R script from a terminal, use the command `make datasets`. See the [Makefile](../Makefile) at the root of the package repository for details. The Makefile contains the rules for building the package: it defines the dependencies among files and the commands needed to build them.