R: get.text

get.text

R Documentation

Description

Extracts the main textual content from a NISO-JATS coded XML file or text and returns it as sectioned text.

Usage

get.text(
  x,
  sectionsplit = "",
  grepsection = "",
  letter.convert = TRUE,
  greek2text = FALSE,
  sentences = FALSE,
  paragraph = FALSE,
  cermine = "auto",
  rm.table = TRUE,
  rm.formula = TRUE,
  rm.xref = TRUE,
  rm.media = TRUE,
  rm.graphic = TRUE,
  rm.ext_link = TRUE
)

Arguments

`x`	a NISO-JATS coded XML file or text.
`sectionsplit`	search patterns at which the text is split into sections (forced to lower case), e.g. c("intro", "method", "result", "discus").
`grepsection`	search pattern to reduce the text to sections whose names match it.
`letter.convert`	Logical. If TRUE, converts hexadecimal and HTML coded characters to Unicode.
`greek2text`	Logical. If TRUE, some Greek letters and special characters are unified to a textual representation (important for extracting statistics).
`sentences`	Logical. If TRUE, the text is returned as a sectioned list of sentences.
`paragraph`	Logical. If TRUE, "<New paragraph>" is added at the end of each paragraph to enable manual splitting at paragraphs.
`cermine`	Logical or "auto". If TRUE, CERMINE specific error handling and letter conversion are applied. If set to "auto", a file name matching the pattern 'cermxml$' (that is, ending in 'cermxml') sets cermine=TRUE.
`rm.table`	Logical. If TRUE, removes <table> tags from text.
`rm.formula`	Logical. If TRUE, removes <formula> tags from text.
`rm.xref`	Logical. If TRUE, removes <xref> tags (citations) from text.
`rm.media`	Logical. If TRUE, removes <media> tags from text.
`rm.graphic`	Logical. If TRUE, removes <graphic> and <fig> tags from text.
`rm.ext_link`	Logical. If TRUE, removes <ext-link> tags from text.
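
The two pattern-based rules described above (the "auto" setting of cermine and the lower-casing of sectionsplit) can be approximated in base R. This is only an illustration of the described behaviour, not the package's implementation: the helper names resolve_cermine and match_sections are hypothetical, and lower-casing the section titles before matching is an assumption.

```r
# Illustrative sketch only; these helpers are hypothetical and not part of the package.

# cermine = "auto": TRUE when the file name matches the regular expression "cermxml$"
resolve_cermine <- function(file, cermine = "auto") {
  if (identical(cermine, "auto")) grepl("cermxml$", file) else isTRUE(cermine)
}

# sectionsplit: patterns are forced to lower case before matching.
# Assumption: section titles are lower-cased as well, so matching ignores case.
match_sections <- function(titles, sectionsplit) {
  pattern <- paste(tolower(sectionsplit), collapse = "|")
  grepl(pattern, tolower(titles))
}

resolve_cermine("article.cermxml")   # TRUE
resolve_cermine("article.xml")       # FALSE

match_sections(
  c("Introduction", "Methods", "Acknowledgements"),
  c("intro", "method", "result", "discus")
)                                    # TRUE TRUE FALSE
```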

Value

A list with two elements: (1) a character vector with the section title(s); (2) a character vector with the running text of each section or, if sentences=TRUE, a list with one vector of sentences per section.
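
A call might look like the following sketch. It assumes that the function is provided by the JATSdecoder package and that a short inline JATS string is accepted as x (the argument description allows "file or text"); the XML snippet is invented for illustration. The result is accessed by position, following the description above, because the element names are not stated here. No output is shown, since the exact text returned depends on the package version.

```r
# Hedged usage sketch; the JATS snippet below is a made-up minimal example.
jats <- paste0(
  "<article><body>",
  "<sec><title>Introduction</title><p>We study reading speed. Prior work is sparse.</p></sec>",
  "<sec><title>Methods</title><p>Forty adults took part.</p></sec>",
  "</body></article>"
)

res <- NULL
if (requireNamespace("JATSdecoder", quietly = TRUE)) {
  # Split at the usual section names and return sentences per section
  res <- JATSdecoder::get.text(
    jats,
    sectionsplit = c("intro", "method", "result", "discus"),
    sentences = TRUE
  )
  titles <- res[[1]]  # element 1: section title(s)
  text   <- res[[2]]  # element 2: section text, here a list of sentence vectors
  print(titles)
  print(text)
}
```

Setting sentences = FALSE in the same call would, per the description above, give a character vector of section text instead of a list of sentences.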
