If multiple paths are given, the function looks for a year (4 digits) in the
file names and uses it as the partition; see vignette("design") for more
information about the partitioning. If a year is found, the data is saved
partitioned by year in the output directory, e.g.,
path/to/register_name/year=2020/part-0.parquet.
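As a rough illustration of the idea, the sketch below pulls the first run of four digits out of each file name and builds the matching partition paths. The helper extract_year() and the example file names are hypothetical and not part of the package; the actual matching rule used by the function may be stricter.

```r
# Hypothetical helper (not part of the package): return the first run of
# four digits found in each file name, or NA when there is none.
extract_year <- function(paths) {
  file_names <- basename(paths)
  hits <- regexpr("[0-9]{4}", file_names)
  years <- rep(NA_character_, length(file_names))
  # regmatches() only returns the elements that matched, so fill those in.
  years[hits > 0] <- regmatches(file_names, hits)
  years
}

# Made-up input files, used only for illustration.
paths <- c("data/register_2019.sas7bdat", "data/register_2020.sas7bdat")
years <- extract_year(paths)

# Hive-style partition paths, following the layout described above.
partition_paths <- file.path(
  "path/to/register_name",
  paste0("year=", years),
  "part-0.parquet"
)
partition_paths
```

A file name without four consecutive digits yields NA here, which corresponds to the "no year can be found" case.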
If no year can be found or only one path is given, the data is converted
without partitioning and saved as a Parquet file with the name specified in
the output path. For example, if output_path is path/to/register, the Parquet
file is saved as path/to/register.parquet.
Duplicate rows are removed before saving to Parquet. If the same row occurs in
multiple source files, the row from the file that appears first in path is
kept. Rows that are almost identical across different files (e.g., different
years) but differ in at least one value are all kept, because determining
which value is correct requires domain knowledge.
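The behaviour described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the function's actual implementation: the data frames are made up, and the source_file column is added purely to show which copy of a duplicate survives.

```r
# Two made-up source files, already read into data frames, in path order.
first_file <- data.frame(
  id = c(1, 2),
  value = c("a", "b"),
  source_file = "first"
)
second_file <- data.frame(
  id = c(2, 2, 3),
  value = c("b", "B", "c"),  # (2, "b") is an exact duplicate; (2, "B") differs
  source_file = "second"
)

# Stack the files in the order they appear in path.
combined <- rbind(first_file, second_file)

# Compare rows on the data columns only; duplicated() flags every occurrence
# after the first, so the copy from the earliest file is the one kept.
data_columns <- setdiff(names(combined), "source_file")
deduplicated <- combined[!duplicated(combined[data_columns]), ]
deduplicated
```

The exact duplicate (2, "b") is kept only from the first file, while the near-duplicate (2, "B") survives because one of its values differs.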
Value
Returns a character scalar with the path to the created Parquet file(s)
(output_path), so it can be used in a targets pipeline.
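Because the return value is a path, the call can serve as a file target. The fragment below is a minimal sketch of a _targets.R file, assuming the package that provides convert_to_parquet() and list_sas_files() is installed and loaded; the target names are hypothetical.

```r
# _targets.R (sketch): the package providing convert_to_parquet() and
# list_sas_files() must also be loaded, e.g. via tar_option_set(packages = ...).
library(targets)

list(
  # Hypothetical target name: the SAS files to convert.
  tar_target(
    sas_files,
    list_sas_files("path/to/sas/files")
  ),
  # format = "file" makes targets track the returned path on disk.
  tar_target(
    register_parquet,
    convert_to_parquet(sas_files, "output/path/to/register_name"),
    format = "file"
  )
)
```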
Examples
if (FALSE) { # \dontrun{
convert_to_parquet(
list_sas_files("path/to/sas/files"),
"output/path/to/register_name"
)
} # }