Convert SAS register files to Parquet

Main functions for converting SAS files containing Danish register data into Apache Parquet files.

convert_to_parquet()
Convert register SAS file(s) and save to Parquet format
get_database_name()
Get the name of the database from the file name
get_database_year()
Get the year of the database from the file name
get_database_year_external()
Get the years of an external database from the file name in its path
get_filename_no_ext()
Get the filename without its file extension
get_parquet_year()
Get the year of the parquet file from the file name
get_path_duplicates()
Get paths with duplicate file names
get_path_no_duplicates()
Get paths with no duplicate file names
get_path_specific_database()
Get path with specific database file
get_path_with_year()
Get path with year in the file name
get_path_without_year()
Get path without year in the file name
list_databases()
List all the cleaned Parquet databases in the cleaned-data folder
list_dirs()
List directories at given path
list_parquet_files()
List Parquet registers in a directory
list_sas_files()
List SAS registers in a directory
path_alter_filename_as_dir()
Convert path to end with filename/
path_alter_filename_year_as_dir()
Convert file name of a path to end in /year=YYYY
path_alter_to_cleaned_dir()
Convert path to cleaned directory
path_alter_to_output_parquet_partition()
Convert the path to represent a Parquet partition in another directory
path_as_df()
Create a data frame with path and file name
path_duplicates_as_list()
Get duplicate paths as a list
path_eksterne_dir()
Path to external directory
path_ext_set_parquet_partition()
Convert file path to Parquet partition
path_grunddata_dir()
Path to "grunddata" directory
path_parquet_dirs()
Path to Parquet directory
path_parquet_external()
Path to external Parquet files
path_parquet_registers()
Path to Parquet registers
path_population_file()
Path to population file
path_rawdata()
Path to rawdata directory
path_sas_formats()
Path to SAS formats
path_set_dir()
Alter the path of a file to a Parquet partition in another directory
path_subdir()
Path to subdirectory
path_workdata()
Path to workdata directory
read_register()
Read a Parquet register
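
Several of the helpers above derive a database name and year from a file name and turn them into a partition directory ending in /year=YYYY. The idea can be sketched roughly as follows. This is an illustration in Python, not the package's implementation: the function names `split_name_and_year` and `partition_dir`, the example file names, and the assumption that the year appears as four trailing digits in the file name are all hypothetical.

```python
import re
from pathlib import PurePosixPath


def split_name_and_year(path):
    """Split a file path into (database name, year or None).

    Assumption: the year, when present, is the last four characters of
    the file name (before the extension) and starts with 19 or 20.
    """
    stem = PurePosixPath(path).stem  # file name without its extension
    match = re.search(r"(19|20)\d{2}$", stem)
    if match is None:
        return stem, None
    name = stem[: match.start()].rstrip("_")
    return name, int(match.group())


def partition_dir(path, output_dir):
    """Build a 'key=value' style partition directory for one input file."""
    name, year = split_name_and_year(path)
    base = PurePosixPath(output_dir) / name
    if year is None:
        # Files without a year get a plain directory named after the database.
        return str(base)
    return str(base / f"year={year}")


# Hypothetical file names, used only to show the shape of the result.
print(partition_dir("rawdata/bef2019.sas7bdat", "parquet"))
print(partition_dir("rawdata/kotre.sas7bdat", "parquet"))
```

Placing each year in its own `year=YYYY` directory follows the common key=value partitioning layout, which lets Parquet readers that understand it skip the years a query does not need.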