Title: | Load Avro file into 'Apache Spark' |
---|---|
Description: | Load Avro files into 'Apache Spark' using 'sparklyr'. This allows reading files in the 'Apache Avro' format <https://avro.apache.org/>. |
Authors: | Aki Ariga |
Maintainer: | Aki Ariga <[email protected]> |
License: | Apache License 2.0 | file LICENSE |
Version: | 0.3.0 |
Built: | 2025-02-24 05:19:02 UTC |
Source: | https://github.com/chezou/sparkavro |
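If the package is not yet installed, a typical setup might look like the following sketch (assuming installation from CRAN; the development version lives at the GitHub repository above and would need the 'remotes' package):

# Install the released version from CRAN
install.packages("sparkavro")
# Or the development version from GitHub (assumes the 'remotes' package is available)
# remotes::install_github("chezou/sparkavro")
library(sparkavro)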
Reads an Avro file into Apache Spark using sparklyr.
spark_read_avro( sc, name, path, readOptions = list(), repartition = 0L, memory = TRUE, overwrite = TRUE )
sc |
An active spark_connection. |
name |
The name to assign to the newly generated table. |
path |
The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://", and "file://" protocols. |
readOptions |
A list of strings with additional options. |
repartition |
The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning. |
memory |
Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?) |
overwrite |
Boolean; overwrite the table with the given name if it already exists? |
## Not run: 
## If you don't have a Spark cluster, you can install Spark locally like this:
library(sparklyr)
spark_install(version = "2.0.1")

sc <- spark_connect(master = "local")
df <- spark_read_avro(
  sc,
  "twitter",
  system.file("extdata/twitter.avro", package = "sparkavro"),
  repartition = 0,
  memory = FALSE,
  overwrite = FALSE
)

spark_disconnect(sc)
## End(Not run)
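The object returned by spark_read_avro() is a Spark table reference, so it can be queried with dplyr verbs. A minimal sketch, assuming df is the table from the example above (the columns available depend on the Avro schema of the file):

library(dplyr)
df %>% count()                  # row count, computed in Spark
df %>% head(5) %>% collect()    # bring a small preview back into R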
Serialize a Spark DataFrame to the Avro format.
spark_write_avro(x, path, mode = NULL, options = list())
x |
A Spark DataFrame or dplyr operation |
path |
The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://", and "file://" protocols. |
mode |
Specifies the behavior when data or table already exists. |
options |
A list of strings with additional options. See http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration. |
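A minimal sketch of the write side, assuming a local connection as in the read example; the output path is illustrative, and "overwrite" is one of the standard Spark save modes:

## Not run: 
library(sparklyr)
sc <- spark_connect(master = "local")
df <- spark_read_avro(
  sc,
  "twitter",
  system.file("extdata/twitter.avro", package = "sparkavro")
)
# Write the table back out as Avro; "overwrite" replaces any existing output at the path
spark_write_avro(df, path = "/tmp/twitter_out.avro", mode = "overwrite")
spark_disconnect(sc)
## End(Not run)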