Package 'sparkavro'

Title: Load Avro file into 'Apache Spark'
Description: Load Avro Files into 'Apache Spark' using 'sparklyr'. This allows reading files in the 'Apache Avro' <https://avro.apache.org/> format into 'Apache Spark'.
Authors: Aki Ariga
Maintainer: Aki Ariga <[email protected]>
License: Apache License 2.0 | file LICENSE
Version: 0.3.0
Built: 2025-02-24 05:19:02 UTC
Source: https://github.com/chezou/sparkavro

Help Index


Reads an Avro File into Apache Spark

Description

Reads an Avro file into Apache Spark using sparklyr.

Usage

spark_read_avro(
  sc,
  name,
  path,
  readOptions = list(),
  repartition = 0L,
  memory = TRUE,
  overwrite = TRUE
)

Arguments

sc

An active spark_connection.

name

The name to assign to the newly generated table.

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

readOptions

A list of strings with additional options.

repartition

The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning.

memory

Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?)

overwrite

Boolean; overwrite the table with the given name if it already exists?

Examples

## Not run: 
## If you don't have a Spark cluster, you can install Spark locally like this
library(sparklyr)
spark_install(version = "2.0.1")

sc <- spark_connect(master = "local")
df <- spark_read_avro(
  sc,
  "twitter",
  system.file("extdata/twitter.avro", package = "sparkavro"),
  repartition = 0L,
  memory = FALSE,
  overwrite = FALSE
)

spark_disconnect(sc)

## End(Not run)
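
The path argument also accepts cluster filesystem URIs. A minimal sketch (the HDFS host, path, and table name below are placeholders, not files shipped with the package):

## Not run: 
## Assumes an active connection `sc` (see the example above).
## Read an Avro file directly from HDFS; the location must be reachable
## from every node in the cluster.
df <- spark_read_avro(
  sc,
  "events",
  "hdfs://namenode:8020/data/events.avro",
  memory = FALSE
)

## End(Not run)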

Write a Spark DataFrame to an Avro file

Description

Serialize a Spark DataFrame to the Avro format.

Usage

spark_write_avro(x, path, mode = NULL, options = list())

Arguments

x

A Spark DataFrame or dplyr operation

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

mode

Specifies the behavior when data or table already exists.

options

A list of strings with additional options. See http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration.
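
Examples

A minimal usage sketch (the output directory is a placeholder; mode = "overwrite" is one of the standard Spark save modes):

## Not run: 
library(sparklyr)
sc <- spark_connect(master = "local")

## Copy a small R data frame into Spark, then serialize it as Avro.
iris_tbl <- sdf_copy_to(sc, iris, "iris", overwrite = TRUE)
spark_write_avro(iris_tbl, "/tmp/iris-avro", mode = "overwrite")

spark_disconnect(sc)

## End(Not run)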