Cassandra + Spark + R connection

How do I connect Cassandra to Spark?

Cassandra > Spark > R

I’ve already managed to connect R to Spark; now I need to bring the data stored in Cassandra into Spark so I can finally analyze it in R. Can someone help me? Thanks in advance.

1 answer


Spark doesn’t know how to talk to Cassandra out of the box, but its functionality can be extended through connectors. DataStax maintains a Spark connector written in Scala (a language that runs on the JVM), which is available for download on GitHub:

https://github.com/datastax/spark-cassandra-connector

After building the repository on your machine, there will be two jar files in a directory called "target", one for Scala and one for Java. It’s convenient to keep the jar somewhere with an easy-to-remember path.
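For reference, the build typically looks something like the following. This is a sketch: it assumes you have git and sbt installed, and the exact path of the assembly jar varies with the connector and Scala versions.

git clone https://github.com/datastax/spark-cassandra-connector.git
cd spark-cassandra-connector
sbt assembly
# the assembly jar lands under a target/ subdirectory, for example:
# target/scala-2.10/spark-cassandra-connector-assembly-1.4.0-SNAPSHOT.jar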

Start the Spark shell again (from the Spark installation directory), but this time load the jar (adjust the path to wherever you placed the jar):

bin/spark-shell --jars ~/spark-cassandra-connector-assembly-1.4.0-SNAPSHOT.jar

Now type the following at the scala prompt:

// stop the default SparkContext so we can create one that knows about Cassandra
sc.stop
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
// point the new context at the Cassandra node (here, a local instance)
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)

This stops Spark’s default context and replaces it with one that connects to your local Cassandra instance.

Type the following in the scala shell:

// read a Cassandra table as an RDD and fetch the first row
val test_spark_rdd = sc.cassandraTable("YOUR_KEYSPACE", "YOUR_TABLE")
test_spark_rdd.first

(replace YOUR_KEYSPACE and YOUR_TABLE with your keyspace and the name of a table inside it).
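If you only need some columns or rows, the connector can push the projection and simple predicates down to Cassandra instead of pulling the whole table. Here is a minimal sketch, assuming a hypothetical table users in a keyspace test_ks with columns user_id (part of the primary key) and name; none of these names come from the question, they are placeholders for illustration:

// select only the "name" column; the where clause is pushed down to Cassandra
// (CQL restricts WHERE clauses, typically to key or indexed columns)
val names = sc.cassandraTable("test_ks", "users")
  .select("name")
  .where("user_id = ?", 42)
  .map(row => row.getString("name"))   // CassandraRow -> String
names.collect.foreach(println)

This relies on the com.datastax.spark.connector._ import done earlier, which adds cassandraTable and the select/where methods to the SparkContext and RDDs.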

I hope this helps in some way. Regards.
