forked from datastax/spark-cassandra-connector
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGES.txt
400 lines (355 loc) · 19.2 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
1.4.0 M1
* Upgrade Spark to 1.4.0 (SPARKC-192)
********************************************************************************
1.3.0 M2 (unreleased)
* Support for loading, saving and mapping Cassandra tuples (SPARKC-172)
* Table and keyspace Name suggestions in DataFrames API (SPARKC-186)
* Removed thrift completely (SPARKC-94)
- removed cassandra-thrift.jar dependency
- automatic split sizing based on system.size_estimates table
- add option to manually force the number of splits
- Cassandra listen addresses fetched from system.peers table
- spark.cassandra.connection.(rpc|native).port replaced with spark.cassandra.connection.port
* Refactored ColumnSelector to avoid circular dependency on TableDef (SPARKC-177)
* Support for modifying C* Collections using saveToCassandra (SPARKC-147)
1.3.0 M1
* Removed use of Thrift describe_ring and replaced it with native Java Driver
support for fetching TokenRanges (SPARKC-93)
* Support for converting Cassandra UDT column values to Scala case-class objects (SPARKC-4)
- Introduced a common interface for TableDef and UserDefinedType
- Removed ClassTag from ColumnMapper
- Removed by-index column references and replaced them with by-name ColumnRefs
- Created a GettableDataToMappedTypeConverter that can handle UDTs
- ClassBasedRowReader delegates object conversion instead of doing it by itself;
this improves unit-testability of code
* Decoupled PredicatePushDown logic from Spark (SPARKC-166)
- added support for Filter and Expression predicates
- improved code testability and added unit-tests
* Basic Datasource API integration and keyspace/cluster level settings (SPARKC-112, SPARKC-162)
* Added support to use aliases with Tuples (SPARKC-125)
********************************************************************************
********************************************************************************
1.2.3 (unreleased)
* Support for connection compressions configuration (SPARKC-124)
* Support for connection encryption configuration (SPARKC-118)
1.2.2
* Updated Spark version to 1.2.2, Scala to 2.10.5 / 2.11.6
* Enabled Java API artifact generation for Scala 2.11.x (SPARKC-130)
* Fixed a bug preventing a custom type converter from being used
when saving data. RowWriter implementations must
perform type conversions now. (SPARKC-157)
1.2.1
* Fixed problems with mixed case keyspaces in ReplicaMapper (SPARKC-159)
1.2.0
* Removed conversion method rom WriteOption which accepted object of Duration type
from Spark Streaming (SPARKC-106)
* Fixed compilation warnings (SPARKC-76)
* Fixed ScalaDoc warnings (SPARKC-119)
* Synchronized TypeTag access in various places (SPARKC-123)
* Adds both hostname and hostaddress as partition preferredLocations (SPARKC-126)
1.2.0 RC 3
* Select aliases are no longer ignored in CassandraRow objects (SPARKC-109)
* Fix picking up username and password from SparkConf (SPARKC-108)
* Fix creating CassandraConnectorSource in the executor environment (SPARKC-111)
1.2.0 RC 2
* Cross cluster table join and write for Spark SQL (SPARKC-73)
* Enabling / disabling metrics in metrics configuration file and other metrics fixes (SPARKC-91)
* Provided a way to set custom auth config and connection factory properties (SPARKC-105)
* Fixed setting custom connection factory and other properties (SPAKRC-102)
* Fixed Java API (SPARKC-95)
1.2.0 RC 1
* More Spark SQL predicate push (SPARKC-72)
* Fixed some Java API problems and refactored its internals (SPARKC-77)
* Allowing specification of column to property map (aliases) for reading and writing objects
(SPARKC-9)
* Added interface for doing primary key joins between arbitrary RDDs and Cassandra (SPARKC-25)
* Added method for repartitioning an RDD based upon the replication of a Cassandra Table (SPARKC-25)
* Fixed setting batch.level and batch.buffer.size in SparkConf. (SPARKC-84)
- Renamed output.batch.level to output.batch.grouping.key.
- Renamed output.batch.buffer.size to output.batch.grouping.buffer.size.
- Renamed batch grouping key option "all" to "none".
* Error out on invalid config properties (SPARKC-90)
* Set Java driver version to 2.1.5 and Cassandra to 2.1.3 (SPARKC-92)
* Moved Spark streaming related methods from CassandraJavaUtil to CassandraStreamingJavaUtil
(SPARKC-80)
1.2.0 alpha 3
* Exposed spanBy and spanByKey in Java API (SPARKC-39)
* Added automatic generation of Cassandra table schema from a Scala type and
saving an RDD to a new Cassandra table by saveAsCassandraTable method (SPARKC-38)
* Added support for write throughput limiting (SPARKC-57)
* Added EmptyCassandraRDD (SPARKC-37)
* Exposed authConf in CassandraConnector
* Overridden count() implementation in CassandraRDD which uses native Cassandra count (SPARKC-52)
* Removed custom Logging class (SPARKC-54)
* Added support for passing the limit clause to CQL in order to fetch top n results (SPARKC-31)
* Added support for pushing down order by clause for explicitly specifying an order of rows within
Cassandra partition (SPARKC-32)
* Fixed problems when rows are mapped to classes with inherited fields (SPARKC-70)
* Support for compiling with Scala 2.10 and 2.11 (SPARKC-22)
1.2.0 alpha 2
* All connection properties can be set on SparkConf / CassandraConnectorConf objects and
the settings are automatically distributed to Spark Executors (SPARKC-28)
* Report Connector metrics to Spark metrics system (SPARKC-27)
* Upgraded to Spark 1.2.1 (SPARKC-30)
* Add conversion from java.util.Date to java.sqlTimestamp for Spark SQL (#512)
* Upgraded to Scala 2.11 and scala version cross build (SPARKC-22)
1.2.0 alpha 1
* Added support for TTL and timestamp in the writer (#153)
* Added support for UDT column types (SPARKC-1)
* Upgraded Spark to version 1.2.0 (SPARKC-15)
* For 1.2.0 release, table name with dot is not supported for Spark SQL,
it will be fixed in the next release
* Added fast spanBy and spanByKey methods to RDDs useful for grouping Cassandra
data by partition key / clustering columns. Useful for e.g. time-series data. (SPARKC-2)
* Refactored the write path so that the writes are now token-aware (SPARKC-5, previously #442)
* Added support for INSET predicate pushdown (patch by granturing)
********************************************************************************
1.1.2
* Backport SPARKC-8, retrieval of TTL and write time
* Upgraded to Spark 1.1.1
* Synchronized ReflectionUtil findScalaObject and findSingletonClassInstance methods
to avoid problems with Scala 2.10 lack thread safety in the reflection subsystem (SPARKC-107)
* Fixed populating ReadConf with properties from SparkConf (SPARKC-121)
* Adds both hostname and hostaddress as partition preferredLocations (SPARKC-141, backport of SPARKC-126)
1.1.1
* Fixed NoSuchElementException in SparkSQL predicate pushdown code (SPARKC-7, #454)
1.1.0
* Switch to java driver 2.1.3 and Guava 14.0.1 (yay!).
1.1.0 rc 3
* Fix NPE when saving CassandraRows containing null values (#446)
1.1.0 rc 2
* Added JavaTypeConverter to make is easy to implement custom TypeConverter in Java (#429)
* Fix SparkSQL failures caused by presence of non-selected columns of UDT type in the table.
1.1.0 rc 1
* Fixed problem with setting a batch size in bytes (#435)
* Fixed handling of null column values in Java API (#429)
1.1.0 beta 2
* Fixed bug in Java API which might cause ClassNotFoundException
* Added stubs for UDTs. It is possible to read tables with UDTs, but
values of UDTs will come out as java driver UDTValue objects (#374)
* Upgraded Java driver to 2.1.2 version and fixed deprecation warnings.
Use correct protocolVersion when serializing/deserializing Cassandra columns.
* Don't fail with "contact points contain multiple datacenters"
if one or more of the nodes given as contact points don't have DC information,
because they are unreachable.
* Removed annoying slf4j warnings when running tests (#395)
* CassandraRDD is fully lazy now - initialization no longer fetches Cassandra
schema (#339).
1.1.0 beta 1
* Redesigned Java API, some refactorings (#300)
* Simplified AuthConf - more responsibility on CassandraConnectionFactory
* Enhanced and improved performance of the embedded Kafka framework
- Kafka consumer and producer added that are configurable
- Kafka shutdown cleaned up
- Kafka server more configurable for speed and use cases
* Added new word count demo and a new Kafka streaming word count demo
* Modified build file to allow easier module id for usages of 'sbt project'
1.1.0 alpha 4
* Use asynchronous prefetching of multi-page ResultSets in CassandraRDD
to reduce waiting for Cassandra query results.
* Make token range start and end be parameters of the query, not part of the query
template to reduce the number of statements requiring preparation.
* Added type converter for GregorianCalendar (#334)
1.1.0 alpha 3
* Pluggable mechanism for obtaining connections to Cassandra
Ability to pass custom CassandraConnector to CassandraRDDs (#192)
* Provided a row reader which allows to create RDDs of pairs of objects as well
as RDDs of simple objects handled by type converter directly;
added meaningful compiler messages when invalid type was provided (#88)
* Fixed serialization problem in CassandraSQLContext by making conf transient (#310)
* Cleaned up the SBT assembly task and added build documentation (#315)
1.1.0 alpha 2
* Upgraded Apache Spark to 1.1.0.
* Upgraded to be Cassandra 2.1.0 and Cassandra 2.0 compatible.
* Added spark.cassandra.connection.local_dc option
* Added spark.cassandra.connection.timeout_ms option
* Added spark.cassandra.read.timeout_ms option
* Added support for SparkSQL (#197)
* Fixed problems with saving DStreams to Cassandra directly (#280)
1.1.0 alpha 1
* Add an ./sbt/sbt script (like with spark) so people don't need to install sbt
* Replace internal spark Logging with own class (#245)
* Accept partition key predicates in CassandraRDD#where. (#37)
* Add indexedColumn to ColumnDef (#122)
* Upgrade Spark to version 1.0.2
* Removed deprecated toArray, replaced with collect.
* Updated imports to org.apache.spark.streaming.receiver
and import org.apache.spark.streaming.receiver.ActorHelper
* Updated streaming demo and spec for Spark 1.0.2 behavior compatibility
* Added new StreamingEvent types for Spark 1.0.2 Receiver readiness
* Added the following Spark Streaming dependencies to the demos module:
Kafka, Twitter, ZeroMQ
* Added embedded Kafka and ZooKeeper servers for the Kafka Streaming demo
- keeping non private for user prototyping
* Added new Kafka Spark Streaming demo which reads from Kafka
and writes to Cassandra (Twitter and ZeroMQ are next)
* Added new 'embedded' module
- Refactored the 'connector' module's IT SparkRepl, CassandraServer and
CassandraServerRunner as well as 'demos' EmbeddedKafka
and EmbeddedZookeeper to the 'embedded' module. This allows the 'embedded'
module to be used as a dependency by the 'connector' IT tests, demos,
and user local quick prototyping without requiring a Spark and Cassandra
Cluster, local or remote, to get started.
********************************************************************************
1.0.7 (unreleased)
* Improved error message when attempting to transform CassandraRDD after deserialization (SPARKC-29)
1.0.6
* Upgraded Java Driver to 2.0.8 and added some logging in LocalNodeFirstLoadBalancingPolicy (SPARKC-18)
1.0.5
* Fixed setting output consistency level which was being set on prepared
statements instead of being set on batches (#463)
1.0.4
* Synchronized TypeConverter.forType methods to workaround some Scala 2.10
reflection thread-safety problems (#235)
* Synchronized computation of TypeTags in TypeConverter#targetTypeTag,
ColumnType#scalaTypeTag methods and other places to workaround some of
Scala 2.10 reflection thread-safety problems (#364)
* Downgraded Guava to version 14.
Upgraded Java driver to 2.0.7.
Upgraded Cassandra to 2.0.11. (#366)
* Made SparkContext variable transient in SparkContextFunctions (#373)
* Fixed saving to tables with uppercase column names (#377)
* Fixed saving collections of Tuple1 (#420)
1.0.3
* Fixed handling of Cassandra rpc_address set to 0.0.0.0 (#332)
1.0.2
* Fixed batch counter columns updates (#234, #316)
* Expose both rpc addresses and local addresses of cassandra nodes in partition
preferred locations (#325)
* Cleaned up the SBT assembly task and added build documentation
(backport of #315)
1.0.1
* Add logging of error message when asynchronous task fails in AsyncExecutor.
(#265)
* Fix connection problems with fetching token ranges from hosts with
rpc_address different than listen_address.
Log host address(es) and ports on connection failures.
Close thrift transport if connection fails for some reason after opening the transport,
e.g. authentication failure.
* Upgrade cassandra driver to 2.0.6.
1.0.0
* Fix memory leak in PreparedStatementCache leaking PreparedStatements after
closing Cluster objects. (#183)
* Allow multiple comma-separated hosts in spark.cassandra.connection.host
1.0.0 RC 6
* Fix reading a Cassandra table as an RDD of Scala class objects in REPL
1.0.0 RC 5
* Added assembly task to the build, in order to build fat jars. (#126)
- Added a system property flag to enable assembly for the demo module
which is disabled by default.
- Added simple submit script to submit a demo assembly jar to a local
spark master
* Fix error message on column conversion failure. (#208)
* Add toMap and nameOf methods to CassandraRow.
Reduce size of serialized CassandraRow. (#194)
* Fixed a bug which caused problems with connecting to Cassandra under
heavy load (#185)
* Skip $_outer constructor param in ReflectionColumnMapper, fixes working with
case classes in Spark shell, added appropriate test cases (#188)
* Added streaming demo with documentation, new streaming page to docs,
new README for running all demos. (#115)
1.0.0 RC 4
* Upgrade Java driver for Cassandra to 2.0.4. (#171)
* Added missing CassandraRDD#getPreferredLocations to improve data-locality. (#164)
* Don't use hosts outside the datacenter of the connection host. (#137)
1.0.0 RC 3
* Fix open Cluster leak in CassandraConnector#createSession (#142)
* TableWriter#saveToCassandra accepts ColumnSelector instead of Seq[String] for
passing a column list. Seq[String] still accepted for backwards compatibility,
but deprecated.
* Added Travis CI build yaml file.
* Added demos module. (#84)
* Extracted Java API into a separate module (#99)
1.0.0 RC 2
* Language specific highlighting in the documentation (#105)
* Fixed a bug which caused problems when a column of VarChar type was used
in where clause. (04fd8d9)
* Fixed an AnyObjectFactory bug which caused problems with instantiation of
classes which were defined inside Scala objects. (#82)
* Added support for Spark Streaming. (#89)
- Added implicit wrappers which simplify access to Cassandra related
functionality from StreamingContext and DStream.
- Added a stub for further Spark Streaming integration tests.
* Upgraded Java API. (#98)
- Refactored existing Java API
- Added CassandraJavaRDD as a JAVA counterpart of CassandraRDD
- Added Java helpers for accessing Spark Streaming related methods
- Added several integration tests
- Added a documentation page for JAVA API
- Extended Java API demo
- Added a lot of API docs
1.0.0 RC 1
* Ability to register custom TypeConverters. (#32)
* Handle null values in StringConverter. (#79)
* Improved error message when there are no replicas in the local DC. (#69)
1.0.0 beta 2
* DSE compatibility improvements. (#64)
- Column types and type converters use TypeTags instead of Strings to
announce their types.
- CassandraRDD#tableDef is public now.
- Added methods for getting keyspaces and tables by name from the Schema.
- Refactored Schema class - loading schema from Cassandra moved
from the constructor to a factory method.
- Remove unused methods for returning system keyspaces from Schema.
* Improved JavaDoc explaining CassandraConnector withClusterDo
and withSessionDo semantics.
* Support for updating counter columns. (#27)
* Configure consistency level for reads/writes. Set default consistency
levels to LOCAL_ONE for reads and writes. (#42)
* Values passed as arguments to `where` are converted to proper types
expected by the java-driver. (#26)
* Include more information in the exception message when query in
CassandraRDD fails. (#69)
* Fallback to describe_ring in case describe_local_ring does not exist to
improve compatibility with earlier Cassandra versions. (#47)
* Session object sharing in CassandraConnector. (#41 and #53)
* Modify cassandra.* configuration settings to prefix with "spark." so they
can be used from spark-shell and set via conf/spark-default.conf (#51)
* Fixed race condition in AsyncExecutor causing inaccuracy of success/failure
counters. (#40)
* Added Java API. Fixed a bug in ClassBasedRowReader which caused
problems when data were read into Java beans. Added type converters
for boxed Java primitive types. (#11)
* Extracted out initial testkit for unit and integration tests, and future
testkit module.
* Added new WritableToCassandra trait which both RDDFunction and
DStreamFunction both implement. Documentation moved to WritableToCassandra.
* Fixed broken links in API documentation.
* Refactored RDDFunctions and DStreamFunctions - merged saveToCassandra
overloaded methods into a single method with defaults.
1.0.0 beta 1
* CassandraRDD#createStatement doesn't obtain a new session, but reuses
the task's Session.
* Integration tests. (#12)
* Added contains and indexOf methods to CassandraRow. Missing value from
CassandraRow does not break writing - null is written instead.
* Caching of PreparedStatements. Subsequent preparations of the same
PreparedStatement are returned from the cache and don't cause
a warning. (#3)
* Move partitioner ForkJoinPool to companion object to share it between RDD's.
(#24)
* Fixed thread-safety of ClassBasedRowReader.
* Detailed user guide with code examples, reviewed by Kris Hahn. (#15)
* Support for saving RDD[CassandraRow]. New demo program copying data from one
table to another. (#16)
* Using a PreparedStatement make createStatement method compatible with
Cassandra 1.2.x. (#17)
* More and better logging. Using org.apache.spark.Logging instead of log4j.
(#13)
* Better error message when attempting to write to a table that doesn't exist.
(#1)
* Added more robust scala build to allow for future clean releases, and
publish settings for later integration. (#8)
* Refactored classes and objects used for authentication to support pluggable
authentication.
* Record cause of TypeConversionException.
* Improved error messages informing about failure to convert column value.
Fixed missing conversion for setters.
* Split CassandraWriter into RowWriter and TableWriter.
* Refactored package structure. Moved classes from rdd to rdd.reader
and rdd.partitioner packages. Renamed RowTransformers to RowReaders.
* Fix writing ByteBuffers to Cassandra.
* Throw meaningful exception when non-existing column is requested by name.
* Add isNull method on CassandraRow.
* Fix converting blobs to arrays of bytes in CassandraRow. Fix printing blobs
and collections.