Releases: derrickoswald/CIMSpark
CIMReader-2.11-2.0.1-1.9.1
Fix for Issue #6 Dropped Elements.
When determining how much extra to read beyond the end of a Split, the computation was based on the FSDataInputStream.available() function, which returns an Int and not a Long. So for all files over the 2GB barrier (the maximum integer value), available() was topping out at 2147483647.
This meant the computed extra was zero, that is, no over-read at all, and hence the last element at the end of some Splits was dropped for large files. The striped files were all under 2GB and so did not exhibit this problem.
This has been fixed by using the FileSystem.getFileStatus() function instead, which returns a Long.
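The root cause can be sketched in a few lines of Scala; the names below are illustrative, not the actual CIMReader code:

```scala
// Sketch of the overflow: an Int-returning call like available() can
// report at most Int.MaxValue (2147483647) bytes, so for files past the
// 2GB barrier the remaining-bytes computation truncated, while the Long
// file length from FileSystem.getFileStatus() does not.
object OverReadSketch
{
    // what an Int-returning call like available() reports at most
    def remainingViaAvailable (length: Long, position: Long): Long =
        math.min (length - position, Int.MaxValue.toLong).toInt.toLong

    // the fix: compute from the Long file length of getFileStatus()
    def remainingViaFileStatus (length: Long, position: Long): Long =
        length - position
}
```

For an 8017082910 byte file read from position 0, the first function caps out at 2147483647 while the second reports the full remainder.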
CIMReader-2.11-2.0.1-1.9.0
Release under the new name: CIMReader
- add Asset/LifecycleDate to edges
- add "split size" option (ch.ninecode.cim.split_maxsize) to ease memory pressure for worker nodes
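As a sketch of how the new option might be supplied when reading a CIM file through Spark's standard DataFrameReader interface (the HDFS path and the 64 MB value are illustrative assumptions, not tested defaults):

```scala
import org.apache.spark.sql.SparkSession

// assume an existing session; master, app name and jar setup omitted
val session = SparkSession.builder ().appName ("CIMReader").getOrCreate ()
val elements = session.read
    .format ("ch.ninecode.cim")
    // cap each Split at 64 MB so worker tasks hold less in memory at once
    .option ("ch.ninecode.cim.split_maxsize", (64L * 1024L * 1024L).toString)
    .load ("hdfs://sandbox:8020/data/sample.rdf") // hypothetical path
```

This is a configuration sketch only; it needs a running Spark cluster with the CIMReader jar on the classpath.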
CIMScala-2.11-2.0.1-1.9.0
Maintenance release for GridLAB-D work.
- fixes for Join and Topological Processing options to update the superclass RDDs of affected RDDs
- name topological islands by trafo low voltage pin
- when using option ch.ninecode.cim.do_topo_islands=true, an attempt is made to name the islands based on the transformer secondary pin (or, failing that, the topological node name)
- add checkpointing, optimize Graphx trace
- if checkpointing is enabled (that is, the Spark context CheckpointDir has been set) final RDDs will be checkpointed
- add Abgang (feeder) number, add mRID to classes not inheriting from IdentifiedObject
- when using option ch.ninecode.cim.make_edges=true, a description column containing the Abgang (feeder) number has been added to the generated edges RDD
- fixed a problem for DataFrames (and hence also for R data.frames) where objects not inheriting from IdentifiedObject had no primary key (mRID)
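To illustrate the checkpointing trigger described above, here is a minimal Scala sketch assuming the usual Spark entry points (the directory, path and option values are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder ().appName ("CIMScala").getOrCreate ()
// setting the checkpoint directory is what enables checkpointing;
// without this call the final RDDs are not checkpointed
session.sparkContext.setCheckpointDir ("hdfs://sandbox:8020/checkpoint") // hypothetical directory
val elements = session.read
    .format ("ch.ninecode.cim")
    // request island naming by transformer secondary pin where possible
    .option ("ch.ninecode.cim.do_topo_islands", "true")
    .load ("hdfs://sandbox:8020/data/sample.rdf") // hypothetical path
```

Again a configuration sketch only; running it requires a Spark cluster with the CIMScala jar available.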
CIMScala-2.11-2.0.1-1.8.1
Fix warning and error messages when creating redges.RData.
Note
Existing R scripts work, but issue warning messages like so:
Warning message:
'sparkR.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
Warning message:
'sparkRSQL.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
Warning message:
'sql(sqlContext...)' is deprecated.
Use 'sql(sqlQuery)' instead.
See help("Deprecated")
It is possible to eliminate these messages using the script below, but testing this code against large data sets indicates severe memory issues.
So, at this time, we recommend using the same R script as was used with version 1.6.0, ignoring the warning messages, and not using the code below.
R code changes for Spark 2.0 (avoids warning messages):
# record the load time
begin = proc.time ()
# set up the Spark system
Sys.setenv (YARN_CONF_DIR="/spark/spark-2.0.2-bin-hadoop2.7/conf")
Sys.setenv (SPARK_HOME="spark/spark-2.0.2-bin-hadoop2.7")
library (SparkR, lib.loc = c (file.path (Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session ("spark://sandbox:7077", "Sample", sparkJars = c ("CIMScala-2.11-2.0.1-1.8.1.jar"), sparkEnvir = list (spark.driver.memory="1g", spark.executor.memory="4g", spark.serializer="org.apache.spark.serializer.KryoSerializer"))
# record the start time
pre = proc.time ()
# read the data file and process topologically and make the edge RDD
elements = sql ("create temporary view elements using ch.ninecode.cim options (path 'hdfs://sandbox:8020/data/NIS_CIM_Export_sias_current_20161220_V9.rdf', StorageLevel 'MEMORY_AND_DISK_SER', ch.ninecode.cim.make_edges 'true', ch.ninecode.cim.do_topo 'false', ch.ninecode.cim.do_topo_islands 'false')")
head (sql ("select * from elements")) # triggers evaluation
# record the end of the read phase
post = proc.time ()
# read the edges RDD as an R data frame
edges = sql ("select * from edges")
redges = SparkR::collect (edges, stringsAsFactors=FALSE)
# save the redges data frame
save ("redges", file="./NIS_CIM_Export_sias_current_20161220_V9")
finish = proc.time ()
# show timing
print (paste ("setup", as.numeric (pre[3] - begin[3])))
print (paste ("read", as.numeric (post[3] - pre[3])))
print (paste ("redges", as.numeric (finish[3] - post[3])))
# example to read an RDD directly
terminals = sql ("select * from Terminal")
rterminals = SparkR::collect (terminals, stringsAsFactors=FALSE)
# example to read a three-way join of RDDs
switches = sql ("select s.sup.sup.sup.sup.mRID mRID, s.sup.sup.sup.sup.aliasName aliasName, s.sup.sup.sup.sup.name name, s.sup.sup.sup.sup.description description, open, normalOpen no, l.CoordinateSystem cs, p.xPosition, p.yPosition from Switch s, Location l, PositionPoint p where s.sup.sup.sup.Location = l.sup.mRID and s.sup.sup.sup.Location = p.Location and p.sequenceNumber = 0")
rswitches = SparkR::collect (switches, stringsAsFactors=FALSE)
Timings on the NIS AWS cluster for this sequence of operations on an 8017082910 byte (roughly 7.5 GiB) RDF file are:
setup 3.089 seconds
read 27.636 seconds
redges 1296.595 seconds
CIMScala-2.11-2.0.1-1.8.0
Initial Spark 2.0.1 release.
- uses UDT (User Defined Type) hack
- rework class definitions
- CIMRelation not using HadoopFsRelation
- DefaultSource not using HadoopFsRelationProvider
- update Docker environment
CIMScala-2.10-1.6.0-1.7.2
Alter Edges creation to use the top level container (Substation or DistributionBox) where possible.
CIMScala-2.10-1.6.0-1.7.1
This is just a small update to revert to the original Edge schema when topological processing is not enabled,
dropping the columns related to topological islands.
CIMScala-2.10-1.6.0-1.7.0
This update includes many enhancements and extensions. Briefly, these are:
- support for multiple input files, specifically IS-U CIM files, and joining them
- topological processor, creating TopologicalNode and TopologicalIsland elements linked from ConnectivityNode
- added an MIT license to clarify the licensing status
- model package improvements
  - completed Wires
  - added Metering
  - added InfAssets
CIMScala-2.10-1.6.0-1.6.0
Update artifact naming to include:
- version of Scala (2.10) which is necessary for some Scala repositories
- version of Spark (1.6.0) which is the required target system
- version of CIMScala (1.6.0) which will change from release to release and follows semantic versioning
Note it is just coincidence that the CIMScala version is the same as the Spark version for this release.
CIMScala-2.10-1.4.1-0.6.0
Update artifact naming to include:
- version of Scala (2.10) which is necessary for some Scala repositories
- version of Spark (1.4.1) which is the required target system
- version of CIMScala (0.6.0) which will change from release to release and follows semantic versioning