Brief introduction to Scala and Breeze for statistical computing
Introduction
In the previous post I outlined why I think Scala is a good language for statistical computing and data science. In this post I want to give a quick taste of Scala and the Breeze numerical library to whet the appetite of the uninitiated. This post certainly won’t provide enough material to get started using Scala in anger – but I’ll try to provide a few pointers along the way. It also won’t be very interesting to anyone who knows Scala – I’m not introducing any of the very cool Scala stuff here – I think that some of the most powerful and interesting Scala language features can be a bit frightening for new users.
To reproduce the examples, you need to install Scala and Breeze. This isn’t very tricky, but I don’t want to get bogged down with a detailed walk-through here – I want to concentrate on the Scala language and Breeze library. You just need to install a recent version of Java, then Scala, and then Breeze. You might also want SBT and/or the ScalaIDE, though neither of these is necessary. Then you need to run the Scala REPL with the Breeze library in the classpath. There are several ways one can do this. The most obvious is to just run scala with the path to Breeze manually specified (or specified in an environment variable). Alternatively, you could run a console from an sbt session with a Breeze dependency (which is what I actually did for this post), or you could use a Scala Worksheet from inside a ScalaIDE project with a Breeze dependency.
A Scala REPL session
A first glimpse of Scala
We’ll start with a few simple Scala concepts that are not dependent on Breeze. For further information, see the Scala documentation.
Welcome to Scala version 2.10.3 (OpenJDK 64-Bit Server VM, Java 1.7.0_25).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val a = 5
a: Int = 5
scala> a
res0: Int = 5
So far, so good. Using the Scala REPL is much like using the Python or R command line, so will be very familiar to anyone used to these or similar languages. The first thing to note is that labels need to be declared on first use. We have declared a to be a val. A val is an immutable value, which cannot be re-assigned, as the following code illustrates.
scala> a = 6
<console>:8: error: reassignment to val
a = 6
^
scala> a
res1: Int = 5
Immutability seems to baffle people unfamiliar with functional programming. But fear not, as Scala allows declaration of mutable variables as well:
scala> var b = 7
b: Int = 7
scala> b
res2: Int = 7
scala> b = 8
b: Int = 8
scala> b
res3: Int = 8
The Zen of functional programming is to realise that immutability is generally a good thing, but that really isn’t the point of this post. Scala has excellent support for both mutable and immutable collections as part of the standard library. See the API docs for more details. For example, it has immutable lists.
scala> val c = List(3,4,5,6)
c: List[Int] = List(3, 4, 5, 6)
scala> c(1)
res4: Int = 4
scala> c.sum
res5: Int = 18
scala> c.length
res6: Int = 4
scala> c.product
res7: Int = 360
Again, this should be pretty familiar stuff for anyone who knows Python. Note that the sum and product methods are special cases of reduce operations, which are well supported in Scala. For example, we could compute the sum reduction using
scala> c.foldLeft(0)((x,y) => x+y)
res8: Int = 18
or the slightly more condensed form given below, and similarly for the product reduction.
scala> c.foldLeft(0)(_+_)
res9: Int = 18
scala> c.foldLeft(1)(_*_)
res10: Int = 360
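As a small illustration of the reductions above, here is a sketch using only the Scala standard library. The object name ReductionSketch and the values largest and sumSq are my own additions for illustration, not part of the original session.

```scala
object ReductionSketch {
  val c = List(3, 4, 5, 6)

  // sum and product written as explicit left folds
  val total: Int = c.foldLeft(0)(_ + _)
  val prod: Int = c.foldLeft(1)(_ * _)

  // reduce takes no starting value, so it throws on an empty list
  val largest: Int = c.reduce((x, y) => if (x > y) x else y)

  // the accumulator may have a different type from the elements:
  // here a Double sum of squares is built from a List[Int]
  val sumSq: Double = c.foldLeft(0.0)((acc, x) => acc + x.toDouble * x)
}
```

The last fold is the reason to prefer foldLeft over reduce when the result type differs from the element type.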
Scala also has a nice immutable Vector class, which offers a range of constant time operations (but note that this has nothing to do with the mutable Vector class that is part of the Breeze library).
scala> val d = Vector(2,3,4,5,6,7,8,9)
d: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9)
scala> d
res11: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9)
scala> d.slice(3,6)
res12: scala.collection.immutable.Vector[Int] = Vector(5, 6, 7)
scala> val e = d.updated(3,0)
e: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4, 0, 6, 7, 8, 9)
scala> d
res13: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9)
scala> e
res14: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4, 0, 6, 7, 8, 9)
Note that when e is created as an updated version of d the whole of d is not copied – only the parts that have been updated. And we don’t have to worry that aspects of d and e point to the same information in memory, as they are both immutable… As should be clear by now, Scala has excellent support for functional programming techniques. In addition to the reduce operations mentioned already, maps and filters are also well covered.
scala> val f=(1 to 10).toList
f: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> f
res15: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> f.map(x => x*x)
res16: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> f map {x => x*x}
res17: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> f filter {_ > 4}
res18: List[Int] = List(5, 6, 7, 8, 9, 10)
Note how Scala allows methods with a single argument to be written as an infix operator, making for more readable code.
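To make the map, filter and immutability points concrete, here is a short sketch in plain Scala. The object name MapFilterSketch and the value evenSquareSum are hypothetical names chosen for this illustration.

```scala
object MapFilterSketch {
  val f = (1 to 10).toList

  // dot notation and infix notation call the same method
  val squaresDot = f.map(x => x * x)
  val squaresInfix = f map { x => x * x }

  // operations chain naturally: keep the even numbers, square them, add them up
  val evenSquareSum = f.filter(_ % 2 == 0).map(x => x * x).sum

  // updated returns a new Vector and leaves the original untouched
  val d = Vector(2, 3, 4, 5)
  val e = d.updated(1, 0)
}
```

Because d is immutable, it can be shared freely after e has been derived from it.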
A first look at Breeze
The next part of the session requires the Breeze library – see the Breeze quickstart guide for further details. We begin by taking a quick look at everyone’s favourite topic of non-uniform random number generation. Let’s start by generating a couple of draws from a Poisson distribution with mean 3.
scala> import breeze.stats.distributions._
import breeze.stats.distributions._
scala> val poi = Poisson(3.0)
poi: breeze.stats.distributions.Poisson = Poisson(3.0)
scala> poi.draw
res19: Int = 2
scala> poi.draw
res20: Int = 3
If more than a single draw is required, an iid sample can be obtained.
scala> val x = poi.sample(10)
x: IndexedSeq[Int] = Vector(2, 3, 3, 4, 2, 2, 1, 2, 4, 2)
scala> x
res21: IndexedSeq[Int] = Vector(2, 3, 3, 4, 2, 2, 1, 2, 4, 2)
scala> x.sum
res22: Int = 25
scala> x.length
res23: Int = 10
scala> x.sum.toDouble/x.length
res24: Double = 2.5
Note that this Vector is immutable: it is the standard library Vector seen earlier, not a Breeze vector. The probability mass function (PMF) of the Poisson distribution is also available.
scala> poi.probabilityOf(2)
res25: Double = 0.22404180765538775
scala> x map {x => poi.probabilityOf(x)}
res26: IndexedSeq[Double] = Vector(0.22404180765538775, 0.22404180765538775, 0.22404180765538775, 0.16803135574154085, 0.22404180765538775, 0.22404180765538775, 0.14936120510359185, 0.22404180765538775, 0.16803135574154085, 0.22404180765538775)
scala> x map {poi.probabilityOf(_)}
res27: IndexedSeq[Double] = Vector(0.22404180765538775, 0.22404180765538775, 0.22404180765538775, 0.16803135574154085, 0.22404180765538775, 0.22404180765538775, 0.14936120510359185, 0.22404180765538775, 0.16803135574154085, 0.22404180765538775)
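For readers who want to see what probabilityOf is computing, the Poisson PMF is P(X = k) = mean^k exp(-mean) / k!. The sketch below evaluates it directly in plain Scala, without Breeze. The names PoissonPmfSketch, logFactorial and poissonPmf are my own, and this is not a description of how Breeze implements it internally.

```scala
object PoissonPmfSketch {
  // log(k!) as a sum of logs, which avoids overflow for larger k
  def logFactorial(k: Int): Double =
    (1 to k).foldLeft(0.0)((acc, i) => acc + math.log(i.toDouble))

  // P(X = k) for X ~ Poisson(mean), evaluated on the log scale
  def poissonPmf(k: Int, mean: Double): Double =
    math.exp(k * math.log(mean) - mean - logFactorial(k))
}
```

Calling PoissonPmfSketch.poissonPmf(2, 3.0) should agree with the value of poi.probabilityOf(2) shown above, up to floating point rounding.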
Obviously, Gaussian variables (and Gamma, and several others) are supported in a similar way.
scala> val gau=Gaussian(0.0,1.0)
gau: breeze.stats.distributions.Gaussian = Gaussian(0.0, 1.0)
scala> gau.draw
res28: Double = 1.606121255846881
scala> gau.draw
res29: Double = -0.1747896055492152
scala> val y=gau.sample(20)
y: IndexedSeq[Double] = Vector(-1.3758577012869702, -1.2148314970824652, -0.022501190144116855, 0.3244006323566883, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> y
res30: IndexedSeq[Double] = Vector(-1.3758577012869702, -1.2148314970824652, -0.022501190144116855, 0.3244006323566883, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> y.sum/y.length
res31: Double = -0.34064156102380994
scala> y map {gau.logPdf(_)}
res32: IndexedSeq[Double] = Vector(-1.8654307403000054, -1.6568463163564844, -0.9191916849836235, -0.9715564183413823, -0.9836614354155007, -1.3847302992371653, -1.0023094506890617, -0.9256472309869705, -1.3059361584943119, -0.975419259871957, -1.1669755840586733, -1.6444202843394145, -0.93783943912556, -0.9683690047171869, -0.9209315167224245, -2.090114759123421, -1.6843650876361744, -1.0915455053203147, -1.359378517654625, -1.1399116208702693)
scala> Gamma(2.0,3.0).sample(5)
res33: IndexedSeq[Double] = Vector(2.38436441278546, 2.125017198373521, 2.333118708811143, 5.880076392566909, 2.0901427084667503)
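The log-density values above can be checked by hand: for a Gaussian with mean mu and standard deviation sigma, the log density at x is -0.5 log(2 pi) - log(sigma) - 0.5 ((x - mu) / sigma)^2. The following is a plain Scala sketch of that formula. The object and method names are my own, and I am assuming (as the Gibbs sampler code later in this post suggests) that the second parameter of Gaussian is the standard deviation rather than the variance.

```scala
object GaussianLogPdfSketch {
  // log density of a Gaussian with mean mu and standard deviation sigma, at x
  def logPdf(x: Double, mu: Double, sigma: Double): Double = {
    val z = (x - mu) / sigma
    -0.5 * math.log(2.0 * math.Pi) - math.log(sigma) - 0.5 * z * z
  }
}
```

Applied to the first element of y with mu = 0.0 and sigma = 1.0, this should reproduce the first entry of the logPdf output above, up to rounding.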
This is all good stuff for those of us who like to do Markov chain Monte Carlo. There are not masses of statistical data analysis routines built into Breeze, but a few basic tools are provided, including some basic summary statistics.
scala> import breeze.stats.DescriptiveStats._
import breeze.stats.DescriptiveStats._
scala> mean(y)
res34: Double = -0.34064156102380994
scala> variance(y)
res35: Double = 0.574257149387757
scala> meanAndVariance(y)
res36: (Double, Double) = (-0.34064156102380994,0.574257149387757)
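For comparison, here is a minimal standard-library sketch of the same summary statistics. The object name SummarySketch is hypothetical. I am assuming the usual unbiased sample variance with an n - 1 denominator; I believe this is the convention Breeze uses, but check the documentation for the version you have installed.

```scala
object SummarySketch {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  // unbiased sample variance, dividing by n - 1 (assumed convention)
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1)
  }

  def meanAndVariance(xs: Seq[Double]): (Double, Double) =
    (mean(xs), variance(xs))
}
```

Note that this two-pass version needs at least two observations for the variance to be defined.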
Support for linear algebra is an important part of any scientific library. Here the Breeze developers have made the wise decision to provide a nice Scala interface to netlib-java. This in turn calls out to any native optimised BLAS or LAPACK libraries installed on the system, but will fall back to Java code if no optimised libraries are available. This means that linear algebra code using Scala and Breeze should run as fast as code written in any other language, including C, C++ and Fortran, provided that optimised libraries are installed on the system. For further details see the Breeze linear algebra guide. Let’s start by creating and messing with a dense vector.
scala> import breeze.linalg._
import breeze.linalg._
scala> val v=DenseVector(y.toArray)
v: breeze.linalg.DenseVector[Double] = DenseVector(-1.3758577012869702, -1.2148314970824652, -0.022501190144116855, 0.3244006323566883, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> v(1) = 0
scala> v
res38: breeze.linalg.DenseVector[Double] = DenseVector(-1.3758577012869702, 0.0, -0.022501190144116855, 0.3244006323566883, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> v(1 to 3) := 1.0
res39: breeze.linalg.DenseVector[Double] = DenseVector(1.0, 1.0, 1.0)
scala> v
res40: breeze.linalg.DenseVector[Double] = DenseVector(-1.3758577012869702, 1.0, 1.0, 1.0, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> v(1 to 3) := DenseVector(1.0,1.5,2.0)
res41: breeze.linalg.DenseVector[Double] = DenseVector(1.0, 1.5, 2.0)
scala> v
res42: breeze.linalg.DenseVector[Double] = DenseVector(-1.3758577012869702, 1.0, 1.5, 2.0, 0.35978577573558407, 0.9651857500320781, -0.40834034207848985, 0.11583348205331555, -0.8797699986810634, -0.33609738668214695, 0.7043252811790879, -1.2045594639823656, 0.19442688045065826, -0.31442160076087067, 0.06313451540562891, -1.5304745838587115, -1.2372764884467027, 0.5875490994217284, -0.9385520597707431, -0.6647903243363228)
scala> v :> 0.0
res43: breeze.linalg.BitVector = BitVector(1, 2, 3, 4, 5, 7, 10, 12, 14, 17)
scala> (v :> 0.0).toArray
res44: Array[Boolean] = Array(false, true, true, true, true, true, false, true, false, false, true, false, true, false, true, false, false, true, false, false)
Next let’s create and mess around with some dense matrices.
scala> val m = new DenseMatrix(5,4,linspace(1.0,20.0,20).toArray)
m: breeze.linalg.DenseMatrix[Double] =
1.0 6.0 11.0 16.0
2.0 7.0 12.0 17.0
3.0 8.0 13.0 18.0
4.0 9.0 14.0 19.0
5.0 10.0 15.0 20.0
scala> m
res45: breeze.linalg.DenseMatrix[Double] =
1.0 6.0 11.0 16.0
2.0 7.0 12.0 17.0
3.0 8.0 13.0 18.0
4.0 9.0 14.0 19.0
5.0 10.0 15.0 20.0
scala> m.rows
res46: Int = 5
scala> m.cols
res47: Int = 4
scala> m(::,1)
res48: breeze.linalg.DenseVector[Double] = DenseVector(6.0, 7.0, 8.0, 9.0, 10.0)
scala> m(1,::)
res49: breeze.linalg.DenseMatrix[Double] = 2.0 7.0 12.0 17.0
scala> m(1,::) := linspace(1.0,2.0,4)
res50: breeze.linalg.DenseMatrix[Double] = 1.0 1.3333333333333333 1.6666666666666665 2.0
scala> m
res51: breeze.linalg.DenseMatrix[Double] =
1.0 6.0 11.0 16.0
1.0 1.3333333333333333 1.6666666666666665 2.0
3.0 8.0 13.0 18.0
4.0 9.0 14.0 19.0
5.0 10.0 15.0 20.0
scala> val n = m.t
n: breeze.linalg.DenseMatrix[Double] =
1.0 1.0 3.0 4.0 5.0
6.0 1.3333333333333333 8.0 9.0 10.0
11.0 1.6666666666666665 13.0 14.0 15.0
16.0 2.0 18.0 19.0 20.0
scala> n
res52: breeze.linalg.DenseMatrix[Double] =
1.0 1.0 3.0 4.0 5.0
6.0 1.3333333333333333 8.0 9.0 10.0
11.0 1.6666666666666665 13.0 14.0 15.0
16.0 2.0 18.0 19.0 20.0
scala> val o = m*n
o: breeze.linalg.DenseMatrix[Double] =
414.0 59.33333333333333 482.0 516.0 550.0
59.33333333333333 9.555555555555555 71.33333333333333 77.33333333333333 83.33333333333333
482.0 71.33333333333333 566.0 608.0 650.0
516.0 77.33333333333333 608.0 654.0 700.0
550.0 83.33333333333333 650.0 700.0 750.0
scala> o
res53: breeze.linalg.DenseMatrix[Double] =
414.0 59.33333333333333 482.0 516.0 550.0
59.33333333333333 9.555555555555555 71.33333333333333 77.33333333333333 83.33333333333333
482.0 71.33333333333333 566.0 608.0 650.0
516.0 77.33333333333333 608.0 654.0 700.0
550.0 83.33333333333333 650.0 700.0 750.0
scala> val p = n*m
p: breeze.linalg.DenseMatrix[Double] =
52.0 117.33333333333333 182.66666666666666 248.0
117.33333333333333 282.77777777777777 448.22222222222223 613.6666666666667
182.66666666666666 448.22222222222223 713.7777777777778 979.3333333333334
248.0 613.6666666666667 979.3333333333334 1345.0
scala> p
res54: breeze.linalg.DenseMatrix[Double] =
52.0 117.33333333333333 182.66666666666666 248.0
117.33333333333333 282.77777777777777 448.22222222222223 613.6666666666667
182.66666666666666 448.22222222222223 713.7777777777778 979.3333333333334
248.0 613.6666666666667 979.3333333333334 1345.0
So, messing around with vectors and matrices is more-or-less as convenient as in well-known dynamic and math languages. To conclude this section, let us see how to simulate some data from a regression model and then solve the least squares problem to obtain the estimated regression coefficients. We will simulate 1,000 observations from a model with 5 covariates.
scala> val X = new DenseMatrix(1000,5,gau.sample(5000).toArray)
X: breeze.linalg.DenseMatrix[Double] =
-0.40186606934180685 0.9847148198711287 ... (5 total)
-0.4760404521336951 -0.833737041320742 ...
-0.3315199616926892 -0.19460446824586297 ...
-0.14764615494496836 -0.17947658245206904 ...
-0.8357372755800905 -2.456222113596015 ...
-0.44458309216683184 1.848007773944826 ...
0.060314034896221065 0.5254462055311016 ...
0.8637867740789016 -0.9712570453363925 ...
0.11620167261655819 -1.2231380938032232 ...
-0.3335514290842617 -0.7487303696662753 ...
-0.5598937433421866 0.11083382409013512 ...
-1.7213395389510568 1.1717491221846357 ...
-1.078873342208984 0.9386859686451607 ...
-0.7793854546738327 -0.9829373863442161 ...
-1.054275201631216 0.10100826507456745 ...
-0.6947188686537832 1.215...
scala> val b0 = linspace(1.0,2.0,5)
b0: breeze.linalg.DenseVector[Double] = DenseVector(1.0, 1.25, 1.5, 1.75, 2.0)
scala> val y0 = X * b0
y0: breeze.linalg.DenseVector[Double] = DenseVector(0.08200546839589107, -0.5992571365601228, -5.646398002309553, -7.346136663325798, -8.486423788193362, 1.451119214541837, -0.25792385841948406, 2.324936340609002, -1.2285599639827862, -4.030261316643863, -4.1732627416377674, -0.5077151099958077, -0.2087263741903591, 0.46678616461409383, 2.0244342278575975, 1.775756468177401, -4.799821190728213, -1.8518388060564481, 1.5892306875621767, -1.6528539564387008, 1.4064864330994125, -0.8734630221484178, -7.75470002781836, -0.2893619536998493, -5.972958583649336, -4.952666733286302, 0.5431255990489059, -2.477076684976403, -0.6473617571867107, -0.509338416957489, -1.5415350935719594, -0.47068802465681125, 2.546118380362026, -7.940401988804477, -1.037049442788122, -1.564016663370888, -3.3147087994...
scala> val y = y0 + DenseVector(gau.sample(1000).toArray)
y: breeze.linalg.DenseVector[Double] = DenseVector(-0.572127338358624, -0.16481167194161406, -4.213873268823003, -10.142015065601388, -7.893898543052863, 1.7881055848475076, -0.26987820512025357, 3.3289433195054148, -2.514141419925489, -4.643625974157769, -3.8061000214061886, 0.6462624993109218, 0.23603338389134149, 1.0211137806779267, 2.0061727641393317, 0.022624943149799348, -5.429601401989341, -1.836181225242386, 1.0265599173053048, -0.1673732536615371, 0.8418249443853956, -1.1547110533101967, -8.392100167478764, -1.1586377992526877, -6.400362975646245, -5.487018086963841, 0.3038055584347069, -1.2247410435868684, -0.06476921390724344, -1.5039074374120407, -1.0189111630970076, 1.307339668865724, 2.048320821568789, -8.769328824477714, -0.9104251029228555, -1.3533910178496698, -2.178788...
scala> val b = X \ y // defaults to a QR-solve of the least squares problem
b: breeze.linalg.DenseVector[Double] = DenseVector(0.9952708232116663, 1.2344546192238952, 1.5543512339052412, 1.744091673457169, 1.9874158953720507)
So all of the most important building blocks for statistical computing are included in the Breeze library.
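To show what the least squares solve is computing, here is a deliberately naive sketch in plain Scala that solves the normal equations (X'X) b = X'y by Gaussian elimination with partial pivoting. This is a different technique from the QR-based solve mentioned in the code comment above, and it is less numerically stable, so it is meant only as an illustration. All names here (LeastSquaresSketch, gram, xty, solve, leastSquares) are hypothetical.

```scala
object LeastSquaresSketch {
  type Matrix = Array[Array[Double]]

  // X'X for an n x p matrix stored as an array of rows
  def gram(x: Matrix): Matrix = {
    val p = x(0).length
    Array.tabulate(p, p)((i, j) => x.map(row => row(i) * row(j)).sum)
  }

  // X'y
  def xty(x: Matrix, y: Array[Double]): Array[Double] = {
    val p = x(0).length
    Array.tabulate(p)(j => x.indices.map(i => x(i)(j) * y(i)).sum)
  }

  // solve a * b = rhs by Gaussian elimination with partial pivoting
  def solve(a: Matrix, rhs: Array[Double]): Array[Double] = {
    val n = rhs.length
    val m = a.map(_.clone) // work on copies so the inputs are not modified
    val r = rhs.clone
    for (k <- 0 until n) {
      // swap in the row with the largest absolute pivot
      val piv = (k until n).maxBy(i => math.abs(m(i)(k)))
      val tmpRow = m(k); m(k) = m(piv); m(piv) = tmpRow
      val tmpVal = r(k); r(k) = r(piv); r(piv) = tmpVal
      for (i <- k + 1 until n) {
        val factor = m(i)(k) / m(k)(k)
        for (j <- k until n) m(i)(j) -= factor * m(k)(j)
        r(i) -= factor * r(k)
      }
    }
    // back substitution
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => m(i)(j) * b(j)).sum
      b(i) = (r(i) - s) / m(i)(i)
    }
    b
  }

  // least squares coefficients via the normal equations
  def leastSquares(x: Matrix, y: Array[Double]): Array[Double] =
    solve(gram(x), xty(x, y))
}
```

With noise-free data generated as y = X b, leastSquares should recover b up to rounding. In real work, a library routine such as the Breeze solve shown above is the better choice.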
At this point it is really worth reminding yourself that Scala is actually a statically typed language, despite the fact that in this session we have not explicitly declared the type of anything at all! This is because Scala has type inference, which makes type declarations optional when it is straightforward for the compiler to figure out what the types must be. For example, for our very first expression, val a = 5, because the RHS is an Int, it is clear that the LHS must also be an Int, and so the compiler infers that the type of a must be an Int, and treats the code as if the type had been declared as val a: Int = 5. This type inference makes Scala feel very much like a dynamic language in general use. Typically, we carefully specify the types of function arguments (and often the return type of the function, too), but then for the main body of each function, just let the compiler figure out all of the types and write code as if the language were dynamic. To me, this seems like the best of all worlds: the convenience of dynamic languages with the safety of static typing.
Declaring the types of function arguments is not usually a big deal, as the following simple example demonstrates.
scala> def mean(arr: Array[Int]): Double = {
| arr.sum.toDouble/arr.length
| }
mean: (arr: Array[Int])Double
scala> mean(Array(3,1,4,5))
res55: Double = 3.25
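As a slightly more general variation on the function above, the following sketch declares the argument and return types but leaves the local value to type inference. It uses the standard library Numeric type class so that one definition covers Int, Double and other numeric element types. The object name GenericMeanSketch is my own.

```scala
object GenericMeanSketch {
  // argument and return types are declared explicitly;
  // the type of `total` is inferred by the compiler as T
  def mean[T](xs: Seq[T])(implicit num: Numeric[T]): Double = {
    val total = xs.sum
    num.toDouble(total) / xs.length
  }
}
```

For example, GenericMeanSketch.mean(Seq(3, 1, 4, 5)) and GenericMeanSketch.mean(List(1.5, 2.5)) both compile against the same definition, whereas passing a Seq[String] is rejected at compile time.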
A complete Scala program
For completeness, I will finish this post with a very simple but complete Scala/Breeze program. In a previous post I discussed a simple Gibbs sampler in Scala, but in that post I used the Java COLT library for random number generation. Below is a version using Breeze instead.
object BreezeGibbs {

  import breeze.stats.distributions._
  import scala.math.sqrt

  class State(val x: Double, val y: Double)

  def nextIter(s: State): State = {
    val newX = Gamma(3.0, 1.0 / ((s.y) * (s.y) + 4.0)).draw()
    new State(newX, Gaussian(1.0 / (newX + 1), 1.0 / sqrt(2 * newX + 2)).draw())
  }

  def nextThinnedIter(s: State, left: Int): State = {
    if (left == 0) s
    else nextThinnedIter(nextIter(s), left - 1)
  }

  def genIters(s: State, current: Int, stop: Int, thin: Int): State = {
    if (!(current > stop)) {
      println(current + " " + s.x + " " + s.y)
      genIters(nextThinnedIter(s, thin), current + 1, stop, thin)
    } else s
  }

  def main(args: Array[String]): Unit = {
    println("Iter x y")
    genIters(new State(0.0, 0.0), 1, 50000, 1000)
  }

}
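The thinning loop in the program above is written as a recursive function whose recursive call is the last thing it does. As far as I am aware, the Scala compiler turns self-recursive tail calls like this into loops, which is why many iterations do not exhaust the stack. The sketch below shows the same pattern in isolation, with the standard library @tailrec annotation asking the compiler to reject the code if the call is not in tail position. The names ThinningSketch and iterate are hypothetical, and the step function is a deterministic stand-in for the random update.

```scala
import scala.annotation.tailrec

object ThinningSketch {
  // apply `step` to `state` exactly `left` times
  @tailrec
  def iterate[S](state: S, left: Int)(step: S => S): S =
    if (left == 0) state
    else iterate(step(state), left - 1)(step)
}
```

For instance, ThinningSketch.iterate(0, 1000000)(_ + 1) runs a million steps without growing the stack.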
Summary
In this post I’ve tried to give a quick taste of the Scala language and the Breeze library for those used to dynamic languages for statistical computing. Hopefully I’ve illustrated that the basics don’t look too different, so there is no reason to fear Scala. It is perfectly possible to start using Scala as a better and faster Python or R. Once you’ve mastered the basics, you can then start exploring the full power of the language. There’s loads of introductory Scala material to be found on-line. It probably makes sense to start with the links I’ve highlighted above. After that, just start searching – there’s an interesting set of tutorials I noticed just the other day. A very time-efficient way to learn Scala quickly is to do the FP with Scala course on Coursera, but whether this makes sense will depend on when it is next running. For those who prefer real books, Programming in Scala is the standard reference, and I’ve also found Functional Programming in Scala to be useful (free text of the first edition of the former and a draft of the latter can be found on-line).
REPL Script
Below is a copy of the complete REPL script, for reference.
// start with non-Breeze stuff

val a = 5
a
a = 6
a

var b = 7
b
b = 8
b

val c = List(3,4,5,6)
c(1)
c.sum
c.length
c.product
c.foldLeft(0)((x,y) => x+y)
c.foldLeft(0)(_+_)
c.foldLeft(1)(_*_)

val d = Vector(2,3,4,5,6,7,8,9)
d
d.slice(3,6)
val e = d.updated(3,0)
d
e

val f=(1 to 10).toList
f
f.map(x => x*x)
f map {x => x*x}
f filter {_ > 4}

// introduce breeze through random distributions
// https://github.com/scalanlp/breeze/wiki/Quickstart

import breeze.stats.distributions._
val poi = Poisson(3.0)
poi.draw
poi.draw
val x = poi.sample(10)
x
x.sum
x.length
x.sum.toDouble/x.length
poi.probabilityOf(2)
x map {x => poi.probabilityOf(x)}
x map {poi.probabilityOf(_)}

val gau=Gaussian(0.0,1.0)
gau.draw
gau.draw
val y=gau.sample(20)
y
y.sum/y.length
y map {gau.logPdf(_)}

Gamma(2.0,3.0).sample(5)

import breeze.stats.DescriptiveStats._
mean(y)
variance(y)
meanAndVariance(y)

// move on to linear algebra
// https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebra

import breeze.linalg._
val v=DenseVector(y.toArray)
v(1) = 0
v
v(1 to 3) := 1.0
v
v(1 to 3) := DenseVector(1.0,1.5,2.0)
v
v :> 0.0
(v :> 0.0).toArray

val m = new DenseMatrix(5,4,linspace(1.0,20.0,20).toArray)
m
m.rows
m.cols
m(::,1)
m(1,::)
m(1,::) := linspace(1.0,2.0,4)
m

val n = m.t
n
val o = m*n
o
val p = n*m
p

// regression and QR solution

val X = new DenseMatrix(1000,5,gau.sample(5000).toArray)
val b0 = linspace(1.0,2.0,5)
val y0 = X * b0
val y = y0 + DenseVector(gau.sample(1000).toArray)
val b = X \ y // defaults to a QR-solve of the least squares problem

// a simple function example

def mean(arr: Array[Int]): Double = {
  arr.sum.toDouble/arr.length
}

mean(Array(3,1,4,5))