It should be clear that if I gave you a “random” number generated from this process (e.g., 2), you can predict the next number by applying the formula yourself (e.g., (3 × 2 + 5) mod 7 = 4).
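The generator in the example can be sketched in a few lines. This is a minimal linear congruential generator using the constants implied by the worked example (multiplier 3, increment 5, modulus 7); real PRNGs use far larger parameters, but the predictability is the same.

```python
def lcg_next(x, a=3, c=5, m=7):
    """Next state of a linear congruential generator: (a*x + c) mod m."""
    return (a * x + c) % m

# Anyone who observes one output and knows the formula can predict the rest.
print(lcg_next(2))  # (3 * 2 + 5) mod 7 = 4
print(lcg_next(4))  # and the one after that, and so on
```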
Jepsen is a tool written by Kyle Kingsbury that is designed to test the partition tolerance of distributed systems. It creates network partitions while fuzzing the system with random operations. The results are analyzed to see if the system violates any of the consistency properties it claims to have.
In this talk he gives examples of how he started out as a new user with preconceptions about how the world should work, then quickly learned that the tools simply didn’t work the way he wanted. With open source software, however, he could bend the tools to his will. He also discusses his choice to distribute his R package under an open source license, feeling a strong responsibility to do so as both a user and developer.
Data engineering requires some system-administration hygiene. You’ll write less software and employ more configuration management. Understanding how to properly configure the aforementioned components in a system is key. Also, do not treat a tool as a “black box”: understand how changes in schema impact performance in each system.
The proposed detector design is a liquid scintillator—the same basic set-up used to detect neutrinos for the first time in 1956. The detector consists primarily of an acrylic sphere 34.5 meters (or nearly 115 feet) in diameter, filled with fluid engineered specifically for detecting neutrinos. When a neutrino interacts with the fluid, a chain reaction creates two tiny flashes of light. An additional sphere, made of photomultiplier tubes, would surround the acrylic sphere and capture these light signals.
Need to build a semantic representation of a corpus that is millions of documents large, and it’s taking forever? Have several idle machines at your disposal that you could use? Distributed computing accelerates computations by splitting a given task into several smaller subtasks and passing them to several computing nodes in parallel.
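The split-and-parallelize idea can be sketched on a single machine with worker processes standing in for computing nodes. The corpus and the `embed` function below are hypothetical placeholders, not a real semantic model:

```python
from multiprocessing import Pool

def embed(doc):
    # Placeholder "semantic representation": just the word count.
    # In a real system this would be an embedding or topic vector.
    return len(doc.split())

if __name__ == "__main__":
    corpus = ["a small document", "another document in the corpus"]
    # Each document becomes a subtask; the pool maps them to workers in parallel.
    with Pool(processes=2) as pool:
        representations = pool.map(embed, corpus)
    print(representations)  # [3, 5]
```

The same map-over-subtasks shape is what frameworks for cluster-scale work generalize across machines.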
browserify-cdn will bundle up lodash and all of its dependencies in the same way that browserify would if we were doing require('lodash') in a client-side project. The browserify-cdn folks are even nice enough to host a version of this at http://wzrd.in/, so if you visit http://wzrd.in/standalone/lodash@latest you can see the exact output of this project.
At some point, when you require a sufficient level of scaling, you turn to the open source work of Twitter with Finagle or Netflix with Hystrix/RxJava. Netflix’s libs are written in Java while Twitter’s are written in Scala. Both are easy to use from any JVM-based language, but the Finagle route will bring in an extra dependency on Scala. I’ve heard little from people using interop between Clojure and Scala, and that extra Scala dependency makes me nervous. Further, I like the simplicity of Netflix’s libs, and they have been putting a lot of effort into supporting many JVM-based languages.
I’d argue that this is deceptive. I think the real division in machine learning isn’t between supervised and unsupervised learning, but between what I’ll term predictive learning and representation learning. I haven’t heard it described in precisely this way before, but I think this distinction reflects a lot of our intuitions about how to approach a given machine learning problem.