
Live Coding Sessions from Andreas Lauschke

Posted 2 months ago | 1582 Views | 14 Replies | 20 Total Likes

This is the section for the Andreas Lauschke Live Coding Sessions. I will update this section after every live coding session with new material.

You can always watch the live stream at the Wolfram Section of TwitchTV.

First session, Apr 26, 2019, 5pm EDT, Operator Notation with Application

The .nb for the session is the attachment PDS001.nb. The homework questions are in the attached file PDS001-homework-questions.nb.
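
In case it helps to see the idea before opening the notebook, here is a minimal sketch of operator notation (curried operator forms and composition); the data is made up for illustration and is not taken from PDS001.nb:

    (* operator forms: many functions can be applied curried, without naming the data argument *)
    data = {1, 2, 3, 4, 5, 6};

    Select[data, EvenQ]    (* classical form *)
    Select[EvenQ][data]    (* operator form: Select[EvenQ] is itself a function *)

    (* operator forms compose nicely with RightComposition (/*) *)
    pipeline = Select[EvenQ] /* Map[#^2 &] /* Total;
    pipeline[data]    (* 4 + 16 + 36 = 56 *)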

Second session, May 2, 2019, 5pm EDT, Introduction to Association and Dataset, part 1.

The .nb for the session is the attachment AssocDataset002.nb. The homework questions are in the attached file AssocDataset002-homework.nb. The election data is in the file 2016 US Presidential Election Results by County.csv.
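
As a rough illustration of what part 1 covers, here is a minimal sketch of importing that CSV into a Dataset and running a simple query; the column names "state" and "votes" are assumptions, so check the actual header row of the file:

    (* import the CSV into a Dataset; the column names used below are hypothetical *)
    elections = Import[
      "2016 US Presidential Election Results by County.csv",
      "Dataset", HeaderLines -> 1];

    (* group by state and total the votes per state *)
    elections[GroupBy["state"], Total, "votes"]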

Third session, May 21, 2019, 5pm EDT, Introduction to Association and Dataset, part 2

The .nb for the session is the attachment AssocDataset003.nb.

Fourth session, June 4, 2019, 5pm EDT, Introduction to Association and Dataset, part 3

The .nb for the session is the attachment AssocDataset004.nb.

Fifth session, June 18, 2019, 4pm EDT, Dataset, Query, and Web Scraping of Free Data

The .nb for the session is the attachment DatasetQueryWebScraping005.nb.
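
As a flavor of the web-scraping part, here is a minimal sketch of pulling free tabular data from a public web page into a Dataset; the URL and the part specification are illustrative assumptions, not taken from DatasetQueryWebScraping005.nb:

    (* pull the structured data from a public web page *)
    raw = Import[
      "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)",
      "Data"];

    (* inspect raw to locate the table of interest, then wrap it for Query-style access *)
    Dataset[raw[[2, 1]]]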

14 Replies

Looking forward to the 2nd live-coding session

Thank you, I enjoyed your session and am looking forward to the next one.

I only just tripped across your first presentation. Great, great stuff. Operators, associations and datasets have been awkward material for me, as I am "hardwired" to lists and traditional functional programming. Please continue at the level and pace you have set in the first presentation.

Thank you very much for your constructive feedback. At this point there is no planned end date for my live coding sessions, and I expect many more sessions, as the data scientist's progression

  • data sourcing / handling / filtering / aggregation
  • application (optimization, statistics, AI / NN / ML / DL, ...)
  • pure math <--> applied math

is a virtually endless paradise for the serious analyst / data scientist who can harness the appropriate tools to make inroads. And there is no other software system that is as highly integrated as the M system, so this is the place where I will demo how the M system can be applied to tackle real-world problems with concise and intuitive code.

After a few more sessions I'll prioritize the content based on audience feedback. There are many topics relevant to the professional data scientist, so I have to start balancing general appeal with audience requests:

parallelism in computation, compilation for speed-up, combining the two: CUDA, databases, AI, crypto, dynamic interactivity, JLink, the units framework, persistence, web, cloud, ... -- I won't run out of content soon. And after a while I think it will get more math-y: advanced regression, calculus of variations, control theory, region-based computing, differential geometry, ODEs, PDEs, ... all of which should be part of the professional data scientist's arsenal -- at least their very basics. We can't get too detailed, though, if the sessions are to stay general enough to be of interest to everyone.

To me, this is a very encouraging and exciting trajectory of presentations. I use Mathematica, awkwardly, as my go-to platform for data munging and analysis. Unfortunately, I do not have the adroit skills of the experts presenting for Mathematica, so I have to study their code carefully to understand what is being done well enough to master the techniques myself. At my age, I have to fight against the rigidity of my past ways of doing things in order to grok the more useful methods. I thank you greatly for taking the time to think through how to present the Wolfram Language in a meaningful, comprehensible and sequential way for an individual primarily interested in data acquisition and analysis. (A 71-year-old hobbyist programmer.)

Thank you very much for your comment. Yes, to live means to learn and to improve. When we don't learn, we bereave ourselves of opportunities to grow. Virtual Greetings!

Posted 2 months ago

Hi Andreas,

Looking forward to seeing future live coding sessions by you. Could you please also attach PDS001.nb? It is missing from the current list of attachments.

Thanks, Rohit

Sorry, I must have accidentally deleted it. It was there, I know that for sure.

Posted 1 month ago

Hi Andreas,

You have used XETRA data in your notebook "AssocDataset002.nb". Could you please give us the corresponding URL of the csv file at Deutsche Boerse?

By the way, I highly appreciate your live coding sessions and I am already waiting for the notebook of part 3.

Best regards, Jürgen

Posted 1 month ago

It's in the .nb; look at PDS001.nb, the first week's notebook. Here they are:

in AWS data registry: https://registry.opendata.aws/deutsche-boerse-pds/

documentation: https://github.com/Deutsche-Boerse/dbg-pds

so for example "https://s3.eu-central-1.amazonaws.com/deutsche-boerse-eurex-pds/2019-04-18/2019-04-18_BINS_XEUR14.csv" to get the 14:00 CET file for Apr 18 for Eurex.

same for XETRA, use https://s3.eu-central-1.amazonaws.com/deutsche-boerse-xetra-pds/2019-04-18/2019-04-18_BINS_XETR14.csv
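
For convenience, a small sketch that builds such a URL for a given date and hour and imports the file as a Dataset; the helper name xetraURL is hypothetical, and the naming pattern is taken from the URLs above:

    (* build the Xetra PDS file URL for a given date and hour, then import it *)
    xetraURL[date_DateObject, hour_Integer] :=
      With[{d = DateString[date, "ISODate"]},
        StringJoin[
          "https://s3.eu-central-1.amazonaws.com/deutsche-boerse-xetra-pds/",
          d, "/", d, "_BINS_XETR", IntegerString[hour, 10, 2], ".csv"]]

    ds = Import[xetraURL[DateObject[{2019, 4, 18}], 14], "Dataset", HeaderLines -> 1];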

Uploading the file for part 3 now. I tend to wait until the YouTube video is ready, which usually takes a few days; I didn't see it live until yesterday evening.

Bye for now, Andreas

Posted 22 days ago

Hi Andreas,

Thank you for presenting Part 4. Could you please attach the associated notebook?

Posted 14 days ago

Dear Andreas,

Many thanks for your livecoding sessions. Often, I am not able to join them live, but I visit this post and the videos on a regular basis.

Are you planning a live coding session about how to manage large datasets (size near or exceeding the RAM of your computer)?

I am thinking of data streaming, processing large datasets, and saving large datasets (from smaller chunks) in different formats for data exchange with systems other than Mathematica, etc.

I am looking forward to learning from your next livecoding session.

Kind Regards,

Dave

I can present on this, but I try to have my sessions largely driven by audience requests. So far you are the only one to ask about this, and I have reasons not to present on it too soon (see point 1 below). However, I have some general comments:

  1. There is a piece of technology in the works for out-of-core processing. A very senior WRI programmer is working on it, and it's not finished yet. I'd rather wait until it is complete and then showcase that piece of beauty, instead of presenting something that would be even better once that future built-in technology is usable. At this point, I think we should simply wait. This guy never writes bad code. Just wait. In principle, you can do out-of-core processing on your own already; https://www.wolfram.com/language/11/neural-networks/out-of-core-image-classification.html?product=language is an example for image classification. But that is specific, not generic, and I prefer generic over specific (ability is more valuable than knowledge, one of my philosophies).
  2. I ardently support the philosophy that data that isn't needed by the kernel shouldn't be in the kernel in the first place. Think of it as "kernel memory is precious" (it does actually consume extra memory). Don't ever handle data that you don't need. With that said, I'm a firm believer in pre-processing / filtering / extracting the salient data (from the opposite perspective: discarding data you won't need) before loading it into the kernel. It's no different from the database retrieval we're all familiar with: you submit a query to receive only what you want! On this matter I posted a reply on m.se some 6.5 years ago: https://mathematica.stackexchange.com/questions/16048/how-do-you-deal-with-very-large-datasets-in-mathematica/16060#16060. I strongly recommend using Linux tools to pre-extract the salient data; see the sketch after this list. Depending on your data situation, and on how much smart pre-processing can reduce what the kernel has to ingest, try to shoot for 90% or 95% or more of the data being discardable up front. Oftentimes the remainder will fit into the memory available to the kernel just fine. If you're on Windows, look at cygwin or MobaXterm, two wonderful Linux tool environments available there (note: MSFT announced they'll support some Linux distros in the future, with no date announced as of yet, and I wouldn't take the first few versions, as MSFT likes to botch things up, but eventually I venture the guess that this will be of good quality). Also, Windows PowerShell may be a decent vehicle for data pre-processing available right now. I believe it is, but how would I know, given that I only use Fedora? So that is an avenue I recommend walking, regardless of item 1. I think you should always do item 2, and in the future we'll have item 1 on top of it. But still, do item 2 anyway. Never skip pre-extracting the salient data on the command line.
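
Here is the sketch referred to in item 2: a minimal illustration of letting a command-line tool cut a file down before the kernel ever sees it; the file name huge-trades.csv, the column number, and the filter value are hypothetical:

    (* pre-filter on the command line, then import only the reduced result *)
    filtered = RunProcess[
      {"awk", "-F,", "$3 == \"DAI\"", "huge-trades.csv"},
      "StandardOutput"];

    ds = ImportString[filtered, "CSV"] // Dataset;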