#### 2019-SEP-05 Nonlinear models with R, by Chel
A nonlinear model is an important tool for describing complex observations in medical and bioengineering research that are not adequately explained by a linear model. The first session (40 mins) provides an introduction to the use of nonlinear models with R, covering: 1. choosing good candidate nonlinear models, 2. estimating parameters (and choosing starting values), 3. checking model assumptions, and 4. summarizing results from the model. In the second session (30 mins), key R functions are introduced and outputs are visualized step by step using the nlme package. Extension to data with repeated measurements will be discussed if time permits.
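As a flavour of the workflow the first session covers, here is a minimal sketch using simulated data and a hand-specified logistic curve; the data, starting values, and variable names are illustrative, not from the talk.

```r
# Minimal sketch: fitting a nonlinear (logistic) model with nls() in base R.
# The data are simulated; starting values and variable names are illustrative.
set.seed(1)
dat <- data.frame(x = seq(0, 10, length.out = 60))
dat$y <- 8 / (1 + exp((5 - dat$x) / 1.2)) + rnorm(60, sd = 0.4)

# 1. candidate model: three-parameter logistic
# 2. parameter estimation with explicit starting values
fit <- nls(y ~ Asym / (1 + exp((xmid - x) / scal)),
           data  = dat,
           start = list(Asym = 10, xmid = 4, scal = 1))

# 3. check model assumptions (roughly): residuals vs fitted values
plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)

# 4. summarize results
summary(fit)

# For repeated measurements, nlme::nlme() extends this with random effects.
```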
Location: Cenovus Odd Fellows
Talk 1: Burning out of Time: Power-Plant Decommissions and Mine Closures in the Appalachians, by Reinaldo Viccini
For the past 50 years, coal has been the most important fuel for electricity generation in the United States. The power sector on its own consumes 90% of all domestically mined coal and supplies 30% of the nation’s energy demand. Facing serious challenges from low natural gas prices, environmental regulation and increasing operational costs, coal-fired power plants are being decommissioned at increasing rates. Similarly, since 2010, coal production has fallen by 28.5% and nearly half of the existing mines have permanently closed. The purpose of this paper is to investigate the claim that power plant decommissions are causing closures and production declines in Appalachian mines. If decommissions are affecting mines at all, then given the regional nature of coal markets, nearby mines should be the most affected. The empirical strategy begins by testing the assumption that power plants buy local coal. I argue that observed differences in power plant efficiencies, by state and plant size, emerge as a consequence of local coal quality. Next, I estimate how aggregate electricity generation and coal consumption at nearby power plants relate to a mine’s production (intensive margin) and probability of closure (extensive margin). Results suggest that changes in aggregate electricity generation and coal consumption at the 10 nearest plants have a larger impact on a mine’s intensive and extensive margins than plants in more distant locations. Finally, using an event-study methodology, I estimate the causal effect of a decommission within the 10 nearest plants to the mine.
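For readers curious what an event-study regression of this kind might look like in R, here is a hypothetical sketch; the package choice (fixest) and every variable name (production, rel_year, mine_id, year) are assumptions for illustration, not taken from the paper.

```r
# Hypothetical event-study sketch around a decommission date, on simulated panel data.
library(fixest)

set.seed(2)
df <- expand.grid(mine_id = 1:100, year = 2008:2018)
df$event_year <- 2013 + (df$mine_id %% 3)                   # staggered decommission years
df$rel_year   <- pmin(pmax(df$year - df$event_year, -4), 4) # years relative to the event
df$production <- exp(2 - 0.1 * pmax(df$rel_year, 0) + rnorm(nrow(df), sd = 0.2))

# Dynamic effects relative to the year before the decommission,
# with mine and year fixed effects and clustering by mine
est <- feols(log(production) ~ i(rel_year, ref = -1) | mine_id + year,
             data = df, cluster = ~ mine_id)
iplot(est)
```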
Talk 2: Using unsupervised machine learning to tag oil and gas pressure drop methods used in commercial flow simulators, by Pablo Adames
Oil and gas engineers rely on flow simulators to design and troubleshoot pipelines, wells, and the facilities needed to achieve production and operation targets within safety and economic constraints. Commercial simulators offer a wide choice of calculation methods for the pressure drop and liquid holdup at every point of the system; for historical reasons these are usually referred to as flow correlations. The numerical results of the simulations can vary significantly as a function of the flow method selected, leaving everything else constant. Unsupervised classification methods can be used to discover similarities in the results of the available flow correlations. Once the methods are tagged as belonging to a class based on the similarity of their results, a priori knowledge can be used to assign meaning and to recommend the more consistent classes of methods for a particular production scenario. For this study, Schlumberger’s PIPESIM was used to assess 35 different methods on a model built from field data in the public domain; a metric was defined to assess similarity, the machine learning results were compared to the empirical knowledge, and consistent results were identified for this specific production scenario. The processing of the text files from the simulator and the subsequent statistical analysis and visualizations were done in R, and the code is presented in reproducible research format using the knitr package and RStudio. The data and files are also available on GitHub.
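To make the idea concrete, here is a hedged sketch of grouping flow correlations by the similarity of their predicted pressure-drop profiles with hierarchical clustering; the matrix below stands in for parsed simulator output, and the method names, depths, and values are made up.

```r
# Group flow correlations by similarity of their predicted pressure profiles.
set.seed(3)
depths  <- seq(0, 3000, by = 100)                 # measured depth, m (illustrative)
methods <- paste0("correlation_", 1:10)
base    <- 50 + 0.08 * depths
profiles <- sapply(seq_along(methods), function(i) {
  base * runif(1, 0.9, 1.1) + rnorm(length(depths), sd = 5)
})
colnames(profiles) <- methods

# Similarity metric: Euclidean distance between pressure profiles
d  <- dist(t(profiles))
hc <- hclust(d, method = "ward.D2")
plot(hc, main = "Flow correlations grouped by predicted pressure drop")

# Cut into classes, then use a priori knowledge to label and recommend classes
cutree(hc, k = 3)
```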
The meeting starts by Dr. Catherine Eastwood and Dr. Muir will give a talk. Machine Learning techniques, while powerful, are very non-linear. Configuring the problem, choosing an appropriate algorithm and reaching an optimal solution makes this a complex task. Using an example of the calculation of NMR coupling constants, he will discuss the landscape of feature engineering and hyperparameter tuning, and maybe a bit about explainable models. You may bring your own laptop to run the code in the session. R and Rstudio should be installed. This talk is running approximately 90 minutes.
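If you want to warm up before the session, here is a hedged sketch of a cross-validated hyperparameter grid search with the caret package; the simulated data stand in for the NMR coupling-constant example, and the model (random forest) and grid values are assumptions, not the speaker's code.

```r
# Hedged sketch of hyperparameter tuning via grid search + cross-validation.
library(caret)
set.seed(4)
n <- 200
train_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
train_df$y <- sin(train_df$x1) + train_df$x2^2 + rnorm(n, sd = 0.1)

ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation
grid <- expand.grid(mtry = c(1, 2, 3))            # hyperparameter grid to search

fit <- train(y ~ ., data = train_df,
             method    = "rf",
             trControl = ctrl,
             tuneGrid  = grid)
fit$bestTune   # hyperparameter chosen by cross-validated error
```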
Location: N231, the second floor of the North Building, Bow Valley College
Talk 1: Cybera’s tools and programs in support of data science in Alberta, by David Chan (confirmed)
An overview will be provided of the tools and projects that Cybera - an Alberta-based not-for-profit organization - makes available to folks interested in data science, including members of the CalgaryR group. Tools include our free general-purpose Infrastructure-as-a-Service environment, which also provides access to GPU resources. Recent results and plans to support K-12 education, entrepreneurs, and budding data scientists in Alberta through our Callysto and Data Science for Albertans projects will also be presented. Lastly, we seek input from CalgaryR members on a tool the team has been working on, which strives to apply best practices from the software development world to data science projects through a structured framework. This will include a brief demo and an overview of a roadmap of challenges we hope to address with the tool.
Talk 2: Practical Applications of Text Analysis / Natural Language Processing by Naoko Tomioka
The field of text analysis is constantly evolving, and new tools and algorithms become available on a regular basis. Text analytic tools and packages range from low-level processing, such as tokenization and entity extraction, to higher-level processing such as sentiment analysis and text classification. The applicability of each tool depends on the type of data to be analysed and the intended use of the output. I will talk about several of my own projects to illustrate the process of selecting the best tools, and what text analysis looks like in a business context.
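As a small taste of the low-level processing the talk mentions, here is a generic sketch using the tidytext package; the example sentences are invented and the pipeline is not from the speaker's projects.

```r
# Hedged sketch of basic text processing: tokenization, stop-word removal, term counts.
library(dplyr)
library(tidytext)

docs <- tibble(
  doc  = 1:3,
  text = c("Service was quick and the staff were friendly.",
           "The invoice amount did not match the contract.",
           "Shipping delays caused the project to slip again.")
)

tokens <- docs %>%
  unnest_tokens(word, text) %>%          # tokenization
  anti_join(stop_words, by = "word")     # drop common stop words

tokens %>% count(word, sort = TRUE)      # simple term frequencies across the corpus
```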
Talk 3: (Cancelled) Community Development - Calgary Artificial Intelligence Meetup, by Drew Gillson
Drew Gillson is a technologist, entrepreneur, and community leader. In addition to organizing the Calgary Artificial Intelligence Meetup, Drew works for Looker, a data analytics software company that is being acquired by Google. Drew has been an active member of the Calgary innovation ecosystem since the dark ages of 2001. He’ll share some highlights and learnings from his remarkable journey, in the hope that it will inspire you to also take the road less traveled.
This presentation will begin with a short introduction to functional programming concepts, how they differ from imperative programming, and why they are important. This will be followed by a discussion of language features in R that facilitate the use of functional techniques, as well as some deficiencies. Examples will be given, and seasoned R users may realize that they have been using some of these techniques all along. Finally, functional alternatives to R will be mentioned.
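For a sense of the kind of idioms involved, here is a small sketch of functional constructs available in base R; the specific examples are illustrative and not necessarily those the presentation will use.

```r
# Functional idioms in base R: higher-order functions and closures.
xs <- list(1:5, 6:10, 11:15)

means <- vapply(xs, mean, numeric(1))       # apply a function over a list, no loop
Map(function(a, b) a + b, 1:3, 4:6)         # element-wise combination of two vectors
Reduce(`+`, xs)                             # fold a list of vectors into one value
Filter(function(v) mean(v) > 5, xs)         # keep elements satisfying a predicate

# Closures: a function factory capturing its environment
make_scaler <- function(k) function(x) k * x
double <- make_scaler(2)
double(10)
```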
Location: Cenovus Odd Fellows Building
Talk 1: Rig State Detection by David Shakleton
Over the past two or three years, Independent Data Services (IDS) have shifted from ‘proof of concept’ projects to global live rollouts of their “in-time” (near real-time) lean automated reporting (LAR) and drilling performance monitoring (DPM) services. Some of our projects begin their lives in R, as RStudio and Shiny apps lend themselves beautifully to the rapid prototyping of ideas, ingesting large amounts of data and performing analytics accessed through an easily adjustable user interface (UI). The first step to realizing these automated reporting and analytics services in the upstream oil & gas industry is ‘rig state detection’ (RSD) - a process where data from key sensors on the rig is run through logic-based and/or machine learning algorithms to determine what the rig, drill string, etc. are doing. Rig states are then used to build activity descriptions for daily reporting and to generate charts and dashboards for key drilling parameters, with web-based charts and dashboards available from anywhere, on any device, within moments of the event. IDS have made further strides to automate daily operational reporting by automatically ingesting PDF/Excel/WITSML/etc. data to auto-populate much of the daily report.
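To illustrate what a logic-based rig state detector can look like in miniature, here is a hypothetical sketch; the sensor column names, units, and thresholds are invented for illustration and are not IDS’s actual rules.

```r
# Hypothetical rule-based rig state detection from a few sensor channels.
library(dplyr)

sensors <- tibble::tibble(
  hook_load   = c(180, 185, 40, 42, 181),       # klbs (illustrative)
  rotary_rpm  = c(60, 0, 0, 0, 55),
  flow_rate   = c(500, 480, 0, 0, 510),         # gpm
  block_speed = c(-0.2, -0.1, 0.5, 0.6, -0.3)   # m/s, negative = lowering
)

rig_states <- sensors %>%
  mutate(state = case_when(
    rotary_rpm > 0  & flow_rate > 0 & hook_load > 100 ~ "rotary drilling",
    rotary_rpm == 0 & flow_rate > 0 & hook_load > 100 ~ "slide drilling",
    flow_rate == 0  & block_speed > 0                 ~ "tripping out",
    TRUE                                              ~ "other"
  ))
rig_states
```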
Talk 2: Model Agnostic Approach by Alastair Muir
The real value of machine learning to businesses comes when it is used to create a deep understanding of the problem. The predictive power of modern machine learning algorithms comes at the cost of decreased transparency, which is why black-box solutions, however accurate, are not immediately accepted on their own. You should come away from this session with a toolkit you can use to probe and understand your models, whether for regulatory requirements, change-management resistance, or just general acceptance. We will use a typical example of a non-linear process modelled with different traditional statistical and deep learning models - hence the “model agnostic” approach.
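One flavour of model-agnostic probing is permutation importance, which only needs a generic predict() interface and therefore works for any fitted model. The sketch below is an illustration on simulated data, not the speaker's toolkit.

```r
# Model-agnostic permutation importance: how much does the error grow
# when one feature's values are shuffled? Works for any model with predict().
set.seed(5)
n  <- 300
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
df$y <- sin(2 * pi * df$x1) + 2 * df$x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ poly(x1, 3) + x2 + x3, data = df)   # stand-in for any black box

perm_importance <- function(model, data, y, features) {
  base_err <- mean((predict(model, data) - y)^2)
  sapply(features, function(f) {
    shuffled <- data
    shuffled[[f]] <- sample(shuffled[[f]])
    mean((predict(model, shuffled) - y)^2) - base_err   # increase in error
  })
}

perm_importance(fit, df, df$y, c("x1", "x2", "x3"))
```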
Talk 3: Asset Failure Susceptibility Ranking, using LambdaMART, by Busayo Akinloye
The electric distribution system is one of the most diverse systems in the electrical grid, consisting of both overhead and underground assets. Growing power quality and reliability expectations from regulatory authorities and customers demand minimal equipment downtime. Metrics such as the System Average Interruption Frequency Index (SAIFI) and the System Average Interruption Duration Index (SAIDI) are closely monitored by electric utilities and form a major part of the business’ performance indices. These growing expectations, coupled with aging assets and budget constraints, require innovative and cost-effective ways to realize actionable intelligence in order to optimize spending while improving or maintaining the quality and reliability of the electric grid. Data analysis offers a solution that is reproducible across all asset infrastructure of an electric grid: it employs machine learning and statistical algorithms to extract actionable insights from historical data. These insights will help utilities better allocate both financial and human resources to the most failure-susceptible assets, truly making data-driven decisions. I will discuss the development of an asset failure susceptibility ranking system for the Calgary-area Underground Residential Distribution (URD) system. This system employs the supervised ranking approach used by information retrieval systems, and the framework can be applied to all distribution system assets (equipment) because of the reproducible nature of the statistical algorithms it employs.
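For context on the ranking machinery, here is a hedged sketch of a learning-to-rank model in the LambdaMART family using xgboost's pairwise ranking objective; the features, query groups, and labels are simulated stand-ins for asset condition data, not the utility's actual model.

```r
# Hedged sketch of supervised ranking (LambdaMART-style) with xgboost in R.
library(xgboost)
set.seed(6)

n_assets <- 120
X <- matrix(rnorm(n_assets * 4), ncol = 4,
            dimnames = list(NULL, c("age", "loading", "soil_moisture", "fault_history")))
relevance <- as.integer(X[, "age"] + X[, "fault_history"] + rnorm(n_assets) > 1)

dtrain <- xgb.DMatrix(X, label = relevance)
setinfo(dtrain, "group", rep(30, 4))   # 4 query groups (e.g. feeder areas) of 30 assets each

bst <- xgb.train(params = list(objective = "rank:pairwise", eta = 0.1, max_depth = 3),
                 data = dtrain, nrounds = 50)

# Higher scores indicate assets ranked as more failure-susceptible
scores <- predict(bst, X)
head(order(scores, decreasing = TRUE))
```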