class: middle

# Programming in MATLAB
### https://github.com/gcapes/matlab-programming
### Initial tasks:
1. **Please sit near the front of the room**
2. Log into a PC
3. Start MATLAB
4. Open the course materials using the URL above
   - They contain a link to this slideshow

---

# Research-related IT Services
Described on [IT Services website](http://www.itservices.manchester.ac.uk/research/).
Announcements given via [Research IT blog](https://researchitnews.org/).

- [Training courses] teaching computing skills for Research
- General guidance and advice about research software
- Access to specialist support and consultancy e.g. code reviews
- Access to HPC systems
- Data storage and management
- [Full list of services on offer](http://www.itservices.manchester.ac.uk/research/services/)

For help and support use the [Support Centre]

[Training courses]: http://www.staffnet.manchester.ac.uk/staff-learning-and-development/academicandresearch/practical-skills-and-knowledge/it-skills/research-computing/research-courses/
[Support Centre]: https://supportcentre.manchester.ac.uk/ServiceDesk.WebAccess/ss/search/search.rails?search_type=ServiceCatalogue&taxonomy_path=Service%20Catalogue%5CResearch%20IT%20Services%5C&requesting_user=ffcd3f4f-4fe6-4e27-9638-822a233cb30e&s=

---

# Housekeeping
- Fire exit
- Toilets
- Course timing
- Feedback
  - Please complete the feedback [form] before you leave
- Please ask questions
- Attendance is recorded using responses to the feedback form

[form]: https://docs.google.com/forms/d/e/1FAIpQLSdfpd8QuG9SPAehY5PBJ7AQdbH_eQcDL0UNbS2Oqs6960BTww/viewform?c=0&w=1

---

# Teaching methods
- This course is largely interactive
  - Type along with the worked examples
- Sticky notes
  - Used for getting help and giving real-time feedback
  - Green = OK; Red = not OK (too fast, didn't understand, computer says no etc)
- Course notes
  - Most of what I type is in the course notes
  - Slides will remain online after the course

---

# Agenda
1. Recap of [online lessons](https://app.manchester.ac.uk/rmatlabint)
   - Number types and automatic casting
   - Scripts and Functions
   - Flow control
2. Scripts and functions (worked examples)
   - Loading data into MATLAB
   - Processing data in the workspace
     - Extract sub-sets of data, basic statistics, filtering bad data points
   - Basic plotting
3. Debugging
4. Good programming practice
5. Efficiency
6. Tips and common problems

---
class: center, middle

# 1. Recap

---

## MATLAB GUI
- Command window
  - Type commands for immediate execution
  - Variables created persist in workspace after commands finish
- Workspace
  - Contains variables stored in memory
- Editor
  - Write commands for execution later on
  - Save files with `.m` extension and run using the name of the file e.g.

```
>> myscript     % runs the file called myscript.m
```

---

## Number types
- Computers represent numbers using [binary numbers](https://en.wikipedia.org/wiki/Binary_number), most frequently using floating point numbers
- Floating point numbers are an approximation to real numbers, and not all numbers can be represented exactly (c.f. 1/3 in decimal)
  - [What every scientist should know about floating point arithmetic](http://floating-point-gui.de/)
- Default data type in MATLAB is double (i.e. double precision = 64 bits: approx 16 significant decimal digits)
- Be aware of the maximum and minimum values that can be stored by a given number type

```
intmax('uint16')    % max value for an unsigned 16-bit integer
realmin('single')   % smallest positive normalised single precision value
eps                 % floating point relative accuracy
```
- MATLAB performs automatic type casting (without warning)

```
intmax('uint8')     % max value that uint8 can store = 2^8 - 1
a = uint8(300)      % a = 255 (300 saturates at the max value)
b = 2.3 + 1e6 + a   % b = 255 (b is uint8)
```

---

## Variables
- Used to store values
- Created implicitly

```
a = 23      % Assign value 23 to a variable, a.
            % If 'a' doesn't already exist it will be created.
            % If 'a' already exists, it will be overwritten.
class(a)    % Returns class of variable 'a'.
```
- Numeric variables are of `class` double by default
  - must explicitly create other numeric types if required
- MATLAB doesn't prevent you from reusing the names of built-in functions
  - Can 'overwrite' MATLAB function names very easily

```
intmax('uint8')             % Returns max value of an unsigned 8-bit integer
intmax = intmax('uint8')    % Create variable 'intmax'
                            % This blocks access to the 'intmax' function
intmax('uint8')             % Now returns an error
```
- Names must start with a letter: check a string using `isvarname`
- Choose descriptive names which don't mask built-in functions
  - use `exist` to check
- Demo

---

## Data types
- Standard array variables must contain only one type of data
  - e.g. a matrix of doubles, a vector of strings, a vector of logicals etc
- Attempting to assign a different type to an existing variable can give confusing results (from automatic type casting)

```
>> a = ones(5,1)
>> a(3) = 'x'
```
- Use cell arrays to store a mix of data types in one container

```
>> myCell = {'hello', 2e8, true; [3,2,1], rand(5),[false,true]}
myCell =
  2×3 cell array
    'hello'         [   200000000]    [            1]
    [1×3 double]    [5×5 double]      [1×2 logical]
```
- Group related variables into one 'container' using structures or tables
  - these can be much neater than individual variables
  - depends on the structure of your data
- See [data types] and [MATLAB classes] for guidance on usage of different data types

[data types]: https://uk.mathworks.com/help/matlab/data-types_data-types.html
[MATLAB classes]: https://uk.mathworks.com/help/matlab/matlab_prog/fundamental-matlab-classes.html

---

## Scripts and functions
- **Scripts** are like automating the command window
  - Have access to variables in the workspace
  - Variables persist after script execution
- **Functions** have their own scope
  - Cannot directly access variables in the workspace or other functions
  - Can only use variables from the workspace if passed in as arguments
  -
    Variables created in a function do not persist after execution (unless returned as output arguments)
- Run a script or function using the **name of the file** in the command window (or another script/function)
  - To avoid confusion, ensure the filename matches the function name
  - A `.m` file can only be run if it is on the MATLAB `path` (list of directories to search)
- Both scripts and functions are very helpful in terms of reproducibility
- Demo

---

## Flow control (making decisions)
- Choose different paths through your code using conditional statements
  - Only the branch belonging to the first true expression will run

```
if x > 0.5
    disp('x is greater than 0.5')
elseif x > 0.9    % This cannot be the first true expression, so will never run.
    disp('x is greater than 0.9')
else
    disp('Neither of the conditions above was met')
end
```
--
- Repeat commands using loops
  - `for` loops are generally used when you know how many iterations are required
  - `while` loops continue until an expression is false

```
for i = 1:10    % We know in advance that we want 10 iterations
    disp(i^2)
end

while k > 0     % We don't know how many iterations we need
    k = k / 21
end
```

---
class: center, middle

# 2. Scripts and functions
## (Worked examples)

---

## Disclaimer
- I have tried to follow good practice in the examples shown, however
  - there are probably other valid approaches
  - there may be better methods than those shown
- Which approach makes sense depends largely on personal preference
  - your brain (probably) works differently to mine
  - what makes sense to me may not make sense to you
- The aim of the course is to get you started by teaching general principles

---

## Download files for worked examples
- Download [zip file](matlab-course-files.zip)
  - URL for these slides: https://gcapes.github.io/matlab-programming/
- Extract zip file (don't just double click on it)
- Change your current folder in MATLAB to the **matlab-programming** folder you just extracted

```
>> ls -F
code/  data/  debug/
```
- Put up your green sticky note when done

---

## Import CO<sub>2</sub> data from text file
- Inspect the CO<sub>2</sub> [data file] (`data/co2_weekly_mlo.txt`) with a text editor (e.g. notepad++) and note the format of the data (multiple-space delimited)
- Load the file into MATLAB using the `Import Data` tool to select:
  - column delimiters
  - treatment of repeated delimiters
  - starting row
  - specific columns
  - data format (use column vectors)
  - variable names
- Tips:
  - use `exist` and `isvarname` to help choose variable names
  - import window remains open
  - generate a script using the `Import Data` tool and save it in your `code` directory for future use
    - [Example of final script](code/co2_load.m)
- Use the `datetime` function to create a time series
- Clear all variables and reload the data using your script

(Data from NOAA: [CO2], [temperature], [metadata for temperature][metadata])

[data file]: data/co2_weekly_mlo.txt
[CO2]: https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html
[temperature]: https://www.esrl.noaa.gov/gmd/dv/data/?parameter_name=Meteorology&site=MLO&category=Meteorology&frequency=Hourly%2BAverages
[metadata]: ftp://aftp.cmdl.noaa.gov/data/meteorology/in-situ/README

---

## Plotting demo and guidelines
Let's `plot` the loaded data.

Give some thought to the presentation of your data: make it self-explanatory.
- Give your figure a `title`
- Label your axes using `xlabel` and `ylabel`, specifying units
- Add a `legend` to label the traces
- Set appropriate axis limits using `xlim` and `ylim`: round to nice values
- By default the plot will not update if the data changes
  - use **Link Plot** button
- Control figure properties using
  - **Edit plot** (select) tool, then figure context menu
  - **Show property editor**
  - Graphics handle (e.g. the current axes)

```
ax = gca;
ax.YGrid = 'on';
```

---

## Process CO<sub>2</sub> data
- Remove bad data values using:
  - the command line
  - a [script]
  - a function
- This is one way to build up functions i.e.
  1. test ideas in the command window
  2. save previously executed command(s) in a script
  3. rewrite the script as a function (i.e. generalise the code for greater flexibility)
     - Pass in variables
     - Pass out results
     - Comment your code (and ensure it works with `lookfor` and `help`)
- Demo using CO<sub>2</sub> data
  - Final function: [badtonan.m]
  - Final script to automate loading and processing: [co2_loadandprocess.m]
- Save workspace variables to disk

```
>> save('processedData')
```

[script]: code/co2_clean.m
[badtonan.m]: code/badtonan.m
[co2_loadandprocess.m]: code/co2_loadandprocess.m

---

## A method for writing functions
- Write pseudo-code to split your ideas into discrete tasks
  - These can form the basis of your comments
  - Draw a flow chart if it helps
- Start with a simplified task and progressively build up complexity
  - Test ideas in the command window (or a script) first
- Split separate tasks into their own functions
  - Makes functions more reusable
  - Easier to write and debug
- Try to choose variable names which describe their meaning
  - Easier to understand
- Ensure the final product has comments describing the purpose of the function, and an example of how to use it
  - This is a courtesy to your future self, as well as other users
- Pay attention to the coloured flags on the right edge of the editor window
  - Read the suggestions to improve the efficiency of your code

---

## Example functions (demo/exercise)
1. Load a folder of data files (`data/met_mlo_insitu_1_obop_hour_*.txt`) into MATLAB, creating one time series per variable (rather than one per file)
2. Write a script to automate (1.) above, and clean the temperature data (remove bad data points)
3. Create (monthly) averages for comparison with other data sets
   - Check for overlap of data sets
4. Calculate basic statistics for subsets of data to look at monthly / yearly variability
5. Automate the whole workflow using a script
6. Import screenshot from paper, plot your data on top
   - clean up original image before adding your data

---

### 1. Load a directory of data files
### Pseudo-code
- loop through files in directory
  - load data using separate function
    - generate data loader function from `import data` tool
    - modify import data function to automatically generate variable names
- combine data from files into one array

---

### Order of tasks
1. generate function to load a single data file into column vectors using `import data` tool
   - i.e. start with a simplified task
   - refer to `data/met_README` for explanation of the data format
2. edit the file loader function (1.) to automatically generate variable names
   - use a structure (see next slide)
3. loop through files in directory
   - initially just print file names to test loop works correctly
   - load data by calling the separate file loader function
4. combine data from files into one array
5. ensure the code is adequately commented

---

### Data structures
- MATLAB treats numeric data and strings as matrices
  - e.g. `size('test'), size(int8(22))`
- These can be grouped using data structures
  - Named fields containing values

```
>> myStruct.name='My first structure';
>> myStruct.data=ones(10)
myStruct =
    name: 'My first structure'
    data: [10x10 double]
```
- Structure field names can be dynamically assigned using strings

```
myStruct.('demo') = 'Like this!'
```

---

### Hints
- Can't dynamically assign variable names in MATLAB
  - Modify function to output a structure
  - Use one row in structure per file
- Output argument overwrites variables in workspace
  - Pass in variable to function, modify, then pass it back out
- Can only pass in a variable which exists (c.f. variable assignment)
  - Create empty variable before first function call

### Solutions (1 - 3)
- [Function](code/met_importfile.m) to load a single met file
- [Function](code/met_loadfolder.m) to load a directory of files

---

### 2. Pseudo-code to process data
- Combine structure rows to get one array per field
  - get list of field names
  - loop over fields, combining data
- Remove bad data points using previously written function

### Solutions
- [Function](code/combinestructcols.m) to combine columns in a structure array
- [Script](code/met_loadandprocess.m) to automate the tasks so far:
  - load directory of files
  - combine columns
  - remove bad data points
  - create a `datetime` variable type from double type variables *year, month, day, hour*

---

### 3. Create monthly averages
Average (temperature and CO<sub>2</sub>) data over user-specified intervals for comparison with other data sets
- write a function to find overlap between time series
- create `datetime` arrays for the required averaging intervals
- write a function to average data onto new time base, ignoring NaN values
- visually confirm averaging has been done correctly
- scatter plot averaged data against CO2 data

### Solutions
- [Function](code/find_overlap.m) to find time series overlap
- [Function](code/ts_average.m) to average data
- [Script](code/monthly_averages.m) to run the above functions and plot the data

---

### 4. Basic stats comparing variability
1. Monthly stats
   - Which month has the highest/lowest average CO<sub>2</sub> concentration?
   - Which month has the greatest variability?
2. Yearly stats
   - Which year has the highest/lowest average CO<sub>2</sub> concentration?
   - Which year has the greatest variability?
3. Plots
   - Plot the monthly / yearly mean concentrations.

[Solution](code/co2_stats.m)

---

### 5. Automate whole workflow with a script
[Solution](code/processalldata.m)

---

### 6. Import screenshot; plot your data on top
- You don't always have access to other people's data, but you can still compare your results
  - Take a screenshot and load the image into MATLAB
  - Clean up the image, removing clutter
  - Plot your data on top of the image

[demo](code/LoadMloCo2Image.m)

---
class: center, middle

# 3. Debugging

---

## Approaches
- Know what output you are expecting
  - Not always straightforward with scientific code
  - Use simplified data
  - Compare actual output against correct output (e.g. manual calculations, previously published material, laws of conservation, expert opinion etc)

--
- Debug explicitly (don't try to hold it all in your head)
  - Make and test hypotheses as to what is wrong
  - Keep a log of input, expected output, and actual output as you make changes to your code
  - Refine hypotheses based on output from testing

---

## Approaches (continued)
- Explain the problem (possibly to someone else):
  - Write down what you're trying to do
  - Consider commenting each line
    - Does the code do what your comment says it should?
  - What are the steps to reproduce the problem?
- Check for user error
  - Is your input valid?
  - Are you using the code correctly?
- Create a minimum working example (MWE)
  - Helps to isolate the error
  - Easier to debug
  - Gets a quicker response if you're asking online
- Very often the process of explaining the problem leads you to the error

---

## Techniques
- Evaluate small sections of code at a time (one function, one line, one command)
  - Helps with testing your hypotheses
- Using intermediate variables in loops can sometimes help debugging
- Automate testing of hypotheses using assertions
  - `assert` declares that some condition should be true at a certain point in the program
  - halts execution if false
- Revert to previous, working version using [Version Control](https://app.manchester.ac.uk/rgit)
- After fixing, consider adding/keeping assertions to prevent recurrence

---

## Features of the debugger
- Breakpoints (click on the `-` next to the line number to get a red dot)
  - code will pause execution at this point
  - can then run code
    - one line at a time at current level (step)
    - one line at a time, entering sub-functions (step in)
    - exit out of a function into parent (step out)
    - until cursor position (run to cursor)
    - to the end, or next break point (continue)
  - while paused, you can check
    - values, size, and class of variables
    - logic of conditionals
- Conditional breakpoints (as above, but only halt code when condition is met)
- Hover mouse cursor over variables
  - size, class and values are shown
- Evaluate expressions at the debugger command prompt `K>>`
- Open (view) variables in the workspace
- Plot variables (link the plot) for visual debugging
- Override value of variables in scripts and functions

---

## Example 1: Data "clipping"
### Aim
We want a script to find the average, sum, number of non-zero cells, and max value of each imported matrix.
Then create a four column table showing these results.

### The problem
When the last step creates the matrix from the elements, any value above 65535 becomes 65535 (i.e. most of the sum and nnz values).
How can we make it build the array without it changing these numbers?

You should already have the script in: `debug/clipping.m`

---

## Exercise (5 mins)
- Try to debug the script
- There are hints on the next slide if you get stuck
- Tutor will then demo the process

---

## Example 1: worked solution
- View matrix `A` to familiarise yourself with the problem
- Set a break point and check the value, class, and size of variables before assignment of summary matrix
- Check assignment of first row using the debugger command prompt `K>>`
  - Check class of the result
  - What happens if you construct a matrix using the first two values? Three? Four?
- Hint: view recap slide on number types, and automatic typecasting

[Solution](debug/clipping_solution.m) (including detailed explanation)

---

## Example 2: bubble sort
### Aim
Sort input array using the [bubble sort](https://en.wikipedia.org/wiki/Bubble_sort) technique.

### The problem
The last data point doesn't get sorted.

The bubblesort function: `debug/bubblesort.m`

---

## Example 2: worked solution
- Run the function once to view the problem
- Plot a graph of the output variable (the "sorted" data)
  - Link the plot so that it updates when the data updates
- Set a break point and step through the code, watching how it updates
- Explain what each line of the code should do in a comment
  - Hence identify which line doesn't do what it should do

[Solution](debug/bubblesort_solution.m)

---
class: center, middle

# 4. Good programming practice

---

## Top 5 good practice tips
1. Comment your code
   - H1 line
   - Provide example of usage
2. Choose meaningful names for variables and functions
   - names which describe their purpose help to explain the code
3. Use version control
   - So you can undo mistakes and get back previous versions
   - [Register on the training course](http://app.manchester.ac.uk/rgit)
4. Format your code consistently
   - Use indentation for code blocks: use `smart indent`
   - Easier to read = easier to debug
   - Follow established conventions e.g. [clean and fast]
5. Program defensively
   - Expect mistakes and guard against them
   - Use assertions
   - Write tests for your functions to ensure they (continue to) work correctly

[clean and fast]: http://uk.mathworks.com/matlabcentral/fileexchange/22943-guidelines-for-writing-clean-and-fast-code-in-matlab

---

## Testing
- Done by means of a separate script which runs your code
  - Checks that correct output is given for various inputs
- Some code lends itself better to testing than others
  - Aim to write tests whenever practical
- Write the tests before writing the function
  - Reduce confirmation bias
  - Tests help to define the intention of the function
- Look at the examples on [Cody] for inspiration

[Cody]: https://www.mathworks.com/matlabcentral/cody/

---

## [Example]
#### Determine whether a vector is monotonically increasing
- Write tests that the function should pass
  - `assert` that given input should match correct output

```
Input    x = [-3 0 7]
Output   tf is true

Input    x = [2 2]
Output   tf is false
```
- Write your function
- Look at the [test suite] from the web page
  - Do you agree with the tests?
  - Are you missing any?
- Does your function pass all the tests?

[Solution](code/mono_increase.m)

[Example]: https://www.mathworks.com/matlabcentral/cody/problems/10-determine-whether-a-vector-is-monotonically-increasing
[test suite]: code/mono_increase_test.m

---
class: center, middle

# 5. Efficiency

---

## Efficiency tips
- `mlint` checks for syntax errors and gives tips to improve functions and scripts
  - Let MATLAB help you!
- MATLAB stores arrays contiguously in memory, but augments them dynamically
  - preallocate arrays wherever possible, instead of growing on the fly
  - access consecutive elements using column-major indexing
- Use vector operations instead of loops (when possible)
  - work on all elements simultaneously
- Use built-in functions instead of writing your own
- Use the profiler to identify hotspots in code
- Many more tips at [Awesome MATLAB], particularly [clean and fast] and [vectorisation]

Efficiency demo: `code/efficiency_examples.m`

[Awesome MATLAB]: https://github.com/mikecroucher/awesome-MATLAB
[clean and fast]: http://uk.mathworks.com/matlabcentral/fileexchange/22943-guidelines-for-writing-clean-and-fast-code-in-matlab
[vectorisation]: http://www-h.eng.cam.ac.uk/help/tpl/programs/Matlab/tricks.html

---

## Optimisation exercise (15 mins)
Inefficient assignment with loops: `code/loopy.m`

- Look at advice from `mlint` (down the side of the editor window)
- Use the profiler (`Run and time` button) to identify expensive code sections
- Implement suggestions from `mlint`
- Re-run profiler and admire the speed up from preallocating arrays
- Examine the most time consuming lines in the profile report
- Rewrite the variable assignments using vector operations instead of loops
  - It may help to write a comment for each line in the loop, explaining what it does
- Consider how you will check that your modifications give the same answers as the original code

[Solution](code/loopy_solution.m)

---

## Additional optimisation techniques
- Use third-party code
  - e.g. [NAG toolbox] for MATLAB
- Use MATLAB MEX functions
  - i.e. rewrite functions in Fortran or C
- MATLAB parallel computing toolbox
  - `parfor` and `spmd` utilise multicore processors
- Split your work into separate computations
  - e.g.
    [Condor] pool, job array on [CSF]

[NAG toolbox]: https://www.applications.itservices.manchester.ac.uk/show_product.php?id=307
[Condor]: http://ri.itservices.manchester.ac.uk/htccondor/
[CSF]: http://ri.itservices.manchester.ac.uk/csf/

---
class: center, middle

# 6. Tips and common problems

---

## A few words of caution
- Load data window remains open after loading data (looks like nothing has happened if maximised)
- Closing MATLAB gives no prompt to save workspace variables; you are only prompted to save modified scripts
- Variables are implicitly declared, so they are easy to overwrite accidentally
- It's very easy to create variables / functions using names of existing MATLAB functions
  - Use `exist` to check so you don't mask built-in functions

---

## Publishing MATLAB code
- Generate html, LaTeX code, or a pdf from your MATLAB code using the `publish` tab within MATLAB
- Capture code and output, including markup for explanations
- Demo

---

## Further resources
- Lots of tips, advice, learning resources, etc at [Awesome MATLAB]
- Interact with other MATLAB users at [MATLAB central]
  - Share applications, examples and functions at [MATLAB file exchange]
  - Ask and answer questions at [MATLAB answers]

[Awesome MATLAB]: https://github.com/mikecroucher/awesome-MATLAB
[MATLAB central]: https://uk.mathworks.com/matlabcentral/
[MATLAB file exchange]: http://uk.mathworks.com/matlabcentral/fileexchange/
[MATLAB answers]: https://uk.mathworks.com/matlabcentral/answers/

---
class: center, middle

## Feedback / attendance
Please complete the [feedback form].

Feedback is used to improve our courses, and to record your attendance.

[feedback form]: https://docs.google.com/forms/d/e/1FAIpQLSdfpd8QuG9SPAehY5PBJ7AQdbH_eQcDL0UNbS2Oqs6960BTww/viewform?usp=pp_url&entry.1427428485&entry.1759822899&entry.1444288709=Programming+in+MATLAB&entry.1409009513&entry.160472735&entry.2083518247&entry.92324155