04-630 Data Structures and Algorithms for Engineers

Location: Africa

Units: 12

Semester Offered: Fall, Spring

Course description

The primary objective of the course is to provide engineers without formal training in computer science, a solid background in the key principles of computer science, in general, and of the algorithms and data structures, in particular. The key purpose of this course is to complement the experience that engineers may already have in writing software with formal computer science underpinnings, making those engineers more capable in developing software intensive systems. There are no prerequisites for taking this course.

Learning objectives

The course begins by considering the main phases of the software development lifecycle: requirements elicitation, computational modeling, system specification, software design, implementation, and software quality assurance, including various forms of testing, verification, and validation. Then, building on the concept of abstract data types, the course provides an in in-depth treatment of the key elements of algorithms and data-structures, beginning with the fundamentals of searching, sorting, lists, stacks, and queues, but quickly progressing to more advanced topics, including trees, graphs, and algorithmic strategies. It also covers the analysis of the performance and tractability of algorithms, finishing with automata theory and computability theory. A key focus of the course is on effective implementation and good design principles.

Outcomes

After completing this course, students should be able to:

Recognize and analyze critical computational problems, generate alternative solutions to problems, and assess their relative merits.
Understand, analyze, and characterize those factors that influence algorithmic computational performance and memory consumption.
Design, implement, and document appropriate, effective, and efficient data structures & algorithms for a variety of real-world problems.
Understand detailed software structures and their underlying strengths and weaknesses.
Perform detailed, code-level design and document the design in an understandable way.

Content details

Introduction & The Software Development Life Cycle. Goals, outcomes, and syllabus. Assignments, labs, and exercises. Software development tools for assignments. Levels of abstraction. The software development life cycle. Yourdon Structured Analysis - functional, data, and behavioural models (hierarchical decomposition trees, architecture diagrams, data flow diagrams DFD, data dictionaries, entity relationship ER diagrams, state transition diagrams). Software process models: waterfall, evolutionary, formal transformation, re-use, hybrid, spiral.

Formalisms for Representing Algorithms. Definition of an algorithm. Modelling software. Relational modelling. State modelling. Practical representations. Pseudo code. Flow charts. Finite state machines. UML. Predicate logic. Analysis.

Analysis of Complexity. Performance of algorithms, time and space tradeoff, worst case and average case performance. Big O notation. Recurrence relationships. Analysis of complexity of iterative and recursive algorithms. Recursive vs. iterative algorithms: runtime memory implications. Complexity Theory: tractable vs intractable algorithmic complexity. Example intractable problems: travelling salesman problem, Hamiltonian circuit, 3-colour problem, SAT, cliques. Determinism and non-determinism. P, NP, and NP-Complete classes of algorithm.
Searching and Sorting Algorithms. Linear and binary search (iterative and recursive). In-place sorts: bubblesort (efficient and inefficient), selection sort, insertion sort. Not-in-place sorts: Quicksort, merge sort. Complexity analysis. Characteristics of a good sort. Speed, consistency, keys, memory usage, length & code complexity, stability. Other sorts ordered by complexity.

Abstract Data Types (ADT). Abstract Data Types (ADT). Information hiding. Types and typing. Design Goals. Design practices.

Containers, Dictionaries, and Lists. Container and dictionaries: mechanisms for accessing data in a list. List ADT. Implementation with arrays. List ADT. Implementation with linked lists. Doubly linked lists and circular lists. Performance considerations.

Stacks. Stack (LIFO) ADT. Implementation using List ADT (array and linked-list). Comparison of order of complexity. Stack applications, including token matching, evaluation of postfix expressions, and conversion of infix expressions to postfix.

Queues. Queue (FIFO ADT). Implementation using List ADT (array and linked-list). Comparison of order of complexity. Dedicated ADT. Circular queues. Queue applications.

Trees. Concepts and terminology: level, height, external and internal nodes, skinny, fat, complete, left-complete, perfect, multi-way, d-ary. Types of tree: binary, binary search, B-tree, 2-3 tree, AVL, Red-Black. Binary trees and binary search trees. Tree traversals: inorder, preorder, postorder. Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations. Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring. Tree applications. Fixed- length codes & variable length codes. Optimal code trees. Huffman's algorithm.

Heaps. Priority queues. Heap basics. Types of heap: min heaps and max heap. Heap characteristics. Implementation of heap. Heap operations: delete max/min, down heap, up heap, merge, construct, heapify. Complexity of operations. Heap sort.

Graphs. Types of graphs: directed, undirected, weighted, unweighted, cyclic, acyclic, directed acyclic, simple, non-simple, implicit, explicit, embedded, topological. Adjacency matrix representation. Adjacency list representation. Graph traversal: breadth-first search and depth- first search; implementation and application. Topological sort. Spanning trees and minimum spanning trees, Kruskal's algorithm, Prim's algorithm. Dijkstra's shortest path algorithm. Floyd- Warshall's all-pairs algorithm.

Complex Networks. Euler's theorem: the Bridges of Königsberg. Networks vs. graphs. Degree, average degree, and degree distribution. Bipartite networks. Path length, BFS, Connectivity, Components. Clustering coefficient. Random graph model. Small world phenomena. Scale free networks. Communities. Fundamental Hypothesis. Connectedness and Density Hypothesis. Strong and weak communities. Graph partitioning. Community detection. Hierarchical clustering. Girvan- Newman Algorithm. Modularity. Random Hypothesis. Maximum Modularity Hypothesis. Greedy algorithm for community detection by maximizing modularity. Overlapping communities. Clique percolation algorithm and CFinder.

Algorithmic Strategies. Classes of algorithms. Brute force. Divide and conquer. Greedy algorithms. Dynamic programming. Combinatorial search. Backtracking. Pruning. Branch and bound.

Hashing. Dictionaries. Hashing. Hash functions. Collision resolution. Complexity. Applications.

Analysis of Correctness. Types of software defects. Code module design. Syntactic, semantic, logical defects. (Semi-)formal verification: partial vs. total correctness. Invariant assertion method. Simple proof strategies: by contradiction, counterexample, induction. Dynamic testing: unit tests, test harness, stubs, drivers, integration testing, regression testing. Static tests: reviews, walkthroughs, inspections, reviewing algorithms and software. Pair programming. Verification and validation strategies.

Prerequisites

None

Syllabus

https://canvas.cmu.edu/files/11333372

Faculty

TBD