Playing with tension: a computational mode of improvisational accompaniment by secondary rhythmic performer in Carnatic music


PLAYING WITH TENSION

PRASHANTH THATTAI RAVIKUMAR

NATIONAL UNIVERSITY OF SINGAPORE
2015

PLAYING WITH TENSION
GENERATING MULTIPLE VALID ACCOMPANIMENTS FOR THE SAME LEAD PERFORMANCE

PRASHANTH THATTAI RAVIKUMAR
B.Tech, National Institute of Technology, Trichy, 2012

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF ARTS
COMMUNICATIONS AND NEW MEDIA
NATIONAL UNIVERSITY OF SINGAPORE
2015

Acknowledgment

Foremost, I would like to express my sincere gratitude to my supervisors, Prof. Lonce Wyse and Prof. Kevin McGee, for their continuous support, patience, motivation, enthusiasm, and immense knowledge in guiding me to learn and do research. "To define is to limit"; I cannot quantify the knowledge that I have gained from them in the past two years. Their constant guidance, support, and dedication have been an immense inspiration for me to finish this dissertation.

Besides my supervisors, I would like to thank Dr. Srikumar Karaikudi Subramanian, who has been a friend, a mentor, and a person to look up to. I will long cherish the memorable coffee chats that have led to so many new insights about the thesis, music, and varied things in life.

I thank my fellow lab mates from the Partner Technologies group, Dr. Alex Mitchell, Teong Leong, Chris, Jing, Evelyn, and Kakit, for their stimulating discussions every week. Our weekly meetings were a ton of fun in terms of discussing and learning diverse perspectives on doing research. I thank the faculty, the staff, and the graduate students of the Communications and New Media department for supporting and housing me as a graduate student for the last two years.

I thank the musicians, Dr. Ghatam Karthik, Mr. Trichur Narendran, Mr. Arun Kumar, Mr. Sumesh Narayan, Mr. Sriram, Mr. Hari, Mr. Shrikanth, Mr. Santosh, and all others who have imparted their musical knowledge to help my understanding of the genre.
This thesis could not have progressed as much as it has, if not for the musical insights and inspirations that I drew from our group music jamming sessions. I take this moment to thank my close friends and music collaborators, Vinod, Vishnu, Lakshmi Narasimhan, Prasanna, and Arun, who have enhanced my musical growth and helped me achieve the insights that I have in this thesis.

I thank my close friends Shyam and Kameshwari, who have been a constant source of support during the tough times. I thank my friend Akshay for the intellectually stimulating conversations. I also thank him for his timely help during the thesis revisions. I thank Spatika Narayanan for her help in proofreading the document.

Last but not least, I would also like to thank my family.

March 20, 2015

Name: Prashanth Thattai Ravikumar
Degree: Master of Arts
Supervisor(s): Associate Professor Kevin McGee, Associate Professor Lonce Wyse
Department: Communications and New Media
Thesis Title: Playing with Tension: Generating multiple valid accompaniments for the same lead performance

Abstract

One area of research interest in computational creativity is the development of interactive music systems that are able to perform variant, valid accompaniment for the same lead performance. Although previous work has tried to solve the problem of generating multiple valid accompaniments for the same lead input, success has been limited. Broadly, retrieval-based music systems use static databases and produce accompaniment that is too repetitive; generation-based music systems that use hand-coded grammars are less repetitive, but have a more limited range of pre-defined accompaniment options; and finally, transformation-based music systems produce accompaniment choices that are predictably valid for only a few cases. This work goes beyond the existing work by proposing a model of choice generation and selection that generates multiple valid accompaniment choices given the same input.
The model is applied to generate secondary percussive accompaniment to a lead percussionist in a Carnatic improvisational ensemble. The central insight, and the main original contribution, is that the generation of valid alternate variations of secondary accompaniment can be accomplished by formally representing the relationship between lead and accompaniment in terms of musical tension. By formalizing tension ranges for acceptable accompaniment, an algorithmic system is able to generate alternate accompaniment choices that are acceptable in terms of a restricted notion of sowkhyam (roughly, musical consonance). In the context of this thesis, restricted sowkhyam refers to the sowkhyam of accompaniment considered independent of the secondary performer (and his creativity).

The research proceeded in three stages. First, Carnatic music performances were analyzed in order to model the performance structures and improvisation rules that provide the freedom and constraints in secondary percussion playing. Second, based on the resulting tension model, a software synthesis system was implemented that can take a transcribed selection of a Carnatic musical performance and algorithmically generate new performances, each with different secondary percussion accompaniment that meets the criteria of restricted sowkhyam. Third, a study was conducted with six expert participants to evaluate the results of the synthesis.

The main contribution of this thesis is the development and validation of a tension model that, assuming restricted sowkhyam, is able to generate alternate variations of secondary accompaniment that are as valid as the original accompaniment.

Keywords: Carnatic rhythmic improvisation, Improvisational accompaniment

Contents

1 Introduction
    1.1 Structure of this document
2 Related work
    2.1 Retrieval-based music systems
        2.1.1 Retrieval from a database
        2.1.2 Retrieval using dynamic learning models
        2.1.3 Generation-based music systems
    2.2 Hand-coded grammars
        2.2.1 Online learning of grammars
    2.3 Transformation-based music systems
        2.3.1 Transformation function is pre-given
        2.3.2 User selects the transformation function
3 Research problem
    3.1 Summary of the related work
    3.2 Proposed solution
4 Method
    4.1 Analysis of the Carnatic musical performances
    4.2 Model development
    4.3 Evaluating the tension model
    4.4 System development
5 Background: Carnatic quartet performance
    5.1 Overview
    5.2 Musical structures
    5.3 Choices in different styles of accompaniment playing
    5.4 Musical actions in the improvisation
        5.4.1 Major variations
        5.4.2 Minor variations
6 System: design criteria & constraints
    6.1 Research/Implementation model
    6.2 Lead percussionist: improvisation and variation
    6.3 Secondary percussionist: accompaniment and variation
7 Possible approaches
    7.1 The Direct Mapping model
    7.2 The Horizontal Continuity model
8 The tension model
    8.1 Tension model applied to secondary playing
    8.2 Tension model applied to generate multiple accompaniments
9 Tension synthesis protocol
    9.1 Choose Carnatic performance recording
    9.2 Choose a sixteen bar sample of performance recording
    9.3 Transcribe the sixteen-bar selection
        9.3.1 Transcribing double hits
        9.3.2 Transcribing hit loudness
        9.3.3 Transcribing rhythmic repetition of bars
    9.4 Compute tension scores for each hit
    9.5 Compute tension scores for each beat
    9.6 Compute tension range for each bar
    9.7 Generate all viable accompaniment sequences
        9.7.1 Enumerate all unique triplet values for each beat
        9.7.2 Collect all viable 8-beat (1-bar) sequences
        9.7.3 Collect secondary sequences that meet tension constraints
    9.8 Construct secondary transcription for entire piece
    9.9 Synthesize performance
10 Tension synthesis: practical details
    10.1 Separating tracks from original recording
    10.2 Storing the transcript
    10.3 Sequencing audio from a transcript
    10.4 Creating a new recording
11 Study protocol
    11.1 Participants
    11.2 Materials
        11.2.1 Documents
        11.2.2 Equipment
        11.2.3 Recordings (original)
        11.2.4 Recordings (with new accompaniment)
    11.3 Study Disclaimer
    11.4 Study Session Protocol
        11.4.1 Gather demographic information
        11.4.2 Explain evaluation criteria
        11.4.3 Sequencing the recordings
        11.4.4 Evaluate recordings
    11.5 Evaluation
12 Study results
    12.1 RQ1: does system produce acceptable accompaniment
        12.1.1 Recording 1
        12.1.2 Recording 2
        12.1.3 Recording 3
    12.2 RQ2: are accompaniments inside the range better?
        12.2.1 Recording 1
        12.2.2 Recording 2
        12.2.3 Recording 3
    12.3 RQ3: do ratings decrease as a function of distance
        12.3.1 Recording 1
        12.3.2 Recording 2
        12.3.3 Recording 3
    12.4 Summary
13 Potential objections
14 Discussion
    14.1 Algorithmic limitations
    14.2 Transcription limitations
    14.3 System limitations
15 Future work
Appendices
A Key Terms
    A.1 Terms: tension model
    A.2 Terms: Carnatic music
B Enumerating the accompaniment sequences
C Assigning perceptual scores
    C.1 Diction
    C.2 Loudness
    C.3 Note duration
D Transcription: internal representation
    D.1 Transcription: internal representation
E Results
    E.1 Complete results for recordings
    E.2 Complete results for variants
F Study documents
    F.1 Session checklist
    F.2 Demographic questionnaire
    F.3 Participant variant sequence
    F.4 Evaluation sheet
    F.5 Participant observation form
    F.6 Participant definition sheet

List of Tables

9.1 Rhythmic repetition of bars
9.2 Tension scores for each hit
9.3 Tension scores for each beat
9.4 Computing TZP and tension range for a bar
9.5 Computing TZP and tension range for a bar
9.6 Computing TZP and tension range for a bar
9.7 Lookup table for 2-beats
9.8 Possible 2-beat diction combinations
9.9 Possible 2-beat diction combinations
9.10 Valid 3-beat diction combination
9.11 Two bars (average tension scores)
9.12 Two bars of valid sequences
9.13 Rhythmic repetition of bars, with accompaniment
11.1 Participant data
11.2 Two bars of valid sequences
11.3 Two bars of valid sequences
11.4 Variants by distance value
11.5 Distance of variants used for recording 1
11.6 Distance of variants used for recording 2
11.7 Distance of variants used for recording 3
11.8 Recording sequences for participants
11.9 Variant sequences for participant
12.1 Average accompaniment rating per recording
12.2 Average rating for variants of recording 1
12.3 Average rating for variants of recording 2
12.4 Average rating for variants of recording 3
12.5 Accompaniment ratings for variants of recording 1
12.6 Accompaniment ratings for variants of recording 2
12.7 Accompaniment ratings for variants of recording 3
12.8 Accompaniment ratings for different variants
12.9 Accompaniment ratings for different variants
12.10 Accompaniment ratings for different variants
C.1 Weights for lead strokes
C.2 Weights for secondary strokes
C.3 Perceived loudness of lead and secondary hits
C.4 Weights for loudness
C.5 Weights for note duration
D.1 Transcription of recording 1, bars 1-16
D.2 Transcription of recording 2, bars 1-16
D.3 Transcription of recording 3, bars 1-16
E.1 Accompaniment ratings for recordings 1, 2, and 3
E.2 Accompaniment ratings for variants 0-6
F.1 Recording and variant sequences

List of Figures

5.1 The Carnatic quartet (from left): lead percussionist, secondary, vocalist, Tambura (provides the background drone), and violinist.
5.2 Two bars of lead and secondary playing
5.3 Different minor variations
7.1 Direct Mapping
7.2 Horizontal Continuity: secondary follows the lead changes
8.1 Tension-relaxation visualization
8.2 Tension between lead and secondary

List of Algorithms

1 Hit tension score calculation
2 Beat tension score calculation
3 Unique 1-hit and 2-hit triplets
4 Unique 1-beat triplets
5 Unique 8-beat triplets

Chapter 1 Introduction

This chapter introduces the research area of musical improvisational accompaniment systems and highlights an important problem in this field. Improvisational accompaniment systems differ from score-following, solo-trading, and tap-along systems in that they are able to produce multiple valid musical alternatives for the same performance. Developing musical accompaniment systems that generate multiple valid accompaniments by modeling the constraints of accompaniment playing is the problem of interest in this thesis.

Computational creativity is an emerging field of research in artificial intelligence, cognitive psychology, philosophy, and the arts. The goal of computational creativity is to model, simulate, or replicate human creativity using a computer. One area of research interest in computational creativity is the development of improvisational music systems that are able to perform variant, valid accompaniment for the same lead performance.
Developing musical accompaniment systems that generate multiple valid accompaniments by modeling the constraints of accompaniment playing is the problem of interest in this thesis.

Although previous work has tried to solve the problem of generating multiple valid accompaniments for the same lead input, success has been limited. Broadly, retrieval-based music systems that use static databases produce accompaniment that is too repetitive; generation-based music systems that use hand-coded grammars are less repetitive, but have a more limited range of pre-defined accompaniment options; and finally, transformation-based music systems produce accompaniment choices that are predictably valid for only a few cases.

This work goes beyond the existing work by proposing a model of choice generation and selection that generates multiple valid accompaniment choices given the same input.

1.1 Structure of this document

The remainder of this document is structured as follows:

• Related work: This chapter summarizes the previous work on improvisational accompaniment systems developed for generating multiple valid accompaniments by modeling the constraints of accompaniment playing.
• Research problem: This chapter identifies a significant problem left open by previous work and presents the research focus: to develop a model of rhythmic accompaniment for Carnatic ensemble music that produces multiple musically valid accompaniments, given the same input.
• Method: This chapter provides a brief overview of the method used during this thesis research. The method included the analysis of Carnatic music performances, development of different models of accompaniment playing, their implementation as computer programs, and their evaluation.
• Background: This chapter describes the roles and activities of the lead and secondary percussionist within a Carnatic quartet performance. It further describes the musical structure and provides examples of different scenarios of lead and secondary percussion playing in a performance ensemble.
• System design criteria: This chapter describes the narrow subset of constraints that guided the research and development of the secondary accompaniment system. The structural constraints separate the music into improvisational cycles made of eight bars in a 4/4 time signature. The input constraints restrict the lead to minor bar variations. The output constraints restrict the scope of secondary accompaniment to playing compliant accompaniment to the lead. Within these constraints, the secondary system still has the freedom to play a variety of valid accompaniments in a given situation.
• Possible approaches: This chapter describes two seemingly reasonable approaches (Direct Mapping and Horizontal Continuity) and shows why they will not effectively solve the central research problem.
• The tension model: This chapter describes the tension model that was developed to address the shortcomings of the previous models. Applied to the activity of secondary accompaniment playing in a Carnatic performance, the tension model is used as a constraint satisfaction mechanism to generate multiple accompaniments given the same lead.
• Tension synthesis protocol: This chapter describes the main steps involved in synthesizing recordings with variant valid accompaniment.
• Tension synthesis: practical details: This chapter describes the different steps in the synthesis process in terms of the different technologies used to implement them.
• Study protocol: This chapter describes the study conducted with musical experts for evaluating the ability of the system to produce alternate valid secondary accompaniments for a Carnatic musical performance.
• Study results: This chapter describes the main results from the user study and uses them to answer the research questions.
• Potential objections: This chapter highlights the aspects of the study design that could raise objections about the claims made from this work.
• Discussion: This chapter identifies the main limitations of the research reported here and discusses their impact on the findings from the study.
• Future work: This chapter proposes directions for future work.

The next chapter reviews work on developing improvisational accompaniment systems that generate multiple valid accompaniments by modeling the constraints of accompaniment playing.

Chapter 2 Related work

This chapter summarizes the previous work on improvisational accompaniment systems developed for generating multiple valid accompaniments by modeling the constraints of accompaniment playing. Previous work has developed retrieval-based, generation-based, and transformation-based music systems to solve the problem. Retrieval-based music systems use dynamic learning models to produce different sequence continuations given the same input, but at any given point in the performance they produce deterministic output. Generation-based music systems dynamically update the production rules of a grammar that are used to generate different accompaniments, but at any given point in the performance the production rules produce deterministic output. Transformation-based music systems generate permutations of a source rhythm representation to generate multiple accompaniments, but the generated choices are not always musically valid.

Previous work that has tried to solve the research problem can be classified into retrieval-based, generation-based, and transformation-based music systems. This chapter reviews the systems and highlights the problems they solve.

2.1 Retrieval-based music systems

Retrieval-based music systems use musical parameters to retrieve the best possible accompaniment from a set of accompaniment patterns.
The focus is on optimizing the parameters for efficient representation and real-time retrieval. There are two variations of retrieval-based music systems, distinguished by the type of data structure used to store the accompaniment: retrieval from a database and retrieval using dynamic learning models.

2.1.1 Retrieval from a database

The first type of retrieval-based music system stores the accompaniments in a database, which is queried to retrieve the accompaniment. The accompaniments in the database are organized by their musical features. Retrieval systems extract the necessary musical features from the input and package them into a data format that is suitable for querying the database. The best matching accompaniment is then retrieved and played.

Impact is an accompaniment system that uses case-based reasoning and production rules to retrieve accompaniment from a database of accompaniment patterns (Ganascia, Ramalho, and Rolland, 1999). It extracts meta-level descriptions of musical scenarios (such as the beginning and end of a bar), fills in the sections and the duration of chords, and uses the result to form a query. This query is used to retrieve the best matching accompaniment from the database. The best accompaniment is selected according to a measure of mathematical distance between the query (called the target case) and each of the patterns in the database. Given a single input, the system always returns one accompaniment (the best matching accompaniment) as output.

Cyber-Joao is an adaptation of the Impact system that optimizes the number of parameters used for the retrieval (Dahia et al., 2004). It ranks the different musical features based on expert knowledge data, and uses the ranking to determine the important musical features in a given performance situation. The musical features are used to query and retrieve the accompaniment pattern from the database. Since each rhythm in the database is distinctly characterized by a single set of accompaniment values, there is always only one accompaniment available for any given musical scenario.

2.1.2 Retrieval using dynamic learning models

In order to overcome the limitations of statically stored accompaniment options, systems were developed with capabilities to model the input rather than statically store it.

One of the earlier systems that retrieved accompaniment using Markov models was the M system (Zicarelli, 1987). It listens to a musician's performance as streams of MIDI data and builds a Markov chain representation on the fly. It then traverses the representation to produce its output.

Another well-known example is the Continuator system (Pachet, 2002a; Pachet, 2002b). The Continuator uses Markov modeling to build possible sequence continuations of musical sequences played earlier in the performance. For any given sequence of musical notes, the accompaniment is retrieved by selecting the longest sequence continuation. A later version of the Continuator system models the trade-offs between adaptation and continuity of the retrieved accompaniment (Cabral, Briot, and Pachet, 2006). Apart from finding a continuation sequence, the system constantly reviews the relationship between the retrieved accompaniment and the harmonic context, and retrieves a new continuation in case of any mismatch.

Another system, Omax, in addition to listening to the lead, listens to its own past improvisations (Assayag et al., 2006). In a special self-listening state, the system listens to its own outputs to bias its Markov model. This results in a variety of possible choices for future accompaniment, depending on whether the system was listening to itself or to the lead.

In summary, the second variation of retrieval systems uses Markov models to produce sequence continuations of accompaniment. These systems model the music as sequence continuations, based on listening to the improviser's input.
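The Markov-based continuation idea described above can be illustrated with a minimal sketch. It is not the algorithm of the M system, the Continuator, or Omax: the function names, the maximum context length (`order`), and the choice of the most frequent continuation with alphabetical tie-breaking are all assumptions made for illustration. The sketch shows the behaviour that matters for this review: for a fixed model the same seed always yields the same continuation, and the continuation changes only once the model has heard more input.

```python
from collections import Counter, defaultdict


def build_model(notes, order=2):
    """Map each context of up to `order` notes to counts of the note that followed it.

    Hypothetical helper: `notes` is any list of hashable symbols.
    """
    model = defaultdict(Counter)
    for i in range(1, len(notes)):
        for k in range(1, order + 1):
            if i - k < 0:
                break
            model[tuple(notes[i - k:i])][notes[i]] += 1
    return model


def continue_sequence(model, seed, length, order=2):
    """Extend `seed` by up to `length` notes.

    Assumption: at each step the longest matching context is used and its
    most frequent continuation is chosen (ties broken alphabetically), so
    the output is deterministic for a given model.
    """
    out = list(seed)
    for _ in range(length):
        nxt = None
        for k in range(min(order, len(out)), 0, -1):
            ctx = tuple(out[-k:])
            if ctx in model:
                counts = model[ctx]
                nxt = max(sorted(counts), key=lambda n: counts[n])
                break
        if nxt is None:  # unseen context: stop early
            break
        out.append(nxt)
    return out[len(seed):]


# The same model and seed always give the same continuation...
model = build_model(list("abcabd"))
print(continue_sequence(model, ["a"], 3))  # ['b', 'c', 'a']

# ...and the continuation changes only after more input has been heard.
model = build_model(list("abcabdabd"))
print(continue_sequence(model, ["a"], 3))  # ['b', 'd', 'a']
```

Replacing the most-frequent choice with weighted random sampling would give varied output, but nothing in such a model would check that each sampled continuation is musically valid as accompaniment.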
Given a starting note or a sequence, the model is traversed to produce the musical continuation. As the system listens to more of the input, it changes the Markov model and the sequence continuations. Thus it is able to produce multiple alternate accompaniments for different situations. Although the use of modeling approaches improves performance over the static database approach, at any point in the performance these systems retrieve and play only one valid accompaniment.

There is, however, one non-accompaniment system that falls broadly into this category, but which generates musically valid variations that do meet the musical constraints of a given melodic line (Donze et al., 2013). This "control improvisation system" generates variations of a lead melody in jazz. Specifically, given a reference melodic and harmonic sequence, the system builds a probabilistic model of all state transitions between the notes of the melody. The probability values assigned to the transitions determine the variations of the main melody produced. When a high probability is assigned to the transitions of the reference melody (called direct transitions), the system produces melodic sequences similar to the reference melody. When a low probability is assigned to the direct transitions, it produces melodic sequences different from the reference line. Thus, given the same harmonic progression and a reference melodic line, the system produces variations by controlling a single parameter, the probability value of the transitions.

Although it is not an accompaniment system, the approach could conceivably be used as the basis for one, but not without significant modification. This is because the generation part of the system is influenced entirely by what the system itself played earlier. Without modification, this would result in an odd accompaniment scenario, one in which the choices of the accompanist are based on his own decisions rather than on the changes played by an improviser. And if the goal were to transform this into an accompaniment system, it would not be sufficient to simply modify the system so that it listened to the lead performer; many of the challenges and limitations described in future chapters would still appear.

2.1.3 Generation-based music systems

Generation-based music systems use musical grammars to generate accompaniment. The grammars contain production rules that associate the characteristics of the input rhythm with an output accompaniment rhythm. The grammars are either hand-coded by a human expert or automatically induced by listening to performances. Several systems have been developed using each type of grammar.

2.2 Hand-coded grammars

Voyager (Lewis, 2000) and Cypher (Rowe, 1992) are examples of accompaniment systems that use hand-coded grammars to generate accompaniment responses. They contain pre-defined sub-routines that are triggered by specific conditions to generate the different accompaniment responses. However, the rules of these grammars are rigid and unchanging, and as a result, these systems are limited in their ability to respond to the same input with alternative outputs.

2.2.1 Online learning of grammars

One improvement over hand-coded grammars is the development of grammars that are more flexible and learn on the fly.

ImprovGenerator is an example of an accompaniment system that learns musical grammars online (Kitani and Koike, 2010). It listens to the variations of a base rhythm and generates production rules corresponding to the variations. The different production rules are assigned a probability value that changes over the course of a performance.

FILTER is another system that employs an online learning approach (Van Nort, Braasch, and Oliveros, 2009; Van Nort, Braasch, and Oliveros, 2012). It is an improvising instrument system that reacts in novel and interesting ways by recognizing the gestures of a performer.
The system comes pre-loaded with 20 gestures, and the transitions between the gestures are modeled by a Markov model. Over the course of a performance, it varies the transition probabilities of the gestures to produce interesting and varied responses. However, the relation between a gesture and its output parameters remains constant. In other words, the system does a better job of generating different responses over the course of a performance, but at any given point in the performance, it will produce the same accompaniment given the same input.[1]

[1] One notable thing about FILTER is that it models the interplay between lower level audio features and higher level gestural parameters. This will be discussed in more detail in later chapters.

Grammar systems that use online learning are more flexible and generate more varied responses than systems developed using hand-coded grammars. However, in both cases, the grammars are modeled deterministically, and once the grammar is induced, the same input will produce the same output.

2.3 Transformation-based music systems

Transformation-based music systems apply a transformation function to the input to generate the output. The transformation function is usually a mathematical operation that is applied to each of the input parameters to produce the output accompaniment values. Multiple accompaniments are generated by permuting a representation of the input parameters. There are two kinds of transformation systems, distinguished by how the transformations are generated: systems where the transformation function is pre-given and systems where the user selects the transformation function.

2.3.1 Transformation function is pre-given

In pre-given transformation systems, the transformation function is derived from a target accompaniment value that is given as input to the system. The transformation function is computed as a function of the target accompaniment and is applied to the input values to generate the accompaniment.

Ambidrum is one system that uses a statistical measure of rhythmic ambiguity to generate rhythmic accompaniment (Gifford and Brown, 2006). It measures rhythmic ambiguity using a statistical correlation between the rhythmic metre and three rhythmic variables: the beat velocity, pitch, and duration. The system is given the target correlation values, which it uses to transform the input into an output that can be either a metrically coherent or a metrically ambiguous rhythm. Metrically coherent rhythms are musically valid as accompaniment and are generated by the Ambidrum system when its target correlation matrix (transformation function) is an identity function. When the transformation function is not an identity function, the rhythms generated by Ambidrum are metrically ambiguous and their musical appropriateness varies widely.

Another system, Clap-along, uses values from the target accompaniment to move the input towards the target (Young and Bown, 2010). The system uses four musical features to compute the distance between the source and the target accompaniment and progressively modifies the source towards the target. For each generation, the system generates 20 choices and finds the rhythm closest to the target by computing the Euclidean distance. When the performer repeatedly claps the exact same pattern, the system is able to slowly evolve its output towards the target accompaniment. However, variations in the performer’s rhythms cause unstable changes in the system’s output, often resulting in inappropriate accompaniment.

The main limitation of both these systems (and systems like these) is that there are very few cases in which the accompaniment they generate is predictably valid (musically).
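The candidate-selection step described for Clap-along (generate 20 choices, keep the one with the smallest Euclidean distance to the target) can be illustrated with a minimal sketch. It is not the published implementation: the four-number feature vectors, the random perturbation in `propose`, and all function names are assumptions made here for illustration only.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def propose(source, rng, n_candidates=20, step=0.1):
    """Hypothetical generator: perturb each feature of the source a little."""
    return [[f + rng.uniform(-step, step) for f in source]
            for _ in range(n_candidates)]

def closest(candidates, target):
    """Pick the candidate with the smallest Euclidean distance to the target."""
    return min(candidates, key=lambda c: euclidean(c, target))

def evolve(source, target, generations, seed=0):
    """Repeat the propose-and-select step for a number of generations."""
    rng = random.Random(seed)
    current = list(source)
    for _ in range(generations):
        current = closest(propose(current, rng), target)
    return current
```

In this sketch the target vector is fixed. If the target changed between generations, as it would when the performer varies the clapped rhythm, successive selections would pull the output in different directions, which is one way to picture the instability noted above.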
2.3.2 User selects the transformation function

In order to get transformation systems to generate musically predictable output, systems have been created in which user preference is used to generate transformation functions. For example, NeatDrummer generates drum tracks by transforming the other musical parts in the song (Hoover, Szerlip, and Stanley, 2011; Hoover, Rosario, and Stanley, 2008). The different accompaniment tracks are generated by giving different input tracks (like piano, violin, and vocal) to an Artificial Neural Network, called a CPPN, that generates the output rhythms. The CPPN is initially trained using the input from the different audio tracks. In successive generations, the user ranks the generated tracks, and the properties of the ranked tracks are permuted to generate multiple CPPNs (transformation functions) in each step. Thus the different CPPNs generate multiple accompaniment drum tracks according to the user’s preference.

The problem with the approach followed by NeatDrummer is related to the musical validity of the generated tracks. Although the user’s preferences are used to generate multiple alternate transformation functions, the user actually has minimal control over the accompaniment generation process itself. Thus, the system generates musically valid accompaniment in only a few cases.

Chapter 3 Research problem

This chapter identifies a significant problem left open by previous work and presents the research focus: to develop a model of rhythmic accompaniment for Carnatic ensemble music that produces multiple musically valid accompaniments, given the same input. Although there has been previous work on improvised accompaniment playing systems, none of it has addressed the problem of generating multiple accompaniments, given the same input.
This work goes beyond the existing work by proposing a formal model of choice generation that provides multiple valid accompaniment choices given the same lead input. This formal model was used to develop a rhythmic improvisation system: specifically, a system that provides percussive improvisational accompaniment for a human lead percussionist during a Carnatic performance.

3.1 Summary of the related work

Although previous work has tried to solve the problem of accompaniment playing, the problem of generating multiple accompaniments given the same input is still largely unsolved. Retrieval-based music systems use dynamic learning models to produce different sequence continuations, but at any given point in the performance they produce a single valid output. Generation-based music systems dynamically update the production rules of a grammar that are used to generate different accompaniments, but at any given point in the performance they produce a single valid output. Transformation-based music systems permute a source rhythm representation to generate multiple accompaniments, but the generated choices correspond to valid musical descriptions in very few cases.

Among the different systems surveyed in the related work, the FILTER system comes close to generating multiple accompaniments, given the same input. It models the interplay between the low-level audio features and the higher level gestural parameters (that have visual correspondences) to identify the player’s intent, and it adapts the output in interesting ways. Though this interplay produces a variety of responses, the mapping between the gesture and the associated output parameters is one-to-one. Once a mapping is established, the system produces the same outputs for the same input.

Another related system is the control improvisation system that generates variations of a lead melody in jazz (Donze et al., 2013).
It is not an accompaniment system but is relevant in that it models accompaniment constraints to generate multiple variations of a given melodic line. This system is an enhancement of the factor oracle approach used by Assayag et al. (2006) and generates variations of a lead melody in jazz, such that they satisfy a given accompaniment specification. Given a reference melodic and harmonic sequence, the system builds a probabilistic model of all state transitions between the notes in the melody. The probability values assigned to the transitions determine the variations of the main melody produced. Assigning a high probability to transitions of the reference melody (called direct transitions) produces melodic sequences similar to the reference melody, and assigning a low probability to direct transitions produces melodic sequences different from the reference line. Thus, given the same harmonic progression and a reference melodic line, the system produces variations by controlling a single parameter, the probability value of transitions. However, this system would not transfer well to an accompaniment scenario, as the generation part of the system is influenced purely by what it played (or listened to) earlier. This results in an odd accompaniment scenario, one in which the choices of the accompanist are based on his own decisions rather than on the changes played by an improviser. This raises concerns about the validity of the accompaniment played using such a system.

3.2 Proposed solution

The central goal of the work reported here is to develop an algorithm that generates valid alternate variations of secondary accompaniment for recordings of Carnatic musical performances. The central insight (the main original contribution) is that the generation of valid alternate variations of secondary accompaniment can be accomplished by formally representing the relationship between lead and accompaniment in terms of musical “tension”.
By formalizing tension ranges as constraints for acceptable accompaniment, an algorithmic system is able to generate alternate accompaniment choices that are acceptable in terms of a restricted notion of sowkhyam (roughly, musical consonance). In the context of this thesis, restricted sowkhyam refers to the sowkhyam of accompaniment considered independent of the secondary performer (and his creativity). This specifically ignores the influences of any particular school of percussion playing, any particular secondary performer’s playing style, creative kanjira variations, and the tonal quality unique to any performer’s instrument. Unless otherwise noted, sowkhyam in this document refers to the restricted sowkhyam described above.

The central insight is roughly as follows. For any given performance, there is a degree of sowkhyam (consonance) between the lead and secondary accompaniment. For the work reported here, this degree of (restricted) sowkhyam has been numerically formalized as its inverse: tension. With this formalization, any synthesized accompaniment that has equivalent or less tension (relative to the lead) is considered to have the same degree of sowkhyam as the original.

The research resulted in a system that can take a transcribed selection of a Carnatic musical performance and algorithmically generate new performances, each with a different secondary percussion accompaniment that meets the criteria of restricted sowkhyam as well as the original secondary accompaniment does.

In order to evaluate the ability of the system to produce alternate valid secondary accompaniments for a Carnatic musical performance, a study was conducted with musical experts to address three related research questions:

• RQ1: Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment?
• RQ2: Are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants outside the range)?
• RQ3: Do the ratings for accompaniment decrease as a function of the distance from the tension zero point?

The remaining chapters in this thesis provide details about the synthesis protocol for generating alternate valid accompaniments, the study protocol used to evaluate the system, the results of the study and their analysis.

Chapter 4 Method

This chapter provides a brief overview of the method used during this thesis research. The method included the analysis of Carnatic music performances, development of different models of accompaniment playing, their implementation as computer programs, and their evaluation.

4.1 Analysis of the Carnatic musical performances

Since the rules and constraints for secondary improvisation in Carnatic ensemble are not clearly specified in the literature (or in oral tradition), the first step involved the development of a method to systematically understand secondary improvisation in performances. Different performance recordings were analyzed to find performance structures (e.g., bar and improvisation cycle) and improvisation rules (e.g., forced and discretionary playing) that restricted/imposed constraints on the playing, but also offered some flexibility for improvisation. The analysis of the performances was used to develop the different models of accompaniment playing.

4.2 Model development

During the course of this thesis, different models were developed to solve the research problem of developing systems that play multiple valid accompaniment given the same lead input. They were the Direct Mapping model, the Horizontal Continuity model, and the tension model. The first two models were limited in their ability to generate multiple accompaniments that are musically valid. In order to address these shortcomings, a third model was developed that generates multiple valid accompaniment for the same lead input: the tension model.
4.3 Evaluating the tension model

In order to evaluate the ability of the system to produce alternate valid secondary accompaniments for a Carnatic musical performance, a study was conducted with musical experts to answer three related research questions:

• RQ1: Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment?
• RQ2: Are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants outside the range)?
• RQ3: Do the ratings for accompaniment decrease as a function of the distance from the tension zero point?

The methodology followed to answer these research questions was to generate accompaniment variants that were qualitatively different from an evaluation standpoint. For this, six accompaniment variants of 16-bar duration were created with different distance values from the original. These were presented to musical experts who evaluated and rated them.

4.4 System development

The Direct Mapping and the Horizontal Continuity accompaniment models were implemented as computer programs that were evaluated in restricted real-time performance settings. For these models, the accompaniment system is a computer program that plays a melody, accepts percussive input through a MIDI controller, and combines both of those with the secondary accompaniment (which is algorithmically generated). The lead’s input is used to drive the algorithmic secondary accompaniment generation, and the combined output is played back through a speaker.

The accompaniment system built using the tension model accepts percussive input for both lead and secondary through the keyboard and uses both to algorithmically generate multiple accompaniments. The inputs are entered in their respective formats on the keyboard. The input consists of sequences of diction, note duration, and loudness represented in array format.
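One way to picture this array representation is as three parallel sequences per performer. The sketch below is illustrative only: the class name `RhythmPart`, the duration unit (one beat per hit), and the loudness scale (0 to 1) are assumptions, not the thesis's actual data format. The syllables are borrowed from the two-bar example in Figure 5.2.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RhythmPart:
    """Hypothetical container: one entry per hit in each parallel array."""
    diction: List[str]      # stroke syllables
    duration: List[float]   # assumed unit: beats
    loudness: List[float]   # assumed scale: 0.0 (silent) to 1.0 (loudest)

    def __post_init__(self):
        n = len(self.diction)
        if len(self.duration) != n or len(self.loudness) != n:
            raise ValueError("diction, duration and loudness must have equal length")

    def hits(self) -> List[Tuple[str, float, float]]:
        """Return the hits as (diction, duration, loudness) triples."""
        return list(zip(self.diction, self.duration, self.loudness))

# Syllables from Figure 5.2; durations and loudness values are invented.
lead = RhythmPart(
    diction=["num", "thi", "dhin", "dhin", "num", "thi", "dheem", "dhin"],
    duration=[1.0] * 8,
    loudness=[0.8, 0.6, 0.7, 0.7, 0.8, 0.6, 0.9, 0.7],
)
secondary = RhythmPart(
    diction=["ta", "ka", "di", "mi", "ta", "ka", "jo", "no"],
    duration=[1.0] * 8,
    loudness=[0.6, 0.5, 0.6, 0.5, 0.6, 0.5, 0.7, 0.6],
)
```

Keeping the three arrays the same length means that position i in each array always describes the same hit, so a generated accompaniment can be produced as another set of three arrays of that shape.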
Using this input, the system algorithmically generates the corresponding arrays for the secondary accompaniments. The tension model and its generation process are described in much more detail in section 8.

Chapter 5 Background: Carnatic quartet performance

This chapter describes the roles and activities of the lead and secondary percussionist within a Carnatic quartet performance. It further describes the musical structure and provides examples of different scenarios of lead and secondary percussion playing in a performance ensemble.

5.1 Overview

Figure 5.1: The Carnatic quartet (from left): lead percussionist, secondary, vocalist, Tambura (provides the background drone), and violinist.

Figure 5.1 shows the performance of a Carnatic quartet. There are four improvising performers on the stage: the vocalist, the violinist, the lead percussionist (mridangist) and the secondary percussionist (kanjirist). The vocalist performs the main melody and the violinist plays the accompaniment melody. The lead percussionist improvises relative to the melody (vocal and violin) and the secondary percussionist mostly provides accompaniment to the lead percussionist. The actions of the secondary percussionist are constrained by what the lead percussionist plays and by what the secondary percussionist predicts that the lead will play.

There is one main difference in the nature of playing on the lead and on the secondary drums. The lead uses both hands to simultaneously strike the different sides of the lead drum, called the mridangam. The secondary uses one hand to strike the drum, while controlling the tension of the membrane that he strikes with the other hand. The secondary drum is called the kanjira or the frame drum.

In general, the secondary percussionist is trying to “follow” the lead. This means that the secondary takes cues from the lead, is guided by what the lead plays, is not allowed to freely improvise, and has fairly constrained choices in terms of accompaniment selection.
However, as noted earlier, the secondary also has some degree of freedom in playing. Within the constraints imposed by the lead, the secondary percussionist may:

• Proactively suggest variations for the lead to incorporate
• Improvise by making references to earlier changes played by the lead
• Improvise somewhat freely in the last bar of an improvisational cycle
• Play complementary accompaniment using off-beat strokes, rolls, changes to the accent structure of the accompaniment rhythm, etc.

Note that there is also one exceptional case whereby the secondary largely ignores the lead percussionist and instead follows the melody directly. This is only justified if the lead is playing the same rhythmic patterns without variation, but even so, such a decision is controversial and problematic for a number of reasons. The work detailed in this document does not attempt to handle this case.

5.2 Musical structures

Musical improvisations occur in cycles of many different lengths, such as 10, 12, 14, 32, or 64 beats. This document considers improvisations that take place within a 32 beat (or 64 beat) improvisation cycle which is known as the Adi talam in Carnatic music. Since the structure of improvisations that happen in a 64 and 32 beat improvisational cycle are similar, for simplicity’s sake the descriptions are provided for a 32 beat improvisational cycle.

The improvisation cycle is further divided into different bars. In a 4/4 time signature, each bar consists of four beats. The numerator of 4/4 denotes the number of beats in a bar and the denominator denotes that each beat has a duration of a quarter note. Similarly, in an 8/4 time signature, each bar contains eight beats each of quarter note duration. Improvisational cycles of 64 beats usually contain 8 bars in an 8/4 time signature. Figure 5.2 shows a rhythm pattern of two-bar duration, containing eight lead and secondary hits.
Bar1          :  1    2    3     4     5    6    7      8
Lead          :  num  thi  dhin  dhin  num  thi  dheem  dhin
Accompaniment :  ta   ka   di    mi    ta   ka   jo     no

Figure 5.2: Two bars of lead and secondary playing

In an improvisational cycle, the lead typically introduces a rhythmic groove in the first two bars, plays variations of the groove in the following bars, and either syncopates or intensifies the groove in the final bar. There are three different styles of accompaniment that the secondary can play. They are:

• Compliant accompaniment: the secondary complies with the actions of the lead and closely matches the different changes played by the lead
• Interactive accompaniment: the secondary introduces changes in the accompaniment by referring to the past actions of the lead
• Proactive accompaniment: the secondary plays complementary rhythm patterns, supplements certain hits of the lead, matches the musical structure of the melody, etc.

Each of these accompaniment scenarios imposes different demands and constraints on the secondary percussionist and results in very different kinds of musical choices and decisions.

5.3 Choices in different styles of accompaniment playing

In the compliant accompaniment playing style, the secondary has the most constraints and the fewest choices in terms of accompaniment playing. Basically, the secondary must strictly follow the changes in the bar activity and loudness of the lead. Within those constraints, the secondary is able to make some discretionary choices about diction, note duration, and loudness.

In interactive accompaniment playing, the secondary has the freedom to deviate from the lead but is constrained by the type of deviations played by the lead. Deviations played by the secondary are typically in the form of changes in the bar activity and loudness. In rare cases, the deviations include major structural changes (e.g., violating bar boundaries, grouping of hits).
In this style of playing, the secondary is allowed to deviate from the lead but is restricted to the deviations played by the lead in earlier cycles of the improvisation.

In proactive accompaniment playing, the secondary percussionist has the maximum freedom to deviate from the lead. The secondary is free to change the bar activity, loudness and musical structure (alternate groupings of hits, changing the lead’s groove) of the accompaniment. The deviations are constrained by their appropriateness to the melodic structure. This type of improvised behavior is very discretionary and depends greatly on the aesthetic sense and skill of the secondary percussionist. Transcriptions of representative examples of accompaniment playing are provided in the appendix.

5.4 Musical actions in the improvisation

This section describes the musical actions corresponding to the different variations in improvisations played by the lead and secondary percussionists. There are broadly two kinds of variations that the lead and secondary play while improvising in performances: major variations and minor variations. The main distinction between the two is that major variations either change or obstruct the flow of the rhythmic groove whereas minor variations are considered as variations around the rhythmic groove.

5.4.1 Major variations

Major variations are variations in playing that obstruct the groove. In performance situations, these are usually played to align the rhythm structure with the melodic structure. The actions in major variations include playing rolls, interspersing pauses with rolls, changing the grouping of hits, speed-doubling and playing sequences that violate bar-boundaries. The lead introduces major variations in order to minimize the sense of repetition between the different bars, and play accompaniment that better suits the melody. The secondary employs major variations in sections of interactive and proactive accompaniment playing.
Major variations in the lead make the groove less predictable for the secondary. Usually, the secondary percussionist keeps silent when the lead plays major variations and joins the lead after the major variations are over. The major variations are as follows:

• Rolls: Rolls are rhythm patterns that are filled with double-hits or quadruple-hits. They are played to break the flow of an existing groove and start a new groove. Rolls are difficult to anticipate if they have not been played before. Usually the secondary percussionist stays silent when the roll is played the first time. However, the secondary plays the roll if he is able to predict the occurrence of the roll before the lead plays or he is able to predict the continuation of the roll after listening to a few beats.
• Pauses: They occur in a specific scenario where the lead leaves pauses of unequal duration in the different bars and occasionally intersperses with rolls between the pauses. This makes predictability of the lead difficult for the secondary percussionist. In this case, the secondary percussionist is usually silent but joins with the lead when the lead’s playing becomes predictable.
• Speed doubling: In speed doubling, an entire groove is played at twice its speed. Speed doubling is common while playing for melodic phrases at fast tempos and is repeated for one improvisation cycle. Secondary percussionists either predict the moments of speed doubling in a performance or they join after the lead doubles the speed.
• Violation of bar-boundaries: Bar-boundaries are violated when rhythm patterns of durations greater (or less) than one bar are repetitively played. There are certain types of bar-boundary violations that are more common than others. The secondary percussionist initially stays silent to predict the type of bar-boundary violation. He joins the lead once he predicts the type of bar-boundary violation.
The lead plays major variations in some parts of the concert, but the unpredictability in playing makes it difficult for the secondary percussionist to follow them. It is much easier to follow the lead when he plays predictably. There are other sections of the performance when the lead plays predictably but provides variations using minor variations. These variations are more predictable and easier to follow for the secondary. This thesis addresses the accompaniment played by the secondary when the lead plays with minor variations. It will not address accompaniment issues that arise when the lead performs major variations (and the remainder of this document ignores those variations).

5.4.2 Minor variations

Minor variations are specific kinds of changes to diction, loudness, and note duration on the different beats in a bar. Figure 5.3 provides examples of different minor variations played by the secondary for the same lead rhythm; it also shows the different actions that are included in the minor variations. It should be noted that the existing literature does not provide any standard notation for pitch bending and loudness. In figure 5.3, # for loudness and ’ for pitch bend are introduced for illustration purposes.

Bar1              :  1     2      3      4     5    6    7      8
Lead              :  num   thi    dhin   dhin  num  thi  dheem  dhin
Accompaniment     :  ta    ka     di     mi    ta   ka   jo     no
Var1 - Pauses     :  tum   .      tum    .     .    tum  tum    ta
Var2 - Loudness   :  ta    #tum   ta     #tum  ta   tum  tum    ta
Var3 - PitchBend  :  ’tum  ’tum   ’tum   tum   ta   tum  tum    ta
Var4 - Doubletaps :  ta    ta-te  ta-te  tum   ta   tum  tum    ta

Figure 5.3: Different minor variations

The lead percussionist also performs the same actions while playing minor variations.
The different minor variations with their associated actions are:

• Diction
  – Adding notes: single notes, double taps
  – Removing notes (pauses)
  – Changing the sequence of strokes
  – Double taps
• Intonation
  – Pitch bending the notes played on the hits
• Loudness (Emphasis)
  – Increasing/Decreasing loudness of hits
  – Increasing/Decreasing loudness of entire rhythm pattern

It is important to note that the pauses and double taps mentioned here are different from the pauses and speed doubling mentioned under major variations. The duration, position, and occurrence of the pauses in the minor variations are fairly repetitive and predictable, and they help to establish the groove. Double taps, in contrast to speed doubling (in which an entire groove is played at double speed), are changes in the note duration of only one or two hits in a bar.

The lead and secondary percussionist follow these same actions to play minor variations, but the implementation differs slightly due to the different ways that the instruments are played. The physical constraints of playing are some of the reasons for this difference. In particular, the lead plays strokes with both hands whereas the secondary plays strokes with one hand. One consequence of this is that pitch bending, for example, is performed very differently by the lead and by the secondary. Another consequence is that lead players can typically play multiple sounds at the same time by simultaneously playing hits with both hands, and can thereby increase the intensity of the groove and provide more variation. Secondary players, however, are much more restricted in the kinds of grooves and variations they can play.

Chapter 6 System: design criteria & constraints

This chapter describes the narrow subset of constraints that guided the research and development of the secondary accompaniment system. The structural constraints separate the music into improvisational cycles made of eight bars in a 4/4 time signature.
The input constraints restrict the lead to minor bar variations. The output constraints restrict the scope of secondary accompaniment to playing compliant accompaniment to the lead. Within these constraints, the secondary system still has the freedom to play a variety of valid accompaniments in a given situation.

6.1 Research/Implementation model

This section describes the subset of constraints that guided the research and development of a system to automatically generate appropriate secondary percussive accompaniment to a lead percussionist. Specifically, the system is based on the assumption that the secondary percussionist must follow the lead (i.e., not switch to following the melody) and that only minor variations are allowed to be played in the different bars in the improvisational cycle. The structural constraints separate the music into improvisational cycles consisting of 32 beats. The 32 beats are split into eight bars containing 4 beats each. All the beats are played in a 4/4 time signature.

6.2 Lead percussionist: improvisation and variation

Within these constraints, there are different variations that can be played by the lead, therefore requiring choices and decisions to be made by the secondary.[1] The lead is restricted to playing minor variations of four beat grooves. The four beat grooves are a minimal example of a larger set of grooves called sarvalaghu patterns. The use of sarvalaghu patterns varies according to the style of lead playing, but most lead percussionists usually play these patterns for about 50% of the song duration. Sarvalaghu patterns are usually four or eight beats long when played in a 32 beat improvisational cycle and characterize a style of lead playing that is predictable and easy to follow for the secondary percussionist. The lead plays minor variations of these grooves in the different bars of the improvisational cycle.
The minor variations allow the lead to perform actions like pauses, double taps, loudness change, diction change, and pitch bending. They do not allow the lead percussionist to perform rolls, irregular pauses (of the kind described under major variations), bar-boundary violations and other actions that change the groove.

Within the constraints, the lead selects a four-beat groove at the beginning of the first bar in the cycle, plays minor variations of the groove in the following bars, and syncopates the groove in the last bar of the improvisation cycle. The groove selected in the first bar establishes the diction, loudness, and accents of the grooves that will be played for the rest of the improvisation cycle (except for the final bar when the lead can syncopate). After selecting the groove at the start of the first bar, the lead can play minor variations in any of the bars based on their appropriateness to the melody. One example of a sequence of minor variations that the lead can play is:

• Lead selects a four-beat groove in the first bar
• Repeats the groove in the second and third bar
• Syncopates or plays minor variations in the fourth bar
• Repeats the original groove in the fifth, sixth, and seventh bar
• Repeats the fourth bar syncopation or the minor variation in the eighth bar

There are several other variations of this sequence that the lead can employ based on the context of the melody and other performance factors.

[1] The same variations also apply for 64 beat improvisational cycles.

There are different places in the improvisation cycle where the lead is likely to encounter a decision point in terms of whether to introduce minor variations or not.
In an eight bar improvisational cycle, the decision points for the lead are:
• Beginning of the first bar, when the lead selects which groove to play
• Beginning of each bar, when the lead decides either to repeat the pattern or to play a minor variation
• Beginning of the last bar (or last two bars), in which the lead decides whether to syncopate or intensify the groove to denote the end of the cycle

It is important to note that the lead does not have absolute freedom in all the bars. Variations played in each bar are associated with the changes made in the previous bars and the changes in the melody.

6.3 Secondary percussionist: accompaniment and variation

Under these conditions of lead playing, there are also choices and variations available to a secondary performer who plays percussive accompaniment. Although this thesis discusses several different forms of improvisational accompaniment (i.e., compliant, interactive, and proactive), the system is only implemented to handle compliant accompaniment. In compliant accompaniment playing, the goal of the secondary percussionist is to follow the groove and the minor variations played by the lead. This imposes certain constraints on the secondary and leaves it certain choices.

In order to play compliant accompaniment, two constraints are imposed on the secondary percussionist: loudness and bar activity. The loudness constraint specifies that the loudness of the secondary hits does not exceed that of the lead. The bar activity constraint specifies that the bar activity of the secondary does not exceed that of the lead. Within these constraints, the secondary has flexibility with other choices, such as pauses, loudness (within the imposed loudness constraint), double taps, and pitch bending. It is important to note that though these actions go beyond copying the lead exactly, they do not qualify as interactive or proactive accompaniment.
Specifically, compliant accompaniment does not include such actions as: referring to previous rhythms played by the performer; suggesting rhythms that the other performer (lead) should play/follow; playing rhythms that violate the one bar restriction (i.e., when the lead complies with the one bar restriction); changing the base rhythm pattern that the lead plays; or ignoring the lead percussionist and playing to the melody.

Apart from the constraints imposed by compliant accompaniment, there are other constraints on the secondary percussionist that arise from the physical nature of the drum playing. Two notable constraints are drum size and the fact that the secondary percussionist uses one hand (whereas the lead uses two hands). The small size of the drum limits the variety of strokes (compared to the mridangam) that can be played on the drum. Using a single hand to strike the drum imposes certain physical restrictions and demands (e.g., when it is necessary to produce a loud and intense effect that would be easier to do with two hands).

Chapter 7 Possible approaches

This chapter describes two seemingly reasonable approaches, Direct Mapping and Horizontal Continuity, and shows why they will not effectively solve the central research problem. The Direct Mapping model generates variant accompaniment, but with reduced musical coherence; the Horizontal Continuity model generates musically coherent accompaniment, but with reduced variability.

During the course of this thesis, two preliminary models were developed to solve the research problem of developing systems that play multiple valid accompaniments, given the same lead input. They were the Direct Mapping model and the Horizontal Continuity model. Reviewing these models, and analyzing why they failed, provides deeper insights into the nature of the research challenge.
The development of these models was based on approaches used in the related work, as well as on personal research about (and analysis of) the task of secondary percussion playing.

The Direct Mapping model was based on the functional scaffolding approach adopted in the NEAT drummer (Hoover, Szerlip, and Stanley, 2011; Hoover, Rosario, and Stanley, 2008). The lead's input rhythms were used as a functional scaffold to generate the secondary accompaniment. Musically, it is valid to generate the secondary accompaniment from the lead input, as the secondary (system) follows the lead and makes decisions about the accompaniment based on the lead's playing.

The Horizontal Continuity model was motivated by the Ambidrum system (Gifford and Brown, 2006) and a prediction-retrieval system (Cabral, Briot, and Pachet, 2006). Both these systems highlight an important musical principle: maintaining temporal continuity and coherence between different bars of music. Ambidrum transforms the musical parameters of the previous beat to values for the current beat. The prediction-retrieval system retrieves continuation sequences of accompaniment and selects one based on its coherence and continuity with the accompaniment played earlier.

Both these models were limited in their ability to generate multiple accompaniments that are musically valid. The rest of the chapter describes these models in greater detail and summarizes the reasons for their shortcomings.

7.1 The Direct Mapping model

[Figure 7.1: Direct Mapping]

In this model, each hit of the lead is mapped to a corresponding hit of the secondary to generate the accompaniment. The model generates variations by substituting a list of choices at certain hits, pauses, or double hits of the lead. The least-square method, using note duration, was used to score and select one accompaniment from the variations.
The main problem with this approach is that since each accompaniment is selected independently of the previous segment, the accompaniment played in each segment does not establish any musical coherence or continuity with the previous segment. This results in incoherent or random accompaniment behavior across the different musical bars. One way to solve this problem was to introduce the constraint of musical continuity or coherence. Musical coherence establishes a relation between the current musical action of the system and the past actions of the lead or the system.

7.2 The Horizontal Continuity model

This model imposes continuity constraints on adjacent bars of music. The accompaniment played in a succeeding bar is considered a musical continuation of the accompaniment played in the preceding segment. The musical changes between the bars are used to measure the extent of continuity between the different bars.

[Figure 7.2: Horizontal Continuity: secondary follows the lead changes. Lead bars L1 to L4 are linked by the changes τ0, τ1, τ2; accompaniment bars A1 to A4 are linked by the same changes τ0, τ1, τ2.]

In this model, the secondary system identifies the changes made by the lead between adjacent bars, and then generates accompaniment that follows similar changes. The note duration of the hits in the different bars is used to identify the change, and the hits themselves are substituted with accompaniment choices that follow the change. The generated choices are all equally valid.

This model provides a musical basis for generating choices. However, the generated choices sound too similar to the lead. One reason is that the secondary system copies too many parameters from the lead (i.e., loudness and changes in note duration). As a result, the generated choices can only vary in one dimension (diction), which is not sufficient to generate accompaniment that is musically interesting or sounds very different from the lead.
Chapter 8 The tension model

This chapter describes the tension model that was developed to address the shortcomings of the previous models. Applied to the task of secondary accompaniment playing in a Carnatic improvisational ensemble, the tension model is used as a constraint satisfaction mechanism to generate multiple accompaniments given the same lead.

The intuitive notion of tension used here is analogous to that of musical pitch. Like musical pitch, tension increases or decreases between different beats; however, unlike pitch, tension does not have a meaning as an absolute value (like C, C#, etc). The notion of tension becomes meaningful only when compared across two hits. Figure 8.1 shows a visualization of tension for a particular set of secondary hits. The upward line in the diagram denotes increasing tension, the downward line denotes decreasing tension, and a flat line denotes no tension change.

[Figure 8.1: Tension-relaxation visualization. Axis labels: Tension; Diction (tum, tum, ta, tum).]

8.1 Tension model applied to secondary playing

Applied to the situation of secondary accompaniment playing in a Carnatic improvisational ensemble, the idea of tension equates to the notion of sowkhyam of accompaniment in a performance. The term sowkhyam roughly translates to musical consonance. In this work, tension is used to numerically represent the sowkhyam of secondary accompaniment playing with respect to a lead percussion sequence.

In this thesis, tension has an inverse relationship with sowkhyam. As the numerical value of tension between lead and secondary increases, the sowkhyam (consonance) between them decreases. As the numerical value of tension between lead and secondary decreases, the sowkhyam (consonance) of the accompaniment increases. Let us briefly look at how the tension model is used to generate multiple secondary accompaniments.
8.2 Tension model applied to generate multiple accompaniments

[Figure 8.2: Tension between lead and secondary. Diagram labels: Restricted sowkhyam; Musical correspondence (intact, change); Tension 1 and Tension 2; Tension calculation (diction, note duration, loudness).]

The tension model is used as a constraint satisfaction strategy for generating multiple accompaniments given the same lead. Figure 8.2 illustrates the hierarchy of constraints under which the tension model can be used to generate multiple valid accompaniment sequences. At the lowest level, the music is specified as diction, note duration and loudness. The highest level is described by musical descriptions of accompaniment playing style (compliant, interactive or proactive). Tension forms the intermediate level that connects the different layers of musical descriptions. The diagram illustrates how the system uses the tension model to generate multiple accompaniments that satisfy a musical constraint (accompaniment playing style).

The following chapters describe this process in greater detail. Appendix C shows the perceptual scores, assigned to diction, loudness, and note duration, which are used in tension calculation. Chapter 9 explains how the tension synthesis protocol is used to generate multiple valid accompaniments that satisfy certain constraints.

Chapter 9 Tension synthesis protocol

This chapter describes the main steps involved in synthesizing recordings with variant valid accompaniment. The process of synthesizing new recordings with variant secondary accompaniment involves first selecting a recording and then manually transcribing the lead and secondary percussion tracks. The system then converts the transcription into a numerical representation of tension for hits, beats, and bars.
Then, using the tension representation to derive constraints, it generates a new secondary accompaniment score, which is played with the original melody and the lead percussion track to form a new performance recording.

The main steps involved in synthesizing recordings with variant valid accompaniment are as follows:

1. Choose Carnatic performance recording. The process starts with an appropriate recording of a Carnatic musical performance consisting of melody (vocal and violin), lead percussion, and secondary percussion performed by humans.
2. Choose sixteen-bar sample of performance recording. Sixteen bars are selected in which lead and secondary perform simultaneously, and in which improvisations do not extend beyond a single bar.
3. Transcribe the sixteen-bar selection. The 16-bar selection of the lead percussionist and secondary percussionist tracks is manually transcribed.
4. Compute tension scores for each hit. The system quantifies data from the transcription by converting them into numerical tension scores for each hit in the transcription.
5. Compute tension scores for each beat. Based on hit tension scores, the system calculates tension scores for each beat.
6. Compute tension range for each bar. Based on the average of tension ranges for each beat in a bar, an average tension range is computed for that bar.
7. Generate all viable accompaniment sequences. The system generates a table of all the accompaniment sequences that meet the tension constraints for a bar (with one table created for each bar).
8. Construct secondary transcription for entire piece. Multi-bar sequences are constructed by making use of the repetition structure to sequence randomly selected, valid bars of accompaniment.
9. Synthesize performance. A synthesized performance recording is then generated by creating an algorithmically synthesized secondary track from the transcription and combining it with the lead and melody tracks.
Note that steps 1-3 are manual (done by the researcher) and steps 4-9 are algorithmic (automatically done by the system). The sections below provide detail for each of these steps.

9.1 Choose Carnatic performance recording

The process starts with an appropriate recording of a Carnatic musical performance consisting of melody (vocal and violin), lead percussion, and secondary percussion performed by humans.

The selection of a particular recording is based on whether the original secondary accompaniment meets the requirements for a) discretionary playing, and b) the "one-bar improvisation" rule. Discretionary playing refers to secondary accompaniment which is not constrained by the exact rhythmic changes played by the lead. The "one-bar improvisation" rule means that any discretionary secondary playing that starts in a bar ends within the same bar.

One way to identify discretionary secondary playing is by the fact that, during discretionary playing, the accents of the secondary are not aligned with those of the lead. If the accents align with the lead's playing at the beginning of the next bar, then the discretionary playing ended with the previous bar. If the accents continue to misalign with the lead's playing even after the next bar begins, then the discretionary playing continues in the next bar also.

Continuous segments of secondary playing that comply with the "one-bar improvisation" rule were observed more often in recordings of songs with tempos in the range of 80-120 bpm (beats per minute). However, the system developed as part of this thesis is not designed to handle tempos greater than 90 bpm. So the rest of the protocol deals only with songs that have tempos in the range of 80-90 bpm.

9.2 Choose a sixteen bar sample of performance recording

Sixteen bars are selected in which lead and secondary perform simultaneously, and in which discretionary improvisations do not extend beyond a single bar.
The sixteen bars are selected from the repetitions of the first line of the anupallavi (second section of the song). Typically, the secondary starts playing only from the anupallavi of the song. Hence, sections of performance that occur from the second section onwards were considered for analysis. The sixteen bars are selected such that they contain no secondary bars with improvisations that extend beyond a single bar.

9.3 Transcribe the sixteen-bar selection

The 16-bar selection of the lead percussionist and secondary percussionist tracks is manually transcribed. The transcription is a musical-score-like representation of the lead and the secondary hits. Each of the lead and secondary hits is given a symbol and aligned with the beat in which it is played. The transcription captures three pieces of information about each hit of the lead and secondary playing: diction, note duration, and loudness.

• Diction of a hit refers to the type of stroke played on the lead and secondary drums (e.g., ta, tum, num, dhin, etc).
• Note duration of a hit specifies the duration (in beats) between the hit's onset and offset. There are two note durations used in this system (1 beat or 0.5 beats).
• Loudness of a hit refers to the perceived loudness level (as heard by the researcher) on a scale from 0.0 (pause/rest) to 1.0 (very high).

The transcription notation used here mostly resembles the standard notation used in Carnatic scoring for percussion, so it should be straightforward to interpret for someone who is familiar with the standard notation. However, the research reported here required three modifications to standard notation:

1. for double hits (two hits played in the duration of the same beat)
2. for loudness
3. for the rhythmic repetition of bars

The sections below provide details about each of these. For details about how the transcription was stored on the computer, see Appendix section D.
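As an illustration of the three pieces of information captured per hit, the following is a minimal sketch of one possible in-memory representation. It is not the storage format of Appendix D; the class name `Hit`, its field names, and the validation rules are assumptions made here for illustration, derived only from the value ranges stated above.

```python
from dataclasses import dataclass

# Hypothetical representation for illustration; the thesis's own storage
# format is described in its Appendix D and may differ.
ALLOWED_DURATIONS = (1.0, 0.5)  # the two note durations (in beats) named in the text


@dataclass(frozen=True)
class Hit:
    diction: str     # stroke name, e.g. "ta", "tum", "num", "dhin"
    duration: float  # note duration in beats
    loudness: float  # perceived loudness, 0.0 (pause/rest) to 1.0 (very high)

    def __post_init__(self):
        # Reject values outside the ranges stated in the transcription rules.
        if self.duration not in ALLOWED_DURATIONS:
            raise ValueError(f"unsupported note duration: {self.duration}")
        if not 0.0 <= self.loudness <= 1.0:
            raise ValueError(f"loudness out of range: {self.loudness}")


# One beat holding a single full-length hit, and one beat holding a double hit
# (two half-beat hits played within the same beat).
single = [Hit("tum", 1.0, 0.5)]
double = [Hit("num", 0.5, 0.5), Hit("thi", 0.5, 0.25)]
```

Keeping each beat as a short list of hits makes a double hit simply a two-element list, which is one convenient way to mirror the bracket notation introduced for double hits.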
9.3.1 Transcribing double hits

The traditional notation for double hits (i.e., underlines or overlines) is not suitable for computer-based transcription and processing, so brackets were used for double hits instead. Hits in brackets are to be interpreted as double hits played on the same beat. For example, (num thi) is a double hit played on a single beat (say, beat 4) and (num thi) (thi ri) are double hits played on two beats (say, beats 4 and 5).

Some lead and secondary hits used in the transcription need additional clarification for correct interpretation. In the case of the lead strokes, these are hits played together using both hands; in the case of the secondary strokes, these are hits played with pitch bend. There are three cases where two hits (or a hit and an action) are combined to form a new hit:

• The lead hit "dheem" is played as "dhin+thom".
• The lead hit "dham" is played as "nam+thom".
• The secondary hit "tumki" is played as "tum+pitch bend".

Note that although people who know the genre would know that such diction exists, and could also recognize it when they hear it, they might spell or understand it differently when interpreting a textual transcription. Thus, explicitly indicating that, e.g., a "dheem" and a "tumki" should be broken into constituent basic strokes is helpful and avoids any confusion in interpretation.

9.3.2 Transcribing hit loudness

The standard scoring notation does not have a separate loudness representation, but loudness is essential to the synthesis algorithm, so a notation for it was introduced into the transcription.

9.3.3 Transcribing rhythmic repetition of bars

The entire performance is manually notated in terms of rhythmically unique accompaniment sequences and their repetitions. Rhythmic repetition improves validity by musically connecting the different bars through repetition of rhythm patterns.
Repeating rhythms at regular intervals (of bars) establishes a continuity between the different bars of playing that makes the music sound more coherent, meaningful, and valid compared to only randomly selecting sequences that meet the tension constraints. Thus, the rhythmic repetition structure of the original performance was used in the algorithmic synthesis to improve the sowkhyam of the algorithmically generated accompaniment.

The notation of rhythmic repetition structure represents the performance of the secondary percussionist in terms of which accompaniment sequences are rhythmically unique and which accompaniment sequences are rhythmic repetitions (of previously played sequences). For each secondary performance, the rhythmic repetition structure is manually notated. The process followed to notate rhythmic repetition structure is:

1. Classify every accompaniment sequence played in the piece as either rhythmically unique or repetitive. An accompaniment sequence is rhythmically unique when its transcription is not rhythmically identical to the transcription of any accompaniment sequence played earlier in time. An accompaniment sequence is rhythmically repetitive if its transcription is rhythmically identical to the transcription of another accompaniment sequence played earlier in time.
2. Label the first occurrence of each rhythmically unique accompaniment sequence (e.g., 1, 2, 3, etc).
3. Label subsequent occurrences of previously labeled rhythmic accompaniment sequences with the same label.
4. Using the labels, construct the accompaniment structure.

To make this more concrete, consider the 8-bar example in Table 9.1.

Bar                    B1  B2  B3  B4  B5  B6  B7  B8
Repetition structure   1   2   1   2   3   2   1   2

Table 9.1: Rhythmic repetition of bars

It has three rhythmically unique accompaniment sequences: 1, 2, and 3. Accompaniment 1 is played first in bar 1 and then repeated in bars 3 and 7. Accompaniment 2 is first played in bar 2 and repeated in bars 4, 6, and 8.
Accompaniment 3 is first (and only) played in bar 5. The use of the rhythmic repetition structure in algorithmic synthesis will be further elaborated below in Section 9.8.

9.4 Compute tension scores for each hit

The system quantifies data from the transcription by converting them into numerical tension scores for each hit in the transcription. The numerical calculation of a tension score from the triplet of (diction, note duration, loudness) forms the first step in the tension-based analysis of a sequence of percussive hits.

The hit tension score (hts) is separately obtained for each lead and secondary hit (h) in the transcription. It is calculated by multiplying the numerical weights associated with the diction, note duration, and loudness of each lead/secondary hit.

Algorithm 1 Hit tension score calculation

$$\mathrm{hts}(h) = \mathrm{weight}_{diction}(h) \cdot \mathrm{weight}_{noteduration}(h) \cdot \mathrm{weight}_{loudness}(h)$$

To make this more concrete, consider the 4-hit example in Table 9.2.

Hits                      H1    H2    H3    H4
Repetition structure      1     2     1     2
Diction                   tum   ta    tum   ta
Note duration             1.0   1.0   1.0   1.0
Loudness                  0.5   0.25  0.25  0.25
HIT TENSION SCORE (hts)   2.5   1.5   2.0   1.5

Table 9.2: Tension scores for each hit

For the details about quantification of hit diction, hit note duration, and hit loudness, see the Appendix, section C.

9.5 Compute tension scores for each beat

Based on hit tension scores, the system calculates tension scores for each beat. A single beat was considered as the smallest unit of tension comparison. In a performance, the lead or the accompaniment sequence is usually beat synchronized, and different beats in the sequence can have a different number of hits played in them. Having a different number of hits on different beats is a problem when it comes to comparing tension scores across beats.
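The product in Algorithm 1 (Section 9.4) can be sketched as a lookup-and-multiply function. The sketch below assumes the weights are available as plain dictionaries; the function name `hit_tension` and, in particular, every weight value shown are placeholders chosen for illustration. They are not the perceptual scores of Appendix C and do not reproduce the values of Table 9.2.

```python
def hit_tension(diction, duration, loudness, w_diction, w_duration, w_loudness):
    """Algorithm 1: multiply the weights looked up for one hit's triplet."""
    return w_diction[diction] * w_duration[duration] * w_loudness[loudness]


# PLACEHOLDER weights for illustration only; the real perceptual scores are
# given in Appendix C of the thesis and are not reproduced here.
W_DICTION = {"ta": 1.0, "tum": 2.0}
W_DURATION = {1.0: 1.0, 0.5: 0.5}
W_LOUDNESS = {0.25: 1.0, 0.5: 1.5, 1.0: 2.0}

# Tension score of a "tum" hit lasting one beat at loudness 0.5,
# under the placeholder weights above.
score = hit_tension("tum", 1.0, 0.5, W_DICTION, W_DURATION, W_LOUDNESS)
```

Passing the weight tables as arguments keeps the function independent of any particular set of perceptual scores, so the Appendix C values could be substituted without changing the code.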
In order to keep the model simple and clear, it was assumed that 1) one beat is the smallest unit of numerical tension calculation and comparison, and 2) the tension scores of multiple hits played in the span of one beat are added to obtain a single value associated with that beat (the beat tension score).

Beat tension scores are calculated by adding the tension scores of all hits (h) that occur on a whole beat (b) and on the immediately succeeding half-beat (b+0.5 beats). The beat tension score is calculated as per the formula:

Algorithm 2 Beat tension score calculation

$$\mathrm{bts}(b) = \mathrm{hts}(h_{b}) + \mathrm{hts}(h_{b+0.5})$$

To make this more concrete, consider the 2-beat example in Table 9.3.

Beats                      B1    B2
H1                         2.5   0.9
H2                         1.5   2.0
H3                         2.0   2.0
H4                         1.5   2.0
BEAT TENSION SCORE (bts)   7.5   6.9

Table 9.3: Tension scores for each beat

Beat tension scores are obtained separately for each beat of the lead and for each beat of the secondary. A four-beat lead and secondary sequence, therefore, would have eight beat tension scores: one for each of the four beats of the lead and one for each of the four beats of the secondary.

9.6 Compute tension range for each bar

Based on the average of tension ranges for each beat in a bar, an average tension range is computed for that bar. The average tension range for a bar is the constraint that determines which secondary sequences are acceptable and which are not, based on their tension scores. As per the constraint, secondary sequences whose average tension score lies within the tension range are considered valid accompaniment, and secondary sequences whose average tension score lies outside the tension range are considered invalid accompaniment.

The rest of the section explains the tension range calculation using an example problem. The procedure for calculating the average tension range (ATR) for a bar is:

1. Normalize the beat scores.
2. Identify the tension range for each beat in the bar.
3. For each beat, assign the lower of the two beat tension scores to the secondary and the higher to the lead, swapping the scores where necessary.
4. Calculate the tension zero point (TZP_bar) as the arithmetic mean (AM) of the swapped lead tension scores.
5. Calculate the tension range (bottom) as the arithmetic mean (AM) of the swapped secondary tension scores.
6. Calculate the tension range (top) as 2*TZP_bar - tension range (bottom).
7. Assign tension range (top) and tension range (bottom) as the average tension range for the bar.

The example in Table 9.4 is used to explain the calculation of the tension ranges for four beats of lead and secondary hits.

• The tension scores for the lead are [0.9 2.0 2.0 4.0] and the tension scores for the secondary are [1.5 2.5 1.5 2.5].
• The highest tension score for the lead is 4.0, and the lowest tension score for the secondary is 1.5.

            Beat1   Beat2   Beat3   Beat4
Lead        0.9     2.0     2.0     4.0
Secondary   1.5     2.5     1.5     2.5

Table 9.4: Computing TZP and tension range for a bar

The lowest beat tension scores are assigned to the secondary and the highest beat tension scores to the lead. The idea here is to use the lead's tension score as a reference for minimum tension. The secondary's tension score should be as close to the lead's tension score as possible, either below it or above it (the closer the secondary is to the lead, the greater the sowkhyam). Therefore, whether the lead score is greater than the secondary score (or vice versa) is less important than the actual distance between the scores. For this calculation, all the beats are checked to ensure that the lower of the two tension scores is associated with the secondary; if not, the scores are swapped.

            Beat1   Beat2   Beat3   Beat4   AVG (AM)
Lead        1.5     2.5     2.0     4.0     2.5
Secondary   0.9     2.0     1.5     2.5     1.725

Table 9.5: Computing TZP and tension range for a bar

Table 9.5 shows the result of swapping values so that the lead scores are greater than the secondary scores.
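The beat-level sum and steps 3 to 7 of the procedure above can be sketched as follows, using the scores of Tables 9.3 and 9.4 as input. This is an illustrative sketch under stated assumptions rather than the system's implementation: the function names are invented here, steps 1 and 2 (normalization and per-beat ranges) are omitted because the text does not define them further, and treating the range bounds as inclusive is an assumption.

```python
def beat_tension(hit_scores):
    """Beat tension score: sum of the tension scores of the hits in one beat."""
    return sum(hit_scores)


def tension_range(lead_beats, secondary_beats):
    """Return (bottom, tzp, top) for one bar from per-beat tension scores."""
    pairs = list(zip(lead_beats, secondary_beats))
    # Step 3: per beat, the higher score goes to the lead, the lower to the secondary.
    swapped_lead = [max(lead, sec) for lead, sec in pairs]
    swapped_secondary = [min(lead, sec) for lead, sec in pairs]
    # Step 4: tension zero point = arithmetic mean of the swapped lead scores.
    tzp = sum(swapped_lead) / len(swapped_lead)
    # Step 5: bottom of the range = arithmetic mean of the swapped secondary scores.
    bottom = sum(swapped_secondary) / len(swapped_secondary)
    # Step 6: top of the range mirrors the bottom around the zero point.
    top = 2 * tzp - bottom
    return bottom, tzp, top


def is_valid(avg_tension, bottom, top):
    """Validity test for a bar; inclusive bounds are an assumption of this sketch."""
    return bottom <= avg_tension <= top


# Lead and secondary beat tension scores from Table 9.4.
bottom, tzp, top = tension_range([0.9, 2.0, 2.0, 4.0], [1.5, 2.5, 1.5, 2.5])
```

Using `max` and `min` per beat is equivalent to the conditional swap described in the text, since only the ordering of the two scores within a beat matters for the averages.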
The resulting average tension scores after the swap are:

• average tension score (bar) (ts_bar) for the lead: 2.5
• average tension score (bar) (ts_bar) for the secondary: 1.725

These values are then used to complete the tension range calculations.

                               Bar
Tension Range (top)            3.275
Tension Zero Point (TZP_bar)   2.5
Tension Range (bottom)         1.725
Tension span                   1.55

Table 9.6: Computing TZP and tension range for a bar

Table 9.6 shows the different terms associated with the tension range and the results of the calculations. The different terms and their calculations are explained below:

• Tension Zero Point (TZP_bar): the arithmetic mean (AM) of the swapped lead tension scores.
  $$\mathrm{TZP}_{bar} = \mathrm{AM}(\text{swapped lead tension scores})$$
• Tension Range (bottom): the lower value of the tension range, calculated as the arithmetic mean of the swapped secondary tension scores.
  $$\text{Tension range (bottom)} = \mathrm{AM}(\text{swapped secondary tension scores})$$
• Half tension span: the difference between the TZP and the tension range (bottom).
  $$\mathrm{HTS}_{bar} = \mathrm{TZP}_{bar} - \text{Tension range (bottom)}$$
• Tension span: twice the half tension span.
  $$\mathrm{TS}_{bar} = 2 \cdot \mathrm{HTS}_{bar}$$
• Tension range (top): the upper value of the tension range, calculated as
  $$\text{Tension range (top)} = 2 \cdot \mathrm{AM}(\text{swapped lead tension scores}) - \mathrm{AM}(\text{swapped secondary tension scores})$$
  or, equivalently,
  $$\text{Tension range (top)} = \mathrm{TZP}_{bar} + \mathrm{HTS}_{bar}$$
  In Table 9.6, this is represented as a new row created above the tension zero point (TZP_bar).

Using the calculations described above, the values assigned in the example are:

• The average tension zero point of the bar (TZP_bar) is 2.5. This means that the system would rate a bar of secondary accompaniment with an average tension score (ts_bar) of 2.5 as the most valid (i.e., zero tension between lead and secondary).
• The lower and upper values for the average tension range (ATR) are 1.725 and 3.275, respectively.
This means that a bar of secondary accompaniment with an average tension score anywhere between those values would be treated as valid accompaniment for the lead in that bar, and a bar of secondary accompaniment with an average tension score outside of that range would be treated as invalid accompaniment for the lead in that bar.

9.7 Generate all viable accompaniment sequences

The system generates a table of all accompaniment sequences that meet the tension constraints for a bar (with one table created for each bar). To do this, the system first generates all possible combinations for each beat, eliminates the nonviable diction subsequences, and selects the required accompaniment from the viable accompaniment subsequences. In summary, the steps are as follows:

1. Enumerate all unique triplet values for each beat. For each beat of a bar, create an exhaustive listing of all possible triplet combinations.
2. Collect all viable 8-beat (1-bar) sequences. For each beat, retain only triplets which can form viable diction sequences of one bar (8 beats), then store all viable 8-beat (1-bar) sequences.
3. Collect secondary sequences that meet tension constraints. For each bar, identify and store secondary sequences that meet tension constraints.

Each of these steps is described in more detail below.

9.7.1 Enumerate all unique triplet values for each beat

For each beat of a bar, create an exhaustive listing of all possible triplet combinations.
Beat 1           Beat 2
(ta,1.0,0.5)     (ta,1.0,0.5)
(ta,1.0,0.5)     (ta,1.0,1.0)
(ta,1.0,0.5)     (tum,1.0,0.5)
(ta,1.0,0.5)     (tum,1.0,1.0)
(ta,1.0,1.0)     (ta,1.0,0.5)
(ta,1.0,1.0)     (ta,1.0,1.0)
(ta,1.0,1.0)     (tum,1.0,0.5)
(ta,1.0,1.0)     (tum,1.0,1.0)
(tum,1.0,0.5)    (ta,1.0,0.5)
(tum,1.0,0.5)    (ta,1.0,1.0)
(tum,1.0,0.5)    (tum,1.0,0.5)
(tum,1.0,0.5)    (tum,1.0,1.0)
(tum,1.0,1.0)    (ta,1.0,0.5)
(tum,1.0,1.0)    (ta,1.0,1.0)
(tum,1.0,1.0)    (tum,1.0,0.5)
(tum,1.0,1.0)    (tum,1.0,1.0)

Table 9.7: Lookup table for 2-beats

Table 9.7 illustrates the enumeration process with an example using two strokes (ta and tum), a single hit duration of 1.0, and two loudness values (0.5 and 1.0).

9.7.2 Collect all viable 8-beat (1-bar) sequences

For each beat, retain only triplets which can form viable diction sequences of one bar (8 beats). Then store all viable 8-beat (1-bar) sequences.

In order to understand the process of computing musically valid accompaniment subsequences, consider the examples in Tables 9.8 and 9.9.

      Beat 1   Beat 2
A1    ta       te
A2    ta       tum

Table 9.8: Possible 2-beat diction combinations

      Beat 2   Beat 3
B1    tum      ta
B2    ta       tum

Table 9.9: Possible 2-beat diction combinations

The choices generated for beats 1 and 2 are ta-te (A1) and ta-tum (A2), and the choices generated for beats 2 and 3 are tum-ta (B1) and ta-tum (B2). Out of these, only the combination of A2 and B1 can form a viable subsequence of 3 hits, as shown in Table 9.10.

Beat 1   Beat 2   Beat 3
ta       tum      ta

Table 9.10: Valid 3-beat diction combination

Note that the transition between bars is not considered, since bars are selected independently of each other when constructing a multi-bar sequence.

Finally, all viable 8-beat (1-bar) sequences are stored in a temporary lookup table (i.e., one temporary lookup table for each bar).
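The enumeration of Table 9.7 and the chaining example of Tables 9.8 to 9.10 can be sketched in a few lines. The sketch assumes that two 2-beat combinations can be joined only when they agree on the beat they share, which is how the A2 and B1 example reads; how the 2-beat diction combinations themselves are judged viable is not specified in this section, so they are simply taken as given. The function names are hypothetical.

```python
from itertools import product


def enumerate_triplets(dictions, durations, loudnesses):
    """All (diction, note duration, loudness) triplets available on one beat."""
    return list(product(dictions, durations, loudnesses))


def chain_pairs(first_pairs, next_pairs):
    """Join 2-beat diction pairs that agree on their shared middle beat."""
    return [
        (a, b, d)
        for (a, b) in first_pairs
        for (c, d) in next_pairs
        if b == c
    ]


# Two strokes, one duration, two loudness values, as in Table 9.7.
triplets = enumerate_triplets(["ta", "tum"], [1.0], [0.5, 1.0])
two_beat_rows = list(product(triplets, repeat=2))  # one row per line of Table 9.7

pairs_12 = [("ta", "te"), ("ta", "tum")]   # A1, A2 in Table 9.8
pairs_23 = [("tum", "ta"), ("ta", "tum")]  # B1, B2 in Table 9.9
viable = chain_pairs(pairs_12, pairs_23)   # keeps only A2 followed by B1
```

Applying `chain_pairs` repeatedly, beat by beat, would extend the same idea to a full bar; exhaustive enumeration grows quickly with the number of strokes and loudness levels, so pruning nonviable pairs early keeps the lookup tables manageable.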
9.7.3 Collect secondary sequences that meet tension constraints

Using the tables with viable 8-beat secondary sequences as lookup tables, for each bar, the system creates a new table of all valid secondary sequences that meet the tension constraints.

For typical operation, the only constraint is that the average tension score (ts_bar) of the secondary's new bar of accompaniment is within the average tension range (ATR) of the corresponding bar in the original performance.

Thus, the procedure for creating tables (one for each bar) of viable 1-bar sequences that meet the tension range constraints of the original performance is as follows:

1. The average tension score (ts_bar) of each secondary bar sequence in the lookup table is compared with the average tension range of the corresponding bar in the original performance.
2. Each secondary bar sequence in the lookup table with a ts_bar value that falls within the original ATR is added to a new temporary lookup table containing all viable secondary accompaniment sequences (i.e., one table for each bar).

In order to make this concrete, consider a 2-bar example which shows the average tension ranges for each bar of the original performance.

                         Bar1    Bar2
Tension Range (top)      3.275   5.6
Tension Range (bottom)   1.725   1.5

Table 9.11: Two bars (average tension scores)

Table 9.11 illustrates that the average tension score (ts_bar) of any viable 8-beat secondary sequence must lie between the following values to meet the tension range constraints for each bar:

• Bar1: between 1.725 and 3.275
• Bar2: between 1.5 and 5.6

Thus, selecting any combination (of one sequence from Bar1 and one sequence from Bar2) from the columns shown in Table 9.12 would result in a valid 2-bar accompaniment sequence.
                             Bar1    Bar2
TENSION RANGE (top)          3.275   5.6
8-beat secondary sequence    3.26    5.59
8-beat secondary sequence    2.6     3.7
8-beat secondary sequence    1.73    1.51
TENSION RANGE (bottom)       1.725   1.5
Table 9.12: Two bars of valid sequences

Thus, the system would treat any of the nine possible 2-bar accompaniment sequences (based on the options above) as valid accompaniment for the original lead performance in those two bars.

In summary, using the process described above, the system selects valid accompaniment sequences for each bar and stores them in separate temporary lookup tables (one for each bar). The result is a set of tables, each of which contains 1-bar sequences that the system considers valid accompaniment for the lead performance in the corresponding bar.

Note that there are special situations where the secondary sequences need to meet some additional criteria. For example, the goal might be to generate accompaniment that has the least amount of tension possible, or, for a study, to generate accompaniment so that each bar of accompaniment falls a consistent distance outside the valid range. Generating such variations requires slightly different algorithms for selecting the desired accompaniment sequences. Such variations are discussed in Chapter 11, where the specific system modifications made for the study are described.

9.8 Construct secondary transcription for entire piece

Multi-bar sequences are constructed by making use of the repetition structure to position randomly selected, valid bars of accompaniment. For each rhythmically unique bar, one valid accompaniment sequence is randomly selected from the temporary lookup table for that bar. For every rhythmic repetition of the bar that follows, the same accompaniment sequence is assigned. To make this more concrete, consider the 8-bar example in Table 9.13.
Bar                       B1   B2   B3   B4   B5   B6   B7   B8
Repetition structure      1    2    1    2    3    2    1    2
Accompaniment sequence    A1   B1   A1   B1   C1   B1   A1   B1
Table 9.13: Rhythmic repetition of bars, with accompaniment

Imagine that the temporary lookup tables of valid accompaniment sequences for the rhythmically unique bars (1, 2, and 3) are [A1, A2, A3, ...], [B1, B2, B3, ...], and [C1, C2, C3, ...]. One accompaniment sequence is randomly chosen from each of A, B and C. If, for example, the chosen accompaniments are A1, B1 and C1, then using the rhythmic repetition structure shown in the second row of Table 9.13, the system creates the accompaniment transcription shown in the third row of Table 9.13: A1-B1-A1-B1-C1-B1-A1-B1. The transcription is constructed as follows:

1. Accompaniment A1 is randomly selected for the first uniquely labeled bar (bar 1).
2. Then, the same A1 is repeated for each repetition of that uniquely labeled bar (i.e., for bars 3 and 7).
3. After all repetitions of the first bar are assigned, the system starts from the beginning to see if there is a second uniquely labeled bar. If so, the same process is repeated for the second uniquely labeled bar (i.e., B1 is assigned first to bar 2 and then to the repetitions of bar 2: bars 4, 6, and 8).
4. This process continues until all the bars have been assigned accompaniment sequences.

This transcription is used to algorithmically synthesize the accompaniment track.

9.9 Synthesize performance

Finally, a synthesized performance recording is generated by combining:

• the melody track (taken from the original recording)
• the lead percussion track (taken from the original recording)
• the algorithmically synthesized secondary track

The synthesis from transcription is achieved by sequentially triggering the appropriate lead and secondary sounds at the specified loudness level for a particular duration at a given tempo.
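Returning to the bar-assignment procedure of Section 9.8, the following Python sketch shows one way it could be expressed. It is an illustration only: the function name assign_accompaniment and the candidate labels are hypothetical, and the sketch makes a single pass over the bars rather than restarting for each unique label, which yields the same kind of assignment (one random choice per unique bar, reused for every repetition).

```python
import random


def assign_accompaniment(repetition_structure, candidates, rng=random):
    """Pick one valid sequence per unique bar label and reuse it on repeats.

    repetition_structure: list of bar labels, e.g. [1, 2, 1, 2, 3, 2, 1, 2]
    candidates: dict mapping each label to its list of valid 1-bar sequences
    rng: any object with a .choice() method (the random module by default)
    """
    chosen = {}
    transcription = []
    for label in repetition_structure:
        if label not in chosen:
            # First occurrence of this rhythmically unique bar: random pick.
            chosen[label] = rng.choice(candidates[label])
        transcription.append(chosen[label])
    return transcription


structure = [1, 2, 1, 2, 3, 2, 1, 2]  # second row of Table 9.13
candidates = {
    1: ["A1", "A2", "A3"],
    2: ["B1", "B2", "B3"],
    3: ["C1", "C2", "C3"],
}

# A seeded generator makes the random choice reproducible.
result = assign_accompaniment(structure, candidates, random.Random(0))
print("-".join(result))
```

Whichever sequences are drawn, bars 1, 3 and 7 always share one accompaniment, bars 2, 4, 6 and 8 share another, and bar 5 has its own.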
Chapter 10 Tension synthesis: practical details

This chapter describes the different steps in the synthesis process in terms of the technologies used to implement them. The actual process of synthesis involves separating tracks from an original recording, storing transcriptions, sequencing audio from a transcript, and synthesizing a new recording. Two technological tools were used in the thesis: Audacity and Clojure. Audacity was used for separating the audio recordings. Clojure was used to store, sequence and synthesize recordings.

10.1 Separating tracks from original recording

Transcription was based on listening to percussion tracks separated from the original recording. To separate the recording into tracks, the performance recording is segregated into two parts: the melody and the percussion. This is done using the audio editing environment Audacity. The steps for separating the recordings are:

1. The performance recording (.wav) file is loaded into Audacity.
2. A copy of the entire performance is created and separated into the left and right tracks.
3. The right track is inverted using the “invert” function of Audacity. This removes a major portion of the melody from the recording.
4. Now, both tracks are selected and the noise profile is obtained. This gives the noise profile of the percussion hits.
5. The noise profile of the percussion is removed from the original recording to remove the percussion sounds from the recording.
6. The melody track thus obtained is stored as a .wav file and used in the re-synthesis.
7. The percussion audio that was separated in the third step is notated.

Note that this procedure does not produce a melody track of high (studio recording) quality. It produces music that is of acceptable quality for the evaluation of sowkhyam. This was personally verified by the reviewer, informally with pilot users, and with the participants during the study.
10.2 Storing the transcript

The system was coded in a language called Clojure. Clojure has an environment called Overtone that provides facilities for real-time musical sequencing. Musical transcriptions inside Clojure are stored as Clojure lists. The list containing the transcription is given a variable name, and the transcription is accessed using that name. Operations on transcription lists include sequencing them to produce musical sounds and breaking them into smaller sublists (8 beats) for analysis. Transcriptions of different performances are stored in separate Clojure (.clj) files. Clojure files are much like other program files (C, JavaScript) and contain both data and executable code.

10.3 Sequencing audio from a transcript

Overtone’s sequencer operates directly on Clojure lists. The sequencer reads every symbol in the list and performs the corresponding musical action. It reads the diction to retrieve the audio file associated with that diction. It reads the loudness value to trigger the audio file at the specified loudness. It reads the note duration to determine how long the audio should be played. Reading an entire list, the sequencer sequences audio samples at the specified intensity and duration to create a music track.

10.4 Creating a new recording

The Clojure/Overtone environment provides built-in audio recording functionality to record the musical output played by the sequencer. After the recorder is started, the sequencer synthesizes the lead and secondary transcriptions and plays them along with the original melody (.wav) file. The recorder stores the audio playback as a .wav file. These recordings are presented in the study.

Chapter 11 Study protocol

This chapter describes the study with musical experts that was conducted in order to evaluate the ability of the system to produce alternate valid secondary accompaniments for a Carnatic musical performance.
The study addressed three related research questions. Experts evaluated the sowkhyam of the accompaniment generated by the system. The research questions were:

• RQ1: Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment?
• RQ2: Are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants outside the range)?
• RQ3: Do the ratings for accompaniment decrease as a function of the distance from the tension zero point?

The methodology for answering these research questions was to generate variants that were qualitatively different from an evaluation standpoint. For this, six variants of 16-bar duration were created with different deviance values. These were presented to musical experts, who rated their accompaniment.

DISCLAIMER: certain calculation errors crept into the algorithm; thus, the variants generated were not as systematically controlled as intended. Unfortunately, this was only detected in the post-study analysis. Although the variants and results are not as good as they could be, they still provide enough insight to say something about the validity of the tension model. Below, after the presentation of the intended variations, the actual variants and results are presented. Claims are based on analysis of the actual results of experts evaluating the variants that were actually presented to them.

The evaluation study involved six male percussionists, each with at least eight years of experience playing lead/secondary instruments in Carnatic music performances. Six variations of three different performance recordings (i.e., 18 recordings) were presented to each participant.
In a blind evaluation test, the participants rated the synthesized performances based on the sowkhyam of the accompaniment. To prevent possible order effects, the study was structured as a balanced Latin square design. The key aspect of this experimental design is that each participant encounters the conditions in a different sequence, but the sequences are designed to ensure that every single condition follows every other condition once during the study.

11.1 Participants

The evaluation study involved six male percussionists with an average age of 22 years, each with at least eight years of experience playing lead/secondary instruments in Carnatic music performances. Participants were recruited using snowball sampling. The first three participants were people who played lead or secondary percussion with a friend of the author; the remaining three were other percussionists recommended by the first three. The author did not know any of the participants personally prior to the study sessions. Table 11.1 shows the participant number, age, and number of years of performance experience.

Participant   Age   Experience (years)
P1            19    9
P2            21    8
P3            26    9
P4            21    8
P5            22    10
P6            21    10
Table 11.1: Participant data

11.2 Materials

The materials used in the study included the documents needed for conducting each session, equipment to play the recordings, and audio files of the recordings.

11.2.1 Documents

Six different documents were used during the study sessions.

1. Researcher session script: outlines the sequence of steps that the researcher followed while conducting the study.
2. Demographic questionnaire: was used to collect information about the participant’s age, gender, years of relevant performance experience, relevant musical degrees and other relevant experience (e.g., teaching, composing).
3. Recording sequencing document: was used by the researcher to play the recordings in a unique sequence for each participant.
4. Participant observation form: was used by the researcher to take notes and comments from the participants during the study.
5. System evaluation form: was used by the researcher to record the ratings given by the participant to each variant.
6. Definition sheet (given to participants): contained definitions of three aspects of secondary accompaniment and explained the criteria for rating them based on sowkhyam.

All of the documents used in the study are given in Appendix F.

11.2.2 Equipment

The equipment used for the study was a Hewlett-Packard G62 laptop with the Windows 7 operating system (for playback of recordings) and JVC HAS360 headphones (for the participant to listen to the audio from the laptop).

11.2.3 Recordings (original)

Three different performance recordings were used in the study. For each performance recording, six variants that differed in the content of the secondary accompaniment were generated and used in the study: one synthesized version of the original accompaniment and five algorithmically generated variant versions of accompaniment. The recordings were:

• Recording 1 was a live performance of the song “Ragasudha Rasa” by the vocalist Shri Sandeep Narayan, violinist Shri Rajeev, mridangist Shri Rohit Prasad, and kanjirist Shri Anirudh Athreya. The performance took place at Shri Krishna Gana Sabha, Chennai on 10th August, 2013. The source of the recording was YouTube https://www.youtube.com/watch?v=seAJD_9ZrG0. In the recording, the selected song begins at 0:00 and ends at 3:53 minutes.
• Recording 2 was a live performance of the song “Vande Maataram” by the vocalist Shri Sanjay Subrahmanyan, violinist Shri S Varadarajan, mridangist Shri Mannargudi Easwaran and kanjirist Shri S Venkatramanan. The performance took place in Chennai on 2nd October, 2011. The venue of the performance is unknown. The source of the recording was YouTube https://www.youtube.com/watch?v=OVAfMaZGG5o. The entire recording contains only the one song.
The song begins at 0:00 and ends at 8:21 minutes.
• Recording 3 was a live performance of the song “Rama Ni Pai” by the vocalist Shri Ramakrishnan Murthy, violinist Shri Vittal Ramamurthy, mridangist Shri Manoj Siva and the kanjirist Shri Anirudh Athreya. The performance took place in Chennai, as part of the “Voices of Tomorrow 2012” concert festival. The source of the recording is YouTube https://www.youtube.com/watch?v=w57DOvP1Ks8. In the recording, the song begins at 0:00 and ends at 12:30 minutes.

All three recordings were set to Adi talam. For each of the recordings, sixteen-bar samples were drawn from the repetitions of the first line of the anupallavi of the song. For all the recordings, the second and third repetitions of the first line were selected.

• Selection 1 starts with the second repetition of the line “yaagayoga thyaga” at 2:31. The sixteen bars are selected between 2:31 and 2:53 minutes.
• Selection 2 starts with the second repetition of the line “aayiram undu ingu jaathi” at 1:13 minutes. The sixteen bars are selected between 1:13 and 1:34 minutes.
• Selection 3 starts with the second repetition of the line “taamarasadala nayana” at 4:16 minutes. The sixteen bars are selected between 4:16 and 4:37 minutes.

Each selected sixteen-bar sample was transcribed. For transcriptions of the three recording selections, see Appendix D.

11.2.4 Recordings (with new accompaniment)

Based on the transcriptions of the three 16-bar selections, the system was used to generate six different variants for each recording. In order to meet this requirement of the study, the original system was modified in two ways:

1. Resynthesize lead from transcription. In normal use, the system combines the original melody and lead percussion tracks with a synthesized secondary percussion track.
However, for the purposes of the study, the recordings were generated by resynthesizing the lead percussion (from the transcription) so that, during the evaluation sessions, test subjects would not be distracted by acoustic differences between the original lead percussion and the algorithmically synthesized secondary percussion. Unlike the lead percussion track, the melody was not transcribed and resynthesized for the study. This was because melodic reconstruction is a significant challenge by itself, and even if implemented, it seemed likely that the acoustic quality of the resynthesized melody (vocal and violin) could become a source of distraction for the participants. Hence, the melody was used from the original recording after removing the lead and the secondary tracks.

2. Control tension deviance. This was added so that it was possible to create recordings that combined the original melody and lead percussion (resynthesized) according to certain additional constraints. For the purposes of the study, the goal was to generate variant performances that were qualitatively different from an evaluation standpoint. To do this, the system was used to generate variant accompaniment that fell at (different) specified distances from the TZPbar in each bar of the original.

To understand why the standard algorithm is not sufficient, consider again the earlier 2-bar example.

                             Bar1    Bar2
TENSION RANGE (top)          3.275   5.6
8-beat secondary sequence    3.26    5.59
8-beat secondary sequence    2.6     3.7
8-beat secondary sequence    1.73    2.6
TENSION RANGE (bottom)       1.725   1.5
Table 11.2: Two bars of valid sequences

As noted earlier, under normal operation, selecting any combination of sequences from the Bar1 and Bar2 columns shown in Table 11.2 would result in a valid 2-bar accompaniment sequence. Using this system, it would only be possible to test whether variants considered valid by the system were rated the same as the original.
With a small adjustment, but without changing the algorithm in any fundamental way, it would also be possible to generate recordings with secondary accompaniment that fell outside the range. This would make it possible to test whether these were rated lower than those within the range. But it is also interesting to test whether the ratings for accompaniment changed as a function of the size of the tension difference between lead and secondary.

Generating variant performances that have a consistent tension distance between lead and secondary accompaniment, however, does require a fundamental change to the algorithm used. Specifically, there needs to be a way to specify a value associated with a recording, and for that value to reflect the tension distance between lead and secondary in a precise and meaningful way. Said another way, there should be a good way to quantify the distance of different recordings relative to some unifying frame.

This is a bit tricky because the tension ranges for each bar are different. As a result, it would not be meaningful to quantify the deviance of a recording by picking a single average tension score (tsbar) value that all secondary bars need to have. To see why this is problematic, consider the following two scenarios (in the context of the 2-bar example of valid sequences shown in Table 11.2):

• tsbar = 1.51: this would mean that there could not be a valid accompaniment sequence from Bar1
• tsbar = 2.6: this would mean that there are valid accompaniment sequences in both bars, but each of them is at a different distance from the TZP (and hence, from the average tension score of the lead) for that bar

One alternative is to specify the distance of the secondary accompaniment sequence’s average tension score (tsbar) from the lead’s average tension score (i.e., the TZP). In other words, variant performances will be specified by the amount of distance between lead (average tension score) and secondary (average tension score).
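The two scenarios can be checked numerically with a small Python sketch. It is only an illustration: the helper names are hypothetical, and the TZP of each bar is taken here as the midpoint of its tension range, an assumption that reproduces the example TZP values of 2.5 and 3.55 used in this chapter.

```python
# Tension ranges (bottom, top) of the 2-bar example in Table 11.2.
ranges = {"Bar1": (1.725, 3.275), "Bar2": (1.5, 5.6)}


def tzp(bounds):
    """Assumed tension zero point: the midpoint of the bar's range."""
    low, high = bounds
    return (low + high) / 2


def describe(ts_bar):
    """For each bar, report (is ts_bar inside the range?, offset from TZP)."""
    report = {}
    for bar, (low, high) in ranges.items():
        inside = low <= ts_bar <= high
        offset = round(ts_bar - tzp((low, high)), 3)
        report[bar] = (inside, offset)
    return report


print(describe(1.51))  # {'Bar1': (False, -0.99), 'Bar2': (True, -2.04)}
print(describe(2.6))   # {'Bar1': (True, 0.1), 'Bar2': (True, -0.95)}
```

A single score of 2.6 is valid in both bars, yet it sits slightly above the TZP of Bar1 and well below the TZP of Bar2, which is why one fixed tsbar cannot express a consistent lead-to-secondary distance.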
                             Bar1    Bar2
TENSION RANGE (top)          3.275   5.6
8-beat secondary sequence    3.26    5.59
8-beat secondary sequence    2.6     3.7
8-beat secondary sequence    1.73    2.6
TENSION RANGE (bottom)       1.725   1.5
TENSION ZERO POINT (TZP)     2.5     3.55
Tension half-span            0.775   2.05
Table 11.3: Two bars of valid sequences

Using the example values from Table 11.3, consider the following two scenarios:

• tsdistancebar = -0.5: this would mean that Bar1 would have a secondary accompaniment sequence with an average tension score of 2.0 (not shown in the table), and Bar2 would have a secondary accompaniment sequence with an average tension score of 3.05 (also not shown in the table)
• tsdistancebar = +1.0: this would mean that Bar1 would have a secondary accompaniment sequence with an average tension score of 3.5 (not shown in the table), and Bar2 would have a secondary accompaniment sequence with an average tension score of 4.55 (also not shown in the table). Note that the choice for Bar1 is outside the valid range (its top is 3.275)!

The example above illustrates a limitation of using absolute values for distance: there is a risk that, for the same distance, some bars fall outside the valid range while others remain inside it. An alternative approach, used in the study reported here, is to use values that are computed as ratios (e.g., as a proportion of the bar’s range).
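The difference between an absolute offset and an offset scaled by each bar's half-span can be sketched as follows in Python. This is an illustrative sketch, not the thesis code: the function names are hypothetical, the TZP and half-span values are the example values of Table 11.3, and a small tolerance is added so that scores lying exactly on a range boundary are not rejected by floating-point rounding.

```python
# Example TZP and half-span values for the two bars (Table 11.3).
bars = {
    "Bar1": {"tzp": 2.5, "half_span": 0.775},
    "Bar2": {"tzp": 3.55, "half_span": 2.05},
}


def target_absolute(bar, distance):
    """Target tsbar at a fixed distance from the bar's TZP."""
    return bar["tzp"] + distance


def target_proportional(bar, factor):
    """Target tsbar at `factor` times the bar's half-span from its TZP."""
    return bar["tzp"] + factor * bar["half_span"]


def in_range(bar, ts_bar, tol=1e-9):
    """True if ts_bar lies within TZP +/- half-span (with a small tolerance)."""
    return abs(ts_bar - bar["tzp"]) <= bar["half_span"] + tol


for name, bar in bars.items():
    absolute = target_absolute(bar, 1.0)
    proportional = target_proportional(bar, 0.5)
    print(name, "absolute +1.0:", round(absolute, 4), in_range(bar, absolute))
    print(name, "0.5*halfspan:", round(proportional, 4), in_range(bar, proportional))
# Bar1 absolute +1.0: 3.5 False
# Bar1 0.5*halfspan: 2.8875 True
# Bar2 absolute +1.0: 4.55 True
# Bar2 0.5*halfspan: 4.575 True
```

With a proportional offset, any factor of magnitude at most 1 stays inside the range of every bar and any factor of magnitude greater than 1 falls outside it, so a single number places all bars of a recording on the same side of the boundary.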
Using the same example values from Table 11.3, consider the following two scenarios, where half-span is half the tension range for a bar (i.e., the distance between the average tension score for the lead and the average tension score for the secondary):

• tsdistancebar = 0.5*halfspan: this would involve values that are halfway between the TZP and the top of the valid range
  Bar1 secondary tsbar: (+ 2.5 (* 0.5 .775)) = 2.8875
  Bar2 secondary tsbar: (+ 3.55 (* 0.5 2.05)) = 4.5750
• tsdistancebar = -1.5*halfspan: this would involve values that lie below the TZP by 1.5 times the distance between the TZP and the bottom of the valid range
  Bar1 secondary tsbar: (- 2.5 (* 1.5 .775)) = 1.3375
  Bar2 secondary tsbar: (- 3.55 (* 1.5 2.05)) = 0.475

For each recording, the intention was to create six variants which had the tension score distance values shown in Table 11.4.

Variant   Distance         Comment
V0        NA               Original tension distance (inside range)
V1        NA               Original tension distance (inside range)
V2        +0.3*halfspan    Moderate tension distance (inside range)
V3        -0.3*halfspan    Moderate tension deviation (inside range)
V4        +1.2*halfspan    High tension deviation (outside range)
V5        -1.2*halfspan    High tension deviation (outside range)
Table 11.4: Variants by distance value

In terms of tension, the variants for the study were as follows:

• V0/V1 (inside tension range): two recordings that had identical profiles to the original (i.e., bar by bar, tension was equivalent to the original)
• V2/V3 (inside tension range): two recordings in which the tension distance was about one third of the tension half-span, listed as 0.3*halfspan in Table 11.4 (i.e., the average tension score of the secondary in V2 was that distance above the lead, and the average tension score of the secondary in V3 was that distance below the lead)
• V4/V5 (outside tension range): two recordings in which the tension distance was 1.2 times the tension half-span (i.e., the average tension score of the secondary in V4 was 1.2 times the half-span above the lead, and the average tension score of the secondary in V5 was 1.2 times the half-span below the lead)

One brief comment about V0 and V1. In the original (V0), the tension span between lead and secondary varied quite a bit, bar by bar. The variant V1 was generated by the system in the normal way: it used the bar-by-bar tension range in the original recording to select, bar by bar, secondary accompaniment that fell within the tension range set by the original recording. However, V1 does not have a consistent, fixed relationship between lead tension and secondary tension (the way the other variants for the study do). And, of course, V0, as the reference performance, does not follow the model of consistently having the same tension range; it simply maintains the half-span tension relationship (by definition), bar by bar.

The different variants were stored as recording files for use in the study. During the generation of the recordings for the study, the loudness of the secondary track was increased to ensure that every secondary stroke played is clearly heard in the performance. The 18 audio files can be obtained from the study recordings archive.

11.3 Study Disclaimer

The 18 variants for the study were meant to be distributed equally across the categories given in Table 11.4: original, varying within the tension range of the original, above the original but within the tension range, below the original but within the tension range, above the original and outside the tension range, and below the original and outside the tension range. But a mistake in the system resulted in 18 variants that had different tension profiles. This was not discovered until the post-study analysis. The variations actually generated are summarized in the tables below (one for each recording).
Variant          R1 (Distance)
V4               9.750
Range (top)      6.875
V2               2.708
TZPREF           0.000
Range (bottom)   -1.250
V3               -2.708
V5               -9.750
V0               NA
V1               NA
Table 11.5: Distance of variants used for recording 1

Variant          R2 (Distance)
Range (top)      4.500
V4               3.000
V2               0.833
TZPREF           0.000
Range (bottom)   -0.25
V3               -0.833
V5               -3.000
V0               NA
V1               NA
Table 11.6: Distance of variants used for recording 2

Variant          R3 (Distance)
V4               6.600
Range (top)      4.500
V2               1.833
TZPREF           0.000
Range (bottom)   -1.000
V3               -1.833
V5               -6.600
V1               NA
V0               NA
Table 11.7: Distance of variants used for recording 3

The distance values in Tables 11.5, 11.6 and 11.7 indicate, in absolute numbers, the tsdistance for each variant. The tables also show where the tension range boundaries are, and which variants are inside (and which are outside) the tension range.

The mistake in the system was that the tension middle point was calculated as the average of the tension differences rather than as the arithmetic center of the tension scores. As a result, the six variants of each recording were not, as intended, symmetrically distributed within the tension range at the tsdistance values given in Table 11.4. The consequences of the asymmetric distribution are described here:

1. The asymmetric distribution of the variants for each recording makes the recordings incomparable with one another. Hence, the results of the listener evaluation were analyzed separately for each individual recording (R1, R2 and R3) and its six variants.
2. It is also not possible to answer RQ3 conclusively (Do the ratings for accompaniment decrease as a function of the distance from the tension zero point?). As the different variants were asymmetrically distributed, it was not possible to say how sowkhyam varies as a function of increasing or decreasing distance from a tension zero point.

Although several problems are raised by this bug, some claims still hold. The results can still be used to answer the questions RQ1 and RQ2. The reasons are explained below:

1. RQ1 (Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment?) was still valid. It is possible to argue from the results that the tension algorithm produced some variant accompaniments that were statistically indistinguishable from the original accompaniment. RQ1 asks: is it possible to find some accompaniment synthesized using the tension model that meets some musical standard of acceptability? The musical standard of acceptability was determined using the original recording as the control case. The answer to this research question does not assume that the sowkhyam of the accompaniment variants has to be ordered in any particular way, nor does it expect that some variants will be rated higher or lower than others. It is simply a comparison of the accompaniment variants with the original to find whether there is a statistically significant difference in their accompaniment ratings. So, even though the variants were asymmetrically distributed, the results can still be used to claim that the tension model produces synthesized accompaniment that is as valid as the original.

2. RQ2 (Are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants outside the range)?) Using the revised interpretation, which considers the lead as the tension zero point and the new range as twice the original range, it is still possible to argue that RQ2 is not affected. RQ2 asks: is it possible to find synthesized accompaniment (using the tension model) whose tsdistance from the TZP lies within the tension boundaries and whose sowkhyam ratings differ significantly from those of synthesized accompaniment whose tsdistance lies outside the boundaries (above the upper boundary or below the lower boundary)?
As in the case of RQ1, the answer to this research question does not assume that the sowkhyam of the accompaniment variants has to be ordered in any particular way. But it does expect that 1) some variants lie within the boundary and some variants lie outside the boundary, and 2) the variants within the boundary will be rated differently (higher or lower) from the variants outside the boundary. This result was observed for recordings R2 and R3 but not for R1. The reasons will be discussed later. So, even though the variants were asymmetrically distributed, the results can still be used to find the settings under which the tension model produces synthesized accompaniment with different degrees of sowkhyam. The analysis of the results is discussed further in Chapter 12.

11.4 Study Session Protocol

The following steps outline the protocol used to evaluate the validity of accompaniment variants.

1. Introduction. The participants were first informed of what was expected during the session: they would be required to rate secondary accompaniments played to different performance recordings.
2. Gather demographic information. The questions were read out to the participants and the responses were filled in.
3. Explain evaluation criteria. The participants were handed the definition sheet (with the definitions and rating criteria) and were informed about the definitions (timing, accents, and strokes) and rating criteria.
4. Play recordings. Participants were played each variant of each performance recording (according to the sequencing order generated for the particular participant).
5. For each variant/recording played:
   a. Participants were asked questions to ensure that they were not focusing on creativity or quality of accompaniment.
   b. Participants were asked to rate the timing, accents, strokes and the overall accompaniment using the definition sheet.
   c. Participants were again asked questions to check whether they were consistent in their evaluation.
The following sections provide more detail for each of these steps.

11.4.1 Gather demographic information

Demographic information about the participant’s age, gender, years of relevant performance experience, relevant musical degrees and other relevant experience (e.g., teaching, composing) was collected. Information from the last four categories in the list was used as credentials to determine the musical expertise of the participant. See Appendix F.2 for the demographic questionnaire used in the study.

11.4.2 Explain evaluation criteria

Participants were informed about the restricted notion of sowkhyam of overall accompaniment and were asked to focus on three specific aspects of it in order to evaluate the secondary accompaniment of a recording. The three aspects of secondary playing were:

1. Timing and synchronization: the alignment of secondary hits with the tala (beat positions) of the performance
2. Secondary accents: the sequence of strong and weak hits played in the rhythm pattern
3. Secondary strokes: the exact selection of secondary strokes played on the different (accented and non-accented) beats in a rhythm pattern

Participants were also informed about how to rate, on a scale of 1-5 (low to high), the sowkhyam of the overall accompaniment as well as the sowkhyam of each of the three aspects for each variant of a performance recording. They were also provided with a definition sheet to help remind them of the definitions and rating scale to be used. See Appendix F.6 for the participant definition sheet used during the study.

11.4.3 Sequencing the recordings

The recordings were presented to the participants in a counterbalanced order. There are six possible ways of sequencing three performance recordings, so with six participants each ordering can be used exactly once and order effects are balanced out. The sequence of presenting the performance recordings to participants is shown in Table 11.8.
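The six orderings of the three recordings can be enumerated mechanically. The Python sketch below is an illustration only (the variable names are hypothetical); it lists every permutation of R1, R2 and R3 and assigns one to each participant, which reproduces the assignment given in Table 11.8.

```python
from itertools import permutations

recordings = ["R1", "R2", "R3"]

# All 3! = 6 orderings, in lexicographic order of the input list.
orders = list(permutations(recordings))

# One ordering per participant, P1..P6.
participants = [f"P{i}" for i in range(1, len(orders) + 1)]
schedule = dict(zip(participants, orders))

for participant, order in schedule.items():
    print(participant, " - ".join(order))
# P1 R1 - R2 - R3
# P2 R1 - R3 - R2
# P3 R2 - R1 - R3
# P4 R2 - R3 - R1
# P5 R3 - R1 - R2
# P6 R3 - R2 - R1
```

Using every permutation exactly once means each recording appears in each of the three positions the same number of times (twice).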
Participant   Sequence
P1            R1 - R2 - R3
P2            R1 - R3 - R2
P3            R2 - R1 - R3
P4            R2 - R3 - R1
P5            R3 - R1 - R2
P6            R3 - R2 - R1
Table 11.8: Recording sequences for participants

There were six variants of each performance recording. For the six variants, a balanced Latin square was generated as follows:

1. The first row is generated as [0 1 n-1 2 n-2 3 ..]. Each subsequent row is obtained by incrementing every number of the previous row by 1, wrapping around within 0..n-1.
2. Using this method, the first row was [0 1 5 2 4 3].
3. In the next (second) row, every number is increased by 1: the first number becomes 1 and the number 5 wraps around to become 0. The second row was generated as [1 2 0 3 5 4].
4. Continuing this process, all six rows were generated to obtain a balanced Latin square.

The Latin square was replicated across the three recordings. The rows (participants) remain constant across the recordings, while the columns (content of the performances) change across recordings. Table 11.9 shows the sequence of variants presented to the participants; each entry is the position (1 to 6) at which the participant heard that variant.

Variant   P1  P2  P3  P4  P5  P6
R1-V0      1   3   5   6   4   2
R1-V1      2   1   3   5   6   4
R1-V2      4   2   1   3   5   6
R1-V3      6   4   2   1   3   5
R1-V4      5   6   4   2   1   3
R1-V5      3   5   6   4   2   1
R2-V0      2   1   3   5   6   4
R2-V1      4   2   1   3   5   6
R2-V2      6   4   2   1   3   5
R2-V3      5   6   4   2   1   3
R2-V4      3   5   6   4   2   1
R2-V5      1   3   5   6   4   2
R3-V0      4   2   1   3   5   6
R3-V1      6   4   2   1   3   5
R3-V2      5   6   4   2   1   3
R3-V3      3   5   6   4   2   1
R3-V4      1   3   5   6   4   2
R3-V5      2   1   3   5   6   4
Table 11.9: Variant sequences for participants

11.4.4 Evaluate recordings

Participants listened to each audio recording and rated it. They were asked a set of questions before and after their rating to ensure that they had focused on the restricted sowkhyam when rating the accompaniments. The following steps describe the process:

1. Participants listen to each variant of each recording.
2. After participants listen to a variant, they are first questioned to find out whether they focused on restricted sowkhyam or on other factors such as creativity or quality.
   a. Test for use of creativity:
      (a) Did you think the kanjira's creativity at any point in the performance was worth mentioning?
      (b) (If NO): Okay, let's go to the evaluation.
      (c) (If YES): Do you think you will use this creativity to rate any of the 3 aspects (timing, accent, strokes) or the overall accompaniment?
      (d) (If YES): The ratings should be assigned without any additional points given to this brilliance. The ratings should be given only based on the rating criteria given in the definition sheet.
      (e) I will replay the performance again. Please rate it without focusing on this additional brilliance criterion.
   b. Test for use of quality:
      (a) Do you think the nadam of the lead strokes sounded dull at any point? Do you think the nadam of the lead strokes sounded rich at any point? Do you think the secondary strokes sounded dull at any point? Do you think the secondary strokes sounded rich at any point? Do you think you used dullness or richness to rate any of the 3 aspects (timing, accent, strokes) and the overall accompaniment?
      (b) (If NO): Okay, let's go to the rating.
      (c) (If YES): Do you think you will use this quality to rate any of the 3 aspects (timing, accent, strokes) or the overall accompaniment?
      (d) (If YES): Those are not the main aspects of the study. I will replay the performance again.
      (e) Given the particular quality of the strokes used in the recording, and not focusing on rating the creativity, please rate the sowkhyam of the accompaniment.
3. Once it has been ensured that participants are not focusing on other things (e.g., quality or creativity), they are asked to rate the consonance of timing, accents, and strokes, and the accompaniment on the whole, using the evaluation criteria in the definition sheet.
4. After participants rate each variant, a set of questions is asked to ensure that they were consistent in using the evaluation criteria. Experts were asked two questions:
   a. Was the consonance of the three criteria sufficient to judge the acceptability of the accompaniment?
   b. If not, what additional aspects were needed to judge the acceptability?
   The additional aspects that they used in their rating were noted.

Appendix F.4 contains the evaluation questionnaire used during the study. Appendix F.5 contains the participant observation form that was used to note any descriptive comments given by the participants when they were asked questions.

11.5 Evaluation

Evaluation of the results involved statistical tests of significance between the sowkhyam ratings of the original accompaniment and the algorithmically generated accompaniment. In addition, qualitative comments by participants, as well as observations made by the researcher during study sessions, were used to provide additional insights.

Chapter 12 Study results

This chapter describes the main results from the user study and uses them to answer the research questions. The main results of the study are as follows:

• RQ1: Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment? Result: the tension algorithm does produce at least one variant accompaniment that received a rating equivalent to the original accompaniment (i.e., no statistically significant difference).
• RQ2: Are accompaniments inside the range better (i.e., do variants within the range get statistically significantly higher scores than variants outside the range)? Result: there is a statistically significant drop in accompaniment rating that falls beyond a certain tension distance of the original accompaniment for R2 and R3.
There is no statistically significant drop in accompaniment rating that falls beyond a certain tension distance of the original accompaniment for R1.
• RQ3: Do the ratings for accompaniment decrease as a function of the distance from the tension zero point? Result: there is some weak evidence that sowkhyam might decrease as a function of distance.

Before presenting the results for the main research questions, it is important to address whether the different recordings (R1, R2, R3) had any statistically significant impact on the evaluations. An ANOVA test was run on the overall accompaniment ratings for all variants of each recording to check whether the recordings were evaluated differently. Table 12.1 summarizes the results. Appendix E.1 presents the complete set of results of the recording ratings.

Recording   R1     R2     R3
Mean        3.33   2.89   3.02
Table 12.1: Average accompaniment rating per recording

Null hypothesis: The average overall accompaniment rating assigned for R1 = R2 = R3.
Outcome: p-value = 0.2130
Result: There was NO significant difference in the overall accompaniment rating between the recordings as determined by one-way ANOVA (0.213 > 0.05). That is, at the 5% significance level, the recordings did not differ significantly in their overall ratings.

The sections below provide detailed results for the three main research questions.

12.1 RQ1: does system produce acceptable accompaniment

The first research question was: does the system produce secondary accompaniment that is rated at least as high as the original accompaniment?

Answer: the tension model is able to produce at least one accompaniment that is rated equivalent to the original accompaniment.
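The one-way ANOVA used for these comparisons reduces to an F statistic: the ratio of the variance between group means to the variance within groups. The following Python sketch shows that computation under stated assumptions. The function name `one_way_anova_f` is hypothetical, and the ratings in `toy_ratings` are made-up placeholders, not the study data (the actual ratings are in Appendix E).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists, one list of ratings per group
    (for example, one list per recording or per variant).
    """
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder ratings, NOT the ratings collected in the study.
toy_ratings = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(toy_ratings)
```

The p-value is then the upper tail of the F distribution with (df_between, df_within) degrees of freedom. In practice a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value.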
The ANOVA test was run on the different variants of each performance recording to compare the original with the algorithmically generated accompaniment. The overall accompaniment ratings for each of the variants (V0 to V5) were analyzed to find whether there are significant differences in their ratings. Tables 12.2, 12.3, and 12.4 summarize the results of the experiment conducted for each recording. Appendix E.2 presents the complete set of results of the variant ratings.

12.1.1 Recording 1

Table 12.2 presents the accompaniment ratings of the variants of the first recording.

Variant   V0     V1     V2     V3     V4     V5
Average   3.67   3.83   3.00   3.67   3.00   2.83
Table 12.2: Average rating for variants of recording 1

Null hypothesis: There is no significant difference between the means of the overall accompaniment ratings of all variants (V0, V1, V2, V3, V4, and V5).
Outcome: p-value = 0.472
Result: There was no statistically significant difference between the means of the overall accompaniment rating assigned to the different variants as determined by one-way ANOVA (0.472 > 0.05). At the 5% significance level, the accompaniments V1, V2, V3, V4 and V5 are not significantly different from the original variant V0.

12.1.2 Recording 2

Table 12.3 presents the results for the variants of the second recording.

Variant   V0     V1     V2     V3     V4     V5
Average   3.33   3.67   2.50   3.17   2.50   2.17
Table 12.3: Average rating for variants of recording 2

Null hypothesis: There is no significant difference between the means of the overall accompaniment ratings of all variants (V0, V1, V2, V3, V4, and V5).
Outcome: p-value = 0.0586
Result: There was no statistically significant difference between the means of the overall accompaniment rating assigned to the different variants as determined by one-way ANOVA (0.0586 > 0.05).

The above results show that there was no significant difference in the rating of the different variants. However, the p-value is close to the 0.05 threshold, which suggests that some variants may nonetheless differ.
A post-hoc analysis of the data using the Bonferroni test was done to find which groups were actually significantly different at 95% confidence. However, the test did not find any significantly different accompaniment.

12.1.3 Recording 3

Table 12.4 presents the results for the variants of the third recording.

Variant   V0     V1     V2     V3     V4     V5
Average   3.50   4.00   2.83   3.17   2.50   2.17
Table 12.4: Average rating for variants of recording 3

Null hypothesis: There is no significant difference between the means of the overall accompaniment ratings of all variants (V0, V1, V2, V3, V4, and V5).
Outcome: p-value = 0.0593
Result: There was no statistically significant difference between the means of the overall accompaniment rating assigned to the different variants as determined by one-way ANOVA (0.0593 > 0.05).

The above results show that there was no significant difference in the rating of the different variants. However, the p-value is close to the 0.05 threshold, which suggests that some variants may nonetheless differ. A post-hoc analysis of the data using the Bonferroni test was done to find which groups were actually significantly different at 95% confidence. However, the test did not find any significantly different accompaniment.

Looking at the results from the three sections, it can be concluded that, at the 5% significance level, the tension algorithm produces accompaniment that is not significantly different from the original accompaniment.

12.2 RQ2: are accompaniments inside the range better?

The second research question was: are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants outside the range)?

Answer: for algorithmic accompaniment that falls within a certain tension distance of the original accompaniment, there is no statistically significant drop in rating between the original accompaniment and the algorithmically generated accompaniment (except for V1 and V5).
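The Bonferroni post-hoc tests referred to in sections 12.1 and 12.2 guard against false positives by dividing the overall significance level by the number of comparisons made. The sketch below illustrates that adjustment in Python. It assumes that all pairs among the six variants are compared, which the text does not state explicitly, and the function name `bonferroni_alpha` is hypothetical.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


variants = ["V0", "V1", "V2", "V3", "V4", "V5"]
# Assumption: every unordered pair of variants is tested (6 choose 2 = 15).
pairs = list(combinations(variants, 2))

alpha_95 = bonferroni_alpha(0.05, len(pairs))  # 95% confidence overall
alpha_80 = bonferroni_alpha(0.20, len(pairs))  # 80% confidence overall
```

A pair such as (V1, V5) would be reported as significantly different only if its pairwise p-value falls below the adjusted level. If only the five comparisons against the original V0 were made, `n_comparisons` would be 5 instead of 15.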
The ANOVA test was run on the variants of each recording, and a significant difference was found between the accompaniment ratings of some variants inside and outside the range. Tables 12.5, 12.6, and 12.7 show the average accompaniment ratings of the variants of recordings 1, 2 and 3. The following sections present the results of the ANOVA test applied to each recording.

12.2.1 Recording 1

Table 12.5 shows the results of running the ANOVA on the variants of recording 1. The interpretation of the table is explained below.

Variant   Within range   Rating   Significant
V0        Yes            3.67     No
V1        Yes            3.83     No
V2        Yes            3.00     No
V3        No             3.67     No
V4        No             3.00     No
V5        No             2.83     No
Table 12.5: Accompaniment ratings for variants of recording 1

The first column of the table indicates the variant number and the second column indicates whether the variant is within or outside the range. The third column shows the average rating for each variant, and the last column shows the pairwise significance between the original accompaniment and each variant at the 80% confidence level.

The third column shows that the ratings within the range are, on average, higher than those outside the range. It remains to verify whether the difference between the ratings of accompaniment inside and outside the tension range is statistically significant. At the 80% confidence level, no accompaniment rating was significantly different.

12.2.2 Recording 2

Table 12.6 shows the results of the post-hoc analysis on the variants of recording 2. At the 80% confidence level, the accompaniment rating was significantly different for V1 and V5, and their difference in means was 1.5.
Variant   Within range   Rating   Significant
V0        Yes            3.33     No
V1        Yes            3.67     V5
V2        Yes            2.50     No
V3        No             3.17     No
V4        Yes            2.50     No
V5        No             2.17     No
Table 12.6: Accompaniment ratings for variants of recording 2

12.2.3 Recording 3

Table 12.7 shows the results of the post-hoc analysis on the variants of recording 3. At the 80% confidence level, the accompaniment rating was significantly different for V1 and V5, and their difference in means was 1.605.

Variant   Within range   Rating   Significant
V0        Yes            3.50     No
V1        Yes            4.00     V5
V2        Yes            2.83     No
V3        Yes            3.17     No
V4        No             2.50     No
V5        No             2.17     No
Table 12.7: Accompaniment ratings for variants of recording 3

The results do not convincingly show that algorithmic accompaniment that falls within a certain tension range of the original accompaniment is more valid. The experiments have to be repeated with more participants before conclusions can be drawn about accompaniment validity.

12.3 RQ3: do ratings decrease as a function of distance

The third research question was: do the ratings for accompaniment decrease as a function of the distance from the tension zero point?

The answer to this question is a cautious yes. There are two pieces of evidence that provide some weak support: 1) the average tension ratings, and 2) the significant differences between different pairs of accompaniment at different confidence levels.

A post-hoc Bonferroni test was conducted on the results of the ANOVA test to find which pairs of variants were statistically different and how their distance influenced their validity. The results for the different recordings are shown in tables 12.8, 12.9 and 12.10.

12.3.1 Recording 1

Table 12.8 presents the results of the post-hoc test.
The results are provided below:

Variant   Distance   Rating   Significant
V5         9.750     2.83     NO
V4         9.750     3.00     NO
V2         2.708     3.00     NO
V3        -2.708     3.67     NO
V0        NA         3.67     NO
V1        NA         3.83     NO
Table 12.8: Accompaniment ratings for different variants

Although the rating values show a steady decrease based on their distance from V0, no groups were significantly different at the 80% confidence level.

12.3.2 Recording 2

Table 12.9 presents the results of the post-hoc test. The interpretation of the table is given below:

Variant   Distance   Rating   Significant
V5        -3.000     2.17     NO
V3        -0.833     3.17     NO
V0        NA         3.33     NO
V1        NA         3.67     V5
V2         0.833     2.50     NO
V4         3.000     2.50     NO
Table 12.9: Accompaniment ratings for different variants

The rating values show a steady decrease for distances below V0, and there was a significant difference between the ratings of V1 and V5 at the 80% confidence level.

12.3.3 Recording 3

Table 12.10 presents the results of the post-hoc test.

Variant   Distance   Rating   Significant
V5        -6.600     2.17     NO
V3        -1.833     3.17     NO
V0         0.0       3.50     NO
V1         0.0       4.00     V5
V2         1.833     2.83     NO
V4         6.600     2.50     NO
Table 12.10: Accompaniment ratings for different variants

The rating values show a steady decrease based on distance, and there was a significant difference between the ratings of V1 and V5 at the 80% confidence level.

There is weak evidence from the data that the rating decreases as a function of distance. However, the decrease is not consistent for all distances above and below the tension zero point.

12.4 Summary

Overall, the results of the study are encouraging:

• RQ1: the tension model is able to produce at least one accompaniment that is rated equivalent to the original accompaniment.
• RQ2: for algorithmic accompaniment that falls within a certain tension distance of the original accompaniment, there is no statistically significant drop in rating between the original accompaniment and the algorithmically generated accompaniment (except for V1 and V5).
• RQ3: there is some weak support for the claim that the sowkhyam of accompaniment decreases as a function of distance, but the decrease in rating is not consistent above and below the zero deviance.

Chapter 13 Potential objections

This chapter highlights the aspects of the study design that could raise objections about the claims made from this work.

• Number of recordings and variants. Three recordings and six variants were used in the study. The small number of recordings limits the findings from the study, and the number of variants used for each recording weakens the claims that can be made. Using six variants, it was only possible to 1) find that some accompaniments are not statistically significantly different from the original, and 2) find approximate boundaries beyond which the validity of the accompaniment significantly reduces. Precise boundaries and exact settings for acceptable accompaniment can be found using a larger sample size and a larger number of variants. This is an area for future work.
• Location. The study was conducted in public places, and there was a chance that participants could have been distracted by other audio or visuals. The chances of audio distraction were minimal, as the JVC HA-360 headphones cut out nearly all external audio signals in the surroundings. The study did not explicitly control for the effect of visual distractions. However, whenever participants (or the researcher) were disturbed by an external event, they were asked to listen to the recording again before they rated it.
• Number of participants. With a small set of participants, it is not possible to make conclusive statements about validating the tension model or finding exact accompaniment boundaries.
However, the data provides sufficient evidence to make weaker claims, such as 1) a non-significant difference between the sowkhyam of some of the algorithmically generated accompaniments and the original accompaniment, and 2) a deviance value beyond which there is a significant decrease in the rating of the accompaniment. The small set of participants does not impact the results but offers scope for future work.
• School of participants. The participants came from different schools of percussion playing, which could have resulted in certain differences in their evaluation. But this does not impact the results, as the ratings were consistent on the key claims.
• Age of participants. The participants were of similar age and hence the sample does not represent the general community of percussion players. Although their ages were similar, each of them was competent and had at least eight years of performance experience. Some of them were also graded performers in a reputed music organization (All India Radio). These credentials put the chosen performers on par with the general community of Carnatic percussion players. It is unlikely that there would be much impact on the results even if the participants were replaced by older experts. This is because the notion of sowkhyam (restricted sowkhyam) that was used for evaluation is a fundamental idea in accompaniment evaluation that is intuitively grasped by most performers very early in their training.
• Calculation error. A major limitation of the study is that the results are not as clear-cut as they should have been. Ideally, all the performance variants should have been generated at equal intervals, which would allow the results to be compared across recordings and the tension model to be analyzed rigorously. This will be taken up as part of the future work.
• Loudness of secondary. For the study, the loudness was modified to make it easier to hear the secondary.
However, loudness is one of the essential components of the tension algorithm and may impact the results. The loudness of the algorithmically generated tension transcriptions was never altered. Only the loudness level of the kanjira output in the resultant audio was raised so as to make it clearer. During the study, it was rigorously ensured that participants did not face any discomfort with these increased levels. Hence, the loudness of the secondary does not impact the results.
• No control experiment. Another limitation of the study design was that all the variants were generated and compared relative to an original recording. The results were not compared with a control case such as 1) a system that simply produces accompaniment as a sequence of random hits quantized to 1/16, or 2) a random generation system that uses a single parameter, such as equal hit density of lead and secondary, to synthesize accompaniment sequences. As a result, one could argue that the finding that the variants were statistically indistinguishable from the original is a false positive. However, there is some evidence from the results that suggests otherwise. The results contain additional comments provided by the listeners, which provide some evidence that they were able to qualitatively distinguish the different accompaniment variants. The analysis of qualitative comments, as well as the use of control cases for comparing the results, will be done in the future.

Chapter 14 Discussion

This chapter identifies the main limitations of the research reported here and discusses their impact on the findings from the study. Specifically, the limitations of the tension calculation algorithm, the performance transcription process and the accompaniment system used in the study are discussed.

14.1 Algorithmic limitations

The algorithm cannot currently calculate the precise boundary between accompaniment that is sowkhyam and accompaniment that is not.
Based on the study, there is evidence that the boundary is somewhere between two ranges, at a confidence level of 95% (or 80%). Synthesized accompaniment will be treated as sowkhyam if it is within a certain range (although it is still possible for it to be acceptable if it is outside the range, it is currently not possible to determine where that boundary is).

Currently, the algorithm can only generate accompaniment that is rated as well as the original. In other words, it does not claim that the accompaniment will always be valid; that depends on how valid the original is. This does not weaken the contribution of generating accompaniment as valid as the original.

A related point is that the tension range is based on choices made in the original. This does not mean that different ranges would not work. In other words, although the research suggests that anything within the high/low tension range will be acceptable, it is possible that the high could be higher (and the low could be lower) and still be acceptable. The system/research does not address this. However, this does not impact the results as they are.

14.2 Transcription limitations

The perceived loudness of the strokes. The loudness levels assigned to a particular transcription were subjective to the researcher. During the resynthesis, these perceived levels are heard by the participants. In the worst case, wrongly transcribing the loudness levels causes the participant to perceive an entirely different rhythm pattern from the one originally played. This influences the way that the participant evaluates the accompaniment.

In order to counter the problems with wrong loudness notation, multiple checks were used to ensure accuracy in loudness. First, the loudness levels were assigned on a scale of very low to very high. The levels were decided by listening multiple times to the original recordings.
After the loudness notation, the synthesis from the transcription was listened to in order to ensure that the perceived loudness level was closest (from very low to very high) to the actual level in the recording. Using these two steps, the impact of the researcher's subjectivity in assigning the perceived loudness levels to the transcriptions was reduced.

14.3 System limitations

Several restrictions were imposed on the system that generated alternate valid accompaniment: the restricted notion of sowkhyam, limited tempo, a small sample of music, the one-bar improvisation rule and rhythmic repetition. Although these conditions do not affect the current results, they limit the applicability of the system to general performance situations. This section discusses the limitations.

1. The system does not handle tempos > 90bpm. This is not a limitation of the system but a limitation of the researcher's ability to transcribe data from performances. It does not impact the system's ability to generate valid accompaniment at higher speeds.
2. One-bar discretionary improvisation. The system does not generate accompaniment that violates the one-bar discretionary improvisation rule. The ability of the system to generate valid accompaniment was tested only for performance situations that comply with this rule. This is a limitation of the system, but it does not impact the current findings and is an avenue for future work.
3. Rhythmic repetition. Using rhythmic repetition, the system assigns the same secondary sequences to bars that are rhythmic repetitions of a rhythmically unique bar. This limitation does not affect the current results, but it restricts the range of valid accompaniment that can be produced by the system. Finding the validity of accompaniment produced without rhythmic repetition is a possible avenue for future work.
4. 16-bar samples.
The musical principles that apply to 16 bars of music (specifically, rhythmic repetition) need not hold true for longer segments of music. Though the use of 16 bars does not affect the current results, it limits the applicability of the system to longer segments of music (typically, one section of a performance). This is a limitation of the system that will be addressed in the future.
5. Quality of synthesis. The recordings were created such that it was possible to hear the kanjira, the mridangam and the vocals distinctly and clearly. Detailed attention was not paid to the finer aspects of the musical quality of the synthesis (for both the accompaniment and the lead). However, the quality of synthesis does not impact the results or the evaluation because 1) participants were able to evaluate the accompaniment without being bothered much by the quality of the accompaniment, and 2) the restricted notion of sowkhyam already includes a specific quality of music (lead and secondary strokes) in its evaluation. Nevertheless, providing realistic performance synthesis would ease the evaluation for the participants.

Chapter 15 Future work

There are a number of ways that the system (and study) could be improved in the future.

• Broadly, the most important changes are to check that the system generates study-ready variants as intended, and to rerun the study with such variants (and more participants).
• Beyond that, there are a number of interesting questions that would involve making the system more sophisticated, and learning about the responses of different kinds of participants (e.g., age, different musical schools).

In spite of the limitations, this work makes a small contribution to the domain of improvisational music accompaniment systems. An abstract perceptual notion of musical acceptance (sowkhyam) in Carnatic music was formalized to generate alternate musically valid accompaniments.
Subsequently, the formalization (the tension model) was verified under a controlled setting. Going forward, the potential application of the tension model to understand and generate other secondary percussion activities in Carnatic music as well other improvised music genres seems promising. 89 90 Bibliography Assayag, G. et al. (2006). “Omax brothers: a dynamic topology of agents for improvisation learning”. In: Proceedings of the 1st ACM workshop on Audio and music computing multimedia, Santa Barbara, California, USA, pp. 125–132. url: http : / / dl . acm . org / citation . cfm ? id = 1178742 (visited on 10/02/2012). Cabral, Giordano, Jean-Pierre Briot, and François Pachet (2006). “Incremental parsing for real-time accompaniment systems”. In: Special Track on Artificial Intelligence in Music and Art (AIMA’2006) Florida Artificial Intelligence Research Society Conference 19th International FLAIRS’2006 Conference, Melbourne Beach, FL, USA, pp. 227–230. Dahia, M. et al. (2004). “Using patterns to generate rhythmic accompaniment for guitar”. In: Proceedings of Sound and Music Computing, Paris. Donze, Alexandre et al. (2013). Control Improvisation with Application to Music. Tech. rep. DTIC Document. Ganascia, Jean-Gabriel, Geber Ramalho, and Pierre-Yves Rolland (1999). “An artificially intelligent jazz performer”. In: Journal of New Music Research 28, pp. 105–129. Gifford, Toby M and Andrew R Brown (2006). “The Ambidrum: automated rhythmic improvisation”. In: Medi(t)ations: computers/music/intermedia - The Proceedings of Australasian Computer Music Conference 2006, Adelaide, pp. 44–49. Hoover, Amy K, Paul A Szerlip, and Kenneth O Stanley (2011). “Generating musical accompaniment through functional scaffolding”. In: Proceedings of the 8th Sound and Music Computing Conference (SMC-2011), Padova, Italy. Hoover, AmyK., MichaelP. Rosario, and KennethO. Stanley (2008). “Scaffolding for interactively evolving novel drum tracks for existing songs”. English. 
In: Applications of Evolutionary Computing. Ed. by Mario Giacobini et al. Vol. 4974. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 412–422. isbn: 978-3-540-78760-0. url: http: //dx.doi.org/10.1007/978-3-540-78761-7_44. 91 Kitani, K. M. and H. Koike (2010). “Improvgenerator: online grammatical induction for on-the-fly improvisation accompaniment”. In: Proceedings of the 2010 Conference on New Interfaces for Musical Expression (NIME 2010), Sydney, Australia, pp. 469–472. url: http://www.nime. org/proceedings/2010/nime2010_469.pdf (visited on 09/08/2012). Lewis, G. E. (2000). “Too many notes: computers, complexity and culture in Voyager”. In: Leonardo Music Journal 10, pp. 33–39. url: http : //www.mitpressjournals.org/doi/abs/10.1162/096112100570585 (visited on 10/09/2012). Pachet, F. (2002a). “The Continuator: musical interaction with style”. In: Proceedings of the ICMA, Goteborg, Sweden, pp. 211–218. url: http: //ehess.modelisationsavoirs.fr/atiam/biblio/Pachet- ICMC02f.pdf (visited on 10/02/2012). Pachet, Francois (2002b). “Interacting with a musical learning system: the Continuator”. English. In: Music and Artificial Intelligence. Ed. by Christina Anagnostopoulou, Miguel Ferrand, and Alan Smaill. Vol. 2445. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 119–132. isbn: 978-3-540-44145-8. doi: 10.1007/3-54045722-4_12. url: http://dx.doi.org/10.1007/3-540-45722-4_12. Rowe, R (1992). “Machine listening and composing with Cypher”. In: Computer Music Journal 16.1, pp. 43–63. Van Nort, D., J. Braasch, and P. Oliveros (2009). “A system for musical improvisation combining sonic gesture recognition and genetic algorithms”. In: Proceedings of Sound and Music Computing Conference, Porto, Portugal, pp. 131–136. url: http : / / smc2009 . smcnetwork . org/programme/pdfs/316.pdf (visited on 12/27/2012). Van Nort, Doug, Jonas Braasch, and Pauline Oliveros (2012). “Mapping to musical actions in the FILTER system”. 
In: Proceedings of the 2012 International conference on New interfaces for musical expression (NIME), Ann Arbor, USA.
Young, Michael W and Oliver Bown (2010). “Clap-along: a negotiation strategy for creative musical interaction with computational systems”. In: Proceedings of the International Conference on Computational Creativity 2010, Lisbon, Portugal, pp. 215–222. url: http://eprints.gold.ac.uk/4684/ (visited on 10/11/2012).
Zicarelli, David (1987). “M and Jam factory”. English. In: Computer Music Journal 11.4, pp. 13–29. issn: 01489267. url: http://www.jstor.org/stable/3680237.

Appendices

Appendix A Key Terms

This section provides a brief summary of the definitions of the different concepts/terms invented, introduced, or used as a part of this thesis work.

A.1 Terms: tension model

• Accompaniment sequence. One-bar sequence of secondary hits played for a duration of 8 beats.
• Accompaniment subsequence. Portion of an accompaniment sequence that is played for a duration of less than 8 beats.
• Average tension range. The average tension range for a bar is a mathematical interval that contains valid secondary sequences.
• Bar activity. The total number of hits played in a lead or accompaniment sequence.
• Beat tension score. The sum of the hit tension scores calculated for the hits played on each whole beat and the next half beat, in a lead/accompaniment sequence.
• Beat tension range. A mathematical interval, calculated for one beat, with the beat tension scores of the lead and the secondary hits played on that beat as the boundary points.
• Diction. Type of stroke played on the lead and secondary drums (e.g., ta, tum, num, dhin).
• Hit tension score. Numerical value that is computed for every drum hit, as the weighted product of diction, note duration and loudness.
• Lead sequence. One-bar sequence of lead hits played for a duration of 8 beats.
• Loudness. Perceived loudness level on a scale from 0.0 (pause/rest) to 1.0 (very high).
• Note duration. The duration of a drum hit between its onset and offset, expressed as a number of beats.
• Rhythmic repetition structure. Abstract representation of a sequence of lead/secondary hits that highlights the unique and repeated accompaniment sequences.
• Swapped lead sequence. Sequence of beat tension scores containing, for each beat, the higher of the lead and secondary beat tension scores for a given lead and secondary sequence.
• Swapped secondary sequence. Sequence of beat tension scores containing, for each beat, the lower of the lead and secondary beat tension scores for a given lead and secondary sequence.
• Tension. Tension is the formal attribute of the relationship between two drum hits, expressing a degree of consonance.
• Tension deviation. Numerical distance of a secondary sequence from a tension zero point.
• Tension range (bottom). Lesser of the two values in the average tension range.
• Tension span. Numerical difference between tension range (top) and tension range (bottom).
• Tension range (top). Greater of the two values in the average tension range.
• Tension zero point (TZP). The tension zero point for a lead and accompaniment sequence (i.e., a bar) is the point of maximum sowkhyam. It is computed as the arithmetic mean of the tension scores of the swapped lead sequence.

A.2 Terms: Carnatic music

• Sowkhyam. A perceptual notion that expresses the degree of consonance in the music (produced by one performer, some performers or all performers).
• Restricted sowkhyam. Sowkhyam considered independent of the performers (and their creativity).
• Anupallavi. The second section of a musical piece. The first section is called the pallavi. The sections after the anupallavi are called charanams.
• Avarthanam. A musical interval, made of several bars, that signifies the end of a melodic line. In this thesis, the duration of an avarthanam is eight bars.
• Charanam. The sections of a musical piece that follow the anupallavi.
• Kanakkus.
Sequences of lead/secondary hits that progressively increase or decrease in length.
• Kanjira. A single-headed drum, played as the secondary percussive instrument.
• Kanjirist. Performer who plays the kanjira.
• Mridangam. A double-sided drum, played as the lead percussive instrument.
• Mridangist. Performer who plays the mridangam.
• Nadam. The musical quality of the sound produced from an instrument (usually the mridangam).
• Pallavi. The first section of a musical piece.

Appendix B Enumerating the accompaniment sequences

The number of unique values that the triplet (diction, note duration, loudness) can take for one beat is the number of unique triplet values of a single hit of 1-beat duration plus the number of unique triplet values of two hits of 0.5-beat duration. The numbers of unique triplet values for one hit of 1-beat duration and for two hits of 0.5-beat duration are:

Algorithm 3 Unique 1-hit and 2-hit triplets
5 (diction) * 5 (loudness) = 25 (unique 1-hit triplets)
25 (unique 1-hit triplets) * 25 (unique 1-hit triplets) = 625 (unique 2-hit triplets)

The total number of unique triplet values for one beat is:

Algorithm 4 Unique 1-beat triplets
625 (unique 2-hit triplets) + 25 (unique 1-hit triplets) = 650 (unique 1-beat triplets)

The total number of unique triplet values for eight beats is:

Algorithm 5 Unique 8-beat triplets
650 (unique 1-beat triplets) ^ 8 (beats) = 650^8 (approximately 3.19 × 10^22)

Appendix C Assigning perceptual scores

This section explains the rationale behind assigning the perceptual weights for the diction, note duration and loudness values used in the lead and secondary transcriptions.

C.1 Diction

Tables C.1 and C.2 show the weights assigned to the different strokes played on the lead and secondary drums. The symbol "."
is used to denote a pause and is common to both the lead and the secondary.

Lead hit    Perceptual value
bheem       6.0
tham        4.5
dheem       4.0
dham        3.5
dhin        4.0
num         3.0
tha         3.0
thom        2.0
thi         2.0
ri          1.0
ki          1.0
.           0.0
Table C.1: Weights for lead strokes

Secondary hit    Perceptual value
tumki            6.0
tum              4.0
ta               3.0
te               2.0
.                0.0
Table C.2: Weights for secondary strokes

The rationale behind assigning these values to the drum hits is that they can be considered analogous to musical pitch. Some hits sound perceptually high and some hits sound perceptually low. Although the increase or decrease is not as steady as with musical notes, there is a perceptual increase or decrease. A rigorous procedure to arrive at objective correlates of the perceptual values was not pursued. The main goal was to establish a range within which weights could be assigned at equal intervals. A brief description of the procedure is given below.

First, the perceptual range was identified. The pitched hit “bheem” is perceptually the highest, and “ki” and “ri” are perceptually the lowest. “Bheem” is almost one semitone higher than the hits “tham”, “dhin”, “dheem”, “dham” and “num”. “Ki” and “ri” are almost one semitone lower than “num”. The remaining notes were assigned at equal intervals between “bheem” and “num”, and between “num” and “ki”/“ri”.

The secondary hits were assigned based on perceptual analogues with the lead hits. The secondary hit “tumki” was timbrally the richest hit and corresponded to “bheem”. The hit “ta” corresponds to the hits “tha” and “num” of the lead. The hit “tum” corresponds to the hit “dhin” of the lead. The secondary hit “te” corresponds to the hit “thi” of the lead.

C.2 Loudness

The specific loudness levels used in the transcriptions are based on the loudness levels that the author perceived when listening to the transcription. The relative loudness scale used for interpreting the perceptual loudness of hits is shown in Table C.3.
Table C.3 shows the numerical values assigned to the perceptual loudness levels observed by the researcher in the recordings. The main goal here was to find perceptually distinguishable loudness levels and find a consistent method to weight them. There were initially four levels of loudness (excluding 0.0) that the researcher could perceptually discern from the audio recordings. They were assigned at equal intervals from 0.0 to 1.0. The levels were 0.25, 0.5, 0.75, and 1.0. Later another loudness level between 0.25 and 0.0 was discovered and was quantified as 0.125.

Perceived level    Numerical value
very high          1.000
high               0.750
medium             0.500
low                0.250
very low           0.125
pause/rest         0.000
Table C.3: Perceived loudness of lead and secondary hits

Table C.4 shows the weights assigned to the perceptual loudness levels observed by the researcher in the recordings.

Loudness level    Weight
1.0               1.25
0.75              1.0
0.5               0.625
0.25              0.5
0.125             0.3125
0.0               0.0
Table C.4: Weights for loudness

As before, a rigorous procedure to arrive at objective correlates of the perceptual loudness was not pursued. The main goal here was to find perceptually distinguishable loudness levels and find a consistent method to score them. A brief description of the procedure for finding the loudness levels and assigning the weights is given below.

There were initially four levels of loudness (excluding 0.0) that the researcher could perceptually discern. On a scale of 0.0 to 1.0, they were assigned at equal intervals of 0.25: 0.25, 0.5, 0.75 and 1.0. Later another loudness level between 0.25 and 0.0 was discovered and was assigned the value 0.125. Barring 0.0, every two-step increase in the loudness level doubles the weight value (for example, moving from level 0.25 to level 0.75 raises the weight from 0.5 to 1.0).

C.3 Note duration

Table C.5 shows the weights assigned to hits depending on the duration for which they were played.
Duration (beats)    Weight
0.5                 3.0
1.0                 1.0
Table C.5: Weights for note duration

Unlike diction and loudness, the weights for note duration were assigned by trial and error. The initial impulse was to weight the 0.5-beat hits as twice the 1.0-beat hits. But since the resulting values did not reflect the expected perceptual distance when applied to accompaniment sequences, the weights of the 0.5-beat and 1.0-beat hits were assigned as 3.0 and 1.0.

Appendix D Transcription: internal representation

The tables show the transcriptions of the lead and the secondary for recording 1. Every bar contains the lead and secondary hits for 8 beats. In every bar, the lead hits, their loudness, the secondary hits and their loudness are shown in the order that they are mentioned.

• Each table contains a transcription of the lead and secondary playing from a performance recording.
• The topmost row of the transcription shows the beats.
• Each cell in the first column is divided into 4 sub-cells indicating the diction and loudness of lead and secondary.
• In each cell, from column 2 onwards, the following order of sub-cells is followed: lead’s diction, lead’s loudness, secondary’s diction, and secondary’s loudness.
• Double hits (and their corresponding loudness) are enclosed in brackets.
• A diction/loudness value aligned with the beat number indicates that the hit was played on the whole beat.
• A diction/loudness value aligned between two beat numbers indicates that the hit was played on the half beat.

Recording 1

Table D.1 shows the sixteen-bar transcription of the lead and the secondary for recording 1.

Bar/Beat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8 num 1.0 tum 0.5 dhin 0.5 ta 0.25 dhin 0.5 tum 0.25 dhin 0.5 ta 0.25 dhin 0.5 te 0.25 dhin 0.5 ta 0.25 bheem 0.5 tum 0.5 . 0.25 . 0.0 . 0.0 . 0.0 . 0.0 .
0.0 num 0.125 tum 0.125 dhin 0.5 tum 0.5 num 0.5 ta 0.5 thi 0.5 tum 0.25 dhin 0.5 tum 0.5 dhin 0.25 ta 0.25 num 0.5 tum 0.25 thi 0.75 ta 0.25 dhin 0.25 tum 0.125 dhin 0.5 tum 0.5 num 0.75 ta 0.25 thi 0.5 tum 0.125 dhin 0.5 tum 0.5 dhin 0.25 ta 0.25 num 0.5 tum 0.25 thi 0.75 ta 0.25 dhin 0.25 tum 0.125 dhin 0.5 tum 0.5 num 0.75 ta 0.25 thi 0.5 tum 0.125 dhin 0.5 tum 0.5 dhin 0.25 ta 0.25 (num thi) (0.5 0.5) (tum ta) (0.25 0.25) (num thi) (0.75 0.75) (te ta) (0.5 0.5) num 0.125 tum 0.125 dhin 0.5 tum 0.5 num 0.5 ta 0.25 thi 0.5 tum 0.125 dhin 0.5 tum 0.5 dhin 0.25 (tum ta) (0.25 0.25) num 0.5 tum 0.25 thi 0.75 ta 0.25 dhin 0.25 tum 0.125 dhin 0.5 tum 0.5 num 0.75 ta 0.25 thi 0.5 tum 0.125 dhin 0.5 tum 0.5 dhin 0.25 ta 0.25 (num thi) (0.5 0.5) tum 0.25 (num thi) (0.75 0.75) ta 0.25 num 0.125 tum 0.125 dhin 0.5 tum 0.5 num 0.25 ta 0.25 thi 0.25 tum 0.125 num 0.75 tum 0.5 dhin 0.25 ta 0.25 num 0.25 tum 0.25 thi 0.25 ta 0.25 num 0.75 tum 0.125 dhin 0.25 tum 0.5 num 0.25 ta 0.25 thi 0.25 tum 0.125 num 0.75 tum 0.5 dhin 0.25 ta 0.25 (num thi) (0.75 0.75) (ta te) (0.125 0.125) (num thi) (0.75 0.75) (ta te) (0.5 0.5) num 0.75 ta 0.5 dheem 0.5 tum 0.125 (num thi) (0.5 0.5) ta 0.25 (num thi) (0.25 0.25) tum 0.125 num 0.75 tum 0.25 dhin 0.5 ta 0.5 (num thi) (0.5 0.5) tum 0.125 (num thi) (0.25 0.25) ta 0.25 num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.75 0.75) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dheem 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.75 0.75) (ta te) (0.5 0.5) (num thi) (0.25 0.25) 
(ta te) (0.125 0.125) num 0.75 ta 0.5 dheem 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.25 0.25) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 (num thi) (0.5 0.5) (ta te) (0.5 0.5) (num thi) (0.5 0.5) (ta te) (0.125 0.125) num 0.75 ta 0.5 dhin 0.5 tum 0.125 num 0.5 (ta te) (0.5 0.5) thi 0.5 (ta te) (0.125 0.125) thi 1.0 ta 0.5 dhin 0.5 tum 0.125 num 0.75 (ta te) (0.5 0.5) thi 0.75 (ta te) (0.125 0.125) thi 1.0 ta 0.5 dhin 0.75 tum 0.125 num 0.75 ta 0.25 thi 0.75 tum 0.125 thi 1.0 tum 0.25 dhin 0.75 ta 0.5 num 0.75 tum 0.5 thi 0.75 ta 0.25 thi 1.0 tum 0.125 dhin 0.75 tum 0.5 num 0.75 ta 0.25 thi 0.75 tum 0.125 thi 1.0 tum 0.5 dhin 0.75 ta 0.5 Table D.1: Transcription of recording 1, bars 1-16 106 Recording 2 Table D.2 shows the sixteen bar transcription of the lead and the secondary for recording 2. Bar/Beat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8 dhin 0.5 tum 0.5 . 0.0 . 0.0 num 0.5 . 0.0 dhin 0.25 tum 0.5 . 0.0 . 0.0 thi 0.25 ta 0.0 dhin 0.5 te 0.125 thi 0.25 ta 0.25 dhin 0.5 ta 0.25 . 0.0 . 0.0 num 0.5 ta 0.125 dhin 0.25 ta 0.25 . 0.0 . 0.0 thi 0.25 ta 0.125 dhin 0.5 te 0.125 thi 0.25 ta 0.125 dhin 0.5 tum 0.5 . 0.0 . 0.0 num 0.5 . 0.0 dhin 0.25 tum 0.5 . 0.0 . 0.0 thi 0.25 ta 0.0 dhin 0.5 te 0.125 thi 0.25 ta 0.25 ri 0.5 ta 0.25 thi 0.25 . 0.0 num 0.5 . 0.0 dhin 0.25 ta 0.25 . 0.0 . 0.0 . 0.0 ta 0.125 (thi ri) (0.25 0.25) te 0.125 (ki ta) (0.25 0.25) ta 0.25 (num thom) (0.25 0.25) tum 0.25 (thom thi) (0.25 0.25) ta 0.25 num 0.5 tum 0.5 thom 0.25 tum 0.25 thi 0.25 ta 0.25 num 0.5 tum 0.5 thom 0.25 tum 0.25 thi 0.25 ta 0.25 num 0.5 tum 0.25 thom 0.5 . 0.0 thom 0.5 ta 0.25 dheem 0.5 ta 0.5 . 0.0 . 0.0 thom 0.5 ta 0.25 dheem 0.5 tum 0.5 . 0.0 ta 0.25 num 0.5 tum 0.25 thom 0.5 . 0.0 thom 0.5 ta 0.25 dheem 0.5 ta 0.5 . 0.0 . 0.0 thom 0.5 ta 0.25 dheem 0.5 tum 0.5 . 0.0 ta 0.25 num 0.5 tum 0.25 thom 0.5 . 0.0 thom 0.5 ta 0.25 tham 0.5 te 0.5 . 0.0 . 0.0 thom 0.5 ta 0.25 dheem 0.5 tum 0.5 . 0.0 . 0.25 dhin 0.5 tum 0.5 . 
0.0 . 0.0 num 0.5 . 0.0 dhin 0.25 tum 0.5 . 0.0 . 0.0 thi 0.25 ta 0.25 dhin 0.5 tum 0.25 thi 0.25 ta 0.25 ri 0.5 te 0.25 thi 0.25 ta 0.25 num 0.5 ta 0.125 dhin 0.5 ta 0.25 . 0.0 . 0.0 thi 0.25 ta 0.25 dhin 0.5 te 0.25 thi 0.25 ta 0.25 ri 0.5 tum 0.5 thi 0.25 . 0.0 num 0.5 tum 0.5 dhin 0.5 tum 0.25 . 0.0 . 0.0 thi 0.25 ta 0.25 dhin 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.5 ta 0.125 thi 0.25 . 0.0 num 0.5 ta 0.125 dhin 0.25 ta 0.125 . 0.0 . 0.0 . 0.25 ta 0.125 tham 0.5 te 0.125 . 0.0 ta 0.125 . 0.0 tum 0.5 . 0.0 . 0.0 tham 0.25 . 0.0 tham 0.5 . 0.0 . 0.0 . 0.0 tham 0.25 . 0.0 tham 0.5 . 0.0 . 0.0 . 0.0 tham 0.5 . 0.0 . 0.0 . 0.0 num 0.5 tum 0.25 dhin 0.25 tum 0.25 . 0.0 tum 0.5 thi 0.25 ta 0.25 num 0.5 (ta te) (0.25 0.25) thi 0.25 (ta te) (0.25 0.25) dhin 0.5 tum 0.5 . 0.0 . 0.0 num 0.5 tum 0.5 dhin 0.25 tum 0.25 . 0.0 . 0.0 . 0.25 ta 0.25 dhin 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.5 ta 0.125 . 0.0 . 0.0 num 0.5 ta 0.125 dhin 0.25 ta 0.125 . 0.0 . 0.0 . 0.25 ta 0.125 dham 0.5 te 0.125 dhin 0.5 ta 0.125 Table D.2: Transcription of recording 2, bars 1-16 107 Recording 3 Table D.3 shows the sixteen bar transcription of the lead and the secondary for recording 3. Bar/Beat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8 dhin 0.5 . 0.0 . 0.0 . 0.0 num 0.25 . 0.0 dhin 0.5 . 0.0 . 0.0 . 0.0 thi 0.25 . 0.0 num 0.5 . 0.0 thi 0.25 . 0.0 dhin 0.5 . 0.0 . 0.0 . 0.0 num 0.25 . 0.0 dhin 0.5 . 0.0 . 0.0 . 0.0 thi 0.25 . 0.0 num 0.5 . 0.0 thi 0.25 . 0.0 dhin 0.5 . 0.0 . 0.0 . 0.0 num 0.25 . 0.0 dhin 0.5 tum 0.25 . 0.0 . 0.0 thi 0.25 . 0.0 num 0.5 tum 0.25 thi 0.25 . 0.0 dhin 0.25 tum 0.5 . 0.0 ta 0.25 num 0.25 tum 0.25 dhin 0.5 tum 0.5 . 0.0 ta 0.25 thom 0.25 tum 0.25 dham 0.5 tum 0.5 dhin 0.25 ta 0.25 dham 0.5 tum 0.5 dhin 0.25 ta 0.25 num 0.5 tum 0.25 dhin 0.25 tumki 0.5 thi 0.125 ta 0.25 dhin 0.25 tum 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.5 . 0.0 ta 0.25 num 0.25 tum 0.25 dhin 0.5 tumki 0.5 . 
0.0 ta 0.25 thom 0.25 tum 0.25 dham 0.5 tum 0.5 dhin 0.25 ta 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 num 0.5 tum 0.25 dhin 0.25 tumki 0.5 . 0.125 ta 0.25 thi 0.25 tum 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.5 . 0.0 ta 0.25 num 0.25 tum 0.25 dhin 0.5 tumki 0.5 . 0.0 ta 0.25 thi 0.25 tum 0.25 num 0.5 tum 0.5 . 0.0 ta 0.25 tham 0.5 tumki 0.5 . 0.0 . 0.0 . 0.0 . 0.0 tham 0.25 tumki 0.5 . 0.0 . 0.0 . 0.0 . 0.0 tham 0.25 tumki 0.5 . 0.0 . 0.0 . 0.0 tum 0.5 . 0.0 ta 0.25 . 0.0 tum 0.25 tham 0.5 tum 0.5 . 0.0 . 0.0 thi 0.25 ta 0.25 dhin 0.5 tum 0.5 . 0.0 . 0.0 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.25 dhin 0.5 tum 0.5 thi 0.25 . 0.0 dhin 0.25 ta 0.25 dhin 0.5 tum 0.5 thi 0.25 . 0.0 num 0.5 tum 0.5 (thi ri) (0.25 0.25) (ta te) (0.25 0.25) (ki ta) (0.25 0.25) (ta te) (0.25 0.25) (ki thi) (0.25 0.25) tum 0.25 tha 0.5 tum 0.5 tham 0.5 ta 0.5 thom 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.25 dhin 0.5 tumki 0.5 thi 0.25 . 0.0 dhin 0.25 ta 0.25 dhin 0.5 tum 0.5 thi 0.25 ta 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.25 dhin 0.5 tumki 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.25 dhin 0.5 tum 0.5 thi 0.25 ta 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 tum 0.25 dhin 0.5 tumki 0.5 thi 0.25 . 0.0 dhin 0.25 ta 0.25 dhin 0.5 tum 0.5 thi 0.25 ta 0.25 num 0.5 tum 0.5 thi 0.25 ta 0.25 dhin 0.25 ta 0.25 dheem 0.5 tumki 0.75 dham 0.5 ta 0.25 dhin 0.5 tum 0.25 dhin 0.25 tum 0.75 thi 0.25 ta 0.25 (ri thi) (num tha) (0.25 0.25) (0.25 0.25) (ta te) (ta te) (0.25 0.25) (0.25 0.25) Table D.3: Transcription of recording 3, bars 1-16 108 D.1 Transcription: internal representation The transcription was represented internally as follows: 1. The transcription is an array of symbols representing a musical performance. The index of a symbol of the array denotes its time of occurrence in the musical performance. The symbols themselves map to the musical notes (or other perceptual musical entities) of the performance. 2. 
The transcription is simultaneously represented in different dimensions using three sub-arrays. The dimensions are diction, note duration and loudness. The diction sub-array indicates the type of strokes played on the lead/secondary drum. The note duration refers to the time elapsed between the onset and offset of a drum stroke, expressed in a musical unit of time (beats). The loudness refers to the intensity of the drum strike.
3. In each dimension, the sub-arrays of diction, note duration and loudness are transformed into sub-arrays of numerical values based on a (key, value) mapping. The key is a value of diction, note duration or loudness (expressed in its own units). The value is a number (expressed in units of tension).
4. The three sub-arrays are merged to form a single array that represents a musical performance in terms of tension. This single array is created by multiplying the numerical weights at the same index positions in the sub-arrays of diction, note duration and loudness.
5. The single array is used for further analysis (of tension scores, tension delta, etc.).

Appendix E Results

This section provides the complete results of the study. The results include accompaniment ratings for each performance recording and accompaniment ratings for each variant of a performance recording.

E.1 Complete results for recordings

The accompaniment ratings given in Table E.1 form the basis of the average ratings for performance recordings. In each table, the columns P1 to P6 show the rating given by each participant for each variant of a recording. The rows Vi indicate the variant of the performance recording that was rated. The last column shows the mean rating given for each variant. The last row shows the mean rating given by each participant over all variants. The last cell of the table shows the mean accompaniment rating calculated for the entire recording.
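As a rough illustration of how the row, column and overall means in Table E.1 relate to the raw ratings, the sketch below recomputes them for recording 2. The ratings are copied from the table; the names `ratings_r2` and `summarize` are hypothetical and are not taken from the study's own tooling.

```python
from statistics import mean

# Ratings for recording 2 as listed in Table E.1:
# one row per variant (V0..V5), one column per participant (P1..P6).
ratings_r2 = [
    [2, 3, 2, 5, 4, 4],  # V0
    [2, 4, 4, 4, 4, 4],  # V1
    [3, 3, 2, 2, 3, 2],  # V2
    [2, 4, 4, 4, 3, 2],  # V3
    [2, 3, 4, 2, 2, 2],  # V4
    [1, 4, 2, 2, 2, 2],  # V5
]


def summarize(ratings):
    """Return (per-variant means, per-participant means, overall mean)."""
    variant_means = [mean(row) for row in ratings]            # last column
    participant_means = [mean(col) for col in zip(*ratings)]  # last row
    overall = mean([value for row in ratings for value in row])  # last cell
    return variant_means, participant_means, overall


variant_means, participant_means, overall = summarize(ratings_r2)
print([round(m, 2) for m in variant_means])
print([round(m, 2) for m in participant_means])
print(round(overall, 2))
```

For recording 2 this reproduces the per-variant means (3.33, 3.67, 2.50, 3.17, 2.50, 2.17) and the overall mean of 2.89 reported in the table.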
Recording 1
Participant    P1      P2      P3      P4      P5      P6      Mean
V0             2       4       3       5       4       4       3.67
V1             3       3       5       3       5       4       3.83
V2             1       2       2       5       4       4       3.00
V3             4       4       3       4       3       4       3.67
V4             3       3       4       3       2       3       3.00
V5             1       2       5       2       3       4       2.83
Mean           2.33    3.00    3.67    3.67    3.50    3.83    2.78

Recording 2
Participant    P1      P2      P3      P4      P5      P6      Mean
V0             2       3       2       5       4       4       3.33
V1             2       4       4       4       4       4       3.67
V2             3       3       2       2       3       2       2.50
V3             2       4       4       4       3       2       3.17
V4             2       3       4       2       2       2       2.50
V5             1       4       2       2       2       2       2.17
Mean           2.00    3.50    3.00    3.17    3.00    2.67    2.89

Recording 3
Participant    P1      P2      P3      P4      P5      P6      Mean
V0             3       4       2       3       4       5       3.50
V1             3       5       5       4       3       4       4.00
V2             4       4       3       2       2       2       2.83
V3             3       4       3       5       2       2       3.17
V4             3       4       1       3       2       2       2.50
V5             3       4       2       1       1       2       2.17
Mean           3.17    4.17    2.67    3.00    2.33    2.83    3.02

Table E.1: Accompaniment ratings for recordings 1, 2, and 3

E.2 Complete results for variants

The accompaniment ratings given in Table E.2 form the basis of the average ratings of the variants. In each table, the columns P1 to P6 show the rating given by each participant for one variant across the three recordings. The rows Ri-Vj indicate the recording and the variant that was rated. The last column shows the mean rating given by all participants for a variant.

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V0          2     4     3     5     4     4     3.67
R2-V0          2     3     2     5     4     4     3.33
R3-V0          3     4     2     3     4     5     3.50

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V1          3     3     5     3     5     4     3.83
R2-V1          2     4     4     4     4     4     3.67
R3-V1          3     5     5     4     3     4     4.00

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V2          1     2     2     5     4     4     3.00
R2-V2          3     3     2     2     3     2     2.50
R3-V2          4     4     3     2     2     2     2.83

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V3          4     4     3     4     3     4     3.67
R2-V3          2     4     4     4     3     2     3.17
R3-V3          3     4     3     5     2     2     3.17

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V4          3     3     4     3     2     3     3.00
R2-V4          2     3     4     2     2     2     2.50
R3-V4          3     4     1     3     2     2     2.50

Participant    P1    P2    P3    P4    P5    P6    Mean
R1-V5          1     2     5     2     3     4     2.83
R2-V5          1     4     2     2     2     2     2.17
R3-V5          3     4     2     1     1     2     2.17

Table E.2: Accompaniment ratings for variants 0-5

Appendix F Study documents

F.1 Session checklist

Equipment: Laptop and headphones. Adjust volume levels to 40 percent on the laptop.
Listen to the system output and ensure that the secondary is heard on the left headphone and that the lead and melody are heard on the right headphone.

Materials:
1. A printout listing the sequence for presenting the six performances (ensuring counterbalancing).
2. Study documents:
• Researcher session script
• Demographic questionnaire
• Participant observation form
• Definition sheet (for participant)
• System evaluation form
• Recording sequencing document for each participant
3. Three (3) performance recordings. For each performance recording there are six (6) variants:
(a) One (1) synthesized performance with original accompaniment [R00.wav]
(b) One (1) synthesized performance with same tension scores as original accompaniment [R-01.wav]
(c) Two (2) synthesized performances within tension range [plus - R-02.wav, minus - R-03.wav] and two (2) synthesized performances outside tension range [plus - R04.wav, minus - R-05.wav].

Protocol:
1. Explain what is expected during the session.
2. Gather demographic information. [Researcher reads the questions and completes the form]
3. Explain definitions. Hand the participant the definition sheet (with the definitions and rating criteria) and then explain the definitions (timing, accents, and strokes) and rating criteria.
4. For each performance recording:
(a) Remind the participant to keep the definitions (timing, accents, strokes) and rating criteria in mind (ask them to look at the definition/ranking sheet while they are listening).
(b) Play the variant.
(c) Ask them the questions (creativity, quality) in the script to check whether they are violating instructions.
(d) If they ARE violating, play again (without asking them to rate).
(e) If not, ask them to rate AND ask them if sowkhyam is sufficient (and note if they indicate any additional aspects). [Researcher will complete the form based on participant response]
(f) Move on to the next variant (do NOT then probe further about creativity, replay, etc.).
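The counterbalanced presentation order referred to in the materials list can be illustrated with a short sketch. The six variant sequences listed in Table F.1 are consistent with a balanced Latin square, in which a zig-zag base order is cyclically shifted to obtain the other orders. Whether the study's sequences were actually generated this way is an assumption, and the function name `balanced_latin_square` is hypothetical.

```python
def balanced_latin_square(n):
    """Build n orderings of n conditions.

    The base row alternates low and high indices (0, 1, n-1, 2, n-2, ...);
    row k adds k (mod n) to every entry of the base row.
    """
    base = [0]
    low, high = 1, n - 1
    take_low = True
    while len(base) < n:
        if take_low:
            base.append(low)
            low += 1
        else:
            base.append(high)
            high -= 1
        take_low = not take_low
    return [[(x + k) % n for x in base] for k in range(n)]


rows = balanced_latin_square(6)
for row in rows:
    print("".join(str(v) for v in row))

# Every variant occupies each presentation position exactly once.
assert all(sorted(col) == list(range(6)) for col in zip(*rows))
```

With n = 6 this yields the orderings 015243, 120354, 231405, 342510, 453021 and 504132, the same six variant sequences that appear in Table F.1. For an even number of conditions the construction also ensures that each variant is immediately followed by every other variant exactly once across the six orderings.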
F.2 Demographic questionnaire

Date:
Name of participant:
Participant age:
Gender of participant:
Years of relevant performance experience:
Relevant musical degrees (if any):
Other relevant experience (e.g., teaching, composing):

F.3 Participant variant sequence

Table F.1 shows the sequence of recordings and variants presented to each participant in the study.

Participant      Recording sequence    Variant sequence
Participant 1    R1 R2 R3              015243 504132 453021
Participant 2    R1 R3 R2              120354 504132 015243
Participant 3    R2 R1 R3              120354 231405 015243
Participant 4    R2 R3 R1              231405 120354 342510
Participant 5    R3 R1 R2              231405 453021 342510
Participant 6    R3 R2 R1              342510 453021 504132
Table F.1: Recording and variant sequences

F.4 Evaluation sheet

Variant 1
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

Variant 2
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

Variant 3
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

Variant 4
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

Variant 5
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

Variant 6
sowkhyam of timing and synchronization    1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accents                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of strokes                       1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
sowkhyam of accompaniment as a whole      1 (low) [ ]-[ ]-[ ]-[ ]-[ ] 5 (high)
Was the "sowkhyam" of the timing, accents and strokes sufficient to judge the acceptability of the accompaniment? [ ] Yes [ ] No
If not, what additional aspects did you use?

F.5 Participant observation form

The participant observation form that was used in the study sessions is presented below. Each session included three copies of the form, one for each recording.

Variant 1: Number of times listened to:
Variant 2: Number of times listened to:
Variant 3: Number of times listened to:
Variant 4: Number of times listened to:
Variant 5: Number of times listened to:
Variant 6: Number of times listened to:
Problems worth noting about the evaluation (quality of synthesis, creativity of playing):

F.6 Participant definition sheet

Definitions
• Timing and synchronization: alignment of secondary hits with the tala (beat positions) of the performance.
• Accents: the sequence of strong and weak hits played in the rhythm pattern.
For example, the accents of [takadimi takajono] are 4-4, and the accents of [num thi dhin dhin thi dhin dhin thi] are 3-3-2.
• Strokes: exact choice of secondary strokes played on the different (accented and non-accented) beats in a rhythm pattern.

Overall accompaniment rating information
• Rating 1: overall accompaniment is sowkhyam during NONE of the performance.
• Rating 2: overall accompaniment is sowkhyam during a LITTLE BIT of the performance.
• Rating 3: overall accompaniment is sowkhyam during about HALF of the performance.
• Rating 4: overall accompaniment is sowkhyam during MOST of the performance.
• Rating 5: overall accompaniment is sowkhyam during the ENTIRE performance.

[...]... accompaniments in a database which is queried to retrieve the accompaniment The accompaniments in the database are organized by their musical features Retrieval systems extract the necessary musical features from the input, package them into a data format which is suitable to query the database, and retrieve the accompaniment The best matching accompaniment is retrieved and played Impact is an accompaniment. .. 12.1 Average accompaniment rating per recording 74 12.2 Average rating for variants of recording 1 75 12.3 Average rating for variants of recording 2 75 12.4 Average rating for variants of recording 3 76 12.5 Accompaniment ratings for variants of recording 1 77 12.6 Accompaniment ratings for variants of recording 2 78 12.7 Accompaniment ratings... database The best accompaniment is selected according to a measure of mathematical distance between the query (called target case) and each of the patterns in the database Given a single input, the system always returns one accompaniment (the best matching accompaniment) as output Cyber-Joao is an adaptation of the Impact system that optimizes the number of parameters used for the retrieval (Dahia...
about the validity of the accompaniment played using such a system 14 3.2 Proposed solution The central goal of the work reported here is to develop an algorithm that generates valid alternate variations of secondary accompaniment for recordings of Carnatic musical performances The central insight – the main original contribution – is that the generation of valid alternate variations of secondary accompaniment. .. improvisational accompaniment systems and highlights an important problem in this field Improvisational accompaniment systems differ from score-following, solo-trading, and tap-along systems in that they are able to produce multiple valid musical alternatives for the same performance Developing musical accompaniment systems that generate multiple valid accompaniments by modeling the constraints of accompaniment. .. accompaniment can be accomplished by formally representing the relationship between lead and accompaniment in terms of musical tension By formalizing tension ranges as constraints for acceptable accompaniment, an algorithmic system is able to generate alternate accompaniment choices that are acceptable in terms of a restricted notion of sowkhyam (roughly, musical consonance) In the context of this... (relative to the lead) is considered equally sowkhyam as the original The research resulted in a system that can take a transcribed selection of a Carnatic musical performance and algorithmically generate new performances, each with different secondary percussion accompaniment that meet the criteria of restricted sowkhyam as well as the original secondary accompaniment In order to evaluate the ability of. .. produce alternate valid secondary accompaniments for a Carnatic musical performance, a study was conducted with musical experts to address three related research questions: 15 • RQ1: Does the system produce secondary accompaniment that is rated at least as high as the original accompaniment? 
• RQ2: Are accompaniments inside the range better (i.e., do variants within the range get higher scores than variants... during this thesis research The method included the analysis of Carnatic music performances, development of different models of accompaniment playing, their implementation as computer programs, and their evaluation 4.1 Analysis of the Carnatic musical performances Since the rules and constraints for secondary improvisation in Carnatic ensemble are not clearly specified in the literature (or in oral tradition),... their ability to generate multiple accompaniments that are musically valid In order to address these shortcomings, a 17 third model was developed that generates multiple valid accompaniment for the same lead input: the tension model 4.3 Evaluating the tension model In order to evaluate the ability of the system to produce alternate valid secondary accompaniments for a Carnatic musical performance, a study

Date posted: 30/09/2015, 10:11
