Bernard Mont-Reynaud transcript

NB: time codes are a rough guide only. They are retained from the original audio, which has since been edited for clarity and continuity.

James Parker: OK. Fantastic. Well, if you're OK to begin, shall we just begin? Maybe you could start off by introducing yourself, however feels right to begin with, and we can go on from there. OK. Go for it.

[00:01:19] Bernard Mont-Reynaud: Let's see. Well, I was born in France, in Marseilles, you know, up in the South. I studied there until I went to Paris at the Ecole Polytechnique, and I hesitated between multiple professions. Was I going to be an architect or was I going into electronic music or computers? And computers won, I would say, very much so, based on this idea that you could capture elements of thought, then encapsulate them and pull them together.

[00:02:00] Bernard Mont-Reynaud: It was sort of an early notion of AI, if you want, the pull of AI, but also the pull of just modularizing thought and assembling thoughts together. And that was the winning thing. And I went to work on, actually, computer-assisted instruction for a number of years. You may not believe it, but as early as 1968, which was the time I'm talking about, there were large systems doing CAI. Professor Suppes at Stanford had some large computer-assisted instruction systems at the time.

[00:02:41] Bernard Mont-Reynaud: What we called CAI in those days was computer-assisted instruction, whereas now in my work at SoundHound, it's conversational AI, for the same initials. In any case, I went to do this at a place called IRIA, which became INRIA, the French national institute for research in computer science and automation. I worked there for four years doing CAI. But the appeal of Stanford was enormous for me. And so I went to Stanford to do a PhD.

[00:03:26] James Parker: Was the appeal because, you know, Stanford is a famous university in America, or is it specifically because that was the place to do that specific work in computing or a bit of everything?

[00:03:44] Bernard Mont-Reynaud: It's sort of in the middle. Stanford was, remember back in those days, computer science wasn't what it is today. It was pretty much a pioneering field. And Stanford was one of the pioneering institutions. I also applied to MIT and UCLA and what was the fourth one and was admitted everywhere. But Stanford was where my heart went. And I had been on the West Coast a few years earlier, visited, you know, San Francisco and the area. I was just fascinated. So I just, yeah, Stanford was my choice.

[00:04:24] Bernard Mont-Reynaud: It was because they had the best computer science programs along with MIT.

James Parker: And you quickly, or maybe not so quickly, got into working on computer music at Stanford? Or was that, because you said before briefly that music was your sort of alternate love.

[00:04:48] James Parker: And so how did those converge or what was the relationship between those two in those early days at Stanford?

[00:04:58] Bernard Mont-Reynaud: Yeah, good question. Back in those days, the thinking was about electronic music, or computer music, which I did not know about at the time, and I'm speaking 1967, '68. It was just out of an interest in music. But, you know, one thing I always regretted in my life is not having a formal education in music. And this was something I wanted to do for my own enjoyment, for my own passion even. But I was a computer scientist. I had a very rigorous training in math and science, and was becoming a computer scientist.

[00:05:40] Bernard Mont-Reynaud: The idea of music was not there yet, and it wasn't for a while. What happened was, while I was doing my PhD at Stanford, I took my time and I sampled every topic under the sun in the computer science curriculum. I was involved also in natural language understanding and other things, and had very solid training in optimization and so on, hardcore topics of computer science as opposed to applications. And let's see, I'm losing the thread here. Yes, I wasn't… I wasn't doing anything with music.

[00:06:28] Bernard Mont-Reynaud: I went to be an assistant professor at Berkeley. And, how can I say, I had lost my passion for core CS, you know, that ultimate-optimization kind of quality, and I started to be attracted to more perceptual fields, visual and auditory, and that became my interest. And this is when, after four years in Berkeley, I left behind a tenure-track position to go do research on computers and music. Not on producing computer music, but there's a story on this, of how come I went to CCRMA ('Carma'), and I could get into that story at a separate stage.

[00:07:21] Bernard Mont-Reynaud: But basically, I was PhD plus four, or almost four, when I left the sort of core computer science track to go into the audio and music area.

[00:07:39] James Parker: I'd love to hear more about how that played out. I mean, yeah, what work were you even doing on audio and music in that period? What work was being done in the field? And then how did that take you to, well, I've been calling it the CCRMA at Stanford, but Carma, as you put it?

[00:08:03] Bernard Mont-Reynaud: Yes. So, yeah, I want to take a deep breath on that one, because now I'm restarting about CCRMA and all the people and then how come I joined in. But basically, if you go back to the founding of CCRMA, that was around 1977. And there were four founders: John Chowning, who had become the director, and two other people, and the fourth was Andy Moorer, a student who at the time was completing his PhD in 1977, at the same time I was, and we had known each other at the Stanford AI Lab.

[00:08:53] Bernard Mont-Reynaud: But he wasn't involved with music at the time, at least that I know of, and our relationship wasn't based on music, it was based on AI, but we happened to have esteem for each other. And he independently had applied for a grant to the NSF, if you know about the National Science Foundation in the US, a grant to do musical intelligence. At that point, you know, when you're a pioneer, you have no limits, you have no boundaries, everything is open field.

[00:09:37] Bernard Mont-Reynaud: And so in that grant, he was going to address everything that could possibly have to do with computers and music or computer analysis of music, and that involved the signal processing, the polyphony, the, you know, musical intelligence in its wider form, and so on and so forth.

[00:09:56] James Parker: So, hang on, is that James Moorer, is that the same person that I know as James Moorer?

[00:10:06] Bernard Mont-Reynaud: Yeah, James Andy Moorer. Did you speak with him?

[00:10:09] James Parker: No, I haven't spoken with him, but I read some of his work on music transcription. It seems like in the mid-1970s he was really the first person publishing on automatic music transcription.

[00:10:26] Bernard Mont-Reynaud: Exactly, exactly. And this is where the connection happens, because he completed his PhD in 1977, the same year I completed mine. And his work involved the separation of two notes on the guitar, a guitar player, doing the signal processing just to separate two notes, you know, in a succession of notes. That was state of the art at that time. And yes, so indeed, that's the same person. That's the research he was doing for his PhD. And he became a co-founder of CCRMA.

[00:11:15] Bernard Mont-Reynaud: And he had applied for a grant with NSF where, extrapolating from his research on separating two notes on the guitar, if you want to put it that way in a humorous way, the project was going to solve just about every problem under the sun for, you know, music analysis by computer. And this is, I guess, what you do in grants, you oversell a bit, but especially when this is a completely pioneering field and there's nothing to limit your ambitions.

[00:12:01] Bernard Mont-Reynaud: So then, okay, this is what happened. But then he went to Paris, to a place called IRCAM, which you may have heard of or not. And Andy is, I mean, I don't know whether to use the word genius, or to say he's the incredible engine that could, but, you know, he could write not only incredible software, he could design incredible hardware, and so on and so forth.

[00:12:40] Bernard Mont-Reynaud: So at CCRMA he was involved in all sorts of software, but later on he was hired by Lucasfilm and he built hardware for George Lucas that would help with the Star Wars series, the Star Wars sequence. Yes.

[00:13:06] James Parker: Trilogy.

[00:13:09] Bernard Mont-Reynaud: Trilogy. I was going to say trilogy, but wait, there were more than three. That's my hesitation. It wasn't a trilogy in the end, was it? I didn't want to misrepresent the number, but at the time it was probably a trilogy. But I get slightly ahead of myself. So he applied for this grant and then he went to IRCAM in Paris to install, you know, a software environment similar to CCRMA's. They were duplicating the CCRMA environment in Paris for Pierre Boulez.

[00:13:46] Bernard Mont-Reynaud: And then he got hired by Lucasfilm to go build the software for Star Wars. And by the time this all happened, the grant got granted and he wasn't around to do this. So he was not going to deliver on any of these promises.

[00:14:09] Bernard Mont-Reynaud: And somehow I heard about this and I kind of managed my way into leaving my assistant professor position at Berkeley, where I was getting a bit frustrated, a little tight at the elbows, because I wanted to get into something more perceptual, you know, possibly sound or music or visual stuff, as opposed to core technique, core math. And so the timing was good. I don't know exactly on what basis I tried to convince them that I was the person for the job, but it worked.

[00:14:58] Bernard Mont-Reynaud: They hired me to do this to actually head that project. And it was all new to me. And because it was all pioneering work, you know, I didn't feel I had enormous constraints about that.

[00:20:31] Bernard Mont-Reynaud: I forgot that I was supposed to give a biography and I just started unfolding the whole story as opposed to a quick summary. So I could try to give you a quick summary, but chances are I get lost along the way. I mean, quick summary: I spent 10, 11 years at CCRMA. During that time, I did various consulting. Let's see, then I went into industry for the rest of my life, been in a lot of places like Xerox PARC and Sony and others.

[00:21:14] Bernard Mont-Reynaud: I went into SoundHound over 12 years ago; it'll be 13 years at SoundHound. But basically, it's all around the valley, and some of my best times have been at CCRMA and SoundHound and Sony.

[00:21:36] Bernard Mont-Reynaud: So, yeah, so that's the quick collapsed version.

[00:21:40] James Parker: That's very helpful.

[00:21:41] Bernard Mont-Reynaud: I've spent 50 years, 54 years.

[00:21:45] James Parker: Oh, no, that's incredibly helpful. I was just wondering if it would be possible to do a similar overview of your time at CCRMA. Like, you know, in those 10 years, where did you begin? Because you said that it was kind of quite blue-sky thinking, you know, the industry was involved, but not sort of driving any specific commercial outcomes. So do you remember the broad arc of your time at CCRMA, like what sort of things you were working on at the beginning and where you ended up?

[00:22:21] Bernard Mont-Reynaud: Yeah, sure. Of course, I remember all that stuff, it's near to my heart; I remember way more than we have time to go over. Initially, you know, it was all pioneering and I was discovering that too. It was like, okay, where do I begin? What do I do? I found at CCRMA one of the other co-founders, Loren Rush, who helped me a little to get started and to collect some examples that seemed of manageable complexity. And the focus at that time was music transcription.

[00:23:02] Bernard Mont-Reynaud: And I discovered, among other things, that finding the pitches, hearing the pitches in the sound, was not the hardest thing, or at least I had somebody else as a consultant doing it. Basically, you say what frequency, what time, what amplitude, and you've got those few numbers coming out. But I discovered that transcribing is a problem of itself. Once you have the events, you have to figure out if there's a tempo, whether what you have is a quarter note somewhere, or triplets, and so on and so forth.

[00:23:52] Bernard Mont-Reynaud: That's the key. And there are a number of issues that have to do with musical intelligence, as opposed to just extracting events. So I discovered, for one thing, the structure of this, and started developing some representations for notes, little representations starting with events, and then gradually getting enriched with properties going towards transcription, until eventually we were able to put on note names, note heads, note durations.
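
[Editor's note: to make the quantization problem concrete, here is a toy sketch, my own illustration rather than code from the project, and all names are invented. It chooses between, say, eighth-note and triplet grids by snapping detected onsets to whichever subdivision of the quarter note explains them with the least total error:]

```python
import numpy as np

def quantize_onsets(onsets_sec, bpm, subdivisions=(1, 2, 3, 4)):
    """Snap raw event onsets to the rhythmic grid that fits best.

    For each candidate subdivision of the quarter note (2 = eighths,
    3 = triplets, ...), measure the total snapping error and keep the
    grid that explains the onsets with the least error. The tempo (bpm)
    is assumed known; real systems must infer it and track drift.
    """
    beat = 60.0 / bpm                       # quarter-note duration in seconds
    onsets = np.asarray(onsets_sec, dtype=float)
    best = None
    for div in subdivisions:
        grid = beat / div                   # grid spacing for this subdivision
        snapped = np.round(onsets / grid) * grid
        err = np.abs(onsets - snapped).sum()
        if best is None or err < best[0]:
            best = (err, div, snapped)
    return best[1], best[2]                 # chosen subdivision, snapped times
```

For example, slightly jittered onsets at triplet positions of a 120 bpm beat are best explained by the triplet grid, not the eighth- or sixteenth-note grids.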

[00:24:30] Bernard Mont-Reynaud: I found out that there was at CCRMA early pioneering work in music printing, so I could create a notation. Professor Leland Smith had a system for musical typography, and you could feed this data to his program to get a terrific-looking musical score. So anyway, the beginning was putting those pieces together, figuring out what the components of the problem were, and that's pretty much what I achieved within that first period, as well as applying for a follow-up grant.

[00:25:13] Bernard Mont-Reynaud: So, that was round one. On round two, after I returned from Paris, I got deeper into the capabilities of the system, in particular the musical intelligence. But I realized that I had to think early on about planning the sequel, the following grant, and I realized I had to choose one of two paths from that point. Up to then, the project had been about everything, from detecting events, to separating the notes, to creating a musical score, and so on and so forth.

[00:26:00] Bernard Mont-Reynaud: I thought I had to go deeper into either the music intelligence or the event separation intelligence. Call it hearing, or call it musical listening, or musical understanding. You know, as it becomes less of a pioneering effort, the effort deepens, and it diversifies also. And so, I thought about this for a while, and I decided to go into hearing.

[00:26:37] Bernard Mont-Reynaud: Part of my reason goes back to the fact that, while I had some musical training, I did not have the depth of having been to music school, to totally do musical intelligence justice, however interested I might be. And the field of sound separation, source separation, also looked very much like a pioneering area, wide open. Musical intelligence was quite open too, but I wasn't as prepared for it, and I felt my limitations.

[00:27:30] Bernard Mont-Reynaud: I didn't know what my limitations would be in source separation. Anyway, that's where I aimed my next round, for the third round of funding.

[00:27:47] Bernard Mont-Reynaud: In the meantime, I should probably mention a couple of students. You know, when you have a research grant, you hire some students, and I had a couple of PhDs happening on the project. One involved the transcription of Afro-Cuban drumming; it used some of my software that did quantization of durations and so on, but he supplied his own event detection, and then we put it all together. And he's been a music professor now for quite a while.

[00:28:36] James Parker: Who is that?

[00:28:36] Bernard Mont-Reynaud: That's Andy Schloss, his name is Andy Schloss, S-C-H-L-O-S-S, and I can give you a contact. He's in Victoria, in Canada. Yeah, I will provide follow-up on that. The second PhD came later; that's David Mellinger, and that happened, you know, on the next round. So, I'm getting ahead of myself again, but basically, as I said, there was the second round, where I was still pursuing this double goal, but I felt I had to move towards either the musical intelligence or the auditory intelligence, and I went into the second for the third round.

James Parker: Do you remember what year that was?

[00:29:41] Bernard Mont-Reynaud: This is where the story, I would say, '86.

James Parker: Okay, '86.

Bernard Mont-Reynaud: Yep.

[00:30:02] Bernard Mont-Reynaud: Right. So, in '87, I had also developed my own approach to event detection, and it was very visually oriented. I would create spectrograms. I know this can get a little technical, but you know, there's such a thing as a spectrum and a spectrogram, and it's normally put in a frequency-time representation. Now, imagine that the frequency dimension, instead of being the linear frequency in hertz, becomes the logarithm of frequency.

[00:30:50] Bernard Mont-Reynaud: That means if you're in linear frequency, you have, say, zero, 400 hertz, 800 hertz. Once you've compressed it, the factor of two from 400 to 800 occupies the same amount of space as 200 to 400. It's like an octave. Octaves become equal spaces, and so do semitones; it's like a keyboard. Okay, so this was more appropriate for music. But not only that: I discovered you could actually do pitch detection on that representation, because the harmonic series is a fixed pattern, a fixed vertical pattern.

[00:31:39] Bernard Mont-Reynaud: And I could render audio into this representation, which involved a spectrogram in log f, a semitone spectrogram, I'm going to call it. And I could do pattern recognition, as in image convolution, on this image representation, and obtain event detection that way. And I pursued some of this work for a while. This started around '85, '86, '87, which was sort of the core period, where I had taken control of the event detection with my own approach.
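
[Editor's note: as a rough illustration of the idea, here is my own reconstruction, not the original CCRMA code; function names and parameters are invented. On a log-frequency "semitone" axis, a harmonic series keeps the same shape whatever the pitch, so pitch detection reduces to sliding a fixed harmonic comb across the spectrum:]

```python
import numpy as np

def semitone_spectrum(x, sr, n_fft=4096, fmin=55.0, n_bins=72):
    """Magnitude spectrum of one frame, resampled onto a semitone axis
    (12 bins per octave). On this axis, changing the pitch just shifts
    the harmonic pattern without changing its shape."""
    spec = np.abs(np.fft.rfft(x[:n_fft] * np.hanning(n_fft)))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    log_freqs = fmin * 2.0 ** (np.arange(n_bins) / 12.0)
    return np.interp(log_freqs, freqs, spec), log_freqs

def detect_pitch(x, sr, n_harmonics=5):
    """Slide a fixed harmonic comb over the semitone spectrum and return
    the frequency where the comb picks up the most energy."""
    spec, log_freqs = semitone_spectrum(x, sr)
    # Harmonic n sits 12*log2(n) semitone bins above the fundamental,
    # independent of the fundamental itself: a fixed pattern.
    offsets = np.round(12 * np.log2(np.arange(1, n_harmonics + 1))).astype(int)
    scores = [sum(spec[k + o] for o in offsets if k + o < len(spec))
              for k in range(len(spec))]
    return log_freqs[int(np.argmax(scores))]
```

The key identity is log(n·f0) = log(f0) + log(n): the comb's internal offsets never depend on the pitch, only its position does, which is what makes convolution-style matching work here.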

[00:32:33] Bernard Mont-Reynaud: And I decided to go into source separation, like when there are multiple pitches at the same time, okay. So now, we had mentioned Al Bregman. What was wonderful is that Professor Al Bregman came and spent one year at CCRMA while he was writing a book on auditory scene analysis, a 900-and-some-page compendium of his research in psychoacoustics, the psychoacoustics of how we detect streaming.

[00:33:16] Bernard Mont-Reynaud: How do we know, psychologically, that there's a source here and a source there, that we have separate sources, versus, you know, other events that don't group as sources? This was fascinating. And it was great for me to hear what he had to say, and it was great for him to have somebody really interested to hear about his thoughts. So we had this little club, he and I, before the book came out, where I got to receive all his ideas on this, and it fit exactly in the right place.

[00:33:58] Bernard Mont-Reynaud: Because I wanted to know his principles for bringing sources together. I don't know if you're familiar with the book, but basically they are grouping principles that allow sound components to combine into separate auditory streams.

[00:34:20] James Parker: Yeah, the coining of that term, the auditory stream, seems to be one of the key things from the book, to start thinking in terms of auditory streams. He has a whole section on it. And it's really interesting. I've read the book, or I've tried to read the book, I'm not a scientist, but everything you say really correlates with my understanding.

[00:34:49] James Parker: It does seem that he was doing this work in psychoacoustics before he came to CCRMA, but, you know, he's very explicit in his acknowledgements about the influence of his time at CCRMA: your influence, John Chowning's and others'.

[00:35:14] James Parker: And it's always been interesting to me to think about the relationship between this work on hearing more generally, and on music specifically. Because it seems like around the time you're describing, there are basically two major streams, to use that word again, in auditory-oriented AI: there's speech recognition, or speech understanding or whatever, and then there's work on music. And then around the time you're describing, suddenly, or not so suddenly…

[00:35:56] James Parker: it starts to broaden out, it starts to become much more about hearing, as you're saying, or much more about, you know, the separation of different auditory events; things like auditory scene analysis and auditory event detection start to come out. And then it seems from the outside that these were relatively distinct fields. I mean, I should ask whether you had any relationship with people doing work on speech recognition and so on at the time, but it seems like they were relatively separate.

[00:36:33] James Parker: And then they start to come together under a larger umbrella, because suddenly all the speech people start to need source separation. They need to do scene analysis in order to, you know, separate out speech from office sounds, environmental noise, and so on.

[00:36:59] James Parker: So that's the story that I seem to be finding, digging around in all of these reports and PhD theses and so on, and it sounds like what you're describing, but I don't know if I'm misrepresenting it at all.

[00:37:16] Bernard Mont-Reynaud: Well, yes. First, to clarify one thing: I was not involved in speech at the time. My interest was in source separation, which is a broad interest, but the examples were musical examples, right?

[00:37:38] Bernard Mont-Reynaud: There's no doubt that in the speech community people have been very interested in separating the voice itself from the background noise, which is also a kind of separation, but it has a different focus, in the sense that it's not about streaming per se. Noise, for example, is not considered a stream; you basically try to find the dominant stream, the dominant source, and everything else is what you want to remove.

[00:38:14] Bernard Mont-Reynaud: So I think it's taken a long time, and in many ways it hasn't completely happened, for speech to treat the voice as just one of multiple sources in the environment, and also to recognize other sources in the environment. As far as I know, that particular shift hasn't happened, to treat the voice as just one of many, of many events. If you use a visual analogy, you know, you can recognize chairs and tables and people in the image.

[00:38:59] Bernard Mont-Reynaud: And although maybe you're particularly interested in people, or people's faces, you might still recognize a chair or a table in the visual field. Object formation is the same thing, at one level, whether you're in vision or in audition: you're doing object separation, object formation, and you recognize sources and give them properties. But in speech, the equivalent is that you're still only interested in, say, the face; you're not interested in all these other things.

[00:39:39] Bernard Mont-Reynaud: So I don't fully see the same type of parallelism that you see between speech and music. Also, there has been a tendency, I would say, for a field of source separation, auditory source separation, to exist on its own, and it would pick examples in sound or image as the case might be, but not be completely attached to either speech or music. If you want, there are the people who look at this psychologically and the people who look at this from the point of view of applications, and they're not entirely the same. I mean, in a research lab they can be, but source separation has been driven by research interests and only to a small degree by applications in the real world. It's still expensive.

[00:41:00] Bernard Mont-Reynaud: Now, this has started to change, and we're beginning to see neural networks that have some capability in that area, but it's still not very widespread.

[00:41:22] James Parker: Okay.

[00:41:23] Bernard Mont-Reynaud: Maybe I could return, because we're on Bregman here and on auditory scene analysis. And that was a very important book. Yes, he had been interested in this topic for a while, and then he decided to write a book on it and put it all together. And I was familiar with a lot of this work: before the book was written, he was just dumping this stuff on us at CCRMA. And I was one of the most attentive listeners to his ideas, because that's the field I was really getting into. And I must say, he did a lot of psychoacoustics, and it wouldn't be within my capability to do all of this.

[00:42:18] Bernard Mont-Reynaud: He's that kind of experimental psychologist. I am not. And not only that, it's not truly my interest to carry out; I don't have the patience to do it. But to hear him talk about it, that was wonderful, because he had done all these experiments. And I came to summarize a 900-page book into a couple of sentences in my head, which is to say: take any criterion. Suppose you have two sources, one goes beep, beep, beep, beep, and the other goes bop, bop, bop, bop. So there's one high, one low, and you do beep, bop, beep, bop.

[00:43:06] Bernard Mont-Reynaud: I cannot produce both at the same time. I mean, I can do beep, boop, beep, boop, beep, boop, beep. And if I do it too slow, they start to… but the kind of experiment he would do is to vary the distance between those frequencies, the high and the low, and to vary the timings. And he would show that if the timing makes them very close, and one goes beep, beep, beep, beep, and the other boop, boop, boop, boop, they separate as two streams, a high and a low stream. Versus, if you have one that goes beep, boop, beep, boop, with a long time between them, they become a single stream.

[00:43:57] Bernard Mont-Reynaud: They become a single line going up and down. And in his experiments he would do this and show when it streams together and when it does not. And he basically showed, in every case, across any number of dimensions of these kinds of things, that there are trade-offs: the same mechanism that makes it stream can make it not stream.
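
[Editor's note: the kind of stimulus being described can be sketched in a few lines. This is my own illustrative reconstruction, not Bregman's actual materials, and the parameter names are invented. The two knobs are the frequency gap between the tones and the time between them:]

```python
import numpy as np

def streaming_stimulus(f_low=400.0, df_semitones=7, tone_ms=60,
                       gap_ms=40, n_pairs=8, sr=16000):
    """Alternating high/low tone sequence of the kind used in streaming
    experiments. Widening df_semitones or shortening gap_ms pushes
    perception toward two separate streams (a high one and a low one);
    a small frequency gap plus slow timing fuses the tones into a
    single up-and-down melody."""
    f_high = f_low * 2.0 ** (df_semitones / 12.0)
    tone_n = int(sr * tone_ms / 1000)
    gap_n = int(sr * gap_ms / 1000)
    t = np.arange(tone_n) / sr
    env = np.hanning(tone_n)          # soft onsets/offsets, avoids clicks
    seq = []
    for _ in range(n_pairs):
        for f in (f_high, f_low):     # alternate high and low tones
            seq.append(env * np.sin(2 * np.pi * f * t))
            seq.append(np.zeros(gap_n))
    return np.concatenate(seq)
```

Rendering this with, say, `df_semitones=12, gap_ms=20` versus `df_semitones=2, gap_ms=200` and listening back demonstrates the trade-off directly.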

[00:44:30] Bernard Mont-Reynaud: So, okay. It's not that one kind of example will stream on frequency and another on time difference. No, there are always trade-offs between the two, if you will. There are multiple factors, and varying them against one another can cause streaming or not. So that's the first sentence, if you want, of the summary. And the second is: this implies, it seems to me, that there is a central mechanism for separation.

[00:45:10] Bernard Mont-Reynaud: Because everything that comes from the signal processing, from the raw data, can have the outcome of separation or not. So, to me, that speaks for a uniform mechanism that decides whether sources belong together or not. And that's my summary of a 900-page book, and it's essential, because it had implications for the architecture of building that kind of system.

[00:45:39] James Parker: Right. So, did you then begin to operationalize that in the systems you were building?

[00:45:50] Bernard Mont-Reynaud: Absolutely. Absolutely, yes.

[00:45:52] James Parker: And does it suddenly lead to significant improvements in…

[00:45:55] Bernard Mont-Reynaud: Well, okay, you're assuming that the system was already at full capability, but it wasn't. It guided how we thought about it, how we started building the pieces, but not all the pieces were there. But yes, indeed, it did lead us in that direction. And this is where that second PhD thesis I was talking about came along. His name is David Mellinger, and he did his thesis sort of showing those trade-offs. And there's another parenthesis on this. There's a phenomenon that you may or may not have heard about.

[00:46:50] Bernard Mont-Reynaud: It goes back to some work of John Chowning from way back, and you know about frequency modulation. And you may have heard about this phenomenon called frequency co-modulation. You have a collection of partials. I'm showing things in the spectrogram, and each of my fingers is a partial, and it goes like… And it has this sort of artificial sound, and it's not well separated from other things if you have multiple. But suddenly, you put frequency modulation on it. It goes… And the moment you do this, the sounds belong together as one.

[00:47:42] Bernard Mont-Reynaud: It sounds natural. It sounds like a voice, and it separates from anything else. So one of the many dimensions that Bregman talks about is co-modulation, the fact that those various partials go up and down at the same time. They modulate at the same time; it's called co-modulation. But it goes way back to the FM synthesis technique: by putting in this frequency co-modulation, it created this voice effect that was overwhelming to the auditory system. This effect is overwhelming because it immediately causes source formation.
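
[Editor's note: a minimal sketch of the effect, my own and with invented parameter names: a few static partials versus the same partials sharing one vibrato. Played back, the co-modulated version tends to fuse into a single voice-like source:]

```python
import numpy as np

def partials(freqs, dur=0.5, sr=16000, rate=5.0, depth=0.01, comodulate=True):
    """Sum of sinusoidal partials with an optional shared vibrato.

    With comodulate=True, every partial's instantaneous frequency is
    scaled by the same slowly varying factor, so they all move up and
    down together: frequency co-modulation, the grouping cue that makes
    them fuse into one source. With comodulate=False the partials are
    static and tend to be heard as separate, artificial tones."""
    t = np.arange(int(sr * dur)) / sr
    out = np.zeros_like(t)
    for f in freqs:
        if comodulate:
            inst_f = f * (1.0 + depth * np.sin(2 * np.pi * rate * t))
        else:
            inst_f = np.full_like(t, f)
        phase = 2 * np.pi * np.cumsum(inst_f) / sr  # integrate frequency
        out += np.sin(phase) / len(freqs)
    return out
```

Comparing `partials([200, 400, 600], comodulate=False)` with the `comodulate=True` version is the shared-fate demonstration Bregman describes: the cue is not the vibrato itself but the fact that all the partials carry it together.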

[00:48:32] Bernard Mont-Reynaud: Okay, so this was a parenthesis from David Mellinger's thesis into, you know, Bregman and all this, with a parenthesis back to John Chowning. And now we're back: frequency co-modulation was one of the features that David Mellinger focused on, as well as some others. And yes, we used these ideas in his thesis, which were based on Bregman and on the architecture, which is a central mechanism for grouping features. But the whole system was not built. Again, it was pieces; there's so much to be done.

[00:49:25] Bernard Mont-Reynaud: The auditory system is extremely complex and capable, and we only built some pieces of it. It's the same as a vision system. Vision has so many things in it, and you build pieces of it. You might build a piece that has to do with occlusion; another piece has to do with texture and color; and there's perspective; and I could go on and on. Well, the same is true in the auditory domain. In Bregman, you'll see that he uses visual metaphors all the time for what's happening in the auditory domain.

[00:50:02] Bernard Mont-Reynaud: And the reason is that we see things and we understand occlusion. But auditory masking is much harder to represent, even though it's essentially the same thing as occlusion. We hear it, but since we can't see it, we can't easily talk about it or point to it. Because sound only exists in motion. Sound does not exist at a frozen moment. An image can exist at a frozen moment: you can point to this piece and that piece and talk about it. You cannot do that in sound very easily. It's only moving. So I go parenthesis within parenthesis.

[00:50:49] Bernard Mont-Reynaud: But to answer your question, yes, we put what we could into the architecture of the system. And the plan was to continue. But let's see.

[00:51:04] James Parker: So, this is a fair way into your time at CCRMA. And soon in the story, you must leave this work, sort of, you know, unfinished in a certain way, and move on. Is that right? Are we getting to that point in the tale? You left this sort of, I don't know how you would describe it, basic research, maybe, into machine hearing, and moved more into industry? Is that how you think of it?

[00:51:42] Bernard Mont-Reynaud: Yeah. Let me talk some more about the transition and how it happened. Because by then, between all the ideas of Bregman and all that I had built over a succession of systems, I was ready for a major onslaught on machine auditory analysis, right? And I envisioned a larger grant than I had had before. So I wrote a grant proposal and went to DARPA, and there was some interest, you know, but there was the question of how does this connect to industry or applications?

[00:52:35] Bernard Mont-Reynaud: And you know, at DARPA, when you go defend, I went to Washington to defend my proposal, and there are people representing NSF and ONR and the NSA and the different, you know, different agencies that might be interested in supporting the grants. And I felt it was a strange feeling. I felt they were interested, and their mind wasn't quite there. They both were present and absent. It's kind of strange. And then I found out later on, it turns out a week later, was the war on Iraq.

[00:53:17] Bernard Mont-Reynaud: So, you know, at the Pentagon, they're, this is what, you know, they're also involved, they're involved in research and they're involved in the defense department very much. So that explains part of that. And yet I had had interest in my research. And there was, the question was, would you also be interested in applications of this research on degraded monophonic signals? Now, what's a degraded monophonic signal? It, this is a telephone tapped line, right? You have, it's mono and it can be arbitrary and anonymous.

[00:54:10] Bernard Mont-Reynaud: So I had specific interest for like secret work. And I knew once you put your foot into doing secret work, you're kind of, you go underground and this is the end of the research. You're now working for the spooks. So to, don't quote me on that, but. Okay, so I found that this huge effort I had put into having this very wide open research and that I was asking for $5 million at the time, which was a fair amount, but I felt it was building stage upon stage where I said I wanted to build a large system to do this.

[00:55:00] Bernard Mont-Reynaud: I couldn't get the funding and I tried to survive a bit. But this is because I made a mistake as a, as a professor, which I had become by then an associate professor, research. I should know better than to go for large grants. I should also have also, so a little bit grant, so to get, to continue funding while hunting for a big grant. I shouldn't have small ones to survive. I didn't do that. My mechanism for survival was sort of on a personal basis.

[00:55:42] Bernard Mont-Reynaud: I would do consulting outside the university, but I hadn't, so anyway, I made the strategic mistake of, of not having small grants to stay in the game while waiting for a long grant. And this is where I had to leave, just to put it in perspective.

[00:56:04] Bernard Mont-Reynaud: And so it was a while until I was able to work on source separation again. It wasn't until I was at this company called Audience, where the ambition was to put a chip into telephones to do foreground background separation, to separate the voice of interest from all of the noise around it. So Audience would be another story a number of years down the line. In the meantime, I've been at many different companies.

[00:56:40] Bernard Mont-Reynaud: And then again, quite a bit later, I went to SoundHound, where I wasn't doing sound separation at all, but I've been involved with sound and music and then speech, sorry, and natural language. At Audience, I did work on source separation. I finally got to build a new system based on these principles that I got from Al Brickman. And I pulled that together.

[00:57:16] James Parker: What year are we talking about now? At Audience?

[00:57:22] Bernard Mont-Reynaud: Audience, that's going to be maybe 2000.

[00:57:25] James Parker: Okay.

[00:57:27] Bernard Mont-Reynaud: Yeah, I could go, maybe I should send you a resume.

[00:57:36] James Parker: Well, I've read bits and bobs.

[00:57:38] Bernard Mont-Reynaud: I would say it's about 2000. Yes, I would say 2000 if I had to pick. So when you were... And again...

[00:57:51] James Parker: No, go ahead.

[00:57:56] Bernard Mont-Reynaud: Even at Audience, hold on, even at Audience, we had the tension between the broad research angle on this, which is source separation, and the product focus: just separate that voice right here from the ambient stuff, with something cheap, something that works most of the time but does not have to do source separation. And Audience initially was addressing the broad goals, and there was a lot of interest in this source separation business. At some point, the investors came down and said, hey, what's your product focus? You don't have a product yet.