NB time codes are a rough guide only. They are retained from the original audio, which has since been edited for clarity and continuity.
James Parker: OK. Fantastic. Well, if you're OK to begin, shall we just begin? Maybe you could start off by introducing yourself, however feels right to begin with, and we can go on from there. OK. Go for it.
[00:01:19] Bernard Mont-Reynaud: Let's see. Well, I was born in France, in Marseilles, you know, up in the South. I studied there until I went to Paris at the Ecole Polytechnique, and I hesitated between multiple professions. Was I going to be an architect or was I going into electronic music or computers? And computers won, I would say, very much so, based on this idea that you could capture elements of thought, then encapsulate them and pull them together.
[00:02:00] Bernard Mont-Reynaud: It was sort of an early notion of AI, if you want, the pull of AI, but also the pull of just modularizing thought and assembling thoughts together. And that was the winning thing. And I went to work on, actually, computer-assisted instruction for a number of years. You may not believe it, but as early as 1968, which was the time I'm talking about, there were already large systems doing CAI. Professor Suppes at Stanford had some large computer-assisted instruction systems at the time.
[00:02:41] Bernard Mont-Reynaud: What we called CAI in those days was computer-assisted instruction, whereas now, in my work at SoundHound, it's conversational AI, with the same initials. In any case, I went to do this at a place called IRIA, which became INRIA, the French national institute for research in computer science and automation. I worked there for four years doing AI, CAI. But the appeal of Stanford was enormous for me. And so I went to Stanford to do a PhD.
[00:03:26] James Parker: Was the appeal because, you know, Stanford is a famous university in America, or is it specifically because that was the place to do that specific work in computing or a bit of everything?
[00:03:44] Bernard Mont-Reynaud: It's sort of in the middle. Remember, back in those days, computer science wasn't what it is today. It was pretty much a pioneering field. And Stanford was one of the pioneering institutions. I also applied to MIT and UCLA and I forget the fourth one, and was admitted everywhere. But Stanford was where my heart went. And I had been on the West Coast a few years earlier, visited, you know, San Francisco and the area. I was just fascinated. So, yeah, Stanford was my choice.
[00:04:24] Bernard Mont-Reynaud: It was because they had the best computer science programs, along with MIT.

[00:04:48] James Parker: And you quickly, or maybe not so quickly, got into working on computer music at Stanford? Because you said before, briefly, that music was your sort of alternate love. So how did those converge, or what was the relationship between those two in those early days at Stanford?
[00:04:58] Bernard Mont-Reynaud: Yeah, good question. Back in those days I was thinking about electronic music, and computer music, which I did not know about at the time, and I'm speaking of 1967, '68. It was just out of an interest in music. But, you know, one thing I always regretted in my life is not having had a formal education in music. And this was something I wanted to do for my own enjoyment, for my own passion even. But I was a computer scientist. I had a very rigorous training in math and science, and was becoming a computer scientist.
[00:05:40] Bernard Mont-Reynaud: The idea of music was not there yet, and it wasn't for a while. What happened was, while I was doing my PhD at Stanford, I took my time and I sampled every topic under the sun in the computer science curriculum. I was involved also in natural language understanding and other things, and had very solid training in optimization and so on, the hardcore topics of computer science as opposed to applications. And let's see, I'm losing the thread here. Yes, I wasn't… I wasn't doing anything with music.
[00:06:28] Bernard Mont-Reynaud: I went to be an assistant professor at Berkeley. And I had, how can I say, I had lost my passion for the core CS, you know, the ultimate-optimization kind of quality, and I started to be attracted to more perceptual fields, visual and auditory, and that was my interest. And this is when, after four years at Berkeley, I left behind a tenure-track position to go do research on computers and music. Not in producing computer music, but there's a story on this of how come I went to CCRMA ('Carma'), and I could get into that story at a separate stage.
[00:07:21] Bernard Mont-Reynaud: But basically, at PhD plus four, or almost four, I left the sort of core computer science track to go into the audio and music area.
[00:07:39] James Parker: I'd love to hear more about how that played out. I mean, yeah, what work were you even doing on audio and music in that period? What work was being done in the field? And then how did that take you to, well, I've been calling it the CCRMA at Stanford, but Carma, as you put it?
[00:08:03] Bernard Mont-Reynaud: Yes. So, yeah, I want to take a deep breath on that one, because now I'm restarting about CCRMA and all the people and then how come I joined in. But basically, if you go back to the founding of CCRMA, that was around 1977. And there were four founders: John Chowning, who had become the director, and two other people, and the fourth was Andy Moorer, a student who at the time was completing his PhD in 1977, at the same time I was, and we had known each other at the Stanford AI Lab.
[00:08:53] Bernard Mont-Reynaud: But he wasn't involved with music at the time, at least that I know of, and our relationship was not based on music, it was based on AI, but we happened to have esteem for each other. And he independently had applied for a grant to the NSF, the National Science Foundation in the US, a grant to do musical intelligence. And at that point, you know, when you're a pioneer, you have no limits, you have no boundaries, everything is open field.
[00:09:37] Bernard Mont-Reynaud: And so in that grant, he was going to address everything that could possibly have to do with computers and music or computer analysis of music, and that involved the signal processing, the polyphony, the, you know, musical intelligence in its wider form, and so on and so forth.
[00:09:56] James Parker: Oh, I know, I know, is that James Moorer? Is that the same person that I know as James Moorer?
[00:10:06] Bernard Mont-Reynaud: Yeah, James Andy Moorer. Did you speak with him?
[00:10:09] James Parker: No, I haven't spoken with him, but I read some of his work on music transcription. It seems like in the mid-1970s he was really the first person publishing on automatic music transcription.
[00:10:26] Bernard Mont-Reynaud: Exactly, exactly. And this is where the connection happens, because he completed his PhD in 1977, the same year I completed mine. And his work involved the separation of two notes on the guitar, a guitar player, doing the signal processing just to separate two notes, you know, in a succession of notes. That was state of the art at that time. And yes, so indeed, that's the same person. That's the research he was doing for his PhD. And he became a co-founder of CCRMA.
[00:11:15] Bernard Mont-Reynaud: And he had applied for a grant with the NSF where, extrapolating from his research on separating two notes on the guitar, if you want to put it that way in a humorous way, the project was then going to solve just about every problem under the sun for, you know, music analysis by computer. And this is, I guess, what you do in grants, you oversell a bit, especially when it's a completely pioneering field and there's nothing to limit your ambitions.
[00:12:01] Bernard Mont-Reynaud: So then, okay, so this is what happened. But then he went to Paris, to a place called IRCAM, which you may have heard of or not. And Andy is, I don't know whether to use the word genius, or to say he's an incredible engine that could. He could write not only incredible software, but he could design incredible hardware, and so on and so forth.
[00:12:40] Bernard Mont-Reynaud: So at CCRMA he was involved in all sorts of software, but later on he was hired by Lucasfilm, and he built hardware for George Lucas that would help with the Star Wars series. Yes.
[00:13:06] James Parker: Trilogy.
[00:13:09] Bernard Mont-Reynaud: Trilogy. I was going to say trilogy, but wait, there were more than three. That was my hesitation. It wasn't a trilogy in the end, was it? I didn't want to misrepresent the number, but at the time it was probably a trilogy. But I get slightly ahead of myself. So he applied for this grant, and then he went to IRCAM in Paris to install a software environment similar to CCRMA's. They were duplicating the CCRMA environment in Paris for Pierre Boulez.
[00:13:46] Bernard Mont-Reynaud: And then he got hired by Lucasfilm to go build the software for Star Wars. And by the time this all happened, the grant got granted and he wasn't around to do the work. So he was not going to deliver on any of these promises.
[00:14:09] Bernard Mont-Reynaud: And somehow I heard about this, and I kind of managed my way into leaving my assistant professor position at Berkeley, where I was getting a bit frustrated and a little tight at the elbows, because I wanted to get into something more perceptual, you know, possibly sound or music or visual stuff, as opposed to core technique, core math, you know. And so the timing was good. I don't know exactly on what basis I tried to convince them that I was the person for the job, but it worked.
[00:14:58] Bernard Mont-Reynaud: They hired me to actually head that project. And it was all new to me. And because it was all pioneering work, you know, I didn't feel I had enormous constraints about that.
[00:20:31] Bernard Mont-Reynaud: I forgot that I was supposed to give a biography, and I just started unfolding the whole story as opposed to a quick summary. So I could try to give you a quick summary, but chances are I get lost along the way. I mean, the quick summary would be: I spent 10, 11 years at CCRMA. During that time, I did various consulting. Then I went on to industry for the rest of my life, and have been in a lot of places, like Xerox PARC and Sony and others.
[00:21:14] Bernard Mont-Reynaud: I went to SoundHound over 12 years ago; it'll be 13 years soon. Basically, it's all around the Valley, but some of my best times have been at CCRMA and SoundHound and Sony.
[00:21:36] Bernard Mont-Reynaud: So, yeah, so that's the quick, collapsed version.
[00:21:40] James Parker: That's very helpful.
[00:21:41] Bernard Mont-Reynaud: I've spent 50 years, 54 years.
[00:21:45] James Parker: Oh, no, that's incredibly helpful. I was just wondering if it would be possible to do a similar overview of your time at CCRMA. Like, you know, in those 10 years, where did you begin? Because you said that it was quite blue-sky thinking, you know, industry was involved, but not sort of driving any specific commercial outcomes. So do you remember the broad arc of your time at CCRMA, like what sort of things you were working on at the beginning and where you ended up?
[00:22:21] Bernard Mont-Reynaud: Yeah, sure. Of course, I remember all that stuff, you know, near to my heart; I remember way more than we have time to go over. Initially, you know, it was all pioneering, and I was discovering that too. It was like, okay, where do I begin? What do I do? I found at CCRMA one of the other co-founders, Loren Rush, who helped me a little to get started and to collect some examples that seemed of manageable complexity. And the focus at that time was music transcription.
[00:23:02] Bernard Mont-Reynaud: And I discovered, among other things, that finding the pitches, hearing the pitches in the sound, was not the hardest thing, or at least I had somebody else as a consultant doing it. Basically, you say what frequency, what time, what amplitude, and you've got those few numbers coming out. But I discovered that transcribing is a problem of itself. Once you have the events, you have to figure out if there's a tempo, whether what you have is a quarter note somewhere, or triplets, and so on and so forth.
[00:23:52] Bernard Mont-Reynaud: That's the key. And there are a number of issues that have to do with musical intelligence, as opposed to just extracting events. So I discovered, for one thing, the structure of this, and started developing some representations for notes: little representations starting with events, and then gradually getting enriched with properties that were going towards transcription, until eventually we were able to put in note names, note heads, note durations.
[00:24:30] Bernard Mont-Reynaud: And I found out that there was at CCRMA early, pioneering music printing, so I could create notation. Professor Leland Smith had a system for musical typography, and you could feed this data to his program to get a terrific-looking musical score. So anyway, the beginning was putting those pieces together, figuring out what the components of the problem were. That's pretty much what I achieved in that first period, as well as applying for a follow-up grant.
[00:25:13] Bernard Mont-Reynaud: So, that was round one. On round two, after I returned from Paris, I got deeper into the capabilities of the system, in particular the musical intelligence. But I realized that I had to think about the following grant, you know, that I had to think early on about planning the sequel, and I realized I had to choose one of two paths from that point. Up to then, the project had been about everything: from detecting events, to separating the notes, to creating a musical score, and so on and so forth.
[00:26:00] Bernard Mont-Reynaud: I thought I had to go deeper either into the music intelligence, or into the event separation intelligence. Call it hearing, or call it musical listening, or musical understanding. You know, as the field becomes less of a pioneering effort, the effort deepens, and it diversifies also. And so, I thought about this for a while, and I decided to go into hearing.
[00:26:37] Bernard Mont-Reynaud: Part of my reason goes back to the fact that I had some musical training, but I did not have the depth of having been to music school, to do musical intelligence full justice, however interested I might be. Also, the field of sound separation, source separation, looked very much like a pioneering area, wide open. Musical intelligence was quite open too, but I wasn't as prepared for it, and I felt my limitations.
[00:27:30] Bernard Mont-Reynaud: I didn't know what my limitations would be in source separation. Anyway, that's where I aimed next, for the third round of funding.
[00:27:47] Bernard Mont-Reynaud: In the meantime, I should probably mention, there were a couple of students. You know, when you have a research grant, you hire some students, and I had a couple of PhDs happening on the project. One involved the transcription of Afro-Cuban drumming; he used some of my software that did the quantization of durations and so on, but he supplied his own event detection, and then we put it all together. He's been a music professor now for quite a while.
[00:28:36] James Parker: Who is that?
[00:28:36] Bernard Mont-Reynaud: That's Andy Schloss, his name is Andy Schloss, S-C-H-L-O-S-S, and I can give you a contact. He's in Victoria, in Canada. Yeah, I will provide a follow-up on that. The second PhD was later on, that's David Mellinger, and that happened, you know, on the next round. So, I'm getting ahead of myself again, but basically, as I said, there was the second round, where I was still pursuing this double goal, but I felt I had to move towards either the musical intelligence or the auditory intelligence, and I went into the second, for the third round.
[00:29:41] James Parker: Do you remember what year that was?

[00:29:41] Bernard Mont-Reynaud: I would say '86. Yep.
[00:30:02] Bernard Mont-Reynaud: Right. So, in '87, I had also developed my own approach to event detection, and it was very visually oriented. I would create spectrograms. I know it can get a little technical, but you know, there's such a thing as a spectrum and a spectrogram, and it's normally put in a time-frequency representation. Now, imagine that the frequency dimension, instead of being linear frequency in hertz, becomes the logarithm of frequency.
[00:30:50] Bernard Mont-Reynaud: That means if you're in linear frequency, you have, say, 0, 400 hertz, 800 hertz. Once you've compressed it, the factor of two from 400 to 800 occupies the same amount of space as from 200 to 400. That's an octave. And once these are octaves, the subdivisions are like semitones; it's like a keyboard. Okay, so this was more appropriate for music. But not only that: I discovered you could do pitch detection on that representation, because the harmonic series is a fixed pattern, a fixed vertical pattern.
[00:31:39] Bernard Mont-Reynaud: And I could render audio into this representation, which involved a spectrogram on log f, a semitone spectrogram, I'm going to call it. And I could do pattern recognition, as in image convolution, on this image representation, and obtain event detection that way. And so I pursued this work for a while. This started around '85, '86, '87, when those things were happening, which was sort of the core, where I had taken control of the event detection with my own approach.
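NB for this edited transcript we add a minimal sketch of the "semitone spectrogram" idea described above. It is not Mont-Reynaud's original code; it assumes a NumPy/SciPy environment, and all function names and constants are illustrative. The point is that on a log-frequency axis the harmonic series becomes a fixed vertical pattern, so pitch detection reduces to correlating each spectrogram column with one harmonic template, the "image convolution" he describes.

```python
# Illustrative sketch (not the original CCRMA code): pitch detection on a
# log-frequency ("semitone") spectrogram. In log frequency the harmonics
# f0, 2*f0, 3*f0, ... form a FIXED pattern, so one template slid along the
# frequency axis can score every candidate fundamental.
import numpy as np
from scipy.signal import stft

def semitone_spectrogram(x, sr, fmin=55.0, n_bins=72, bins_per_octave=12):
    """Map an ordinary STFT magnitude onto a log-frequency (semitone) axis."""
    f, t, Z = stft(x, fs=sr, nperseg=2048)
    mag = np.abs(Z)
    # Centre frequency of log-spaced bin k is fmin * 2**(k / bins_per_octave).
    log_freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    out = np.empty((n_bins, mag.shape[1]))
    for k, fc in enumerate(log_freqs):
        out[k] = mag[np.argmin(np.abs(f - fc))]  # nearest linear-frequency bin
    return log_freqs, out

def harmonic_template(n_bins, bins_per_octave=12, n_harmonics=6):
    """The fixed pattern of harmonic offsets on the log-frequency axis."""
    tpl = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        k = int(round(bins_per_octave * np.log2(h)))  # offset of harmonic h
        if k < n_bins:
            tpl[k] = 1.0 / h  # de-emphasize higher harmonics
    return tpl

def pitch_salience(log_spec, tpl):
    """Correlate every column with the template shifted to each f0 bin."""
    n_bins, _ = log_spec.shape
    sal = np.zeros_like(log_spec)
    for k0 in range(n_bins):
        shifted = np.zeros(n_bins)
        shifted[k0:] = tpl[:n_bins - k0]
        sal[k0] = shifted @ log_spec  # one score per frame for this f0
    return sal  # peaks along axis 0 suggest fundamentals, frame by frame
```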
[00:32:33] Bernard Mont-Reynaud: And I decided to go into source separation, as in, if there were multiple pitches at the same time, or other sounds. Okay. So now, we had mentioned Al Bregman, and the fact that he came. What was wonderful is that Professor Al Bregman came and spent one year at CCRMA while he was writing a book on auditory scene analysis, a 900-and-some-page compendium of his research in psychoacoustics: the psychoacoustics of how we detect streaming.
[00:33:16] Bernard Mont-Reynaud: How do we know, psychologically, that there's a source here and a source there, that we have separate sources, versus other events that don't group as sources? And this was fascinating. It was great for me to hear what he had to say, and it was great for him to have somebody really interested in hearing his thoughts. So we had this little club, he and I, before the book came out, where I got to receive all his ideas on this, and it fit exactly in the right place.
[00:33:58] Bernard Mont-Reynaud: Because I wanted to know his principles for how sources come together. I don't know if you're familiar with the book, but they are basically grouping principles that allow sound components to combine into separate auditory streams.
[00:34:20] James Parker: Yeah, the coining of that term, the auditory stream, seems to be one of the key things from the book, to start thinking in terms of auditory streams. He has a whole section. And it's really interesting. I've read the book, or I've tried to read the book, I'm not a scientist, but everything you say really correlates with my understanding.
[00:34:49] James Parker: It does seem that he was doing this work in psychoacoustics sort of before he came to CCRMA, but that, you know, he's very explicit in his acknowledgements about the influence of his time at CCRMA, you know, your influence, John Chowning's and others.
[00:35:14] James Parker: And it's always been interesting to me to think about the relationship between this work on hearing more generally, and on music specifically. Because it seems like there were basically two major streams, to use that word again, in auditory-oriented AI: there's speech recognition, or speech understanding or whatever, and then there's work on music. And then around the time that you're describing, suddenly, or not so suddenly…
[00:35:56] James Parker: it starts to broaden out. It starts to become much more about hearing, as you're saying, or much more about the separation of different auditory events; things like auditory scene analysis and auditory event detection start to come out. And then it seems from the outside that what were relatively distinct fields, I mean, I should ask whether you had any relationship with people doing work on speech recognition and so on at the time, but it seems like they were relatively separate.
[00:36:33] James Parker: And then they start to come together under a larger umbrella, because suddenly all the speech people start to need source separation. They need to do scene analysis in order to, you know, separate out speech from office sounds, and environmental noise, and so on and so on.
[00:36:59] James Parker: So that's the story that I seem to be finding, digging around in all of these reports and PhD theses and so on, and it sounds like what you're describing, but I don't know if I'm misrepresenting it at all.
[00:37:16] Bernard Mont-Reynaud: Well, yes. First, to clarify one thing: I was not involved in speech at the time. My interest was in source separation, which is a broad interest, but the examples were musical examples, right?
[00:37:38] Bernard Mont-Reynaud: There's no doubt that in the speech community, people have been very interested in separating the voice itself from the background noise, which is also a kind of separation, but it has a different focus, in the sense that it's not about streaming per se. Noise is not considered a stream; you basically try to find the dominant stream, the dominant source, and everything else is what you want to remove.
[00:38:14] Bernard Mont-Reynaud: So I think it's taken a long time, and in many ways it hasn't completely happened, for speech to treat the voice as just one of multiple sources in the environment, and also to recognize the other sources in the environment. As far as I know, that particular shift, treating the voice as just one of many events, hasn't happened. If you use a visual analogy, you know, you can recognize chairs and tables and people in an image.
[00:38:59] Bernard Mont-Reynaud: And although maybe you're particularly interested in people, or people's faces, you might still recognize a chair or a table in the visual field. Object formation is the same thing at one level, whether you're in vision or in audition: you're doing object separation, object formation, and you recognize sources and give them properties. But in speech, the equivalent is that you're still only interested in, say, the face, and not in all these other things.
[00:39:39] Bernard Mont-Reynaud: So I don't fully see the same type of parallelism that you see between speech and music. There has also been a tendency for a field of auditory source separation to exist on its own, and it would pick examples in sound, or image, as the case might be, but not be completely attached to either speech or music. If you want, there are the people who look at this psychologically, and the people who look at it from the point of view of applications, and they're not entirely the same. I mean, in a research lab they can be, but source separation has been driven by research interests, and only to a small degree by applications in the real world. It's still expensive.
[00:41:00] Bernard Mont-Reynaud: Now, this has started to change, and we're also beginning to see neural networks that have some capability in that area, but it's still not very widespread.
[00:41:22] James Parker: Okay.
[00:41:23] Bernard Mont-Reynaud: Maybe I could return, because we're on Bregman here, and on auditory scene analysis. Yes, he had been interested in this topic for a while. Then he decided to write a book on it, and that was a very important book; he put it all together. I was familiar with a lot of this work: before the book was written, he was just dumping this stuff on us at CCRMA. And I was one of the most attentive listeners to his ideas, because that's the field I was really getting into. And I must say, he did a lot of psychoacoustics, which wouldn't be within my capability.
[00:42:18] Bernard Mont-Reynaud: He's that kind of experimental psychologist. I am not. And not only that, but it's not truly my interest to carry out; I don't have the patience to do this. But to hear him talk about it, that was wonderful, because he had done all these experiments. And I came to summarize a 900-page book in a couple of sentences in my head. Which is to say: take any criterion. Suppose you have two sources. You have a source that goes beep, beep, beep, beep, and the other goes bop, bop, bop, bop. So there's one high, one low, and you do beep, bop, beep, bop.
[00:43:06] Bernard Mont-Reynaud: I cannot produce both at the same time; I mean, I can do beep, boop, beep, boop, and if I do it too slowly, they start to merge. But the kind of experiment he would do is to vary the distance between those frequencies, the high and the low, and to vary the timings. And he would show that if the timing makes them very close together, one going beep, beep, beep, beep and the other boop, boop, boop, boop, they separate into two streams, a high stream and a low stream. Versus, if it goes beep, boop, beep, boop with a long time between them, they become a single stream.
[00:43:57] Bernard Mont-Reynaud: They become a single note going up and down. In his experiments, he would do this and show when it streams together and when it does not. And he basically showed, in every case, across any number of dimensions of these kinds of things, that there are trade-offs: the same mechanism that makes it stream can make it not stream.
[00:44:30] Bernard Mont-Reynaud: So, okay. It's not that one kind of example will stream on frequency and another will stream on time difference. No, there are always trade-offs between the two, if you will. There are multiple factors, and by varying them against one another, each can cause streaming or not. And so that's the first sentence, if you want, of the summary. And the second is: this implies, it seems to me, that there is a central mechanism for separation.
[00:45:10] Bernard Mont-Reynaud: Because everything that comes from the signal processing, from the raw data, can have the outcome of separation or not. So, to me, that speaks for a uniform mechanism that decides whether sources belong together or not. And that's my summary of a 900-page book, and it's essential, because it had implications for the architecture of building that kind of system.
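NB a small sketch, added in editing, of the kind of streaming stimuli Bregman used; the tempos and intervals are illustrative, not taken from his experiments. Alternating high/low tones split into two streams when the frequency gap is wide and the tempo fast, and fuse into a single rising-and-falling line when the gap is narrow and the tempo slow, the trade-off Mont-Reynaud summarizes above.

```python
# Illustrative Bregman-style streaming stimuli (parameters are made up).
import numpy as np

SR = 16000  # sample rate in Hz

def tone(freq, dur):
    """One tone burst with a short raised-cosine fade to avoid clicks."""
    t = np.arange(int(SR * dur)) / SR
    y = np.sin(2 * np.pi * freq * t)
    ramp = max(1, min(len(y) // 10, 160))
    env = np.ones_like(y)
    env[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    env[-ramp:] = env[:ramp][::-1]
    return y * env

def streaming_stimulus(f_low, semitone_gap, tone_dur, n_pairs=10):
    """Alternate low/high tones; gap and tempo trade off against each other."""
    f_high = f_low * 2 ** (semitone_gap / 12)
    pair = np.concatenate([tone(f_low, tone_dur), tone(f_high, tone_dur)])
    return np.tile(pair, n_pairs)

# Narrow gap + slow tempo: tends to be heard as ONE stream going up and down.
coherent = streaming_stimulus(f_low=400, semitone_gap=2, tone_dur=0.25)
# Wide gap + fast tempo: tends to SPLIT into a high stream and a low stream.
segregated = streaming_stimulus(f_low=400, semitone_gap=14, tone_dur=0.08)
```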
[00:45:39] James Parker: Right. So, do you begin to then operationalize that in the systems that you're building?
[00:45:50] Bernard Mont-Reynaud: Absolutely. Absolutely, yes.
[00:45:52] James Parker: And does it suddenly lead to significant improvements in…
[00:45:55] Bernard Mont-Reynaud: Well, okay, so you're assuming that the system was already at full capability, but it wasn't. It guided how we thought about it, how we started building the pieces, but not all the pieces were there. But yes, indeed, it did lead us in that direction. And this is where that second PhD thesis I was talking about came along. His name is David Mellinger, and his thesis sort of showed those trade-offs. And there's another parenthesis on this. There's a phenomenon that you may or may not have heard about.
[00:46:50] Bernard Mont-Reynaud: It goes back to some work of John Chowning from way back, and you know about frequency modulation. You may have heard about this phenomenon called frequency co-modulation, where you have a collection of partials. I'm showing things on the spectrogram, and each of my fingers is a partial, and it goes like… And it has this sort of artificial sound, and it's not well separated from other things if you have multiple sounds. But suddenly, you put frequency modulation on it. It goes… And the moment you do this, the sounds belong together as one.
[00:47:42] Bernard Mont-Reynaud: It sounds natural. It sounds like a voice, and it separates from anything else. So one of the many dimensions that Bregman talks about is co-modulation, the fact that those various partials go up and down at the same time. They modulate at the same time; it's called co-modulation. But it goes way back to the FM synthesis technique: putting in this frequency co-modulation created a voice effect that was overwhelming to the auditory system. The effect is overwhelming because it immediately causes source formation.
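NB a minimal sketch of the co-modulation effect, added in editing; it is in the spirit of Chowning's demonstrations rather than any original code, and the pitch, vibrato rate and depth are illustrative. With zero vibrato the partials sound static and artificial; give them all the same vibrato and they fuse into a single voice-like source.

```python
# Illustrative frequency co-modulation demo (values are made up).
import numpy as np

SR = 16000  # sample rate in Hz

def partials(f0, n_partials=6, dur=2.0, vibrato_hz=5.0, vibrato_depth=0.0):
    """Sum of harmonics whose frequencies all wobble together.
    vibrato_depth is a fractional deviation, e.g. 0.01 means +/-1%."""
    t = np.arange(int(SR * dur)) / SR
    # ONE shared modulation signal, applied to every partial (co-modulation):
    mod = 1.0 + vibrato_depth * np.sin(2 * np.pi * vibrato_hz * t)
    y = np.zeros_like(t)
    for h in range(1, n_partials + 1):
        # Integrate the instantaneous frequency h*f0*mod(t) to get the phase.
        phase = 2 * np.pi * np.cumsum(h * f0 * mod) / SR
        y += np.sin(phase) / h
    return y / np.max(np.abs(y))

static = partials(220, vibrato_depth=0.0)   # steady partials: artificial
fused = partials(220, vibrato_depth=0.01)   # co-modulated: fuses, voice-like
```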
[00:48:32] Bernard Mont-Reynaud: Okay, so this was a parenthesis from David Mellinger's thesis into Bregman and all this, with a parenthesis back to John Chowning. And now we're back: frequency co-modulation was one of the features that David Mellinger focused on, as well as some others. And yes, we used these ideas in his thesis, which were based on Bregman and on the architecture, which is a central mechanism for grouping features. But the whole system was not built. Again, it was pieces; there's so much to be done.
[00:49:25] Bernard Mont-Reynaud: The auditory system is extremely complex and capable, and we only built some pieces of it. It's the same as a vision system. Vision has so many things in it, and you build pieces of it. You might build a piece that has to do with occlusion; another piece has to do with texture and color; and there's perspective; and I could go on and on. Well, the same is true in the auditory domain. In Bregman, you'll see that he uses visual metaphors all the time for what's happening in the auditory domain.
[00:50:02] Bernard Mont-Reynaud: And the reason is that we see this stuff and we understand occlusion. Auditory masking is much harder to represent, but it's essentially the same thing; it's essentially the same as occlusion. We hear it, but since we can't see it, we can't easily talk about it or point to it. Because sound only exists in motion. Sound does not exist at a frozen moment. An image can exist at a frozen moment: you can point to this piece and that piece and talk about it. You cannot do that in sound very easily. It's only moving. So, I go parenthesis within parenthesis.
[00:50:49] Bernard Mont-Reynaud: But to answer your question: yes, we put what we could into the architecture of the system. And the plan was to continue. But let's see.
[00:51:04] James Parker: So, this is a fair way into your time at CCRMA. And soon in the story, you must leave this work, sort of, you know, unfinished in a certain way, and then move on. Is that right? Are we getting to that point in the tale? You left this sort of, I don't know how you would describe it, basic research, maybe, into machine hearing, and moved more into industry? Is that how you think of it?
[00:51:42] Bernard Mont-Reynaud: Yeah. Let me talk some more about the transition and how it happened. Because by then, between all the ideas of Bregman and all that I had built over a succession of systems, I was ready for a major onslaught on machine auditory analysis, right? And I envisioned a larger grant than I had had before. I wrote a grant proposal and went to DARPA, and there was some interest, but you know, there's a question of how does this connect to industry or applications?
[00:52:35] Bernard Mont-Reynaud: And you know, at DARPA, when you go defend, I went to Washington to defend my proposal, and there are people representing the NSF and ONR and the NSA and the different agencies that might be interested in supporting the grants. And I had a strange feeling. I felt they were interested, but their minds weren't quite there. They were both present and absent. It was kind of strange. And then I found out later on: it turns out that a week later the war on Iraq began.
[00:53:17] Bernard Mont-Reynaud: So, you know, at the Pentagon, this is what they're involved in; they're involved in research, and they're very much involved with the defense department. So that explains part of it. And yet there had been interest in my research. And the question was: would you also be interested in applications of this research to degraded monophonic signals? Now, what's a degraded monophonic signal? It's a tapped telephone line, right? It's mono, and it can be arbitrary and anonymous.
[00:54:10] Bernard Mont-Reynaud: So there was specific interest in, like, secret work. And I knew that once you put your foot into doing secret work, you kind of go underground, and that's the end of the research. You're now working for the spooks. Don't quote me on that. Okay, so I found that this huge effort I had put into this very wide-open research, and I was asking for $5 million at the time, which was a fair amount, but I felt I was building stage upon stage, where I said I wanted to build a large system to do this.
[00:55:00] Bernard Mont-Reynaud: I couldn't get the funding, and I tried to survive for a bit. But this is because I made a mistake as a professor, which I had become by then, an associate professor, research. I should have known better than to go only for large grants. I should also have had some small grants, to keep the funding going while hunting for a big grant. I didn't do that. My mechanism for survival was sort of on a personal basis.
[00:55:42] Bernard Mont-Reynaud: I would do consulting outside the university. But anyway, I made the strategic mistake of not having small grants to stay in the game while waiting for a large grant. And this is why I had to leave, just to put it in perspective.
[00:56:04] Bernard Mont-Reynaud: And so it was a while until I was able to work on source separation again. It wasn't until I was at this company called Audience, where the ambition was to put a chip into telephones to do foreground-background separation, to separate the voice of interest from all of the noise around it. But Audience is another story, a number of years down the line. In the meantime, I was at many different companies.
[00:56:40] Bernard Mont-Reynaud: And then again, quite a bit later, I went to SoundHound, where I wasn't doing sound separation at all, but I've been involved with sound and music, and then speech and natural language. At Audience, I did work on source separation. I finally got to build a new system based on those principles that I got from Al Bregman. And I pulled that together.
[00:57:16] James Parker: What year are we talking about now? At Audience?
[00:57:22] Bernard Mont-Reynaud: Audience, that's going to be maybe 2000.
[00:57:25] James Parker: Okay.
[00:57:27] Bernard Mont-Reynaud: Yeah, I could go, maybe I should send you a resume.
[00:57:36] James Parker: Well, I've read bits and bobs.
[00:57:38] Bernard Mont-Reynaud: I would say it's about 2000. Yes, I would say 2000 if I had to pick it like that. So when you were... And again...
[00:57:51] James Parker: No, go ahead.
[00:57:56] Bernard Mont-Reynaud: Even at Audience, hold on. Even at Audience, we had the tension between the broad research angle on this, which is source separation, and the product focus: just separate that voice right here from the ambient stuff, with something cheap, something that works most of the time but does not have to do source separation. Audience initially was addressing the broad goals, and there was a lot of interest in this source separation business. At some point, the investors came down and said, hey, what's your product focus? You don't have a product yet.
[00:58:43] Bernard Mont-Reynaud: And they just let go of 50% of the company and said, you now focus on something that goes out to market quickly. And at that point, they let go of me, as well as several others. And of course, this was 2000. I think to a large degree, even now, we still have this distinction between a system that represents the psychology of understanding multiple sources, versus one that achieves a specific engineering goal of just delivering on a very specific task.
[00:59:30] James Parker: What was the specific task that Audience was trying to do the voice separation for? Did you say telephony? Maybe I didn't catch that. What was the specific industrial context?
[00:59:50] Bernard Mont-Reynaud: Smartphones. Well, phones.
[00:59:52] James Parker: Okay.
[00:59:53] Bernard Mont-Reynaud: Phone, phone, smartphone. The first... They eventually did a chip and the first chip went on to the iPhone.
[01:00:02] James Parker: Okay.
[01:00:03] Bernard Mont-Reynaud: And after that contract with Apple… you know, Apple, like many companies, once they got the technology: oh, okay, we'll do our own.
[01:00:12] James Parker: Right.
[01:00:13] Bernard Mont-Reynaud: And then they went on to Samsung with a chip. Yeah. So, basically, the idea was to separate. You have a front microphone and a back microphone, or a primary microphone and a secondary one. You can assume the primary microphone captures more of the source of interest than the secondary microphone. And what they did eventually, instead of general source separation, was to sort of emphasize the primary with respect to the secondary by what's called spectral subtraction, which is not an exact arithmetic subtraction.
[01:00:55] Bernard Mont-Reynaud: But basically, you keep what stands out. So, yeah, the purpose was to improve the quality of voice in noisy environments, which is obviously an application of great economic interest.
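NB a minimal sketch of two-microphone spectral subtraction, added in editing; it is not Audience's actual chip algorithm, and the window size, over-subtraction factor and spectral floor are illustrative. The secondary microphone's magnitude spectrum serves as the noise estimate, subtracted from the primary's with a floor, which is why the result is not an exact arithmetic subtraction.

```python
# Illustrative two-mic spectral subtraction (not the Audience chip).
import numpy as np
from scipy.signal import stft, istft

def two_mic_spectral_subtraction(primary, secondary, sr,
                                 over_subtract=1.5, floor=0.05):
    """primary/secondary: equal-length mono signals from the two mics."""
    _, _, P = stft(primary, fs=sr, nperseg=512)
    _, _, S = stft(secondary, fs=sr, nperseg=512)
    p_mag, s_mag = np.abs(P), np.abs(S)
    # Subtract the noise estimate, but clamp to a spectral floor so the
    # magnitude never goes negative (the "not exact arithmetic" part).
    clean_mag = np.maximum(p_mag - over_subtract * s_mag, floor * p_mag)
    # Reuse the primary's phase; only magnitudes are modified.
    clean = clean_mag * np.exp(1j * np.angle(P))
    _, y = istft(clean, fs=sr, nperseg=512)
    return y
```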
[01:01:13] James Parker: And with quite a long history. You know, obviously Bell Telephone was quite invested in similar kinds of techniques for a long time. Anyway, you know the history of audio technology and its relationship with telephones better than I do, but it's an important one.
[01:01:38] Bernard Mont-Reynaud: Yeah, and denoising has been a big interest all along, and doing this once you have two microphones is much easier than with one microphone, especially if you know that one microphone captures more of the signal of interest than the other one, which captures more of a bit of everything.
[01:02:07] James Parker: Yes. Look, I'm conscious that we've already been going for quite a long time, so I'm very grateful. I'm wondering, because obviously my interest personally is more on the sound end of things, part of me thinks, well, is this an appropriate moment to jump ahead to SoundHound? But then I'm wondering, if I insist on that, am I missing a hugely important part of the story? You know, that's a big jump in time. Is there something during your period at Lucasfilm or Xerox or wherever that is really crucial to understanding?
[01:02:54] James Parker: Obviously, it's important to your biography, but understanding the evolution of machine hearing or machine listening more generally, or where you end up? So, I don't want to foreclose those stories if you think that they're important.
[01:03:16] Bernard Mont-Reynaud: Yes, I think by and large it's time to jump over to SoundHound. I was taking a look at my notes to see if there's something I still wanted to cover. I did want to mention, going quite a way back, that I skipped over the whole story in Paris. And let me not skip it altogether, because there were some pretty interesting things, which were more on the political side.
[01:03:47] Bernard Mont-Reynaud: Something I achieved that was a technological success with speech recognition, but a human failure, in the sense that we were acting as technologists and not understanding the whole application context. It involved a demo given to President Senghor of Senegal, who was also a poet, by the way. And I thought this might appeal to your audience from a political angle. So, maybe I should tell a bit about that.
[01:04:25] James Parker: Please do, please do.
[01:04:27] Bernard Mont-Reynaud: Quickly. Well, you remember, flashback to the first time I got funding from the NSF, and I didn't have the next grant in time to continue the research. So it turns out I went away for a year. I almost went to IRCAM; it didn't happen. I'll skip the reasons, but the timing was improper, put it that way. So I ended up at this place called the Centre Mondial Informatique, which was headed by Nicholas Negroponte and Seymour Papert and this French politician, Jean-Jacques Servan-Schreiber. And, let's see.
[01:05:16] Bernard Mont-Reynaud: Anyway, they were waiting to hire me. Finally, I said, look, if you don't hire me by Tuesday, I'm gone. And then Monday night, they said, okay, you're hired. But Friday, we're having a demo for President Senghor. He was no longer president by then; he had been the first president of Senegal, but had retired from that. And you're going to show him how you can use the voice to command things on the computer. And I came up with this idea of having shapes and colors and numbers. And you could say, three red squares.
[01:06:07] Bernard Mont-Reynaud: And on the screen would appear three red squares. And you'd say, make them blue, and they'd turn blue. You'd say triangles, and they would turn into triangles. Or you'd say seven balls, et cetera; you get the idea. And normally this was done in English, but you had a version done in the Senegalese language. I don't remember right now what the name of the language was, but because this machine was trained to individual speakers, you could get a person to train their own vocabulary in whatever language it was, and then have it work in that language.
[01:07:02] Bernard Mont-Reynaud: So anyway, by staying up all night a couple of times or whatever, by Friday I had the demo done. This was four days; it was just amazing.
[01:07:13] James Parker: And even though you didn't work on speech at the time?
[01:07:18] Bernard Mont-Reynaud: Okay, I thought I should explain this. For the speech itself, we had this big box, it wouldn't fit in our screen here, maybe a box this wide and the same depth, and about this high. This box did the signal processing: you could put into it the sound of your voice speaking certain words. You would train it to your own vocabulary and your own voice.
[01:07:50] Bernard Mont-Reynaud: And this box was then capable of doing the analysis of the words; it was a two-stage dynamic programming algorithm which, as the sound came in, would give you back the transcription. The machine doing the signal processing was called the NEC CSP-200. I had a Symbolics Lisp machine that communicated via a serial line, RS-232, with that external signal processing box. And that's how I did it, by just routing the audio.
[01:08:37] Bernard Mont-Reynaud: The audio went to that machine, it gave me a transcription, and then I processed the transcription for the demo. So the speech recognition was at arm's length, if you want. This was the technology of the time. There was no integration, and there was no way I could have done this in four days if it weren't for having a self-contained piece of equipment to do the speech recognition. And what was interesting about this is that the demo succeeded. It was working.
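NB a minimal sketch, added in editing, of dynamic-programming word matching of the kind such boxes performed: dynamic time warping (DTW) against stored templates, the classic technique of the period. It is not the NEC box's actual two-stage algorithm, and the features and vocabulary here are made up for illustration.

```python
# Illustrative speaker-trained word recognition via dynamic time warping.
import numpy as np

def dtw_distance(a, b):
    """DTW between two feature sequences of shape (frames, feature_dim)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(utterance, templates):
    """templates: dict of word -> feature sequence recorded in training."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Usage with made-up features (real systems used filter-bank or LPC frames):
rng = np.random.default_rng(0)
templates = {"rouge": rng.normal(size=(30, 12)),
             "bleu": rng.normal(size=(25, 12))}
test = templates["rouge"] + 0.1 * rng.normal(size=(30, 12))
print(recognize(test, templates))  # -> "rouge"
```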
[01:09:17] Bernard Mont-Reynaud: I mean, it was an incredible technological feat in some ways, that it all came together. But President Senghor didn't see the point. What does it mean that you can do that? How does it contribute to humanity, to the problems of my country, to anything, that you can come to a computer and do something like this? He couldn't see the point. And if you look at it, it was just a technology demo; it was not connected to any real need of anybody, right? And this was 1982, remember.
[01:10:08] Bernard Mont-Reynaud: And Senghor was a poet, as I mentioned, and a philosopher, and one of the three people who had founded this movement called Negritude. It talks about what it means to be black in the world, and it was a movement that also wanted unity among the African countries, and so on. A very interesting character.
[01:10:32] Bernard Mont-Reynaud: But the Centre Mondial was trying to have all sorts of connections with Senegal in particular, which was a main area, and other third-world countries, but with mixed success, because this was a raw, rapidly pushed-forward application of technology to countries that weren't necessarily receptive to it.
[01:11:01] Bernard Mont-Reynaud: I thought this would connect with the other angle.
[01:11:06] James Parker: Oh, 100%.
[01:11:06] Bernard Mont-Reynaud: Yeah, yeah.
[01:11:07] James Parker: It's fascinating.
[01:11:08] Bernard Mont-Reynaud: So I have to mention that story.
[01:11:13] James Parker: I mean, I also didn't know that Negroponte was involved. You know, I've looked at his book, The Architecture Machine, and some of his influence at MIT. And I don't know, it just sounds quite Negroponte-ish.
[01:11:40] Bernard Mont-Reynaud: Yes, indeed. Indeed. Well, let me give you more on that that you may not know. So Negroponte had the Architecture Machine Project at MIT. He was the director of that, and so on. At some point, he wanted to grow that into a university department and have complete independence. And MIT wasn't quite doing what he wanted. There was resistance. So he said, oh, is that so? Then I'm taking my team away. And he went like this to MIT, and went to France, and got money from 12 different departments of the government. Twelve.
[01:12:25] Bernard Mont-Reynaud: He was going to solve all the problems in the world: illiteracy, how do you call it, literacy. He was going to solve literacy, he was going to solve this and that and the rest. And he got lots of money, got a wonderful center in Paris. There was this French politician involved, Jean-Jacques Servan-Schreiber, who was very powerful in the government. And they had this thing going on for two years, more or less, during which, you know, they organized conferences and this and that.
[01:13:06] Bernard Mont-Reynaud: And Negroponte was telling MIT: you see, if you want to have me back, you meet my conditions, OK? But that didn't happen for a while. This is where I ended up going. But ultimately, it was a game; they didn't really care about this. They just took all this French money, and they played Negroponte's game.
[01:13:29] Bernard Mont-Reynaud: But eventually, he got what he wanted from MIT, and this is what became the MIT Media Lab, OK? So this made the transition between the Architecture Machine Project and the Media Lab, which was bigger: a whole new building, a whole new department, everything. So we were a pawn in Negroponte's game, and a pawn paid for by the French.
[01:13:57] James Parker: And when you were at CCRMA in the 80s, what was your relationship with the Media Lab? Because I know, obviously, the Machine Listening Group that eventually appeared there had some names familiar from your project and so on. But was there, I don't know, any tension at all?
[01:14:21] Bernard Mont-Reynaud: Not at all. I mean, CCRMA and they were in completely different spheres. However, there were some contacts between researchers. For example, a lot of people met at the Computer Music Conference. This is how I met Barry Vercoe, who was at the Media Lab. I don't know if he had been at the Architecture Machine Project or not, but he definitely became part of the Media Lab. And Roger Dannenberg at CMU and I were doing research in related areas and sort of developed a personal contact.
[01:15:09] Bernard Mont-Reynaud: But there was nothing institutional between CCRMA and the MIT Media Lab. Not even feelings one way or the other. It was just as individuals that we related to each other. Since I mentioned Roger Dannenberg and this idea of score following: you may be familiar with the fact that Barry Vercoe was involved in that. But later on, I did a paper with Roger Dannenberg of CMU where, for the first time, instead of following a score, we were following an improvisation in real time. And I did part of the system.
[01:15:57] Bernard Mont-Reynaud: Roger Dannenberg was also a trumpet player, and he played his trumpet. The system basically had a blues grid. It would figure out where it was in the blues progression and then start accompanying the blues, based on the trumpet solo, you know. So that was work I did with Dannenberg. And it shows the kind of interactions that were happening across labs.
[01:16:31] Bernard Mont-Reynaud: Okay, so maybe we should turn those parentheses back, but I thought the Centre Mondial parenthesis was really interesting. And just to close it: they weren't really caring about my research. The name of the game, there were two games. One was for Negroponte and Papert to get what they wanted from MIT, the Media Lab. And number two, for Jean-Jacques Servan-Schreiber, the French politician, who was also the director or, I'm not sure, owner of the socialist journal L'Observateur: what they wanted was news.
[01:17:11] Bernard Mont-Reynaud: So this was a case of fishbowl research, where you're doing some research and there are cameras on it and news articles about, oh, they've done that. And then it started to happen, you could see: they would make the news before the research was done. So a couple of times, I worked really hard because they had announced we'd do X, and I rushed to do X. And then after a while, I realized that they didn't care if it happened at all.
[01:17:43] Bernard Mont-Reynaud: I didn't have to rush to do X just because they had announced they were doing X. I was in charge of the audio and speech, and, the name came back, Wolof, the language from Senegal: I created spoken sentences from text for Wolof, because they had announced they were doing it, and I did it in two months. In two months, I built this by going to Stockholm with a linguist who came from Dakar, to a place in Stockholm where there was thick snow and ice on the ground, and he was frozen out of his wits, you know.
[01:18:23] Bernard Mont-Reynaud: Anyway, we did it, and he was there for a week. I put together that system in two months, which was an incredible feat. You think they cared? Not at all. The payoff had been two months earlier, when they announced they were doing it. Okay, anyway, once this happened, and they did all the projects, at some point my grant got funded. I went bye-bye, and went back to continue my research at CCRMA. Okay, so that was one parenthesis. One thing I had wanted to mention: in the second round of NSF funding, I did pattern recognition on rhythmic material.
[01:19:07] Bernard Mont-Reynaud: You have to follow temporal variation. Music can have what's called rubato, right? You slow down and accelerate. But there's also a lot of fluctuation in the events themselves, maybe due to signal processing errors, or due to the fact that quarter notes aren't all equal in performance. And you have to decide when it's more like temporal variation and when it's micro-variation, and so on. There was a bunch of work aimed at doing that, and there were also layers in the software that would look at, well, what happens to those patterns?
[01:19:47] Bernard Mont-Reynaud: What makes a good quarter note, or what are triplets, triplet eighth notes? You would create those clusters and so on, and be able to fix errors by looking at those statistics. That's one thing I was involved in. But okay, those were some of the things. I think we can go forward to SoundHound.
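NB a minimal sketch of the duration-quantization idea just described, added in editing; it is not the original CCRMA software, and the candidate ratios and timings are illustrative. Noisy inter-onset intervals are snapped to the nearest simple fraction of an estimated quarter note, and the residuals are the kind of statistics used to spot likely errors.

```python
# Illustrative rhythm quantization: snap noisy inter-onset intervals (IOIs)
# to simple ratios of a quarter note, including triplet eighths.
import numpy as np

RATIOS = np.array([1/3, 1/2, 2/3, 1.0, 1.5, 2.0])  # fractions of a quarter

def quantize(iois, quarter):
    """Snap each IOI to the closest ratio of the quarter-note duration."""
    grid = RATIOS * quarter
    idx = np.argmin(np.abs(iois[:, None] - grid[None, :]), axis=1)
    return RATIOS[idx], iois - grid[idx]  # (note values, timing residuals)

def best_quarter(iois, candidates):
    """Pick the quarter-note duration minimizing total quantization error."""
    errs = [np.abs(quantize(iois, q)[1]).sum() for q in candidates]
    return candidates[int(np.argmin(errs))]

# Example: a performance near 0.5 s per quarter note, with timing jitter.
iois = np.array([0.51, 0.24, 0.26, 0.49, 0.17, 0.16, 0.18, 1.02])
q = best_quarter(iois, np.linspace(0.3, 0.8, 51))
values, residuals = quantize(iois, q)
print(q, values)  # large residuals would flag possible detection errors
```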
[01:20:17] James Parker: Yeah, let's do that. I mean, my understanding is that SoundHound began as a music recognition, or humming and melody recognition, company, and now it's a big, huge speech recognition, voice assistant company. I was guessing that maybe it was the music connection that somehow brought you to be involved with them, but I don't really know. That was my best guess. How did you come to be involved with SoundHound?
[01:20:53] Bernard Mont-Reynaud: Yeah, you're guessing quite correctly. At one point, I found out about SoundHound, and I picked up the phone. I had a wonderful conversation, and the next thing I was talking to the CEO, and the next thing I was hired. The part that wasn't so easy is that they had a hiring freeze. They were not hiring, because they had been on a stretch. You know how companies get into this thing called, how would they call it, the desert of funding. They launch a company, and then there's no product, and they run dry.
[01:21:40] Bernard Mont-Reynaud: They don't have any income, and the investors are tired of giving them funding. So it's no man's land until something happens. Anyway, they were at that stage; they had let go of a few people and had a hiring freeze. They begged the board to hire me, and I went in, and so on. And yes, they were entirely focused on music pattern recognition, and the what's-that-song, name-that-song problem.
[01:22:17] Bernard Mont-Reynaud: It's good to know, for the story, that from way back they were interested in actually doing speech recognition and natural language understanding. It had been on their minds, but there was no traction for that, there was no product. So it was an interest of theirs, but it got put behind, because they had some traction on the song recognition, and some on the humming, right. And as far as speech recognition, they had just a dialer, just a dialing application.
[01:22:59] Bernard Mont-Reynaud: So yeah, it's quite right, it's my work in music and music pattern recognition that made me a fit.
[01:23:09] James Parker: So you came on relatively early, it sounds like around 2010 or something like that.
[01:23:15] Bernard Mont-Reynaud: Yeah, 2010, February, yeah, it will be 13 years in just a month.
[01:23:20] James Parker: And then they, I think they were originally called Midomi or something?
[01:23:28] Bernard Mont-Reynaud: Correct.
[01:23:28] James Parker: Or the app was Midomi maybe? But then...
[01:23:33] Bernard Mont-Reynaud: The app was, maybe the app was Midomi; the company definitely was Midomi. I suppose that it's mi-do-mi, you know what I mean?
[01:23:45] James Parker: Oh, I did wonder, okay. That makes sense.
[01:23:49] Bernard Mont-Reynaud: Yes. And yeah, I'm not sure if the product was called Midomi at the time or not, but it came to be called SoundHound. Before the company was called SoundHound, I believe the product was called SoundHound.
[01:24:14] James Parker: Oh, okay.
[01:24:15] Bernard Mont-Reynaud: Or maybe it was at the same time.
[01:24:17] James Parker: My understanding is that, I mean, I don't know this history very well, but it seems like they had a lot of traction in the early days of smartphones, because name-that-tune was quite a cool thing to be able to do with a smartphone at the time. It was the very early days of the App Store, you know, and they were one of the biggest apps on the iPhone for a while there, in direct competition with Shazam and so on. And then they sort of, yeah, it seems like they just moved on.
[01:25:02] James Parker: And now they're a totally different company. So this is what it seems like from the outside.
[01:25:10] Bernard Mont-Reynaud: Yeah, most of what you say is correct, except for one thing: the SoundHound application, the music recognition application, continues to this day; it hasn't gone away. It's just that the music market was kind of this big and has remained kind of this big, you know, while speech and natural language is much bigger. So if you wanted growth, that's where it was, and also it was their original love. It's not that it was a complete reconstruction of the company; it was not a complete pivot.
[01:25:50] Bernard Mont-Reynaud: They had been interested in this all along, for like 10 years or something, right. So they actually wanted to do that. But at this time, the music application SoundHound still exists; you can still download it and use it, and it's gotten better over time. But the conversational AI is really the dominant one.
[01:26:26] James Parker: What work were you doing? I mean, it's a long time to be with a company. Did you also move from music towards conversational AI? Is that sort of your main field recently?
[01:26:42] Bernard Mont-Reynaud: Yes. Well, I've done both. I did start with the music. I see my power is low. I started with the music, and I started optimizing it, even using hardware to optimize it, and software. One of the things that happened is that at some point I started doing patents. I had an idea for one thing, an idea for another, and I wrote a couple of patents. And then they had a need for a patent person, you know? And I became Mr. Patents. Patent Guru, they called me.
[01:27:19] Bernard Mont-Reynaud: And so I ended up, you know, doing a lot of inventions myself, or helping other people do inventions, and doing the responding to the patent office. You know, it's very rare that they say, oh good, you have a patent here; usually you have to defend it. It's called prosecution. Sometimes you have to adjust things, and so on and so forth. Anyway, I've done a lot of patents, but I've also been involved later on with the natural language intelligence, natural language understanding. And so I've been very busy in that as well.
[01:28:01] Bernard Mont-Reynaud: And so I ended up having multiple hats over time: a patent hat, a natural language understanding hat, and some mentorship in other areas, and speech.
[01:28:19] James Parker: Can I ask, because this is, I don't know if this is really your field of expertise, but you've been close to it in a way that's much closer than me. You know, I read recently that Amazon had just burned up a hell of a lot of money in one year. I could be wrong, but it was a huge amount of money in any case. And it's hard to understand where the market is going, or where the investment, the smart investment, is. It seems like SoundHound as a company... and I'm not really asking you to do PR for SoundHound.
[01:29:23] Bernard Mont-Reynaud: I can explain; I see where you're headed. Let me try and answer your question. First of all, there are many companies that have virtual assistants for their own purposes. You know, Amazon is maybe hoping that Alexa is going to make them money with something, let's see. Now, Apple has Siri, and Siri helps them sell equipment. So I don't know how they evaluate their budget, but they have a reason. Now, we are not selling anything. We have this application called Hound, which is a virtual assistant. You can try it; you can have it for free.
[01:30:08] Bernard Mont-Reynaud: This is not how we make money. One thing we do with it is collect voices. And by the way, speaking of the privacy issue, we mask them; we store them in a way that you cannot recognize the original people, the original voices. But this is helping train our systems. You need a lot of voice data to train the voice recognition. So we use it for that, and we use it to give demos, and so on. But where the money is, is none of that. The money is in applications.
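NB Mont-Reynaud does not say how SoundHound masks voices, so the sketch below is purely illustrative of one common de-identification step, not a description of their system: shifting the pitch makes the original speaker harder to recognize while the words stay usable as training audio. The file names and shift amount are invented.

```python
# Illustrative only: one simple, commonly used way to de-identify speech,
# not SoundHound's actual masking method (which is not described here).
import librosa
import soundfile as sf

# Hypothetical input file; 16 kHz is a typical rate for speech systems.
y, sr = librosa.load("utterance.wav", sr=16000)

# Shift the pitch up three semitones: the speaker becomes harder to
# recognize, but the words remain intelligible for training.
y_masked = librosa.effects.pitch_shift(y, sr=sr, n_steps=3)

# Store only the masked version, never the original voice.
sf.write("utterance_masked.wav", y_masked, sr)
```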
[01:30:47] Bernard Mont-Reynaud: And in particular right now, SoundHound focuses on the automotive market, you know, having those kinds of systems in cars, and on restaurant voice ordering, you know, you order from your car, and just generally restaurant ordering. So these are the two. SoundHound has had an evolution from being this tech company, which of course still remains behind the scenes, to having these market-driven areas. And now it has specialized in those two largest areas. This is where we are, company focus-wise. So maybe that answers your question, or not.
[01:31:36] Bernard Mont-Reynaud: But for a while we were carried...
[01:31:43] James Parker: Because it sounds like you're saying that, you know, Amazon wants to become infrastructure, basically. It wants Alexa to be...how you access everything, but there's not that much money in that, maybe, whereas...
[01:32:04] Bernard Mont-Reynaud: I don't think they're succeeding in that. They don't have the internal capability. Alexa is very compartmentalized. Each of the capabilities... they don't form a network like we do. You can't have one of these... I'm trying to remember what they call their applications. They don't talk to each other. They don't have a full background of intelligence. We have what we call... I can't remember the branding word.
[01:32:37] Bernard Mont-Reynaud: Integrated AI... no, it's Collective AI. That means that we have all those things talking to each other and sharing parts. They don't have this in Alexa, so they are failing on the infrastructure, as far as we're concerned, by not providing this crosstalk of all the different apps, which... skills. The name came back. They call them skills. Well, those skills don't talk to each other. They remain isolated skills. If they don't solve that problem, they can't have any conversational AI of any power at all. It's kind of silly to have invested so much.
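NB to make the contrast concrete, here is a crude Python sketch of the difference he is pointing at; every name in it is invented. Domains wired into one shared conversation context can reuse what another domain has already resolved, while isolated "skills", each keeping private state, cannot.

```python
# A crude, invented sketch of "collective" domains sharing one conversation
# context, as opposed to isolated skills that each keep private state.
from typing import Callable

context: dict = {}  # one conversation state shared by every domain

def navigation(query: str) -> str:
    # Resolves a destination and leaves it in the shared context.
    context["city"] = "Marseille"
    return "(routing to Marseille)"

def weather(query: str) -> str:
    # Answers a follow-up using the city the navigation domain resolved.
    city = context.get("city", "your location")
    return f"(weather report for {city})"

DOMAINS: dict[str, Callable[[str], str]] = {"navigate": navigation, "weather": weather}

def assistant(query: str) -> str:
    for keyword, handler in DOMAINS.items():
        if keyword in query.lower():
            return handler(query)
    return "(no domain matched)"

print(assistant("Navigate to Marseille"))     # resolves and stores the city
print(assistant("What's the weather like?"))  # another domain reuses it
```

In the compartmentalized design he criticizes, each handler would keep its own private context, and the weather follow-up could not know which city was meant.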
[01:33:27] Bernard Mont-Reynaud: I think the smart speaker is very good. We wish we had one. No, but the Alexa capability does not compare in terms of its intelligence.
[01:33:43] James Parker: That's really interesting. I mean, I've never knowingly used a Soundhound voice assistant. I probably have because I think the idea is that you're sort of the engine behind many, many, many different voice assistant systems, aren't you? So, I probably have used them without knowing.
[01:34:10] Bernard Mont-Reynaud: That could be.
[01:34:12] James Parker: Anyway, I guess I'm wondering if there's a way of drawing the two conversations together. You come from music through into SoundHound, and you end up working more on these very product-driven, or sort of application-driven, voice assistants. Is there a through line to that most recent work you've been doing? Is it that auditory scene analysis is somehow crucial?
[01:34:48] Bernard Mont-Reynaud: So far, you scored 90% on your intelligence and understanding. This one, I need to make some corrections. First of all, let's see. First of all, I am not deeply involved in that pivot, I could say, that SoundHound has taken, this new focus towards a very market-driven organization. I understand they need this for growth. We went public six months ago or so, and they need to do that. I haven't been part of that.
[01:35:28] James Parker: Oh, you were doing the patents and the mentoring.
[01:35:33] Bernard Mont-Reynaud: I've been doing the patents. I've been building pieces of the deep architecture of the natural language understanding. I'm still deep in technology. I have not been at all involved personally with any of that marketing effort. So it makes sense to have done it that way. Well, I'm now retiring. A lot of people have had to focus their work more onto the marketing areas of interest. I haven't had to do that, which works fine for me, because it's not my temperament to go and work on a large market.
[01:36:08] Bernard Mont-Reynaud: It's my temperament to build technology, or to have ideas, or to be a mentor to people. The other thing is, yes, I wanted to tie it back in another way. I'm doing this work on natural language understanding, but go back to when I was talking about my days at Stanford, doing my PhD.
[01:36:33] Bernard Mont-Reynaud: Among other topics, I had gotten very interested in natural language understanding. At the time there was Professor Terry Winograd, coming fresh from MIT with his PhD thesis on the system called SHRDLU. SHRDLU, you know, go find a name like that. But he was speaking a mile a minute about NLU, and that was just fascinating. I had a big interest in that. So the natural language AI was already on my mind. That's in 1974, 75.
[01:37:15] James Parker: But now you must be doing, you know, data-driven and statistical methods, in a way that wasn't so prominent back then.
[01:37:32] Bernard Mont-Reynaud: Yes and no. And my battery could drop out at any moment; I don't want to start moving to a place where I could plug it in. Let me see if this wire happens to be just the right thing and plugged in... it doesn't seem to be plugged in. So, with a warning that my computer could die: yes and no. The speech recognition part of the system has gone over into the statistical and neural network approach.
[01:38:14] Bernard Mont-Reynaud: Basically neural network architectures of one type or another; and language models, partly statistical, and now they're vastly neural-network-built also. So on that part, we're with the rest of the industry in that type of approach. But on the part that's natural language understanding, the system that we use at SoundHound is a language that has a grammatical component, and it's a semantic grammar.
[01:38:59] Bernard Mont-Reynaud: So you have a syntax structure that is being extracted, and then the semantics are hanging off the syntax. This is, in a way, more the old AI approach, as opposed to putting everything into a neural network. There's a big battle in the field, or, well, it's not a battle so much; people choose one camp or the other, but they don't battle, they just invest in whatever they do. But it's a big difference of approach. There are ways to get the two talking to each other, and we do some of those ways, and I believe the others do too, like Google.
[01:39:36] Bernard Mont-Reynaud: Google, they're primarily neural network, but they have a lot of linguists providing information. For us, we are using a programmable grammar, if you want, but we have linguists, and we have neural networks helping with some of this; we just have a different mix from other people. But the core, the backbone if you want, of the natural language understanding is grammar-based; semantic grammars, that is.
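NB the following is a toy sketch of what a semantic grammar means in this sense, and nothing like SoundHound's production system; the pattern language, rules, and intent schema are all invented. The point is that each grammar rule carries a semantic action, so the interpretation is assembled directly from the parse rather than predicted end-to-end by a network.

```python
# A toy "semantic grammar": each syntactic rule carries a semantic action,
# so the semantics hang off the syntax, as described above. Rules invented.
import re

CITY = r"(?P<city>paris|london|tokyo)"
DAY = r"(?P<day>today|tomorrow)"

RULES = [
    # (syntactic pattern, semantic action building an interpretation frame)
    (re.compile(rf"what is the weather in {CITY}(?: {DAY})?"),
     lambda m: {"intent": "weather_query",
                "city": m.group("city"),
                "day": m.group("day") or "today"}),
    (re.compile(rf"will it rain in {CITY}(?: {DAY})?"),
     lambda m: {"intent": "weather_query", "condition": "rain",
                "city": m.group("city"),
                "day": m.group("day") or "today"}),
]

def interpret(utterance: str) -> dict:
    """Return the semantic frame of the first rule that parses the utterance."""
    text = utterance.lower().strip("?!. ")
    for pattern, action in RULES:
        m = pattern.fullmatch(text)
        if m:
            return action(m)  # the semantics hang off the matched syntax
    return {"intent": "unknown"}

print(interpret("What is the weather in Paris tomorrow?"))
# -> {'intent': 'weather_query', 'city': 'paris', 'day': 'tomorrow'}
```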
[01:40:04] James Parker: That's incredibly interesting.
[01:40:05] Bernard Mont-Reynaud: As opposed to, you know, yeah, yeah. And I forget.
[01:40:13] James Parker: I mean, you're gonna run out of batteries, and we've talked for nearly two hours. So, you know, do you wanna just wrap it up here, or do you have any concluding thoughts? I mean, you could talk a little bit about where the field of machine hearing or machine listening is at, or, I don't know, if you have any general observations, blue sky thinking, or if you'd prefer to just draw a line and say thanks.
[01:40:53] Bernard Mont-Reynaud: Yeah, well, I mean, I can say thanks anyway because I feel like, you know, I'm now retiring and I've been lucky to have a wonderful career, you know, that has brought me a lot of interest, curiosity, joy. I've been involved in many different fields and applications. I haven't mentioned many of those things, but certainly, certainly audio and music and natural language and the auditory system have been like a big part of what I've done and that's been wonderful. So the gratitude is here.
[01:41:40] Bernard Mont-Reynaud: In terms of the future, you see, I had this idea back when of this, quote, "programmable Bregman": to build a system that does auditory scene analysis in a general way. And it's been one of my dreams, as I've seen people go on to build new kinds of systems, as opposed to the kind of systems I built, which were the old AI, right? Feature-based; you build the pieces yourself, as opposed to having it be learned out of the statistical, you know, the large corpuses of data. Well, people have started to develop a lot of cleverness about putting together architectures of neural networks, of sub-networks, and so on and so forth. And I personally kind of missed that turn; I was deeply involved in one thing and another.
[01:42:41] Bernard Mont-Reynaud: I did not take a jump into that type of research. And I think the kind of principles that I was working on after my work with Bregman were asking for a neural network architecture which would do this: which would take the sound apart and give its components, which basically would do grouping based on the same type of reasoning, the same psychological principles that Bregman uses. It would have been a dream of mine to actually build that type of system.
[01:43:19] Bernard Mont-Reynaud: I never got to it, because I had too many different responsibilities, and maybe I wasn't quite deeply versed enough in the neural networks, I don't know. But anyway, I think that somewhere in the future there are systems that will address and solve, or better solve, source separation, you know, with neural networks. I see beginnings of that, but I think there's still a bunch missing. You know, it takes a lot of work to reproduce what the auditory system is capable of.
[01:43:58] Bernard Mont-Reynaud: But now I've become convinced, which I wasn't for a good many years, that the time will come when neural networks can do that: combinations of neural networks in a more complex architecture. And of course we have a lot of knowledge about how that happens in the brain, you know, and it's a very complex architecture that has all these pieces. But there's evidence, I think, that this will be possible in time. And so that's...
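NB he names no specific architecture, but the research direction he is describing is often built today as mask-based neural source separation: a network looks at a mixture spectrogram and estimates, for every time-frequency cell, how much energy belongs to each source, a learned stand-in for Bregman-style grouping. A minimal PyTorch sketch, with invented sizes:

```python
# Minimal mask-based source separation sketch (invented sizes, untrained):
# the network assigns each time-frequency cell of a mixture spectrogram
# to sources, a learned analogue of Bregman-style grouping.
import torch
import torch.nn as nn

N_FREQ, N_SOURCES = 257, 2  # e.g. 512-point STFT magnitude bins, two sources

class MaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_FREQ, 256, num_layers=2, batch_first=True)
        self.proj = nn.Linear(256, N_FREQ * N_SOURCES)

    def forward(self, mag):                  # mag: (batch, time, freq)
        h, _ = self.rnn(mag)                 # temporal context across frames
        masks = self.proj(h)                 # (batch, time, freq * sources)
        masks = masks.view(mag.shape[0], mag.shape[1], N_FREQ, N_SOURCES)
        return torch.softmax(masks, dim=-1)  # masks sum to 1 per T-F cell

mix = torch.rand(1, 100, N_FREQ)             # stand-in mixture spectrogram
masks = MaskNet()(mix)
sources = mix.unsqueeze(-1) * masks          # (1, 100, 257, 2) source estimates
```

Training such a network against mixtures with known sources is what would turn the grouping principles into learned behavior; this untrained sketch only shows the shape of the computation.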
[01:44:34] James Parker: That seems like a really good note to end on. If you hear of anybody doing it, let me know.
[01:44:45] Bernard Mont-Reynaud: Yes, well, I've seen a few papers. You see, I'm no longer in this, because I've had to be a little focused and stuff. But yeah, let's see. If I did some digging, or if you ask around, there are bits and pieces. Or even, you know, if you search on neural networks and auditory separation, or something like that, sometimes you have to refine your search, but you may see a few things coming up. It's not completely there, but there's definitely a beginning of it. So, and again, I don't exclude...
[01:45:34] Bernard Mont-Reynaud: Once I have more free time, and after I do some traveling, you know, I may go back into this and say, okay, so what's going on here?
[01:45:41] James Parker: You never fully retire.
[01:45:44] Bernard Mont-Reynaud: Yes, but, you know, I don't think I'm going to contribute to this. I'm not one of these people who are never going to retire, who are going to continue as, like, emeritus professors, never stopping. No, I'm ready to turn the page. You know, I've had a 54-year career, and that's good. I want to travel and do art.
[01:46:15] James Parker: You should. You absolutely should. Thank you so much for your time. It's 10 o'clock at night here, and you're in Spain and on holiday, and you should go and enjoy yourself. I'm going to click stop on the recording, if that's okay with you.
[01:46:39] Bernard Mont-Reynaud: Yeah.
[01:46:40] James Parker: Thank you.