Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)

May 14, 2024
19,338 views

#deeplearning #symbolic #research
This video includes an interview with first author Stéphane d'Ascoli (sdascoli.github.io/).
Deep neural networks are typically excellent at numeric regression, but using them for symbolic computation has largely been ignored so far. This paper uses transformers to do symbolic regression on integer and floating-point number sequences: given the start of a sequence of numbers, the model has to not only predict the correct continuation, but also the data-generating formula behind the sequence. Through a clever encoding of the input space and a well-constructed training data generation process, this paper's model can learn and represent many of the sequences in the OEIS, the Online Encyclopedia of Integer Sequences. The paper also features an interactive demo if you want to try it yourself.
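
As a rough illustration of the input encoding (a minimal Python sketch; the sign-plus-digits scheme in a large base follows the paper's description, but the exact details here are assumptions):

    def encode_int(n: int, base: int = 10_000) -> list:
        # Tokenize an integer as a sign token followed by its digits
        # in the given base, most significant digit first.
        sign = "+" if n >= 0 else "-"
        n = abs(n)
        digits = []
        while True:
            digits.append(str(n % base))
            n //= base
            if n == 0:
                break
        return [sign] + digits[::-1]

    # encode_int(-123456789) -> ['-', '1', '2345', '6789']

A large base keeps each number down to a handful of tokens while keeping the vocabulary manageable for the transformer.
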
OUTLINE:
0:00 - Introduction
2:20 - Summary of the Paper
16:10 - Start of Interview
17:15 - Why this research direction?
20:45 - Overview of the method
30:10 - Embedding space of input tokens
33:00 - Data generation process
42:40 - Why are transformers useful here?
46:40 - Beyond number sequences, where is this useful?
48:45 - Success cases and failure cases
58:10 - Experimental Results
1:06:30 - How did you overcome difficulties?
1:09:25 - Interactive demo
Paper: arxiv.org/abs/2201.04600
Interactive demo: symbolicregression.metademola...
Abstract:
Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. bessel0(x) ≈ (sin(x) + cos(x)) / √(πx) and 1.644934 ≈ π²/6. An interactive demonstration of our models is provided at this https URL.
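
Both example approximations are easy to sanity-check numerically; here is a minimal sketch using NumPy and SciPy (the first identity is the standard large-x asymptotic form of the Bessel function J0):

    import numpy as np
    from scipy.special import j0

    x = np.linspace(10, 100, 1000)
    approx = (np.sin(x) + np.cos(x)) / np.sqrt(np.pi * x)
    print(np.abs(j0(x) - approx).max())  # small in this asymptotic regime
    print(np.pi**2 / 6)                  # 1.6449340668... (zeta(2))
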
Authors: Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton
Links:
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
KZhead: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments
  • OUTLINE:
    0:00 - Introduction
    2:20 - Summary of the Paper
    16:10 - Start of Interview
    17:15 - Why this research direction?
    20:45 - Overview of the method
    30:10 - Embedding space of input tokens
    33:00 - Data generation process
    42:40 - Why are transformers useful here?
    46:40 - Beyond number sequences, where is this useful?
    48:45 - Success cases and failure cases
    58:10 - Experimental Results
    1:06:30 - How did you overcome difficulties?
    1:09:25 - Interactive demo
    Paper: arxiv.org/abs/2201.04600
    Interactive demo: recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com/

    @YannicKilcher 2 years ago
    • Great video! "cos mul 3 x" is Polish notation though, not reverse Polish (i.e. "3 x mul cos").

      @brandom255 2 years ago
    • @@brandom255 I guess it's RPN if you read it backwards... ;-) (Noticed the same mistake in the video).

      @anttikarttunen1126 2 years ago
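
    (For reference: Polish, i.e. prefix, notation puts each operator before its operands, so "cos mul 3 x" parses as cos(3·x); reverse Polish, i.e. postfix, would be "3 x mul cos". A minimal sketch of a prefix evaluator, with hypothetical token names covering just this example:)

    import math

    def eval_prefix(tokens, x):
        # Consume tokens left to right; each operator recursively
        # evaluates the operand expressions that follow it.
        tok = tokens.pop(0)
        if tok == "x":
            return x
        if tok == "cos":
            return math.cos(eval_prefix(tokens, x))
        if tok == "mul":
            return eval_prefix(tokens, x) * eval_prefix(tokens, x)
        return float(tok)  # numeric literal

    # eval_prefix(["cos", "mul", "3", "x"], 2.0) == math.cos(3 * 2.0)
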
  • This series is absolutely amazing! Thanks Yannic, your videos just keep getting better!

    @ChaiTimeDataScience 2 years ago
  • The paper explanation + interview format is amazing: the paper explanation provides the interesting nitty-gritty, and the interview sheds light on the overall concepts more intuitively and with less jargon.

    @yabdelm 2 years ago
  • Extremely interesting line of work. I imagine one day something like that could fully automate the process of building mathematical models for any scientific data. Feels like a huge step toward an automated scientific discovery process.

    @volotat 2 years ago
    • kzhead.info/sun/o8-Dm6x6lpVpoXk/bejne.html They have a good approach

      @Adhil_parammel 2 years ago
  • Those visualizations in the appendix really look like you're staring into some transcendent, unfathomable principles at the very base of reality. Neat.

    @josephharvey1762 2 years ago
  • Cool approach to cracking pseudo-random generators

    @me2-21 2 years ago
  • Really inspiring topics, and the interview was really well done. Thanks

    @benjamindonnot3902 2 years ago
  • Having a website like this, with dummy data and inference, for every data science algorithm would be awesome for learners

    @Adhil_parammel 2 years ago
  • More like this please, this is very informative

    @grillmstr 2 years ago
  • This is your best series, after the always punctual ML news

    @billykotsos4642 2 years ago
  • Cool paper, interesting observations! I'd be curious to see the UMAP of these (learned) embeddings too. Sometimes these can capture global structure better, whereas t-SNE has a bias toward capturing local structure. The UMAP authors also made a huge effort to justify their approach with theory from Riemannian geometry and algebraic topology.

    @rothn2 2 years ago
  • Amazing stuff

    @harunmwangi8135 a year ago
  • Neurosymbolic for the win, the return!

    @JTMoustache 2 years ago
  • Love your content, Yannic. I would be extremely interested in hearing your thoughts on, and seeing any interesting papers about, AI in cybersecurity. I imagine the types of networks that were exceptional at playing Atari games would also shift domains relatively easily into a variety of cyber attacks, and given how useless the corporate world appears to be at securing data, I'm guessing this will be unbearable for most to endure (at least early on).

    @yourpersonaldatadealer2239 2 years ago
  • This reminds me of programming the "compte est bon" round of the game show Chiffres et lettres (French TV)

    @WahranRai 2 years ago
  • I wasn't familiar with the even/odd conditional function representation discussed at 40:45 and had to work it out on paper to make sure I got it. The "collatz" generator in LaTeX: $(n \bmod 2) \times (3n+1) + [1 - (n \bmod 2)] \times (n \div 2)$

    @ericadar 2 years ago
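
    (The even/odd selector trick above is easy to verify in code; a minimal sketch, with a function name of our choosing:)

    def collatz_step(n: int) -> int:
        # Branch-free form of: 3n+1 if n is odd, n/2 if n is even.
        # (n % 2) selects the odd branch, (1 - n % 2) the even branch.
        return (n % 2) * (3 * n + 1) + (1 - n % 2) * (n // 2)

    assert collatz_step(7) == 22   # 7 is odd   -> 3*7 + 1
    assert collatz_step(22) == 11  # 22 is even -> 22/2
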
  • Opening new frontiers for DL, congratulations! A maybe silly question (I only watched until 31' so far): is the "6 is a magic number" finding robust to changes in the hyperparameter (fixed at 6 in the example table) that limits the recurrence to the terms n-1 … n-6?

    @fredericln 2 years ago
  • Anyone else feel like they haven't seen a paper about decision making, planning, neural reasoning etc. in a long time? Nothing about agents acting in a complex environment?

    @brll5733 2 years ago
    • You need a simple model of a complex environment to make decisions; otherwise computing the consequences of your decisions becomes intractable. That is true even if you adopt an AlphaZero approach.

      @jabowery 2 years ago
    • @@jabowery I mean, we have pretty good toy simulations in the form of video games. Another challenge like DeepMind's StarCraft challenge would really help focus the field, imo.

      @brll5733 2 years ago
  • Hey Yannic, thank you for your videos. They have been very helpful for me. I just wanted to ask what tablet and app do you use for reading and annotating papers?

    @MIbra96 2 years ago
    • OneNote

      @YannicKilcher 2 years ago
    • @@YannicKilcher Thank you!

      @MIbra96 2 years ago
  • Would be interested in seeing how it deals with chaotic systems with some noise.

    @yoshiyuki1732ify 2 years ago
  • Amazing, the next step would be to produce a sequence of prime numbers :P

    @tclf90 2 years ago
    • Riemann 🤔🤔

      @Adhil_parammel 2 years ago
    • At this pace, machines will definitely beat humans at finding the secrets of the primes.

      @rylaczero3740 2 years ago
  • Hasn't Wolfram Alpha been able to do this for some time? How is this better?

    @Emi-jh7gf a year ago
  • The next step should be predicting a latent space: when you sample an equation from it, it should give the expected results, but also not diverge from other equations sampled from the same space.

    @rylaczero3740 2 years ago
  • Try this with the digits of pi, or some time series that obeys a visible but mysterious pattern :D

    @TheDavddd 2 years ago
  • Work on discovering scientific laws from data has a very long history in AI: Pat Langley and Herb Simon's BACON system was built 40 years ago, with about as much computing power as a modern toaster. Damn, I'm old.

    @meisherenow 2 years ago
  • Why not use deep symbolic regression as a mapping mechanism between different neural architectures?

    @axe863 a month ago
  • I don't think base-12 enthusiasts have a club name lol. But imo, base 720720 is *obviously* the *clear* way to go, as any division involving anything up to 16ths would be super easy with it 🤓

    @Kram1032 2 years ago
  • So I tried out the demo and came across an interesting "paradox". When I click the "OEIS sequence" button on the demo, it almost always loads up a sequence and perfectly predicts a solution with 0% error. But when I go over to OEIS and type in a couple numbers, grab a sequence, and slap that into the user input, the results are... usually not great. Very rarely does my OEIS sampling strategy yield a sequence this model can solve. Usually the error is huge. What is going on? Am I somehow only grabbing "very hard" sequences from OEIS? Or are the OEIS sequences that the demo samples coming from a smaller subsection of OEIS that this model can solve reliably?

    @jrkirby93 2 years ago
    • Many (I would say: Most) OEIS sequences are not expressible as such simple recurrences (with such a limited set of operations). For example, almost any elementary number theoretical sequences, sigma, phi, etc.

      @anttikarttunen1126 2 years ago
    • OEIS does have some ridiculous sequences; if I were an alien, I would think it was specifically generated by humans to train their ML models

      @rylaczero3740 2 years ago
    • @@rylaczero3740 Well, OEIS has a _few_ ridiculous sequences, but admittedly there are many sequences of questionable relevance, where somebody has dug themselves into a rabbit hole a little bit too deep for others to follow. As for the training of AI, I admit that sometimes I like to submit sequences that are "a little bit on the fringe", to serve as a challenge to any programmed agents.

      @anttikarttunen1126 2 years ago
    • @@rylaczero3740 And in any case, the scope of mathematics is infinitely larger than what is "useful" for any application we might think of.

      @anttikarttunen1126 2 years ago
  • Hm, I guess if they wanted to add conditionals to the language in order to make it better at recognizing things that use them, e.g. collatz / hailstone / collapse / ((3n+1)/2) sequences, they would have to make the tree support ternary operations. Not sure how much more that would require.

    @drdca8263 2 years ago
    • The thing about the embeddings for the integer tokens makes me wonder if it would be beneficial to hard-code (before normalizing them, I guess) a few of the dimensions with some basic properties of the integer: the number of distinct prime factors, the number of divisors, whether it is divisible by 2, by 3, or by 5, whether it is one or two more than a multiple of 3, similarly for one more and one less than a multiple of 5, and maybe a handful more, and then let the other dimensions of the embedded vector be initially random and learned (a sketch of such a feature vector follows this thread). Would this increase performance, or reduce it?

      @drdca8263 2 years ago
    • @@drdca8263 Well, having access to the prime factorization of n would be great (e.g., for 12 = 2*2*3, 18 = 2*3*3, 19 = 19), or figuring that out by itself (of course, then it could also detect which numbers are primes). Also, I wonder how utopian it would be to go "full AlphaZero" with OEIS data (if that even makes any sense?), and whether it would then be able to learn to detect some common patterns in sequences, like for example the divisibility sequences (of which keyword:mult seqs form a big subset) and the additive sequences. Sequences that are monotonic or not, injective or not, that satisfy the "restricted growth" constraint, etc. Detecting sequences that are shifted convolutions of themselves (e.g. Catalans, A000108), or eigensequences of other simple transforms.

      @anttikarttunen1126 2 years ago
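
    (A minimal sketch of the hand-coded feature dimensions suggested in this thread; the idea is the commenters', not the paper's, and the sympy helpers are standard:)

    from sympy import divisor_count, primefactors

    def int_features(n: int) -> list:
        # Basic number-theoretic properties that could be fixed in a few
        # embedding dimensions, with the remaining dimensions learned.
        m = abs(n)
        return [
            float(len(primefactors(m))) if m > 1 else 0.0,  # distinct prime factors
            float(divisor_count(m)) if m > 0 else 0.0,      # number of divisors
            float(m % 2 == 0),                              # divisible by 2
            float(m % 3 == 0),                              # divisible by 3
            float(m % 5 == 0),                              # divisible by 5
            float(m % 3 == 1),                              # 1 more than a multiple of 3
            float(m % 5 == 4),                              # 1 less than a multiple of 5
        ]

    # int_features(12) -> [2.0, 6.0, 1.0, 1.0, 0.0, 0.0, 0.0]
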
  • Why not represent the numbers as matrices or complex values? Then you could reproduce the group structure of addition or multiplication, in addition to being able to make each number high-dimensional.

    @Eltro101 2 years ago
  • The linked demo fails to continue the sequence 0,0,0,0,0,0: "Predicted expression: Unknown error"

    @victorkasatkin9784 2 years ago
  • What is the unary operator "relu"?

    @anttikarttunen1126 2 years ago
    • relu(x) = max(x, 0), i.e. x if x > 0 and 0 otherwise

      @MrGreenWhiteRedTulip 2 years ago
    • @@MrGreenWhiteRedTulip Thanks, that was new to me, as I'm an outsider in this field. Just found a Wikipedia article about the ReLU (rectified linear unit) activation function.

      @anttikarttunen1126 2 years ago
    • No prob. It’s just used as a function here though, not an activation function!

      @MrGreenWhiteRedTulip 2 years ago
    • @@anttikarttunen1126 Makes sense. For people in the field it's the other way around: all the operators except ReLU are outside their everyday toolbox.

      @rylaczero3740 2 years ago
    • @@rylaczero3740 I see: that's why people in the field are so prejudiced about most of the mathematical sequences, thinking that they are absolutely ridiculous if not expressible with relu(s). 🤔

      @anttikarttunen1126 2 years ago
  • How hard would it be for a system like this to find Euclid's gcd algorithm by itself? With that it could then detect which sequences are divisibility sequences, or multiplicative or additive sequences. That is, most of the essential number-theoretic sequences (A000005, A000010, A000203, A001221, A001222, A001511, A007913, A007947, A008683, A008966, A048250, and a couple of thousand others) are either multiplicative or additive. I'm not saying that it would yet find the formula for most such sequences, but it could at least make an informed hypothesis about them (a sketch of such a check follows this thread).

    @anttikarttunen1126 2 years ago
    • While in the "base" world: how hard would it be to detect which sequences are k-automatic or k-regular?

      @anttikarttunen1126 2 years ago
    • I mean, could this be done in an "AlphaZero way", so that it would find such concepts by itself, without the need to hardcode them?

      @anttikarttunen1126 2 years ago
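
    (A minimal sketch of the kind of multiplicativity check suggested in this thread, our illustration rather than anything from the paper; a sequence f is multiplicative if f(m·n) = f(m)·f(n) whenever gcd(m, n) = 1:)

    from math import gcd

    def is_multiplicative(a: list) -> bool:
        # a[1], a[2], ... are the sequence terms (a[0] is padding).
        # Test f(m*n) == f(m) * f(n) for all coprime m, n in range.
        for m in range(2, len(a)):
            for n in range(2, len(a)):
                if m * n < len(a) and gcd(m, n) == 1 and a[m * n] != a[m] * a[n]:
                    return False
        return True

    # First terms of A000005, the number-of-divisors function d(n):
    print(is_multiplicative([0, 1, 2, 2, 3, 2, 4, 2, 4, 3, 4, 2, 6, 2, 4, 4, 5]))  # True
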
  • Your French pronunciation is great for a non-native

    @julius333333 2 years ago
  • I liked it until he said it was trained for weeks on 16 GPUs 😢

    @viktortodosijevic3270 2 years ago
  • 14:08 hmmm where have I heard about base 6 being the best fit before... kzhead.info/sun/pK19YqZshH1tjGg/bejne.html

    @veliulvinen 2 years ago
    • kzhead.info/sun/f82OmZR6pJGKe5E/bejne.html

      @anttikarttunen1126 2 years ago
  • Jan misali was right all along…

    @MrGreenWhiteRedTulip 2 years ago
    • Yes: kzhead.info/sun/f82OmZR6pJGKe5E/bejne.html

      @anttikarttunen1126 2 years ago
  • Sorry, but I must object to what you both seem to suggest at 1:04:20: that only the keyword:easy sequences in OEIS (as of now 80236 sequences) have some logic behind them, and that the rest of the sequences (268849 as of now) are all some kind of "bus stop sequences" with no logic whatsoever behind them. Certainly the absolute majority (98% at least) of the sequences in OEIS are well-defined in the mathematical sense, even though they do not always conform to the simple recurrence model that your project is based on; for example, the primes, and most sequences arising in elementary number theory. Moreover, although your paper is certainly interesting from the machine learning perspective, its performance doesn't seem to me any better than many of the programs and algorithms listed on the Superseeker page of the OEIS, some of which are of considerable age. (See the separate text file whose link is given in the section "Source Code For Email Servers and Superseeker" at the bottom of that OEIS page.)

    @anttikarttunen1126 2 years ago
    • Also, Christian Krause's LODA project and Jon Maiga's Sequence Machine are very good at mining new programs and formulas for OEIS sequences, mainly because they can also search for relations _between_ the sequences, instead of limiting themselves to standalone expressions with a few arithmetical operations.

      @anttikarttunen1126 2 years ago
  • why do you wear sunglasses when watching a screen? :(

    @WhiterockFTP 2 years ago
  • There is little "symbolic" in this thing other than the name.

    @JP-re3bc 2 years ago
    • Or is it in the data? The tokens basically represent semantics, and the embeddings represent relations between the numbers.

      @ThetaPhiPsi 2 years ago