I Made This Open-Source Project

19 Apr 2024

65,624 views

After MONTHS, I finally made another open-source project. This one was a ton of fun to build and I hope to turn this into an API we can all benefit from with any user-generated data on our web apps.
-- links
website: www.profanity.dev/
github (leave a ⭐ pls thx): github.com/joschan21/profanit...
I'll post a complete build on this API on my second channel (linked below) soon!
-- my links
second channel (in depth videos): / @joshtriedupstash
newsletter: www.joshtriedcoding.com/
discord: / discord
github: github.com/joschan21

Comments

Disappointed. I thought it was gonna be an API that serves profanity.
@phsopherАй бұрын
- fr 😢
  @ShadowOctoАй бұрын
- Ferb, I know what we're building today!
  @wlockuz4467Ай бұрын
- Okay, let's build an open source profanity maker that bypasses this API's check.😺
  @unbiasedperson1155Ай бұрын
- @@unbiasedperson1155 that's a great idea
  @anhdunghisinhАй бұрын
- @@anhdunghisinh YEAH! F PROFANITY FILTERS!
  @akam9919Ай бұрын
funny but ... "You son of a mother" - profanity "fucking awesome" - profanity "damn, that's great" - profanity
@ChristianKolbowАй бұрын
- well, "fucking awesome" is in fact profane
  @rxn720 күн бұрын
- "see you" is profanity :) the API sucks tbh
  @visu713519 күн бұрын
- that is why he implemented the score system i think... but it's open source, so if you want, you can modify it or see how he built it... btw... fucking awesome makes sense.. damn also.. and depending on the context, "you son of a mother" too... XD
  @albert_ac104518 күн бұрын
- those are profanities though
  @CornerKingsReal17 күн бұрын
- @@visu7135 It's too short to be accurate...
  @smithrockford-dv1nb16 күн бұрын
I typed "Son of a mother" and it responded with profanity detected
@gregthomas5887Ай бұрын
- lmaoo
  @virivАй бұрын
- I tried "No need to waste more oxygen, just do it"
  @_the_mohamedАй бұрын
- That’s the beauty of open source, now more people can contribute to fix these edge cases, in theory, right?
  @elvis_gastelumАй бұрын
- I typed "daughter of a father" and it says "Crispy clean input, no profanities" . LMAO!
  @nirajkhatiwada6696Ай бұрын
- @@elvis_gastelum Why work on a half assed not working project tho ?
  @elrydevАй бұрын
I typed "I fucking love pizza" and it responded "OH GOD, VERY BIG PROFANITY DETECTED!!! "
@oskarsmusic865Ай бұрын
- fucking is profanity
  @ValipPowa21 күн бұрын
Google's content moderation API is the best, as it gives a separate score for each field (insulting, toxicity, etc.) accurately, doesn't take much time, and is also free
@luckysolanki9440Ай бұрын
🚨🚨😱😱 OH GOD, VERY BIG PROFANITY DETECTED!! 🚨🚨😱😱 score (higher is worse): 1.000 and I typed "mosquitos suck blood" lol
@thatonecoder737Ай бұрын
- acoustic model
  @pastori2672Ай бұрын
- "suck" is a banned word if you look at his training data
  @yichenchong7728Ай бұрын
- @@yichenchong7728 except it's also a normal word that's fine to use in official conversation when the concept comes up. So putting it in a blacklist is objectively incorrect. But hey, it's the best one can do with a system that can't understand context, which is why it's not worth trying to make such a system.
  @ilonachan14 күн бұрын
Btw, consider choosing a license. Technically this is not really open source yet, you just uploaded the code on the web and hoped for the best. In case you want to keep it simple there is the BSD license or the MIT license, which is very short, but in case you want something more solid you may want to choose the Apache license, which is not that different from MIT but has a bunch of legalese to protect your ass from patent trolls and contributors with malicious intent. Then there are also copyleft open source licenses like the GPL, though I am not a fan of those; it is not my idea of freedom.
@gabrielesilinicАй бұрын
- chill out harvey specter
  @chrislgr23Ай бұрын
- Is there a website for me to quickly read about and select Licenses?
  @ativercАй бұрын
- @@ativerc so, YouTube is very big brain so it removed my comment where I was trying to help you because it was a URL. Anyway. There is choosealicense, a website made by GitHub. Also, whenever you add a file from the GitHub UI and its name contains the word license, GitHub will offer you a license picker. For more complex commercial scenarios, in case you are a business, there is also a specific source-available license that lets your software convert to open source after a set amount of time from publication: the Functional Source License. But most people get by with open source licenses. Generally, if you are unsure, just make coffee and read them.
  @gabrielesilinicАй бұрын
- @@ativerc from GitHub there is "choose a license" which you may search up
  @gnsfАй бұрын
- oh damn.. really? isnt it open source if like you said he just uploaded the code on the internet?
  @davepeace60328 күн бұрын
the type I error (false positive) rate on this tool makes it kind of unusable. my favorite perfectly normal prompts that get detected as profanity: "double slit experiment", "single pen" / "pen test", "toxic person", "Abbie Lee" (possible person name), "garden hoe", "what a jerk" (i suppose some people might think this is profane)
@yichenchong7728Ай бұрын
Awesome, I once needed to urgently implement profanity filter, I used a simple list comparison which doesn’t work in many cases. Yours look awesome 🙌 Thanks
@syedumair31722 күн бұрын
Josh, can you make a video about how to train a tensor model?
@devinlauderdale9635Ай бұрын
- This
  @lee.g.vАй бұрын
- yes please
  @TotomenuАй бұрын
Supercool project, Cheers from Norway!
@FullflexnoАй бұрын
It would be awesome to see some content on how you trained your model (costs, services..etc.). I'm looking for that kind of content.
@xav_624Ай бұрын
Interesting concept - similar to Semantic router. A combination approach that filters for single-word profanities and vector similarity for longer sentences that pass the single-word filter would absolutely be a "good enough approach" for most profanity detection use cases.
@roberth8737Ай бұрын
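The combination approach described in the comment above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `checkText`, the `Embed` function type, the banned word set, the reference vectors, and the threshold are all hypothetical names and values, and the embedding function is injected so that any model could be plugged in.

```typescript
// Hypothetical two-stage check: exact word list first, vector similarity second.
// `Embed` stands in for whatever embedding model is used (an assumption).
type Embed = (text: string) => number[];

interface Verdict {
  flagged: boolean;
  stage: "word-list" | "vector" | "none";
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function checkText(
  text: string,
  bannedWords: Set<string>,
  bannedVectors: number[][],
  embed: Embed,
  threshold: number,
): Verdict {
  // Stage 1: cheap exact match on individual words.
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  if (words.some((w) => bannedWords.has(w))) {
    return { flagged: true, stage: "word-list", score: 1 };
  }
  // Stage 2: compare the whole text against reference vectors.
  const vector = embed(text);
  let best = 0;
  for (const ref of bannedVectors) {
    best = Math.max(best, cosineSimilarity(vector, ref));
  }
  return best >= threshold
    ? { flagged: true, stage: "vector", score: best }
    : { flagged: false, stage: "none", score: best };
}
```

The word-list stage only catches exact single words, so longer phrases still depend entirely on the quality of the embedding stage.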
using vector embeddings is actually so creative i love it
@NithinJune21 күн бұрын
Worth looking at how other languages would be handled as well. Saw a PR adding some words from Spanish and I had planned to add some Chinese and Thai, but I saw an issue open about the potential of adding a langs parameter so that clean words and phrases in one language don't trigger the filter in another.
@bkschatzkiАй бұрын
Holy moly bro, I needed this very badly!
@shubhankartrivediАй бұрын
It would be useful if the API response indicated which words are profane, e.g. a list of the words or the start and end index of each one, so that client-side apps can replace them with * or something similar.
@prajwalaradhya4379Ай бұрын
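A response shape like the one requested above might look like the sketch below. It assumes a plain word-list matcher, because a single embedding score for the whole text cannot point at individual words. `Match`, `findMatches`, and `maskMatches` are hypothetical names and not part of the project's API.

```typescript
// Hypothetical match record; `end` is exclusive, like String.prototype.slice.
interface Match {
  word: string;
  start: number;
  end: number;
}

// Find every banned word and report where it sits in the original text.
function findMatches(text: string, bannedWords: Set<string>): Match[] {
  const matches: Match[] = [];
  const wordPattern = /[A-Za-z0-9']+/g;
  let m: RegExpExecArray | null;
  while ((m = wordPattern.exec(text)) !== null) {
    if (bannedWords.has(m[0].toLowerCase())) {
      matches.push({ word: m[0], start: m.index, end: m.index + m[0].length });
    }
  }
  return matches;
}

// Client-side helper: replace each matched span with asterisks of equal length.
function maskMatches(text: string, matches: Match[]): string {
  let out = "";
  let cursor = 0;
  for (const { start, end } of matches) {
    out += text.slice(cursor, start) + "*".repeat(end - start);
    cursor = end;
  }
  return out + text.slice(cursor);
}
```

Returning indices rather than a pre-masked string lets each client decide whether to star the word out, highlight it, or reject the input.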
I am working on a similar problem of finding similarity between two sentences; they need not be exact, but use similar words. And I was baffled that there is such a simple solution to this. Thanks for this, I will now look into vector databases.
@v1d300Ай бұрын
im working on a review website right now and i could use this to flag reviews and put a mature rating on it or something. this is amazing. great job
@Manofthebean22 күн бұрын
- doesnt work so well, easily bypassible what i type: "you are so SHlT lol" Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.801
  @PrismFave19 күн бұрын
- this review website is so A55 Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.784
  @PrismFave19 күн бұрын
- @@PrismFave damn, I haven't tested it out yet so I don't know, but looking at the git, yeah, I'm gonna wait until it gets better
  @Manofthebean19 күн бұрын
Curious why you chose to use Upstash Vector db vs Cloudflare's Vectorize? Especially since you're using cloudflare's stack for hosting
@adiswa123Ай бұрын
congrats on the launch!
@nro33728 күн бұрын
it doesnt detect profanity in german
@herrkatzegaming20 күн бұрын
A fucking great project
@IvyCreamMathieuАй бұрын
- Profanity DETECTED (score 99999) 😂😂
  @ashishsharma__Ай бұрын
I think if you combined the ml model with a word list approach you could improve the accuracy. Basically give the ML output but then look in the blacklist and whitelist to see if that changes the outcome. Best of both worlds. This will also solve the single word issues you had.
@blockwhisperers835211 күн бұрын
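One way to read the suggestion above is as a post-processing step on top of the model score. The sketch below is an assumption-laden illustration: `decide`, the `Decision` type, the precedence order (blacklist first, then whitelist, then model), and the example lists are all hypothetical.

```typescript
interface Decision {
  isProfanity: boolean;
  reason: "blacklist" | "whitelist" | "model";
}

// Combine a model score with hand-maintained lists (all inputs are hypothetical).
function decide(
  text: string,
  modelScore: number,
  threshold: number,
  blacklist: Set<string>,
  whitelistPhrases: readonly string[],
): Decision {
  const normalized = text.toLowerCase().trim();
  const words = normalized.split(/[^a-z0-9']+/).filter(Boolean);

  // A blacklisted word always wins, whatever the model says.
  if (words.some((w) => blacklist.has(w))) {
    return { isProfanity: true, reason: "blacklist" };
  }
  // A known-safe phrase overrides a model false positive.
  const hasSafePhrase = whitelistPhrases.some((p) => normalized.includes(p));
  if (modelScore >= threshold && hasSafePhrase) {
    return { isProfanity: false, reason: "whitelist" };
  }
  // Otherwise fall back to the model's verdict.
  return { isProfanity: modelScore >= threshold, reason: "model" };
}
```

Checking the blacklist before the whitelist means a safe phrase cannot be used to smuggle a banned word through, at the cost of keeping both lists maintained by hand.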
Could've used the text-embedding-large model, which could've packed more information into your embeddings due to its larger dimension, and that would've improved your accuracy even on inputs with a large number of tokens.
@kaustubhpatangeАй бұрын
There should be some internationalization context added. One of the biggest coffee shops in Vietnam (where I spend time) is Phúc Long. Testing with the string "my favorite coffee shop is phuc long" raises a score of 1.000! Also curious as to why the range is so small - seems it starts at 0.8?
@gosnookyАй бұрын
Make a video on the minimum standards an open source project should have for better reach and scalability
@godofwar8262Ай бұрын
Very nice, what software are you using to make your videos? To share your screen and show your face at the same time?
@practicaluseofАй бұрын
Not the unignored .DS_Store 😭
@taep96Ай бұрын
Great Project
@SiddharthSharma-ei8osАй бұрын
Exciting! What about different languages? Auto-detect the language? Explicitly set it? One model for all, or separate models for each language? So many questions🤣
@user-he3io6lo9tАй бұрын
Fantastic video Josh
@parkerrexАй бұрын
Important to note that although the source is viewable on GitHub, this is not currently classed as "Open Source" software as it lacks a license. See issue #6 on the GitHub repo.
@mjddev21 күн бұрын
I wonder if there is some type of list of tests people have made with fails? Would love to see the edge cases.
@xMrAfonso22 күн бұрын
The value of the resource is not very clear, since I can’t paste the whole article (the text is too big) and I can’t understand where exactly the profanity is located
@SpektRProductionАй бұрын
Does it filter out ones from other languages? Does it filter out ones with typos? How many normal messages will be considered profanity and will be filtered? Why did you write it in JavaScript/TypeScript? it will be way faster and less error prone if you switch over to a statically compiled language.
@m4rt_18 күн бұрын
Maybe training on tweets could make this model perform well
@prasanthpedaprolu2261Ай бұрын
Does anyone know what app he's using to switch apps in the left sidebar? I think Theo also uses it
@haryormedayjoshua281Ай бұрын
- Arc Web browser
  @petersusan215Ай бұрын
For the very short texts why don't you just pad out the input text with neutral words?
@Thomas777m120 күн бұрын
Would be awesome if you could make a tutorial why you use Hono over Express :) for your api
@TellToblerАй бұрын
Ey, what framework did you use to design the website? I love it
@joshuarodriguez2219Ай бұрын
- follow up what do you use to record your videos?
  @joshuarodriguez2219Ай бұрын
Everybody is scared of YouTube demonetization! Just chill and keep crushing it!
@enic-maАй бұрын
Insert 'YouTube would like to connect to your API' jokes here
@NiklasZiermannАй бұрын
Does anyone know what is the app he is using to draw the schemas (min 1:00)?
@armandmalci495Ай бұрын
- tldraw
  @Shorts4DАй бұрын
- It's Excalidraw
  @koudy008Ай бұрын
@joshtriedcoding why do still use yarn in 2024? Either pnpm or bun are better in every category
@davidsiewert864929 күн бұрын
- New doesn't equal better.
  @blockshift75820 күн бұрын
Does it work only for english ? would you be interested to open it to other languages ?
@paullouppe9947Ай бұрын
- It seems to only work for English, as swear words in other languages (like Polish) weren't flagged as profanity
  @MateuszWierzejski12 күн бұрын
Hi I wanna add an e-commerce store app for my portfolio. I wonder which react stack is solid for it in 2024. Can someone suggest something? As a back I would prefer Firebase, also for styling scss+mui but need recommendations about state manager and other technologies and tools. Thanks!
@asmet270128 күн бұрын
Well, it drops when the message is larger than ~750 chars due to the execution time limit. Tokenization makes BOOM
@scarlatum29 күн бұрын
Cool project 👍
@Axorax29 күн бұрын
A question what is your browser
@arshgemrie462121 күн бұрын
Thank you
@zakariazain8790Ай бұрын
"what the hell" (0.966) or "what the heck" (0.912) both return profanity. Even if we use the totally safe version of this phrase, "what in the world", it's still profanity (0.859). Then how are we supposed to express that idea? On the other hand, "I hate this [blank] taco" returns clean for "flipping", "frigging" and "freaking", all of which are lesser versions of the F bomb.
@67083924513 күн бұрын
Can we do one for images too?
@BrightCodeАй бұрын
i got pretty sure this is profanity on: THIS IS VERY PROFANE
@bed_destroyed21 күн бұрын
Basically the score goes from 0.810 to 0.880. Seems like there's not a lot of margin for error given "clean input" is 0.840, and limiting the content size drastically reduces its usefulness. After a bit of testing it seems your product is definitely not ready; you should update your landing page, as it is not reliable at all.
@lel753124 күн бұрын
Can it be made to respond which word is profane as well? So that i can just *** it
@_purple_44_13 күн бұрын
this is a really good project. actually you can use it not only for profanity, you can also detect ads, spam, scams, etc., can't you?
@snatvbАй бұрын
I typed "you are very sexy" and it responded with: Crispy clean input, no profanities :))
@user-vk6cb1zu7pАй бұрын
- it's insane!!
  @user-vk6cb1zu7pАй бұрын
"This doesn't use AI, just a machine learning model"
@lilrow420613 күн бұрын
sir josh can you make a tutorial how to use rpc of hono with next
@igmtinkАй бұрын
Great now I will make a version that creates profanity
@cablesalty19 күн бұрын
That profanity score is very weird. Why is the score always around .8? Why not use the range from 0 to 1?
@AmodeusRАй бұрын
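If the reported score is a raw similarity value (an assumption; several commenters report values between roughly 0.78 and 1.0), one simple way to present it on a 0 to 1 scale is min-max rescaling against an empirically chosen floor. `rescaleScore` and the floor value used in the usage note are hypothetical.

```typescript
// Map a raw score from [floor, ceiling] onto [0, 1], clamping anything outside.
function rescaleScore(raw: number, floor: number, ceiling: number = 1): number {
  if (ceiling <= floor) {
    throw new RangeError("ceiling must be greater than floor");
  }
  const scaled = (raw - floor) / (ceiling - floor);
  return Math.min(1, Math.max(0, scaled));
}
```

With a floor of 0.8, for example, a raw 0.9 would land near the middle of the new scale. This only changes how the number is displayed; it does not make the underlying model any more accurate.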
Maybe add something to convert unicode look-a-likes, because those wont get detected
@Erik-pk8rw23 күн бұрын
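A pre-processing step along the lines suggested above could be sketched like this. The `LOOKALIKES` table and `normalizeForMatching` are hypothetical, and the table is deliberately tiny; a real confusables list would be far larger.

```typescript
// Small, hypothetical substitution table for common character swaps.
const LOOKALIKES: Record<string, string> = {
  "0": "o",
  "1": "i",
  "3": "e",
  "4": "a",
  "5": "s",
  "7": "t",
  "@": "a",
  "$": "s",
  "!": "i",
};

function normalizeForMatching(input: string): string {
  return input
    .normalize("NFKD") // fold compatibility forms (e.g. fullwidth) and split accents off letters
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks
    .toLowerCase()
    .replace(/[013457@$!]/g, (ch) => LOOKALIKES[ch] ?? ch) // undo simple swaps
    .replace(/[^a-z\s]/g, ""); // remove separators such as "." inserted between letters
}
```

For instance, `normalizeForMatching("H3LL0")` yields `"hello"` and `normalizeForMatching("d.a.r.n")` yields `"darn"`. The sketch has clear limits: it mangles legitimate numbers, it strips non-Latin scripts entirely, and it cannot tell a lowercase "l" standing in for "I" from a real "l", so the output is only suitable for matching, not for display.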
f@#k!ng great project!
@ihsanmohamad521Ай бұрын
Holy moly gets 0.912 🚨😱 BIG PROFANITY DETECTED!! 🚨😱
@BoxEnjoyer21 күн бұрын
2:33 and did it happen?
@hoteny4 күн бұрын
tensor model < bunch of ifs
@evan_ryАй бұрын
Why is it so strict? "dumb person" is apparently extremely profane
@_ultravioletАй бұрын
- because this is not production ready, it's at best a Proof of Concept. it obviously cannot detect or understand any context, it can just maybe detect bad words, that's it, it doesn't care about context at all.
  @depralexcrimsonАй бұрын
"ds fdsfds dsf dsf sdfdsfssd fdsfds" : 😱 PRETTY SURE THIS IS A PROFANITY 😱
@pcoi947 күн бұрын
Cool idea but it's super impractical and easy to bypass. Needs some more work because simply chaining 2 swear words together without a space can usually bypass it.
@Renner4k19 күн бұрын
great work Josha🔥🔥🫡
@ronitgurjar5747Ай бұрын
wow wow - 🚨 PROFANITY DETECTED!! 🚨
@sal0021 күн бұрын
dumb dumb: 🚨😱 BIG PROFANITY DETECTED!! 🚨😱 - 0.937
@chiroyce19 күн бұрын
my prompt: "you are so S.HIT at this game" Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.822 ----------------------------------------------------------------------- my prompt: "you are so SHlT lol" Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.801
@PrismFave19 күн бұрын
TL;DW It's basically AI... Heck the use of vector database puts it closer to LLM technology.
@kushaagrАй бұрын
One issue is internationalisation: "Ich geh nach Fucking", is a German sentence without any profanity, because "Fucking" is an actual town.
@mikaay4269Ай бұрын
Cool man!
@Michael-MartellАй бұрын
Josh, by design this system is fastest when there is profanity, and slowest when there is none. Is it even possible to design one with the opposite? fastest when no profanity, and slowest when there is?
@DS-ow2geАй бұрын
- Well if you think about it, to declare something as profane you need to find only 1 profanity. However to declare something as clean you need to make sure there are no profanities at all. So in one case you stop when you find a profanity, but in the other case you have to check the whole thing
  @rorymaxАй бұрын
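The asymmetry described in the reply above can be illustrated with a short-circuiting loop. This is only a sketch under the assumption that the text is split into chunks and each chunk is checked in turn; whether the project works this way is not confirmed here, and `scanChunks` and the `isProfane` callback are hypothetical.

```typescript
// Check chunks in order and stop at the first hit.
// `checked` shows how much work was done before a verdict was reached.
function scanChunks(
  chunks: string[],
  isProfane: (chunk: string) => boolean,
): { flagged: boolean; checked: number } {
  let checked = 0;
  for (const chunk of chunks) {
    checked++;
    if (isProfane(chunk)) {
      return { flagged: true, checked }; // one hit is enough to decide
    }
  }
  return { flagged: false, checked }; // clean only after every chunk passed
}
```

A clean verdict always costs a full pass, so the reverse behavior (fast on clean input, slow on profane input) is not available with this kind of exhaustive check.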
cool, but what do “zip in the wire” and “zipperhead” mean? 😭
@blaizeWАй бұрын
“I can’t say this word because YouTube may demonetize the *hell* out of me.”
@ErrorINAOfficial13 күн бұрын
heard of Akismet?
@purpshell21 күн бұрын
Love it
@LRSKWTKWSKАй бұрын
It`s like semantic search
@kapa9436Ай бұрын
The website does not work anymore, since the website uses HSTS.
@coopener29 күн бұрын
my pen is broken - 😱 PRETTY SURE THIS IS A PROFANITY 😱 you what - 😱 PRETTY SURE THIS IS A PROFANITY 😱 How much have you been drinking - 😱 PRETTY SURE THIS IS A PROFANITY 😱
@TheDragonDesigns18 күн бұрын
"ì I" 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.857 LMAOO
@spinxooo15 күн бұрын
Typed meow meow and the rating was: 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.865
@wenelol20 күн бұрын
The phrases "I love doing it with my sister"(0.802) and "I want to end your life"(0.783) have lower scores than your examples of clean input. I think this needs a lot of work, only obvious profanity gets detected.
@j0hnr3xАй бұрын
Good fucking video
@GratuityMediaАй бұрын
👍 Useful
@ovnaАй бұрын
The problem is it's English only. As a German myself, I tested the famous German swear word "hu rr ensohn" and it said it's not a swear word
@theminecraft69021 күн бұрын
Im sorry but why go such an extra mile if OpenAI's Moderation API is free and quite fast at that.
@mrkostya008Ай бұрын
- I thought you could only use their API for outputs from their own model and they disallow other usage
  @leonardodoujinshiАй бұрын
i typed "gfasgda asfga" into the checker and it said it was profanity. might want to fine tune the model a little more it also said "i got a new diamond hoe in minecraft, it has a lot of durability" was profanity. also might want to add context reading.
@theaviationbee20 күн бұрын
2000 requests doesn’t mean you had 2000 people try this
@betweenbracketsАй бұрын
Why would I want an API for this? There are tons of libraries that solve this.
@marcuss.abildskov7175Ай бұрын
it says that "dog cat" is profanity
@perfectionyt10 күн бұрын
The model needs more data
@cheapbucks9590Ай бұрын
upstash really profit from you working there😂😂
@TheIpiconАй бұрын
what web browser do you use
@xastralmarsАй бұрын
- That's Arc Browser
  @techworld3255Ай бұрын
why didn't this catch kurwa?
@destropolАй бұрын