Truth in IT

Claroty: Generating Malicious OT Data to Test Security Tools

Claroty
06/29/2026

Transcript


Dan Gunter of Insane Cyber is with me here at S4, live and in person, so it's good to see you.

Yeah, great seeing you again. I know it's been a while since we've done a podcast, and I'm excited to be back.

Yeah, absolutely. You did a good talk this week on generating malicious OT data to test all these security products and different systems. I told you, I went to your talk, I kind of tried to keep up, but it was good. I mean, there was a lot of different stuff in there that I don't think a lot of people have really considered.

Yeah, you know, it's a big challenge, especially with OT. We're in a market where, especially as security companies, right, the big premise of the talk is we've written detections, we're doing all this great analytic work. Maybe we've generated enough data to prove that it roughly works, but oftentimes you need more data when you're doing detection engineering, when you're working through and saying, you know, not just can I lower the false positive rate and increase the true positive rate on there, but like how you collect data in OT is inherently challenging, right? And my talk really did focus around being like, how can folks get more representative data? Obviously, operational and live data is great, but you still often need to emulate that data or generate it, whether you're testing MITRE ATT&CK techniques or really just testing out rules.

Right. So you're not just talking to vendors, you're talking to day-to-day guys in the trenches, right?

Yeah, absolutely. And for them, what's interesting, right, is we'll go down to asset owners and they may have a lab, they may have some of the systems out there. And the other interesting part of the talk is with emulation. There's things you can do when you're just dealing with, like, a PLC you get off eBay and maybe software that you purchased through the OEM.
When you get into more complicated industrial processes, when you get into refineries, when you get into energy management system (EMS) and ADMS networks, where it's a series of apps, emulating that data is harder because the process is a lot more complex, right? It's not just HMI to PLC.

And the complexity of OT anyway, you mentioned, like, I mean, how much does all the proprietary stuff kind of get in the way?

All of it. Yeah, yeah. And what's interesting with that for emulation is you go to, like, Schneider Electric PLCs, they use Modbus. If you're using a lot of the open source tools, you'll see, hey, I see function code 90, which is the regular Modbus function code for Schneider. Where data generation gets interesting is if you actually want visibility into those Schneider PLCs, you want to see what's being said between HMI and PLC. If you just look at function code 90, you're just saying, hey, I see it speaking Schneider's protocol. If you don't parse below that, if you don't generate data, you don't get to the stop PLC and start PLC commands. Some of the other management behaviors, sending firmware, uploading logic, all of that sits at that layer below. And so, again, from a detections perspective, if you're focused on improving your MITRE ATT&CK approach, that's when generating that data is key. Right. And either doing it with real devices, again, get Schneider PLCs off eBay. If you're using an M221, the software is free. If you're using some of the newer PLCs, then you will have to buy a higher version. Right. But that's where the crux of my talk is really getting at, hey, you may be able to do it for free. You may get in a situation where, you know, you do have to buy some components.

Yeah. I mean, that was the part of the talk where I think ears perked up, like, yeah, I don't have budget for this. And you kind of provided a few paths there. Maybe go through a couple of those.

Yeah, exactly.
So one I'll say with OT malware, what's interesting, right? We've seen a few cases where attackers have written malware with OT protocols in it. It's funny with one of the pieces of malware we were testing. We set up real devices. We got them configured. We fired the malware against it. And the malware actually didn't work because the attackers had bugs in their code in the malware. What worked ended up being the free solution, a protocol simulator that we just Googled for. We were able to pull the free one, got the malware to talk to it, and got the data. And so we were able to do our detection thing, you know, from that step up. Obviously, the next step is going to, hey, maybe I need a real device talking to, you know, an HMI or to software on that side. There's been great Claroty research. I'll point to two, right. We were talking before we started recording on this, but there's a really good white paper on fingerprinting different protocol stacks of Rockwell, I think it was. And what's interesting, if you go through that Claroty research, is the Claroty team was able to be like, hey, I know it's this family of PLC because of how they implemented that specific protocol. If you're looking to emulate that data, then, hey, having that specific version is key. Right. There's other vendors, too, other deals where certain software PLCs will only implement certain commands. Right. And that's where, when you're getting into real devices, it's important to know, does this device actually have the functionality that I need at the layer it's at? And then to the complex side, when you get to those ADMS and EMS networks and it's applications talking to each other, that's where it gets more complicated. Right. Because it's a lot of containerized apps. And yeah, sometimes you can't start the applications without others. So there's those dependencies.
And so with different layers, I always recommend, and in my talk I talked about this, start with defining the fidelity you need. You know, don't over-engineer, over-invent the solution. Right. Because it'll become really costly, really quick.

So what motivated you to kind of look at this problem? Did you identify gaps or?

Yep. Oh, it's certainly a gap. And even with well-resourced asset owners, even on the vendor side, it's a challenge. Right. I think all of us product companies have some sort of product team and some sort of folks that understand OT well. And oftentimes your software engineers are really smart at software engineering. They need that OT data. They need someone with OT and operational smarts to kind of help them. Right. And so after the talk, we did get a lot of folks coming in and being like, hey, you know, are there libraries? Are there other things out there? There's big demand we see for that data. Right. Because again, there's nothing worse than buying a product and thinking it's going to detect it. And then when you start to pull data and throw it through and be like, hey, this should detect and it's not, right? So all of the vendor claims are tested best through that generated data.

Is it because a lot of it's black box, basically, or?

A lot of it's black box. And, you know, there's certain vendors that are easy to get. There are others that are challenging. And then just with vendor claims, nothing gives a warmer blanket to that asset owner that made that purchase than, hey, here's a PCAP or here's a file of the real data, the evidence you would see. And let's put it through and make sure it actually works. Right.

And so maybe this is just me asking a naive question, but like, does VirusTotal help here, or any of those kind of repositories?

VirusTotal absolutely can.
So it's a help both because you can get the malware samples you need and on the emulation side. Oftentimes, as we would set up data, there's a tool called Triangle MicroWorks, a pretty popular test-harness-based tool that a lot of engineering and other firms use. They have what are called substation configuration files, SCL files. And with those SCLs, it actually describes in a file, like, the whole layout of a substation or power area. What's cool with some of those emulation tools is we found the files on VirusTotal of, like, real plants and their configuration, and put them into the test harness. And so then we had, again, like, hey, this is someone's file that went up. Here's kind of a simulated environment of their turn on test. And then, right, if you grab that malware sample or you have a script to emulate out that technique you need to do, you can do it. And again, test both. Test yourself, and if you're going into, like, an exercise or going into something, throw it across to an incident responder and say, hey, let me trigger that alert in your detection platform and see how you handle it.

It's funny. In another life, I was a journalist and I remember writing a story about the stuff that people were uploading to VirusTotal that weren't just samples. It was just a lot of stuff you shouldn't be putting up there. And it was just like this goldmine for people kind of mining other data.

Oh, absolutely. And, you know, early on, there were products out there that maybe didn't tell folks that they were sending it up, or they weren't just doing a hash search. Right. But no, I mean, we still see, even from our samples, malware samples come out very quickly when they start popping up in regions of the world. And so, you know, it helps the threat intel community out. But it also helps us out on the R&D and detection engineering side, on how do we keep up in real time when stuff starts popping up.
I know Team82 uses a lot of emulators in their work. I mean, how complete are they in terms of, you know, helping you achieve the goals that you want?

So I haven't seen a lot of those. Can you talk through some of those examples?

Yeah, put me on the spot. No. So I know, like, there was some research that was published about a year ago. One of our researchers used an emulator and it only went so far in the project. And then they needed to either buy the device or. Yeah. So I just wonder if you run into things like that.

Oh, absolutely. Yeah. So in the sample where we used the free tool for getting the ICS malware to talk to us, what was interesting was the real devices properly implemented the protocol stack. And so when it reached the point where the attackers miswrote the application, the industrial devices properly said, I'm not going to talk to you because I know you're not a valid device. You're far enough from the engineering spec that this isn't valid. I'm going to go ahead and turn it off. That piece of software we used actually was not built to that engineering grade. It was, if you say a word to me or ask me a question, I'm going to answer it. And then I'm not going to keep track of where we are in the conversation. And so if it's things like, hey, you should ask these questions ahead of these others, you know, the real device was like, hey, you didn't ask the questions you should have leading into the thing you asked. With that one, it just did it, right? And so sometimes those more basic emulators work. But to the point of getting to the functionality, if you need that functionality later in there, you have to make sure that the application has it, right, or the emulator has it. And that's where, like, honeypots get in trouble, too, right? Like attackers will start talking to a honeypot and it's like, OK, I know you're a Conpot, I know you're a GasPot, right? Because no PLC really responds like that.
So even the OT malware can recognize if it's in a test environment, for example, smart attackers.

Yeah, yeah, absolutely. Yeah, absolutely.

Interesting. What's your thought on, like, I mean, I don't want to pump these guys up, but the quality of OT malware that you see out there, I mean, a lot of it is sophisticated stuff we hear about.

Yeah. You know, on one front, especially nation states have budgets, too. Right. And with OT malware, the scary thing is there isn't a whole lot of complexity required yet. Right. Look at tools like FrostyGoop. Look at even Pipedream and some of the other tools. Right. There are Python libraries out there to where you can find certain Python libraries, lightly extend them if you know about the protocol and go for it. Right. And so with OT, because like, you know, a lot of PLCs still don't have authentication, they don't have those layers. Right. There's not a huge need right now for that sophistication. Right. And that's where I think we've seen, you know, I don't want to say more basic because it's worked and it's worked on a level, but it hasn't required a huge level of sophistication. And there's other libraries. It's funny. We write rules around them because we're like, OK, we know this exists in Python or Rust or some other language. And, you know, I think we're still in that world. Right. But what I will say is it requires that bit of OT knowledge. And so, you know, where attackers, I think, have made mistakes sometimes is when you look at something like Trisis, right. There's stories that, like, OK, if Trisis had been caught later, hadn't been caught as soon as it was, what would that have looked like? Right.

Makes sense. Makes sense. You mentioned representative data. You said it a couple of times in your talk. Give me some examples. Exactly what do you mean by it?

Yeah. So representative data.
And this is key, right, because there are detections that often can trigger on, like, hey, I see this byte at this offset of the packet and I know this means this. Hey, you called a firmware upload from a new machine that I haven't seen do that. Right. Those are more basic alerts. Those are things you can learn often off of one, two or a small number of packets. If you're getting into statistical detection or you're starting to get into, like, language model based detection, LLM based detection, right, sometimes you need to set that statistical baseline or that footprint to then deviate from it. Right. OK. And the big challenge becomes also on that statistical side of, you know, what does that look like in my lab environment or emulation environment? And then when I take it into production, when I have latency, when I have, you know, physics and everything working against me, whether it be through, you know, fiber optic and other physics or whether it be through, like, vibration and where all these OT sites are, that can introduce noise that sometimes can hit. Right. So the best representative data is often from the environment, knowing how rough it is, knowing where it's at.

No one's giving that up.

Yeah. Yeah. And that's the hard thing to give up. Right. But it's super important. You know, during S4, I've talked to a lot of vendors and to a lot of asset owners. And when they talk about vendors, they're like, hey, some of these products are designed for, like, pristine conditions. And so, you know, from the perspective of representative data, having that is key. Right. Because you need some of that. Like, what does the real world look like versus the lab?

Sure. What's the biggest eye opener that came out of this work for you? I'm sure it's ongoing, but I mean, what are you learning? Like, has there been a before and after kind of comparison?

I think we still see a real hunger.
You know, budgets are still limited. Everyone's going to have a limited budget. And, you know, it's interesting here talking about the space we're in and the fact that the security budget, the IT budget, is actually a fraction, too, of the ops budget. Right. And I think we sit in an interesting place as OT security professionals because, yes, we are, you know, OT, and often a lot of us come from the IT security side. Right. But our responsibility is to the operations side. It's the reliability. It's the availability. It's that security side of it, too. Right. And, you know, what stands out to me is still the hunger and the desire to push towards that. And we're seeing folks that are more pushing towards, like, hey, you know, let's do good stuff on the security side and push that. But also, like, how do we bring more ops people to the table? How do we work with those engineers? With our most successful asset owners and groups, what I like hearing is when, you know, the OT security teams or the IT security teams responsible have built a great relationship with the ops side and the folks responsible. So that sticks out to me. And it's also one where I'm excited as we step into, you know, new ways of doing detections. You know, AI is a buzzword in a lot of ways, but there's also ways that it's getting applied in really cool ways. It's helping. Right. And it's not something we can avoid. And it's not just on the security side. Right. You look at, I think it is, some of the predictive maintenance that airlines use. It's based on machine learning. And I think they're starting to introduce some AI things. But there's an airline where that predictive maintenance not just helps keep planes in the air, but it's saving one of the airlines like eight billion dollars a year, knowing the exact amount of gas. Right. You know, based on all the systems and everything it's reading in. Right.
And so I think we can't be scared of some of the emerging tech. But that's also where it's exciting to be like, hey, how do we get security more aware of, like, operational things going on and help correlate between the two.

Yeah. I mean, there were some cool examples on the main stage on day one. I don't know if you were in some of the talks, but, you know, just like, hey, look at the last 30 days of this process, the production line, find the variances, what happened, you know, whether it was a scheduled downtime, was it something else? And, you know, it would take a person, I don't know how long, but a machine, you know, a relatively short amount of time.

Absolutely. And what's interesting with it, right, specifically with AI, is AI started in the 60s and 70s. And at that time, it was all academic. It kind of didn't really catch any commercial interest. And, you know, it died out kind of early on, came back up in the 80s. They were working on it then. Then they were like, hey, we don't really have good computing power. Like at the time, the computing wasn't great. It popped back up in the 2016s, and we kind of saw the start of the LLM world. And now, why it's significant, even for representative data, is we have the ability to process large amounts of data, and models are getting significantly better in this time. Right. And so I think that both can help some of the data representation and help us process better. But we're also at a point where we can help folks that, you know, if they're an OT responder responsible for a bunch of sites, we can really help them scale and then give that operational context. Hey, you know, this site was out or this site was operating degraded while we saw this anomaly alert on this side, while we saw maintenance going on in the plant from someone that shouldn't be there.
I mean, that was the neat thing, too, about, again, coming out of the talk I saw. It was just like, OK, cyber is just one part of this. It reaches into the resource management applications and systems and it reaches into the process systems and pulls all that centralized. It's kind of wild.

Yeah, absolutely. And I think it was after Dale's opening talk that a really good talk was given. The premise of the talk, right, was we don't really have an industry agreed upon, or it might have been agreed upon but it's not heavily used, way of representing operational data. Right. And I think that's where we'll see, especially as we get more representative security data and data from there combined with that, I think it's going to be really interesting.

Yeah. Let me just pick your brain on some of the incidents that are coming to light. Do you see it as a spike? Like obviously Poland is a very high profile incident, but I mean, it seems to be a little bit more frequency.

I think we're talking about it more. And I think with the focus, with cybersecurity getting more of a budget on a national level, not just in the U.S. but elsewhere, with it getting more of a focus, some more regulation popping up in pipelines and other areas, I mean, I think we hear about it more. At the same time, the challenge with OT security is always, like, you don't know what you don't hear about.

Sure.

When you go into the typhoons, the U.S. government in their reporting said Volt Typhoon had a five plus year dwell time, to where there weren't very, like, open community tracked groups for that. Right. And so I want to say we're doing probably a better job of kicking some of that up and responding to it. Because again, I mean, the fact is, you know, as a military, like the U.S. military, even allied militaries, back to World War Two, back to before, like, we were bombing ball bearing factories, in places we were doing things, right.
And so war producing infrastructure and all that, like, it'd be kind of naive to think that nation states aren't going to keep doing what we've done for the last hundred years of warfare in their budget. Right. And we don't hear a lot about that side, but that's some of the side, too, where it's like, you know, how can we as defenders, regardless of our nationalities, help preserve our countries or help keep safe when that time comes? And I think that's a really hard OT problem. Right. Because you come into the U.S., you go to other countries, and those sites are spread out, like a huge spread. Right.

So I know your company does a lot of work around threat hunting on OT. Are you getting better questions about, you know, why we need to do this? And, you know, what are you telling them?

We are. We are. And what excites me is helping really pull the next generation of threat hunters up. Something that you asked earlier is something I learned here. We will always have, unfortunately, a shortage of smart OT folks. Right. And that's where, as a company that both does services and has a product side, we're helping those services people that are in the trenches dealing with, you know, hundreds, thousands of sites, whatever it is. They're asking better questions, but there's still a lot of unknown and there's still a huge education gap. On that Schneider example, in my talk, I talked about that Function Code 90, UMAS, the proprietary protocol. A lot of folks still aren't aware, like again, below Function Code 90 is all of this really important stuff to watch that it's not in the open source tools. It's not in Tshark. It's not in Malcolm. They're great tools, but, you know, they don't have those published out. Right. And so we're seeing better questions, but there's still also that huge appetite, and I think increasing appetite, to, like, learn. Right.
We put out a lot of content, you all put out a lot of content, and, like, you know, Mike Holcomb, other folks in the community. You know, what's interesting, I was talking to Mike a day or two ago and we were just talking about, like, hey, what content are people liking? Like, how's stuff going? And it's really interesting because, again, from a representative data environment, everyone wants OT representative data for detection engineering. And from an educational environment, it's really cool seeing, like, you know, you put out content, it shows up in college classes. We're in a few college classes on papers we've written, and I'm sure a lot of y'all's are, too.

Yeah, I'm sure.

And yeah, that's what excites me. And that's, you know, the mark that we hope we can leave, is in, like, helping pull, you know, write great tools, get great software out there, help people, but also, like, help bring up the next generation.

I mean, you've got so many IT people that now have to manage this stuff, you know, and they're just kind of at square one and, you know, obviously the content's valuable to them, but yeah, they just need resources.

They do. They do. And a lot of them don't control, you know, the fact that oftentimes they're put in charge of both the IT and OT side. And even at some of the major asset owners we work with, like, the incident response team doesn't stop at the IT/OT boundary. They're responsible for both sides. Right. And so it is really key to help those folks, you know, grow up to that level.

Yeah. I'm curious, when you're dealing with somebody from IT who is now responsible for OT, in terms of threat hunting, I mean, how much advocacy or informing, education do you have to do with them? This isn't exactly the same as IT threat hunting.

Yeah, you know, I wrote a SANS white paper on it. It was a few years back.
I want to say it was six or seven years back, on OT threat hunting, for kind of a process to approach it. Where OT becomes challenging, right, is that side of the processes can be different. Right. Yeah. Where can you get data from? Where can you get access to? Who's playing along? And some of the hard part still is, like, you know, no one chooses to have an incident and have to do incident response. Threat hunting can be a bit of a luxury good when we have time. And so some of the focus also is then on the tool side, like, how do I most equip you to use those small time cycles you have to cover as much of either the landscape or the tactics and techniques as you need to.

When you engage with a client in that direction, do you have to, like, threat model for them? Like, what do they come to you with?

Yeah, it can depend. And so we have sites to where, you know, maybe they've heard they're a target. Maybe they've had a ransomware incident. We've seen ransomware kick up. You ask what's kicked things up. We've had folks be like, hey, we were hit by ransomware. Hey, this happened. I mean, current events we were also talking about, right. Jaguar Land Rover, the largest breach, where the government, the UK government, was literally bailing folks out to make sure they didn't have to lay off their workforce. Right. And so, you know, what's interesting there to me is, yes, there certainly is that education side that kicks in. But it's picking up.

Yeah. All right, man. Thanks so much for coming on the podcast. Really great to see you.

Yeah, thanks.

All right. Thanks.

TL;DR

  • OT security teams struggle to validate detection systems because generating representative malicious data requires emulating complex industrial processes and proprietary protocols—a capability most organizations lack
  • Budget-conscious approaches exist: start with free protocol simulators, progress to used PLCs from eBay with vendor software, and scale to complex emulation only when fidelity requirements demand it
  • Representative data must reflect real-world conditions including latency and environmental noise, especially for statistical or ML-based detection that requires accurate baselines
  • The OT security field faces persistent expertise shortages as IT professionals inherit OT responsibilities, while AI and machine learning are beginning to transform both security operations and industrial processes
  • OT malware remains relatively unsophisticated due to lack of authentication in many PLCs, but the threat landscape is evolving as nation-state actors maintain persistent access to critical infrastructure

The Challenge of OT Security Testing

Dan Gunter, CEO of Insane Cyber, addresses a critical gap in operational technology security: the difficulty of obtaining representative malicious data to test detection systems. Unlike IT environments where threat data is more readily available, OT security teams struggle to validate whether their intrusion detection systems and security analytics actually work against real-world attack techniques. The challenge stems from the proprietary nature of industrial protocols, the complexity of industrial processes, and the scarcity of live operational data that can be used for testing. Gunter explains that while security vendors claim their products detect specific MITRE ATT&CK techniques, proving those claims requires generating authentic attack traffic against realistic OT environments—a capability most organizations lack.

Budget-Conscious Emulation Strategies

The discussion reveals practical approaches to generating test data without enterprise-level budgets. Gunter describes a tiered strategy starting with free protocol simulators and open-source tools, progressing to purchasing used PLCs from eBay paired with vendor software (sometimes free, sometimes requiring licensing), and ultimately building more complex emulated environments for sophisticated industrial processes like energy management systems. A notable example involved testing OT malware where real devices rejected the malicious code due to proper protocol implementation, but a basic free simulator accepted it—demonstrating that fidelity requirements vary by use case. The key insight is defining the required fidelity level before investing in infrastructure, avoiding over-engineering while ensuring sufficient realism for meaningful detection validation.
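
The malware anecdote above comes down to whether the test target tracks conversation state. The following is a minimal sketch of that difference using a made-up three-step protocol (HELLO, SELECT, OPERATE); the message names and both classes are hypothetical illustrations, not any real OT protocol or the simulator used in the interview.

```python
class StatelessSimulator:
    """Answers every known request, ignoring what came before it."""

    RESPONSES = {
        "HELLO": "HELLO_ACK",
        "SELECT": "SELECT_ACK",
        "OPERATE": "OPERATE_ACK",
    }

    def handle(self, request: str) -> str:
        return self.RESPONSES.get(request, "ERROR")


class StatefulDevice:
    """Enforces HELLO -> SELECT -> OPERATE and drops the session on a violation."""

    ORDER = ["HELLO", "SELECT", "OPERATE"]

    def __init__(self) -> None:
        self.step = 0        # index of the next request the device expects
        self.closed = False  # set once the peer breaks the expected order

    def handle(self, request: str) -> str:
        if self.closed:
            return "CLOSED"
        if self.step < len(self.ORDER) and request == self.ORDER[self.step]:
            self.step += 1
            return request + "_ACK"
        self.closed = True
        return "CLOSED"


def run(target, requests):
    """Send each request in order and collect the responses."""
    return [target.handle(request) for request in requests]


# Hypothetical buggy client that skips the SELECT step.
buggy_malware = ["HELLO", "OPERATE"]

print(run(StatelessSimulator(), buggy_malware))
print(run(StatefulDevice(), buggy_malware))
```

In this toy setup the stateless target keeps answering, so the full command sequence is captured for detection work, while the state-tracking target ends the session at the first out-of-order request. Which behavior is wanted depends on the fidelity the test needs.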

Representative Data and Detection Engineering

Gunter emphasizes the concept of 'representative data'—test traffic that accurately reflects real-world OT environments including latency, noise, and operational conditions. This becomes critical when moving beyond simple signature-based detection to statistical analysis or machine learning approaches that require baseline establishment. The challenge intensifies with proprietary protocol implementations: Claroty research has shown that different PLC families implement protocols differently, meaning effective emulation requires matching specific vendor implementations. Gunter notes that even well-resourced asset owners and security vendors struggle with this, as software engineers need OT domain expertise to generate meaningful test data. The gap between lab conditions and production reality—where vibration, environmental factors, and network conditions introduce variability—makes representative data essential for tuning detection systems that won't fail when deployed.
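
To illustrate why a lab-built baseline can misfire in production, here is a small sketch of a z-score check on packet inter-arrival times. The numbers are invented for illustration, and the function names and the 3-sigma threshold are assumptions, not anything described in the talk.

```python
from statistics import mean, stdev


def build_baseline(intervals_ms):
    """Return (mean, sample standard deviation) of inter-arrival times."""
    return mean(intervals_ms), stdev(intervals_ms)


def flag_anomalies(intervals_ms, baseline, threshold=3.0):
    """Return the intervals whose z-score against the baseline exceeds threshold."""
    mu, sigma = baseline
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot compute z-scores")
    return [x for x in intervals_ms if abs(x - mu) / sigma > threshold]


# Invented data: a quiet lab link versus a field link with jitter.
lab_intervals = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0, 100.2, 99.8]
field_intervals = [100.0, 108.0, 93.0, 104.0, 97.0, 111.0, 100.0, 90.0]

lab_baseline = build_baseline(lab_intervals)
field_baseline = build_baseline(field_intervals)

# Benign field traffic scored against the lab baseline versus its own baseline.
print(len(flag_anomalies(field_intervals, lab_baseline)))
print(len(flag_anomalies(field_intervals, field_baseline)))
```

With these made-up values, most of the benign field traffic is flagged when scored against the tight lab baseline and none of it is flagged against a baseline built from the field itself, which is the lab-versus-production gap described above.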

Industry Evolution and Threat Landscape

The conversation touches on broader OT security trends, including the increasing visibility of incidents (though Gunter cautions that reporting improvements may create perception of increased frequency rather than actual spikes), the persistent shortage of OT security expertise, and the growing role of AI and machine learning in both security and operations. Gunter points to predictive maintenance systems saving airlines billions annually as evidence that advanced analytics are already transforming industrial operations—security applications are following similar trajectories. He notes that OT malware remains relatively unsophisticated because many PLCs lack authentication and basic security controls, though this may change as defenses improve. The discussion also addresses the challenge of IT professionals suddenly responsible for OT security, highlighting the need for educational resources and tools that bridge the knowledge gap between traditional IT threat hunting and OT-specific detection requirements.

Chapters

0:00 - Introduction and S4 Conference Context
0:47 - The Challenge of Generating OT Test Data
4:05 - Budget-Conscious Emulation Strategies
7:17 - VirusTotal and Data Sources
9:46 - Emulator Limitations and Real Device Requirements
11:52 - OT Malware Sophistication Assessment
13:27 - Representative Data Definition and Importance
15:24 - Industry Insights and Budget Realities
17:00 - AI and Machine Learning in OT
20:13 - Incident Frequency and Threat Landscape
22:05 - Threat Hunting in OT Environments
24:29 - IT-OT Convergence Challenges

Key Quotes

1:27 "How you collect data in OT is inherently challenging, right? And my talk really did focus around being like, how can folks get more representative data? ..."
4:20 "We set up real devices. We got them configured. We fired the malware against it. And the malware actually didn't work because the attackers had bugs in their code in the malware."
7:04 "There's nothing worse than buying a product and thinking it's going to detect it. And then when you start to pull data and throw it through and be like, hey, this should detect and it's not right."
12:34 "A lot of PLCs still don't have authentication, they don't have those layers. Right. There's not a huge need right now for that sophistication."
14:41 "The best representative data is often from the environment, knowing how rough it is, knowing where it's at."
23:00 "A lot of folks still aren't aware, like again, below Function Code 90 is all of this really important stuff to watch that it's not in the open source tools. It's not in Tshark. It's not in Malcolm."

FAQ

What's the most cost-effective way to start generating OT test data for security validation?

Begin with free protocol simulators and open-source tools to test basic detection rules. If you need higher fidelity, purchase used PLCs from eBay—many vendors offer free engineering software for older models like Schneider's M221. Only invest in complex emulation environments when you need to test statistical detection or specific vendor protocol implementations. The key is defining your fidelity requirements first to avoid over-engineering the solution.

Why can't I just use open-source tools like Wireshark to analyze proprietary OT protocols?

Open-source tools like Wireshark and Malcolm provide basic protocol visibility but often don't parse proprietary extensions. For example, Schneider PLCs use Modbus Function Code 90, but all the critical commands—stop PLC, start PLC, firmware upload, logic changes—exist in proprietary layers below that function code. Without parsing those layers, you can only see that Schneider's protocol is being used, not what's actually being commanded. This limits your ability to detect meaningful attack techniques.
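
To make the visibility gap concrete, here is a minimal Python sketch of what a generic Modbus/TCP parser can see. It decodes the standard seven-byte MBAP header and the function code, then stops: for Function Code 90 (0x5A) everything after the function code is treated as opaque bytes, because the proprietary layer is not part of the public Modbus specification. The names `ModbusFrame` and `parse_modbus_tcp` and the sample frame bytes are illustrative assumptions, not taken from any tool or from the talk.

```python
import struct
from dataclasses import dataclass

SCHNEIDER_PROPRIETARY_FC = 0x5A  # Modbus Function Code 90


@dataclass
class ModbusFrame:
    transaction_id: int
    unit_id: int
    function_code: int
    payload: bytes  # everything after the function code, undecoded

    @property
    def is_proprietary(self) -> bool:
        # True when the frame carries Function Code 90; the actual command
        # (stop, start, upload, ...) lives inside `payload` and is NOT parsed here.
        return self.function_code == SCHNEIDER_PROPRIETARY_FC


def parse_modbus_tcp(data: bytes) -> ModbusFrame:
    """Decode the MBAP header and function code of one Modbus/TCP frame."""
    if len(data) < 8:
        raise ValueError("too short for MBAP header plus function code")
    # MBAP header: transaction id, protocol id, length (big-endian), unit id.
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", data[:7])
    if protocol_id != 0:
        raise ValueError("not Modbus/TCP (protocol id must be 0)")
    # The length field counts the unit id plus the PDU that follows it.
    end = 6 + length
    if length < 2 or end > len(data):
        raise ValueError("length field inconsistent with frame size")
    return ModbusFrame(transaction_id, unit_id, data[7], data[8:end])


# Hypothetical frame: Function Code 90 with two placeholder payload bytes.
frame = parse_modbus_tcp(bytes.fromhex("000100000004015a0001"))
print(frame.function_code, frame.is_proprietary, frame.payload.hex())
# prints: 90 True 0001
```

A detection rule built on this parser alone could alert on "Function Code 90 observed" but could not distinguish a routine engineering read from a stop command, which is the limitation described above.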

How does representative data differ from just having accurate protocol emulation?

Representative data includes not just correct protocol implementation but also real-world operational conditions. Lab environments provide pristine network conditions, but production OT environments have latency, environmental noise from vibration and temperature, and degraded operation modes. Statistical detection and machine learning models trained on clean lab data may fail in production because the baseline doesn't account for normal operational variability. The best representative data comes from actual operational environments, though emulation can approximate it when configured to include realistic impairments.
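
As a rough illustration of "configured to include realistic impairments," the following Python sketch takes packet timestamps from a clean lab capture and applies a base delay, random jitter, and random loss. The function name `impair_timestamps`, its parameters, and the default values are assumptions chosen for illustration; realistic values would have to come from measurements of the target environment.

```python
import random


def impair_timestamps(timestamps, base_delay=0.005, jitter=0.002,
                      loss_rate=0.01, seed=0):
    """Return arrival times after adding delay, jitter, and random loss.

    timestamps: send times in seconds from a clean lab capture.
    base_delay: mean added latency in seconds (assumed value).
    jitter:     standard deviation of the added latency in seconds (assumed).
    loss_rate:  probability that a packet is dropped (assumed).
    """
    rng = random.Random(seed)  # seeded so test runs are repeatable
    arrivals = []
    for t in timestamps:
        if rng.random() < loss_rate:
            continue  # packet lost in transit
        # Clamp at zero so a packet never arrives before it was sent.
        delay = max(0.0, rng.gauss(base_delay, jitter))
        arrivals.append(t + delay)
    # Jitter can reorder closely spaced packets; sort into arrival order.
    arrivals.sort()
    return arrivals


# A perfectly regular 100 ms polling cycle, as a lab capture might show.
clean = [i * 0.1 for i in range(50)]
noisy = impair_timestamps(clean, loss_rate=0.05)
gaps = [b - a for a, b in zip(noisy, noisy[1:])]
print(f"{len(clean) - len(noisy)} packets lost, "
      f"gap range {min(gaps):.4f}s to {max(gaps):.4f}s")
```

A baseline learned from `clean` would treat every gap other than 100 ms as anomalous, while one learned from `noisy` tolerates the spread, which is the practical difference between accurate emulation and representative data.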


Categories:
  • Data Protection
Tags:
  • OT
  • IoT Security
  • Technical Deep Dive
  • Best Practices
  • Threat Intelligence
  • Security Operations
  • OT Security Testing
  • Malicious Data Generation
  • Protocol Emulation
  • Industrial Control Systems
  • Detection Engineering
  • MITRE ATT&CK for ICS
  • PLC Security