When training the ScrapeBox Learning Mode Poster on new forms, what variables can be used?
Note: I have had success using GPT tools such as ChatGPT to help train platforms. You can feed it everything below along with the form's HTML, and that can get you headed in the right direction.
Platform Guide
- Train new Platforms
- Enhance Existing Platforms
- Detect Different Captchas
- Modify Search Footprints
- Multi-step Forms
- Change Success/Fail Footprints
You can also look here:
http://www.scrapebox.com/training-new-platforms
Since 2011 ScrapeBox has had the ability to learn new platforms, and it can post to virtually any platform or form that doesn’t require a user account to be created on the website. This includes blog platforms, guestbooks, contact forms, trackbacks, some open forums and wikis.
To work with a platform, you need to create a definition file, which is a plain text file in the Microsoft .ini format like the screenshot above. It consists of [Sections], each containing a number of Name=value keys. The first section in a ScrapeBox platform file is…
[SetUp]
The setup defines the basics: which footprints the harvester uses to find the platform, how ScrapeBox identifies the platform once it loads a page, how ScrapeBox detects whether a comment to the platform succeeded or failed, and things like how to handle URLs and navigate between pages. Below are the Name= entries that are valid in the setup.
FriendlyName= Any name you want to give the platform; it is shown in the GUI.
UseBlackList= 1 to use the blacklist or 0 to not use it. This is the bad words list you can edit in the poster.
UseWhiteList= 1 to use the whitelist or 0 to not use it. This is the whitelist you can edit in the poster.
Platform= The type of platform, such as Blog, GuestBook, Image, Forum, Contact Form or Trackback. It is used to group similar platforms.
Markup= How to handle links and code; values can be HTML or BB.
PageMustContain= If any of the given strings is found in the page code, the page is valid. | is interpreted as OR, * is interpreted as AND.
Success= If any of the given strings is found in the result page after posting, the submission was a success. | is interpreted as OR, * is interpreted as AND.
Failed= If any of the given strings is found in the result page after posting, the submission failed. | is interpreted as OR, * is interpreted as AND.
All platform definition files should have the above fields set; they are essentially the minimum “required” fields for the [Setup] section of a platform file. The fields below are not required, but are often needed to perform more advanced functions when posting to some platforms.
PageMustNotContain= If any of the given strings is found in the page code, the page is invalid. | is interpreted as OR, * is interpreted as AND.
Enctype= The encoding type, if you wish to override the form's default encoding, such as application/x-www-form-urlencoded.
LoadUrl= Locates the given URL and loads the target page. RemoveFromUrl, RemoveFromUrlAfter and ModifyUrl are then skipped.
LoadUrlFromAnchor= Locates the given anchor, grabs its URL and loads the target page. RemoveFromUrl, RemoveFromUrlAfter and ModifyUrl are then skipped.
RemoveFromUrl= Removes the given strings from the base URL. Multiple strings are separated with |
RemoveFromUrlAfter= Removes everything from the position of the given strings onward in the base URL. Multiple strings are separated with |
ModifyUrl= Adds something to the base URL. The variables %host% and %path% can be used to rebuild the base URL.
DeleteCookies= A list of cookie names to delete.
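Putting the required fields together with one optional field, a minimal [SetUp] section might look like the sketch below. Every string value here (the platform name and the footprint, success and failure texts) is a hypothetical placeholder; replace them with text copied from the real pages and result pages of the platform you are training. The comment line assumes the standard .ini convention of ; for comments.

```ini
; Hypothetical platform: all string values are placeholders
[SetUp]
FriendlyName=Example Guestbook
UseBlackList=1
UseWhiteList=0
Platform=GuestBook
Markup=HTML
PageMustContain=Powered by ExampleBook|examplebook.css
PageMustNotContain=Guestbook closed
Success=Thank you*entry has been added
Failed=Invalid code|Entry rejected
```

Read with the rules above: the page is accepted if it contains either footprint (| is OR) and does not contain "Guestbook closed", and a post only counts as a success when the result page contains both "Thank you" and "entry has been added" (* is AND).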
Guestbook Example
Here you can see a basic example of the [Setup] for Bella Guestbook.
The PageMustContain, PageMustNotContain, Success and Failed values scan the page contents for the markers you add, so you can use text, HTML, JavaScript or anything else that appears in the page content.
This platform also uses two optional values, RemoveFromUrl and ModifyUrl. They tell ScrapeBox that when it lands on the guestbook, no matter which page, it should trim index.php and sign.php and everything after them (such as query strings) from the URL, then load %host%%path%sign.php. So if it landed on scrapebox.com/guestbook/index.php?page=123, it would strip the last part and load scrapebox.com/guestbook/sign.php.
This is used when the page you need to post the comment on is different from the page you load, so you can train ScrapeBox to navigate to the correct page to make the post.
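The navigation described for the Bella Guestbook could be sketched from the key definitions above as follows. This is not a copy of the Bella Guestbook file: because trimming "everything after" index.php or sign.php matches the definition of RemoveFromUrlAfter, the sketch uses that key, and it assumes %host%%path% expands to the guestbook directory as in the example URL.

```ini
; Sketch of URL navigation for a guestbook whose post form lives on sign.php
[SetUp]
RemoveFromUrlAfter=index.php|sign.php
ModifyUrl=%host%%path%sign.php
```

With the example URL scrapebox.com/guestbook/index.php?page=123, RemoveFromUrlAfter should leave scrapebox.com/guestbook/, and ModifyUrl then rebuilds the address as scrapebox.com/guestbook/sign.php. These lines sit alongside the required [SetUp] fields rather than replacing them.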
[Step]
Once the [Setup] has been created, the next section is [Step], which handles making the post. The following options and variables are available in the Step sections.
DoStepIf= Process this step only when any of the given strings is found in the page code. | is interpreted as OR, * is interpreted as AND. If not set, the step is always processed.
FormMustContain= The form is valid when any of the given strings is found in the form. | is interpreted as OR, * is interpreted as AND.
FormMustNotContain= If the form contains any of the given strings, the form is invalid. | is interpreted as OR, * is interpreted as AND.
PostUrl= A |-separated list of URL parts used to grab the post URL. It looks between <form and >.
AddToPostUrl= A value appended to the post URL. Masks (%…%) can be used.
DelayPost= Delays the post by the given number of seconds. The variable %rndnum-x-y% can be used too.
DelayPostIf= Only delays the post when any of the listed strings is found. Multiple strings are separated with |
AddToPostDataIfInpage= Adds all AddToPostData= fields when any of the |-separated strings is found in the page code.
AddToPostData= fieldname=variable is added to the post data when the AddToPostDataIfInpage condition is true. If no AddToPostDataIfInpage is set, AddToPostData is always added.
EncodeFieldNames= Set to 1 to URL-encode field names.
Field names can contain * as a wildcard. So if the field name is captcha_code123, where 123 differs on each blog/post, then captcha_code*=%captcha% will match.
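Here is a sketch of a [Step] that combines several of these options. The field names (name, email, message, captcha_code*) and the PostUrl fragments are hypothetical; take the real ones from the name= attributes and the <form action=...> of the form you are training.

```ini
; Hypothetical [Step]: field names and match strings are placeholders
[Step]
FormMustContain=name="message"*name="email"
FormMustNotContain=type="password"
PostUrl=sign.php|addentry
DelayPost=%rndnum-5-15%
name=%rnd-name%
email=%rnd-email%
message=%rnd-comment%
captcha_code*=%captcha%
```

Here the form is only used if it contains both a message and an email field (* is AND) and has no password box, which helps skip a login form on the same page. The post is delayed by a random 5 to 15 seconds, and the wildcard field name matches captcha_code123 or any other suffix.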
Variables:
All ini settings that use variables allow spintax; for example, thename={%rnd-name%|%rnd-email%} is valid. Values assigned to variables also allow spintax.
%host% Represents the host name of the target url
%path% Represents the path of the target url
%rnd-name% Returns a random name from the file ~cpn.txt. Spintax allowed.
%rnd-email% Returns a random email from the file ~cpe.txt. Spintax allowed.
%rnd-website% Returns a random website from the file ~cpw.txt. Spintax allowed.
%rnd-comment% Returns a random comment from the file ~cpc.txt. Spintax allowed.
%rnd-option% Returns a random option. Values are taken from the <select>/<option> tags of the form.
%rnd-location% Spintax allowed.
%rndnum-x-y% Returns a random number between x and y.
%ignore% Uses the original value given in the form.
%user-domain% The domain of the user’s website previously generated by %rnd-website%
%user-name% The username previously generated by %rnd-name%
%user-email% The email previously generated by %rnd-email%
%user-comment% The comment previously generated by %rnd-comment%
%user-location% The location previously generated by %rnd-location%
%user-website% The website previously generated by %rnd-website%
%wphashcash% Result of WPHashCash processing (internal code)
%captcha% Image captcha result
%question% Text captcha result
%serverstatus-200% Represents server status code 200
%serverstatus-302% Represents server status code 302
%header-xxxx% Checks the post header for the presence of xxxx.
%unixtimestamp% Returns the current Unix timestamp.
%unixtimestampms% Returns the current Unix timestamp in milliseconds.
%xxxxxx% Executes the section named xxxxxx
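As an illustration of the variables and of spintax, the lines below could appear in a [Step]. The field names on the left are hypothetical; only the variables on the right come from the list above.

```ini
; Hypothetical field names; the right-hand sides use documented variables
[Step]
author={%rnd-name%|%rnd-email%}
age=%rndnum-18-65%
country=%rnd-option%
token=%ignore%
sent_at=%unixtimestamp%
```

With these lines, author gets either a random name or a random email on each post, age a number between 18 and 65, and country a random entry from the form's <option> values (presumably those of that field's own list). token keeps whatever value the form already had, which is handy for hidden fields the server checks.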
You can configure multiple [Step] sections for multi-step forms that require you to fill out information on two or more pages.
Sections
[xxxxx]
Action=extract (extracts the text between Before and After)
Before= The text before the wanted part
After= The text after the wanted part
Default= If no part can be extracted, this value is used by default
[xxxxx]
Action=getfieldvalue (returns the value of a field)
Fieldname=The name of the field
Other
processwpspamfree=1 Use this to force the check for WP-SpamFree.
Failed MASK = Matches a Failed= ini response.
Note: The xxxxxx between the brackets can be anything you want.
Once you have set up a section, you use it in the [Step] section with:
%xxxxxx% Executes the section named xxxxxx
~~
For example, if you want to include the following hidden field:
<input type='hidden' name='contact-form-hash-value' value='df6619356840577fbc7abc197f3a23509eeeeb72' />
Then you would define this section:
[hidden-field-1]
Action=extract
Before=name='contact-form-hash-value' value='
After='
Default=
Then in the [Step] section you would put:
contact-form-hash-value=%hidden-field-1%
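The same hidden field could probably also be captured with the getfieldvalue action, which only needs the field's name instead of Before/After markers. A sketch, assuming getfieldvalue matches the field by its name attribute; the section name hidden-field-2 is arbitrary, like any section name.

```ini
; Alternative to the extract section above, using getfieldvalue
[hidden-field-2]
Action=getfieldvalue
Fieldname=contact-form-hash-value

[Step]
contact-form-hash-value=%hidden-field-2%
```

The extract action is still the one to use when the value you need is not the value of a form field, for example a token inside a script.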
Name Field
When training form fields, you are looking for the name=X attribute.
For example, in this sample form from the Icybook platform, part of the form code looks like this:
<th class="newleft">*Name:</th>
<td class="newright"><input type="text" size="30" name="autor" maxlength="30" value="" /></td>
</tr>
<tr>
<th class="newleft">Email:</th>
<td class="newright"><input type="text" size="30" name="email" maxlength="50" value="" /></td>
</tr>
<tr>
<th class="newleft">Homepage:</th>
<td class="newright"><input type="text" size="30" name="homepage" maxlength="50" value="http://" /></td>
</tr>
Here name="autor" belongs to the actual "Name" field where you enter your name, so the entry would look like:
autor=%rnd-name%
This finds the form field named on the left and fills it with what is on the right. In this case it takes a random name from your names.txt file (the topmost of the 5 boxes you load in ScrapeBox when posting).
For the email, which comes from the next box down, it would be:
email=%rnd-email%
And for your website link, from the websites.txt box:
homepage=%rnd-website%
So under the [Step] section you would have:
autor=%rnd-name%
email=%rnd-email%
homepage=%rnd-website%
The fields on the left of the equals sign are the X in "name=X" of the form data, and the variables (with percent signs) on the right-hand side of the equals sign pull data from the files you load into ScrapeBox.
Checkboxes
Checkboxes work off a system of zeros and ones: a checkbox is either 0 or 1, like the other on/off ini settings. For example, in the WordPress definition you can see
gasp_checkbox=1
0 is unchecked and 1 is checked.
The format is:
fieldname=value
where fieldname is the name of the checkbox (as discussed in the previous section) and value is either 0 or 1.
Captchas
Captcha data is stored in the captcha.def and textcaptcha.def files.
If you use %captcha%, ScrapeBox looks in captcha.def for the before and after markers to locate the image to send to the captcha solving service or program.
Likewise, if you use %question%, it looks for the markers in the textcaptcha.def file.
In textcaptcha.def:
type=static extracts a word from the form, as in "Please enter the word RED", and inserts it into the post data. type=variable is for a math question such as 1+2=
captcha.def:
The name you see above each element does not matter; it does not correspond to anything. It's just a friendly name to help you keep track of what is used for what, and the number next to it simply gives each entry a unique name.
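On the [Step] side, using the captcha definitions is just a matter of assigning the two variables to the right fields. A sketch with hypothetical field names, assuming the matching markers already exist in captcha.def and textcaptcha.def:

```ini
; Hypothetical field names for an image captcha and a text/math captcha
[Step]
comment=%rnd-comment%
security_code=%captcha%
math_answer=%question%
```

If the captcha field name changes on every page, a wildcard such as security_code*=%captcha% can be used instead.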
Multiple Steps
You can use multiple steps, as in the IcyBook Guestbook, Jambook Guestbook and other definition files. You can also use the DoStepIf= parameter. Each step simply needs its own numbered section:
[STEP]
Then
[STEP1]
Then
[STEP2]
And so on.
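A sketch of a two-page form, with hypothetical field names and marker text: the first step submits the contact details, and the second runs only if the next page contains the message field. It assumes %user-name% carries the name generated in the first step over to the second, which is what "previously generated" in the variables list suggests.

```ini
; Hypothetical two-page form; field names and DoStepIf text are placeholders
[STEP]
name=%rnd-name%
email=%rnd-email%

[STEP1]
DoStepIf=name="message"
message=%rnd-comment%
signed_as=%user-name%
```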
Debugging
The simple truth is that things can happen behind the scenes that aren't always apparent. An invaluable tool I have found for debugging is HTTP Debugger Pro.
At the time of writing there is a free trial, and there has been one for many years, so you can try it out; if you are only doing one form, it can help. If you are planning to train a lot of forms or platforms, it's probably a good investment if it still makes sense after using the trial.
I find it helpful to run the debugger, submit the form in a browser and look at the POST data. That way you know exactly what the server is expecting and can make sure your ini posts the same data.
Please note I am in no way affiliated with HTTP Debugger Pro. It has simply saved me hours of time and made things possible that I would not otherwise have been able to do.